How to build a WebFountain: An architecture for very large-scale text analytics

Cited by: 34
Authors
Gruhl, D
Chavet, L
Gibson, D
Meyer, J
Pattanayak, P
Tomkins, A
Zien, J
Affiliations
[1] IBM Res Div, Almaden Res Ctr, San Jose, CA 95120 USA
[2] Microsoft Corp, Redmond, WA 98052 USA
DOI
10.1147/sj.431.0064
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
WebFountain is a platform for very large-scale text analytics applications. The platform allows uniform access to a wide variety of sources, scalable system-managed deployment of a variety of document-level "augmenters" and corpus-level "miners," and finally creation of an extensible set of hosted Web services containing information that drives end-user applications. Analytical components can be authored remotely by partners using a collection of Web service APIs (application programming interfaces). The system is operational and supports live customers. This paper surveys the high-level decisions made in creating such a system.
Pages: 64-77
Page count: 14