IBM Streams Processing Language: Analyzing Big Data in motion

Cited by: 92
Authors
Hirzel, M. [1]
Andrade, H. [2]
Gedik, B. [3]
Jacques-Silva, G. [1]
Khandekar, R. [4]
Kumar, V. [1]
Mendell, M. [5]
Nasgaard, H. [5]
Schneider, S. [1]
Soule, R. [6]
Wu, K.-L. [1]
Affiliations
[1] Thomas J Watson Res Ctr, IBM Res Div, Yorktown Hts, NY 10598 USA
[2] Goldman Sachs, New York, NY 10282 USA
[3] Bilkent Univ, Dept Comp, TR-06800 Ankara, Turkey
[4] Knight Capital Grp, Jersey, NJ 07310 USA
[5] IBM Canada, Markham, ON L6G 1C7, Canada
[6] Cornell Univ, Dept Comp Sci, Ithaca, NY 14850 USA
Keywords
PROGRAMMING LANGUAGE; MODEL
DOI
10.1147/JRD.2013.2243535
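Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
080201 [Mechanical manufacturing and automation]
Abstract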
The IBM Streams Processing Language (SPL) is the programming language for IBM InfoSphere® Streams, a platform for analyzing Big Data in motion. By "Big Data in motion," we mean continuous data streams at high data-transfer rates. InfoSphere Streams processes such data with both high throughput and short response times. To meet these performance demands, it deploys each application on a cluster of commodity servers. SPL abstracts away the complexity of the distributed system, instead exposing a simple graph-of-operators view to the user. SPL has several innovations relative to prior streaming languages. For performance and code reuse, SPL provides a code-generation interface to C++ and Java®. To facilitate writing well-structured and concise applications, SPL provides higher-order composite operators that modularize stream sub-graphs. Finally, to enable static checking while exposing optimization opportunities, SPL provides a strong type system and user-defined operator models. This paper provides a language overview, describes the implementation including optimizations such as fusion, and explains the rationale behind the language design.
Pages: 11