Space and time efficient parallel algorithms and software for EST clustering

Cited by: 21

Authors:

Kalyanaraman, A

Aluru, S

Brendel, V

Kothari, S

Affiliations:

[1] Iowa State Univ Sci & Technol, Dept Comp Sci, Ames, IA 50011 USA

[2] Iowa State Univ Sci & Technol, Dept Elect & Comp Engn, Ames, IA 50011 USA

[3] Iowa State Univ Sci & Technol, Dept Zool & Genet, Ames, IA 50011 USA

[4] Iowa State Univ Sci & Technol, Dept Stat, Ames, IA 50011 USA

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 2003 / Vol. 14 / No. 12

Funding:

US National Science Foundation;

Keywords:

computational biology; EST clustering; maximal common substring; parallel algorithms; suffix tree applications; SUFFIX TREE CONSTRUCTION; TIGR GENE INDEXES; SEQUENCES; GENERATION; D2-CLUSTER; TOOLS;

DOI:

10.1109/TPDS.2003.1255634

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Expressed sequence tags, abbreviated as ESTs, are DNA molecules experimentally derived from expressed portions of genes. Clustering of ESTs is essential for gene recognition and for understanding important genetic variations such as those resulting in diseases. In this paper, we present the algorithmic foundations and implementation of PaCE, a parallel software system we developed for large-scale EST clustering. The novel features of our approach include 1) design of space-efficient algorithms to limit the space required to linear in the size of the input data set, 2) a combination of algorithmic techniques to reduce the total work without sacrificing the quality of EST clustering, and 3) use of parallel processing to reduce runtime and facilitate clustering of large data sets. Using a combination of these techniques, we report the clustering of 327,632 rat ESTs in 47 minutes, and 420,694 Triticum aestivum ESTs in 3 hours and 15 minutes, using a 60-processor IBM xSeries cluster. These problems are well beyond the capabilities of state-of-the-art sequential software. We also present a thorough experimental evaluation of our software, including quality assessment using benchmark Arabidopsis EST data.
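To make the clustering idea in the abstract concrete, the following is a minimal sequential sketch, not the PaCE implementation. It assumes a simple k-mer index as a stand-in for the suffix-tree-based search for maximal common substrings, a quadratic dynamic-programming check in place of the paper's alignment step, and a union-find structure to merge clusters. The function names and the parameters `k` and `min_overlap` are hypothetical choices for illustration. The one idea it does borrow from the abstract is work reduction: a pair of ESTs already in the same cluster is skipped without further comparison.

```python
from collections import defaultdict
from itertools import combinations


def find(parent, i):
    """Return the cluster representative of i (union-find with path halving)."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def longest_common_substring(a, b):
    """Length of the longest common substring, via a rolling-row DP table."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def cluster_ests(seqs, k=8, min_overlap=12):
    """Cluster sequences that share a common substring of >= min_overlap.

    k and min_overlap are illustrative values, not taken from the paper.
    """
    # Index every k-mer so that only "promising" pairs are ever compared.
    index = defaultdict(set)
    for i, s in enumerate(seqs):
        for p in range(len(s) - k + 1):
            index[s[p:p + k]].add(i)

    parent = list(range(len(seqs)))
    for ids in index.values():
        for i, j in combinations(sorted(ids), 2):
            ri, rj = find(parent, i), find(parent, j)
            if ri == rj:
                continue  # already clustered together: skip the expensive check
            if longest_common_substring(seqs[i], seqs[j]) >= min_overlap:
                parent[rj] = ri  # merge the two clusters

    clusters = defaultdict(list)
    for i in range(len(seqs)):
        clusters[find(parent, i)].append(i)
    return sorted(clusters.values())


seqs = [
    "ACGTACGTTAGCCGATAGGCT",   # ends with TAGCCGATAGGCT
    "TAGCCGATAGGCTTTACGGA",    # starts with TAGCCGATAGGCT
    "GGGGGGGGCCCCCCCCAAAA",    # unrelated
]
result = cluster_ests(seqs)
print(result)
```

In this toy input the first two sequences share a 13-base substring and are merged, while the third stays in its own cluster. A parallel version in the spirit of the abstract would distribute the pair generation and comparison across processors, which this sketch does not attempt.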


Pages: 1209-1221

Page count: 13
