Limitations and potentials of current motif discovery algorithms

Cited by: 157
Authors
Hu, JJ
Li, B
Kihara, D [1]
Affiliations
[1] Purdue Univ, Coll Sci, Dept Biol Sci, W Lafayette, IN 47907 USA
[2] Purdue Univ, Coll Sci, Dept Comp Sci, W Lafayette, IN 47907 USA
[3] Purdue Univ, Coll Sci, Markey Ctr Struct Biol, W Lafayette, IN 47907 USA
[4] Purdue Univ, Coll Sci, Bindley Biosci Ctr, W Lafayette, IN 47907 USA
DOI
10.1093/nar/gki791
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Computational methods for de novo identification of gene regulation elements, such as transcription factor binding sites, have proved useful for deciphering genetic regulatory networks. However, despite the large number of available algorithms, their strengths and weaknesses are not sufficiently understood. Here, we designed a comprehensive set of performance measures and benchmarked five modern sequence-based motif discovery algorithms using large datasets generated from Escherichia coli RegulonDB. We characterize the factors that affect prediction accuracy, scalability and reliability. Accuracy at the nucleotide and binding-site levels is very low, whereas accuracy at the motif level is relatively high, which indicates that the algorithms can usually capture at least one correct motif in an input sequence. To exploit the diverse predictions from multiple runs of one or more algorithms, we developed a consensus ensemble algorithm, which achieved a 6-45% improvement over the base algorithms by increasing both sensitivity and specificity. Our study illustrates the limitations and potentials of existing sequence-based motif discovery algorithms. Building on the revealed potentials, we discuss several promising directions for further improvement. Because sequence-based algorithms are the baseline of most modern motif discovery algorithms, our results suggest that substantial improvements to them are possible.
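The nucleotide-level measures and the consensus idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the representation of sites as (start, length) pairs, the use of positive predictive value as the "specificity" measure, and the simple vote threshold are all assumptions made for illustration.

```python
from collections import Counter


def site_positions(sites):
    """Expand (start, length) sites into the set of covered nucleotide positions."""
    covered = set()
    for start, length in sites:
        covered.update(range(start, start + length))
    return covered


def nucleotide_level_scores(true_sites, predicted_sites):
    """Return (sensitivity, positive predictive value) at the nucleotide level.

    Assumption: sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP),
    counted over individual nucleotide positions of one sequence.
    """
    true_pos = site_positions(true_sites)
    pred_pos = site_positions(predicted_sites)
    tp = len(true_pos & pred_pos)
    fn = len(true_pos - pred_pos)
    fp = len(pred_pos - true_pos)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, ppv


def consensus_positions(runs, min_votes):
    """Keep positions predicted by at least `min_votes` runs.

    Assumption: plain position-wise voting, a stand-in for the paper's
    consensus ensemble algorithm, whose details are not given in the abstract.
    """
    votes = Counter()
    for sites in runs:
        votes.update(site_positions(sites))
    return {pos for pos, count in votes.items() if count >= min_votes}
```

Here `runs` would hold the predicted sites from several runs of one or more algorithms on the same sequence. Raising `min_votes` trades sensitivity for fewer false positives.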
Pages: 4899-4913
Page count: 15