Using mRNAs lengths to accurately predict the alternatively spliced gene products in Caenorhabditis elegans

Cited by: 2
Authors
Agrawal, R [1]
Stormo, GD [1]
Affiliation
[1] Washington Univ, Sch Med, Dept Genet, St Louis, MO 63110 USA
Keywords
DOI
10.1093/bioinformatics/btl076
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Computational gene prediction methods are an important component of whole genome analyses. While ab initio gene finders have demonstrated major improvements in accuracy, the most reliable methods are evidence-based gene predictors. These algorithms can rely on several different sources of evidence including predictions from multiple ab initio gene finders, matches to known proteins, sequence conservation and partial cDNAs to predict the final product. Despite the success of these algorithms, prediction of complete gene structures, especially for alternatively spliced products, remains a difficult task.

Results: LOCUS (Length Optimized Characterization of Unknown Spliceforms) is a new evidence-based gene finding algorithm which integrates a length-constraint into a dynamic programming-based framework for prediction of gene products. On a Caenorhabditis elegans test set of alternatively spliced internal exons, its performance exceeds that of current ab initio gene finders and in most cases can accurately predict the correct form of all the alternative products. As the length information used by the algorithm can be obtained in a high-throughput fashion, we propose that integration of such information into a gene-prediction pipeline is feasible and doing so may improve our ability to fully characterize the complete set of mRNAs for a genome.
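
The abstract describes combining a length constraint with dynamic programming but gives no algorithmic detail. The following is only a minimal sketch of that general idea, not the LOCUS implementation: it assumes each candidate exon carries a precomputed score, that exons are half-open intervals (start, end), and that the measured mRNA length equals the summed length of the chosen exons. The function name `best_chain`, the tolerance parameter `tol`, and the toy coordinates are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Exon = Tuple[int, int, float]  # (start, end, score); end is exclusive


def best_chain(exons: List[Exon], target_len: int, tol: int = 0) -> Optional[List[Exon]]:
    """Return the highest-scoring chain of non-overlapping exons whose
    summed length is within `tol` of `target_len`, or None if none exists."""
    exons = sorted(exons)
    max_len = target_len + tol
    # tables[i][length] = (best score, back-pointer) for chains ending at exon i
    # with that spliced length; the back-pointer is (j, previous length) or None.
    tables: List[Dict[int, Tuple[float, Optional[Tuple[int, int]]]]] = []
    for i, (start, end, score) in enumerate(exons):
        size = end - start
        table: Dict[int, Tuple[float, Optional[Tuple[int, int]]]] = {}
        if size <= max_len:
            table[size] = (score, None)  # chain consisting of exon i alone
        for j in range(i):
            if exons[j][1] > start:  # exon j overlaps exon i, so it cannot precede it
                continue
            for prev_len, (prev_score, _) in tables[j].items():
                length = prev_len + size
                if length > max_len:  # already too long for the length constraint
                    continue
                cand = prev_score + score
                if length not in table or cand > table[length][0]:
                    table[length] = (cand, (j, prev_len))
        tables.append(table)

    # Choose the best end state that satisfies the length constraint.
    best = None
    for i, table in enumerate(tables):
        for length, (score, _) in table.items():
            if abs(length - target_len) <= tol:
                if best is None or score > best[0]:
                    best = (score, i, length)
    if best is None:
        return None

    # Follow back-pointers to recover the chain in genomic order.
    chain = []
    state: Optional[Tuple[int, int]] = (best[1], best[2])
    while state is not None:
        i, length = state
        chain.append(exons[i])
        state = tables[i][length][1]
    return chain[::-1]


# Toy locus: the internal exon has a short (60 nt) and a long (90 nt) form.
candidates = [(0, 100, 5.0), (200, 260, 1.0), (200, 290, 3.0), (400, 500, 5.0)]
short_form = best_chain(candidates, target_len=260)
long_form = best_chain(candidates, target_len=290)
```

In this toy case the score alone would always favor the long internal exon; supplying a different target length is what lets the same candidate set yield each alternative product.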
Pages: 1239-1244
Number of pages: 6
Related papers
26 in total
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   Recent advances in gene structure prediction [J].
Brent, MR ;
Guigó, R .
CURRENT OPINION IN STRUCTURAL BIOLOGY, 2004, 14 (03) :264-272
[3]   ESEfinder: a web resource to identify exonic splicing enhancers [J].
Cartegni, L ;
Wang, JH ;
Zhu, ZW ;
Zhang, MQ ;
Krainer, AR .
NUCLEIC ACIDS RESEARCH, 2003, 31 (13) :3568-3571
[4]   HMM sampling and applications to gene finding and alternative splicing [J].
Cawley, Simon L. ;
Pachter, Lior .
BIOINFORMATICS, 2003, 19 :II36-II41
[5]   Pseudo-likelihood ratio tests for semiparametric multivariate copula model selection [J].
Chen, XH ;
Fan, YQ .
CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2005, 33 (03) :389-414
[6]   Intron-exon structures of eukaryotic model organisms [J].
Deutsch, M ;
Long, M .
NUCLEIC ACIDS RESEARCH, 1999, 27 (15) :3219-3228
[7]   Leveraging the mouse genome for gene prediction in human: From whole-genome shotgun reads to a global synteny map [J].
Flicek, P ;
Keibler, E ;
Hu, P ;
Korf, I ;
Brent, MR .
GENOME RESEARCH, 2003, 13 (01) :46-54
[8]   EGASP: collaboration through competition to find human genes [J].
Guigó, R ;
Reese, MG .
NATURE METHODS, 2005, 2 (08) :575-577
[9]   The Ensembl genome database project [J].
Hubbard, T ;
Barker, D ;
Birney, E ;
Cameron, G ;
Chen, Y ;
Clark, L ;
Cox, T ;
Cuff, J ;
Curwen, V ;
Down, T ;
Durbin, R ;
Eyras, E ;
Gilbert, J ;
Hammond, M ;
Huminiecki, L ;
Kasprzyk, A ;
Lehvaslaiho, H ;
Lijnzaad, P ;
Melsopp, C ;
Mongin, E ;
Pettett, R ;
Pocock, M ;
Potter, S ;
Rust, A ;
Schmidt, E ;
Searle, S ;
Slater, G ;
Smith, J ;
Spooner, W ;
Stabenau, A ;
Stalker, J ;
Stupka, E ;
Ureta-Vidal, A ;
Vastrik, I ;
Clamp, M .
NUCLEIC ACIDS RESEARCH, 2002, 30 (01) :38-41
[10]   Computational comparative analyses of alternative splicing regulation using full-length cDNA of various eukaryotes [J].
Itoh, H ;
Washio, T ;
Tomita, M .
RNA, 2004, 10 (07) :1005-1018