HaMStR: Profile hidden markov model based search for orthologs in ESTs

Cited by: 224
Authors
Ebersberger, Ingo [1, 2, 3, 4]
Strauss, Sascha [1, 2, 3, 4]
von Haeseler, Arndt [1, 2, 3, 4]
Affiliations
[1] Max F Perutz Labs, CIBIV, Vienna, Austria
[2] Univ Vienna, Vienna, Austria
[3] Med Univ Vienna, Vienna, Austria
[4] Univ Vet Med, Vienna, Austria
Source
BMC EVOLUTIONARY BIOLOGY | 2009 / Volume 9
Funding
Austrian Science Fund;
Keywords
MULTIPLE SEQUENCE ALIGNMENT; TIGR GENE INDEXES; COMPLETE GENOMES; PHYLOGENOMICS; INFERENCE; TREE; RESOLUTION; PROTEOMES; PROTEINS; DATABASE;
DOI
10.1186/1471-2148-9-157
Chinese Library Classification
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
Background: EST sequencing is a versatile approach for rapidly gathering protein-coding sequences. ESTs provide direct access to an organism's gene repertoire, bypassing the still error-prone procedure of gene prediction from genomic data. Therefore, ESTs are often the only source of biological sequence data from taxa outside mainstream interest. The widespread use of ESTs in evolutionary studies, and particularly in molecular systematics studies, is still hindered by the lack of efficient and reliable approaches for automated ortholog prediction in ESTs. Existing methods either depend on a known species tree or cannot cope with redundancy in EST data.
Results: We present a novel approach (HaMStR) to mine EST data for the presence of orthologs to a curated set of genes. HaMStR combines a profile Hidden Markov Model search and a subsequent BLAST search to extend existing ortholog clusters with sequences from further taxa. We show that the HaMStR results are consistent with those obtained with existing orthology prediction methods that require completely sequenced genomes. A case study on the phylogeny of 35 fungal taxa illustrates that HaMStR is well suited to compiling informative data sets for phylogenomic studies from ESTs and protein sequence data.
Conclusion: HaMStR extends a pre-defined set of orthologs with ESTs from further taxa in a standardized manner. In the same fashion, HaMStR can be applied to protein sequence data, and thus provides a comprehensive approach to compiling ortholog clusters from any protein-coding data. The resulting orthology predictions serve as the data basis for a variety of evolutionary studies. Here, we have demonstrated the application of HaMStR in a molecular systematics study. However, we envision that studies tracing the evolutionary fate of individual genes or functional complexes of genes will greatly benefit from HaMStR orthology predictions as well.
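The two-stage search described in the Results (a profile HMM search followed by a BLAST search) can be illustrated with a minimal sketch. The abstract does not spell out the acceptance rule, so the rule used here is an assumption: a candidate is kept only if it scores above a threshold in the profile search and its best hit in a reverse search against a reference proteome is a member of the ortholog cluster. All names (`two_step_ortholog_filter`, `hmm_scores`, `reverse_best_hit`, `core_reference_ids`, `min_hmm_score`) and all scores are hypothetical toy values, not output of HMMER or BLAST and not HaMStR's actual interface.

```python
from typing import Dict, List, Set


def two_step_ortholog_filter(
    hmm_scores: Dict[str, float],
    reverse_best_hit: Dict[str, str],
    core_reference_ids: Set[str],
    min_hmm_score: float,
) -> List[str]:
    """Return candidate IDs that pass both stages (assumed acceptance rule).

    Stage 1: the candidate's profile-search score must reach min_hmm_score.
    Stage 2: the candidate's best reverse-search hit must be a reference
             sequence that already belongs to the ortholog cluster.
    """
    accepted = []
    for seq_id, score in sorted(hmm_scores.items()):
        if score < min_hmm_score:
            continue  # fails the profile-search stage
        if reverse_best_hit.get(seq_id) in core_reference_ids:
            accepted.append(seq_id)  # reverse search confirms the candidate
    return accepted


# Toy input: three EST-derived candidates for one ortholog cluster.
toy_hmm_scores = {"est1": 50.0, "est2": 80.0, "est3": 5.0}
toy_reverse_best_hit = {"est1": "refA", "est2": "refX"}
toy_core_reference_ids = {"refA"}

toy_result = two_step_ortholog_filter(
    toy_hmm_scores, toy_reverse_best_hit, toy_core_reference_ids, 20.0
)
print(toy_result)  # ['est1']
```

In this toy run, `est2` has the highest profile score but is rejected because its best reverse hit lies outside the cluster, which is the kind of paralog contamination a second search stage is meant to catch.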
Pages: 9