Algorithm of OMA for large-scale orthology inference

Cited by: 113
Authors
Roth, Alexander C. J. [1]
Gonnet, Gaston H.
Dessimoz, Christophe
Affiliations
[1] ETH, CH-8092 Zurich, Switzerland
DOI
10.1186/1471-2105-9-518
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: OMA is a project that aims to identify orthologs within publicly available, complete genomes. With 657 genomes analyzed to date, OMA is one of the largest projects of its kind. Results: The OMA algorithm improves upon the standard bidirectional best-hit approach in several respects: it uses evolutionary distances instead of scores, considers the uncertainty of distance inference, includes many-to-many orthologous relations, and accounts for differential gene losses. Herein, we describe the orthology inference algorithm in detail and provide the rationale for parameter selection through multiple tests. Conclusion: OMA contains several novel improvements for orthology inference and provides a unique dataset of large-scale orthology assignments.
Pages: 10