DODO: an efficient orthologous genes assignment tool based on domain architectures. Domain based ortholog detection

Cited by: 14
Authors
Chen, Ting-wen [1, 2]
Wu, Timothy H. [1]
Ng, Wailap V. [1]
Lin, Wen-chang [1, 2]
Affiliations
[1] Natl Yang Ming Univ, Inst Biomed Informat, Taipei 112, Taiwan
[2] Acad Sinica, Inst Biomed Sci, Taipei, Taiwan
Source
BMC BIOINFORMATICS | 2010 / Vol. 11
Keywords
PROTEIN FUNCTIONS; DATABASE; PHYLOGENOMICS; INFERENCE; EVOLUTION; HOMOLOGY; GENOMES; PFAM;
DOI
10.1186/1471-2105-11-S7-S6
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Orthologs are genes derived from the same ancestral gene locus after a speciation event. Orthologous proteins usually have similar sequences and perform comparable biological functions, so ortholog identification is useful for annotating newly sequenced genomes. With the rapidly increasing number of sequenced genomes, constructing or updating ortholog relationships between all genomes requires considerable effort and computation time. In addition, elucidating ortholog relationships between distantly related genomes is challenging because of their lower sequence similarity. An efficient ortholog detection method that can handle a large number of distantly related genomes is therefore desirable.
Results: In this study, an efficient ortholog detection pipeline, DODO (DOmain based Detection of Orthologs), was created on the basis of domain architectures. Because it relies on domain composition, which is usually directly related to protein function, DODO can facilitate ortholog detection across distantly related genomes. DODO works in two main steps. Starting from domain information, it first assigns proteins to groups according to their domain architectures, and then identifies orthologs within those groups at much reduced complexity. DODO is shown to detect orthologs between two genomes in considerably less time than the traditional reciprocal best hits method, and the time saving is more pronounced when a large number of genomes are analyzed. The output of DODO is highly comparable with that of other known ortholog databases.
Conclusions: DODO provides a new, efficient pipeline for detecting orthologs in a large number of genomes. In addition, a database established with DODO is easier to maintain and can be updated with relatively little effort. The DODO pipeline can be downloaded from http://140.109.42.19:16080/dodo_web/home.htm
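The two-step idea in the Results (group proteins by domain architecture, then search for orthologs only within matching groups) can be illustrated with a minimal sketch. This is not the DODO implementation: the abstract does not say how orthologs are chosen inside a group, so reciprocal best hits is used here as a stand-in, and the function names, protein names, domain assignments, and similarity scores are all hypothetical toy values.

```python
from collections import defaultdict


def group_by_architecture(domains):
    """Step 1: map each domain architecture (ordered tuple of domain IDs)
    to the list of proteins that carry it."""
    groups = defaultdict(list)
    for protein, architecture in domains.items():
        groups[tuple(architecture)].append(protein)
    return groups


def reciprocal_best_hits(a_proteins, b_proteins, score):
    """Return pairs (a, b) in which each protein is the other's best-scoring partner."""
    if not a_proteins or not b_proteins:
        return []
    best_for_a = {a: max(b_proteins, key=lambda b: score(a, b)) for a in a_proteins}
    best_for_b = {b: max(a_proteins, key=lambda a: score(a, b)) for b in b_proteins}
    return [(a, b) for a, b in best_for_a.items() if best_for_b[b] == a]


def detect_orthologs(domains_a, domains_b, score):
    """Step 2: compare proteins only within groups sharing a domain architecture.
    Assumption: reciprocal best hits stands in for the within-group method."""
    groups_a = group_by_architecture(domains_a)
    groups_b = group_by_architecture(domains_b)
    pairs = []
    for architecture, a_proteins in groups_a.items():
        b_proteins = groups_b.get(architecture, [])
        pairs.extend(reciprocal_best_hits(a_proteins, b_proteins, score))
    return sorted(pairs)


# Hypothetical toy input: protein -> ordered list of domain IDs.
domains_a = {"a1": ["PF00069"], "a2": ["PF00069"], "a3": ["PF00001", "PF00017"]}
domains_b = {
    "b1": ["PF00069"],
    "b2": ["PF00069"],
    "b3": ["PF00001", "PF00017"],
    "b4": ["PF00018"],
}

# Hypothetical pairwise similarity scores; unlisted pairs score 0.
toy_scores = {
    ("a1", "b1"): 90, ("a1", "b2"): 40,
    ("a2", "b1"): 50, ("a2", "b2"): 80,
    ("a3", "b3"): 70,
}


def toy_score(a, b):
    return toy_scores.get((a, b), 0)


orthologs = detect_orthologs(domains_a, domains_b, toy_score)
print(orthologs)
```

In this sketch the saving comes from the grouping step: pairwise comparisons are made only between proteins that share an architecture (five comparisons in the toy data instead of twelve for an all-against-all search), which is one way to read the "much reduced complexity" described in the abstract.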
Pages: 10