Assignment of orthologous genes via genome rearrangement

被引:107
作者
Chen, X [1 ]
Zheng, J
Fu, Z
Nan, P
Zhong, Y
Lonardi, S
Jiang, T
机构
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
[2] Shanghai Ctr Bioinformat Technol, Shanghai, Peoples R China
基金
美国国家科学基金会;
关键词
ortholog; paralog; gene duplication; genome rearrangement; reversal; comparative genomics;
D O I
10.1109/TCBB.2005.48
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics. Existing methods that assign orthologs based on the similarity between DNA or protein sequences may make erroneous assignments when sequence similarity does not clearly delineate the evolutionary relationship among genes of the same families. In this paper, we present a new approach to ortholog assignment that takes into account both sequence similarity and evolutionary events at a genome level, where orthologous genes are assumed to correspond to each other in the most parsimonious evolving scenario under genome rearrangement. First, the problem is formulated as that of computing the signed reversal distance with duplicates between the two genomes of interest. Then, the problem is decomposed into two new optimization problems, called minimum common partition and maximum cycle decomposition, for which efficient heuristic algorithms are given. Following this approach, we have implemented a high-throughput system for assigning orthologs on a genome scale, called SOAR, and tested it on both simulated data and real genome sequence data. Compared to a recent ortholog assignment method based entirely on homology search (called INPARANOID), SOAR shows a marginally better performance in terms of sensitivity on the real data set because it is able to identify several correct orthologous pairs that are missed by INPARANOID. The simulation results demonstrate that SOAR, in general, performs better than the iterated exemplar algorithm in terms of computing the reversal distance and assigning correct orthologs.
引用
收藏
页码:302 / 315
页数:14
相关论文
共 28 条
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   A linear-time algorithm for computing inversion distance between signed permutations with an experimental study [J].
Bader, DA ;
Moret, BME ;
Yan, M .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2001, 8 (05) :483-491
[3]  
Bourque G, 2002, GENOME RES, V12, P26
[4]   OrthoParaMap: Distinguishing orthologs from paralogs by integrating comparative genome data and gene phylogenies [J].
Cannon, SB ;
Young, ND .
BMC BIOINFORMATICS, 2003, 4 (1)
[5]  
Caprara A., 1997, P 1 ANN INT C COMP M, P75, DOI DOI 10.1145/267521.267531
[6]  
CAPRIOLI A, 1999, NOTIZIARIO I SUPE S1, V12, P1
[7]   LINEAR-SPACE ALGORITHMS THAT BUILD LOCAL ALIGNMENTS FROM FRAGMENTS [J].
CHAO, KM ;
MILLER, W .
ALGORITHMICA, 1995, 13 (1-2) :106-134
[8]   Sorting strings by reversals and by transpositions [J].
Christie, DA ;
Irving, RW .
SIAM JOURNAL ON DISCRETE MATHEMATICS, 2001, 14 (02) :193-206
[9]   Reconstructing an ancestral genome using minimum segments duplications and reversals [J].
El-Mabrouk, N .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2002, 65 (03) :442-464
[10]   DISTINGUISHING HOMOLOGOUS FROM ANALOGOUS PROTEINS [J].
FITCH, WM .
SYSTEMATIC ZOOLOGY, 1970, 19 (02) :99-&