Using shared genomic synteny and shared protein functions to enhance the identification of orthologous gene pairs

Cited by: 39
Authors
Zheng, XH [1 ]
Lu, F [1 ]
Wang, ZY [1 ]
Hoover, J [1 ]
Mural, R [1 ]
Affiliation
[1] Celera Genom Corp, Assays & Bioinformat, Rockville, MD 20850 USA
DOI
10.1093/bioinformatics/bti045
CLC number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Motivation: The identification of orthologous gene pairs is generally based on sequence similarity. Gene pairs that are mutual 'best hits' between the genomes being compared are asserted to be orthologs. Although this method identifies most orthologous gene pairs with high confidence, it misses a fraction of them, especially genes in duplicated gene families. In addition, the approach depends heavily on the completeness and quality of gene annotation: when the gene sequences are not correctly represented, it is unlikely to find the correct ortholog. To overcome these limitations, we have developed an approach that identifies orthologous gene pairs using shared chromosomal synteny and the annotation of protein function.

Results: Assembled mouse and human genomes were used to identify the regions of conserved synteny between these genomes. 'Syntenic anchors' are conserved non-repetitive locations between the mouse and human genomes. Using these anchors, we identified blocks of sequence that contain consistently ordered anchors between the two genomes (syntenic blocks). This synteny information was then used to help identify orthologous gene pairs between the mouse and human genomes. The approach combines the mutual selection of the best tBlastX hits between human and mouse transcripts with the inference of orthologous relationships from three kinds of evidence: shared syntenic anchors, co-location in the same syntenic block, and the same annotated protein function. Using this approach, we were able to find 19 357 orthologous gene pairs between the human and mouse genomes, a 20% increase over the number of orthologs identified by conventional approaches.
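The two-stage idea in the abstract (mutual best hits first, then synteny and function to recover missed pairs) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `(query, subject, score)` hit tuples, and the dictionaries mapping genes to syntenic blocks and function labels are all hypothetical, and the toy scores stand in for real tBlastX output.

```python
def best_hit_per_query(hits):
    """Map each query to its highest-scoring subject.

    `hits` is a list of (query, subject, score) tuples (hypothetical format).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits_ab, hits_ba):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hit_per_query(hits_ab)
    best_ba = best_hit_per_query(hits_ba)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


def rescue_by_synteny(hits_ab, paired, block_a, block_b, function_of):
    """Add unpaired hits that share a syntenic block and a function label.

    Assumption of this sketch: both conditions are required, and candidate
    hits are considered greedily in order of decreasing score.
    """
    used_a = {a for a, _ in paired}
    used_b = {b for _, b in paired}
    rescued = set()
    for a, b, _ in sorted(hits_ab, key=lambda hit: -hit[2]):
        if a in used_a or b in used_b:
            continue
        same_block = a in block_a and block_a[a] == block_b.get(b)
        same_function = a in function_of and function_of[a] == function_of.get(b)
        if same_block and same_function:
            rescued.add((a, b))
            used_a.add(a)
            used_b.add(b)
    return rescued


# Toy data: h2 and m2 belong to a duplicated family, so h2's best hit is m1.
human_to_mouse = [("h1", "m1", 900), ("h2", "m1", 850),
                  ("h2", "m2", 800), ("h3", "m3", 500)]
mouse_to_human = [("m1", "h1", 900), ("m1", "h2", 850),
                  ("m2", "h2", 800), ("m3", "h3", 500)]
human_block = {"h1": "B1", "h2": "B1", "h3": "B2"}
mouse_block = {"m1": "B1", "m2": "B1", "m3": "B2"}
function_of = {"h1": "kinase", "m1": "kinase", "h2": "kinase",
               "m2": "kinase", "h3": "transporter", "m3": "transporter"}

rbh = reciprocal_best_hits(human_to_mouse, mouse_to_human)
rescued = rescue_by_synteny(human_to_mouse, rbh, human_block,
                            mouse_block, function_of)
```

In this toy case the mutual-best-hit step pairs h1 with m1 and h3 with m3 but misses h2, whose best hit is the already-paired m1. The second step recovers the h2 and m2 pair because both genes sit in the same hypothetical block and carry the same function label.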
Pages: 703-710
Page count: 8