Phylogenetic reconstruction of orthology, paralogy, and conserved synteny for dog and human

Times cited: 93
Authors
Goodstadt, Leo [1]
Ponting, Chris P. [1]
Affiliations
[1] Univ Oxford, MRC, Funct Genet Unit, Dept Physiol Anat & Genet, Oxford OX1 2JD, England
Funding
UK Medical Research Council
DOI
10.1371/journal.pcbi.0020133
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Accurate predictions of orthology and paralogy relationships are necessary to infer human molecular function from experiments in model organisms. Previous genome-scale approaches to predicting these relationships have been limited by their use of protein similarity and their failure to take into account multiple splicing events and gene prediction errors. We have developed PhyOP, a new phylogenetic orthology prediction pipeline based on synonymous rate estimates, which accurately predicts orthology and paralogy relationships for transcripts, genes, exons, or genomic segments between closely related genomes. We were able to identify orthologue relationships to human genes for 93% of all dog genes from Ensembl. Among 1:1 orthologues, the alignments covered a median of 97.4% of protein sequences, and 92% of orthologues shared essentially identical gene structures. PhyOP accurately recapitulated genomic maps of conserved synteny. Benchmarking against predictions from Ensembl and Inparanoid showed that PhyOP is more accurate, especially in its predictions of paralogy. Nearly half (46%) of PhyOP paralogy predictions are unique. Using PhyOP to investigate orthologues and paralogues in the human and dog genomes, we found that the human assembly contains 3-fold more gene duplications than the dog. Species-specific duplicate genes, or "in-paralogues," are generally shorter and have fewer exons than 1:1 orthologues, which is consistent with selective constraints and mutation biases based on the sizes of duplicated genes. In-paralogues have experienced elevated amino acid and synonymous nucleotide substitution rates. Duplicates possess similar biological functions for either the dog or human lineages. Having accounted for 2,954 likely pseudogenes and gene fragments, and after separating 346 erroneously merged genes, we estimated that the human genome encodes a minimum of 19,700 protein-coding genes, similar to the gene count of nematode worms.
PhyOP is a fast and robust approach to orthology prediction that will be applicable to whole genomes from multiple closely related species. PhyOP will be particularly useful in predicting orthology for mammalian genomes that have been incompletely sequenced, and for large families of rapidly duplicating genes.
Pages: 1134-1150
Number of pages: 17