Identifying and Quantifying Orphan Protein Sequences in Fungi

Cited by: 46
Authors
Ekman, Diana [1 ]
Elofsson, Arne [1 ]
Affiliations
[1] Stockholm Univ, Stockholm Bioinformat Ctr, Ctr Biomembrane Res, Dept Biochem & Biophys, SE-10691 Stockholm, Sweden
Keywords
evolution; protein domain; orphan protein; fungi; INTRINSICALLY UNSTRUCTURED PROTEINS; SACCHAROMYCES-CEREVISIAE; STRUCTURE PREDICTION; EVOLUTIONARY RATE; GENES; REGIONS; YEAST; ALIGNMENT; GENOMES; ORFANS;
DOI
10.1016/j.jmb.2009.11.053
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
For large regions of many proteins, and even entire proteins, no homology to known domains or proteins can be detected. These sequences are often referred to as orphans. Surprisingly, it has been reported that the large number of orphans is sustained in spite of a rapid increase of available genomic sequences. However, it is believed that de novo creation of coding sequences is rare in comparison to mechanisms such as domain shuffling and gene duplication; hence, most sequences should have homologs in other genomes. To investigate this, the sequences of 19 complete fungal genomes were compared. By using the phylogenetic relationship between these genomes, we could identify potentially de novo created orphans in Saccharomyces cerevisiae. We found that only a small fraction, <2%, of the S. cerevisiae proteome is orphan, which confirms that de novo creation of coding sequences is indeed rare. Furthermore, we found it necessary to compare the most closely related species to distinguish between de novo created sequences and rapidly evolving sequences where homologs are present but cannot be detected. Next, the orphan proteins (OPs) and orphan domains (ODs) were characterized. First, it was observed that both OPs and ODs are short. In addition, at least some of the OPs have been shown to be functional in experimental assays, showing that they are not pseudogenes. Furthermore, in contrast to what has been reported before and what is seen for older orphans, S. cerevisiae-specific ODs and proteins are not more disordered than other proteins. This might indicate that many of the older, and earlier classified, orphans indeed are fast-evolving sequences. Finally, >90% of the detected ODs are located at the protein termini, which suggests that these orphans could have been created by mutations that have affected the start or stop codons. © 2009 Elsevier Ltd. All rights reserved.
Pages: 396-405
Page count: 10