Exploring the plant transcriptome through phylogenetic profiling

Cited by: 33
Authors
Vandepoele, K [1]
Van de Peer, Y [1]
Affiliations
[1] State Univ Ghent VIB, Dept Plant Syst Biol, B-9052 Ghent, Belgium
DOI
10.1104/pp.104.054700
Chinese Library Classification (CLC) number
Q94 [Botany];
Discipline classification code
071001 ;
Abstract
Publicly available protein sequences represent only a small fraction of the full catalog of genes encoded by the genomes of different plants, such as green algae, mosses, gymnosperms, and angiosperms. By contrast, an enormous amount of expressed sequence tags (ESTs) exists for a wide variety of plant species, representing a substantial part of all transcribed plant genes. Integrating protein and EST sequences in comparative and evolutionary analyses is not straightforward because of the heterogeneous nature of both types of sequence data. By combining information from publicly available EST and protein sequences for 32 different plant species, we identified more than 250,000 plant proteins organized in more than 12,000 gene families. Approximately 60% of the proteins are absent from current sequence databases but provide important new information about plant gene families. Analysis of the distribution of gene families over different plant species through phylogenetic profiling reveals interesting insights into plant gene evolution, and identifies species- and lineage-specific gene families, orphan genes, and conserved core genes across the green plant lineage. We counted a similar number of approximately 9,500 gene families in monocotyledonous and eudicotyledonous plants and found strong evidence for the existence of at least 33,700 genes in rice (Oryza sativa). Interestingly, the larger number of genes in rice compared to Arabidopsis (Arabidopsis thaliana) can partially be explained by a larger amount of species-specific single-copy genes and species-specific gene families. In addition, a majority of large gene families, typically containing more than 50 genes, are bigger in rice than Arabidopsis, whereas the opposite seems true for small gene families.
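As a rough illustration of the phylogenetic profiling idea mentioned in the abstract, the sketch below records which species each gene family occurs in and sorts families into core, lineage-specific, species-specific, and other. It is a minimal toy example and not the authors' pipeline: the family names, the four-species set, the lineage labels, and the classification rules are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (gene family, species) membership pairs.
# Family names and the species selection are illustrative assumptions.
MEMBERSHIP = [
    ("fam_core", "Arabidopsis thaliana"),
    ("fam_core", "Oryza sativa"),
    ("fam_core", "Zea mays"),
    ("fam_core", "Chlamydomonas reinhardtii"),
    ("fam_monocot", "Oryza sativa"),
    ("fam_monocot", "Zea mays"),
    ("fam_rice_only", "Oryza sativa"),
    ("fam_mixed", "Arabidopsis thaliana"),
    ("fam_mixed", "Oryza sativa"),
]

# Hypothetical lineage label for each species.
LINEAGE = {
    "Arabidopsis thaliana": "eudicot",
    "Oryza sativa": "monocot",
    "Zea mays": "monocot",
    "Chlamydomonas reinhardtii": "green alga",
}


def build_profiles(membership):
    """Map each gene family to the set of species in which it occurs."""
    profiles = defaultdict(set)
    for family, species in membership:
        profiles[family].add(species)
    return dict(profiles)


def classify(species_set, lineage):
    """Label one presence/absence profile (assumed, simplified rules)."""
    if species_set == set(lineage):
        return "core"  # present in every species considered
    if len(species_set) == 1:
        return "species-specific"
    if len({lineage[s] for s in species_set}) == 1:
        return "lineage-specific"  # several species, one lineage
    return "other"


profiles = build_profiles(MEMBERSHIP)
labels = {family: classify(spp, LINEAGE) for family, spp in profiles.items()}

for family in sorted(labels):
    print(family, labels[family])
```

In a real analysis the membership pairs would come from clustering protein and EST-derived sequences into families, and the rules for "core" or "lineage-specific" would need to tolerate incomplete EST sampling, which this sketch ignores.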
Pages: 31-42
Page count: 12
Related papers
53 in total
[11]   Chlamydomonas reinhardtii at the crossroads of genomics [J].
Grossman, AR ;
Harris, EE ;
Hauser, C ;
Lefebvre, PA ;
Martinez, D ;
Rokhsar, D ;
Shrager, J ;
Silflow, CD ;
Stern, D ;
Vallon, O ;
Zhang, ZD .
EUKARYOTIC CELL, 2003, 2 (06) :1137-1150
[12]   Phylogenetic profiling of the Arabidopsis thaliana proteome: what proteins distinguish plants from other organisms? art. no. R53 [J].
Gutiérrez, RA ;
Green, PJ ;
Keegstra, K ;
Ohlrogge, JB .
GENOME BIOLOGY, 2004, 5 (08)
[13]   The new genes of rice: a closer look [J].
Jabbari, K ;
Cruveiller, S ;
Clay, O ;
Le Saux, J ;
Bernardi, G .
TRENDS IN PLANT SCIENCE, 2004, 9 (06) :281-285
[14]   Analysis of the genome sequence of the flowering plant Arabidopsis thaliana [J].
Kaul, S ;
Koo, HL ;
Jenkins, J ;
Rizzo, M ;
Rooney, T ;
Tallon, LJ ;
Feldblyum, T ;
Nierman, W ;
Benito, MI ;
Lin, XY ;
Town, CD ;
Venter, JC ;
Fraser, CM ;
Tabata, S ;
Nakamura, Y ;
Kaneko, T ;
Sato, S ;
Asamizu, E ;
Kato, T ;
Kotani, H ;
Sasamoto, S ;
Ecker, JR ;
Theologis, A ;
Federspiel, NA ;
Palm, CJ ;
Osborne, BI ;
Shinn, P ;
Conway, AB ;
Vysotskaia, VS ;
Dewar, K ;
Conn, L ;
Lenz, CA ;
Kim, CJ ;
Hansen, NF ;
Liu, SX ;
Buehler, E ;
Altafi, H ;
Sakano, H ;
Dunn, P ;
Lam, B ;
Pham, PK ;
Chao, Q ;
Nguyen, M ;
Yu, GX ;
Chen, HM ;
Southwick, A ;
Lee, JM ;
Miranda, M ;
Toriumi, MJ ;
Davis, RW .
NATURE, 2000, 408 (6814) :796-815
[15]   Glycine-rich proteins encoded by a nodule-specific gene family are implicated in different stages of symbiotic nodule development in Medicago spp. [J].
Kevei, Z ;
Vinardell, JM ;
Kiss, GB ;
Kondorosi, A ;
Kondorosi, E .
MOLECULAR PLANT-MICROBE INTERACTIONS, 2002, 15 (09) :922-931
[16]   Primary structure and expression of a gamete lytic enzyme in Chlamydomonas reinhardtii: similarity of functional domains to matrix metalloproteases [J].
Kinoshita, T ;
Fukuzawa, H ;
Shimada, T ;
Saito, T ;
Matsuda, Y .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1992, 89 (10) :4693-4697
[17]   A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes [J].
Koonin, EV ;
Fedorova, ND ;
Jackson, JD ;
Jacobs, AR ;
Krylov, DM ;
Makarova, KS ;
Mazumder, R ;
Mekhedov, SL ;
Nikolskaya, AN ;
Rao, BS ;
Rogozin, IB ;
Smirnov, S ;
Sorokin, AV ;
Sverdlov, AV ;
Vasudevan, S ;
Wolf, YI ;
Yin, JJ ;
Natale, DA .
GENOME BIOLOGY, 2004, 5 (02)
[18]   Clustering and analysis of protein families [J].
Kriventseva, EV ;
Biswas, M ;
Apweiler, R .
CURRENT OPINION IN STRUCTURAL BIOLOGY, 2001, 11 (03) :334-339
[19]   The EMBL nucleotide sequence database [J].
Kulikova, T ;
Aldebert, P ;
Althorpe, N ;
Baker, W ;
Bates, K ;
Browne, P ;
van den Broek, A ;
Cochrane, G ;
Duggan, K ;
Eberhardt, R ;
Faruque, N ;
Garcia-Pastor, M ;
Harte, N ;
Kanz, C ;
Leinonen, R ;
Lin, Q ;
Lombard, V ;
Lopez, R ;
Mancuso, R ;
McHale, M ;
Nardone, F ;
Silventoinen, V ;
Stoehr, P ;
Stoesser, G ;
Tuli, MA ;
Tzouvara, K ;
Vaughan, R ;
Wu, D ;
Zhu, WM ;
Apweiler, R .
NUCLEIC ACIDS RESEARCH, 2004, 32 :D27-D30
[20]   Evolutionary analyses of the human genome [J].
Li, WH ;
Gu, ZL ;
Wang, HD ;
Nekrutenko, A .
NATURE, 2001, 409 (6822) :847-849