High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome

Citations: 359
Authors
Novaes, Evandro [1]
Drost, Derek R. [1]
Farmerie, William G. [2,3]
Pappas, Georgios J., Jr. [4,5]
Grattapaglia, Dario [4,5]
Sederoff, Ronald R. [6]
Kirst, Matias [1,3]
Affiliations
[1] Univ Florida, Sch Forest Resources & Conservat, Gainesville, FL 32611 USA
[2] Univ Florida, Interdisciplinary Ctr Biotechnol Res, Gainesville, FL 32611 USA
[3] Univ Florida, Genet Inst, Gainesville, FL 32611 USA
[4] Univ Catolica Brasilia, Grad Program Genom Sci & Biotechnol, Brasilia, DF, Brazil
[5] Empresa Brasileira Pesquisa Agropecuaria, EMBRAPA Recursos Genet & Biotecnol, Brasilia, DF, Brazil
[6] N Carolina State Univ, Dept Genet, Raleigh, NC 27695 USA
DOI
10.1186/1471-2164-9-312
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Benefits from high-throughput sequencing using 454 pyrosequencing technology may be most apparent for species with high societal or economic value but few genomic resources. Rapid means of gene sequence and SNP discovery using this novel sequencing technology provide a set of baseline tools for genome-level research. However, it is unclear how effectively sequencing large numbers of short reads will support contig assembly and sequence annotation for species with essentially no prior gene sequence information.
Results: To generate the first broad survey of gene sequences in Eucalyptus grandis, the most widely planted hardwood tree species, we used 454 technology to sequence and assemble 148 Mbp of expressed sequences (EST). EST sequences were generated from a normalized cDNA pool composed of multiple tissues and genotypes, promoting discovery of homologues to almost half of Arabidopsis genes and a comprehensive survey of allelic variation in the transcriptome. By aligning the sequencing reads from multiple genotypes, we detected 23,742 SNPs, 83% of which were validated in a sample. Genome-wide nucleotide diversity was estimated for 2,392 contigs using a modified theta (θ) parameter, adapted for measuring genetic diversity from polymorphisms detected by randomly sequencing a multi-genotype cDNA pool. Diversity estimates in non-synonymous nucleotides were on average four times smaller than in synonymous nucleotides, suggesting purifying selection. The ratio of non-synonymous to synonymous substitutions (Ka/Ks) among 2,001 contigs averaged 0.30 and was skewed to the right, further supporting that most genes are under purifying selection. Comparison of these estimates among contigs identified major functional classes of genes under purifying and diversifying selection, in agreement with previous research.
Conclusion: By providing an abundance of foundational transcript sequences where limited prior genomic information existed, this work created part of the foundation for the annotation of the E. grandis genome that is being sequenced by the US Department of Energy. In addition, we demonstrated that SNPs sampled on a large scale with 454 pyrosequencing can be used to detect evolutionary signatures among genes, providing one of the first genome-wide assessments of nucleotide diversity and Ka/Ks for a non-model plant species.
Pages: 14