Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

Cited by: 100
Authors
You, Frank M. [1 ,2 ]
Huo, Naxin [1 ,2 ]
Deal, Karin R. [1 ]
Gu, Yong Q. [2 ]
Luo, Ming-Cheng [1 ]
McGuire, Patrick E. [1 ]
Dvorak, Jan [1 ]
Anderson, Olin D. [2 ]
Affiliations
[1] Univ Calif Davis, Dept Plant Sci, Davis, CA 95616 USA
[2] ARS, Genom & Gene Discovery Res Unit, USDA, Western Reg Res Ctr, Albany, CA 94710 USA
Source
BMC GENOMICS | 2011 / Volume 12
Funding
National Science Foundation (USA);
Keywords
DATABASE RAP-DB; HEXAPLOID WHEAT; READ ALIGNMENT; POLYMORPHISM; EVOLUTION; RICE; MAP; FREQUENCY; HAPLOTYPE; TRITICUM;
DOI
10.1186/1471-2164-12-59
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Many plants have large and complex genomes with an abundance of repeated sequences, and many are also polyploid. Both attributes typify genome architecture in the tribe Triticeae, whose members include the economically important wheat, rye and barley. Large genome size, abundant repeated sequences, and polyploidy all complicate genome-wide SNP discovery by next-generation sequencing (NGS) of total genomic DNA, because they make alignment and clustering of the short reads generated by NGS platforms difficult, particularly in the absence of a reference genome sequence.

Results: An annotation-based, genome-wide SNP discovery pipeline is reported that uses NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype, generated with SOLiD or Solexa, are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome, which has a genome size of 4.02 Gb, of which 90% is repetitive sequence. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 were sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 in uncharacterized single-copy regions, and another 145,907 in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. To assess the false-positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730 xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated.

Conclusion: An annotation-based, genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. It is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the 497,118 discovered Ae. tauschii SNPs can be accessed at http://avena.pw.usda.gov/wheatD/agsnp.shtml.
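The two-stage idea described in the Results (annotate the long reads first, then call SNPs only where short reads of the second genotype map to retained classes) can be illustrated with a toy sketch. This is not the AGSNP code: the function name `call_snps`, the annotation labels, the data layout, and the depth and consensus thresholds are all hypothetical choices made for illustration only.

```python
from collections import Counter

# Hypothetical annotation labels; reads in any other class (e.g. "repeat")
# are excluded before SNP calling, mirroring the annotation step.
KEEP = {"gene", "single_copy", "repeat_junction"}

def call_snps(ref_reads, pileups, min_depth=3, min_frac=0.8):
    """Toy SNP caller (thresholds are assumptions, not values from the paper).

    ref_reads: {read_id: (annotation, sequence)} for the long-read genotype.
    pileups:   {read_id: {position: [bases observed in the second genotype]}}
    Returns a list of (read_id, position, ref_base, alt_base).
    """
    snps = []
    for read_id, (annotation, seq) in ref_reads.items():
        if annotation not in KEEP:
            continue  # skip repetitive or paralogous sequence
        for pos, bases in sorted(pileups.get(read_id, {}).items()):
            if len(bases) < min_depth:
                continue  # too few mapped reads to trust the call
            alt, count = Counter(bases).most_common(1)[0]
            if alt != seq[pos] and count / len(bases) >= min_frac:
                snps.append((read_id, pos, seq[pos], alt))
    return snps

ref_reads = {
    "r1": ("gene", "ACGTACGT"),
    "r2": ("repeat", "TTTTTTTT"),
}
pileups = {
    "r1": {2: ["A", "A", "A"], 5: ["C", "C", "C"], 6: ["T"]},
    "r2": {0: ["G", "G", "G"]},
}
snps = call_snps(ref_reads, pileups)
print(snps)
```

In this toy input only position 2 of `r1` is reported: position 5 matches the reference base, position 6 lacks depth, and `r2` is dropped because it is annotated as repetitive.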
Pages: 19