Application of massive parallel sequencing to whole genome SNP discovery in the porcine genome

Cited by: 29
Authors
Amaral, Andreia J. [1]
Megens, Hendrik-Jan [1]
Kerstens, Hindrik H. D. [1]
Heuven, Henri C. M. [1, 2]
Dibbits, Bert [1]
Crooijmans, Richard P. M. A. [1]
Den Dunnen, Johan T. [3]
Groenen, Martien A. M. [1]
Affiliations
[1] Wageningen Univ, Anim Breeding & Genom Ctr, NL-6700 AH Wageningen, Netherlands
[2] Univ Utrecht, NL-3508 TD Utrecht, Netherlands
[3] Leiden Univ, Med Ctr, Leiden Genome Technol Ctr, Leiden, Netherlands
Source
BMC GENOMICS | 2009 / Vol. 10
Keywords
SINGLE-NUCLEOTIDE POLYMORPHISMS; REDUCED REPRESENTATION; LINKAGE DISEQUILIBRIUM; TECHNOLOGIES; POPULATIONS; SELECTION; SITES; SWINE;
DOI
10.1186/1471-2164-10-374
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005; 0836; 090102; 100705;
Abstract
Background: Although the Illumina 1G Genome Analyzer generates billions of base pairs of sequence data, varying sequence quality makes sequence selection challenging. Therefore, in the framework of the International Porcine SNP Chip Consortium, this pilot study aimed to evaluate, on a large scale, the impact of the quality level of the sequenced bases on mapping quality and on the identification of true SNPs.
Results: DNA pooled from five animals from a commercial boar line was digested with DraI; fragments of 150-250 bp were isolated and end-sequenced using the Illumina 1G Genome Analyzer, yielding 70,348,064 sequences, each 36 bp long. Rules were developed to select sequences, which were then aligned to unique positions in a reference genome. Sequences were selected based on quality, and three thresholds of sequence quality (SQ) were compared. The highest SQ threshold allowed identification of a larger number of SNPs (17,489), distributed widely across the pig genome. In total, 3,142 SNPs were validated with a success rate of 96%. The correlation between estimated minor allele frequency (MAF) and genotyped MAF was moderate, and the SNPs were highly polymorphic in other pig breeds. Lowering the SQ threshold while maintaining the same criteria for SNP identification resulted in the discovery of fewer SNPs (16,768), of which 259 were not identified using higher SQ levels. Validation of the SNPs found exclusively at the lower SQ threshold had a success rate of 94% and showed a low correlation between estimated MAF and genotyped MAF. Base change analysis suggested that the rate of transitions in the pig genome is likely to be similar to that observed in humans. Chromosome X showed reduced nucleotide diversity relative to the autosomes, as observed for other species.
Conclusion: Large numbers of SNPs can be identified reliably by creating strict rules for sequence selection, which simultaneously decreases sequence ambiguity. Selecting sequences using a higher SQ threshold leads to more reliable identification of SNPs. Lower SQ thresholds can be used to guarantee sufficient sequence coverage, resulting in a high success rate but less reliable MAF estimation. Nucleotide diversity varies between porcine chromosomes, with the X chromosome showing less variation, as observed in other species.
Pages: 10