A general approach to single-nucleotide polymorphism discovery

Times cited: 368
Authors
Marth, GT [1]
Korf, I
Yandell, MD
Yeh, RT
Gu, ZJ
Zakeri, H
Stitziel, NO
Hillier, L
Kwok, PY
Gish, WR
Affiliations
[1] Washington Univ, Dept Genet, St Louis, MO 63110 USA
[2] Washington Univ, Genome Sequencing Ctr, St Louis, MO 63110 USA
[3] Washington Univ, Div Dermatol, St Louis, MO 63110 USA
DOI
10.1038/70570
Chinese Library Classification number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Single-nucleotide polymorphisms (SNPs) are the most abundant form of human genetic variation and a resource for mapping complex genetic traits (ref. 1). The large volume of data produced by high-throughput sequencing projects is a rich and largely untapped source of SNPs (refs 2-5). We present here a unified approach to the discovery of variations in genetic sequence data from arbitrary DNA sources. We propose to use the rapidly emerging genomic sequence (refs 6,7) as a template on which to layer often unmapped, fragmentary sequence data (refs 8-11), and to use base quality values (ref. 12) to discern true allelic variations from sequencing errors. By taking advantage of the genomic sequence, we are able to use simpler yet more accurate methods for sequence organization: fragment clustering, paralogue identification and multiple alignment. We analyse these sequences with a novel Bayesian inference engine, POLYBAYES, to calculate the probability that a given site is polymorphic. Rigorous treatment of base quality permits completely automated evaluation of the full length of all sequences, without limitations on alignment depth. We demonstrate this approach by accurate SNP predictions in human ESTs aligned to finished and working-draft quality genomic sequences, a data set representative of the typical challenges of sequence-based SNP discovery.
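The abstract does not give the POLYBAYES model itself, so what follows is only a minimal, hypothetical Python sketch of the general idea it describes: turning base quality values into error probabilities and computing a posterior probability that one alignment column is polymorphic. It assumes Phred-scaled qualities (Q = -10 log10 of the error probability). The function names `phred_to_error`, `call_likelihood` and `prob_polymorphic`, the prior `prior_snp=0.003`, the even spread of errors over the three other bases, and the rule that each read carries either allele with probability 1/2 are all illustrative assumptions, not the published method; paralogue identification is ignored entirely.

```python
import math
from itertools import combinations
from typing import List, Tuple

# Hypothetical sketch of quality-aware SNP scoring; NOT the published POLYBAYES model.
BASES = "ACGT"
Column = List[Tuple[str, float]]  # (called base, Phred quality) for each read at one site


def phred_to_error(q: float) -> float:
    """Phred scale: Q = -10 * log10(P_error), hence P_error = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def call_likelihood(called: str, true: str, q: float) -> float:
    """P(called base | true base); assumes errors are spread evenly over the 3 other bases."""
    e = phred_to_error(q)
    return 1.0 - e if called == true else e / 3.0


def prob_polymorphic(column: Column, prior_snp: float = 0.003) -> float:
    """Posterior probability that one alignment column is bi-allelic.

    prior_snp is an illustrative (hypothetical) prior for a site being polymorphic.
    """
    n = len(column)
    if n < 2:
        return 0.0  # a single read cannot display two alleles
    # Monomorphic hypotheses: every read carries allele x; alleles equally likely a priori.
    mono = {x: math.prod(call_likelihood(b, x, q) for b, q in column) for x in BASES}
    p_mono = (1.0 - prior_snp) * sum(mono.values()) / len(BASES)

    # Bi-allelic hypotheses over the 6 unordered pairs {x, y}. Assumption: each read
    # independently carries x or y with probability 1/2, conditioned on both appearing.
    pairs = list(combinations(BASES, 2))
    half_n = 0.5 ** n  # probability of any single x/y assignment to the n reads
    p_poly = 0.0
    for x, y in pairs:
        # Average likelihood over all 2**n x/y assignments, factorised per read.
        mixed = math.prod(0.5 * call_likelihood(b, x, q) + 0.5 * call_likelihood(b, y, q)
                          for b, q in column)
        # Remove the all-x and all-y assignments, then renormalise to the mixed ones.
        both = max(0.0, mixed - half_n * (mono[x] + mono[y])) / (1.0 - 2.0 * half_n)
        p_poly += prior_snp * both / len(pairs)
    return p_poly / (p_poly + p_mono)
```

Comparing `prob_polymorphic([("A", 30)] * 9 + [("G", 5)])` with the same column whose G call has quality 30 shows the qualitative behaviour the abstract relies on: the low-quality discordant call yields a much smaller posterior, because it is better explained as a sequencing error. Because of the crude 1/2 allele rule and the arbitrary prior, the absolute values from this sketch are not calibrated.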
Pages: 452-456
Number of pages: 5
Related papers
23 in total
  • [1] Toward the development of a gene index to the human genome: An assessment of the nature of high-throughput EST sequence data
    Aaronson, JS
    Eckman, B
    Blevins, RA
    Borkowski, JA
    Myerson, J
    Imran, S
    Elliston, KO
    [J]. GENOME RESEARCH, 1996, 6 (09) : 829 - 845
  • [2] Rapid cDNA sequencing (expressed sequence tags) from a directionally cloned human infant brain cDNA library
    ADAMS, MD
    SOARES, MB
    KERLAVAGE, AR
    FIELDS, C
    VENTER, JC
    [J]. NATURE GENETICS, 1993, 4 (04) : 373 - 386
  • [3] Bayes, T
    [J]. PHILOS T R SOC LOND, 1763, 53 : 370. DOI 10.1098/RSTL.1763.0053
  • [4] Reliable identification of large numbers of candidate SNPs from public EST data
    Buetow, KH
    Edmonson, MN
    Cassidy, AB
    [J]. NATURE GENETICS, 1999, 21 (03) : 323 - 325
  • [5] Characterization of single-nucleotide polymorphisms in coding regions of human genes
    Cargill, M
    Altshuler, D
    Ireland, J
    Sklar, P
    Ardlie, K
    Patil, N
    Lane, CR
    Lim, EP
    Kalyanaraman, N
    Nemesh, J
    Ziaugra, L
    Friedland, L
    Rolfe, A
    Warrington, J
    Lipshutz, R
    Daley, GQ
    Lander, ES
    [J]. NATURE GENETICS, 1999, 22 (03) : 231 - 238
  • [6] Variations on a theme: Cataloging human DNA sequence variation
    Collins, FS
    Guyer, MS
    Chakravarti, A
    [J]. SCIENCE, 1997, 278 (5343) : 1580 - 1581
  • [7] New goals for the US Human Genome Project: 1998-2003
    Collins, FS
    Patrinos, A
    Jordan, E
    Chakravarti, A
    Gesteland, R
    Walters, L
    Fearon, E
    Hartwell, L
    Langley, CH
    Mathies, RA
    Olson, M
    Pawson, AJ
    Pollard, T
    Williamson, A
    Wold, B
    Buetow, K
    Branscomb, E
    Capecchi, M
    Church, G
    Garner, H
    Gibbs, RA
    Hawkins, T
    Hodgson, K
    Knotek, M
    Meisler, M
    Rubin, GM
    Smith, LM
    Smith, RF
    Westerfield, M
    Clayton, EW
    Fisher, NL
    Lerman, CE
    McInerney, JD
    Nebo, W
    Press, N
    Valle, D
    [J]. SCIENCE, 1998, 282 (5389) : 682 - 689
  • [8] Base qualities help sequencing software
    Durbin, R
    Dear, S
    [J]. GENOME RESEARCH, 1998, 8 (03) : 161 - 162
  • [9] Base-calling of automated sequencer traces using phred. II. Error probabilities
    Ewing, B
    Green, P
    [J]. GENOME RESEARCH, 1998, 8 (03) : 186 - 194
  • [10] Base-calling of automated sequencer traces using phred. I. Accuracy assessment
    Ewing, B
    Hillier, L
    Wendl, MC
    Green, P
    [J]. GENOME RESEARCH, 1998, 8 (03) : 175 - 185