Single Nucleotide Differences (SNDs) in the dbSNP Database May Lead to Errors in Genotyping and Haplotyping Studies

Cited by: 48
Authors
Musumeci, Lucia [1]
Arthur, Jonathan W. [2]
Cheung, Florence S. G. [1]
Hoque, Ashraful [3]
Lippman, Scott [3]
Reichardt, Juergen K. V. [1]
Affiliations
[1] Univ Sydney, Plunkett Chair Mol Biol Med, Bosch Inst, Camperdown, NSW 2006, Australia
[2] Univ Sydney, Sydney Med Sch, Discipline Med, Camperdown, NSW 2006, Australia
[3] Univ Texas MD Anderson Canc Ctr, Houston, TX 77030 USA
Keywords
single nucleotide polymorphism; SNP; paralogue; single nucleotide difference; SND; alignment; sequence variation; genome; genes
DOI
10.1002/humu.21137
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
The creation of single nucleotide polymorphism (SNP) databases (such as NCBI dbSNP) has facilitated scientific research in many fields. SNP discovery and detection have improved to the extent that there are over 17 million human reference (rs) SNPs reported to date (Build 129 of dbSNP). SNP databases are unfortunately not always complete and/or accurate. In fact, half of the reported SNPs are still only candidate SNPs and are not validated in a population. We describe the identification of SNDs (single nucleotide differences) in humans that may contaminate the dbSNP database. These SNDs, reported as real SNPs in the database, do not exist as such, but are merely artifacts due to the presence of a paralogous (highly similar duplicated) sequence in the genome. Using sequencing, we showed how SNDs could originate in two paralogous genes and evaluated samples from a population of 100 individuals for the presence/absence of SNPs. Moreover, using bioinformatics, we predicted as many as 8.32% of the biallelic, coding SNPs in the dbSNP database to be SNDs. Our identification of SNDs in the database will not only allow researchers to select truly informative SNPs for association studies, but also aid in determining accurate SNP genotypes and haplotypes. Hum Mutat 31:67-73, 2010. © 2009 Wiley-Liss, Inc.
Pages: 67-73
Page count: 7