Data mining of public SNP databases for the selection of intragenic SNPs

Cited by: 17
Authors
Aerts, J
Wetzels, Y
Cohen, N
Aerssens, J
Affiliations
[1] Janssen Res Fdn, Dept Pharmacogenom, B-2340 Beerse, Belgium
[2] RW Johnson Pharmaceut Res Inst, Raritan, NJ 08869 USA
Keywords
SNP; database; bioinformatics; pharmacogenomics; dbSNP; HGBASE; HGVbase; ISAB; computational biology
DOI
10.1002/humu.10107
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Different strategies for searching public single nucleotide polymorphism (SNP) databases for intragenic SNPs were evaluated. First, we assembled a strategy to annotate SNPs onto candidate genes based on a BLAST search of public SNP databases (Intragenic SNP Annotation by BLAST, ISAB). Only BLAST hits that met stringent criteria were considered valid SNPs: 1) percentage identity (minimum 98%), 2) BLAST hit length (the hit covers at least 98% of the length of the SNP entry in the database, or the hit is longer than 250 base pairs), and 3) location in non-repetitive DNA. We assessed the intragenic context and redundancy of these SNPs, and demonstrated that the SNP contents of the dbSNP and HGBASE/HGVbase databases are highly complementary but also overlap significantly. Second, we assessed the validity of the intragenic SNP annotation available on the dbSNP and HGVbase websites by comparison with the results of the ISAB strategy. Only a minority of all annotated SNPs were found in common between the respective public SNP database websites and the ISAB annotation strategy. A detailed analysis was performed to explain this discrepancy. In conclusion, we recommend applying an independent strategy (such as ISAB) to annotate intragenic SNPs, complementary to the annotation provided at the dbSNP and HGVbase websites. Such an approach might be useful when selecting intragenic SNPs for genotyping in genetic studies.
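The three hit-filtering criteria stated in the abstract can be sketched as a small predicate. This is only an illustrative sketch, not the authors' ISAB implementation: the `BlastHit` record, its field names, the `is_valid_snp_hit` function, and the example values are all hypothetical, and the sketch assumes that repeat status has already been determined by some upstream step that the abstract does not describe.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical record for one BLAST hit of a SNP entry on a candidate gene."""
    snp_id: str
    percent_identity: float  # percentage identity of the alignment, e.g. 99.2
    alignment_length: int    # length of the BLAST hit in base pairs
    snp_entry_length: int    # length of the SNP entry sequence in the database
    in_repeat: bool          # True if the hit lies in repetitive DNA (assumed precomputed)


def is_valid_snp_hit(hit, min_identity=98.0, min_coverage=0.98, long_hit_bp=250):
    """Apply the three criteria from the abstract to a single hit."""
    # 1) Percentage identity of at least 98%.
    identity_ok = hit.percent_identity >= min_identity
    # 2) Hit covers at least 98% of the SNP entry, or is longer than 250 bp.
    covers_entry = hit.alignment_length >= min_coverage * hit.snp_entry_length
    long_enough = hit.alignment_length > long_hit_bp
    # 3) Hit is located in non-repetitive DNA.
    return identity_ok and (covers_entry or long_enough) and not hit.in_repeat


# Made-up example hits (not real database entries).
hits = [
    BlastHit("snp_a", 99.0, 100, 101, False),   # passes via entry coverage
    BlastHit("snp_b", 97.5, 300, 300, False),   # fails the identity threshold
    BlastHit("snp_c", 98.0, 260, 600, False),   # passes via the >250 bp rule
    BlastHit("snp_d", 99.5, 250, 600, False),   # fails both length rules
    BlastHit("snp_e", 100.0, 400, 400, True),   # fails: repetitive DNA
]
valid_ids = [h.snp_id for h in hits if is_valid_snp_hit(h)]
```

Keeping the thresholds as keyword arguments makes it easy to see how the set of accepted SNPs changes when a criterion is relaxed.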
Pages: 162-173
Page count: 12