Algorithms for large-scale genotyping microarrays

Cited by: 78
Authors
Liu, WM [1]
Di, XJ [1]
Yang, G [1]
Matsuzaki, H [1]
Huang, J [1]
Mei, R [1]
Ryder, TB [1]
Webster, TA [1]
Dong, SL [1]
Liu, GY [1]
Jones, KW [1]
Kennedy, GC [1]
Kulp, D [1]
Affiliation
[1] Affymetrix Inc, Santa Clara, CA 95051 USA
DOI
10.1093/bioinformatics/btg332
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Analysis of many thousands of single nucleotide polymorphisms (SNPs) across the whole genome is crucial for efficiently mapping disease genes and for understanding how susceptibility to disease, drug efficacy and side effects differ between populations and individuals. High-density oligonucleotide microarrays make such analysis possible at reasonable cost. It requires accurate, reliable methods for feature extraction, classification, statistical modeling and filtering.

Results: We propose a modified partitioning around medoids (PAM) method for classifying relative allele signals. We use the average silhouette width, separation and other quantities as quality measures for genotype classification. From the classification results we build robust statistical models, which we use to make genotype calls and to calculate quality measures for those calls. We apply our algorithms to several different genotyping microarrays and verify the results using reference genotypes, informative Mendelian relationships in families, and leave-one-out cross-validation. Concordance with the single base extension reference genotypes is 99.36% for SNPs on autosomes and 99.64% for SNPs on sex chromosomes. Concordance in the leave-one-out test is over 99.5% overall and 99.9% or higher for AA, AB and BB cells. We also provide a method to determine the gender of a sample from the heterozygous call rate of SNPs on the X chromosome. See http://www.affymetrix.com for further information. The microarray data will also be available from the Affymetrix web site.
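The classification step described in the abstract can be illustrated with a minimal sketch. It runs plain partitioning around medoids (greedy BUILD followed by SWAP, absolute-difference distance) on one SNP's relative allele signals across samples, then scores the partition with the standard average silhouette width. This is not the authors' modified PAM. The fixed k = 3, the function names `pam_1d`, `average_silhouette` and `call_genotypes`, the example values, and the assumption that RAS lies in [0, 1] with BB near 0, AB near 0.5 and AA near 1 are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def pam_1d(x: Sequence[float], k: int, max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Plain PAM (BUILD + SWAP) on 1-D values using absolute distance.

    Returns (medoid indices into x, cluster label per point).
    """
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(x)")

    def cost(meds: List[int]) -> float:
        # Total distance from every point to its nearest medoid.
        return sum(min(abs(x[i] - x[m]) for m in meds) for i in range(n))

    # BUILD: greedily add the point that most lowers the total cost.
    medoids: List[int] = []
    for _ in range(k):
        candidates = [j for j in range(n) if j not in medoids]
        medoids.append(min(candidates, key=lambda j: cost(medoids + [j])))

    # SWAP: replace a medoid with a non-medoid whenever that lowers the cost.
    current = cost(medoids)
    for _ in range(max_iter):
        improved = False
        for mi in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:mi] + [h] + medoids[mi + 1:]
                c = cost(trial)
                if c < current - 1e-12:
                    medoids, current, improved = trial, c, True
        if not improved:
            break

    # Assign each point to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda c: abs(x[i] - x[medoids[c]])) for i in range(n)]
    return medoids, labels


def average_silhouette(x: Sequence[float], labels: Sequence[int]) -> float:
    """Standard average silhouette width with absolute distance."""
    n = len(x)
    clusters = sorted(set(labels))
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        # Mean distance from point i to each other (non-empty) cluster.
        others = [
            sum(abs(x[i] - x[j]) for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        ]
        if not own or not others:
            widths.append(0.0)  # convention for singletons or a single cluster
            continue
        a = sum(abs(x[i] - x[j]) for j in own) / len(own)
        b = min(others)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / n


def call_genotypes(ras: Sequence[float]) -> Tuple[List[str], float]:
    """Hypothetical caller: k = 3 PAM on one SNP's RAS values across samples.

    Assumes RAS in [0, 1] with BB near 0, AB near 0.5 and AA near 1.
    """
    medoids, labels = pam_1d(ras, k=3)
    # Order clusters by their medoid's RAS value and name them BB < AB < AA.
    order = sorted(range(3), key=lambda c: ras[medoids[c]])
    genotype = {order[0]: "BB", order[1]: "AB", order[2]: "AA"}
    return [genotype[c] for c in labels], average_silhouette(ras, labels)


calls, asw = call_genotypes([0.02, 0.05, 0.08, 0.48, 0.52, 0.55, 0.93, 0.97, 0.99])
print(calls)  # ['BB', 'BB', 'BB', 'AB', 'AB', 'AB', 'AA', 'AA', 'AA']
print(round(asw, 3))
```

A high average silhouette width indicates well-separated genotype clusters; a low value flags a SNP whose calls deserve less trust. A practical caller would also have to handle SNPs where one or two genotype clusters are absent, which a fixed k = 3 cannot do, and the exhaustive cost recomputation in SWAP is only suitable for small examples.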
Pages: 2397-2403
Page count: 7