Discovering Genetic Ancestry Using Spectral Graph Theory

Cited by: 78
Authors
Lee, Ann B. [1]
Luca, Diana [1]
Klei, Lambertus [2]
Devlin, Bernie [2]
Roeder, Kathryn [1]
Affiliations
[1] Carnegie Mellon Univ, Dept Stat, Pittsburgh, PA 15213 USA
[2] Univ Pittsburgh, Sch Med, Dept Psychiat, Pittsburgh, PA USA
Keywords
eigenanalysis; genome-wide association; principal component analysis; population structure; POPULATION; ASSOCIATION; DISEASE
DOI
10.1002/gepi.20434
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
As one approach to uncovering the genetic underpinnings of complex disease, individuals are measured at a large number of genetic variants (usually SNPs) across the genome and these SNP genotypes are assessed for association with disease status. We propose a new statistical method called Spectral-GEM for the analysis of genome-wide association studies; the goal of Spectral-GEM is to quantify the ancestry of the sample from such genotypic data. Ignoring structure due to differential ancestry can lead to an excess of spurious findings and reduce power. Ancestry is commonly estimated using the eigenvectors derived from principal component analysis (PCA). To develop an alternative to PCA we draw on connections between multidimensional scaling and spectral graph theory. Our approach, based on a spectral embedding derived from the normalized Laplacian of a graph, can produce more meaningful delineation of ancestry than by using PCA. Often the results from Spectral-GEM are straightforward to interpret and therefore useful in association analysis. We illustrate the new algorithm with an analysis of the POPRES data [Nelson et al., 2008]. Genet. Epidemiol. 34:51-59, 2010. © 2009 Wiley-Liss, Inc.
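The abstract's central idea, a spectral embedding derived from the normalized Laplacian of a graph on individuals, can be illustrated with a minimal sketch. This is not the Spectral-GEM algorithm itself: the function name `spectral_embedding`, the Gaussian-kernel edge weights, and the median-based bandwidth are all assumptions chosen for illustration, since the abstract does not specify how the graph is weighted.

```python
import numpy as np


def spectral_embedding(genotypes, n_components=2):
    """Embed individuals with eigenvectors of a normalized graph Laplacian.

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1 or 2).
    Returns (coordinates, laplacian_eigenvalues) for the leading
    non-trivial eigenvectors.
    """
    X = np.asarray(genotypes, dtype=float)

    # Centre and scale each SNP; drop SNPs with no variation in the sample.
    X = X - X.mean(axis=0)
    sd = X.std(axis=0)
    keep = sd > 0
    X = X[:, keep] / sd[keep]

    # Pairwise squared Euclidean distances between individuals.
    gram = X @ X.T
    sq_norms = np.diag(gram)
    sq_dist = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram, 0.0)

    # ASSUMPTION: Gaussian-kernel weights with a median-distance bandwidth.
    # This is an illustrative choice, not the weighting used in the paper.
    sigma2 = np.median(sq_dist[sq_dist > 0])
    W = np.exp(-sq_dist / (2.0 * sigma2))

    # Normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    inv_sqrt_deg = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]

    # Large eigenvalues of S correspond to small eigenvalues of L.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]

    # Skip the trivial top eigenvector (proportional to sqrt of the degrees).
    idx = order[1:n_components + 1]
    return eigvecs[:, idx], 1.0 - eigvals[idx]
```

Under these assumptions, calling `spectral_embedding(G)` on an individuals-by-SNPs genotype matrix `G` returns one row of coordinates per individual, which could then be plotted or used as covariates in an association model in the same way PCA eigenvectors commonly are.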
Pages: 51-59
Number of pages: 9
Related papers
23 in total
[1] A haplotype map of the human genome [J].
Altshuler, D; Brooks, LD; Chakravarti, A; Collins, FS; Daly, MJ; Donnelly, P; Gibbs, RA; Belmont, JW; Boudreau, A; Leal, SM; Hardenbol, P; Pasternak, S; Wheeler, DA; Willis, TD; Yu, FL; Yang, HM; Zeng, CQ; Gao, Y; Hu, HR; Hu, WT; Li, CH; Lin, W; Liu, SQ; Pan, H; Tang, XL; Wang, J; Wang, W; Yu, J; Zhang, B; Zhang, QR; Zhao, HB; Zhao, H; Zhou, J; Gabriel, SB; Barry, R; Blumenstiel, B; Camargo, A; Defelice, M; Faggart, M; Goyette, M; Gupta, S; Moore, J; Nguyen, H; Onofrio, RC; Parkin, M; Roy, J; Stahl, E; Winchester, E; Ziaugra, L; Shen, Y.
NATURE, 2005, 437 (7063): 1299-1320
[2] [Anonymous], 1995, Observational Studies
[3] [Anonymous], 1980, Multivariate Analysis
[4] Belkin M., 2002, ADV NEURAL INF PROCE, V14
[5] Cavalli-Sforza L.L., 1994, HIST GEOGRAPHY HUMAN
[6] CHUNG F, 1992, CBMS REGIONAL C SERI, V92
[7] A simple and improved correction for population stratification in case-control studies [J].
Epstein, Michael P.; Allen, Andrew S.; Satten, Glen A.
AMERICAN JOURNAL OF HUMAN GENETICS, 2007, 80 (05): 921-930
[8] Investigation of the fine structure of European populations with applications to disease association studies [J].
Heath, Simon C.; Gut, Ivo G.; Brennan, Paul; McKay, James D.; Bencko, Vladimir; Fabianova, Eleonora; Foretova, Lenka; Georges, Michel; Janout, Vladimir; Kabesch, Michael; Krokan, Hans E.; Elvestad, Maiken B.; Lissowska, Jolanta; Mates, Dana; Rudnai, Peter; Skorpen, Frank; Schreiber, Stefan; Soria, Jose M.; Syvanen, Ann-Christine; Meneton, Pierre; Hercberg, Serge; Galan, Pilar; Szeszenia-Dabrowska, Neonilia; Zaridze, David; Genin, Emmanuel; Cardon, Lon R.; Lathrop, Mark.
EUROPEAN JOURNAL OF HUMAN GENETICS, 2008, 16 (12): 1413-1429
[9] On the distribution of the largest eigenvalue in principal components analysis [J].
Johnstone, IM.
ANNALS OF STATISTICS, 2001, 29 (02): 295-327
[10] LUCA D, 2008, THESIS CARNEGIE MELL