Tree-based recursive partitioning methods for subdividing sibpairs into relatively more homogeneous subgroups

Times cited: 34
Authors
Shannon, WD
Province, MA
Rao, DC
Affiliations
[1] Washington Univ, Sch Med, Div Biostat, Dept Psychiat, St Louis, MO 63110 USA
[2] Washington Univ, Sch Med, Dept Genet, St Louis, MO 63110 USA
[3] Washington Univ, Sch Med, Dept Med, St Louis, MO 63110 USA
[4] Washington Univ, Sch Med, Div Gen Med Sci, St Louis, MO USA
Keywords
recursive partitioning; linkage analysis; sibpairs
DOI
10.1002/gepi.1.abs
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
We propose a new splitting rule for recursively partitioning sibpair data into relatively more homogeneous subgroups. This strategy is designed to identify subgroups of sibpairs such that within-subgroup analyses yield increased power to detect linkage using Haseman-Elston regression. We assume that the subgroups can be defined by patterns of non-genetic binary covariates measured on each sibpair. The data we consider consist of the squared within-pair difference in a quantitative trait measurement for each sibpair, estimates of identity-by-descent (IBD) values at each genetic marker, and binary covariate data describing characteristics of the sibpair (e.g., race, sex, family history of disease). To test the efficacy of this method in linkage analysis, we performed two simulation experiments. In the first, we simulated a mixture in which 66.6% of the sibpairs had no linkage and 33.3% had genetic linkage to one marker. The two groups were distinguished by the value of a single binary covariate. We also simulated one unlinked marker and one random covariate to include as noise in the data. In the second experiment, we simulated a mixture in which 55% of the sibpairs had no genetic linkage, 22.5% had genetic linkage to one marker, and 22.5% had linkage to a different marker. Each subgroup was defined by a distinct pattern of two binary covariates. We also simulated one unlinked marker and two random covariates to include as noise in the data. Our simulation studies found, first, that fitting Haseman-Elston regression models to homogeneous subgroups can significantly increase the overall power to detect linkage, with only a small increase in the false-positive rate. Second, the splitting rule can correctly identify important covariates and linked markers. Third, recursive partitioning of sibpair data using this splitting rule can correctly identify sibpair subgroups.
These results indicate that partitioning sibpairs into homogeneous subgroups is feasible and significantly increases the power to detect linkage, demonstrating the practical utility and potential of this new methodology. © 2001 Wiley-Liss, Inc.
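The abstract rests on Haseman-Elston regression of the squared sibpair trait difference on estimated IBD sharing, applied separately within covariate-defined subgroups. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' splitting rule, which the abstract does not specify. The function names (`he_regression`, `best_binary_split`, `simulate`), the split score (most negative child t statistic), and the toy additive model (trait locus at a fully informative marker, with hypothetical variance parameters `va` and `ve`) are all illustrative choices.

```python
import math
import random


def he_regression(y, pihat):
    """Haseman-Elston regression: OLS of the squared sibpair trait
    difference y on the estimated IBD proportion pihat (0, 0.5 or 1).

    Returns (slope, t statistic for the slope). A significantly negative
    slope is the classic Haseman-Elston evidence for linkage.
    Assumes at least 3 pairs and some variation in pihat.
    """
    n = len(y)
    mx = sum(pihat) / n
    my = sum(y) / n
    sxx = sum((x - mx) ** 2 for x in pihat)
    sxy = sum((x - mx) * (v - my) for x, v in zip(pihat, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((v - a - b * x) ** 2 for x, v in zip(pihat, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:  # perfect fit: report an infinite t with the slope's sign
        return b, math.copysign(math.inf, b)
    return b, b / se


def best_binary_split(y, pihat, covariates, min_size=10):
    """HYPOTHETICAL split criterion, not the rule proposed in the paper.

    Splits the sibpairs on each binary covariate, fits Haseman-Elston
    regression in both children, and scores the split by the most
    negative child t statistic. Returns (best covariate index, score).
    """
    best_j, best_score = None, math.inf
    for j in range(len(covariates[0])):
        child_t = []
        for level in (0, 1):
            idx = [i for i, c in enumerate(covariates) if c[j] == level]
            xs = [pihat[i] for i in idx]
            if len(idx) < min_size or len(set(xs)) < 2:
                break  # child too small or no IBD variation: skip covariate
            _, t = he_regression([y[i] for i in idx], xs)
            child_t.append(t)
        else:  # runs only if both children were fitted
            score = min(child_t)
            if score < best_score:
                best_j, best_score = j, score
    return best_j, best_score


def simulate(n_pairs, seed=1, va=1.0, ve=0.25):
    """Toy data under an ASSUMED additive model, loosely echoing the first
    experiment: covariate 0 == 1 flags the linked third of the pairs,
    covariate 1 is random noise, and IBD at the marker is known exactly.
    """
    rng = random.Random(seed)
    y, pihat, cov = [], [], []
    for i in range(n_pairs):
        linked = 1 if i % 3 == 0 else 0
        marker = rng.choice((0.0, 0.5, 0.5, 1.0))  # IBD 0/1/2 w.p. 1/4, 1/2, 1/4
        # Linked pairs share the trait locus IBD with the marker; others get
        # independent sharing, so their y carries no marker signal.
        qtl = marker if linked else rng.choice((0.0, 0.5, 0.5, 1.0))
        sd = math.sqrt(2 * va * (1 - qtl) + 2 * ve)  # SD of the trait difference
        y.append(rng.gauss(0.0, sd) ** 2)
        pihat.append(marker)
        cov.append((linked, rng.randint(0, 1)))
    return y, pihat, cov


if __name__ == "__main__":
    y, pihat, cov = simulate(3000)
    print(best_binary_split(y, pihat, cov))
```

With a few thousand simulated pairs, the strongly negative slope in the linked subgroup should make covariate 0 the preferred split. A full recursive-partitioning procedure would apply the split repeatedly, re-test each child, and also scan multiple markers; this sketch covers a single split at a single marker only.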
Pages: 293-306
Number of pages: 14
Related papers (13 in total)
[1] Breiman L., 1984, BIOMETRICS, DOI 10.2307/2530946
[2] Clark L.A., 1992, STAT MODELS S PACIFI
[3] Hand D.J., 1998, CONSTRUCTION ASSESSM
[4] Harrison D., Rubinfeld D.L., 1978, Hedonic housing prices and demand for clean air, Journal of Environmental Economics and Management, 5(1):81-102
[5] Haseman J.K., Elston R.C., 1972, Investigation of linkage between a quantitative trait and a marker locus, Behavior Genetics, 2(1):3-19
[6] Langley P., 1996, ELEMENTS MACHINE LEA
[7] Morgan J.N., Sonquist J.A., 1963, Problems in analysis of survey data, and a proposal, Journal of the American Statistical Association, 58(302):415
[8] Nakhaeizadeh G., 1997, MACHINE LEARNING STA
[9] Quinlan J.R., 1986, Machine Learning, 1:81, DOI 10.1023/A:1022643204877
[10] Quinlan R., 1993, C4.5: Programs for Machine Learning