Data mining

Cited by: 8
Authors
Cupples, LA
Bailey, J
Cartier, KC
Falk, CT
Liu, KY
Ye, YQ
Yu, R
Zhang, HP
Zhao, HY
Affiliations
[1] Boston Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02118 USA
[2] Univ Calif Los Angeles, Sch Med, Dept Psychiat, Los Angeles, CA 90024 USA
[3] Case Western Reserve Univ, Dept Epidemiol & Biostat, Cleveland, OH 44106 USA
[4] New York Blood Ctr, Lindsley F Kimball Res Inst, New York, NY 10021 USA
[5] Harvard Univ, Brigham & Womens Hosp, Sch Med, Ctr Neurodegenerat & Repair,Ctr Bioinformat, Boston, MA 02115 USA
[6] Yale Univ, Sch Med, Dept Epidemiol & Publ Hlth, New Haven, CT 06510 USA
[7] Univ Texas, MD Anderson Canc Ctr, Dept Epidemiol, Houston, TX 77030 USA
[8] Yale Univ, Sch Med, Dept Genet, New Haven, CT 06510 USA
Keywords
association studies; haplotype estimation; neural networks; machine learning; trees
DOI
10.1002/gepi.20117
CLC number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Group 14 used data-mining strategies to evaluate a number of issues, including appropriate diagnosis, haplotype estimation, genetic linkage and association studies, and type I error. Methods ranged from exploratory analyses, to machine learning strategies (neural networks, supervised learning, and tree-based methods), to false discovery rate control of type I errors. The general motivations were to find the "story" in the data and to summarize information from a multitude of measures. Several methods illustrated strategies for better trait definition, using summarization of related traits. In the few studies that sought to identify genes for alcoholism, there was little agreement among the different strategies, likely reflecting the complexities of the disease. Nevertheless, Group 14 found that these methods offered strategies to gain a better understanding of the complex pathways by which disease develops.
Pages: S103-S109
Page count: 7
Related papers
21 items in total
[1]   CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[2]  
Breiman L, 1998, ANN STAT, V26, P801
[3]   A tutorial on Support Vector Machines for pattern recognition [J].
Burges, CJC .
DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 2 (02) :121-167
[4]   An artificial neural network for estimating haplotype frequencies [J].
Cartier, KC ;
Baechle, D .
BMC GENETICS, 2005, 6 (Suppl 1)
[5]   Whole-genome association studies on alcoholism comparing different phenotypes using single-nucleotide polymorphisms and microsatellites [J].
Chen, L ;
Liu, NJ ;
Wang, S ;
Oh, CG ;
Carriero, NJ ;
Zhao, HY .
BMC GENETICS, 2005, 6 (Suppl 1)
[6]   Large-scale simultaneous hypothesis testing: The choice of a null hypothesis [J].
Efron, B .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2004, 99 (465) :96-104
[7]   Diagnosis of alcoholism based on neural network analysis of phenotypic risk factors [J].
Falk, CT .
BMC GENETICS, 2005, 6 (Suppl 1)
[8]  
Freund Y, 1999, MACHINE LEARNING, PROCEEDINGS, P124
[9]   On a general class of conditional tests for family-based association studies in genetics: The asymptotic distribution, the conditional power, and optimality considerations [J].
Lange, C ;
Laird, NM .
GENETIC EPIDEMIOLOGY, 2002, 23 (02) :165-180
[10]   Boosting alternating decision trees modeling of disease trait information [J].
Liu, KY ;
Lin, J ;
Zhou, XB ;
Wong, STC .
BMC GENETICS, 2005, 6 (Suppl 1)