Data mining

Cited by: 8
Authors
Cupples, LA
Bailey, J
Cartier, KC
Falk, CT
Liu, KY
Ye, YQ
Yu, R
Zhang, HP
Zhao, HY
Affiliations
[1] Boston Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02118 USA
[2] Univ Calif Los Angeles, Sch Med, Dept Psychiat, Los Angeles, CA 90024 USA
[3] Case Western Reserve Univ, Dept Epidemiol & Biostat, Cleveland, OH 44106 USA
[4] New York Blood Ctr, Lindsley F Kimball Res Inst, New York, NY 10021 USA
[5] Harvard Univ, Brigham & Womens Hosp, Sch Med, Ctr Neurodegenerat & Repair,Ctr Bioinformat, Boston, MA 02115 USA
[6] Yale Univ, Sch Med, Dept Epidemiol & Publ Hlth, New Haven, CT 06510 USA
[7] Univ Texas, MD Anderson Canc Ctr, Dept Epidemiol, Houston, TX 77030 USA
[8] Yale Univ, Sch Med, Dept Genet, New Haven, CT 06510 USA
Keywords
association studies; haplotype estimation; neural networks; machine learning; trees
DOI
10.1002/gepi.20117
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Group 14 used data-mining strategies to evaluate a number of issues, including appropriate diagnosis, haplotype estimation, genetic linkage and association studies, and type I error. Methods ranged from exploratory analyses, to machine learning strategies (neural networks, supervised learning, and tree-based methods), to false discovery rate control of type I errors. The general motivations were to find the "story" in the data and to summarize information from a multitude of measures. Several methods illustrated strategies for better trait definition, using summarization of related traits. In the few studies that sought to identify genes for alcoholism, there was little agreement among the different strategies, likely reflecting the complexities of the disease. Nevertheless, Group 14 found that these methods offered strategies to gain a better understanding of the complex pathways by which disease develops.
Pages: S103-S109
Page count: 7