Informatics and machine learning to define the phenotype

Cited by: 37
Authors
Basile, Anna Okula [1]
Ritchie, Marylyn DeRiggi [1,2]
Affiliations
[1] Penn State Univ, Dept Biochem & Mol Biol, State Coll, PA 16801 USA
[2] Univ Penn, Perelman Sch Med, Dept Genet, Philadelphia, PA 19104 USA
Keywords
Cluster analysis; complex traits; dimensionality reduction; electronic health records (EHRs); heterogeneity; machine learning; missing data; phenotype; topological analysis; unsupervised analysis; ELECTRONIC MEDICAL-RECORDS; HEALTH RECORDS; CLUSTER-ANALYSIS; DATA QUALITY; SAMPLE-SIZE; LARGE-SCALE; HETEROGENEITY; ASSOCIATION; BIOBANK; DISEASE;
DOI
10.1080/14737159.2018.1439380
Chinese Library Classification (CLC) number
R36 [Pathology]
Discipline classification code
100103 [Pathogen Biology]
Abstract
Introduction: For the past decade, the focus of complex disease research has been the genotype. From technological advancements to the development of analysis methods, great progress has been made. However, advances in our definition of the phenotype have remained stagnant. Phenotype characterization has recently emerged as an exciting area of informatics and machine learning. The copious amounts of diverse biomedical data that have been collected may be leveraged with data-driven approaches to elucidate trait-related features and patterns.
Areas covered: In this review, the authors discuss the phenotype in traditional genetic associations and the challenges this has imposed. Approaches for phenotype refinement that can aid in more accurate characterization of traits are also discussed. Further, the authors highlight promising machine learning approaches for establishing a phenotype and the challenges of electronic health record (EHR)-derived data.
Expert commentary: The authors hypothesize that through unsupervised machine learning, data-driven approaches can be used to define phenotypes rather than relying on expert clinician knowledge. Through the use of machine learning and an unbiased set of features extracted from clinical repositories, researchers will have the potential to further understand complex traits and identify patient subgroups. This knowledge may lead to more preventative and precise clinical care.
Pages: 219-226
Page count: 8