Chapter 13: Mining Electronic Health Records in the Genomics Era

被引:101
作者
Denny, Joshua C. [1 ,2 ]
机构
[1] Vanderbilt Univ, Sch Med, Dept Biomed Informat, Nashville, TN 37212 USA
[2] Vanderbilt Univ, Sch Med, Dept Med, Nashville, TN 37212 USA
关键词
WIDE ASSOCIATION; MEDICAL-RECORDS; PERSONALIZED MEDICINE; RHEUMATOID-ARTHRITIS; CLINICAL DOCUMENTS; EXTRACTION SYSTEM; PHENOME-WIDE; DISEASES; VARIANTS; RISK;
D O I
10.1371/journal.pcbi.1002823
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
The combination of improved genomic analysis methods, decreasing genotyping costs, and increasing computing resources has led to an explosion of clinical genomic knowledge in the last decade. Similarly, healthcare systems are increasingly adopting robust electronic health record (EHR) systems that not only can improve health care, but also contain a vast repository of disease and treatment data that could be mined for genomic research. Indeed, institutions are creating EHR-linked DNA biobanks to enable genomic and pharmacogenomic research, using EHR data for phenotypic information. However, EHRs are designed primarily for clinical care, not research, so reuse of clinical EHR data for research purposes can be challenging. Difficulties in use of EHR data include: data availability, missing data, incorrect data, and vast quantities of unstructured narrative text data. Structured information includes billing codes, most laboratory reports, and other variables such as physiologic measurements and demographic information. Significant information, however, remains locked within EHR narrative text documents, including clinical notes and certain categories of test results, such as pathology and radiology reports. For relatively rare observations, combinations of simple free-text searches and billing codes may prove adequate when followed by manual chart review. However, to extract the large cohorts necessary for genome-wide association studies, natural language processing methods to process narrative text data may be needed. Combinations of structured and unstructured textual data can be mined to generate high-validity collections of cases and controls for a given condition. Once high-quality cases and controls are identified, EHR-derived cases can be used for genomic discovery and validation. Since EHR data includes a broad sampling of clinically-relevant phenotypic information, it may enable multiple genomic investigations upon a single set of genotyped individuals. This chapter reviews several examples of phenotype extraction and their application to genetic research, demonstrating a viable future for genomic discovery using EHR-linked data.
引用
收藏
页数:15
相关论文
共 76 条
[1]   The MITRE Identification Scrubber Toolkit: Design, training, and assessment [J].
Aberdeen, John ;
Bayer, Samuel ;
Yeniterzi, Reyyan ;
Wellner, Ben ;
Clark, Cheryl ;
Hanauer, David ;
Malin, Bradley ;
Hirschman, Lynette .
INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2010, 79 (12) :849-859
[2]   An overview of MetaMap: historical perspective and recent advances [J].
Aronson, Alan R. ;
Lang, Francois-Michel .
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2010, 17 (03) :229-236
[3]   Genome-wide association with select biomarker traits in the Framingham Heart Study [J].
Benjamin, Emelia J. ;
Dupuis, Josee ;
Larson, Martin G. ;
Lunetta, Kathryn L. ;
Booth, Sarah L. ;
Govindaraju, Diddahally R. ;
Kathiresan, Sekar ;
Keaney, John F., Jr. ;
Keyes, Michelle J. ;
Lin, Jing-Ping ;
Meigs, James B. ;
Robins, Sander J. ;
Rong, Jian ;
Schnabel, Renate ;
Vita, Joseph A. ;
Wang, Thomas J. ;
Wilson, Peter W. F. ;
Wolf, Philip A. ;
Vasan, Ramachandran S. .
BMC MEDICAL GENETICS, 2007, 8
[4]   Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls [J].
Burton, Paul R. ;
Clayton, David G. ;
Cardon, Lon R. ;
Craddock, Nick ;
Deloukas, Panos ;
Duncanson, Audrey ;
Kwiatkowski, Dominic P. ;
McCarthy, Mark I. ;
Ouwehand, Willem H. ;
Samani, Nilesh J. ;
Todd, John A. ;
Donnelly, Peter ;
Barrett, Jeffrey C. ;
Davison, Dan ;
Easton, Doug ;
Evans, David ;
Leung, Hin-Tak ;
Marchini, Jonathan L. ;
Morris, Andrew P. ;
Spencer, Chris C. A. ;
Tobin, Martin D. ;
Attwood, Antony P. ;
Boorman, James P. ;
Cant, Barbara ;
Everson, Ursula ;
Hussey, Judith M. ;
Jolley, Jennifer D. ;
Knight, Alexandra S. ;
Koch, Kerstin ;
Meech, Elizabeth ;
Nutland, Sarah ;
Prowse, Christopher V. ;
Stevens, Helen E. ;
Taylor, Niall C. ;
Walters, Graham R. ;
Walker, Neil M. ;
Watkins, Nicholas A. ;
Winzer, Thilo ;
Jones, Richard W. ;
McArdle, Wendy L. ;
Ring, Susan M. ;
Strachan, David P. ;
Pembrey, Marcus ;
Breen, Gerome ;
St Clair, David ;
Caesar, Sian ;
Gordon-Smith, Katherine ;
Jones, Lisa ;
Fraser, Christine ;
Green, Elain K. .
NATURE, 2007, 447 (7145) :661-678
[5]   Population stratification and spurious allelic association [J].
Cardon, LR ;
Palmer, LJ .
LANCET, 2003, 361 (9357) :598-604
[6]   Portability of an algorithm to identify rheumatoid arthritis in electronic health records [J].
Carroll, Robert J. ;
Thompson, Will K. ;
Eyler, Anne E. ;
Mandelin, Arthur M. ;
Cai, Tianxi ;
Zink, Raquel M. ;
Pacheco, Jennifer A. ;
Boomershine, Chad S. ;
Lasko, Thomas A. ;
Xu, Hua ;
Karlson, Elizabeth W. ;
Perez, Raul G. ;
Gainer, Vivian S. ;
Murphy, Shawn N. ;
Ruderman, Eric M. ;
Pope, Richard M. ;
Plenge, Robert M. ;
Kho, Abel Ngo ;
Liao, Katherine P. ;
Denny, Joshua C. .
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2012, 19 (E1) :E162-E169
[7]   A simple algorithm for identifying negated findings and diseases in discharge summaries [J].
Chapman, WW ;
Bridewell, W ;
Hanbury, P ;
Cooper, GF ;
Buchanan, BG .
JOURNAL OF BIOMEDICAL INFORMATICS, 2001, 34 (05) :301-310
[8]   A NEW METHOD OF CLASSIFYING PROGNOSTIC CO-MORBIDITY IN LONGITUDINAL-STUDIES - DEVELOPMENT AND VALIDATION [J].
CHARLSON, ME ;
POMPEI, P ;
ALES, KL ;
MACKENZIE, CR .
JOURNAL OF CHRONIC DISEASES, 1987, 40 (05) :373-383
[9]  
Chen David P, 2008, Pac Symp Biocomput, P243
[10]  
Conway Mike, 2011, AMIA Annu Symp Proc, V2011, P274