Applications of machine learning and high-dimensional visualization in cancer detection, diagnosis, and management

Cited by: 80
Authors
McCarthy, JF [1 ]
Marx, KA [1 ]
Hoffman, PE [1 ]
Gee, AG [1 ]
O'Neil, P [1 ]
Ujwal, ML [1 ]
Hotchkiss, J [1 ]
Affiliations
[1] AnVil Inc, Burlington, MA 01803 USA
Source
APPLICATIONS OF BIOINFORMATICS IN CANCER DETECTION | 2004 / Vol. 1020
Keywords
data mining; exploratory data analysis; machine learning; visualization; bioinformatics; biomedical informatics; cheminformatics; genomics; proteomics; molecular medicine; cancer
DOI
10.1196/annals.1310.020
CLC number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Recent technical advances in combinatorial chemistry, genomics, and proteomics have made available large databases of biological and chemical information that have the potential to dramatically improve our understanding of cancer biology at the molecular level. Such an understanding of cancer biology could have a substantial impact on how we detect, diagnose, and manage cancer cases in the clinical setting. One of the biggest challenges facing clinical oncologists is how to extract clinically useful knowledge from the overwhelming amount of raw molecular data that are currently available. In this paper, we discuss how the exploratory data analysis techniques of machine learning and high-dimensional visualization can be applied to extract clinically useful knowledge from a heterogeneous assortment of molecular data. After an introductory overview of machine learning and visualization techniques, we describe two proprietary algorithms (PURS and RadViz(TM)) that we have found to be useful in the exploratory analysis of large biological data sets. We next illustrate, by way of three examples, the applicability of these techniques to cancer detection, diagnosis, and management using three very different types of molecular data. We first discuss the use of our exploratory analysis techniques on proteomic mass spectroscopy data for the detection of ovarian cancer. Next, we discuss the diagnostic use of these techniques on gene expression data to differentiate between squamous and adenocarcinoma of the lung. Finally, we illustrate the use of such techniques in selecting from a database of chemical compounds those most effective in managing patients with melanoma versus leukemia.
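The abstract names RadViz(TM) without describing how it works. As a rough illustration only, the sketch below follows the generic, publicly described radial-visualization idea: one anchor per feature is placed on a unit circle, and each record is drawn at the weighted average of the anchors, with weights given by its min-max-scaled feature values. This is an assumption about the general technique, not the authors' proprietary implementation; the function name `radviz_project` and the toy data are hypothetical.

```python
import math


def radviz_project(rows):
    """Map rows of numeric features to 2-D points, RadViz style (illustrative sketch)."""
    n_features = len(rows[0])
    # One anchor per feature, evenly spaced on the unit circle.
    anchors = [
        (math.cos(2 * math.pi * j / n_features),
         math.sin(2 * math.pi * j / n_features))
        for j in range(n_features)
    ]
    # Per-feature minimum and maximum, used for min-max scaling to [0, 1].
    lows = [min(r[j] for r in rows) for j in range(n_features)]
    highs = [max(r[j] for r in rows) for j in range(n_features)]

    points = []
    for r in rows:
        weights = [
            (r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_features)
        ]
        total = sum(weights)
        if total == 0:
            # A record with all-zero weights has no pull; place it at the center.
            points.append((0.0, 0.0))
            continue
        x = sum(w * a[0] for w, a in zip(weights, anchors)) / total
        y = sum(w * a[1] for w, a in zip(weights, anchors)) / total
        points.append((x, y))
    return points


# Toy data (made up): three records dominated by one feature each, one balanced record.
toy_rows = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
toy_points = radviz_project(toy_rows)
```

In this toy example, a record dominated by a single feature lands on that feature's anchor, while a record with equal values on all features lands at the center of the circle. This is why such plots tend to separate classes that differ in which features are elevated.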
Pages: 239-262
Page count: 24