Applications of machine learning and high-dimensional visualization in cancer detection, diagnosis, and management

Cited by: 80

Authors:

McCarthy, JF [1]

Marx, KA [1]

Hoffman, PE [1]

Gee, AG [1]

O'Neil, P [1]

Ujwal, ML [1]

Hotchkiss, J [1]

Affiliation:

[1] AnVil Inc, Burlington, MA 01803 USA

Source:

APPLICATIONS OF BIOINFORMATICS IN CANCER DETECTION | 2004 / Vol. 1020

Keywords:

data mining; exploratory data analysis; machine learning; visualization; bioinformatics; biomedical informatics; cheminformatics; genomics; proteomics; molecular medicine; cancer

DOI:

10.1196/annals.1310.020

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]

Discipline classification codes:

071010; 081704

Abstract:

Recent technical advances in combinatorial chemistry, genomics, and proteomics have made available large databases of biological and chemical information that have the potential to dramatically improve our understanding of cancer biology at the molecular level. Such an understanding of cancer biology could have a substantial impact on how we detect, diagnose, and manage cancer cases in the clinical setting. One of the biggest challenges facing clinical oncologists is how to extract clinically useful knowledge from the overwhelming amount of raw molecular data that are currently available. In this paper, we discuss how the exploratory data analysis techniques of machine learning and high-dimensional visualization can be applied to extract clinically useful knowledge from a heterogeneous assortment of molecular data. After an introductory overview of machine learning and visualization techniques, we describe two proprietary algorithms (PURS and RadViz(TM)) that we have found to be useful in the exploratory analysis of large biological data sets. We next illustrate, by way of three examples, the applicability of these techniques to cancer detection, diagnosis, and management using three very different types of molecular data. We first discuss the use of our exploratory analysis techniques on proteomic mass spectroscopy data for the detection of ovarian cancer. Next, we discuss the diagnostic use of these techniques on gene expression data to differentiate between squamous and adenocarcinoma of the lung. Finally, we illustrate the use of such techniques in selecting from a database of chemical compounds those most effective in managing patients with melanoma versus leukemia.

Pages: 239-262

Page count: 24
