Simple decision rules for classifying human cancers from gene expression profiles

被引:312
作者
Tan, AC
Naiman, DQ
Xu, L
Winslow, RL
Geman, D
机构
[1] Whitaker Biomed Engn Inst, Ctr Cardiovasc Bioinformat & Modeling, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Dept Appl Math & Stat, Baltimore, MD 21218 USA
基金
美国国家卫生研究院;
关键词
D O I
10.1093/bioinformatics/bti631
中图分类号
Q5 [生物化学];
学科分类号
071010 [生物化学与分子生物学]; 081704 [应用化学];
摘要
Motivation: Various studies have shown that cancer tissue samples can be successfully detected and classified by their gene expression patterns using machine learning approaches. One of the challenges in applying these techniques for classifying gene expression data is to extract accurate, readily interpretable rules providing biological insight as to how classification is performed. Current methods generate classifiers that are accurate but difficult to interpret. This is the trade-off between credibility and comprehensibility of the classifiers. Here, we introduce a new classifier in order to address these problems. It is referred to as k-TSP (k-Top Scoring Pairs) and is based on the concept of 'relative expression reversals'. This method generates simple and accurate decision rules that only involve a small number of gene-to-gene expression comparisons, thereby facilitating follow-up studies. Results: In this study, we have compared our approach to other machine learning techniques for class prediction in 19 binary and multi-class gene expression datasets involving human cancers. The k-TSP classifier performs as efficiently as Prediction Analysis of Microarray and support vector machine, and outperforms other learning methods (decision trees, k-nearest neighbour and naive Bayes). Our approach is easy to interpret as the classifier involves only a small number of informative genes. For these reasons, we consider the k-TSP method to be a useful tool for cancer classification from microarray gene expression data.
引用
收藏
页码:3896 / 3904
页数:9
相关论文
共 43 条
[1]
Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling [J].
Alizadeh, AA ;
Eisen, MB ;
Davis, RE ;
Ma, C ;
Lossos, IS ;
Rosenwald, A ;
Boldrick, JG ;
Sabet, H ;
Tran, T ;
Yu, X ;
Powell, JI ;
Yang, LM ;
Marti, GE ;
Moore, T ;
Hudson, J ;
Lu, LS ;
Lewis, DB ;
Tibshirani, R ;
Sherlock, G ;
Chan, WC ;
Greiner, TC ;
Weisenburger, DD ;
Armitage, JO ;
Warnke, R ;
Levy, R ;
Wilson, W ;
Grever, MR ;
Byrd, JC ;
Botstein, D ;
Brown, PO ;
Staudt, LM .
NATURE, 2000, 403 (6769) :503-511
[2]
Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J].
Alon, U ;
Barkai, N ;
Notterman, DA ;
Gish, K ;
Ybarra, S ;
Mack, D ;
Levine, AJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (12) :6745-6750
[3]
Joint induction of shape features and tree classifiers [J].
Amit, Y ;
Geman, D ;
Wilder, K .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1997, 19 (11) :1300-1305
[4]
MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia [J].
Armstrong, SA ;
Staunton, JE ;
Silverman, LB ;
Pieters, R ;
de Boer, ML ;
Minden, MD ;
Sallan, SE ;
Lander, ES ;
Golub, TR ;
Korsmeyer, SJ .
NATURE GENETICS, 2002, 30 (01) :41-47
[5]
Gene-expression profiles predict survival of patients with lung adenocarcinoma [J].
Beer, DG ;
Kardia, SLR ;
Huang, CC ;
Giordano, TJ ;
Levin, AM ;
Misek, DE ;
Lin, L ;
Chen, GA ;
Gharib, TG ;
Thomas, DG ;
Lizyness, ML ;
Kuick, R ;
Hayasaka, S ;
Taylor, JMG ;
Iannettoni, MD ;
Orringer, MB ;
Hanash, S .
NATURE MEDICINE, 2002, 8 (08) :816-824
[6]
DIFFERENCES IN THE FREQUENCY OF NORMAL AND CLONAL PRECURSORS OF COLONY-FORMING CELLS IN CHRONIC MYELOGENOUS LEUKEMIA AND ACUTE MYELOGENOUS LEUKEMIA [J].
BERNSTEIN, ID ;
SINGER, JW ;
SMITH, FO ;
ANDREWS, RG ;
FLOWERS, DA ;
PETERSENS, J ;
STEINMANN, L ;
NAJFELD, V ;
SAVAGE, D ;
FRUCHTMAN, S ;
ARLIN, Z ;
FIALKOW, PJ .
BLOOD, 1992, 79 (07) :1811-1816
[7]
Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses [J].
Bhattacharjee, A ;
Richards, WG ;
Staunton, J ;
Li, C ;
Monti, S ;
Vasa, P ;
Ladd, C ;
Beheshti, J ;
Bueno, R ;
Gillette, M ;
Loda, M ;
Weber, G ;
Mark, EJ ;
Lander, ES ;
Wong, W ;
Johnson, BE ;
Golub, TR ;
Sugarbaker, DJ ;
Meyerson, M .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2001, 98 (24) :13790-13795
[8]
Bo TH, 2002, GENOME BIOL, V3
[9]
Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[10]
Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32