Simultaneous classification and relevant feature identification in high-dimensional spaces: application to molecular profiling data

Cited by: 41
Authors
Bhattacharyya, C
Grate, LR
Rizki, A
Radisky, D
Molina, FJ
Jordan, MI
Bissell, MJ
Mian, IS [1]
Affiliations
[1] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Div Life Sci, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Div Comp Sci, Berkeley, CA 94720 USA
[3] Univ Calif Santa Cruz, Dept Math, Santa Cruz, CA 95064 USA
[4] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Funding
National Science Foundation (USA)
Keywords
L-1 norm minimisation; molecular profiling data; feature selection; classification; cancer biology; LIKNON; minimax probability machine
DOI
10.1016/S0165-1684(02)00474-7
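Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract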
Molecular profiling technologies monitor many thousands of transcripts, proteins, metabolites or other species concurrently in a biological sample of interest. Given such high-dimensional data for different types of samples, classification methods aim to assign specimens to known categories. Relevant feature identification methods seek to define a subset of molecules that differentiate the samples. This work describes LIKNON, a specific implementation of a statistical approach for creating a classifier and identifying a small number of relevant features simultaneously. Given two-class data, LIKNON estimates a sparse linear classifier by exploiting the simple and well-known property that minimising an L-1 norm (via linear programming) yields a sparse hyperplane. It performs well when used for retrospective analysis of three cancer biology profiling data sets: (i) small, round, blue cell tumour transcript profiles from tumour biopsies and cell lines, (ii) sporadic breast carcinoma transcript profiles from patients with distant metastases <5 years and those with no distant metastases ≥5 years and (iii) serum sample protein profiles from unaffected individuals and ovarian cancer patients. Computationally, LIKNON is less demanding than the prevailing filter-wrapper strategy; the latter generates many feature subsets and equates relevant features with the subset yielding a classifier with the lowest generalisation error. Biologically, the results suggest a role for the cellular microenvironment in influencing disease outcome and its importance in developing clinical decision support systems. (C) 2002 Elsevier Science B.V. All rights reserved.
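The abstract's central property, that minimising an L-1 norm by linear programming yields a sparse hyperplane, can be illustrated with a short sketch. It assumes the standard soft-margin L-1 formulation (minimise the L-1 norm of the weights plus C times the total slack, subject to margin constraints) with labels in {-1, +1}; the helper name `l1_linear_classifier` and the trade-off parameter `C` are hypothetical, and this is not the authors' LIKNON code, whose exact formulation may differ. It relies on NumPy and SciPy's `linprog` with the HiGHS solver (SciPy 1.6 or later).

```python
import numpy as np
from scipy.optimize import linprog


def l1_linear_classifier(X, y, C=1.0):
    """Hypothetical helper: sparse linear classifier via an L1-norm LP.

    Solves   min  ||w||_1 + C * sum_i xi_i
             s.t. y_i * (w . x_i + b) >= 1 - xi_i,  xi_i >= 0
    with labels y_i in {-1, +1}. Writing w = u - v (u, v >= 0) makes
    ||w||_1 = sum(u + v) linear, so the whole problem is a linear programme.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    # Decision vector z = [u (d), v (d), b (1), xi (n)].
    c = np.concatenate([np.ones(2 * d), [0.0], C * np.ones(n)])
    # Margin constraint rearranged into A_ub @ z <= b_ub form:
    #   -y_i x_i.u + y_i x_i.v - y_i b - xi_i <= -1
    yX = y[:, None] * X
    A_ub = np.hstack([-yX, yX, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    # u, v and xi are non-negative; the bias b is unrestricted in sign.
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    z = res.x
    w = z[:d] - z[d:2 * d]
    b = z[2 * d]
    return w, b


# Toy two-class data: feature 0 separates the classes, feature 1 is noise.
X = np.array([[2.0, 0.3], [3.0, -0.5], [-2.0, 0.4], [-3.0, -0.2]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = l1_linear_classifier(X, y, C=10.0)
selected = np.flatnonzero(np.abs(w) > 1e-6)  # indices of "relevant" features
print("w =", w, "b =", b, "selected =", selected)
# Expected: w close to [0.5, 0.0], b close to 0.0, selected == [0]
```

In this sketch the features with non-zero weight play the role of the "relevant features": the L-1 objective drives the weight on the uninformative feature to exactly zero, while a larger `C` penalises training errors more heavily. Classification and feature selection thus come out of a single optimisation rather than a search over many candidate subsets.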
Pages: 729-743
Page count: 15
References
42 entries in total
  • [1] Allander SV, 2001, CANCER RES, V61, P8624
  • [2] Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997, 25(17): 3389-3402
  • [3] Bennett KP, 1999, ADV NEUR IN, V11, P368
  • [4] BENNETT KP, 2000, SIGKDD EXPLORATIONS, V2, P1, DOI 10.1145/380995.380999
  • [5] Bertsimas D., 2000, HDB SEMIDEFINITE PRO, P469, DOI 10.1007/978-1-4615-4381-7_16
  • [6] Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, Loda M, Weber G, Mark EJ, Lander ES, Wong W, Johnson BE, Golub TR, Sugarbaker DJ, Meyerson M. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proceedings of the National Academy of Sciences of the United States of America, 2001, 98(24): 13790-13795
  • [7] Bissell MJ, Radisky D. Putting tumours in context. Nature Reviews Cancer, 2001, 1(1): 46-54
  • [8] BOYD S, 2001, EE364 STANF U
  • [9] Branagan G, Hughes D, Jeffrey M, Crane-Robinson C, Perry PM. Detection of micrometastases in lymph nodes from patients with breast cancer. British Journal of Surgery, 2002, 89(1): 86-89
  • [10] Brown MPS, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M, Haussler D. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proceedings of the National Academy of Sciences of the United States of America, 2000, 97(1): 262-267