Development of biomarker classifiers from high-dimensional data

Cited by: 48
Authors
Baek, Songjoon [1]
Tsai, Chen-An [2]
Chen, James J. [3]
Affiliations
[1] HFT 20, Jefferson, AR 72079 USA
[2] China Med Univ, Taichung, Taiwan
[3] US FDA, NCTR, Rockville, MD 20857 USA
Keywords
class prediction; cross-validation; feature selection; frequency of selection; stable feature set; GENE SELECTION; CANCER CLASSIFICATION; PERSONALIZED MEDICINE; MICROARRAY; VALIDATION; TUMOR; ALGORITHMS; PREDICTION; DIAGNOSIS; PATTERNS
DOI
10.1093/bib/bbp016
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Recent development of high-throughput technology has accelerated interest in the development of molecular biomarker classifiers for safety assessment, disease diagnostics and prognostics, and prediction of response for patient assignment. This article reviews and evaluates some important aspects and key issues in the development of biomarker classifiers. Development of a biomarker classifier for high-throughput data involves two components: (i) model building and (ii) performance assessment. This article focuses on feature selection in model building and cross-validation for performance assessment. A frequency approach to feature selection is presented and compared to the conventional approach in terms of the predictive accuracy and stability of the selected feature set. The two approaches are compared based on four biomarker classifiers, each with a different feature selection method and a well-known classification algorithm. In each of the four classifiers, the feature predictor set selected by the frequency approach is more stable than the feature set selected by the conventional approach.
Pages: 537-546
Page count: 10