Identifying high-dimensional biomarkers for personalized medicine via variable importance ranking

Times cited: 5
Authors
Baek, Songjoon [2]
Moon, Hojin [1]
Ahn, Hongshik [3]
Kodell, Ralph L. [4]
Lin, Chien-Ju [2]
Chen, James J. [2]
Affiliations
[1] Calif State Univ Long Beach, Dept Math & Stat, Long Beach, CA 90840 USA
[2] US FDA, Natl Ctr Toxicol Res, Biometry Branch, Div Personalized Nutr & Med, Jefferson, AR 72079 USA
[3] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[4] Univ Arkansas Med Sci, Dept Biostat, Little Rock, AR 72205 USA
Keywords
class prediction; cross-validation; ensembles; gene selection; risk profiling
DOI
10.1080/10543400802278023
Chinese Library Classification number
R9 [Pharmacy]
Discipline classification code
1007
Abstract
We apply robust classification algorithms to high-dimensional genomic data and analyze variable importance to find biomarkers that enable a better diagnosis of disease, an earlier intervention, or a more effective assignment of therapies. The goal is to use variable importance ranking to isolate a set of important genes that can be used to classify life-threatening diseases with respect to prognosis or type, in order to maximize efficacy or minimize toxicity in personalized treatment of such diseases. A ranking method is proposed, several other methods for selecting a set of important genes to use as genomic biomarkers are presented, and the performance of the selection procedures in patient classification is evaluated by cross-validation. The various selection algorithms are applied to published high-dimensional genomic data sets using several well-known classification methods. For each data set, the set of genes selected on the basis of variable importance that performed best in classification is reported. The classification algorithm with the proposed ranking method is shown to be competitive with other selection methods for discovering genomic biomarkers underlying both adverse and efficacious outcomes, with the aim of improving individualized treatment of patients with life-threatening diseases.
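The general workflow described in the abstract (rank variables by importance, keep the top-ranked genes, and judge the selection by cross-validated classification) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the importance score is a simple signal-to-noise statistic rather than the ranking method proposed in the paper, and a nearest-centroid rule stands in for the classification methods the authors used. Class labels are assumed to be 0 and 1.

```python
import random
import statistics


def rank_genes(X, y):
    """Return gene indices sorted by a signal-to-noise score, largest first.

    Stand-in importance measure (not the paper's method):
    |mean_0 - mean_1| / (sd_0 + sd_1) computed per gene.
    """
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-9
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def nearest_centroid_predict(X_train, y_train, x, genes):
    """Predict the class whose centroid over the selected genes is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        dist = sum((x[j] - statistics.mean(r[j] for r in rows)) ** 2 for j in genes)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_accuracy(X, y, n_top, n_folds=5, seed=0):
    """k-fold cross-validated accuracy with gene selection redone in every fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        # Rank on the training part only, so held-out samples never
        # influence which genes are selected.
        genes = rank_genes(X_tr, y_tr)[:n_top]
        for i in fold:
            correct += nearest_centroid_predict(X_tr, y_tr, X[i], genes) == y[i]
    return correct / len(X)


def make_toy_data(n_per_class=20, n_genes=50, n_informative=3, shift=3.0, seed=1):
    """Synthetic expression matrix: only the first n_informative genes differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for j in range(n_informative):
                row[j] += shift * label
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("top-ranked genes:", rank_genes(X, y)[:3])
    print("cross-validated accuracy:", cv_accuracy(X, y, n_top=3))
```

Repeating the ranking inside each fold is the design choice that matters in this sketch: selecting genes once on the full data set and then cross-validating only the classifier would let the held-out samples leak into the selection and would tend to overstate accuracy.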
Pages: 853-868
Page count: 16