Significance of gene ranking for classification of microarray samples

Times cited: 42
Authors
Zhang, Chaolin [1]
Lu, Xuesong
Zhang, Xuegong
Affiliations
[1] SUNY Stony Brook, Cold Spring Harbor Lab, Stony Brook, NY 11794 USA
[2] SUNY Stony Brook, Dept Biomed Engn, Stony Brook, NY 11794 USA
[3] Tsinghua Univ, MOE Key Lab Bioinformat, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
significance of gene ranking; gene selection; classification; microarray data analysis
DOI
10.1109/TCBB.2006.42
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Many methods for classification and gene selection with microarray data have been developed. These methods usually give a ranking of genes. Evaluating the statistical significance of the gene ranking is important for understanding the results and for further biological investigations, but this question has not been well addressed for machine learning methods in existing works. Here, we address this problem by formulating it in the framework of hypothesis testing and propose a solution based on resampling. The proposed r-test methods convert gene ranking results into position p-values to evaluate the significance of genes. The methods are tested on three real microarray data sets and three simulation data sets with support vector machines as the method of classification and gene selection. The obtained position p-values help to determine the number of genes to be selected and enable scientists to analyze selection results by sophisticated multivariate methods under the same statistical inference paradigm as for simple hypothesis testing methods.
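The abstract describes the r-test only at a high level, so the following is a minimal sketch of one plausible reading of a resampling-based position p-value, not the authors' implementation. Assumptions: genes are ranked by an absolute Welch t-statistic instead of the SVM-based ranking used in the paper, and the position p-value of a gene observed at position k is taken as the fraction of class-label permutations in which that same gene is ranked at position k or better. The names `rank_genes`, `position_p_values`, and the `n_perm` and `seed` parameters are hypothetical.

```python
import math
import random


def _mean_var(values):
    """Sample mean and unbiased sample variance of a list of floats."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def abs_welch_t(a, b):
    """Absolute Welch t-statistic between two groups of measurements."""
    ma, va = _mean_var(a)
    mb, vb = _mean_var(b)
    return abs(ma - mb) / math.sqrt(va / len(a) + vb / len(b))


def rank_genes(X, y):
    """Return gene indices ordered from most to least discriminative.

    X is a list of samples (each a list of gene values); y holds 0/1 labels.
    Stand-in criterion (assumption): absolute Welch t-statistic, not the
    SVM-based ranking evaluated in the paper.
    """
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs_welch_t(a, b))
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)


def position_p_values(X, y, rank_fn=rank_genes, n_perm=200, seed=0):
    """Hypothetical resampling-based position p-values.

    For a gene observed at 1-based position k, estimate the probability
    that, after shuffling the class labels (null: labels unrelated to
    expression), the same gene is ranked at position k or better.
    Returns a dict {gene_index: p_value}.
    """
    rng = random.Random(seed)
    observed = rank_fn(X, y)
    obs_pos = {g: k for k, g in enumerate(observed, start=1)}
    hits = dict.fromkeys(observed, 0)
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)  # shuffling keeps the class sizes fixed
        for k, g in enumerate(rank_fn(X, y_perm), start=1):
            if k <= obs_pos[g]:
                hits[g] += 1
    # Add-one correction keeps every estimate strictly above zero.
    return {g: (hits[g] + 1) / (n_perm + 1) for g in observed}


if __name__ == "__main__":
    rng = random.Random(1)
    y = [0] * 10 + [1] * 10
    # Gene 0 tracks the class label; genes 1..49 are uniform noise.
    X = [[lab + rng.gauss(0, 0.05)] + [rng.random() for _ in range(49)]
         for lab in y]
    pvals = position_p_values(X, y)
    for g in rank_genes(X, y)[:5]:
        print(f"gene {g}: position p-value {pvals[g]:.3f}")
```

Because `rank_fn` is a parameter, a multivariate ranking (for example, an SVM-based one) could be swapped in without changing the resampling loop; the cost is that the full ranking procedure is rerun once per permutation. Note that the gene ranked last always receives a p-value of 1 under this definition.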
Pages: 312-320
Page count: 9