LRpath: a logistic regression approach for identifying enriched biological groups in gene expression data

Cited by: 141
Authors
Sartor, Maureen A. [4]
Leikauf, George D. [3]
Medvedovic, Mario [1, 2]
Affiliations
[1] Univ Cincinnati, Dept Environm Hlth, Cincinnati, OH 45221 USA
[2] Univ Cincinnati, Ctr Environm Genet, Cincinnati, OH USA
[3] Univ Pittsburgh, Dept Environm & Occupat Hlth, Pittsburgh, PA USA
[4] Univ Michigan, Ctr Computat Med & Biol, Ann Arbor, MI 48109 USA
Keywords
BREAST-CANCER; ONTOLOGY; PATHWAYS; GOMINER; TOOL
DOI
10.1093/bioinformatics/btn592
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The elucidation of biological pathways enriched with differentially expressed genes has become an integral part of the analysis and interpretation of microarray data. Several statistical methods are commonly used in this context, but the question of the optimal approach has still not been resolved.
Results: We present a logistic regression-based method (LRpath) for identifying predefined sets of biologically related genes enriched with (or depleted of) differentially expressed transcripts in microarray experiments. We functionally relate the odds of gene set membership with the significance of differential expression, and calculate adjusted P-values as a measure of statistical significance. The new approach is compared with Fisher's exact test and other relevant methods in a simulation study and in the analysis of two breast cancer datasets. Overall results were concordant between the simulation study and the experimental data analysis, and provide useful information to investigators seeking to choose the appropriate method. LRpath displayed robust behavior and improved statistical power compared with tested alternatives. It is applicable in experiments involving two or more sample types, and accepts significance statistics of the investigator's choice as input.
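The core idea in the abstract, relating the odds of gene set membership to the significance of differential expression, can be illustrated with a small sketch. This is not the authors' implementation. It assumes that the significance statistic is a per-gene p-value transformed as -log10(p), that the slope is tested with a Wald test, and that "adjusted P-values" means Benjamini-Hochberg adjustment across gene sets. The function names (`fit_logistic`, `enrichment_test`, `benjamini_hochberg`) and the toy data are hypothetical.

```python
import math


def fit_logistic(x, y, iterations=50, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson.

    Returns (b0, b1, se_b1), where se_b1 is the standard error of the
    slope taken from the inverse of the observed information matrix.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the information matrix
        for xi, yi in zip(x, y):
            z = max(-30.0, min(30.0, b0 + b1 * xi))  # guard against overflow
            p = 1.0 / (1.0 + math.exp(-z))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    se_b1 = math.sqrt(h00 / det)  # sqrt of the (slope, slope) entry of the inverse
    return b0, b1, se_b1


def enrichment_test(gene_pvalues, gene_set):
    """Regress set membership (0/1) on -log10(p); return (slope, Wald p-value).

    A positive slope suggests enrichment with significant genes,
    a negative slope suggests depletion.
    """
    genes = sorted(gene_pvalues)
    x = [-math.log10(gene_pvalues[g]) for g in genes]
    y = [1 if g in gene_set else 0 for g in genes]
    _, b1, se = fit_logistic(x, y)
    wald_z = b1 / se
    return b1, math.erfc(abs(wald_z) / math.sqrt(2.0))  # two-sided normal p-value


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data (hypothetical): 200 genes with evenly spread p-values, and one
# gene set whose members mostly have small p-values.
toy_pvalues = {f"g{i:03d}": (i + 1) / 201 for i in range(200)}
toy_set = {f"g{i:03d}" for i in list(range(0, 20, 2)) + [50, 120, 180]}
slope, p_value = enrichment_test(toy_pvalues, toy_set)
print(f"slope={slope:.3f}, Wald p={p_value:.3g}")
```

In practice one would run `enrichment_test` once per gene set and pass the resulting list of p-values to `benjamini_hochberg`. Unlike Fisher's exact test, this formulation uses the full range of significance values and needs no cutoff for calling a gene differentially expressed.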
Pages: 211-217
Number of pages: 7