Analysis of sample set enrichment scores: assaying the enrichment of sets of genes for individual samples in genome-wide expression profiles

Cited by: 51
Authors
Edelman, Elena
Porrello, Alessandro
Guinney, Justin
Balakumaran, Bala
Bild, Andrea
Febbo, Phillip G.
Mukherjee, Sayan [1]
Affiliations
[1] Duke Univ, Inst Genome Sci & Policy, Durham, NC 27708 USA
[2] Duke Univ, Computat Biol & Bioinformat Program, Durham, NC 27708 USA
[3] Duke Univ, Dept Med, Durham, NC 27708 USA
[4] Duke Univ, Dept Mol Genet & Microbiol, Durham, NC 27708 USA
[5] Duke Univ, Inst Stat & Decis Sci, Durham, NC 27708 USA
[6] Duke Univ, Dept Comp Sci, Durham, NC 27708 USA
[7] Regina Elena Inst Canc Res, Mol Oncogenesis Lab, I-00158 Rome, Italy
DOI
10.1093/bioinformatics/btl231
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Gene expression profiling experiments in cell lines and animal models characterized by specific genetic or molecular perturbations have yielded sets of genes annotated by the perturbation. These gene sets can serve as a reference base for interrogating other expression datasets. For example, a new dataset in which a specific pathway gene set appears to be enriched, in terms of multiple genes in that set evidencing expression changes, can then be annotated by that reference pathway. We introduce in this paper a formal statistical method to measure the enrichment of each sample in an expression dataset. This allows us to assay the natural variation of pathway activity in observed gene expression data sets from clinical cancer and other studies.

Results: Validation of the method and illustrations of biological insights gleaned are demonstrated on cell line data, mouse models, and cancer-related datasets. Using oncogenic pathway signatures, we show that gene sets built from a model system are indeed enriched in the model system. We employ ASSESS for the use of molecular classification by pathways. This provides an accurate classifier that can be interpreted at the level of pathways instead of individual genes. Finally, ASSESS can be used for cross-platform expression models where data on the same type of cancer are integrated over different platforms into a space of enrichment scores.
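
To make the idea of a per-sample enrichment score concrete, the following is a minimal sketch of a generic weighted running-sum (Kolmogorov-Smirnov style) statistic computed independently for each sample. It is not the ASSESS algorithm, whose details are not given in this record; the function names `sample_enrichment_score` and `enrichment_matrix`, the `weight` parameter, and the assumption that expression values are already centered per gene are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_enrichment_score(expr, gene_set_mask, weight=1.0):
    """Running-sum enrichment score for ONE sample (illustrative, not ASSESS).

    expr          : 1-D array of per-gene values for a single sample
                    (assumed already centered/scaled per gene).
    gene_set_mask : 1-D boolean array, True where the gene is in the set.
    weight        : exponent applied to |expr| for genes in the set.
    """
    expr = np.asarray(expr, dtype=float)
    mask = np.asarray(gene_set_mask, dtype=bool)

    # Rank genes from highest to lowest value within this sample.
    order = np.argsort(-expr)
    in_set = mask[order]
    magnitudes = np.abs(expr[order]) ** weight

    n_hit = int(in_set.sum())
    n_miss = in_set.size - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must be a non-empty proper subset of the genes")

    # Genes in the set push the walk up in proportion to their magnitude;
    # genes outside the set push it down by a constant amount.
    hit_weights = np.where(in_set, magnitudes, 0.0)
    total = hit_weights.sum()
    if total == 0.0:  # all set genes have value 0: fall back to equal weights
        hit_weights = in_set.astype(float)
        total = float(n_hit)
    steps = hit_weights / total - (~in_set) / n_miss

    running = np.cumsum(steps)
    # The score is the signed maximum deviation of the walk from zero.
    return float(running[np.argmax(np.abs(running))])


def enrichment_matrix(X, gene_sets):
    """Map a genes-by-samples matrix to a gene-sets-by-samples score matrix.

    gene_sets : dict mapping a set name to a boolean mask over the genes.
    """
    X = np.asarray(X, dtype=float)
    return np.array(
        [[sample_enrichment_score(X[:, j], m) for j in range(X.shape[1])]
         for m in gene_sets.values()]
    )
```

Because each column is scored on its own ranking, samples measured on different platforms could in principle be mapped into the same gene-set-by-sample space, which is the kind of representation the abstract describes for pathway-level classification and cross-platform integration.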
Pages: E108-E116
Number of pages: 9