Comparative study on gene set and pathway topology-based enrichment methods

Cited by: 56
Authors
Bayerlova, Michaela [1]
Jung, Klaus [1]
Kramer, Frank [1]
Klemm, Florian [2]
Bleckmann, Annalen [1, 2]
Beissbarth, Tim [1]
Affiliations
[1] Univ Med Ctr Gottingen, Dept Med Stat, D-37099 Gottingen, Germany
[2] Univ Med Ctr Gottingen, Dept Hematol & Med Oncol, D-37099 Gottingen, Germany
Keywords
Gene set analysis; Pathway topology; Enrichment methods; Simulations; Accuracy; Sensitivity; EXPRESSION ANALYSIS; ALZHEIMERS-DISEASE; SIGNALING PATHWAYS; COLORECTAL-CANCER; IDENTIFICATION; REGIONS
DOI
10.1186/s12859-015-0751-5
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
070307 [Chemical Biology]
Abstract
Background: Enrichment analysis is a popular approach for identifying pathways or sets of genes that are significantly enriched in the context of differentially expressed genes. The traditional gene set enrichment approach treats a pathway as a simple gene list, disregarding any knowledge of gene or protein interactions. In contrast, the newer group of so-called pathway topology-based methods integrates the topological structure of a pathway into the analysis.

Methods: We comparatively investigated gene set and pathway topology-based enrichment approaches, considering three gene set methods and four topological methods. These methods were compared in two extensive simulation studies and on a benchmark of 36 real datasets, with the same pathway input data provided to all methods.

Results: In the benchmark data analysis, both types of methods showed a comparable ability to detect enriched pathways. The first simulation study was conducted with the original KEGG pathways, which overlap considerably in their genes. In this study, none of the topology-based methods outperformed the gene set approach. Therefore, a second simulation study was performed on non-overlapping pathways created from unique gene IDs. Here, methods accounting for pathway topology reached higher accuracy than the gene set methods; however, their sensitivity was lower.

Conclusions: We conducted one of the first comprehensive comparative studies evaluating gene set against pathway topology-based enrichment methods. The topological methods performed better in the simulation scenarios with non-overlapping pathways; however, they were not conclusively better in the other scenarios. This suggests that a simple gene set approach might be sufficient to detect an enriched pathway under realistic circumstances. Nevertheless, more extensive studies and further benchmark data are needed to systematically evaluate these methods and to assess what gain and cost pathway topology information introduces into enrichment analysis. Both types of enrichment methods require further improvement to deal with the problem of pathway overlaps.
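As an illustration of what treating a pathway as a simple gene list means, the sketch below runs a generic over-representation test based on the hypergeometric distribution. The abstract does not name the three gene set methods that were compared, so this is not necessarily one of them; the function names and the toy gene IDs are hypothetical and chosen only for the example.

```python
from math import comb


def hypergeom_enrichment_pvalue(universe_size, pathway_size, de_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of pathway genes among de_size genes drawn without
    replacement from a universe of universe_size genes, of which
    pathway_size belong to the pathway.
    """
    max_overlap = min(pathway_size, de_size)
    total = comb(universe_size, de_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, de_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def pathway_enrichment(universe, pathway, de_genes):
    """Test one pathway, using only gene membership (no topology)."""
    universe = set(universe)
    pathway = set(pathway) & universe   # ignore genes that were not measured
    de_genes = set(de_genes) & universe
    return hypergeom_enrichment_pvalue(
        len(universe), len(pathway), len(de_genes), len(pathway & de_genes)
    )


# Toy data (assumed, not from the paper): 10 measured genes, a 3-gene pathway.
universe = [f"g{i}" for i in range(10)]
pathway = ["g0", "g1", "g2"]
p_all_hit = pathway_enrichment(universe, pathway, ["g0", "g1", "g2"])
p_none_hit = pathway_enrichment(universe, pathway, ["g7", "g8", "g9"])
```

Because the test uses only set membership, a gene shared by several pathways contributes to every one of them, which is one way to see why the pathway overlaps mentioned in the abstract matter for this family of methods.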
Pages: 15