Improved detection of overrepresentation of Gene-Ontology annotations with parent-child analysis

Cited by: 251
Authors
Grossmann, Steffen
Bauer, Sebastian
Robinson, Peter N.
Vingron, Martin
Affiliations
[1] Univ Med Charite, Inst Med Genet, D-13353 Berlin, Germany
[2] Max Planck Inst Mol Genet, D-14195 Berlin, Germany
DOI
10.1093/bioinformatics/btm440
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: High-throughput experiments such as microarray hybridizations often yield long lists of genes found to share a certain characteristic such as differential expression. Exploring Gene Ontology (GO) annotations for such lists of genes has become a widespread practice for gaining first insights into the potential biological meaning of the experiment. The standard statistical approach to measuring overrepresentation of GO terms cannot cope with the dependencies resulting from the structure of GO because it analyzes each term in isolation. In particular, the fact that annotations are inherited from more specific descendant terms can produce certain types of false-positive results with potentially misleading biological interpretation, a phenomenon which we term the inheritance problem.
Results: We present here a novel approach to the analysis of GO term overrepresentation that determines overrepresentation of terms in the context of annotations to the term's parents. This approach reduces the dependencies between the individual terms' measurements, and thereby avoids producing false-positive results owing to the inheritance problem. ROC analysis using study sets with overrepresented GO terms showed a clear advantage for our approach over the standard algorithm with respect to the inheritance problem. Although there can be no gold standard for exploratory methods such as analysis of GO term overrepresentation, analysis of biological datasets suggests that our algorithm tends to identify the core GO terms that are most characteristic of the dataset being analyzed.
Availability: The Ontologizer can be found at the project homepage http://www.charite.de/ch/medgen/ontologizer
Contact: peter.robinson@charite.de and vingron@molgen.mpg.de.
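The contrast drawn in the abstract between testing each term in isolation and testing it in the context of its parents can be illustrated with a small sketch. The abstract does not state the test statistic, so this example assumes a one-sided hypergeometric test in both cases and a single parent term. The function names (`hypergeom_tail`, `term_for_term_p`, `parent_child_p`) and the toy gene sets are hypothetical and are not taken from the Ontologizer.

```python
from math import comb


def hypergeom_tail(pop_size, pop_hits, sample_size, sample_hits):
    """P(X >= sample_hits) when drawing sample_size genes without replacement
    from pop_size genes, pop_hits of which carry the annotation."""
    upper = min(pop_hits, sample_size)
    total = comb(pop_size, sample_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, sample_size - k)
        for k in range(sample_hits, upper + 1)
    )
    return tail / total


def term_for_term_p(population, study, term_genes):
    """Assumed standard approach: the term is tested against the whole population."""
    return hypergeom_tail(
        len(population),
        len(term_genes & population),
        len(study),
        len(term_genes & study),
    )


def parent_child_p(population, study, term_genes, parent_genes):
    """Assumed parent-child variant: population and study set are first
    restricted to genes annotated to the parent term."""
    pop_ctx = population & parent_genes
    study_ctx = study & parent_genes
    return hypergeom_tail(
        len(pop_ctx),
        len(term_genes & pop_ctx),
        len(study_ctx),
        len(term_genes & study_ctx),
    )


# Toy data (hypothetical): 100 genes, a parent term with 20 genes,
# and a child term whose 10 genes are inherited by the parent.
population = set(range(100))
parent_genes = set(range(20))
child_genes = set(range(10))
# Study set enriched for the parent, but not for the child within the parent.
study = {0, 1, 2, 3, 10, 11, 12, 13, 50, 51}

p_tft = term_for_term_p(population, study, child_genes)
p_pc = parent_child_p(population, study, child_genes, parent_genes)
print(f"term-for-term p = {p_tft:.4f}, parent-child p = {p_pc:.4f}")
```

In this toy setting the child term looks overrepresented when tested against the whole population, because the study set is enriched for the parent. Once the comparison is restricted to genes annotated to the parent, the child no longer stands out, which is the kind of inheritance effect the abstract describes.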
Pages: 3024-3031
Number of pages: 8
Related papers
14 in total
  • [1] Improved scoring of functional groups from gene expression data by decorrelating GO graph structure
    Alexa, Adrian
    Rahnenfuehrer, Joerg
    Lengauer, Thomas
    [J]. BIOINFORMATICS, 2006, 22 (13) : 1600 - 1607
  • [2] [Anonymous], 1993, Resampling-based multiple testing: Examples and methods for P-value adjustment
  • [3] Gene Ontology: tool for the unification of biology
    Ashburner, M
    Ball, CA
    Blake, JA
    Botstein, D
    Butler, H
    Cherry, JM
    Davis, AP
    Dolinski, K
    Dwight, SS
    Eppig, JT
    Harris, MA
    Hill, DP
    Issel-Tarver, L
    Kasarskis, A
    Lewis, S
    Matese, JC
    Richardson, JE
    Ringwald, M
    Rubin, GM
    Sherlock, G
    [J]. NATURE GENETICS, 2000, 25 (01) : 25 - 29
  • [4] The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology
    Camon, E
    Magrane, M
    Barrell, D
    Lee, V
    Dimmer, E
    Maslen, J
    Binns, D
    Harte, N
    Lopez, R
    Apweiler, R
    [J]. NUCLEIC ACIDS RESEARCH, 2004, 32 : D262 - D266
  • [5] CAMON E, 2004, SILICO BIOL, V4, P1386
  • [6] Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO)
    Dwight, SS
    Harris, MA
    Dolinski, K
    Ball, CA
    Binkley, G
    Christie, KR
    Fisk, DG
    Issel-Tarver, L
    Schroeder, M
    Sherlock, G
    Sethuraman, A
    Weng, S
    Botstein, D
    Cherry, JM
    [J]. NUCLEIC ACIDS RESEARCH, 2002, 30 (01) : 69 - 72
  • [7] Using GOstats to test gene lists for GO term association
    Falcon, S.
    Gentleman, R.
    [J]. BIOINFORMATICS, 2007, 23 (02) : 257 - 258
  • [8] Bioconductor: open software development for computational biology and bioinformatics
    Gentleman, RC
    Carey, VJ
    Bates, DM
    Bolstad, B
    Dettling, M
    Dudoit, S
    Ellis, B
    Gautier, L
    Ge, YC
    Gentry, J
    Hornik, K
    Hothorn, T
    Huber, W
    Iacus, S
    Irizarry, R
    Leisch, F
    Li, C
    Maechler, M
    Rossini, AJ
    Sawitzki, G
    Smith, C
    Smyth, G
    Tierney, L
    Yang, JYH
    Zhang, JH
    [J]. GENOME BIOLOGY, 2004, 5 (10)
  • [9] Grossmann S, 2006, LECT NOTES COMPUT SC, V3909, P85
  • [10] The Gene Ontology (GO) project in 2006
    Harris, Midori A.
    Clark, Jennifer I.
    Ireland, Amelia
    Lomax, Jane
    Ashburner, Michael
    Collins, Russell
    Eilbeck, Karen
    Lewis, Suzanna
    Mungall, Chris
    Richter, John
    Rubin, Gerald M.
    Shu, ShengQiang
    Blake, Judith A.
    Bult, Carol J.
    Diehl, Alexander D.
    Dolan, Mary E.
    Drabkin, Harold J.
    Eppig, Janan T.
    Hill, David P.
    Ni, Li
    Ringwald, Martin
    Balakrishnan, Rama
    Binkley, Gail
    Cherry, J. Michael
    Christie, Karen R.
    Costanzo, Maria C.
    Dong, Qing
    Engel, Stacia R.
    Fisk, Dianna G.
    Hirschman, Jodi E.
    Hitz, Benjamin C.
    Hong, Eurie L.
    Lane, Christopher
    Miyasato, Stuart
    Nash, Robert
    Sethuraman, Anand
    Skrzypek, Marek
    Theesfeld, Chandra L.
    Weng, Shuai
    Botstein, David
    Dolinski, Kara
    Oughtred, Rose
    Berardini, Tanya
    Mundodi, Suparna
    Rhee, Seung Y.
    Apweiler, Rolf
    Barrell, Daniel
    Camon, Evelyn
    Dimmer, Emily
    Mulder, Nicola
    [J]. NUCLEIC ACIDS RESEARCH, 2006, 34 : D322 - D326