GONOME: measuring correlations between GO terms and genomic positions

Cited by: 11
Authors
Stanley, SM [1]
Bailey, TL [1]
Mattick, JS [1]
Affiliations
[1] Univ Queensland, Inst Mol Biosci, Brisbane, Qld 4072, Australia
DOI
10.1186/1471-2105-7-94
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Current methods for finding significantly under- and over-represented Gene Ontology (GO) terms in a set of genes treat the genes as equally probable "balls in a bag", which may be appropriate for transcripts in microarray data. However, because genes and intergenic regions vary in length, that approach is inappropriate for deciding whether any GO terms are correlated with a set of genomic positions.
Results: We present an algorithm, GONOME, that determines which GO terms are significantly associated with a set of genomic positions, given a genome annotated with (at least) the starts and ends of genes. We show that certain GO terms may appear to be significantly associated with a set of randomly chosen positions in the human genome if gene lengths are not considered, and that these same terms have been reported as significantly over-represented in a number of recent papers. This apparent over-representation disappears when gene lengths are considered, as they are in GONOME. For example, we show that, when gene length is taken into account, the term "development" is not significantly enriched in genes associated with human CpG islands, contradicting a previous report. We further demonstrate the efficacy of GONOME by showing that occurrences of the proteasome-associated control element (PACE) upstream activating sequence in the S. cerevisiae genome are significantly associated with appropriate GO terms. An extension of this approach yields a whole-genome motif discovery algorithm that identifies many other promoter sequences linked to different types of genes, including a large group of previously unknown motifs significantly associated with the terms "translation" and "translational elongation".
Conclusion: GONOME is an algorithm that correctly extracts over-represented GO terms from a set of genomic positions. By explicitly considering gene size, GONOME avoids a systematic bias toward GO terms linked to large genes. Inappropriate use of existing algorithms that do not take gene size into account has led to erroneous or suspect conclusions. Conversely, GONOME may be used to identify new features in genomes that are significantly associated with particular categories of genes.
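The length bias described in the abstract can be illustrated with a small sketch. This is not the GONOME algorithm itself, whose details are not given in this record. It only contrasts two null models for a one-sided binomial test: one in which every gene is equally likely to be hit ("balls in a bag"), and one in which a gene is hit in proportion to the length of sequence assigned to it. The gene names, territory lengths, hit counts, and helper functions below are all hypothetical, and the binomial model is an assumption made for illustration.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical toy annotation: (gene name, territory length in bp, carries the GO term?)
GENES = [
    ("geneA", 7000, True),   # one long gene annotated with the term
    ("geneB", 1000, False),
    ("geneC", 1000, False),
    ("geneD", 1000, False),
]


def null_probability(genes, by_length: bool) -> float:
    """Chance that one random hit lands in a gene carrying the term.

    by_length=False: every gene is equally likely (count-based null).
    by_length=True:  genes are weighted by territory length (length-based null).
    """
    weights = [(length if by_length else 1, has_term) for _, length, has_term in genes]
    total = sum(w for w, _ in weights)
    return sum(w for w, has_term in weights if has_term) / total


n_positions = 10  # assumed number of query positions
n_hits = 7        # assumed number that fall in genes carrying the term

p_count = null_probability(GENES, by_length=False)   # 1 of 4 genes -> 0.25
p_length = null_probability(GENES, by_length=True)   # 7000 of 10000 bp -> 0.7

pval_count = binomial_upper_tail(n_hits, n_positions, p_count)
pval_length = binomial_upper_tail(n_hits, n_positions, p_length)

print(f"count-based null:  p = {pval_count:.4f}")
print(f"length-based null: p = {pval_length:.4f}")
```

In this toy setting the same seven hits look strongly enriched under the count-based null but are unremarkable once the long gene's share of the sequence is taken into account, which is the kind of artifact the abstract attributes to ignoring gene size.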
Pages: 11