Integrating gene expression and GO classification for PCA by preclustering

Cited by: 13
Authors
De Haan, Jorn R. [1]
Piek, Ester [2]
van Schaik, Rene C. [3]
de Vlieg, Jacob [3, 4]
Bauerschmidt, Susanne [3]
Buydens, Lutgarde M. C. [1]
Wehrens, Ron [1]
Affiliations
[1] Radboud Univ Nijmegen, Inst Mol & Mat, NL-6525 AJ Nijmegen, Netherlands
[2] Radboud Univ Nijmegen, Dept Appl Biol, Fac Sci, NL-6525 AJ Nijmegen, Netherlands
[3] MSD, NL-5340 BH Oss, Netherlands
[4] Radboud Univ Nijmegen, Ctr Mol & Biomol Informat, Nijmegen Ctr Mol Life Sci, NL-6525 GA Nijmegen, Netherlands
Keywords
MICROARRAY DATA; OLIGONUCLEOTIDE ARRAYS; PATTERNS; MODEL; ARCHITECTURE; PROFILES; DISPLAY; BINDING
DOI
10.1186/1471-2105-11-158
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
070307 [Chemical Biology]
Abstract
Background: Gene expression data can be analyzed by summarizing groups of individual gene expression profiles based on GO annotation information. The mean expression profile per group can then be used to identify interesting GO categories in relation to the experimental settings. However, the expression profiles present in GO classes are often heterogeneous, i.e., there are several different expression profiles within one class. As a result, important experimental findings can be obscured because the summarizing profile does not seem to be of interest. We propose to tackle this problem by finding homogeneous subclasses within GO categories, an approach we call preclustering.
Results: Two microarray datasets are analyzed. First, a selection of genes from a well-known Saccharomyces cerevisiae dataset is used, with the GO class "cell wall organization and biogenesis" shown as a specific example. After preclustering, this term can be associated with different phases of the cell cycle, whereas previously it could not be associated with any specific phase. Second, a dataset on the differentiation of human Mesenchymal Stem Cells (MSC) into osteoblasts is used. Here, the GO term "skeletal development" serves as a specific example of a heterogeneous GO class for which better associations can be made after preclustering. The Intra Cluster Correlation (ICC), a measure of cluster tightness, is applied to identify relevant clusters.
Conclusions: We show that this method leads to improved interpretability of results in Principal Component Analysis.
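The preclustering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the gene names and expression values are invented, the average-linkage merging rule and the `min_corr` threshold are assumptions, and `tightness` (mean pairwise Pearson correlation) is only a stand-in for the ICC, whose exact definition the abstract does not give.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def precluster(profiles, min_corr=0.5):
    """Split one GO class into subclasses of similar profiles.

    Assumed rule: average-linkage agglomerative merging, stopping when the
    best average between-cluster correlation drops below min_corr.
    """
    clusters = [[gene] for gene in profiles]

    def avg_corr(a, b):
        return mean(pearson(profiles[g], profiles[h]) for g in a for h in b)

    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_corr(clusters[p[0]], clusters[p[1]]),
        )
        if avg_corr(clusters[i], clusters[j]) < min_corr:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters


def mean_profile(profiles, genes):
    """Summarizing profile: mean expression per condition over the genes."""
    return [mean(values) for values in zip(*(profiles[g] for g in genes))]


def tightness(profiles, genes):
    """Mean pairwise correlation; a hypothetical stand-in for the ICC."""
    if len(genes) < 2:
        return 1.0
    return mean(pearson(profiles[g], profiles[h])
                for g, h in combinations(genes, 2))


# Invented GO class: two genes rise over four conditions, two genes fall.
go_class = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [4.1, 2.8, 2.2, 0.9],
}

whole_class = list(go_class)
subclasses = precluster(go_class)

print("whole class mean profile:", mean_profile(go_class, whole_class))
print("whole class tightness:", tightness(go_class, whole_class))
for genes in subclasses:
    print(genes, mean_profile(go_class, genes), tightness(go_class, genes))
```

In this toy class the opposing trends cancel, so the single mean profile is nearly flat and looks uninteresting, while each subclass keeps a clear trend and a high tightness score. The subclass mean profiles, rather than the per-class mean, would then be the natural input to a PCA.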
Pages: 10