Effective dimensionality of large-scale expression data using principal component analysis

Cited by: 12
Authors
Hörnquist, M
Hertz, J
Wahde, M
Affiliations
[1] NORDITA, DK-2100 Copenhagen, Denmark
[2] Chalmers Univ Technol, Div Mechatron, SE-41296 Gothenburg, Sweden
Keywords
genetic regulatory network; gene regulation; principal component analysis; data reduction; noise effects;
DOI
10.1016/S0303-2647(02)00011-4
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Large-scale expression data are today measured for thousands of genes simultaneously. This development has been followed by an exploration of theoretical tools for extracting as much information from these data as possible. One approach is to try to extract the underlying regulatory network. The models used thus far, however, contain many parameters, and a careful investigation is necessary in order not to over-fit the models. We employ principal component analysis to show how, in the context of linear additive models, one can get a rough estimate of the effective dimensionality (the number of information-carrying dimensions) of large-scale gene expression datasets. We treat both the lack of independence of different measurements in a time series and the fact that measurements are subject to some level of noise, both of which reduce the effective dimensionality and thereby constrain the complexity of models which can be built from the data. (C) 2002 Elsevier Science Ireland Ltd. All rights reserved.
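The idea of counting information-carrying dimensions can be illustrated with a minimal sketch. This is not the procedure from the paper, which this record does not reproduce. It assumes a genes-by-measurements matrix, a known noise standard deviation, and a heuristic cut-off based on the fact that an n-by-m matrix of independent Gaussian noise with standard deviation sigma has a largest singular value of roughly sigma * (sqrt(n) + sqrt(m)). The function name `effective_dimensionality` and the `safety` factor are hypothetical, and the sketch does not address the lack of independence between time-series measurements.

```python
import numpy as np


def effective_dimensionality(X, noise_sigma, safety=1.2):
    """Count principal components whose singular value exceeds a noise floor.

    X           : array of shape (n_genes, n_measurements)
    noise_sigma : assumed standard deviation of the measurement noise
    safety      : hypothetical margin applied to the heuristic noise floor
    """
    X = np.asarray(X, dtype=float)
    # Center each gene's profile across the measurements.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Singular values of the centered data, in descending order.
    s = np.linalg.svd(Xc, compute_uv=False)
    n, m = Xc.shape
    # Heuristic noise floor: approximate largest singular value of pure
    # Gaussian noise of this shape, inflated by the safety margin.
    threshold = safety * noise_sigma * (np.sqrt(n) + np.sqrt(m))
    return int(np.sum(s > threshold)), s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: three underlying components plus measurement noise.
    signal = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 20))
    noisy = signal + 0.1 * rng.standard_normal((200, 20))
    k, singular_values = effective_dimensionality(noisy, noise_sigma=0.1)
    print("estimated effective dimensionality:", k)
```

A larger assumed noise level raises the threshold and lowers the estimate, which matches the abstract's point that noise reduces the number of dimensions a model can rely on.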
Pages: 147-156
Number of pages: 10