Effective dimensionality for principal component analysis of time series expression data

Cited by: 8
Authors
Hörnquist, M
Hertz, J
Wahde, M
Affiliations
[1] Linkoping Univ, Dept Sci & Technol, SE-60174 Norrkoping, Sweden
[2] NORDITA, DK-2100 Copenhagen, Denmark
[3] Chalmers Univ Technol, Div Mechatron, S-41296 Gothenburg, Sweden
Keywords
PCA; dimensionality; expression data; noise effects;
DOI
10.1016/S0303-2647(03)00128-X
Chinese Library Classification (CLC) number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Large-scale expression data are today measured for thousands of genes simultaneously. This development has been followed by an exploration of theoretical tools to get as much information out of these data as possible. Several groups have used principal component analysis (PCA) for this task. However, since this approach is data-driven, care must be taken in order not to analyze the noise instead of the data. As a strong warning against uncritical use of the output from a PCA, we employ a newly developed procedure to judge the effective dimensionality of a specific data set. Although this data set was obtained during the development of the rat central nervous system, our finding is a general property of noisy time series data. Based on knowledge of the noise level for the data, we find that the effective number of dimensions that are meaningful to use in a PCA is much lower than what could be expected from the number of measurements. We attribute this fact both to effects of noise and to the lack of independence of the expression levels. Finally, we explore the possibility of increasing the dimensionality by performing more measurements within one time series, and conclude that this is not a fruitful approach. © 2003 Elsevier Ireland Ltd. All rights reserved.
Pages: 311-317
Number of pages: 7