A multi-step approach to time series analysis and gene expression clustering

Cited by: 34
Authors
Amato, R
Ciaramella, A
Deniskina, N
Del Mondo, C
di Bernardo, D
Donalek, C
Longo, G
Mangano, G
Miele, G
Raiconi, G
Staiano, A
Tagliaferri, R [1]
Affiliations
[1] Univ Salerno, Dipartimento Matemat & Informat, I-84100 Salerno, Italy
[2] Univ Naples Federico II, Dipartimento Sci Fis, Naples, Italy
[3] Russian Acad Sci, Inst Informat Transmiss Problems, Moscow 101447, Russia
[4] Telethon Inst Genet & Med, Naples, Italy
[5] CALTECH, Dept Astron, Pasadena, CA 91125 USA
[6] Ist Nazl Fis Nucl, I-80125 Naples, Italy
[7] INAF, Ist Nazl Astrofis, Sez Napoli, Naples, Italy
[8] Syracuse Univ, Dept Phys, Syracuse, NY USA
DOI
10.1093/bioinformatics/btk026
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The huge growth in gene expression data calls for the implementation of automatic tools for data processing and interpretation.
Results: We present a new and comprehensive machine learning data mining framework consisting of a non-linear PCA neural network for feature extraction, and probabilistic principal surfaces combined with an agglomerative approach based on negentropy, aimed at clustering gene microarray data. The method, which provides a user-friendly visualization interface, can work on noisy data with missing points and represents an automatic procedure for determining, with no a priori assumptions, the number of clusters present in the data. Application to a cell-cycle dataset and a detailed analysis confirm the biological nature of the most significant clusters.
Availability: The software described here is a subpackage of the ASTRONEURAL package and is available upon request from the corresponding author.
Contact: robtag@unisa.it
Supplementary information: Supplementary data are available at Bioinformatics online.
Pages: 589-596
Number of pages: 8