Biclustering algorithms for biological data analysis: A survey

Cited by: 1209
Authors
Madeira, SC
Oliveira, AL
Affiliations
[1] Univ Beira Interior, P-6200001 Covilha, Portugal
[2] INESC, ID, Lisbon, Portugal
[3] Univ Tecn Lisboa, Inst Super Tecn, P-1000029 Lisbon, Portugal
Keywords
biclustering; simultaneous clustering; coclustering; subspace clustering; bidimensional clustering; direct clustering; block clustering; two-way clustering; two-mode clustering; two-sided clustering; microarray data analysis; biological data analysis; gene expression data;
DOI
10.1109/TCBB.2004.2
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
A large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the results from the application of standard clustering methods to genes are limited. This limitation is imposed by the existence of a number of experimental conditions where the activity of genes is uncorrelated. A similar limitation exists when clustering of conditions is performed. For this reason, a number of algorithms that perform simultaneous clustering on the row and column dimensions of the data matrix have been proposed. The goal is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this paper, we refer to this class of algorithms as biclustering. Biclustering is also referred to in the literature as coclustering and direct clustering, among other names, and has also been used in fields such as information retrieval and data mining. In this comprehensive survey, we analyze a large number of existing approaches to biclustering, and classify them in accordance with the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.
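As a rough illustration of the "submatrix" idea in the abstract, the sketch below picks a subset of genes (rows) and conditions (columns) from a toy expression matrix and scores how coherent that submatrix is. The score used here is the mean squared residue, one coherence measure from the biclustering literature (introduced by Cheng and Church); it is chosen only as an example and is not presented as the survey's own method. The toy data and the function names `submatrix` and `mean_squared_residue` are assumptions made for this sketch.

```python
def submatrix(data, rows, cols):
    """Return the block of `data` restricted to the given row and column indices."""
    return [[data[i][j] for j in cols] for i in rows]


def mean_squared_residue(block):
    """Mean squared residue of a block: 0 means a perfectly additive pattern."""
    n_rows, n_cols = len(block), len(block[0])
    row_means = [sum(row) / n_cols for row in block]
    col_means = [sum(block[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = block[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy expression matrix (hypothetical values): rows are genes, columns are conditions.
data = [
    [1.0, 9.0, 2.0, 4.0],
    [2.0, 3.0, 3.0, 8.0],
    [4.0, 7.0, 5.0, 1.0],
    [6.0, 2.0, 0.0, 5.0],
]

# Candidate bicluster: genes 0-2 under conditions 0 and 2 only.
bicluster = submatrix(data, rows=[0, 1, 2], cols=[0, 2])

bicluster_score = mean_squared_residue(bicluster)
full_score = mean_squared_residue(data)
print(bicluster_score, full_score)
```

In this toy matrix the three selected genes rise and fall together under the two selected conditions, so the submatrix scores much lower (more coherent) than the full matrix, where the remaining conditions are uncorrelated. Clustering whole rows would miss this pattern, which is the limitation the abstract describes.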
Pages: 24-45
Number of pages: 22