Machine learning in bioinformatics: A brief survey and recommendations for practitioners

Cited by: 71
Authors
Bhaskar, Harish [1 ]
Hoyle, David C. [1 ]
Singh, Sameer [1 ]
Affiliations
[1] Univ Exeter, Sch Engn Comp Sci & Math, Exeter EX4 4QF, Devon, England
Keywords
DOI
10.1016/j.compbiomed.2005.09.002
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
Machine learning is used in a large number of bioinformatics applications and studies. The application of machine learning techniques in other areas such as pattern recognition has resulted in accumulated experience as to correct and principled approaches for their use. The aim of this paper is to give an account of issues affecting the application of machine learning tools, focusing primarily on general aspects of feature and model parameter selection, rather than any single specific algorithm. These aspects are discussed in the context of published bioinformatics studies in leading journals over the last 5 years. We assess to what degree the experience gained by the pattern recognition research community pervades these bioinformatics studies. We finally discuss various critical issues relating to bioinformatic data sets and make a number of recommendations on the proper use of machine learning techniques for bioinformatics research based upon previously published research on machine learning. (c) 2005 Elsevier Ltd. All rights reserved.
Pages: 1104-1125
Page count: 22
Related papers
82 items in total
[1]   Adcock CJ, 1997, J ROY STAT SOC D-STA, V46, P261
[2]   NETASA: neural network based prediction of solvent accessibility [J].
Ahmad, S ;
Gromiha, MM .
BIOINFORMATICS, 2002, 18 (06) :819-824
[3]   Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling [J].
Alizadeh, AA ;
Eisen, MB ;
Davis, RE ;
Ma, C ;
Lossos, IS ;
Rosenwald, A ;
Boldrick, JG ;
Sabet, H ;
Tran, T ;
Yu, X ;
Powell, JI ;
Yang, LM ;
Marti, GE ;
Moore, T ;
Hudson, J ;
Lu, LS ;
Lewis, DB ;
Tibshirani, R ;
Sherlock, G ;
Chan, WC ;
Greiner, TC ;
Weisenburger, DD ;
Armitage, JO ;
Warnke, R ;
Levy, R ;
Wilson, W ;
Grever, MR ;
Byrd, JC ;
Botstein, D ;
Brown, PO ;
Staudt, LM .
NATURE, 2000, 403 (6769) :503-511
[4]   Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J].
Alon, U ;
Barkai, N ;
Notterman, DA ;
Gish, K ;
Ybarra, S ;
Mack, D ;
Levine, AJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (12) :6745-6750
[5]   [Anonymous], 2001, Bioinformatics
[6]   MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia [J].
Armstrong, SA ;
Staunton, JE ;
Silverman, LB ;
Pieters, R ;
de Boer, ML ;
Minden, MD ;
Sallan, SE ;
Lander, ES ;
Golub, TR ;
Korsmeyer, SJ .
NATURE GENETICS, 2002, 30 (01) :41-47
[7]   DISTANCE BETWEEN POPULATIONS ON BASIS OF ATTRIBUTE DATA [J].
BALAKRISHNAN, V ;
SANGHVI, LD .
BIOMETRICS, 1968, 24 (04) :859-+
[8]   Baldi P., 2001, Bioinformatics: the machine learning approach
[9]   Bishop C. M., 1996, Neural networks for pattern recognition
[10]   A neural network classifier capable of recognizing the patterns of all major subcellular structures in fluorescence microscope images of HeLa cells [J].
Boland, MV ;
Murphy, RF .
BIOINFORMATICS, 2001, 17 (12) :1213-1223