Machine learning of functional class from phenotype data

Cited by: 38
Authors
Clare, A [1]
King, RD [1]
Affiliations
[1] Univ Wales, Dept Comp Sci, Aberystwyth SY23 3DB, Dyfed, Wales
Funding
UK Medical Research Council
DOI
10.1093/bioinformatics/18.1.160
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Mutant phenotype growth experiments are an important novel source of functional genomics data which have received little attention in bioinformatics. We applied supervised machine learning to the problem of using phenotype data to predict the functional class of Open Reading Frames (ORFs) in Saccharomyces cerevisiae. Three sources of data were used: TRansposon-Insertion Phenotypes, Localization and Expression in Saccharomyces (TRIPLES), European Functional Analysis Network (EUROFAN) and Munich Information Center for Protein Sequences (MIPS). The analysis of the data presented a number of challenges to machine learning: multi-class labels, a large number of sparsely populated classes, the need to learn a set of accurate rules (not a complete classification), and a very large number of missing values. We modified the algorithm C4.5 to deal with these problems. Results: Rules were learnt which are accurate and biologically meaningful. The rules predict the function of 83 ORFs of unknown function at an estimated accuracy of ≥ 80%. Availability: The data and complete results are available at http://users.aber.ac.uk/ajc99/phenotype/.
Pages: 160-166
Page count: 7