Functional bioinformatics for Arabidopsis thaliana

Cited by: 20
Authors
Clare, A [1]
Karwath, A
Ougham, H
King, RD
Affiliations
[1] Univ Wales, Dept Comp Sci, Aberystwyth SY23 3DB, Dyfed, Wales
[2] Univ Freiburg, Inst Comp Sci, D-79110 Freiburg, Germany
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC)
DOI
10.1093/bioinformatics/btl051
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: Arabidopsis thaliana has the best understood plant genome, yet approximately one-third of its genes still have no functional annotation at all from either MIPS or TAIR. We have applied our Data Mining Prediction (DMP) method to the problem of predicting the functional classes of these protein sequences. The method uses a hybrid machine-learning/data-mining approach to identify patterns in the bioinformatic data about sequences that are predictive of function. We use data about sequence, predicted secondary structure, predicted structural domain, InterPro patterns, sequence similarity profile and expression data.
Results: We predicted the functional class of a high percentage of the Arabidopsis genes with currently unknown function. These predictions are interpretable and have good test accuracies. We describe in detail seven of the rules produced.
Pages: 1130-1136
Page count: 7
Related papers
28 in total
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   Gene Ontology: tool for the unification of biology [J].
Ashburner, M ;
Ball, CA ;
Blake, JA ;
Botstein, D ;
Butler, H ;
Cherry, JM ;
Davis, AP ;
Dolinski, K ;
Dwight, SS ;
Eppig, JT ;
Harris, MA ;
Hill, DP ;
Issel-Tarver, L ;
Kasarskis, A ;
Lewis, S ;
Matese, JC ;
Richardson, JE ;
Ringwald, M ;
Rubin, GM ;
Sherlock, G .
NATURE GENETICS, 2000, 25 (01) :25-29
[3]   The quest to deduce protein function from sequence: the role of pattern databases [J].
Attwood, TK .
INTERNATIONAL JOURNAL OF BIOCHEMISTRY & CELL BIOLOGY, 2000, 32 (02) :139-155
[4]   Predicting gene function in Saccharomyces cerevisiae [J].
Clare, A. ;
King, R. D. .
BIOINFORMATICS, 2003, 19 :II42-II49
[5]   Machine learning of functional class from phenotype data [J].
Clare, A ;
King, RD .
BIOINFORMATICS, 2002, 18 (01) :160-166
[6]  
CLARE A, 2003, LECT NOTES COMPUTER, V2562
[7]  
Dzeroski Saso, 2001, RELATIONAL DATA MINI
[8]   SEQUENCE SIMILARITY OF PUTATIVE TRANSPOSASES LINKS THE MAIZE MUTATOR AUTONOMOUS ELEMENT AND A GROUP OF BACTERIAL INSERTION SEQUENCES [J].
EISEN, JA ;
BENITO, MI ;
WALBOT, V .
NUCLEIC ACIDS RESEARCH, 1994, 22 (13) :2634-2636
[9]   Functional and structural genomics using PEDANT [J].
Frishman, D ;
Albermann, K ;
Hani, J ;
Heumann, K ;
Metanomski, A ;
Zollner, A ;
Mewes, HW .
BIOINFORMATICS, 2001, 17 (01) :44-57
[10]   Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure [J].
Gough, J ;
Karplus, K ;
Hughey, R ;
Chothia, C .
JOURNAL OF MOLECULAR BIOLOGY, 2001, 313 (04) :903-919