Functional annotation of the Arabidopsis genome using controlled vocabularies

Cited by: 331
Authors
Berardini, TZ
Mundodi, S [1]
Reiser, L
Huala, E
Garcia-Hernandez, M
Zhang, PF
Mueller, LA
Yoon, J
Doyle, A
Lander, G
Moseyko, N
Yoo, D
Xu, I
Zoeckler, B
Montoya, M
Miller, N
Weems, D
Rhee, SY
Affiliations
[1] Carnegie Inst, Dept Plant Biol, Stanford, CA 94305 USA
[2] Natl Ctr Human Genome Resources, Santa Fe, NM 87505 USA
DOI
10.1104/pp.104.040071
Chinese Library Classification number
Q94 [Botany]
Discipline classification code
071001
Abstract
Controlled vocabularies are increasingly used by databases to describe genes and gene products because they facilitate identification of similar genes within an organism or among different organisms. One of The Arabidopsis Information Resource's goals is to associate all Arabidopsis genes with terms developed by the Gene Ontology Consortium that describe the molecular function, biological process, and subcellular location of a gene product. We have also developed terms describing Arabidopsis anatomy and developmental stages and use these to annotate published gene expression data. As of March 2004, we used computational and manual annotation methods to make 85,666 annotations representing 26,624 unique loci. We focus on associating genes to controlled vocabulary terms based on experimental data from the literature and use The Arabidopsis Information Resource-developed PubSearch software to facilitate this process. Each annotation is tagged with a combination of evidence codes, evidence descriptions, and references that provide a robust means to assess data quality. Annotation of all Arabidopsis genes will allow quantitative comparisons between sets of genes derived from sources such as microarray experiments. The Arabidopsis annotation data will also facilitate annotation of newly sequenced plant genomes by using sequence similarity to transfer annotations to homologous genes. In addition, complete and up-to-date annotations will make unknown genes easy to identify and target for experimentation. Here, we describe the process of Arabidopsis functional annotation using a variety of data sources and illustrate several ways in which this information can be accessed and used to infer knowledge about Arabidopsis and other plant species.
Pages: 745-755
Page count: 11