Surviving in a sea of data: a survey of plant genome data resources and issues in building data management systems

Cited by: 13
Authors
Reiser, L [1]
Mueller, LA [1]
Rhee, SY [1]
Affiliations
[1] Carnegie Inst Washington, Dept Plant Biol, Stanford, CA 94305 USA
Funding
National Science Foundation (USA)
Keywords
controlled vocabulary; databases; data management; genomics; information systems; nomenclature
DOI
10.1023/A:1013726308559
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Exponential growth of data, largely from whole-genome analyses, has changed the way biologists think about and handle data. Optimal use of these data requires effective methods to analyze and manage these data sets. Computers, software and the World Wide Web are now integral components of biological discovery. Understanding how information is obtained, processed and annotated in public databases allows researchers to effectively organize, analyze and export their own data into these databases. In this review we focus largely on two areas related to management of genomic data. We cite examples of resources available in the public domain and describe some of the software for data management systems currently available for plant research. In addition, we discuss a few concepts of data management from the perspective of an individual or group that wishes to provide data to the public databases, to use the information in the public databases more efficiently, or to develop a database to manage large data sets internally or for public access. These concepts include data descriptions, exchange format, curation, attribution, and database implementation.
Pages: 59-74
Page count: 16