GAIA: Framework annotation of genomic sequence

Times cited: 31
Authors
Bailey, LC [1]
Fischer, S [1]
Schug, J [1]
Crabtree, J [1]
Gibson, M [1]
Overton, GC [1]
Affiliation
[1] Univ Penn, Sch Med, Dept Genet, Computat Biol & Informat Lab, Philadelphia, PA 19104 USA
DOI
10.1101/gr.8.3.234
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
As increasing amounts of genomic sequence from many organisms become available, and as DNA sequences become a primary reagent in biologic investigations, the role of annotation as a prospective guide for laboratory experiments will expand rapidly. Here we describe a process of high-throughput, reliable annotation, called Framework annotation, which is designed to provide a foundation for initial biologic characterization of previously unexamined sequence. To examine this concept in practice, we have constructed Genome Annotation and Information Analysis (GAIA), a prototype software architecture that implements several elements important for framework annotation. The center of GAIA consists of an annotation database and the associated data management subsystem that forms the software bus along which other components communicate. The schema for this database defines three principal concepts: (1) Entries, consisting of sequence and associated historical data; (2) Features, comprising information of biologic interest; and (3) Experiments, describing the evidence that supports Features. The database permits tracking of annotation results over time, as well as assessment of the reliability of particular results. New framework annotation is produced by CARTA, a set of autonomous sensors that perform automatic analyses and assert results into the annotation database. These results are available via a Web-based query interface that uses graphical Java applets as well as text-based HTML pages to display data at different levels of resolution and permit interactive exploration of annotation. We present results for initial application of framework annotation to a set of test sequences, demonstrating its effectiveness in providing a starting point for biologic investigation, and discuss ways in which the current prototype can be improved. The prototype is available for public use and comment at http://www.cbil.upenn.edu/gaia.
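The abstract names three schema concepts (Entries, Features, Experiments) and describes sensors that run analyses and assert their results into a shared database. The following is a minimal, hypothetical sketch of that shape as Python dataclasses with an in-memory store. Every class, field, and function name here (`AnnotationDB`, `assert_feature`, `motif_sensor`, and so on) is an assumption for illustration only. The record does not describe GAIA's actual schema or CARTA's sensors. The example sensor is likewise a stand-in: it runs an exact scan for the canonical polyadenylation signal AATAAA.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Entry:
    """Hypothetical Entry: a sequence plus its historical data (revision notes)."""
    accession: str
    sequence: str
    history: list[str] = field(default_factory=list)


@dataclass
class Experiment:
    """Hypothetical Experiment: the evidence behind a Feature (method, params, time)."""
    experiment_id: int
    method: str
    parameters: dict[str, str]
    run_at: datetime


@dataclass
class Feature:
    """Hypothetical Feature: a region of biologic interest linked to its evidence."""
    accession: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    kind: str
    evidence: list[int] = field(default_factory=list)  # Experiment ids


class AnnotationDB:
    """In-memory stand-in for the central annotation database."""

    def __init__(self) -> None:
        self.entries: dict[str, Entry] = {}
        self.experiments: dict[int, Experiment] = {}
        self.features: list[Feature] = []

    def add_entry(self, entry: Entry) -> None:
        self.entries[entry.accession] = entry

    def record_experiment(self, method: str, parameters: dict[str, str]) -> int:
        # Timestamping each Experiment is what lets results be tracked over time.
        exp_id = len(self.experiments) + 1
        self.experiments[exp_id] = Experiment(
            exp_id, method, parameters, datetime.now(timezone.utc)
        )
        return exp_id

    def assert_feature(self, feature: Feature) -> None:
        self.features.append(feature)

    def features_for(self, accession: str) -> list[Feature]:
        return [f for f in self.features if f.accession == accession]


def motif_sensor(db: AnnotationDB, accession: str, motif: str, kind: str) -> int:
    """Hypothetical autonomous sensor: scan one Entry for an exact motif and
    assert every hit as a Feature backed by a single Experiment record."""
    seq = db.entries[accession].sequence.upper()
    exp_id = db.record_experiment("exact_motif_scan", {"motif": motif})
    hits = 0
    pos = seq.find(motif)
    while pos != -1:
        db.assert_feature(Feature(accession, pos, pos + len(motif), kind, [exp_id]))
        hits += 1
        pos = seq.find(motif, pos + 1)  # step by one so overlapping hits are found
    return hits


if __name__ == "__main__":
    db = AnnotationDB()
    db.add_entry(Entry("TEST0001", "ccAATAAAggttAATAAAc", ["initial submission"]))
    n = motif_sensor(db, "TEST0001", "AATAAA", "polyA_signal")
    print(n, [(f.start, f.end) for f in db.features_for("TEST0001")])
    # prints: 2 [(2, 8), (12, 18)]
```

Keeping evidence as Experiment ids on each Feature, rather than copying analysis details into the Feature, mirrors the abstract's split between "what was found" and "what supports it". Under that split, a later sensor could add its own Experiment id to an existing Feature, and the count of independent supporting Experiments could serve as a rough reliability signal.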
Pages: 234-250
Page count: 17