The Merck Gene Index browser: an extensible data integration system for gene finding, gene characterization and EST data mining

Cited by: 23
Authors
Eckman, BA
Aaronson, JS
Borkowski, JA
Bailey, WJ
Elliston, KO
Williamson, AR
Blevins, RA
Affiliations
[1] Merck Res Labs, Dept Bioinformat, W Point, PA USA
[2] Merck Res Labs, Dept Immunol, Rahway, NJ USA
DOI
10.1093/bioinformatics/14.1.2
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Motivation: To make effective use of the vast amounts of expressed sequence tag (EST) sequence data generated by the Merck-sponsored EST project and other similar efforts, sequences must be organized into gene classes, and scientists must be able to 'mine' the gene class data in the context of related genomic data. Results: This paper presents the Merck Gene Index browser, an easily extensible, World Wide Web-based system for mining the Merck Gene Index (MGI) and related genomic data. The MGI is a non-redundant set of clones and sequences, each representing a distinct gene, constructed from all high-quality 3' EST sequences generated by the Merck-sponsored EST project. The MGI browser integrates data from a variety of sources and storage formats, both local and remote, using an eclectic integration strategy, including a federation of relational databases, a local data warehouse and simple hypertext links. Data currently integrated include: LENS cDNA clone and EST data, dbEST protein and non-EST nucleic acid similarity data, WashU sequence chromatograms, Entrez sequence and Medline entries, and UniGene gene clusters. Flatfile sequence data are accessed using the Bioapps server, an internally developed client-server system that supports generic sequence analysis applications. Browser data are retrieved and formatted by means of the Bioinformatics Data Integration Toolkit (B-DIT), a new suite of Perl routines. Availability: Software is available on request from the authors. Contact: barbara_eckman@sbphrd.com.
Pages: 2-13
Page count: 12