The GeneMine system for genome/proteome annotation and collaborative data mining

Cited by: 30
Authors
Lee, C [1]
Irizarry, K [1]
Affiliations
[1] Univ Calif Los Angeles, Dept Chem & Biochem, Los Angeles, CA 90095 USA
DOI
10.1147/sj.402.0592
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
As genome data and bioinformatics resources grow exponentially in size and complexity, there is an increasing need for software that can bridge the gap between biologists with questions and the worldwide set of highly specialized tools for answering them. The GeneMine system for small- to medium-scale genome analysis provides: (1) automated analysis of DNA (deoxyribonucleic acid) and protein sequence data using over 50 different analysis servers via the Internet, integrating data from homologous functions, tissue expression patterns, mapping, polymorphisms, model organism data and phenotypes, protein structural domains, active sites, motifs and other features, etc., (2) automated filtering and data reduction to highlight significant and interesting patterns, (3) a visual data-mining interface for rapidly exploring correlations, patterns, and contradictions within these data via aggregation, overlay, and drill-down, all projected onto relevant sequence alignments and three-dimensional structures, (4) a plug-in architecture that makes adding new types of analysis, data sources, and servers (including anything on the Internet) as easy as supplying the relevant URLs (uniform resource locators), (5) a hypertext system that lets users create and share "live" views of their discoveries by embedding three-dimensional structures, alignments, and annotation data within their documents, and (6) an integrated database schema for mining large GeneMine data sets in a relational database. The value of the GeneMine system is that it automatically brings together and uncovers important functional information from a much wider range of sources than a given specialist would normally think to query, resulting in insights that the researcher was not planning to look for. In this paper we present the architecture of the software for integrating and mining very diverse biological data, and cross-validation of gene function predictions.
The software is freely available at http://www.bioinformatics.ucla.edu/genemine.
Pages: 592-603
Page count: 12