Identifying candidate disease genes with high-performance computing

Cited by: 3
Authors
Braun, TA [1]
Scheetz, TE
Webster, G
Clark, A
Stone, EM
Sheffield, VC
Casavant, TL
Affiliations
[1] Univ Iowa, Dept Biomed Engn, Dept Ophthalmol, Coordinated Lab Computat Genom, Iowa City, IA 52242 USA
[2] Alcon Labs Inc, Ft Worth, TX 76101 USA
[3] Univ Iowa, Dept Pediat, Iowa City, IA 52242 USA
[4] Univ Iowa, Dept Biomed Engn, Dept Elect & Comp Engn, Coordinated Lab Computat Genom, Iowa City, IA 52242 USA
Keywords
high-performance computing; disease gene discovery; bioinformatics; knowledge discovery; biological databases
DOI
10.1023/A:1024417200364
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The publicly funded effort to read the complete nucleotide sequence of the human genome, the Human Genome Project (HGP), is nearing completion of the approximately three billion nucleotides of the human genome. In addition, several valuable sources of information have been developed as direct and indirect results of the HGP. These include the genome sequencing of model organisms (Escherichia coli, Saccharomyces cerevisiae, the fruit fly Drosophila melanogaster, the worm Caenorhabditis elegans, and the laboratory mouse), gene discovery projects (expressed sequence tags and full-length), and new high-throughput expression analyses. These resources are invaluable in identifying the transcriptome and proteome, the sets of transcribed and translated sequences. However, the bulk of the effort still remains: to identify the functional and structural elements contained within gene sequences. Addressing these challenges requires the use of high-performance computing. There are currently hundreds of databases containing biological information that may contain data relevant to the identification of disease-causing genes. Knowledge discovery using these databases holds enormous potential, if sufficient computing resources are utilized to process the overwhelming amounts of data. We are developing a system to acquire and mine data from a subset of these databases to aid our efforts to identify disease genes. A high-performance cluster of Linux workstations is used to perform distributed sequence alignments as part of our analysis and processing. This system has been used to mine the GeneMap99 database within specific genomic intervals to identify potential candidate disease genes associated with Bardet-Biedl syndrome (BBS).
Pages: 7-24
Page count: 18