A complexity reduction algorithm for analysis and annotation of large genomic sequences

Cited by: 8
Authors
Chuang, TJ
Lin, WC
Lee, HC
Wang, CW
Hsiao, KL
Wang, ZH
Shieh, D
Lin, SC
Ch'ang, LY [1]
Affiliations
[1] Acad Sinica, Inst Biomed Sci, Bioinformat Res Ctr, Taipei 11529, Taiwan
[2] Acad Sinica, Ctr Comp, Taipei 11529, Taiwan
DOI
10.1101/gr.313703
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010; 081704;
Abstract
DNA is a universal language encrypted with biological instruction for life. In higher organisms, the genetic information is preserved predominantly in an organized exon/intron structure. When a gene is expressed, the exons are spliced together to form the transcript for protein synthesis. We have developed a complexity reduction algorithm for sequence analysis (CRASA) that enables direct alignment of cDNA sequences to the genome. This method features a progressive data structure in hierarchical orders to facilitate a fast and efficient search mechanism. CRASA implementation was tested with already annotated genomic sequences in two benchmark data sets and compared with 15 annotation programs (10 ab initio and 5 homology-based approaches) against the EST database. By the use of layered noise filters, the complexity of CRASA-matched data was reduced exponentially. The results from the benchmark tests showed that CRASA annotation excelled in both the sensitivity and specificity categories. When CRASA was applied to the analysis of human Chromosomes 21 and 22, an additional 83 potential genes were identified. With its large-scale processing capability, CRASA can be used as a robust tool for genome annotation with high accuracy by matching the EST sequences precisely to the genomic sequences.
Pages: 313-322
Page count: 10