GAZE: A generic framework for the integration of gene-prediction data by dynamic programming

Cited by: 54
Authors
Howe, KL [1]
Chothia, T [1]
Durbin, R [1]
Affiliation
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
Funding
Wellcome Trust (UK)
DOI
10.1101/gr.149502
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
We describe a method (implemented in a program, GAZE) for assembling arbitrary evidence for individual gene components (features) into predictions of complete gene structures. Our system is generic in that both the features themselves and the model of gene structure against which potential assemblies are validated and scored are external to the system and supplied by the user. GAZE uses a dynamic programming algorithm to obtain both the highest-scoring gene structure according to the model and the posterior probability that each input feature is part of a gene. A novel pruning strategy ensures that the algorithm has a run-time effectively linear in sequence length. To demonstrate the flexibility of our system in incorporating additional evidence into the gene prediction process, we show how it can be used both to represent nonstandard gene structures (in the form of trans-spliced genes in Caenorhabditis elegans) and to make use of similarity information (in the form of Expressed Sequence Tag alignments), without requiring any change to the underlying software. GAZE is available at http://www.sanger.ac.uk/Software/analysis/GAZE.
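The core computation described in the abstract, dynamic programming over scored features that yields both the best-scoring gene structure and per-feature posterior probabilities, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration only: the `Feature` record, the toy `TRANSITIONS` table standing in for the user-supplied gene model, the linear `segment_score` length penalty, and the treatment of exp(total score) as an unnormalised parse weight are hypothetical and are not GAZE's actual model format or scoring scheme. The inner loops scan every earlier (or later) feature, so this sketch is quadratic in the number of features; it does not implement the paper's pruning strategy.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Feature:
    kind: str     # feature type, e.g. "start", "donor" (hypothetical labels)
    pos: int      # sequence coordinate
    score: float  # log-scale score from some external predictor (assumed)

# Hypothetical toy gene model: which feature type may directly follow which.
# BEGIN and END are sentinel features bracketing the sequence.
TRANSITIONS = {
    "BEGIN": {"start", "END"},
    "start": {"donor", "stop"},
    "donor": {"acceptor"},
    "acceptor": {"donor", "stop"},
    "stop": {"start", "END"},
}

def allowed(a: Feature, b: Feature) -> bool:
    """True if feature b may directly follow feature a under the toy model."""
    return b.pos > a.pos and b.kind in TRANSITIONS.get(a.kind, set())

def segment_score(a: Feature, b: Feature) -> float:
    """Hypothetical score for the region between a and b: a linear length penalty."""
    return -0.01 * (b.pos - a.pos)

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over log-scale values."""
    xs = list(xs)
    m = max(xs, default=-math.inf)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def best_parse(features):
    """Viterbi-style DP: highest-scoring BEGIN..END chain of compatible features.
    Assumes the first feature (by position) is BEGIN and the last is END."""
    feats = sorted(features, key=lambda f: f.pos)
    n = len(feats)
    best = [-math.inf] * n
    back = [None] * n
    best[0] = feats[0].score
    for j in range(1, n):
        for i in range(j):  # naive O(n^2) scan, no pruning
            if best[i] > -math.inf and allowed(feats[i], feats[j]):
                s = best[i] + segment_score(feats[i], feats[j]) + feats[j].score
                if s > best[j]:
                    best[j], back[j] = s, i
    path, j = [], n - 1  # trace back from END
    while j is not None:
        path.append(feats[j])
        j = back[j]
    return best[-1], path[::-1]

def posteriors(features):
    """Forward/backward sums: probability that each feature lies on a parse.
    Assumes at least one complete BEGIN..END parse exists."""
    feats = sorted(features, key=lambda f: f.pos)
    n = len(feats)
    fwd = [-math.inf] * n  # log-weight of partial parses BEGIN..k, incl. k's score
    fwd[0] = feats[0].score
    for j in range(1, n):
        fwd[j] = logsumexp(fwd[i] + segment_score(feats[i], feats[j])
                           for i in range(j) if allowed(feats[i], feats[j])) + feats[j].score
    bwd = [-math.inf] * n  # log-weight of partial parses k..END, excl. k's score
    bwd[-1] = 0.0
    for i in range(n - 2, -1, -1):
        bwd[i] = logsumexp(segment_score(feats[i], feats[j]) + feats[j].score + bwd[j]
                           for j in range(i + 1, n) if allowed(feats[i], feats[j]))
    log_z = fwd[-1]  # log of the total weight of all complete parses
    return [(f, math.exp(fwd[k] + bwd[k] - log_z)) for k, f in enumerate(feats)]

if __name__ == "__main__":
    demo = [Feature("BEGIN", 0, 0.0), Feature("start", 10, 2.0),
            Feature("donor", 40, 1.0), Feature("acceptor", 90, 1.5),
            Feature("stop", 150, 2.0), Feature("END", 200, 0.0)]
    score, path = best_parse(demo)
    print(round(score, 3), [f.kind for f in path])
    for f, p in posteriors(demo):
        print(f.kind, f.pos, round(p, 3))
```

With the six demo features, the spliced parse (BEGIN, start, donor, acceptor, stop, END) scores 4.5 and beats the unspliced start-to-stop parse (2.0) and the empty parse (-2.0). The forward/backward pass then gives the donor a posterior of roughly 0.92, since it lies only on the spliced parse, while the start feature, shared by both gene-containing parses, gets a higher posterior.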
Pages: 1418-1427
Page count: 10