ASPIC: a novel method to predict the exon-intron structure of a gene that is optimally compatible to a set of transcript sequences

Cited by: 23
Authors
Bonizzoni, P
Rizzi, R
Pesole, G
Affiliations
[1] Univ Milan, Dipartimento Sci Biotecnol, I-20133 Milan, Italy
[2] Univ Milano Bicocca, DISCo, I-20135 Milan, Italy
DOI
10.1186/1471-2105-6-244
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Currently available methods to predict splice sites are mainly based on the independent and progressive alignment of transcript data (mostly ESTs) to the genomic sequence. Apart from often being computationally expensive, this approach is vulnerable to several problems, hence the need to develop novel strategies.
Results: We propose a method, based on a novel multiple genome-EST alignment algorithm, for the detection of splice sites. To avoid the limitations of splice-site prediction (mainly over-predictions) caused by independent single EST alignments to the genomic sequence, our approach performs a multiple alignment of transcript data to the genomic sequence based on the combined analysis of all available data. We recast the problem of predicting constitutive and alternative splicing as an optimization problem, where the optimal multiple transcript alignment minimizes the number of exons and hence of splice site observations. We have implemented a splice site predictor based on this algorithm in the software tool ASPIC (Alternative Splicing PredICtion). ASPIC differs from methods based on BLAST-like tools in that it incorporates entirely new ad hoc procedures for accurate and computationally efficient transcript alignment and adopts dynamic programming for the refinement of intron boundaries. ASPIC also provides the minimal set of non-mergeable transcript isoforms compatible with the detected splicing events. The ASPIC web resource is dynamically interconnected with the Ensembl and Unigene databases and also implements an upload facility.
Conclusion: Extensive benchmarking shows that ASPIC outperforms other existing methods in the detection of novel splicing isoforms and in the minimization of over-predictions. ASPIC also requires a lower computation time for processing a single gene and an EST cluster. The ASPIC web resource is available at http://aspic.algo.disco.unimib.it/aspic-devel/.
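Illustrative note: the abstract mentions refinement of intron boundaries but gives no algorithmic detail. The sketch below is not ASPIC's dynamic programming procedure; it is a much simpler boundary scan, written only to show why such refinement is needed. When the bases flanking an intron repeat, several intron placements yield the same spliced transcript, and a refinement step can prefer the placement with the canonical GT-AG dinucleotides. The function name `refine_intron`, the half-open coordinate convention, and the `max_shift` parameter are all assumptions made for this example.

```python
def refine_intron(genome: str, start: int, end: int, max_shift: int = 3):
    """Shift the intron genome[start:end] by a few bases, if that makes it canonical.

    A shift is accepted only when it leaves the spliced sequence unchanged,
    so the transcript alignment stays equally valid. Hypothetical helper,
    not part of ASPIC.
    """
    spliced = genome[:start] + genome[end:]
    # Try the smallest shifts first: 0, -1, +1, -2, +2, ...
    for d in sorted(range(-max_shift, max_shift + 1), key=abs):
        s, e = start + d, end + d
        if s < 0 or e > len(genome):
            continue  # shifted intron would fall outside the sequence
        if genome[:s] + genome[e:] != spliced:
            continue  # this shift would change the spliced transcript
        if genome[s:s + 2] == "GT" and genome[e - 2:e] == "AG":
            return s, e  # canonical donor (GT) and acceptor (AG) found
    return start, end  # no canonical placement within max_shift


# Exon "CCCAG", intron "GTAAATTTCAG", exon "GTCC"; the reported intron
# is off by one base but produces the same spliced sequence.
genome = "CCCAGGTAAATTTCAGGTCC"
print(refine_intron(genome, 4, 15))  # (5, 16)
```

A full method would also score non-canonical sites and mismatches in the alignment; the scan above only handles exact ambiguity at the boundaries.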
Pages: 16
Related papers
31 in total
[1]  
Bonizzoni P, 2003, LECT N BIOINFORMAT, V2812, P63
[2]   Theoretical analysis of alternative splice forms using computational methods [J].
Boué, S ;
Vingron, M ;
Kriventseva, E ;
Koch, I .
BIOINFORMATICS, 2002, 18 :S65-S73
[3]   Gene structure prediction from consensus spliced alignment of multiple ESTs matching the same genomic locus [J].
Brendel, V ;
Xing, LQ ;
Zhu, W .
BIOINFORMATICS, 2004, 20 (07) :1157-1169
[4]   EST comparison indicates 38% of human mRNAs contain possible alternative splice forms [J].
Brett, D ;
Hanke, J ;
Lehmann, G ;
Haase, S ;
Delbrück, S ;
Krueger, S ;
Reich, J ;
Bork, P .
FEBS LETTERS, 2000, 474 (01) :83-86
[5]   Analysis of canonical and non-canonical splice sites in mammalian genomes [J].
Burset, M ;
Seledtsov, IA ;
Solovyev, VV .
NUCLEIC ACIDS RESEARCH, 2000, 28 (21) :4364-4375
[6]   SpliceDB: database of canonical and non-canonical mammalian splice sites [J].
Burset, M ;
Seledtsov, IA ;
Solovyev, VV .
NUCLEIC ACIDS RESEARCH, 2001, 29 (01) :255-259
[7]   Alternative splicing: multiple control mechanisms and involvement in human disease [J].
Cáceres, JF ;
Kornblihtt, AR .
TRENDS IN GENETICS, 2002, 18 (04) :186-193
[8]   ESTGenes: Alternative splicing from ESTs in Ensembl [J].
Eyras, E ;
Caccamo, M ;
Curwen, V ;
Clamp, M .
GENOME RESEARCH, 2004, 14 (05) :976-987
[9]   The ENCODE (ENCyclopedia of DNA elements) Project [J].
Feingold, EA ;
Good, PJ ;
Guyer, MS ;
Kamholz, S ;
Liefer, L ;
Wetterstrand, K ;
Collins, FS ;
Gingeras, TR ;
Kampa, D ;
Sekinger, EA ;
Cheng, J ;
Hirsch, H ;
Ghosh, S ;
Zhu, Z ;
Pate, S ;
Piccolboni, A ;
Yang, A ;
Tammana, H ;
Bekiranov, S ;
Kapranov, P ;
Harrison, R ;
Church, G ;
Struhl, K ;
Ren, B ;
Kim, TH ;
Barrera, LO ;
Qu, C ;
Van Calcar, S ;
Luna, R ;
Glass, CK ;
Rosenfeld, MG ;
Guigo, R ;
Antonarakis, SE ;
Birney, E ;
Brent, M ;
Pachter, L ;
Reymond, A ;
Dermitzakis, ET ;
Dewey, C ;
Keefe, D ;
Denoeud, F ;
Lagarde, J ;
Ashurst, J ;
Hubbard, T ;
Wesselink, JJ ;
Castelo, R ;
Eyras, E ;
Myers, RM ;
Sidow, A ;
Batzoglou, S .
SCIENCE, 2004, 306 (5696) :636-640
[10]   A computer program for aligning a cDNA sequence with a genomic DNA sequence [J].
Florea, L ;
Hartzell, G ;
Zhang, Z ;
Rubin, GM ;
Miller, W .
GENOME RESEARCH, 1998, 8 (09) :967-974