SpliceDB: database of canonical and non-canonical mammalian splice sites

Times cited: 184
Authors
Burset, M
Seledtsov, IA
Solovyev, VV
Affiliations
[1] Sanger Ctr, Cambridge CB10 1SA, England
[2] Softberry Inc, White Plains, NY 10604 USA
DOI
10.1093/nar/29.1.255
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
A database (SpliceDB) of known mammalian splice site sequences has been developed. We extracted 43 337 splice pairs from the mammalian divisions of the gene-centered Infogene database, including sites from incomplete or alternatively spliced genes. Known EST sequences supported 22 815 of them. After discarding sequences with putative errors and ambiguous locations of splice junctions, the verified dataset includes 22 489 entries. Of these, 98.71% contain canonical GT-AG junctions (22 199 entries) and 0.56% have non-canonical GC-AG splice site pairs. The remainder (0.73%) falls into many small groups, none larger than 0.05%. We paid particular attention to non-canonical splice sites, which comprise 3.73% of GenBank-annotated splice pairs. Because EST alignments allowed us to verify only the exonic part of splice sites, we checked the conserved dinucleotides by comparing the sequences of human non-canonical splice sites with sequences from the high-throughput genome sequencing project (HTG). Out of 171 human non-canonical, EST-supported splice pairs, 156 (91.23%) had a clear match in the human HTG. After sequence analysis they can be classified as: 79 GC-AG pairs (one of which was an error corrected to GC-AG), 61 errors corrected to canonical GT-AG pairs, six AT-AC pairs (two of which were errors corrected to AT-AC), one case produced from a non-existent intron, and seven cases found in HTG sequences that had been deposited in GenBank. Finally, only two other cases of supported non-canonical splice pairs remained. SpliceDB presents the verified splice site sequences for canonical and non-canonical sites together with the supporting evidence. We also built weight matrices for the major splice groups, which can be incorporated into gene prediction programs. SpliceDB is available at the computational genomics Web server of the Sanger Centre: http://genomic.sanger.ac.uk/spldb/SpliceDB.html and at http://www.softberry.com/spldb/SpliceDB.html.
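The classification in the abstract rests on the terminal dinucleotides of each intron. The donor pair sits at the 5' end and the acceptor pair at the 3' end, as in GT-AG, GC-AG, or AT-AC. The following Python sketch tallies splice pair types and their percentages for a list of intron sequences. It is a minimal illustration under stated assumptions, not the authors' pipeline. The helper names splice_pair and tally and the toy sequences are hypothetical, and each input string is assumed to be an intron already oriented on the transcribed strand. The EST and HTG verification steps described above are not modeled.

```python
from collections import Counter


def splice_pair(intron: str) -> str:
    """Return the donor-acceptor dinucleotide pair of an intron, e.g. 'GT-AG'.

    Hypothetical helper. Assumes `intron` is the intron sequence on the
    transcribed strand, so its first two bases are the donor dinucleotide
    and its last two bases are the acceptor dinucleotide.
    """
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to hold both dinucleotides")
    return f"{seq[:2]}-{seq[-2:]}"


def tally(introns):
    """Count splice pair types; map each to (count, percent of total, 2 d.p.)."""
    counts = Counter(splice_pair(s) for s in introns)
    total = sum(counts.values())
    return {pair: (n, round(100 * n / total, 2)) for pair, n in counts.most_common()}


if __name__ == "__main__":
    # Toy, made-up intron sequences (not taken from SpliceDB).
    toy = ["GTAAGTCTTTCAG", "GTGAGTTTCCAG", "GCAAGTTTTAG", "ATATCCTTTAC"]
    print(tally(toy))  # {'GT-AG': (2, 50.0), 'GC-AG': (1, 25.0), 'AT-AC': (1, 25.0)}
```

The same percentage arithmetic applied to the counts in the abstract gives matching values. round(100 * 22199 / 22489, 2) evaluates to 98.71, and round(100 * 156 / 171, 2) evaluates to 91.23, matching the reported figures.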
Pages: 255-259
Number of pages: 5