Refined annotation of the Arabidopsis genome by complete expressed sequence tag mapping

Cited by: 81
Authors
Zhu, W
Schlueter, SD
Brendel, V [1]
Affiliations
[1] Iowa State Univ, Dept Zool & Genet, Ames, IA 50011 USA
[2] Iowa State Univ, Dept Stat, Ames, IA 50011 USA
DOI
10.1104/pp.102.018101
Chinese Library Classification (CLC) number
Q94 [Botany];
Discipline classification code
071001;
Abstract
Expressed sequence tags (ESTs) currently encompass more entries in the public databases than any other form of sequence data. Thus, EST data sets provide a vast resource for gene identification and expression profiling. We have mapped the complete set of 176,915 publicly available Arabidopsis EST sequences onto the Arabidopsis genome using GeneSeqer, a spliced alignment program incorporating sequence similarity and splice site scoring. About 96% of the available ESTs could be properly aligned with a genomic locus, with the remaining ESTs deriving from organelle genomes and non-Arabidopsis sources or displaying insufficient sequence quality for alignment. The mapping provides verified sets of EST clusters for evaluation of EST clustering programs. Analysis of the spliced alignments suggests corrections to current gene structure annotation and provides examples of alternative and non-canonical pre-mRNA splicing. All results of this study were parsed into a database and are accessible via a flexible Web interface at http://www.plantgdb.org/AtGDB/.
Pages: 469-484
Page count: 16
Related papers
42 in total
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]  
[Anonymous], ISMB
[3]   Combining evidence using p-values: application to sequence homology searches [J].
Bailey, TL ;
Gribskov, M .
BIOINFORMATICS, 1998, 14 (01) :48-54
[4]   Exon recognition in vertebrate splicing [J].
Berget, SM .
JOURNAL OF BIOLOGICAL CHEMISTRY, 1995, 270 (06) :2411-2414
[5]   Protein diversity from alternative splicing: A challenge for bioinformatics and post-genome biology [J].
Black, DL .
CELL, 2000, 103 (03) :367-370
[6]   Comparison of gene indexing databases [J].
Bouck, J ;
Yu, W ;
Gibbs, R ;
Worley, K .
TRENDS IN GENETICS, 1999, 15 (04) :159-162
[7]   Prediction of locally optimal splice sites in plant pre-mRNA with applications to gene identification in Arabidopsis thaliana genomic DNA [J].
Brendel, V ;
Kleffe, J .
NUCLEIC ACIDS RESEARCH, 1998, 26 (20) :4748-4757
[8]   Computational modeling of gene structure in Arabidopsis thaliana [J].
Brendel, V ;
Zhu, W .
PLANT MOLECULAR BIOLOGY, 2002, 48 (1-2) :49-58
[9]   Alternative splicing and genome complexity [J].
Brett, D ;
Pospisil, H ;
Valcárcel, J ;
Reich, J ;
Bork, P .
NATURE GENETICS, 2002, 30 (01) :29-30
[10]   Arabidopsis consensus intron sequences [J].
Brown, JWS ;
Smith, P ;
Simpson, CG .
PLANT MOLECULAR BIOLOGY, 1996, 32 (03) :531-535