A comparative method for identification of gene structures and alternatively spliced variants

Cited by: 10
Authors
Chuang, TJ [1]
Chen, FC
Chou, MY
Affiliations
[1] Acad Sinica, Genom Res Ctr, Taipei 11529, Taiwan
[2] Acad Sinica, Inst Informat Sci, Taipei 11529, Taiwan
DOI
10.1093/bioinformatics/bth368
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Alternative splicing (AS) serves as a mechanism to create diversity among functional proteins. Increasing evidence indicates that a large portion of genes have AS forms. Hence, AS variants should be considered when analyzing gene structures. Results: A new cross-species gene identification and AS analysis system, PSEP, has been developed. The system is based on expressed sequence tag (EST)-to-genome and genome-to-genome comparisons and is implemented in two steps: sequence alignment and a series of post-alignment processes, including progressive signal extraction and patching. For gene identification, these post-alignment processes serve as noise filters and enable PSEP to eliminate approximately 88% of potential overprediction. The overall accuracy of PSEP is better than or comparable to that of other well-known cross-species gene prediction programs, including the ROSETTA program, TWINSCAN, SGP-1/-2 and SLAM, when tested on three benchmark datasets (the ELN gene region, the HoxA cluster and the ROSETTA set). In addition, 76.2% and 76.0% of multiple-exon genes in the ROSETTA dataset and human chromosome 20, respectively, are found to have AS forms. Approximately 23% of the 210 elementary alternatives identified in the ROSETTA dataset are not conserved between the human and mouse genomes, and none of the 210 transcripts is found in the RefSeq annotation. With its dual functions in cross-species conserved sequence analysis and AS analysis, PSEP is highly suitable for studying the evolution of AS patterns and for finding unidentified gene expression features.
Pages: 3064-3079
Number of pages: 16