ECgene: Genome-based EST clustering and gene modeling for alternative splicing

Cited by: 66
Authors
Kim, N
Shin, S
Lee, S [1]
Affiliations
[1] Ewha Womans Univ, Div Mol Life Sci, Seoul 120750, South Korea
[2] Seoul Natl Univ, Sch Chem, Seoul 151747, South Korea
DOI
10.1101/gr.3030405
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
With the availability of the human genome map and fast algorithms for sequence alignment, genome-based EST clustering became a viable method for gene modeling. We developed a novel gene-modeling method, ECgene (Gene modeling by EST Clustering), which combines genome-based EST clustering and the transcript assembly procedure in a coherent and consistent fashion. Specifically, ECgene takes alternative splicing events into consideration. The position of splice sites (i.e., exon-intron boundaries) in the genome map is utilized as the critical information in the whole procedure. Sequences that share any splice sites are grouped together to define an EST cluster in a manner similar to that of the genome-based version of the UniGene algorithm. Transcript assembly is achieved using graph theory that represents the exon connectivity in each cluster as a directed acyclic graph (DAG). Distinct paths along exons correspond to possible gene models encompassing all alternative splicing events. EST sequences in each cluster are subclustered further according to the compatibility with gene structure of each splice variant, and they can be regarded as clone evidence for the corresponding isoform. The reliability of each isoform is assessed from the nature of cluster members and from the minimum number of clones required to reconstruct all exons in the transcript.
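The two core steps described in the abstract, grouping sequences that share a splice site and enumerating paths through an exon graph, can be illustrated with a minimal Python sketch. This is not the ECgene implementation: the function names `cluster_by_splice_sites` and `enumerate_paths`, the input layouts, and the toy data are all assumptions made for illustration. Real splice sites would also carry chromosome, strand, and donor/acceptor information, and the sketch omits the subclustering and reliability assessment steps.

```python
from collections import defaultdict


def cluster_by_splice_sites(sequences):
    """Group sequences that share at least one splice site (union-find).

    `sequences` maps a sequence name to a set of splice-site positions.
    This input layout is hypothetical, chosen only for the sketch.
    """
    parent = {name: name for name in sequences}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # splice site -> first sequence observed with that site
    for name, sites in sequences.items():
        for site in sites:
            if site in first_seen:
                parent[find(name)] = find(first_seen[site])
            else:
                first_seen[site] = name

    clusters = defaultdict(set)
    for name in sequences:
        clusters[find(name)].add(name)
    return sorted(sorted(members) for members in clusters.values())


def enumerate_paths(edges):
    """List every source-to-sink path in a directed acyclic exon graph.

    `edges` maps an exon label to the exons that can follow it.
    Each path is one candidate transcript model.
    """
    nodes = set(edges) | {v for vs in edges.values() for v in vs}
    targets = {v for vs in edges.values() for v in vs}
    sources = sorted(nodes - targets)  # exons with no upstream exon
    paths = []

    def walk(node, path):
        successors = edges.get(node, [])
        if not successors:  # sink exon: the path is complete
            paths.append(path)
            return
        for nxt in successors:
            walk(nxt, path + [nxt])

    for source in sources:
        walk(source, [source])
    return paths


# Toy data: est1 and est2 share the splice site at position 200.
ests = {"est1": {100, 200}, "est2": {200, 300}, "est3": {900}}
print(cluster_by_splice_sites(ests))

# Toy exon graph in which exon E2 is optional (a skipped-exon event).
exon_graph = {"E1": ["E2", "E3"], "E2": ["E3"], "E3": []}
print(enumerate_paths(exon_graph))
```

In the toy exon graph, the two enumerated paths correspond to the isoform that includes E2 and the isoform that skips it. Exhaustive path enumeration can grow exponentially with the number of alternative events, so a practical tool would need some way to bound or filter the paths.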
Pages: 566-576
Page count: 11