Gene structure prediction and alternative splicing analysis using genomically aligned ESTs

Cited by: 289
Authors
Kan, ZY
Rouchka, EC
Gish, WR
States, DJ [1]
Affiliations
[1] Washington Univ, Ctr Computat Biol, St Louis, MO 63110 USA
[2] Washington Univ, Dept Genet, St Louis, MO 63110 USA
DOI
10.1101/gr.155001
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
With the availability of a nearly complete sequence of the human genome, aligning expressed sequence tags (ESTs) to the genomic sequence has become a practical and powerful strategy for gene prediction. Elucidating gene structure is a complex problem requiring the identification of splice junctions, gene boundaries, and alternative splicing variants. We have developed a software tool, Transcript Assembly Program (TAP), to delineate gene structures using genomically aligned EST sequences. TAP assembles the joint gene structure of the entire genomic region from individual splice junction pairs, with a novel algorithm that uses the EST-encoded connectivity and redundancy information to sort out the complex alternative splicing patterns. A method called polyadenylation site scan (PASS) has been developed to detect poly-A sites in the genome. TAP uses these predictions to identify gene boundaries by segmenting the joint gene structure at polyadenylated terminal exons. In reconstructing 1007 known transcripts, TAP achieved a sensitivity (Sn) of 60% and a specificity (Sp) of 92% at the exon level. The gene boundary identification process was found to be accurate 78% of the time. TAP also reports alternative splicing patterns in EST alignments. An analysis of alternative splicing in 1124 genic regions suggested that more than half of human genes undergo alternative splicing. Surprisingly, an absolute majority of the detected alternative splicing events affected the coding region. Furthermore, the evolutionary conservation of alternative splicing between human and mouse was analyzed using an EST-based approach. (See http://stl.wustl.edu/-zkan/TAP/).
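The exon-level sensitivity and specificity quoted in the abstract can be illustrated with a minimal sketch. It assumes the convention common in gene-prediction evaluation, where Sn is the fraction of annotated exons that are predicted exactly and Sp is the fraction of predicted exons that are exactly correct; the abstract does not state the definition TAP's authors used. The function name `exon_level_accuracy` and the `(start, end)` exon representation are hypothetical and are not taken from TAP.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: an exon is a (start, end) pair of genomic coordinates.
Exon = Tuple[int, int]


def exon_level_accuracy(
    predicted: Iterable[Exon], annotated: Iterable[Exon]
) -> Tuple[float, float]:
    """Return (Sn, Sp) at the exon level.

    Assumed convention (not confirmed by the abstract):
      Sn = exactly matching exons / annotated exons
      Sp = exactly matching exons / predicted exons
    """
    pred: Set[Exon] = set(predicted)
    true: Set[Exon] = set(annotated)
    correct = len(pred & true)  # exons whose both boundaries match exactly
    sn = correct / len(true) if true else 0.0
    sp = correct / len(pred) if pred else 0.0
    return sn, sp


# Toy data, invented for illustration only.
annotated_exons = [(100, 200), (300, 400), (500, 600), (700, 800), (900, 1000)]
predicted_exons = [(100, 200), (300, 400), (500, 650)]
sn, sp = exon_level_accuracy(predicted_exons, annotated_exons)
```

Under this convention a predictor that reports few exons but gets their boundaries right scores a high Sp and a lower Sn, which is the pattern the abstract reports for an EST-based method.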
Pages: 889-900
Page count: 12
Related papers
30 in total
  • [11] A computer program for aligning a cDNA sequence with a genomic DNA sequence
    Florea, L
    Hartzell, G
    Zhang, Z
    Rubin, GM
    Miller, W
    [J]. GENOME RESEARCH, 1998, 8 (09) : 967 - 974
  • [12] Alternate polyadenylation in human mRNAs: A large-scale analysis by EST clustering
    Gautheret, D
    Poirot, O
    Lopez, F
    Audic, S
    Claverie, JM
    [J]. GENOME RESEARCH, 1998, 8 (05) : 524 - 530
  • [13] GISH W, 1996, WU BLAST 2 0
  • [14] SPLICING AND THE FORMATION OF STABLE RNA
    HAMER, DH
    LEDER, P
    [J]. CELL, 1979, 18 (04) : 1299 - 1302
  • [15] Generation and analysis of 280,000 human expressed sequence tags
    Hillier, L
    Lennon, G
    Becker, M
    Bonaldo, MF
    Chiapelli, B
    Chissoe, S
    Dietrich, N
    DuBuque, T
    Favello, A
    Gish, W
    Hawkins, M
    Hultman, M
    Kucaba, T
    Lacy, M
    Le, M
    Le, N
    Mardis, E
    Moore, B
    Morris, M
    Parsons, J
    Prange, C
    Rifkin, L
    Rohlfing, T
    Schellenberg, K
    Soares, MB
    Tan, F
    ThierryMeg, J
    Trevaskis, E
    Underwood, K
    Wohldman, P
    Waterston, R
    Wilson, R
    Marra, M
    [J]. GENOME RESEARCH, 1996, 6 (09) : 807 - 828
  • [16] EbEST: An automated tool using expressed sequence tags to delineate gene structure
    Jiang, J
    Jacob, HJ
    [J]. GENOME RESEARCH, 1998, 8 (03) : 268 - 275
  • [17] KAN Z, 2000, INTELL SYST MOL BIOL, V8, P216
  • [18] KULP D, 1996, ISMB, V4, P134
  • [19] Gene Index analysis of the human genome estimates approximately 120,000 genes
    Liang, F
    Holt, I
    Pertea, G
    Karamycheva, S
    Salzberg, SL
    Quackenbush, J
    [J]. NATURE GENETICS, 2000, 25 (02) : 239 - 240
  • [20] Alternative splicing of pre-mRNA: Developmental consequences and mechanisms of regulation
    Lopez, AJ
    [J]. ANNUAL REVIEW OF GENETICS, 1998, 32 : 279 - 305