EbEST: An automated tool using expressed sequence tags to delineate gene structure

Cited by: 31
Authors
Jiang, J [1]
Jacob, HJ [1]
Affiliations
[1] Med Coll Wisconsin, Dept Physiol, Lab Genet Res, Milwaukee, WI 53226 USA
Source
GENOME RESEARCH | 1998 / Vol. 8 / Issue 03
DOI
10.1101/gr.8.3.268
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Large numbers of expressed sequence tags (ESTs) continue to fill public and private databases with partial cDNA sequences. However, using this huge number of ESTs to facilitate gene finding in genomic sequence poses a challenge, especially to wet-lab scientists, who often have limited computing resources. In an effort to consolidate the information hidden in the vast number of ESTs into a readable and manageable format, we have developed EbEST, a program that automates the process of using ESTs to help delineate gene structure in long stretches of genomic sequence. The EbEST program consists of three functional modules: the first module separates homologous ESTs into clusters and identifies the most informative ESTs within each cluster; the second module uses the informative ESTs to perform gapped alignment and to predict the exon-intron boundaries; and the third module generates text-file and graphic outputs that illustrate the orientation, exonic structure, and untranslated regions (UTRs) of putative genes in the genomic sequence being analyzed. Evaluation of EbEST with 176 human genes from the ALLSEQ set indicated that it performed in line with several existing gene-finding programs but was more tolerant of sequencing errors. Furthermore, when EbEST was challenged with query sequences that harbor more than one gene, it suffered only a slight drop in performance, whereas the performance of the other programs evaluated decreased more. EbEST may be used as a stand-alone tool to annotate human genomic sequences with EST-derived gene elements, or it can be used in conjunction with computational gene-recognition programs to increase the accuracy of gene prediction.
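The first two modules described in the abstract can be illustrated with a minimal Python sketch. It is not EbEST's implementation: the abstract does not say how clusters are formed, what makes an EST "most informative", or how boundaries are scored. The sketch assumes that EST hits are already available as coordinate intervals on the genomic sequence, that overlapping hits belong to one cluster, that the hit with the longest genomic span is the most informative, and that the gaps between the aligned blocks of a gapped alignment are candidate introns. The names `EstHit`, `cluster_hits`, `most_informative`, and `introns_from_blocks` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class EstHit(NamedTuple):
    """One EST matched to the genomic sequence (hypothetical record type)."""
    est_id: str
    start: int  # 0-based start on the genomic sequence, inclusive
    end: int    # end on the genomic sequence, exclusive


def cluster_hits(hits: List[EstHit]) -> List[List[EstHit]]:
    """Group hits whose genomic intervals overlap (assumed clustering rule)."""
    clusters: List[List[EstHit]] = []
    current: List[EstHit] = []
    current_end = 0
    for hit in sorted(hits, key=lambda h: (h.start, h.end)):
        if current and hit.start < current_end:
            # Overlaps the running cluster: extend it.
            current.append(hit)
            current_end = max(current_end, hit.end)
        else:
            # No overlap: close the running cluster and start a new one.
            if current:
                clusters.append(current)
            current = [hit]
            current_end = hit.end
    if current:
        clusters.append(current)
    return clusters


def most_informative(cluster: List[EstHit]) -> EstHit:
    """Pick the hit with the longest genomic span (assumed criterion)."""
    return max(cluster, key=lambda h: h.end - h.start)


def introns_from_blocks(blocks: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the gaps between consecutive aligned blocks of one EST.

    `blocks` are (start, end) exon candidates sorted by start; each gap
    between neighbours is reported as a candidate intron (start, end).
    """
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(blocks, blocks[1:])]


hits = [
    EstHit("est1", 100, 400),
    EstHit("est2", 350, 900),
    EstHit("est3", 5000, 5300),
]
clusters = cluster_hits(hits)
representatives = [most_informative(c).est_id for c in clusters]
candidate_introns = introns_from_blocks([(100, 250), (600, 720), (1500, 1580)])
```

Here `est1` and `est2` overlap and form one cluster, with `est2` chosen as its representative, while `est3` forms a cluster of its own. A real tool would also have to check splice-site consensus at each block edge and tolerate sequencing errors, which this sketch does not attempt.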
Pages: 268-275
Page count: 8