EbEST: An automated tool using expressed sequence tags to delineate gene structure

Cited by: 31
Authors
Jiang, J [1]
Jacob, HJ [1]
Affiliations
[1] Med Coll Wisconsin, Dept Physiol, Lab Genet Res, Milwaukee, WI 53226 USA
Source
GENOME RESEARCH | 1998 / Vol. 8 / No. 03
DOI
10.1101/gr.8.3.268
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Large numbers of expressed sequence tags (ESTs) continue to fill public and private databases with partial cDNA sequences. However, using this huge number of ESTs to facilitate gene finding in genomic sequence poses a challenge, especially to wet-lab scientists who often have limited computing resources. In an effort to consolidate the information hidden in the vast number of ESTs into a readable and manageable format, we have developed EbEST, a program that automates the process of using ESTs to help delineate gene structure in long stretches of genomic sequence. The EbEST program consists of three functional modules: the first module separates homologous ESTs into clusters and identifies the most informative ESTs within each cluster; the second module uses the informative ESTs to perform gapped alignment and to predict the exon-intron boundaries; and the third module generates text-file and graphic outputs that illustrate the orientation, exonic structure, and untranslated regions (UTRs) of putative genes in the genomic sequence being analyzed. Evaluation of EbEST with 176 human genes from the ALLSEQ set indicated that it performed in line with several existing gene-finding programs, but was more tolerant of sequencing errors. Furthermore, when EbEST was challenged with query sequences that harbor more than one gene, it suffered only a slight drop in performance, whereas the performance of the other programs evaluated decreased more. EbEST may be used as a stand-alone tool to annotate human genomic sequences with EST-derived gene elements, or can be used in conjunction with computational gene-recognition programs to increase the accuracy of gene prediction.
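The abstract does not give implementation details, so the following is only a minimal Python sketch of the kind of step the first module describes: grouping EST hits into clusters and picking one informative EST per cluster. The `EstHit` record, the overlap-based clustering rule, and the "longest genomic span" criterion are all assumptions made for illustration; they are not taken from EbEST itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EstHit:
    """Hypothetical record: one EST matched to a span of the genomic query."""
    est_id: str
    start: int  # genomic start coordinate, inclusive
    end: int    # genomic end coordinate, inclusive


def cluster_hits(hits):
    """Group hits whose genomic spans overlap (assumed clustering rule)."""
    clusters = []
    cluster_end = None
    for hit in sorted(hits, key=lambda h: (h.start, h.end)):
        if clusters and hit.start <= cluster_end:
            # Overlaps the current cluster: extend it.
            clusters[-1].append(hit)
            cluster_end = max(cluster_end, hit.end)
        else:
            # No overlap: start a new cluster.
            clusters.append([hit])
            cluster_end = hit.end
    return clusters


def most_informative(cluster):
    """Pick the hit with the longest genomic span (assumed criterion)."""
    return max(cluster, key=lambda h: h.end - h.start)


hits = [
    EstHit("a", 100, 300),
    EstHit("b", 250, 600),
    EstHit("c", 1000, 1200),
    EstHit("d", 1100, 1150),
]
clusters = cluster_hits(hits)
representatives = [most_informative(c).est_id for c in clusters]
```

In this toy input the four hits fall into two clusters, and one representative per cluster would then be passed on to a gapped-alignment step. A real tool would likely weigh alignment quality and orientation as well as span length.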
Pages: 268 - 275
Page count: 8