A comparison of expressed sequence tags (ESTs) to human genomic sequences

Cited by: 103
Authors
Wolfsberg, TG [1]
Landsman, D [1]
Affiliation
[1] NIH,NATL LIB MED,NATL CTR BIOTECHNOL INFORMAT,BETHESDA,MD 20894
Keywords
DOI
10.1093/nar/25.8.1626
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The Expressed Sequence Tag (EST) division of GenBank, dbEST, is a large repository of the data being generated by human genome sequencing centers. ESTs are short, single-pass cDNA sequences generated from randomly selected library clones. The ~415 000 human ESTs represent a valuable, low-priced, and easily accessible biological reagent. As many ESTs are derived from as yet uncharacterized genes, dbEST is a prime starting point for the identification of novel mRNAs. Conversely, other genes are represented by hundreds of ESTs, a redundancy which may provide data about rare mRNA isoforms. Here we present an analysis of >1000 ESTs generated by the WashU-Merck EST project. These ESTs were collected by querying dbEST with the genomic sequences of 15 human genes. When we aligned the matching ESTs to the genomic sequences, we found that in one gene, 73% of the ESTs which derive from spliced or partially spliced transcripts either contain intron sequences or are spliced at previously unreported sites; other genes have lower percentages of such ESTs, and some have none. This finding suggests that ESTs could provide researchers with novel information about alternative splicing in certain genes. In a related analysis of pairs of ESTs which are reported to derive from a single gene, we found that as many as 26% of the pairs do not both align with the sequence of the same gene. We suspect that some of these unusual ESTs result from artifacts in EST generation, and caution researchers that they may find such clones while analyzing sequences in dbEST.
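The kind of tally described in the abstract (the share of spliced or partially spliced ESTs that contain intron sequence or use an unreported splice site) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the 1-based inclusive coordinates, the toy exon annotation, and the 10-base intron-overlap threshold are all assumptions made for illustration, and the EST alignment blocks are taken as already computed by some alignment tool.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates (assumption)


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Introns are the gaps between consecutive annotated exons."""
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def classify_est(blocks: List[Interval], exons: List[Interval],
                 min_intron_overlap: int = 10) -> str:
    """Label one EST from its sorted aligned blocks on the genomic sequence.

    Hypothetical labels:
      "intron"        - some block covers >= min_intron_overlap intron bases
      "uninformative" - a single block lying within exon sequence
      "reported"      - every junction joins adjacent annotated exons
      "unreported"    - at least one junction is not in the annotation
    """
    introns = introns_from_exons(exons)
    known = {(a_end, b_start)
             for (_, a_end), (b_start, _) in zip(exons, exons[1:])}
    if any(overlap(b, i) >= min_intron_overlap
           for b in blocks for i in introns):
        return "intron"
    junctions = [(a_end, b_start)
                 for (_, a_end), (b_start, _) in zip(blocks, blocks[1:])]
    if not junctions:
        return "uninformative"
    if all(j in known for j in junctions):
        return "reported"
    return "unreported"


def unusual_fraction(labels: List[str]) -> float:
    """Fraction of informative ESTs labelled "intron" or "unreported"."""
    informative = [lab for lab in labels if lab != "uninformative"]
    if not informative:
        return 0.0
    return sum(lab != "reported" for lab in informative) / len(informative)


# Toy annotation and alignments (made-up coordinates, not data from the paper).
exons = [(1, 100), (201, 300), (401, 500)]
ests = {
    "est_a": [(50, 100), (201, 250)],  # joins exon 1 to exon 2
    "est_b": [(50, 150)],              # runs into intron 1
    "est_c": [(50, 100), (401, 450)],  # joins exon 1 to exon 3
    "est_d": [(10, 90)],               # entirely inside exon 1
}
labels = {name: classify_est(blocks, exons) for name, blocks in ests.items()}
fraction = unusual_fraction(list(labels.values()))
```

In this sketch an EST that lies entirely within one exon is excluded from the denominator, mirroring the abstract's restriction to ESTs from spliced or partially spliced transcripts; a junction that skips an annotated exon is counted as unreported, which is a simplifying choice.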
Pages: 1626-1632
Number of pages: 7