Alternative gene form discovery and candidate gene selection from gene indexing projects

Cited by: 80
Authors
Burke, J
Wang, H
Hide, W
Davison, DB
Affiliations
[1] Univ Houston, Dept Biochem & Biophys Sci, Houston, TX 77004 USA
[2] Univ Western Cape, S African Natl Bioinformat Inst, ZA-7535 Bellville, South Africa
[3] Baylor Coll Med, Dept Cell Biol, Houston, TX 77030 USA
[4] Univ Houston, Dept Comp Sci, Houston, TX 77204 USA
Source
GENOME RESEARCH | 1998 / Vol. 8 / No. 3
DOI
10.1101/gr.8.3.276
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Several efforts are under way to partition single-read expressed sequence tag (EST), as well as full-length transcript data, into large-scale gene indices, where transcripts are in common index classes if and only if they share a common progenitor gene. Accurate gene indexing facilitates gene expression studies, as well as inexpensive and early gene sequence discovery through assembly of ESTs that are derived from genes that have not been sequenced by classical methods. We extend, correct, and enhance the information obtained from index groups by splitting index classes into subclasses based on sequence dissimilarity (diversity). Two applications of this are highlighted in this report. First, it is shown that our method can ameliorate the damage that artifacts, such as chimerism, inflict on index integrity. Additionally, we demonstrate how the organization imposed by an effective subpartition can greatly increase the sensitivity of gene expression studies by accounting for the existence and tissue- or pathology-specific regulation of novel gene isoforms and polymorphisms. We apply our subpartitioning treatment to the UniGene gene indexing project to measure a marked increase in information quality and abundance (in terms of assembly length and insertion/deletion error) after treatment and demonstrate cases where new levels of information concerning differential expression of alternate gene forms, such as regulated alternative splicing, are discovered.
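The abstract does not state which dissimilarity measure or clustering rule is used to split an index class. The following is only a minimal sketch of the general idea, assuming single-linkage clustering (via union-find) over a k-mer Jaccard distance; `dissimilarity`, `subpartition`, `tissue_profile`, `threshold` and `k` are hypothetical names and parameters chosen for illustration, not the authors' method. It also shows why counting ESTs per subclass, rather than per whole index class, can expose a tissue-specific gene form that pooled counts would hide.

```python
from collections import Counter
from itertools import combinations


def kmers(seq: str, k: int = 4) -> set[str]:
    """Set of overlapping k-mers in seq (assumed similarity feature)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def dissimilarity(a: str, b: str, k: int = 4) -> float:
    """1 - Jaccard similarity of k-mer sets; a stand-in, not the paper's measure."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def subpartition(index_class: dict[str, str], threshold: float = 0.5,
                 k: int = 4) -> list[list[str]]:
    """Split one index class into subclasses by single linkage: two sequences
    share a subclass if a chain of pairs, each below threshold, connects them."""
    ids = sorted(index_class)
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if dissimilarity(index_class[a], index_class[b], k) < threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def tissue_profile(subclasses: list[list[str]],
                   tissue_of: dict[str, str]) -> list[Counter]:
    """Count source tissues separately for each subclass."""
    return [Counter(tissue_of[i] for i in sub) for sub in subclasses]


if __name__ == "__main__":
    # Hypothetical index class: two near-identical ESTs plus one divergent one.
    cls = {
        "est1": "ACGTACGTGGCCTTAA",
        "est2": "ACGTACGTGGCCTTAG",
        "est3": "TTTTGGGGCCCCAAAA",
    }
    subs = subpartition(cls)
    print(subs)  # [['est1', 'est2'], ['est3']]
    print(tissue_profile(subs, {"est1": "brain", "est2": "brain", "est3": "liver"}))
```

Single linkage is used here only because it is the simplest rule that yields disjoint subclasses; a chimeric EST can bridge two otherwise separate groups under single linkage, so a real treatment of chimerism would need a stricter linkage or an explicit chimera check.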
Pages: 276-290
Page count: 15