Extraction of transcript diversity from scientific literature

Cited by: 23
Authors
Shah, PK
Jensen, LJ
Boué, S
Bork, P [1]
Affiliations
[1] European Mol Biol Lab, Struct & Computat Biol Program, Heidelberg, Germany
[2] Max Delbruck Ctr Mol Med, Berlin, Germany
DOI
10.1371/journal.pcbi.0010010
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Transcript diversity generated by alternative splicing and associated mechanisms contributes heavily to the functional complexity of biological systems. The numerous examples of the mechanisms and functional implications of these events are scattered throughout the scientific literature. Thus, it is crucial to have a tool that can automatically extract the relevant facts and collect them in a knowledge base that can aid the interpretation of data from high-throughput methods. We have developed and applied a composite text-mining method for extracting information on transcript diversity from the entire MEDLINE database in order to create a database of genes with alternative transcripts. It contains information on tissue specificity, number of isoforms, causative mechanisms, functional implications, and experimental methods used for detection. We have mined this resource to identify 959 instances of tissue-specific splicing. Our results in combination with those from EST-based methods suggest that alternative splicing is the preferred mechanism for generating transcript diversity in the nervous system. We provide new annotations for 1,860 genes with the potential for generating transcript diversity. We assign the MeSH term "alternative splicing" to 1,536 additional abstracts in the MEDLINE database and suggest new MeSH terms for other events. We have successfully extracted information about transcript diversity and semiautomatically generated a database, LSAT, that can provide a quantitative understanding of the mechanisms behind tissue-specific gene expression. LSAT (Literature Support for Alternative Transcripts) is publicly available at http://www.bork.embl.de/LSAT/.
Pages: 67 - 73
Number of pages: 7
Related papers
49 in total
  • [1] Automated extraction of information in molecular biology
    Andrade, MA
    Bork, P
    [J]. FEBS LETTERS, 2000, 476 (1-2) : 12 - 17
  • [2] The ENZYME database in 2000
    Bairoch, A
    [J]. NUCLEIC ACIDS RESEARCH, 2000, 28 (01) : 304 - 305
  • [3] GenBank: update
    Benson, DA
    Karsch-Mizrachi, I
    Lipman, DJ
    Ostell, J
    Wheeler, DL
    [J]. NUCLEIC ACIDS RESEARCH, 2004, 32 : D23 - D26
  • [4] An overview of Ensembl
    Birney, E
    Andrews, TD
    Bevan, P
    Caccamo, M
    Chen, Y
    Clarke, L
    Coates, G
    Cuff, J
    Curwen, V
    Cutts, T
    Down, T
    Eyras, E
    Fernandez-Suarez, XM
    Gane, P
    Gibbins, B
    Gilbert, J
    Hammond, M
    Hotz, HR
    Iyer, V
    Jekosch, K
    Kahari, A
    Kasprzyk, A
    Keefe, D
    Keenan, S
    Lehvaslaiho, H
    McVicker, G
    Melsopp, C
    Meidl, P
    Mongin, E
    Pettett, R
    Potter, S
    Proctor, G
    Rae, M
    Searle, S
    Slater, G
    Smedley, D
    Smith, J
    Spooner, W
    Stabenau, A
    Stalker, J
    Storey, R
    Ureta-Vidal, A
    Woodwark, KC
    Cameron, G
    Durbin, R
    Cox, A
    Hubbard, T
    Clamp, M
    [J]. GENOME RESEARCH, 2004, 14 (05) : 925 - 928
  • [5] Ensembl 2004
    Birney, E
    Andrews, D
    Bevan, P
    Caccamo, M
    Cameron, G
    Chen, Y
    Clarke, L
    Coates, G
    Cox, T
    Cuff, J
    Curwen, V
    Cutts, T
    Down, T
    Durbin, R
    Eyras, E
    Fernandez-Suarez, XM
    Gane, P
    Gibbins, B
    Gilbert, J
    Hammond, M
    Hotz, H
    Iyer, V
    Kahari, A
    Jekosch, K
    Kasprzyk, A
    Keefe, D
    Keenan, S
    Lehvaslaiho, H
    McVicker, G
    Melsopp, C
    Meidl, P
    Mongin, E
    Pettett, R
    Potter, S
    Proctor, G
    Rae, M
    Searle, S
    Slater, G
    Smedley, D
    Smith, J
    Spooner, W
    Stabenau, A
    Stalker, J
    Storey, R
    Ureta-Vidal, A
    Woodwark, C
    Clamp, M
    Hubbard, T
    [J]. NUCLEIC ACIDS RESEARCH, 2004, 32 : D468 - D470
  • [6] Mechanisms of alternative pre-messenger RNA splicing
    Black, DL
    [J]. ANNUAL REVIEW OF BIOCHEMISTRY, 2003, 72 : 291 - 336
  • [7] Alternative splicing and evolution
    Boue, S
    Letunic, I
    Bork, P
    [J]. BIOESSAYS, 2003, 25 (11) : 1031 - 1034
  • [8] Alternative splicing and genome complexity
    Brett, D
    Pospisil, H
    Valcárcel, J
    Reich, J
    Bork, P
    [J]. NATURE GENETICS, 2002, 30 (01) : 29 - 30
  • [9] Cristianini N., 2000, Intelligent Data Analysis: An Introduction, DOI 10.1017/CBO9780511801389
  • [10] Getting to the (c)ore of knowledge: mining biomedical literature
    de Bruijn, B
    Martin, J
    [J]. INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2002, 67 (1-3) : 7 - 18