Investigation of protein functions through data-mining on integrated human transcriptome database, H-Invitational database (H-InvDB)

Cited by: 17
Authors
Yamasaki, C
Koyanagi, KO
Fujii, Y
Itoh, T
Barrero, R
Tamura, T
Yamaguchi-Kabata, Y
Tanino, M
Takeda, J
Fukuchi, S
Miyazaki, S
Nomura, N
Sugano, S
Imanishi, T
Gojobori, T
Affiliations
[1] Natl Inst Adv Sci & Technol, Biol Informat Res Ctr, Koto Ku, Tokyo 1350064, Japan
[2] Hokkaido Univ, Sapporo, Hokkaido 060, Japan
[3] Japan Biol Informat Consortium, Japan Biol Informat Res Ctr, Tokyo, Japan
[4] Natl Inst Agrobiol Sci, Ibaraki, Japan
[5] Natl Inst Genet, Ctr Informat Biol, Shizuoka, Japan
[6] Natl Inst Genet, DNA Data Bank Japan, Shizuoka, Japan
[7] BITS Co Ltd, Shizuoka, Japan
[8] Tokyo Univ Sci, Chiba, Japan
[9] Univ Tokyo, Tokyo, Japan
[10] Grad Univ Adv Studies, Dept Genet, Shizuoka, Japan
Keywords
protein function; human transcriptome; integrated annotation database; data-mining
DOI
10.1016/j.gene.2005.05.036
Chinese Library Classification number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
H-Invitational Database (H-InvDB; http://www.h-invitational.jp/) is a human transcriptome database containing integrative annotation of 41,118 full-length cDNA clones originating from 21,037 loci. H-InvDB is a product of the H-Invitational project, an international collaboration to systematically and functionally validate human genes by analysis of a unique set of high-quality full-length cDNA clones using automatic annotation and human curation under unified criteria. Here, 19,574 proteins encoded by these cDNAs were classified into 11,709 function-known and 7,865 function-unknown hypothetical proteins by similarity with protein databases and motif prediction (InterProScan). The proportion of "hypothetical proteins" in H-InvDB was as high as 40.4%. In this study, we thus conducted data-mining in H-InvDB with the aim of assigning advanced functional annotations to those hypothetical proteins. First, by data-mining in the H-InvDB version of GTOP, we identified 337 SCOP domains within the 7,865 H-Inv hypothetical proteins. Second, by data-mining of subcellular localization predicted by SOSUI and TMHMM in H-InvDB, we found 1,032 transmembrane proteins among the H-Inv hypothetical proteins. These results clearly demonstrate that structural prediction is effective for functional annotation of proteins with unknown functions. All the data in H-InvDB are shown in two main views, the cDNA view and the Locus view, and five auxiliary databases with web-based viewers: DiseaseInfo Viewer, H-ANGEL, Clustering Viewer, G-integra and TOPO Viewer. The data are also provided as flat files and XML files. The data consist of descriptions of gene structures, novel alternative splicing isoforms, functional RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs in relation to orphan diseases, gene expression profiling, and comparisons with mouse full-length cDNAs in the context of molecular evolution. This unique integrative platform for conducting in silico data-mining represents a substantial contribution to resources required for the exploration of human biology and pathology. © 2005 Elsevier B.V. All rights reserved.
Pages: 99-107
Number of pages: 9