Mining functional information associated with expression arrays

Cited by: 50
Authors
Blaschke C. [1 ]
Oliveros J.C. [1 ]
Valencia A. [1 ]
Affiliation
[1] Protein Design Group, National Center for Biotechnology, CNB-CSIC, Madrid 28049, Cantoblanco
Keywords
DNA chips; Expression arrays; Protein function; Text analysis; Information extraction
DOI
10.1007/s101420000036
Abstract
Deciphering the networks of interactions between molecules in biological systems has gained momentum with the monitoring of gene expression patterns at the genomic scale. Expression array experiments provide vast amounts of experimental data about these networks, the analysis of which requires new computational methods. In particular, issues related to the extraction of biological information are key for the end users. We propose here a strategy, implemented in a system called GEISHA (gene expression information system for human analysis), that detects biological terms significantly associated with different gene expression clusters by mining collections of Medline abstracts. GEISHA is based on a comparison of the frequency of abstracts linked to different gene clusters and containing a given term. Interpretation by the end user of the biological meaning of the terms is facilitated by embedding them in the corresponding significant sentences and abstracts and by establishing relations with other, equally significant terms. The information provided by GEISHA for the available yeast expression data compares favorably with the functional annotations provided by human experts, demonstrating the potential value of GEISHA as an assistant for the analysis of expression array experiments.
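The frequency comparison described in the abstract can be illustrated with a minimal sketch. It assumes that a term's frequency in a cluster is the fraction of that cluster's abstracts containing the term, and that significance is measured as a z-score against the other clusters. The abstract does not specify the statistic, so the z-score, the function names (`term_frequencies`, `z_scores`), and the toy abstracts are all hypothetical and are not taken from GEISHA itself.

```python
from statistics import mean, pstdev

def term_frequencies(clusters, term):
    """Fraction of abstracts in each cluster that contain `term` as a word."""
    freqs = {}
    for name, abstracts in clusters.items():
        hits = sum(1 for text in abstracts if term.lower() in text.lower().split())
        freqs[name] = hits / len(abstracts) if abstracts else 0.0
    return freqs

def z_scores(freqs):
    """Z-score of each cluster's frequency relative to all clusters (assumed statistic)."""
    mu = mean(freqs.values())
    sigma = pstdev(freqs.values())
    if sigma == 0:
        # The term is equally frequent everywhere, so it distinguishes no cluster.
        return {name: 0.0 for name in freqs}
    return {name: (f - mu) / sigma for name, f in freqs.items()}

# Toy data (hypothetical): each cluster maps to the abstracts linked to its genes.
clusters = {
    "cluster_a": ["ribosomal protein synthesis",
                  "ribosomal subunit assembly",
                  "heat shock response"],
    "cluster_b": ["cell cycle control",
                  "dna replication origin",
                  "mitotic spindle"],
    "cluster_c": ["glucose transport",
                  "ribosomal rna processing",
                  "amino acid biosynthesis"],
}

freqs = term_frequencies(clusters, "ribosomal")
scores = z_scores(freqs)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: frequency={freqs[name]:.2f}, z={scores[name]:.2f}")
```

In this toy setting, a term with a high positive score in one cluster would be a candidate for describing that cluster; a real system would also need tokenization, stop-word handling, and a threshold chosen for the data at hand.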
Pages: 256-268
Number of pages: 12
Related papers
47 in total
  • [1] Alizadeh A.A., Eisen M.B., et al., Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature, 403, pp. 503-511, (2000)
  • [2] Andrade M.A., Valencia A., Automatic annotation for biological sequences by extraction of keywords from MEDLINE abstracts, pp. 25-32, (1997)
  • [3] Andrade M.A., Valencia A., Automatic extraction of keywords from scientific text: Application to the knowledge domain of protein families, Bioinformatics, 14, pp. 600-607, (1998)
  • [4] Bairoch A., Apweiler R., The SWISS-PROT protein sequence data bank and its supplement TREMBL, Nucleic Acids Res, 25, pp. 31-36, (1997)
  • [5] Bassett D.E., Eisen M.B., Boguski M.S., Gene expression informatics - It's all in your mine, Nat Genet, 21, pp. 51-55, (1999)
  • [6] Blaschke C., Andrade A.M., Ouzounis C., Valencia A., Automatic extraction of biological information from scientific text: Protein-protein interactions, pp. 60-67, (1999)
  • [7] Bookstein A., Kraft D., Operations research applied to document indexing and retrieval decisions, J Assoc Comput Mach, 24, pp. 418-427, (1977)
  • [8] Bookstein A., Klein S.T., Raita T., Clumping properties of content-bearing words, J Am Soc Inf Sci, 49, pp. 102-114, (1998)
  • [9] Carr D.B., Somogyi R., Michaels G., Templates for looking at gene expression clustering, Stat Comput Graphics Newsl, 8, pp. 20-29, (1997)
  • [10] Cho R.J., Campbell M.J., Winzeler E.A., Steinmetz L., Conway A., Wodicka L., Wolfsberg T.G., Gabrielian A.E., Landsman D., Lockhart D.J., Davis R.W., A genome-wide transcriptional analysis of the mitotic cell cycle, Mol Cell, 2, pp. 65-73, (1998)