The quest to deduce protein function from sequence: the role of pattern databases

Cited by: 30
Author
Attwood, TK [1]
Affiliation
[1] Univ Manchester, Sch Biol Sci, Manchester M13 9PT, Lancs, England
Keywords
bioinformatics; similarity search; sequence alignment; pattern recognition; function annotation
DOI
10.1016/S1357-2725(99)00106-5
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
In the wake of the numerous now-fruitful genome projects, we have witnessed a 'tsunami' of sequence data and with it the birth of the field of bioinformatics. Bioinformatics involves the application of information technology to the management and analysis of biological data. For many of us, this means that databases and their search tools have become an essential part of the research environment. However, the rate of sequence generation and the haphazard proliferation of databases have made it difficult to keep pace with developments, even for the cognoscenti. Moreover, increasing amounts of sequence information do not necessarily equate with an increase in knowledge, and in the panic to automate the route from raw data to biological insight, we may be generating and propagating innumerable errors in our precious databases. In the genome era upon us, researchers want rapid, easy-to-use, reliable tools for functional characterisation of newly determined sequences. For the pharmaceutical industry in particular, the Pandora's box of bioinformatics harbours an information-rich nugget, ripe with potential drug targets and possible new avenues for the development of therapeutic agents. This review outlines the current status of the major pattern databases now used routinely in the analysis of protein sequences. The review is divided into three main sections. In the first, commonly used terms are defined and the methods behind the databases are briefly described; in the second, the structure and content of the principal pattern databases are discussed; and in the final part, several alignment databases, which are frequently confused with pattern databases, are mentioned. For the newcomer, the array of resources, the range of methods behind them and the different tools required to search them can be confusing. The review therefore also briefly mentions a current international endeavour to integrate the diverse databases, which effort should facilitate sequence analysis in the future. This is particularly important for target-discovery programmes, where the challenge is to rationalise the enormous numbers of potential targets generated by sequence database searches. This problem may be addressed, at least in part, by reducing search outputs to the more focused and manageable subsets suggested by searches of integrated groups of family-specific pattern databases. (C) 2000 Elsevier Science Ltd. All rights reserved.
Pages: 139-155
Number of pages: 17