PROF_PAT 1.3: Updated database of patterns used to detect local similarities

Cited by: 9
Authors
Bachinsky, AG
Frolov, AS
Naumochkin, AN
Nizolenko, LP
Yarigin, AA
Affiliations
[1] Res Inst Mol Biol, Dept Theoret, Koltsov 630559, Novosibirsk Reg, Russia
[2] Russian Acad Sci, Inst Cytol & Genet, Novosibirsk 630090, Russia
DOI
10.1093/bioinformatics/16.4.358
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: When analysing novel protein sequences, it is now essential to extend search strategies to include a range of 'secondary' databases. Pattern databases have become vital tools for identifying distant relationships between sequences, and hence for predicting protein function and structure. The main drawback of such methods is that the trial samples available when the patterns were constructed represented relatively few proteins. A negative result from comparing an amino acid sequence with such a databank therefore forces the researcher to search for similarities in the original protein banks. We developed a database of patterns constructed for groups of related proteins, with the maximum representation of SWISS-PROT amino acid sequences in the groups.
Results: Software tools and a new method have been designed to construct patterns of protein families. Using this method, a new version of the databank of protein family patterns, PROF_PAT 1.3, has been produced. This bank is based on SWISS-PROT (r1.38) and TrEMBL (r1.11), and contains patterns of more than 13 000 groups of related proteins in a format similar to that of PROSITE. The pattern motifs selected were those with the lowest probability of being found in random sequences. A flexible, fast search program accompanies the bank. The researcher can specify a similarity matrix (of the PAM, BLOSUM or another type). Variable levels of similarity can be set, permitting search strategies that range from exact matches to increasing levels of 'fuzziness'.
Pages: 358 - 366
Number of pages: 9
Related papers
19 in total
  • [1] EFFICIENT STRING MATCHING - AID TO BIBLIOGRAPHIC SEARCH
    AHO, AV
    CORASICK, MJ
    [J]. COMMUNICATIONS OF THE ACM, 1975, 18 (06) : 333 - 340
  • [2] [Anonymous], 1972, ATLAS PROTEIN SEQUEN
  • [3] PRINTS prepares for the new millennium
    Attwood, TK
    Flower, DR
    Lewis, AP
    Mabey, JE
    Morgan, SR
    Scordis, P
    Selley, JN
    Wright, W
    [J]. NUCLEIC ACIDS RESEARCH, 1999, 27 (01) : 220 - 225
  • [4] Bachinskii A. G., 1996, Molekulyarnaya Biologiya (Moscow), V30, P1409
  • [5] Bachinsky AG, 1997, COMPUT APPL BIOSCI, V13, P115
  • [6] The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999
    Bairoch, A
    Apweiler, R
    [J]. NUCLEIC ACIDS RESEARCH, 1999, 27 (01) : 49 - 54
  • [7] The PIR-International Protein Sequence Database
    Barker, WC
    Garavelli, JS
    McGarvey, PB
    Marzec, CR
    Orcutt, BC
    Srinivasarao, GY
    Yeh, LSL
    Ledley, RS
    Mewes, HW
    Pfeiffer, F
    Tsugita, A
    Wu, C
    [J]. NUCLEIC ACIDS RESEARCH, 1999, 27 (01) : 39 - 43
  • [8] Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins
    Bateman, A
    Birney, E
    Durbin, R
    Eddy, SR
    Finn, RD
    Sonnhammer, ELL
    [J]. NUCLEIC ACIDS RESEARCH, 1999, 27 (01) : 260 - 262
  • [9] New features of the blocks database servers
    Henikoff, JG
    Henikoff, S
    Pietrokovski, S
    [J]. NUCLEIC ACIDS RESEARCH, 1999, 27 (01) : 226 - 228
  • [10] AUTOMATED ASSEMBLY OF PROTEIN BLOCKS FOR DATABASE SEARCHING
    HENIKOFF, S
    HENIKOFF, JG
    [J]. NUCLEIC ACIDS RESEARCH, 1991, 19 (23) : 6565 - 6572