Machine learning approaches for the prediction of signal peptides and other protein sorting signals

Cited by: 507
Authors
Nielsen, H [1]
Brunak, S
von Heijne, G
Affiliations
[1] Tech Univ Denmark, Dept Biotechnol, Ctr Biol Sequence Anal, DK-2800 Lyngby, Denmark
[2] Univ Stockholm, Arrhenius Lab, Dept Biochem, S-10691 Stockholm, Sweden
Source
PROTEIN ENGINEERING | 1999 / Vol. 12 / No. 1
DOI
10.1093/protein/12.1.3
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Prediction of protein sorting signals from the sequence of amino acids has great importance in the field of proteomics today. Recently, the growth of protein databases, combined with machine learning approaches such as neural networks and hidden Markov models, has made it possible to achieve a level of reliability where practical use in, for example, automatic database annotation is feasible. In this review we concentrate on the present status and future perspectives of SignalP, our neural network-based method for prediction of the most well-known sorting signal: the secretory signal peptide. We discuss the problems associated with the use of SignalP on genomic sequences, showing that signal peptide prediction will improve further if integrated with predictions of start codons and transmembrane helices. As a step towards this goal, a hidden Markov model version of SignalP has been developed, making it possible to discriminate between cleaved signal peptides and uncleaved signal anchors. Furthermore, we show how SignalP can be used to characterize putative signal peptides from an archaeon, Methanococcus jannaschii. Finally, we briefly review a few methods for predicting other protein sorting signals and discuss the future of protein sorting prediction in general.
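The abstract does not describe SignalP's network architecture. As a generic illustration only, window-based neural network predictors commonly present each residue to the network as a fixed-length window of its neighbours, with every position one-hot ("sparse") encoded. The sketch below is a minimal example under stated assumptions: a 21-symbol alphabet (20 standard amino acids plus a padding symbol), a hypothetical `half_width` of 5, and hypothetical helper names `one_hot` and `encode_windows`. It is not SignalP's actual encoding or parameter set.

```python
# Hypothetical illustration: one-hot ("sparse") encoding of sliding sequence
# windows, a common input scheme for window-based neural network predictors.
# Alphabet handling and window size are assumptions, not SignalP's settings.
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"                              # marks positions outside the sequence
ALPHABET = AMINO_ACIDS + PAD           # 21 symbols -> 21 inputs per position
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot(symbol: str) -> List[int]:
    """Return a 21-element vector with a single 1 for the given symbol."""
    vec = [0] * len(ALPHABET)
    # Assumption: non-standard residues (e.g. 'X') are treated as padding.
    vec[INDEX.get(symbol, INDEX[PAD])] = 1
    return vec


def encode_windows(sequence: str, half_width: int = 5) -> List[List[int]]:
    """Encode one window per residue, centred on that residue.

    Each window covers 2 * half_width + 1 positions; positions beyond the
    termini are filled with the padding symbol.
    """
    padded = PAD * half_width + sequence.upper() + PAD * half_width
    windows = []
    for centre in range(len(sequence)):
        # Residue `centre` sits at padded index centre + half_width,
        # so this slice is symmetric around it.
        window = padded[centre: centre + 2 * half_width + 1]
        flat: List[int] = []
        for symbol in window:
            flat.extend(one_hot(symbol))
        windows.append(flat)
    return windows


if __name__ == "__main__":
    seq = "MKKTAIAIAVALAGFATVAQA"  # illustrative N-terminal sequence
    X = encode_windows(seq, half_width=5)
    print(len(X), len(X[0]))  # -> 21 231
```

Each residue yields one input vector of length (2 × half_width + 1) × 21; a trained network would then assign every window a score, for example for whether its central residue lies within a signal peptide.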
Pages: 3–9
Page count: 7