Prediction of kinase-specific phosphorylation sites using conditional random fields

Cited by: 62
Authors
Dang, Thanh Hai [1]
Van Leemput, Koenraad [2]
Verschoren, Alain [1]
Laukens, Kris [1]
Affiliations
[1] Intelligent Syst Lab, B-2020 Antwerp, Belgium
[2] Adv Database Res & Modelling, Dept Math & Comp Sci, B-2020 Antwerp, Belgium
DOI
10.1093/bioinformatics/btn546
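Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry]
Abstract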
Motivation: Phosphorylation is a crucial post-translational protein modification mechanism with important regulatory functions in biological systems. It is catalyzed by a group of enzymes called kinases, each of which recognizes certain target sites in its substrate proteins. Several authors have built computational models, trained on sets of experimentally validated phosphorylation sites, to predict these target sites for a given kinase. All of these models have limitations; in particular, they do not take into account, in a global fashion, the dependencies between amino acid motifs within protein sequences.
Results: We propose a novel approach to predict phosphorylation sites from the protein sequence. The method uses a positive dataset to train a conditional random field (CRF) model. The negative training dataset is used to set the decision threshold corresponding to a desired false positive rate. Application of the method to experimentally verified benchmark phosphorylation data (Phospho.ELM) shows that it performs well compared to existing methods for most kinases. This is, to our knowledge, the first report of the use of CRFs to predict post-translational modification sites in protein sequences.
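The abstract states that the negative dataset is used only to fix a decision threshold for a desired false positive rate. The following is a minimal sketch of that single step, not the authors' implementation. It assumes that a trained model has already assigned each negative example a numeric score where higher means "more likely a phosphorylation site"; the function names `threshold_for_fpr` and `predict`, and the example scores, are hypothetical.

```python
import math


def threshold_for_fpr(negative_scores, target_fpr):
    """Pick a threshold so that at most `target_fpr` of the negatives score above it.

    Assumption: scores are model outputs on known non-sites, higher = more site-like.
    """
    if not negative_scores:
        raise ValueError("need at least one negative score")
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must lie in [0, 1]")

    ranked = sorted(negative_scores, reverse=True)
    # Number of negatives we tolerate being called positive.
    allowed = math.floor(target_fpr * len(ranked))
    if allowed >= len(ranked):
        # Every negative may be called positive, so any score passes.
        return float("-inf")
    # The first negative that must be rejected defines the threshold;
    # only scores strictly above it are predicted positive.
    return ranked[allowed]


def predict(score, threshold):
    """Call a candidate site positive if its score strictly exceeds the threshold."""
    return score > threshold


if __name__ == "__main__":
    # Hypothetical scores for ten negative (non-phosphorylated) sites.
    negatives = [0.1, 0.9, 0.3, 0.8, 0.5, 0.7, 0.2, 0.4, 0.6, 0.0]
    t = threshold_for_fpr(negatives, target_fpr=0.2)
    false_positives = sum(predict(s, t) for s in negatives)
    print(f"threshold={t}, false positives={false_positives}/{len(negatives)}")
```

Using a strict comparison keeps the realized false positive rate at or below the target even when several negatives share the threshold score.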
Pages: 2857-2864
Page count: 8