Prediction of protein interdomain linker regions by a hidden Markov model

Cited by: 13
Authors
Bae, KW
Mallick, BK
Elsik, CG [1]
Affiliations
[1] Texas A&M Univ, Dept Anim Sci, College Stn, TX 77843 USA
[2] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
[3] Texas A&M Univ, Intercollegiate Fac Genet, College Stn, TX 77843 USA
DOI
10.1093/bioinformatics/bti363
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Our aim was to predict protein interdomain linker regions from sequence alone, without requiring known homology. Identifying linker regions delineates domain boundaries and can be used to computationally dissect proteins into domains before clustering them into families. We developed a hidden Markov model of linker/non-linker sequence regions using a linker index derived from amino acid propensity. We estimated the model with an efficient Bayesian approach based on Markov chain Monte Carlo, specifically Gibbs sampling, to simulate parameters from their posterior distributions. Our model treats the sequence data as continuous rather than categorical and generates a probabilistic output.
Results: We applied our method to a dataset of protein sequences in which domains and interdomain linkers had been delineated using the Pfam-A database. The prediction results are superior to those of a simpler method that also uses the linker index.
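The idea of a two-state (linker/non-linker) hidden Markov model over a continuous per-residue score can be sketched as follows. This is only a minimal illustration, not the authors' implementation: it assumes Gaussian emissions, uses fixed made-up parameters instead of estimating them by Gibbs sampling, and the scores, means, standard deviations, transition matrix, and the function name `linker_posterior` are all hypothetical. It computes the posterior probability of the linker state at each position with the standard scaled forward-backward recursion.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def linker_posterior(scores, means, sds, trans, init):
    """Return P(state = 1 | all scores) for each position.

    State 0 is "non-linker" and state 1 is "linker" (labels are an
    assumption of this sketch). Uses scaled forward-backward.
    """
    n, k = len(scores), 2
    # Emission likelihood of each score under each state.
    emit = [[gaussian_pdf(x, means[s], sds[s]) for s in range(k)]
            for x in scores]

    # Forward pass; each alpha[t] is normalised, scale[t] keeps the factor.
    alpha, scale = [], []
    cur = [init[s] * emit[0][s] for s in range(k)]
    for t in range(n):
        if t > 0:
            cur = [emit[t][s] * sum(alpha[t - 1][r] * trans[r][s]
                                    for r in range(k))
                   for s in range(k)]
        c = sum(cur)
        alpha.append([p / c for p in cur])
        scale.append(c)

    # Backward pass, using the same scaling factors.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [sum(trans[s][r] * emit[t + 1][r] * beta[t + 1][r]
                       for r in range(k)) / scale[t + 1]
                   for s in range(k)]

    # Combine and renormalise to get the per-position state posterior.
    post = []
    for t in range(n):
        w = [alpha[t][s] * beta[t][s] for s in range(k)]
        post.append(w[1] / sum(w))
    return post


# Toy, made-up scores: a short stretch in the middle sits near the
# hypothetical "linker" mean. Sign convention and values are illustrative.
scores = [0.1, 0.0, -0.1, 0.2, 1.1, 0.9, 1.0, 1.2, 0.1, -0.2, 0.0]
means = [0.0, 1.0]                  # hypothetical state means
sds = [0.3, 0.3]                    # hypothetical state standard deviations
trans = [[0.9, 0.1], [0.2, 0.8]]    # hypothetical transition probabilities
init = [0.9, 0.1]                   # hypothetical initial distribution

posterior = linker_posterior(scores, means, sds, trans, init)
predicted = [p > 0.5 for p in posterior]
```

Thresholding the posterior at 0.5 is just one possible decision rule. Because the output is a per-position probability, a stricter or looser cutoff could be chosen depending on how many false linker calls are acceptable.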
Pages: 2264-2270
Number of pages: 7