Automatic prediction of protein domains from sequence information using a hybrid learning system

被引:56
作者
Nagarajan, N [1 ]
Yona, G [1 ]
机构
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
基金
美国国家科学基金会;
关键词
D O I
10.1093/bioinformatics/bth086
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: We describe a novel method for detecting the domain structure of a protein from sequence information alone. The method is based on analyzing multiple sequence alignments that are derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence and are combined into a single predictor using a neural network. The output is further smoothed and post-processed using a probabilistic model to predict the most likely transition positions between domains. Results: The method was assessed using the domain definitions in SCOP and CATH for proteins of known structure and was compared with several other existing methods. Our method performs well both in terms of accuracy and sensitivity. It improves significantly over the best methods available, even some of the semi-manual ones, while being fully automatic. Our method can also be used to suggest and verify domain partitions based on structural data. A few examples of predicted domain definitions and alternative partitions, as suggested by our method, are also discussed.
引用
收藏
页码:1335 / 1360
页数:26
相关论文
共 51 条
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]  
[Anonymous], 1997, PROBABILISTIC REASON
[3]   The InterPro database, an integrated documentation resource for protein families, domains and functional sites [J].
Apweiler, R ;
Attwood, TK ;
Bairoch, A ;
Bateman, A ;
Birney, E ;
Biswas, M ;
Bucher, P ;
Cerutti, T ;
Corpet, F ;
Croning, MDR ;
Durbin, R ;
Falquet, L ;
Fleischmann, W ;
Gouzy, J ;
Hermjakob, H ;
Hulo, N ;
Jonassen, I ;
Kahn, D ;
Kanapin, A ;
Karavidopoulou, Y ;
Lopez, R ;
Marx, B ;
Mulder, NJ ;
Oinn, TM ;
Pagni, M ;
Servant, F ;
Sigrist, CJA ;
Zdobnov, EM .
NUCLEIC ACIDS RESEARCH, 2001, 29 (01) :37-40
[4]   The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999 [J].
Bairoch, A ;
Apweiler, R .
NUCLEIC ACIDS RESEARCH, 1999, 27 (01) :49-54
[5]   Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins [J].
Bateman, A ;
Birney, E ;
Durbin, R ;
Eddy, SR ;
Finn, RD ;
Sonnhammer, ELL .
NUCLEIC ACIDS RESEARCH, 1999, 27 (01) :260-262
[6]   DEVELOPMENT OF HYDROPHOBICITY PARAMETERS TO ANALYZE PROTEINS WHICH BEAR POSTTRANSLATIONAL OR COTRANSLATIONAL MODIFICATIONS [J].
BLACK, SD ;
MOULD, DR .
ANALYTICAL BIOCHEMISTRY, 1991, 193 (01) :72-82
[7]  
CSISZAR I, 1997, INFORMATION THEORETI
[8]   STRONG LIMIT-THEOREMS OF EMPIRICAL FUNCTIONALS FOR LARGE EXCEEDANCES OF PARTIAL-SUMS OF IID VARIABLES [J].
DEMBO, A ;
KARLIN, S .
ANNALS OF PROBABILITY, 1991, 19 (04) :1737-1755
[9]  
FERRAN EA, 1994, PROTEIN SCI, V3, P507
[10]   The PIR-International protein sequence database [J].
George, DG ;
Barker, WC ;
Mewes, HW ;
Pfeiffer, F ;
Tsugita, A .
NUCLEIC ACIDS RESEARCH, 1996, 24 (01) :17-20