PSSM-based prediction of DNA binding sites in proteins

被引:199
作者
Ahmad, S [1 ]
Sarai, A
机构
[1] Kyushu Inst Technol, Dept Bioinformat & Biosci, Iizuka, Fukuoka 8208502, Japan
[2] Jamia Millia Islamia, Dept Biosci, New Delhi 110025, India
关键词
D O I
10.1186/1471-2105-6-33
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Background: Detection of DNA-binding sites in proteins is of enormous interest for technologies targeting gene regulation and manipulation. We have previously shown that a residue and its sequence neighbor information can be used to predict DNA-binding candidates in a protein sequence. This sequence-based prediction method is applicable even if no sequence homology with a previously known DNA-binding protein is observed. Here we implement a neural network based algorithm to utilize evolutionary information of amino acid sequences in terms of their position specific scoring matrices (PSSMs) for a better prediction of DNA-binding sites. Results: An average of sensitivity and specificity using PSSMs is up to 8.7% better than the prediction with sequence information only. Much smaller data sets could be used to generate PSSM with minimal loss of prediction accuracy. Conclusion: One problem in using PSSM-derived prediction is obtaining lengthy and time-consuming alignments against large sequence databases. In order to speed up the process of generating PSSMs, we tried to use different reference data sets ( sequence space) against which a target protein is scanned for PSI-BLAST iterations. We find that a very small set of proteins can actually be used as such a reference data without losing much of the prediction value. This makes the process of generating PSSMs very rapid and even amenable to be used at a genome level. A web server has been developed to provide these predictions of DNA-binding sites for any new protein from its amino acid sequence.
引用
收藏
页数:6
相关论文
共 16 条
[1]   Accurate prediction of solvent accessibility using neural networks-based regression [J].
Adamczak, R ;
Porollo, A ;
Meller, J .
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2004, 56 (04) :753-767
[2]   Moment-based prediction of DNA-binding proteins [J].
Ahmad, S ;
Sarai, A .
JOURNAL OF MOLECULAR BIOLOGY, 2004, 341 (01) :65-71
[3]   Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information [J].
Ahmad, S ;
Gromiha, MM ;
Sarai, A .
BIOINFORMATICS, 2004, 20 (04) :477-486
[4]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[5]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[6]   Protein sequence databases [J].
Apweiler, R ;
Bairoch, A ;
Wu, CH .
CURRENT OPINION IN CHEMICAL BIOLOGY, 2004, 8 (01) :76-80
[7]   The Protein Data Bank [J].
Berman, HM ;
Westbrook, J ;
Feng, Z ;
Gilliland, G ;
Bhat, TN ;
Weissig, H ;
Shindyalov, IN ;
Bourne, PE .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :235-242
[8]  
Cuff JA, 2000, PROTEINS, V40, P502, DOI 10.1002/1097-0134(20000815)40:3<502::AID-PROT170>3.0.CO
[9]  
2-Q
[10]   Protein secondary structure prediction based on position-specific scoring matrices [J].
Jones, DT .
JOURNAL OF MOLECULAR BIOLOGY, 1999, 292 (02) :195-202