Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information

Cited by: 289
Authors
Ahmad, S [1]
Gromiha, MM
Sarai, A
Affiliations
[1] Kyushu Inst Technol, Dept Biochem Sci & Engn, Iizuka, Fukuoka 8208502, Japan
[2] Jamia Millia Islamia, Dept Biosci, New Delhi 110025, India
[3] AIST, Computat Biol Res Ctr, CBRC, Koto Ku, Tokyo 1350064, Japan
DOI
10.1093/bioinformatics/btg432
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Though vitally important to cell function, the mechanism of protein-DNA binding is not yet completely understood. We therefore analysed the relationship between DNA binding and protein sequence composition, solvent accessibility and secondary structure. Using non-redundant databases of transcription factors and protein-DNA complexes, neural network models were developed to exploit this relationship to predict DNA-binding proteins and their binding residues.
Results: Sequence composition was found to provide sufficient information to predict the probability that a protein binds DNA, with nearly 69% sensitivity at 64% accuracy for the proteins considered; sequence neighbourhood and solvent accessibility information were sufficient to predict binding sites with 40% sensitivity at 79% accuracy. Detailed analysis of binding residues shows that some three- and five-residue segments frequently bind to DNA and that solvent accessibility plays a major role in binding. Although binding behaviour was not associated with any particular secondary structure, there were interesting exceptions at the residue level. Over-representation of some residues in the binding sites was largely lost at the whole-sequence level, but DNA-binding proteins showed a different kind of compositional preference.
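The abstract rests on three simple quantities: a protein's amino acid composition (used for whole-protein prediction), a residue's sequence neighbourhood (used for binding-site prediction), and sensitivity and accuracy (used to report performance). The Python sketch below illustrates these quantities under stated assumptions. It is not the authors' implementation and does not reproduce their neural networks. The function names `composition`, `window` and `sensitivity_and_accuracy`, the 5-residue window, the decision to ignore non-standard residues, and the conventional metric definitions are all hypothetical choices made for illustration. The paper's exact encodings and metric definitions may differ.

```python
from collections import Counter
from typing import Dict, List, Sequence

# The 20 standard amino acids in one-letter code (the ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Assumption: non-standard letters (e.g. X, B, Z) are ignored, so the
    fractions sum to 1.0 whenever at least one standard residue is present.
    """
    counts = Counter(res for res in sequence.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def window(sequence: str, center: int, half_width: int = 2, pad: str = "-") -> str:
    """Return the residues within `half_width` positions of `center`.

    The ends are padded with `pad` so every window has the same length.
    A 5-residue window (half_width=2) is one hypothetical way to encode
    the "sequence neighbourhood" of a residue.
    """
    padded = pad * half_width + sequence + pad * half_width
    # Residue i of `sequence` sits at index i + half_width in `padded`.
    return padded[center: center + 2 * half_width + 1]


def sensitivity_and_accuracy(labels: Sequence[int],
                             predictions: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity and overall accuracy for binary labels (1 = binding).

    Uses the conventional definitions (assumed here):
    sensitivity = TP / (TP + FN), accuracy = (TP + TN) / N.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    acc = (tp + tn) / len(pairs) if pairs else 0.0
    return {"sensitivity": sens, "accuracy": acc}


if __name__ == "__main__":
    # Toy inputs only; these are not drawn from the paper's data sets.
    vec = composition("MKRKAAKKAA")
    print(vec[AMINO_ACIDS.index("K")])  # 0.4
    print(window("MKRKA", 0))  # --MKR
    print(sensitivity_and_accuracy([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Composition discards residue order, so in this sketch it can only characterise a protein as a whole. Residue-level prediction needs local context such as the window above. This matches the abstract's split between composition for whole-protein prediction and neighbourhood plus solvent accessibility for binding-site prediction.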
Pages: 477-486
Page count: 10