Identification of novel DNA repair proteins via primary sequence, secondary structure, and homology

Cited by: 14
Authors
Brown, J. B. [1 ]
Akutsu, Tatsuya [1 ]
Affiliations
[1] Kyoto Univ, Inst Chem Res, Bioinformat Ctr, Kyoto 6110011, Japan
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Keywords
SUBCELLULAR LOCATION PREDICTION; FUSION CLASSIFIER; PHOSPHODIESTERASE; LOCALIZATION; RESOURCE; HISTONES; DATABASE; CELLS; MPLOC; PLOC;
DOI
10.1186/1471-2105-10-25
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: DNA repair is the general term for the collection of critical mechanisms that repair many forms of DNA damage, such as damage caused by methylation or ionizing radiation. DNA repair has mainly been studied in experimental and clinical situations, and relatively few information-based approaches to extracting new DNA repair knowledge exist. As a first step, automatic detection of DNA repair proteins in genomes via informatics techniques is desirable; however, there are many forms of DNA repair, and it is not straightforward to identify and classify repair proteins with a single optimal method. We study the ability of homology and machine learning-based methods to identify and classify DNA repair proteins, and we scan vertebrate genomes for the presence of novel repair proteins. Combinations of primary sequence polypeptide frequency, secondary structure, and homology information are used as feature information for input to a Support Vector Machine (SVM).
Results: We find that SVM techniques are capable of identifying portions of DNA repair protein datasets without admitting false positives; at low levels of false positive tolerance, homology can also identify and classify proteins with good performance. Secondary structure information provides improved performance compared to using primary structure alone. Furthermore, we observe that machine learning methods incorporating homology information perform best when data is filtered by some clustering technique. Applying these methodologies to the scanning of multiple vertebrate genomes confirms a positive correlation between the size of a genome and the number of DNA repair protein transcripts it is likely to contain, and simultaneously suggests that all organisms have a non-zero minimum number of repair genes. In addition, the scan result clusters several organisms' repair abilities in an evolutionarily consistent fashion. Analysis also identifies several functionally unconfirmed proteins that are highly likely to be involved in the repair process. A new web service, INTREPED, has been made available for the immediate search and annotation of DNA repair proteins in newly sequenced genomes.
Conclusion: Despite complexity due to a multitude of repair pathways, combinations of sequence, structure, and homology with Support Vector Machines offer good methods, in addition to existing homology searches, for DNA repair protein identification and functional annotation. Most importantly, this study has uncovered relationships between the size of a genome and a genome's available repair repertoire, and offers a number of new predictions as well as a prediction service, both of which reduce the search time and cost for novel repair genes and proteins.
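Illustration: the abstract names "primary sequence polypeptide frequency" as one input to the SVM but does not define the feature precisely. The sketch below shows one plausible reading, under stated assumptions: normalized counts of all length-k peptides over the 20 standard amino acids. The function name `peptide_frequencies`, the choice k=2, and the example sequence are hypothetical and are not taken from the paper.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peptide_frequencies(sequence, k=2):
    """Return normalized frequencies of all length-k peptides in a fixed order.

    Assumption: this is an illustrative feature definition, not the one
    used by the paper's authors.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing non-standard residues
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Hypothetical example sequence; yields a 400-dimensional vector for k=2.
features = peptide_frequencies("MKVLAAGIVK", k=2)
```

A vector of this kind could be concatenated with secondary structure and homology features before being passed to an SVM implementation; how the paper combines them is not stated in the abstract.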
Pages: 22