Application of canonical correlation analysis for identifying viral integration preferences

Times cited: 15
Authors
Gumus, Ergun [1]
Kursun, Olcay [1]
Sertbas, Ahmet [1]
Ustek, Duran [2]
Affiliations
[1] Istanbul Univ, Dept Comp Engn, TR-34320 Istanbul, Turkey
[2] Istanbul Univ, Inst Expt Med, Dept Genet, TR-34093 Istanbul, Turkey
Keywords
TARGET SITES; HIV-1; DNA
DOI
10.1093/bioinformatics/bts027
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
070307 [Chemical Biology]
Abstract
Motivation: Gene therapy aims to use viral vectors to attach helpful genetic code to target genes. It is therefore important to develop methods that can discover significant patterns around viral integration sites. Canonical correlation analysis (CCA) is an unsupervised statistical tool that describes the relations between two related views of the same semantic object, which makes it well suited to identifying such salient patterns.
Results: The proposed method is demonstrated on a sequence dataset obtained from a study of HIV-1 preferred integration regions. The subsequences on the left and right sides of the integration points are given to the method as the two views, and statistically significant relations are found between sequence-driven features derived from these two views, suggesting that viral preference must be the factor responsible for this correlation. We found significant correlations at x = 5, indicating palindromic behavior surrounding the viral integration site, which is consistent with previously reported results.
Availability: The developed software tool is available at http://ce.istanbul.edu.tr/bioinformatics/hiv1/
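The two-view idea in the abstract can be illustrated with a minimal Python/NumPy sketch. This is not the authors' software tool, and several choices here are assumptions: each flank is encoded as a one-hot vector (the abstract does not specify its "sequence-driven features"), canonical correlations are computed with a small ridge term, and significance is judged with a row-permutation test. The data are synthetic, generated by a hypothetical `synthetic_flanks` helper that plants a reverse-complement (palindromic) relation between the left and right flanks; the flank length, mutation rate, and `reg` value are illustrative only.

```python
import numpy as np

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def one_hot(seqs):
    """Encode equal-length DNA strings as flattened 4*L one-hot vectors (assumed features)."""
    index = {b: i for i, b in enumerate(BASES)}
    length = len(seqs[0])
    X = np.zeros((len(seqs), 4 * length))
    for n, s in enumerate(seqs):
        for pos, base in enumerate(s):
            X[n, 4 * pos + index[base]] = 1.0
    return X


def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return (V / np.sqrt(w)) @ V.T  # V @ diag(1/sqrt(w)) @ V.T


def canonical_correlations(X, Y, reg=1e-3):
    """Canonical correlations = singular values of Cxx^{-1/2} Cxy Cyy^{-1/2}.

    The ridge term `reg` keeps the covariances invertible: the one-hot columns
    at each position sum to 1, so the raw covariance matrices are singular.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    M = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    return np.linalg.svd(M, compute_uv=False)  # returned in descending order


def permutation_pvalue(X, Y, n_perm=99, rng=None):
    """Compare the top canonical correlation with those from row-shuffled Y."""
    if rng is None:
        rng = np.random.default_rng(0)
    observed = canonical_correlations(X, Y)[0]
    exceed = sum(
        canonical_correlations(X, Y[rng.permutation(len(Y))])[0] >= observed
        for _ in range(n_perm)
    )
    return observed, (exceed + 1) / (n_perm + 1)


def synthetic_flanks(n, length, mutation, rng):
    """Hypothetical data: right flank = reverse complement of the left flank,
    with each base resampled uniformly at random with probability `mutation`."""
    left, right = [], []
    for _ in range(n):
        lf = "".join(rng.choice(list(BASES), size=length))
        rt = [COMPLEMENT[b] for b in reversed(lf)]
        rt = [str(rng.choice(list(BASES))) if rng.random() < mutation else b for b in rt]
        left.append(lf)
        right.append("".join(rt))
    return left, right


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    left, right = synthetic_flanks(500, 5, 0.2, rng)
    r, p = permutation_pvalue(one_hot(left), one_hot(right), rng=rng)
    print(f"top canonical correlation {r:.2f}, permutation p = {p:.3f}")
```

Setting `mutation=1.0` makes the two flanks independent, which gives a quick sanity check: the top canonical correlation then falls to the level expected from sampling noise alone, while the planted palindromic relation yields a clearly larger value and a small permutation p-value.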
Pages: 651-655
Page count: 5