Representation of DNA sequences with virtual potentials and their processing by (SEQREP) Kohonen self-organizing maps

Cited by: 6
Authors
Aires-de-Sousa, J
Aires-de-Sousa, L
Affiliations
[1] Univ Nova Lisboa, Dept Quim, CQFB, P-2829516 Monte De Caparica, Portugal
[2] Hosp Santa Maria, Clin Univ Pediat, P-1699 Lisbon, Portugal
DOI
10.1093/bioinformatics/19.1.30
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: We propose representing individual positions in DNA sequences by virtual potentials generated by the other bases of the same sequence. This gives a compact representation of the neighbourhood of a base. The distribution of the virtual potentials over the whole sequence can then be used as a representation of the entire sequence (the SEQREP code). The code is flexible: its length is independent of the sequence size, it does not require prior alignment, and it is convenient for processing by neural networks or statistical techniques.

Results: To evaluate its biological significance, the SEQREP code was used to train Kohonen self-organizing maps (SOMs) in two applications: (a) detection of Alu sequences, and (b) classification of sequences encoding the HIV-1 envelope glycoprotein (env) into subtypes A-G. The SOMs clustered sequences belonging to different classes into distinct regions. For independent test sets, high rates of correct predictions were obtained (97% in the first application, 91% in the second). Possible areas of application of SEQREP codes include functional genomics, phylogenetic analysis, detection of repetitions, database retrieval, and automatic alignment.
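The abstract does not give the formula for the virtual potentials, so the following Python sketch only illustrates the general idea and is not the authors' method. It assumes, purely for illustration, that each neighbouring base contributes 1/d to the potential of its base type at a position (d being the distance in bases, limited to a hypothetical `window`), and that the distribution over the sequence is summarised as a normalised histogram with a hypothetical number of `bins` per base type. The function names `virtual_potentials` and `seqrep_like_code` are likewise invented here.

```python
BASES = "ACGT"


def virtual_potentials(seq, window=10):
    """For every position, sum an ASSUMED 1/distance contribution from each
    neighbouring base within `window`, kept separately per base type."""
    seq = seq.upper()
    potentials = []
    for i in range(len(seq)):
        pot = dict.fromkeys(BASES, 0.0)
        lo = max(0, i - window)
        hi = min(len(seq), i + window + 1)
        for j in range(lo, hi):
            # Skip the position itself and any non-ACGT symbol (e.g. N).
            if j != i and seq[j] in pot:
                pot[seq[j]] += 1.0 / abs(i - j)
        potentials.append(pot)
    return potentials


def seqrep_like_code(seq, window=10, bins=8):
    """Summarise the per-position potentials as one normalised histogram per
    base type; the result has 4 * bins values whatever the sequence length."""
    # Largest possible potential: every neighbour on both sides is the same base.
    max_pot = 2.0 * sum(1.0 / d for d in range(1, window + 1))
    pots = virtual_potentials(seq, window)
    n = len(pots) or 1  # avoid division by zero for an empty sequence
    code = []
    for base in BASES:
        hist = [0] * bins
        for pot in pots:
            k = min(int(pot[base] / max_pot * bins), bins - 1)
            hist[k] += 1
        code.extend(h / n for h in hist)
    return code


short = seqrep_like_code("ACGT" * 5)
long_ = seqrep_like_code("ACGTTGCA" * 40)
print(len(short), len(long_))  # both codes have the same length
```

Under these assumptions a 20-base and a 320-base sequence map to vectors of the same length without any alignment step, which is the property that makes such a code usable as fixed-size input to a self-organizing map.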
Pages: 30-36
Number of pages: 7