Simplifying amino acid alphabets by means of a branch and bound algorithm and substitution matrices

被引:33
作者
Cannata, N [1 ]
Toppo, S [1 ]
Romualdi, C [1 ]
Valle, G [1 ]
机构
[1] Univ Padua, CRIBI Biotechnol Ctr, I-35131 Padua, Italy
关键词
D O I
10.1093/bioinformatics/18.8.1102
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: Protein and DNA are generally represented by sequences of letters. In a number of circumstances simplified alphabets (where one or more letters would be represented by the same symbol) have proved their potential utility in several fields of bioinformatics including searching for patterns occurring at an unexpected rate, studying protein folding and finding consensus sequences in multiple alignments. The main issue addressed in this paper is the possibility of finding a general approach that would allow an exhaustive analysis of all the possible simplified alphabets, using substitution matrices like PAM and BLOSUM as a measure for scoring. Results: The computational approach presented in this paper has led to a computer program called AlphaSimp (Alphabet Simplifier) that can perform an exhaustive analysis of the possible simplified amino acid alphabets, using a branch and bound algorithm together with standard or user-defined substitution matrices. The program returns a ranked list of the highest-scoring simplified alphabets. When the extent of the simplification is limited and the simplified alphabets are maintained above ten symbols the program is able to complete the analysis in minutes or even seconds on a personal computer. However, the performance becomes worse, taking up to several hours, for highly simplified alphabets.
引用
收藏
页码:1102 / 1108
页数:7
相关论文
共 12 条
[1]   SET PARTITIONING - SURVEY [J].
BALAS, E ;
PADBERG, MW .
SIAM REVIEW, 1976, 18 (04) :710-760
[2]   Approaches to the automatic discovery of patterns in biosequences [J].
Brazma, A ;
Jonassen, I ;
Eidhammer, I ;
Gilbert, D .
JOURNAL OF COMPUTATIONAL BIOLOGY, 1998, 5 (02) :279-305
[3]  
Dayhoff M., 1978, ATLAS PROTEIN SEQ ST, V5, P353
[4]   THEORY FOR THE FOLDING AND STABILITY OF GLOBULAR-PROTEINS [J].
DILL, KA .
BIOCHEMISTRY, 1985, 24 (06) :1501-1509
[5]   AMINO-ACID SUBSTITUTION MATRICES FROM PROTEIN BLOCKS [J].
HENIKOFF, S ;
HENIKOFF, JG .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1992, 89 (22) :10915-10919
[6]   PROTEIN DESIGN BY BINARY PATTERNING OF POLAR AND NONPOLAR AMINO-ACIDS [J].
KAMTEKAR, S ;
SCHIFFER, JM ;
XIONG, HY ;
BABIK, JM ;
HECHT, MH .
SCIENCE, 1993, 262 (5140) :1680-1685
[7]   MULTIPLE-ALPHABET AMINO-ACID-SEQUENCE COMPARISONS OF THE IMMUNOGLOBULIN K-CHAIN CONSTANT DOMAIN [J].
KARLIN, S ;
GHANDOUR, G .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1985, 82 (24) :8597-8601
[8]   Simplified amino acid alphabets for protein fold recognition and implications for folding [J].
Murphy, LR ;
Wallqvist, A ;
Levy, RM .
PROTEIN ENGINEERING, 2000, 13 (03) :149-152
[9]   Functional rapidly folding proteins from simplified amino acid sequences [J].
Riddle, DS ;
Santiago, JV ;
BrayHall, ST ;
Doshi, N ;
Grantcharova, VP ;
Yi, Q ;
Baker, D .
NATURE STRUCTURAL BIOLOGY, 1997, 4 (10) :805-809
[10]   Multiple sequence comparison - A peptide matching approach [J].
Sagot, MF ;
Viari, A ;
Soldano, H .
THEORETICAL COMPUTER SCIENCE, 1997, 180 (1-2) :115-137