A COMPARISON OF SEVERAL SIMILARITY INDEXES USED IN THE CLASSIFICATION OF PROTEIN SEQUENCES - A MULTIVARIATE-ANALYSIS

Cited by: 15
Authors
LANDES, C
HENAUT, A
RISLER, JL
Affiliation
[1] Centre de Génétique Moléculaire du CNRS
DOI
10.1093/nar/20.14.3631
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The present work describes an attempt to identify reliable criteria which could be used as distance indices between protein sequences. Seven different criteria have been tested: i and ii) the scores of the alignments as given by the BESTFIT and the FASTA programs; iii) the ratio parameter, i.e. the BESTFIT score divided by the length of the aligned peptides; iv and v) the statistical significance (Z-scores) of the scores calculated by BESTFIT and FASTA, as obtained by comparison with shuffled sequences; vi) the Z-scores provided by the program RELATE, which performs a segment-by-segment comparison of two sequences; and vii) an original distance index calculated by the program DOCMA from all the pairwise dotplots between the sequences. These seven criteria have been tested against the amino acid sequences of 39 globins and those of the 20 aminoacyl-tRNA synthetases from E. coli. The distances between the sequences were analyzed by multivariate analysis techniques. The results show that the distances calculated from the scores of the pairwise alignments are not adequately sensitive. The Z-score from RELATE is not selective enough and too demanding in computer time. Three criteria gave a classification consistent with the known similarities between the sequences in the sets, namely the Z-scores from BESTFIT and FASTA and the multiple dotplot comparison distance index from DOCMA.
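The idea behind criteria iv and v, a Z-score obtained by comparison with shuffled sequences, can be illustrated with a minimal sketch. It is not the BESTFIT or FASTA procedure: `local_score` is a hypothetical toy local-alignment scorer (identity matches, linear gap penalty) standing in for those programs, and the names `shuffle_z_score`, `n_shuffles`, and `seed`, as well as the scoring values, are assumptions made for illustration only.

```python
import random
import statistics


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Toy Smith-Waterman local alignment score with a linear gap penalty.

    Stand-in for a real alignment program; the scoring values are arbitrary.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


def shuffle_z_score(a, b, n_shuffles=100, seed=0):
    """Z-score of the observed score against scores for shuffled copies of b.

    Shuffling keeps the length and composition of b but destroys its order,
    so the shuffled scores estimate what chance alone would produce.
    """
    rng = random.Random(seed)
    observed = local_score(a, b)
    background = []
    for _ in range(n_shuffles):
        letters = list(b)
        rng.shuffle(letters)
        background.append(local_score(a, "".join(letters)))
    mean = statistics.mean(background)
    sd = statistics.stdev(background)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - mean) / sd
```

Under these assumptions, a pair of related sequences should yield a Z-score well above that of an unrelated pair, because the observed score is measured in units of the spread of the chance scores rather than as a raw value that depends on sequence length and composition.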
Pages: 3631-3637
Number of pages: 7