Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching

Cited by: 343
Authors
Gribskov, M [1]
Robinson, NL [1]
Affiliation
[1] SEQUANA THERAPEUT INC, LA JOLLA, CA 92037 USA
Source
COMPUTERS & CHEMISTRY | 1996 / Vol. 20 / Issue 01
Keywords
DOI
10.1016/S0097-8485(96)80004-0
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
In this paper, we borrow the idea of the receiver operating characteristic (ROC) from clinical medicine and demonstrate its application to sequence comparison. The ROC includes elements of both sensitivity and specificity, and is a quantitative measure of the usefulness of a diagnostic. The ROC is used in this work to investigate the effects of scoring table and gap penalties on database searches. Studies on three families of proteins (4Fe-4S ferredoxins, lysR bacterial regulatory proteins, and bacterial RNA polymerase sigma-factors) lead to the following conclusions. Sequence families are quite idiosyncratic, but the best PAM distance for database searches using the Smith-Waterman method is about 200 PAM, somewhat larger than predicted by theoretical methods. The length-independent gap penalty (gap initiation penalty) is quite important, but shows a broad peak at values of about 20-24. The length-dependent gap penalty (gap extension penalty) is almost irrelevant, suggesting that successful database searches rely only to a limited degree on gapped alignments. Taken together, these observations lead to the conclusion that the optimal conditions for alignments and database searches are not, and should not be expected to be, the same.
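As a rough illustration of how an ROC can summarize a database search, the sketch below ranks hits by score, sweeps a score threshold, and integrates the resulting curve. It is a generic ROC-area calculation, not the authors' exact procedure; the function names `roc_points` and `roc_area` and the example scores and family labels are hypothetical.

```python
def roc_points(scores, labels):
    """Sweep a threshold from the highest score down and collect
    (false positive rate, true positive rate) points.
    labels: 1 for a true family member, 0 for a non-member."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    fpr, tpr = [0.0], [0.0]
    i = 0
    while i < len(ranked):
        # Hits with identical scores cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
        i = j
    return fpr, tpr


def roc_area(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for k in range(1, len(fpr)):
        area += (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
    return area


# Hypothetical search result: six hits, three of them true family members.
scores = [95, 90, 80, 70, 60, 50]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_points(scores, labels)
print(round(roc_area(fpr, tpr), 3))  # 0.889
```

An area of 1.0 means every family member outscores every non-member, while 0.5 corresponds to a random ranking. Under these assumptions, repeating the calculation for searches run with different scoring tables or gap penalties gives one number per parameter setting to compare.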
Pages: 25-33
Page count: 9