Combining evidence using p-values: application to sequence homology searches

Citations: 924
Authors
Bailey, TL [1]
Gribskov, M [1]
Affiliations
[1] San Diego Supercomp Ctr, San Diego, CA 92186 USA
DOI
10.1093/bioinformatics/14.1.48
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: To illustrate an intuitive and statistically valid method for combining independent sources of evidence that yields a p-value for the complete evidence, and to apply it to the problem of detecting simultaneous matches to multiple patterns in sequence homology searches. Results: In sequence analysis, two or more (approximately) independent measures of the membership of a sequence (or sequence region) in some class are often available. We would like to estimate the likelihood of the sequence being a member of the class in view of all the available evidence. An example is estimating the significance of the observed match of a macromolecular sequence (DNA or protein) to a set of patterns (motifs) that characterize a biological sequence family. An intuitive way to do this is to express each piece of evidence as a p-value, and then use the product of these p-values as the measure of membership in the family. We derive a formula and algorithm (QFAST) for calculating the statistical distribution of the product of n independent p-values. We demonstrate that sorting sequences by this p-value effectively combines the information present in multiple motifs, leading to highly accurate and sensitive sequence homology searches.
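The idea of turning a product of n independent p-values into a single p-value can be illustrated with a short sketch. It assumes each p-value is uniformly distributed on [0, 1] under the null hypothesis, in which case the standard closed form is P(product <= x) = x * sum_{i=0}^{n-1} (-ln x)^i / i!. The function name `combined_p_value` is hypothetical, and this is not claimed to be the QFAST algorithm itself, whose details are not given in the abstract.

```python
import math


def combined_p_value(p_values):
    """Return P(product of n independent Uniform(0,1) p-values <= observed product).

    Assumption: the p-values are independent and uniform under the null.
    Uses the closed form x * sum_{i=0}^{n-1} (-ln x)^i / i!, with x the product.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 <= p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    if any(p == 0.0 for p in p_values):
        return 0.0

    # Work with the log of the product so many small p-values do not underflow.
    log_product = sum(math.log(p) for p in p_values)
    neg_log = -log_product  # always >= 0

    # Accumulate sum_{i=0}^{n-1} neg_log^i / i! term by term.
    term = 1.0
    total = 1.0
    for i in range(1, len(p_values)):
        term *= neg_log / i
        total += term

    # total >= 1, so its logarithm is well defined.
    return min(1.0, math.exp(log_product + math.log(total)))


if __name__ == "__main__":
    # Two motif matches, each individually significant at 0.05.
    print(combined_p_value([0.05, 0.05]))
```

The combined value is larger than the raw product because the product of several p-values is itself no longer uniformly distributed; the correction factor accounts for that. With a single p-value the function returns that value unchanged.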
Pages: 48-54
Page count: 7