Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches

Times cited: 40
Authors
Yu, Yi-Kuo [1]
Gertz, E. Michael [1]
Agarwala, Richa [1]
Schaeffer, Alejandro A. [1]
Altschul, Stephen F. [1]
Affiliation
[1] Natl Lib Med, Natl Ctr Biotechnol Informat, US Dept HHS, NIH, Bethesda, MD 20894 USA
DOI
10.1093/nar/gkl731
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Protein sequence database search programs may be evaluated both for their retrieval accuracy (the ability to separate meaningful from chance similarities) and for the accuracy of their statistical assessments of reported alignments. However, methods for improving statistical accuracy can degrade retrieval accuracy by discarding compositional evidence of sequence relatedness. This evidence may be preserved by combining essentially independent measures of alignment and compositional similarity into a unified measure of sequence similarity. A version of the BLAST protein database search program, modified to employ this new measure, outperforms the baseline program in both retrieval and statistical accuracy on ASTRAL, a SCOP-based test set.
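The idea of merging two essentially independent pieces of evidence into one significance measure can be illustrated with a standard rule for combining two independent p-values (Fisher's product rule). The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the alignment evidence arrives as an E-value that is converted to a p-value via a Poisson model (P = 1 - e^(-E)), and that the compositional evidence is already expressed as a p-value. The function names `evalue_to_pvalue` and `combine_independent_pvalues` are hypothetical.

```python
import math


def evalue_to_pvalue(evalue: float) -> float:
    """Convert an E-value to a p-value, ASSUMING the number of chance hits
    scoring at least as well is Poisson-distributed: P = 1 - exp(-E).
    expm1 keeps precision for very small E-values."""
    if evalue < 0.0:
        raise ValueError("E-value must be non-negative")
    return -math.expm1(-evalue)


def combine_independent_pvalues(p1: float, p2: float) -> float:
    """Fisher-style combination of two p-values (illustrative assumption,
    not necessarily the formula used in the paper).

    If p1 and p2 are independent and uniform on (0, 1] under the null
    hypothesis, the probability that their product is at most x = p1 * p2
    equals x * (1 - ln x). The result is valid only if the two measures
    really are (approximately) independent.
    """
    if not (0.0 < p1 <= 1.0 and 0.0 < p2 <= 1.0):
        raise ValueError("p-values must lie in (0, 1]")
    x = p1 * p2
    return x * (1.0 - math.log(x))


if __name__ == "__main__":
    # Hypothetical inputs: an alignment E-value and a compositional p-value.
    p_alignment = evalue_to_pvalue(1e-3)
    p_composition = 0.05
    combined = combine_independent_pvalues(p_alignment, p_composition)
    print(f"alignment p = {p_alignment:.3g}, combined p = {combined:.3g}")
```

The product rule rewards agreement: modest compositional evidence (p = 0.05) strengthens an already significant alignment, whereas a composition p-value of 1 leaves the combined value no smaller than the alignment p-value alone. The `x * (1 - ln x)` correction matters because the raw product `p1 * p2` is not itself a valid p-value and would overstate significance.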
Pages: 5966-5973
Page count: 8