Pairwise Statistical Significance of Local Sequence Alignment Using Sequence-Specific and Position-Specific Substitution Matrices

Cited by: 15
Authors
Agrawal, Ankit [1 ]
Huang, Xiaoqiu [1 ]
Affiliations
[1] Iowa State Univ, Dept Comp Sci, Ames, IA 50011 USA
Keywords
Database statistical significance; homologs; pairwise statistical significance; position-specific scoring matrices (PSSMs); sequence alignment; substitution matrices; PSI-BLAST; SIMILARITY; ALGORITHM; ACCURACY;
DOI
10.1109/TCBB.2009.69
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Pairwise sequence alignment is a central problem in bioinformatics and forms the basis of many other applications. Two related sequences are expected to have a high alignment score, but relatedness is usually judged by statistical significance rather than by the alignment score itself. Recently, it was shown that pairwise statistical significance gives promising results as an alternative to database statistical significance for obtaining individual significance estimates of pairwise alignment scores. The improvement was mainly attributed to making the statistical significance estimation process more sequence-specific and database-independent. In this paper, we use sequence-specific and position-specific substitution matrices to derive estimates of pairwise statistical significance, an approach that is expected to use more sequence-specific information in the estimation. Experiments were conducted on a benchmark database with sequence-specific substitution matrices at different levels of sequence-specific contribution. The results confirm that using sequence-specific substitution matrices for estimating pairwise statistical significance is significantly better than using a standard matrix like BLOSUM62, and significantly better than the database statistical significance estimates reported by popular database search programs like BLAST, PSI-BLAST (without pretrained PSSMs), and SSEARCH. With pretrained PSSMs, however, PSI-BLAST results are significantly better. Further, using position-specific substitution matrices for estimating pairwise statistical significance gives significantly better results than even PSI-BLAST using pretrained PSSMs.
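The abstract distinguishes an alignment score from its statistical significance but does not give a formula. As a rough illustration only, the sketch below assumes the extreme value (Gumbel-type) model commonly used for local alignment scores, P(S >= x) = 1 - exp(-K m n e^(-lambda x)), where m and n are the sequence lengths. The function name `pairwise_pvalue` and the values of K and lambda are hypothetical placeholders, not the authors' implementation or parameters; in a pairwise setting, K and lambda would be estimated for the specific sequence pair and substitution matrix.

```python
import math


def pairwise_pvalue(score, m, n, K, lam):
    """Return P(S >= score) under an assumed extreme value model.

    score : local alignment score
    m, n  : lengths of the two sequences
    K, lam: statistical parameters (placeholders here; they would be
            fitted for the sequence pair and scoring scheme in practice)
    """
    # Expected number of chance alignments scoring at least `score`.
    expected = K * m * n * math.exp(-lam * score)
    # 1 - exp(-expected), computed with expm1 for accuracy when expected is tiny.
    return -math.expm1(-expected)


# Illustrative call with made-up parameters: the same score becomes
# less significant as the sequences (the search space m * n) grow.
p_short = pairwise_pvalue(60, m=100, n=100, K=0.04, lam=0.27)
p_long = pairwise_pvalue(60, m=1000, n=1000, K=0.04, lam=0.27)
```

Under this model, a higher score always yields a smaller P-value for fixed lengths and parameters, which is why two alignments with the same raw score can differ in significance once the parameters are made sequence-specific.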
Pages: 194-205
Page count: 12