A min-cut algorithm for the consistency problem in multiple sequence alignment

被引:14
作者
Corel, Eduardo [1 ]
Pitschi, Florian [2 ]
Morgenstern, Burkhard [1 ]
机构
[1] Univ Gottingen, Inst Mikrobiol & Genet, D-37077 Gottingen, Germany
[2] CAS MPG, Partner Inst Computat Biol, Shanghai 200031, Peoples R China
关键词
IMPROVEMENT; ACCURACY; SEARCH; DNA;
D O I
10.1093/bioinformatics/btq082
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: Multiple sequence alignments can be constructed on the basis of pairwise local sequence similarities. This approach is rather. exible and can combine the advantages of global and local alignment methods. The restriction to pairwise alignments as building blocks, however, can lead to misalignments since weak homologies may be missed if only pairs of sequences are compared. Results: Herein, we propose a graph-theoretical approach to find local multiple sequence similarities. Starting with pairwise alignments produced by DIALIGN, we use a min-cut algorithm to find potential (partial) alignment columns that we use to construct a final multiple alignment. On real and simulated benchmark data, our approach consistently outperforms the standard version of DIALIGN where local pairwise alignments are greedily incorporated into a multiple alignment.
引用
收藏
页码:1015 / 1021
页数:7
相关论文
共 38 条
[1]  
ABDEDDAIM S, 2001, LECT NOTES COMPUTER, V2066, P1
[2]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[3]  
Bailey TL., 1994, Proc Int Conf Intel Syst Mol Biol, V2, P28
[4]  
Cormen T., 2001, Introduction to Algorithms
[5]   ProbCons: Probabilistic consistency-based multiple sequence alignment [J].
Do, CB ;
Mahabhashyam, MSP ;
Brudno, M ;
Batzoglou, S .
GENOME RESEARCH, 2005, 15 (02) :330-340
[6]  
DO CB, 2006, LECT NOTES COMPUTER, V3909
[7]  
Durbin R., 1998, Biological sequence analysis: probabilistic models of proteins and nucleic acids
[8]  
Eddy SR., 1995, Proc Third Int Conf Intelligent Systems for Molecular Biology, P114
[9]   MUSCLE: multiple sequence alignment with high accuracy and high throughput [J].
Edgar, RC .
NUCLEIC ACIDS RESEARCH, 2004, 32 (05) :1792-1797
[10]   Multiple sequence alignment [J].
Edgar, Robert C. ;
Batzoglou, Serafim .
CURRENT OPINION IN STRUCTURAL BIOLOGY, 2006, 16 (03) :368-373