Generating consensus sequences from partial order multiple sequence alignment graphs

被引:56
作者
Lee, C [1 ]
机构
[1] Univ Calif Los Angeles, DOE Ctr Genom & Proteom, Inst Mol Biol, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Dept Chem & Biochem, Los Angeles, CA 90095 USA
关键词
D O I
10.1093/bioinformatics/btg109
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: Consensus sequence generation is important in many kinds of sequence analysis ranging from sequence assembly to profile-based iterative search methods. However, how can a consensus be constructed when its inherent assumption-that the aligned sequences form a single linear consensus-is not true? Results: Partial Order Alignment (POA) enables construction and analysis of multiple sequence alignments as directed acyclic graphs containing complex branching structure. Here we present a dynamic programming algorithm (heaviest_bundle) for generating multiple consensus sequences from such complex alignments. The number and relationships of these consensus sequences reveals the degree of structural complexity of the source alignment. This is a powerful and general approach for analyzing and visualizing complex alignment structures, and can be applied to any alignment. We illustrate its value for analyzing expressed sequence alignments to detect alternative splicing, reconstruct full length mRNA isoform sequences from EST fragments, and separate paralog mixtures that can cause incorrect SNP predictions.
引用
收藏
页码:999 / 1008
页数:10
相关论文
共 41 条
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]  
ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
[3]   The PROSITE database, its status in 1995 [J].
Bairoch, A ;
Bucher, P ;
Hofmann, K .
NUCLEIC ACIDS RESEARCH, 1996, 24 (01) :189-196
[4]  
Batzoglou S, 2002, GENOME RES, V12, P177, DOI 10.1101/gr.208902
[5]   Alternative gene form discovery and candidate gene selection from gene indexing projects [J].
Burke, J ;
Wang, H ;
Hide, W ;
Davison, DB .
GENOME RESEARCH, 1998, 8 (03) :276-290
[6]   STACK: Sequence Tag Alignment and Consensus Knowledgebase [J].
Christoffels, A ;
van Gelder, A ;
Greyling, G ;
Miller, R ;
Hide, T ;
Hide, W .
NUCLEIC ACIDS RESEARCH, 2001, 29 (01) :234-238
[7]   De novo peptide sequencing via tandem mass spectrometry [J].
Dancík, V ;
Addona, TA ;
Clauser, KR ;
Vath, JE ;
Pevzner, PA .
JOURNAL OF COMPUTATIONAL BIOLOGY, 1999, 6 (3-4) :327-342
[8]   Hidden Markov models [J].
Eddy, SR .
CURRENT OPINION IN STRUCTURAL BIOLOGY, 1996, 6 (03) :361-365
[9]   Base-calling of automated sequencer traces using phred.: II.: Error probabilities [J].
Ewing, B ;
Green, P .
GENOME RESEARCH, 1998, 8 (03) :186-194
[10]   Analysis of expressed sequence tags indicates 35,000 human genes [J].
Ewing, B ;
Green, P .
NATURE GENETICS, 2000, 25 (02) :232-234