Finding the most significant common sequence and structure motifs in a set of RNA sequences

Cited by: 145
Authors
Gorodkin, J
Heyer, LJ
Stormo, GD
Affiliations
[1] UNIV COLORADO,DEPT MOL CELLULAR & DEV BIOL,BOULDER,CO 80309
[2] UNIV COLORADO,DEPT APPL MATH,BOULDER,CO 80309
[3] TECH UNIV DENMARK,CTR BIOL SEQUENCE ANAL,DK-2800 LYNGBY,DENMARK
DOI
10.1093/nar/25.18.3724
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences, but maintains tractability by constructing multi-sequence alignments from pairwise comparisons. The algorithm finds the multiple alignments using a greedy approach and has similarities to both CLUSTAL and CONSENSUS, but the core algorithm assures that the pairwise alignments are optimized for both sequence and structure conservation. The choice of scoring system and the method of progressively constructing the final solution are important considerations that are discussed. Example solutions, and comparisons with other approaches, are provided. The solutions include finding consensus structures identical to published ones.
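As a rough illustration of the greedy idea mentioned in the abstract (building a multi-sequence solution from pairwise comparisons), the sketch below grows a collection one sequence at a time and records the running score. It is not the authors' algorithm or scoring system: the function `greedy_collection`, the `pair_score` table, and the toy scores are all hypothetical stand-ins for real sequence-plus-structure pairwise alignment scores.

```python
from itertools import combinations


def greedy_collection(names, pair_score):
    """Greedily grow a set of sequences from pairwise scores.

    pair_score maps frozenset({a, b}) to a hypothetical pairwise
    alignment score (higher is better). Returns the list of
    (members, total_score) states visited while growing the set.
    """
    # Seed with the best-scoring pair.
    seed = max(combinations(names, 2), key=lambda p: pair_score[frozenset(p)])
    chosen = list(seed)
    total = pair_score[frozenset(seed)]
    history = [(tuple(chosen), total)]
    remaining = [n for n in names if n not in chosen]

    while remaining:
        # Score each candidate by its summed pairwise score against the set.
        def gain(n):
            return sum(pair_score[frozenset((n, c))] for c in chosen)

        best = max(remaining, key=gain)
        total += gain(best)
        chosen.append(best)
        remaining.remove(best)
        history.append((tuple(chosen), total))
    return history


# Toy example (assumed scores): s4 does not share the motif.
names = ["s1", "s2", "s3", "s4"]
pair_score = {
    frozenset(("s1", "s2")): 9, frozenset(("s1", "s3")): 7,
    frozenset(("s2", "s3")): 8, frozenset(("s1", "s4")): -5,
    frozenset(("s2", "s4")): -4, frozenset(("s3", "s4")): -6,
}
history = greedy_collection(names, pair_score)
best_members, best_total = max(history, key=lambda state: state[1])
```

Keeping every intermediate state, rather than only the final one, lets the caller pick the highest-scoring collection, which in this toy case excludes the unrelated sequence `s4`.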
Pages: 3724-3732
Page count: 9
Related papers
40 in total
[1]   HIDDEN MARKOV-MODELS OF BIOLOGICAL PRIMARY SEQUENCE INFORMATION [J].
BALDI, P ;
CHAUVIN, Y ;
HUNKAPILLER, T ;
MCCLURE, MA .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1994, 91 (03) :1059-1063
[2]  
CARY RB, 1995, P 3 INT C INT SYST M, P75
[3]   RNA SEQUENCE-ANALYSIS USING COVARIANCE-MODELS [J].
EDDY, SR ;
DURBIN, R .
NUCLEIC ACIDS RESEARCH, 1994, 22 (11) :2079-2088
[4]   PROGRESSIVE SEQUENCE ALIGNMENT AS A PREREQUISITE TO CORRECT PHYLOGENETIC TREES [J].
FENG, DF ;
DOOLITTLE, RF .
JOURNAL OF MOLECULAR EVOLUTION, 1987, 25 (04) :351-360
[5]   STATISTICS OF RNA SECONDARY STRUCTURES [J].
FONTANA, W ;
KONINGS, DAM ;
STADLER, PF ;
SCHUSTER, P .
BIOPOLYMERS, 1993, 33 (09) :1389-1404
[6]  
GORODKIN J, 1997, IN PRESS COMPUT APPL
[7]  
GORODKIN J, 1997, P 5 INT C INT SYST M, P120
[8]   AN IMPROVED ALGORITHM FOR MATCHING BIOLOGICAL SEQUENCES [J].
GOTOH, O .
JOURNAL OF MOLECULAR BIOLOGY, 1982, 162 (03) :705-708
[9]   THE COMPUTER-SIMULATION OF RNA FOLDING PATHWAYS USING A GENETIC ALGORITHM [J].
GULTYAEV, AP ;
VANBATENBURG, FHD ;
PLEIJ, CWA .
JOURNAL OF MOLECULAR BIOLOGY, 1995, 250 (01) :37-51
[10]  
HERTZ GZ, 1990, COMPUT APPL BIOSCI, V6, P81