A new method to predict the consensus secondary structure of a set of unaligned RNA sequences

Cited by: 25
Authors
Bouthinon, D [1 ]
Soldano, H [1 ]
Affiliations
[1] Univ Paris 06, Atelier Bioinformat, F-75005 Paris, France
Keywords
DOI
10.1093/bioinformatics/15.10.785
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: To predict the consensus secondary structure, possibly including pseudoknots, of a set of unaligned RNA sequences. Results: We have designed a method based on a new representation of any RNA secondary structure as a set of structural relationships between the helices of the structure. We refer to this representation as a structural pattern. In a first step, we use thermodynamic parameters to select, for each sequence, the best secondary structures according to energy minimization, and we represent each of them using its corresponding structural pattern. In a second step, we search for the repeated structural patterns, i.e. the largest structural patterns that occur in at least one sequence, i.e. included in at least one of the structural patterns associated with each sequence. Thanks to an efficient encoding of structural patterns, this search comes down to identifying the largest repeated word suffixes in a dictionary. In a third step, we compute the plausibility of each repeated structural pattern by checking if it occurs more frequently in the studied sequences than in random RNA sequences. We then suppose that the consensus secondary structure corresponds to the repeated structural pattern that displays the highest plausibility. We present several experiments concerning tRNA, fragments of 16S rRNA and 10Sa RNA (including pseudoknots); in each of them, we found the putative consensus secondary structure.
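The second step of the abstract reduces the search to finding the largest repeated word suffixes in a dictionary. The sketch below illustrates only that reduction, under assumptions that are not taken from the paper: each candidate structure is taken to be already encoded as a plain string, pattern inclusion is taken to correspond to the suffix relation, and "repeated" is read as "present in at least one word of every sequence". The function name `largest_repeated_suffixes` and the toy words are hypothetical, and the naive enumeration stands in for whatever efficient encoding the authors actually use.

```python
from collections import defaultdict


def largest_repeated_suffixes(words_by_sequence):
    """Return the maximal suffixes found in at least one word of every sequence.

    words_by_sequence maps a sequence name to the list of words that encode
    its candidate structures (a hypothetical encoding, for illustration only).
    """
    n_sequences = len(words_by_sequence)

    # For every suffix, record which sequences own a word ending with it.
    owners = defaultdict(set)
    for name, words in words_by_sequence.items():
        for word in words:
            for start in range(len(word)):
                owners[word[start:]].add(name)

    # A suffix is "repeated" when every sequence contributes at least one word.
    repeated = {s for s, names in owners.items() if len(names) == n_sequences}

    # Keep only the largest ones: drop any suffix that is itself a proper
    # suffix of a longer repeated suffix.
    return sorted(
        s for s in repeated
        if not any(t != s and t.endswith(s) for t in repeated)
    )


# Toy dictionary: three sequences, each with a few encoded candidate structures.
encoded = {
    "seq1": ["abcd", "xbcd"],
    "seq2": ["zcd", "qbcd"],
    "seq3": ["bcd", "kk"],
}
print(largest_repeated_suffixes(encoded))
```

In this toy input the suffixes `bcd`, `cd` and `d` all occur in every sequence, and only `bcd` is kept because the other two are contained in it. The third step of the abstract (scoring each repeated pattern against random RNA sequences) is not covered by this sketch.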
Pages: 785-798
Number of pages: 14
Related papers
49 in total
[1]  
[Anonymous], 1986, LECT MATH LIFE SCI
[2]  
[Anonymous], MATH METHODS DNA SEQ
[3]   Palingol: A declarative programming language to describe nucleic acids' secondary structures and to scan sequence databases [J].
Billoud, B ;
Kontic, M ;
Viari, A .
NUCLEIC ACIDS RESEARCH, 1996, 24 (08) :1395-1403
[4]  
BOUTHINON D, 1996, APPRENTISSAGE PARTIR
[5]  
BOUTHINON D, 1998, 10 EUR C MACH LEARN, V1398, P238
[6]  
BOUTHINON D, 1998, 11 C REC FORM INT AR, P137
[7]  
CARY RB, 1995, P 3 INT C INT SYST M, P75
[8]  
CHIU DKY, 1991, COMPUT APPL BIOSCI, V7, P347
[9]   EFFICIENT ALGORITHMS FOR FOLDING AND COMPARING NUCLEIC-ACID SEQUENCES [J].
DUMAS, JP ;
NINIO, J .
NUCLEIC ACIDS RESEARCH, 1982, 10 (01) :197-206
[10]   RNA SEQUENCE-ANALYSIS USING COVARIANCE-MODELS [J].
EDDY, SR ;
DURBIN, R .
NUCLEIC ACIDS RESEARCH, 1994, 22 (11) :2079-2088