Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles

Cited by: 218
Authors
Gautheret, D
Lambert, A
Affiliations
[1] CNRS, Ctr Immunol Marseille Luminy, UMR 6102, INSERM,U136, F-13288 Marseille 09, France
[2] CNRS, Ctr Phys Theor, UPR 7061, F-13288 Marseille 9, France
Keywords
RNA motifs; sequence alignment; secondary structure; motif search; profiles
DOI
10.1006/jmbi.2001.5102
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
We present a new approach to the problem of defining RNA signatures and finding their occurrences in sequence databases. The proposed method is based on "secondary structure profiles". The input is an RNA sequence alignment annotated with secondary structure information. Two types of weight matrices (profiles) are constructed from this alignment. Single strands are represented by a classical lod-score profile. Helical regions are represented by an extended "helical profile" comprising 16 lod-scores per position, one for each of the 16 possible base-pairs. Database searches then combine a search for helical profiles with a simultaneous dynamic programming alignment of single-strand profiles. The algorithm has been implemented in a new program, ERPIN, which performs both profile construction and database search. Applications are presented for several RNA motifs. Because ERPIN automatically uses sequence information in both single-stranded and helical regions, it yields better sensitivity/specificity ratios than descriptor-based programs. Furthermore, translating alignments into profiles is straightforward with ERPIN, so iterative searches can easily be conducted to enrich collections of homologous RNAs. (C) 2001 Academic Press.
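The two profile types described above can be illustrated with a minimal Python sketch. It is not ERPIN's implementation or interface, and all function names are hypothetical. The sketch rests on several assumptions of our own:

- The structure is given in dot-bracket notation.
- Lod-scores are log2 odds against a uniform background (0.25 per nucleotide, 1/16 per base-pair), with a pseudocount of 1.
- Whole-candidate scoring is ungapped.
- Single-strand alignment uses a hypothetical linear gap penalty.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGU"
# The 16 possible base-pairs (5' nucleotide + 3' nucleotide), canonical or not.
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]


def pair_map(structure):
    """Map each opening column of a dot-bracket string to its closing partner."""
    stack, pairs = [], {}
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs[stack.pop()] = i
    return pairs


def single_strand_profile(column, pseudo=1.0):
    """Lod-scores for one unpaired column: log2(freq / 0.25), gaps ignored."""
    counts = Counter(b for b in column if b in BASES)
    total = sum(counts.values()) + 4 * pseudo
    return {b: math.log2((counts[b] + pseudo) / total / 0.25) for b in BASES}


def helical_profile(col_i, col_j, pseudo=1.0):
    """16 lod-scores for one base-pair position: log2(freq / (1/16))."""
    counts = Counter(a + b for a, b in zip(col_i, col_j)
                     if a in BASES and b in BASES)
    total = sum(counts.values()) + 16 * pseudo
    return {p: math.log2((counts[p] + pseudo) / total * 16) for p in PAIRS}


def build_profiles(alignment, structure):
    """Split alignment columns into single-strand and helical profiles."""
    pairs = pair_map(structure)
    paired = set(pairs) | set(pairs.values())
    columns = list(zip(*alignment))
    single = {i: single_strand_profile(columns[i])
              for i in range(len(structure)) if i not in paired}
    helix = {(i, j): helical_profile(columns[i], columns[j])
             for i, j in pairs.items()}
    return single, helix


def score_candidate(seq, single, helix):
    """Ungapped score of a candidate as long as the alignment (simplification)."""
    score = sum(prof[seq[i]] for i, prof in single.items())
    score += sum(prof[seq[i] + seq[j]] for (i, j), prof in helix.items())
    return score


def align_single_strand(profile_cols, segment, gap=-2.0):
    """Global DP alignment of single-strand profile columns to a sequence segment.

    `gap` is a hypothetical linear gap penalty, not a value taken from ERPIN.
    """
    n, m = len(profile_cols), len(segment)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + gap
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(D[i - 1][j - 1] + profile_cols[i - 1][segment[j - 1]],
                          D[i - 1][j] + gap,   # profile column left unmatched
                          D[i][j - 1] + gap)   # extra nucleotide in the segment
    return D[n][m]
```

Consider a toy three-sequence alignment such as `GGGAAACCC`, `GCGAAACGC` and `AGGUAACCU`, folded as `(((...)))`. For this input, `build_profiles` yields three single-strand columns and three helical positions.

A candidate that keeps the G-C pairs scores higher than one that replaces them with G-G mismatches. This happens because each helical position scores the base-pair as a unit rather than each nucleotide separately. The compensatory A-U pair in the third sequence is therefore rewarded like the G-C pairs it replaces.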
Pages: 1003-1011
Number of pages: 9