RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble

Cited by: 241
Authors
Ding, Y
Chan, CY
Lawrence, CE
Affiliations
[1] New York State Dept Hlth, Wadsworth Ctr Labs & Res, Bioinformat Ctr, Albany, NY 12208 USA
[2] Brown Univ, Ctr Computat Mol Biol, Providence, RI 02912 USA
[3] Brown Univ, Div Appl Math, Providence, RI 02912 USA
Keywords
secondary structure prediction; centroid; Boltzmann ensemble;
DOI
10.1261/rna.2500605
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Prediction of RNA secondary structure by free energy minimization has been the standard for over two decades. Here we describe a novel method that forsakes this paradigm for predictions based on a Boltzmann-weighted structure ensemble. We introduce the notion of a centroid structure as a representative for a set of structures and describe a procedure for its identification. In comparison with the minimum free energy (MFE) structure using diverse types of structural RNAs, the centroid of the ensemble makes 30.0% fewer prediction errors as measured by the positive predictive value (PPV) with marginally improved sensitivity. The Boltzmann ensemble can be separated into a small number (3.2 on average) of clusters. Among the centroids of these clusters, the "best cluster centroid" as determined by comparison to the known structure simultaneously improves PPV by 46.5% and sensitivity by 21.7%. For 58% of the studied sequences for which the MFE structure is outside the cluster containing the best centroid, the improvements by the best centroid are 62.5% for PPV and 31.4% for sensitivity. These results suggest that the energy well containing the MFE structure under the current incomplete energy model is often different from the one for the unavailable complete model that presumably contains the unique native structure. Centroids are available on the Sfold server at http://sfold.wadsworth.org.
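As a rough illustration of the centroid idea in the abstract, the sketch below treats each sampled structure as a set of base pairs and takes the centroid to be the set of pairs that occur in more than half of the samples, which minimizes the total base-pair distance to the sample. The abstract does not spell out this rule or the exact accuracy definitions, so the majority rule, the PPV and sensitivity formulas, and all function names here are assumptions for illustration, not the Sfold implementation.

```python
from collections import Counter


def centroid(structures):
    """Majority-rule centroid of a sample of structures (assumed rule).

    Each structure is a set of (i, j) base pairs. A pair is kept when it
    appears in more than half of the sampled structures.
    """
    counts = Counter(pair for s in structures for pair in s)
    half = len(structures) / 2
    return {pair for pair, c in counts.items() if c > half}


def base_pair_distance(a, b):
    """Number of base pairs present in exactly one of the two structures."""
    return len(a ^ b)


def ppv_and_sensitivity(predicted, known):
    """PPV = correct / predicted pairs; sensitivity = correct / known pairs."""
    correct = len(predicted & known)
    ppv = correct / len(predicted) if predicted else 0.0
    sensitivity = correct / len(known) if known else 0.0
    return ppv, sensitivity


if __name__ == "__main__":
    # Hypothetical toy sample of three structures (not data from the paper).
    sample = [
        {(1, 10), (2, 9), (3, 8)},
        {(1, 10), (2, 9)},
        {(1, 10), (4, 8)},
    ]
    known = {(1, 10), (2, 9), (3, 8)}
    c = centroid(sample)
    print(sorted(c))
    print(ppv_and_sensitivity(c, known))
```

In the same spirit, a cluster centroid would be obtained by applying `centroid` to the structures of one cluster only, rather than to the whole sample.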
Pages: 1157-1166
Number of pages: 10