
PMC:1459172 / 8213-8789

Partition function and base pairing probabilities of RNA heterodimers

Abstract

Background
RNA has been recognized as a key player in cellular regulation in recent years. In many cases, non-coding RNAs exert their function by binding to other nucleic acids, as in the case of microRNAs and snoRNAs. The specificity of these interactions derives from the stability of inter-molecular base pairing. The accurate computational treatment of RNA-RNA binding therefore lies at the heart of target prediction algorithms.

Methods
The standard dynamic programming algorithms for computing secondary structures of linear single-stranded RNA molecules are extended to the co-folding of two interacting RNAs.

Results
We present a program, RNAcofold, that computes the hybridization energy and base pairing pattern of a pair of interacting RNA molecules. In contrast to earlier approaches, complex internal structures in both RNAs are fully taken into account. RNAcofold supports the calculation of the minimum energy structure and of a complete set of suboptimal structures in an energy band above the ground state. Furthermore, it provides an extension of McCaskill's partition function algorithm to compute base pairing probabilities, realistic interaction energies, and equilibrium concentrations of duplex structures.

Availability
RNAcofold is distributed as part of the Vienna RNA Package.

Contact
Stephan H. Bernhart – berni@tbi.univie.ac.at

Background

Over the last decade, our picture of RNA as a mere information carrier has changed dramatically. Since the discovery of microRNAs and siRNAs (see e.g. [1,2] for recent reviews), small noncoding RNAs have been recognized as key regulators in gene expression. Both computational surveys, e.g. [3-7], and experimental data [8-11] now provide compelling evidence that non-protein-coding transcripts are a common phenomenon.
Indeed, at least in higher eukaryotes, the complexity of the non-coding RNome appears to be comparable with the complexity of the proteome. This extensive inventory of non-coding RNAs has been implicated in diverse mechanisms of gene regulation, see e.g. [12-16] for reviews.

Regulatory RNAs more often than not function by means of direct RNA-RNA binding. The specificity of these interactions is a direct consequence of complementary base pairing, allowing the same basic mechanisms to be used with very high specificity in large collections of target and effector RNAs. This mechanism underlies the post-transcriptional gene silencing pathways of microRNAs and siRNAs (reviewed e.g. in [17]), it is crucial for snoRNA-directed RNA editing [18], and it is used in the gRNA-directed mRNA editing in kinetoplastids [19]. Furthermore, RNA-RNA interactions determine the specificity of important experimental techniques for changing gene expression patterns, including RNAi [20] and modifier RNAs [21-24].

RNA-RNA binding occurs by formation of stacked intermolecular base pairs, which compete with the propensity of both interacting partners to form intramolecular base pairs. These base pairing patterns, usually referred to as secondary structures, not only comprise the dominating part of the energetics of structure formation, they also appear as intermediates in the formation of the tertiary structure of RNAs [25], and they are in many cases well conserved in evolution. Consequently, secondary structures provide a convenient and computationally tractable approximation not only to RNA structure but also to the thermodynamics of RNA-RNA interaction.

From the computational point of view, this requires the extension of RNA folding algorithms to include intermolecular as well as intramolecular base pairs. Several approximations have been described in the literature: Rehmsmeier et al.
[26] as well as Dimitrov and Zuker [27] introduced algorithms that consider exclusively intermolecular base pairs, leading to a drastic algorithmic simplification of the folding algorithms since multi-branch loops are by construction excluded in this case. Andronescu et al. [28], like the present contribution, consider all base pairs that can be formed in secondary structures in a concatenation of the two hybridizing molecules. This set in particular contains the complete structural ensemble of both partners in isolation. Mückstein et al. [29] recently considered an asymmetric model in which base pairing is unrestricted in a large target RNA, while the (short) interaction partner is restricted to intermolecular base pairs.

A consistent treatment of the thermodynamic aspects of RNA-RNA interactions requires that one takes into account the entire ensemble of suboptimal structures. This can be approximated by explicitly computing all structures in an energy band above the ground state. Corresponding algorithms are discussed in [30] for single RNAs and in [28] for two interacting RNAs. A more direct approach, which becomes much more efficient for larger molecules, is to compute the partition function of the entire ensemble along the lines of McCaskill's algorithm [31]. This is the main topic of the present contribution.

As pointed out by Dimitrov and Zuker [27], the concentration of the two interacting RNAs as well as the possibility of forming homo-dimers plays an important role and cannot be neglected when quantitative predictions on RNA-RNA binding are required. In our implementation of RNAcofold we therefore follow their approach and explicitly compute the concentration dependencies of the equilibrium ensemble in a mixture of two partially hybridizing RNA species.

This contribution is organized as follows: We first review the energy model for RNA secondary structures and recall the minimum energy folding algorithm for simple linear RNA molecules.
Then we discuss the modifications that are necessary to treat intermolecular base pairs in the partition function setting and describe the computation of base pairing probabilities. Then the equations for concentration dependencies are derived. Short sections summarize implementation, performance, and an application to real-world data.

RNA secondary structures

A secondary structure S on a sequence x of length n is a set of base pairs (i, j), i < j, […] where ℓ1 is the length of the unpaired strand between i and k and ℓ2 is the length of the unpaired strand between l and j. Symmetry of the energy model dictates that the energy assigned to (i, j; k, l) equals that assigned to (l, k; j, i). If ℓ1 = ℓ2 = 0 we have a (stabilizing) stacked pair; if only one of ℓ1 and ℓ2 vanishes we have a bulge. Finally, for multiloops we have an additive energy model of the form a + b × β + c × ℓ, where ℓ is the length of the multiloop (again expressed as the number of unpaired nucleotides) and β is the number of branches, not counting the branch in which the closing pair of the loop resides.

So-called dangling end contributions arise from the stacking of unpaired bases to an adjacent base pair. We have to distinguish two types of dangling ends: (1) interior dangles, where the unpaired base i + 1 stacks onto i of the adjacent base pair (i, j) and correspondingly j - 1 stacks onto j, and (2) exterior dangles, where i - 1 stacks onto i and j + 1 stacks onto j. Each type has its own energy contribution. Within the additive energy model, dangling end terms are interpreted as the contributions of 3' and 5' dangling nucleotides, d3' and d5'. Here | separates the dangling nucleotide position from the adjacent base pair: d5' (k - 1|k, l) is thus the energy of the nucleotide at position k - 1 when interacting with the following base pair (k, l), while d3' (k, l|l + 1) scores the interaction of position l + 1 with the preceding pair (k, l).
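The loop rules and dangling-end terms described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: all function names are hypothetical, the parameter values in the usage note are placeholders rather than measured energies, and the way the 5' and 3' terms are summed for interior and exterior dangles is inferred from the index conventions in the text, not taken from the Vienna RNA Package source.

```python
from typing import Callable

# Assumed signature for a dangle term: three sequence positions -> energy.
# d5 is called as d5(k - 1, k, l), d3 as d3(k, l, l + 1), mirroring the
# notation d5'(k - 1|k, l) and d3'(k, l|l + 1) used in the text.
DangleTerm = Callable[[int, int, int], float]


def classify_two_pair_loop(i: int, j: int, k: int, l: int) -> str:
    """Name the loop closed by (i, j) that encloses the pair (k, l)."""
    if not (i < k < l < j):
        raise ValueError("expected i < k < l < j")
    l1 = k - i - 1  # unpaired nucleotides between i and k
    l2 = j - l - 1  # unpaired nucleotides between l and j
    if l1 == 0 and l2 == 0:
        return "stacked pair"
    if l1 == 0 or l2 == 0:
        return "bulge"
    return "interior loop"


def multiloop_energy(a: float, b: float, c: float,
                     branches: int, unpaired: int) -> float:
    """Linear multiloop term a + b * beta + c * l."""
    return a + b * branches + c * unpaired


def exterior_dangles(i: int, j: int, d5: DangleTerm, d3: DangleTerm) -> float:
    """Assumed sum for exterior dangles: i - 1 stacks on i, j + 1 on j."""
    return d5(i - 1, i, j) + d3(i, j, j + 1)


def interior_dangles(i: int, j: int, d5: DangleTerm, d3: DangleTerm) -> float:
    """Assumed sum for interior dangles: j - 1 stacks on j, i + 1 on i.

    The pair is read in the reversed orientation (j, i), so that j - 1
    precedes the pair and i + 1 follows it.
    """
    return d5(j - 1, j, i) + d3(j, i, i + 1)
```

For example, `classify_two_pair_loop(1, 10, 3, 9)` has ℓ1 = 1 and ℓ2 = 0 and is therefore a bulge, and `multiloop_energy(3.0, 0.5, 0.1, 2, 4)` evaluates a + b × β + c × ℓ for two branches and four unpaired nucleotides with made-up coefficients.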
The Vienna RNA Package currently implements three different models for handling the dangling-end contributions: They can be (a) ignored, (b) taken into account for every combination of adjacent bases and base pairs, or (c) a more complex model can be used in which the unpaired base can stack with at most one base pair. In cases (a) and (b) one can absorb the dangling end contributions in the loop energies (with the exception of contributions in the external loop). Model (c) strictly speaking violates the secondary structure model in that an unpaired base x_i between two base pairs (x_p, x_{i-1}) and (x_{i+1}, x_q) has three distinct states with different energies: x_i does not stack on its neighbors, x_i stacks on x_{i-1}, or x_i stacks on x_{i+1}. The algorithm then minimizes over these possibilities. While model (c) is the default for computing minimum free energy structures in most implementations such as RNAfold and mfold, it is not tractable in a partition function approach in a consistent way unless different positions of the dangling ends are explicitly treated as different configurations.

RNA secondary structure prediction

Because of the no-(pseudo)knot condition 3 above, every base pair (i, j) subdivides a secondary structure into an interior and an exterior structure that do not interact with each other. This observation is the starting point of all dynamic programming approaches to RNA folding, see e.g. [32,33,37]. Including various classes of pseudoknots is feasible in dynamic programming approaches [38-40] at the expense of a dramatic increase in computational costs, which precludes the application of these approaches to large molecules such as most mRNAs.

In the course of the "normal" RNA folding algorithm for linear RNA molecules as implemented in the Vienna RNA Package [41,42], and in a similar way in Michael Zuker's mfold package [43-45], the following arrays are computed for i […]
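The observation that a base pair (i, j) splits a structure into independent interior and exterior parts can be illustrated with a deliberately simplified partition function. The sketch below is not the loop-based recursion of RNAfold or RNAcofold: it assumes one fixed energy per base pair, canonical and GU pairs only, and a minimum hairpin size of three unpaired bases. The function name and all parameter values are hypothetical.

```python
import math
from functools import lru_cache

# RT in kcal/mol at 37 degrees C (gas constant in kcal/(mol K) times 310.15 K).
RT = 0.0019872 * 310.15

# Pairs allowed in this toy model (Watson-Crick plus GU wobble).
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def toy_partition_function(seq: str, pair_energy: float = -1.0,
                           min_loop: int = 3) -> float:
    """Sum of Boltzmann weights over all non-crossing structures of seq.

    Each base pair contributes the same (assumed) energy pair_energy,
    so a structure with m pairs has weight exp(-m * pair_energy / RT).
    Intended for short sequences only, since it recurses on segments.
    """
    weight = math.exp(-pair_energy / RT)

    @lru_cache(maxsize=None)
    def z(i: int, j: int) -> float:
        # Segments too short to hold a pair admit only the open structure.
        if j - i <= min_loop:
            return 1.0
        # Case 1: position i is unpaired.
        total = z(i + 1, j)
        # Case 2: i pairs with k; the interior [i+1, k-1] and the
        # remainder [k+1, j] are independent, so their sums multiply.
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in ALLOWED_PAIRS:
                total += weight * z(i + 1, k - 1) * z(k + 1, j)
        return total

    return z(0, len(seq) - 1)
```

With `pair_energy=0.0` every structure has weight 1, so the function simply counts structures; for `"GAAAC"` these are the open chain and the single hairpin closed by G-C. The equilibrium probability of the open chain under this toy model is `1.0 / toy_partition_function(seq)`.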
