PMC:1459172 / 10738-19410
Annnotations
2_test
{"project":"2_test","denotations":[{"id":"16722605-6163133-1692554","span":{"begin":331,"end":333},"obj":"6163133"},{"id":"16722605-9925784-1692555","span":{"begin":424,"end":426},"obj":"9925784"},{"id":"16722605-15294028-1692555","span":{"begin":424,"end":426},"obj":"15294028"},{"id":"16722605-12926009-1692555","span":{"begin":424,"end":426},"obj":"12926009"},{"id":"16722605-12824340-1692556","span":{"begin":706,"end":708},"obj":"12824340"},{"id":"16722605-2468181-1692557","span":{"begin":766,"end":768},"obj":"2468181"},{"id":"16722605-12824337-1692557","span":{"begin":766,"end":768},"obj":"12824337"},{"id":"16722605-1695107-1692558","span":{"begin":2254,"end":2256},"obj":"1695107"},{"id":"16722605-15459283-1692559","span":{"begin":3411,"end":3413},"obj":"15459283"},{"id":"16722605-12926009-1692560","span":{"begin":3623,"end":3625},"obj":"12926009"},{"id":"16722605-15123812-1692561","span":{"begin":7863,"end":7865},"obj":"15123812"}],"text":"RNA secondary structure prediction\nBecause of the no-(pseudo)knot condition 3 above, every base pair (i, j) subdivides a secondary structure into an interior and an exterior structure that do not interact with each other. This observation is the starting point of all dynamic programming approaches to RNA folding, see e.g. [32,33,37]. Including various classes of pseudoknots is feasible in dynamic programming approaches [38-40] at the expense of a dramatic increase in computational costs, which precludes the application of these approaches to large molecules such as most mRNAs.\nIn the course of the \"normal\" RNA folding algorithm for linear RNA molecules as implemented in the Vienna RNA Package [41,42], and in a similar way in Michael Zuker's mfold package [43-45] the following arrays are computed for i \u003cj:\nFij free energy of the optimal substructure on the subsequence x[i, j].\nCij free energy of the optimal substructure on the subsequence x[i, j] subject to the constraint that i and j form a basepair.\nMij free energy of the optimal substructure on the subsequence x[i, j] subject to the constraint that that x[i, j] is part of a multiloop and has at least one component, i.e., a sub-sequence that is enclosed by a base pair.\nfree energy of the optimal substructure on the subsequence x[i, j] subject to the constraint that that x[i, j] is part of a multiloop and has exactly one component, which has the closing pair i, h for some h satisfying i ≤ h \u003cj.\nThe \"conventional\" energy minimization algorithm (for simplicity of presentation without dangling end contributions) for linear RNA molecules can be summarized in the following way, which corresponds to the recursions implemented in the Vienna RNA Package:\nThe F table is initialized as Fi+1, i = 0, while the other tables are are set to infinity for empty intervals. It is straightforward to translate these recursions into recursions for the partition function because they already provide a partition of the set of all secondary structures that can be formed by the sequence x. This unambiguity of the decomposition of the ensemble structure is not important for energy minimization, while it is crucial for enumeration and hence also for the computation of the partition function [31]. Let us write Zij for the partition function on [xi, xj]. for the partition function constrained to structures with an (i, j) pair, and , for the partition function versions of the multiloop terms Mij and .\nThe adaptation of the recursion to the folding of two RNAs A and B of length n1 and n2 into a dimeric structure is straightforward: the two molecules are concatenated to form a single sequence of length n = n1 + n2. It follows from the algorithmic considerations below that the order of the two parts is arbitrary.\nA basic limitation of this approach arises from the no-pseudoknots condition: It restricts not only the intramolecular base pairs but also affects intermolecular pairs. Let SA and SB denote the intramolecular pairs in a cofolded structure S. These sets of base pairs define secondary structures on A and B respectively. Because of the no-pseudoknot condition on S, an intermolecular base pair in S\\(SA ∪ SB) can only connect nucleotides in the external loops of A and B. This is a serious restriction for some applications, because it excludes among other pseudoknot-like structures also the so-called kissing hairpin complexes [46]. Taking such structures into account is equivalent to employing folding algorithms for structure models that include certain types of pseudoknots, such as the partition function approach by Dirks and Pierce [40]. Its high computational cost, however, precludes the analysis of large mRNAs. In an alternative model [29], no intramolecular interactions are allowed in the small partner B, thus allowing B to form basepairs with all contiguous unpaired regions in SA. From a biophysical point of view, however, it makes sense to consider exclusively hybridization in the exterior loop provided both partners are large structured RNAs. In this case, hybridization either stops early, i.e., at a kissing hairpin complex (in the case of very stable local structures) or it is thermodynamically controlled and runs into the ground state via a complete melting of the local structure. In the latter case, the no-pseudoknots condition is the same approximation that is also made when folding individual molecules. Note that this approximation does not imply that the process of hybridization could only start at external bases.\nLet us now consider the algorithmic details of folding two concatenated RNA sequences. The missing backbone edge between the last nucleotide of the first molecule, position n1 in the concatenated sequence, and the first nucleotide of the second molecule (now numbered n1+1) will be referred to as the cut c. In each dimeric structure there is a unique loop Lc that contains the cut c. If c lies in the external loop of a structure S then the two molecules A and B do no interact in this structure. Algorithmically, Lc is either a hairpin loop, interior loop, or multibranch loop. From an energetic point of view, however, Lc is an exterior loop, i.e., it does not contribute to the folding energy (relative to the random coil reference state). For example, an interior loop (i, j; k, l) does not contribute to the energy if either i ≤ n1 \u003ck or l ≤ n1 \u003cj. Naturally, dangling end contributions must not span the cut, either. Hairpin loops and interior loops (including the special cases of bulges and stacked pairs) can therefore be dealt with by a simple modification of the energy rules. In the case of the multiloop there is also no problem as long as one is only interested in energy minimization, since multiloops are always destabilizing and hence have strictly positive energy contribution. Such a modified MFE algorithm has been described already in [41].\nFor partition function calculations and the generation of suboptimal structures, however, we have to ensure that every secondary structure is counted exactly once. This requires one to explicitly keep track of loops that contain the cut c. The cut c needs to be taken into account explicitly only in the recursion for the ZP terms, where one has to distinguish between true hairpin and interior loops with closing pair (i, j) (upper alternatives in eq.(6)) and loops containing the cut c in their backbone (lower alternatives in eq.(6)). Explicitly, this means i ≤ n1 \u003cj in the hairpin loop case, in the interior loop case, this either means i ≤ n1 \u003ck or l ≤ n1 \u003cj. Since multiloops are decomposed into two components, it is sufficient to ensure during the construction of ZM1 and ZM that these components neither start nor end adjacent to the cut, see Fig. 1.\nFigure 1 Loops with cuts have to be scored differently. Top row: hairpins and interior loops containing the cut between n1 (black ball) and n1 + 1 (white ball). Below: multi loops containing the cut. Neither M1 nor M components must start at n1 + 1 or stop at n1. Note that the construction of ZM out of ZM and ZM1 ensures that the cut is not inside the loop part of ZM either. In their full form including dangling end terms, the forward recursions for the partition function of an interacting pair of RNAs become\nUpper alternatives refer to regular loops, lower alternatives to the loop containing the cutpoint. For brevity we have used here abbreviations (i, j) = exp(-(i, j)/RT), and equivalently , , , , , for the Boltzmann factors of the energy contributions. In the remainder of this presentation we will again suppress the dangling end terms for simplicity of presentation.\nA second complication arises from the initiation energy ΦI that describes the entropy necessary to bring the two molecules into contact. This term, which is considered to be independent of sequence length and composition [47], has to be taken into account exactly once for every dimer structure if and only if the structure contains at least one base pair (i, j) that crosses the cut, i.e., i ≤ n1 \u003cj. The resulting bookkeeping problems fortunately can be avoided by introducing this term only after the dynamic programming tables have been filled. To this end we observe that Zi, j = , 1 ≤ i, j ≤ n1 are the partition functions for subsequences of the isolated A molecule, while , 1 ≤ i, j ≤ n2 are the corresponding quantities for the second interaction partner. Thus we can immediately compute the partition function ZAB - ZAZB that counts only the structures with intermolecular pairs, i.e., those that carry the additional initiation energy contribution. The total partition function including the initiation term is therefore"}