Efficient deformation algorithm for plasmid DNA simulations Abstract Background Plasmid DNA molecules are closed circular molecules that are widely used in life sciences, particularly in gene therapy research. Monte Carlo methods have been used for several years to simulate the conformational behavior of DNA molecules. In each iteration these simulation methods randomly generate a new trial conformation, which is either accepted or rejected according to a criterion based on energy calculations and stochastic rules. These simulation trials are generated using a method based on crankshaft motion that, apart from some slight improvements, has remained the same for many years. Results In this paper, we present a new algorithm for the deformation of plasmid DNA molecules for Monte Carlo simulations. The move underlying our algorithm preserves the size and connectivity of straight-line segments of the plasmid DNA skeleton. We also present the results of three experiments comparing our deformation move with the standard and biased crankshaft moves in terms of acceptance ratio of the trials, energy and temperature evolution, and average displacement of the molecule. Our algorithm can also be used as a generic geometric algorithm for the deformation of regular polygons or polylines that preserves the connections and lengths of their segments. Conclusion Compared with both crankshaft moves, our move generates simulation trials with higher acceptance ratios and smoother deformations, making it suitable for real-time visualization of plasmid DNA coiling. For that purpose, we have adopted a DNA assembly algorithm that uses nucleotides as building blocks. Background Plasmid DNA (pDNA) is a family of DNA molecules widely used in life sciences, more specifically in gene therapy research. These molecules are produced inside host cells in a supercoiled conformation (i.e., their natural conformation), which is the desired conformation for therapy purposes. However, such molecules can lose their original conformation in the production and purification processes, assuming more relaxed or even linear conformations, owing to thermodynamic changes (e.g., temperature changes). One of the main challenges for researchers is to find optimal thermodynamic conditions for plasmid DNA therapeutic application without losing its supercoiled conformation or, at least, minimizing the occurrence of relaxed or open DNA molecules. For many years, computational methods based on laboratory experimental data have been proposed to model and simulate the dynamic behavior and conformational changes in pDNA molecules under certain conditions. The Monte Carlo (MC) method has generally been accepted as a reliable tool for simulation purposes, and is seen as the standard. This iterative method tries to minimize the elastic energy of the molecule in each iteration step of the simulation process, testing the probability of acceptance of each new trial. The goal is to make the molecule converge to an equilibrium state after performing as few iterations as possible, i.e., maximizing the acceptance ratio of the trials without compromising the effectiveness and reliability of the simulation. To simplify the simulation process, each plasmid DNA molecule is reduced to a linear skeleton (i.e., polyline) with equal sized segments that represents the topological conformation of the molecule. Random deformations are then applied to this skeleton, generating new trial conformations, which are either accepted or rejected. Interestingly, the essence of the method used to randomly generate each new trial, referred to as the standard crankshaft move, has remained the same for many years, with its origins dating back to the early 1960s [1–3], more specifically in the context of lattice polymer chains. This move was later adapted for simulation of flexible molecules like DNA using MC methods. However, the standard crankshaft move has a very low acceptance ratio of trials, i.e., many trials are rejected. Moreover, it can present very unnatural behavior as it features very sudden motions along large portions of the molecule. To enhance the efficiency of MC moves, biasing was found to be a solution [4, 5]. However, as Earl and Deem noted [6], biasing a deformation move implies that the probability of moving from one state to another is no longer symmetric; consequently, the acceptance rule used must be altered to maintain the detailed balance. In this paper, we present a new unbiased move for plasmid DNA, whose skeleton is a closed polyline. This move not only preserves the size of each segment and its connectivity, but is also very effective in maximizing the acceptance ratio of the trials and stabilizing the molecule, thereby allowing steady, gradual temperature changes during the simulation. Our method also generates natural and realistic animations that can be used in real-time simulation and visualization. Related work In this section, we briefly review the MC methods in computational biology and chemistry, as well as the generative methods for DNA conformations that form the core of these MC methods. Monte Carlo simulations The MC simulation method is one of the most important methods used in DNA simulations. This method, which was originally presented by Metropolis et al. [7], generates DNA conformations combining energy calculations, random conformational changes, and statistics. Frank-Kamenetskii et al. [8], Vologodskii et al. [9] and Lebret [10] were the first to use an MC method to present numerical results of the probability of the occurrence of knots on pDNA. Frank-Kamenetskii and Vologodskii also presented valuable information on DNA torsional rigidity [11]. A few years later, Vologodskii et al. used MC simulations to study the conformational and thermodynamic properties of DNA molecules with physiological levels of supercoiling [5]. Vologodskii also included a chapter on “Monte Carlo Simulation of DNA Topological Properties” in the book “Topology in Molecular Biology” [12], and with Rybenkov, they reviewed how conformational properties of DNA catenanes can be studied using MC simulations [13]. Gebe et al. [14] presented an MC algorithm to simulate supercoiling free energies in unknotted and trefoil knotted inextensible circular chains with finite twisting and bending rigidity, while Marko et al. [15] made use of MC simulations to study the relationship between the amount of twisting in DNA molecules and its supercoiling. Kundu et al. used an MC algorithm to explain denaturation characteristics in a supercoiled plasmid and calculate the probability of denaturation for each base pair at different supercoiling degrees [16]. In their work on the relationship between knots and supercoiling, Cozzareli et al. used an MC simulation procedure to generate an equilibrium set of conformations [17]. MC simulations have also contributed to the understanding of the interplay between base-pair stacking interaction and permanent hydrogen-bond constraints in supercoiled DNA elasticity [18]. Based on the fact that atomic force microscopy has generated images of supercoiled DNA confined to a surface, which affects conformational properties such as twist and writhe, Fujimoto and Schurr modified an existing program, developed to perform MC simulations of supercoiled DNA in solution, flattening the DNA to simulate the effect of deposition on a surface [19]. Fujimoto and Schurr also presented a method to estimate torsional rigidities of weakly strained DNA [20]. Burnier et al. used MC calculations to identify a mechanism by which topoisomerases can keep the knotting level low [21]. More recently, Olson et al., in their paper “How stiff is DNA?”, used MC simulations to understand the behavior of a long, double-helical polymer in the tight confines of a cell and in the design of novel nanomaterials and molecular devices [22]. Generative methods of DNA conformations It has generally been accepted that supercoiled, i.e., the self twisting of the double stranded molecule over itself, is the desired conformation for pDNA molecules [23]. Thus, it is necessary to measure the supercoiling of a given molecule. One of the most important quantitative measures of closed circular DNA supercoiling is the linking number (Lk) of the two DNA strands, which is an integer corresponding to the number of double-helical turns of the molecule. There are several methods for calculating Lk, with one of the most widely used involving the computation of two very important geometrical properties of closed circular DNA molecules: twist (Tw) and writhe (Wr). Tw features the coiling of the two DNA strands around the axis of the helix, while Wr is a measure of the coiling of the helix axis in space [24]. Thus, the main result is: In our implementation, we used Klenin and Langowski’s computation method (2a) to calculate Wr [25]. Owing to the nature of three-dimensional closed polylines, knots can occur in some pDNA conformations. This is not a desirable feature, i.e., each closed circular DNA molecule must remain unknotted during the simulation process, keeping its original topology even if supercoiling occurs. Knot detection methods must be used during simulation to reject possible knotted conformations. We adopted Harris-Harvey’s knot detection algorithm, which uses the Alexander polynomial to detect the existence of knots [26]. This algorithm is based on the predicate that if two knots have different Alexander polynomials, then the knots are topologically distinct. Thus, because the Alexander polynomial of an unknotted closed circular DNA molecule is equal to one, all conformations for which this polynomial does not equal one must be rejected during the simulation. Each trial conformation must be generated in such a way that the size and connectivity of each segment of the DNA chain do not change. A major deformation method used to displace vertices of the DNA chain was introduced by Klenin et al. [4, 5]. This method, which is just a biased crankshaft move, starts by randomly choosing two vertices vm and vn. Then, all the vertices (and consequently all connecting segments) are rotated a randomly selected angle θ around the axis defined by the line connecting vm and vn, as shown in Figure 1. Furthermore, following Klenin et al. [4], the value of θ is uniformly distributed over a certain interval, and must be continuously adjusted during the simulation to guarantee that about half the steps are accepted. Figure 1 Crankshaft motion. To increase the acceptance ratio of the simulation trials another type of motion has been proposed in the literature. This improvement performs a sub-chain translation, which is usually referred to as reptation motion [5], and is illustrated in Figure 2. First, two vertices vi and vj are randomly chosen. Then, the sub-chain between vi and vj is translated by one segment length along the chain contour. The segment that was immediately after vj is also translated to fill the gap between vi and vi+1. This motion suggests movement analogous to a snake slithering and, hence, the name reptation motion. Other types of motion can be adopted if the Metropolis microscopic reversibility requirement is satisfied, i.e., if the probability of each trial conformation is the same as that of the reverse movement [7]. Figure 2 Reptation motion. Visualization of DNA conformational changes over time is also important as part of the entire simulation of DNA behavior. This is usually performed only when the entire simulation procedure ends and is typically done by assembling the DNA atoms along the DNA axis. Interestingly, a more efficient DNA assembly algorithm was presented to allow the visualization of all the steps of the simulation procedure in real-time [27]. In this method, each DNA nucleotide is represented by a three-dimensional building block, allowing the assembly of the entire molecule faster, but in a realistic way. In geometric terms, each of the four building blocks featuring DNA nucleotides is a Gaussian isosurface, which was previously generated by an algorithm that triangulates molecular surfaces [28]. Methods The deformation algorithm presented in this paper uses a linear skeleton (i.e., a polyline) with equal sized segments, henceforth called the DNA skeleton. Before introducing the core of the method itself, we explain how the DNA skeleton can be created for use by the deformation algorithm. Initial conformation of the DNA skeleton The DNA skeleton can assume any closed unknotted conformation. The simplest of these conformations is a completely relaxed circular conformation. Besides, the length of each segment of the DNA skeleton corresponds approximately to 30 base pairs of the double helix [12]. That said, the first step of the algorithm is to determine the number of segments of the DNA skeleton ensuring around 30 base pairs per segment. Assuming that the DNA has a sequence of n base pairs, we want to find an integer s denoting the number of segments of the DNA skeleton. We define two integer parameters min and max, respectively, as the minimum and maximum numbers of base pairs that are admissible per segment, such that min < 30