
PMC:7553147


Id Subject Object Predicate Lexical cue
T1 0-72 Sentence denotes Continuous flexibility analysis of SARS-CoV-2 spike prefusion structures
T2 73-110 Sentence denotes SARS-CoV-2 spike prefusion structures
T3 112-222 Sentence denotes The flexibility and conformational dynamics of the SARS-CoV-2 spike in the prefusion state have been analyzed.
T4 223-485 Sentence denotes An ensemble map with minimum bias was obtained, revealing concerted motions involving the receptor-binding domain (RBD), N-terminal domain and subdomains 1 and 2 around the previously characterized 1-RBD-up state, which have been modeled as elastic deformations.
T5 487-495 Sentence denotes Abstract
T6 496-704 Sentence denotes Using a new consensus-based image-processing approach together with principal component analysis, the flexibility and conformational dynamics of the SARS-CoV-2 spike in the prefusion state have been analyzed.
T7 705-933 Sentence denotes These studies revealed concerted motions involving the receptor-binding domain (RBD), N-terminal domain, and subdomains 1 and 2 around the previously characterized 1-RBD-up state, which have been modeled as elastic deformations.
T8 934-1060 Sentence denotes It is shown that in this data set there are no well-defined, stable spike conformations, but rather a virtual continuum of states.
T9 1061-1220 Sentence denotes An ensemble map was obtained with minimum bias, from which the extremes of the change along the direction of maximal variance were modeled by flexible fitting.
T10 1221-1409 Sentence denotes The results provide a warning of the potential image-processing classification instability of these complicated data sets, which has a direct impact on the interpretability of the results.
T11 1411-1429 Sentence denotes 1. Introduction
T13 1430-1537 Sentence denotes SARS-CoV-2 infects target cells through the interaction of the viral spike (S) protein with cell receptors.
T14 1538-1640 Sentence denotes This is an essentially dynamic event that is hard to analyze using most structural biology techniques.
T15 1641-1986 Sentence denotes However, cryo-EM offers some unique capabilities that make it a very suitable approach for this task, in particular because it can work with noncrystalline samples and, to a certain degree, with samples that show structural flexibility (Dashti et al., 2014 ▸; Maji et al., 2020 ▸; Scheres et al., 2007 ▸; Sorzano et al., 2019 ▸; Tagare et al., 2015 ▸).
T16 1987-2179 Sentence denotes In turn, cryo-EM information is complex, being buried in thousands of very noisy movies, making it a real challenge to reveal a three-dimensional (3D) structure from this collection of images.
T17 2180-2386 Sentence denotes Furthermore, cryo-EM is in the middle of a methodological and instrumental ‘revolution’ (Kühlbrandt, 2014 ▸) that has already been in progress for several years, with new methods constantly being produced.
T18 2387-2550 Sentence denotes In this context, the original data of Wrapp et al. (2020 ▸) have been reanalyzed, applying newer workflows and algorithms, and thus obtaining improved information.
T19 2551-2890 Sentence denotes Considering that we were studying a biological system that is characterized by its continuous flexibility, we have not strictly followed the standard multi-class approach (Scheres et al., 2007 ▸), which is very well suited to cases of discrete flexibility, since the mathematical modeling and the biological reality could be too far apart.
T20 2891-3232 Sentence denotes Instead, we have calculated a new ‘ensemble’ map at 3 Å global resolution in which the bias has been carefully reduced, followed by both a 3D classification process and a continuous flexibility analysis in 3D principal component (PC) space using a GPU-accelerated and algorithmically improved version of the method of Tagare et al. (2015 ▸).
T21 3233-3279 Sentence denotes The ensemble map was used for atomic modeling.
T22 3280-3417 Sentence denotes Our aim was to explore a larger part of the structural flexibility present in the data set than is achievable by 3D classification alone.
T23 3418-3704 Sentence denotes Using this mixed procedure, and through scatter plots of the projection of the different particle images onto the principal component axes, we have clearly shown how the spike flexibility in this data set should be understood as a continuum of states rather than discrete conformations.
T24 3705-3912 Sentence denotes Using maximum-likelihood-based classification, we have obtained two maps that are projected at the extremes of the main principal component on which flexible fitting from the ensemble map has been performed.
T25 3913-4110 Sentence denotes However, these extreme maps have an intrinsic blurring in the most flexible areas, since for any class that we may define the images come from a continuum of states and are therefore heterogeneous.
T26 4111-4317 Sentence denotes This flexibility is substantially reduced in a recently described biochemically stabilized spike (Hsieh et al., 2020 ▸), as shown by the reduced blurring, which translates into an improved local resolution.
T27 4318-4552 Sentence denotes In this work, we describe the new structural information that has been obtained and how it impacts our biological understanding of the system, together with the new workflows and algorithms that have made this accomplishment possible.
T28 4553-4687 Sentence denotes We used Scipion 2.0 (de la Rosa-Trevín et al., 2016 ▸) in order to easily combine different software suites in the analysis workflows.
T29 4688-5257 Sentence denotes Maps and models have been deposited in public databases [EMPIAR (Iudin et al., 2016 ▸) and EMDB (Lawson et al., 2011 ▸)]: SARS-CoV-2 spike in the prefusion state as EMDB entry EMD-11328 and PDB entry 6zow, SARS-CoV-2 stabilized spike in the prefusion state (1-up conformation) as EMDB entry EMD-11341, SARS-CoV-2 spike in the prefusion state (flexibility analysis, 1-up closed conformation) as EMDB entry EMD-11336 and PDB entry 6zp5, and SARS-CoV-2 spike in the prefusion state (flexibility analysis, 1-up open conformation) as EMDB entry EMD-11337 and PDB entry 6zp7.
T30 5258-5470 Sentence denotes All of the data used, the image-processing workflow and the intermediate results were also uploaded to EMPIAR (entries EMPIAR-10514 and EMPIAR-10516) by running the EMPIAR automatic deposition feature in Scipion.
T31 5472-5499 Sentence denotes 2. Materials and methods
T33 5501-5534 Sentence denotes 2.1. Image-processing workflow
T35 5535-6158 Sentence denotes The basic elements of the workflow combine classic cryo-EM algorithms with recent improvements in particle picking (Sanchez-Garcia et al., 2018 ▸; Sanchez-Garcia, Segura et al., 2020 ▸; Wagner et al., 2019 ▸) and the key ideas of meta classifiers, which integrate multiple classifiers by a ‘consensus’ approach (Sorzano et al., 2020 ▸), and finish with a totally new approach to map post-processing based on deep learning that we term Deep cryo-EM Map Enhancer (DeepEMhancer; Sanchez-Garcia, Gomez-Blanco et al., 2020 ▸), which complements our previous proposal on local deblurring (Ramírez-Aportela, Vilas et al., 2020 ▸).
T36 6159-6335 Sentence denotes Naturally, map and map–model quality analyses are performed using a variety of tools (Pintilie et al., 2020 ▸; Ramírez-Aportela, Maluenda et al., 2020 ▸; Vilas et al., 2020 ▸).
T37 6336-6577 Sentence denotes Conformational variability analysis is carried out by explicitly addressing the continuously flexible nature of the underlying biological reality, in which the SARS-CoV-2 spike explores the conformational space to bind the cellular receptor.
T38 6578-6811 Sentence denotes Most of the image processing in this work was performed using the Scipion framework (de la Rosa-Trevín et al., 2016 ▸), which is a public domain image-processing framework that is freely available at http://scipion.i2pc.es.
T39 6812-6928 Sentence denotes A graphical representation of the image-processing workflow used in this work can be found in Supplementary Fig. S1.
T40 6930-6954 Sentence denotes 2.2. Meta classifiers
T42 6955-7265 Sentence denotes With meta classifiers, and as discussed in Sorzano et al. (2020 ▸), the rationale is that a careful analysis of the ratio between algorithmic degrees of freedom and data size shows that cryo-EM may have transitioned from an area characterized by parameter variance to one dominated by possible parameter biases.
T43 7266-7387 Sentence denotes In very simple terms, because we have a lot of data, we can counteract the variance as long as the errors are random.
T44 7388-7569 Sentence denotes However, whenever there is the possibility of a systematic error, a so-called ‘bias’, artifacts may occur in the maps and, if this is the case, they can be very difficult to detect.
T45 7570-7851 Sentence denotes We deal with the problem of introducing bias into the map through ‘consensus’, so that we select those parameters for which several methods, which are as methodologically ‘orthogonal’ as possible, concur on the same answer (sometimes we also use different runs of the same method).
T46 7852-7985 Sentence denotes This notion has been used in several different steps of the workflow, as listed below. (i) Contrast transfer function (CTF) estimation.
T47 7986-8115 Sentence denotes We estimated the microscope defocus using two different programs: Gctf (Zhang, 2016 ▸) and CTFFIND4 (Rohou & Grigorieff, 2015 ▸).
T49 8116-8232 Sentence denotes We only selected those micrographs for which both estimates agreed up to 2.1 Å resolution (Marabini et al., 2015 ▸).
T50 8233-8257 Sentence denotes (ii) Particle selection.
T51 8258-8367 Sentence denotes We used two particle-picking algorithms: Xmipp (Abrishami et al., 2013 ▸) and crYOLO (Wagner et al., 2019 ▸).
T53 8368-8628 Sentence denotes We submitted both results to a picking consensus algorithm using deep learning (Sanchez-Garcia et al., 2018 ▸) and also removed all of the coordinates in contaminations, carbon edges etc. using a deep-learning algorithm (Sanchez-Garcia, Segura et al., 2020 ▸).
T54 8629-8845 Sentence denotes We then cleaned the set of selected particles using two rounds of cryoSPARC 2D classification (Punjani et al., 2017 ▸; Punjani & Fleet, 2020 ▸) and the consensus of two independent 3D classifications with cryoSPARC.
T55 8846-8867 Sentence denotes (iii) Initial volume.
T56 8868-9059 Sentence denotes As an initial volume, we selected the major class from the two 3D classifications above and refined it with Xmipp Highres (Sorzano et al., 2018 ▸) with a local refinement of the 3D alignment.
T57 9060-9083 Sentence denotes (iv) 3D reconstruction.
T58 9084-9237 Sentence denotes We then performed a cryoSPARC non-uniform 3D reconstruction, followed by a local angular refinement using RELION with a 3D mask (Zivanov et al., 2018 ▸).
T59 9238-9498 Sentence denotes Particle images were subjected to CTF refinement and Bayesian polishing (Zivanov et al., 2018 ▸), before performing another two rounds of CTF refinement and local angular refinement in RELION, where we improved the resolution versus the first local refinement.
T60 9499-9559 Sentence denotes Finally, we performed a non-uniform refinement in cryoSPARC.
T61 9560-9683 Sentence denotes The reported nominal resolution of 2.96 Å is based on the gold-standard Fourier shell correlation (FSC) of 0.143 criterion.
T62 9684-9915 Sentence denotes Actually, by using Xmipp Highres (Sorzano et al., 2018 ▸) we could improve the resolution to 2.2 Å in the central region of the volume (the region that is not flexible), but at the expense of reducing it more in the flexible areas.
T63 9916-9938 Sentence denotes (v) 3D classification.
T64 9939-10080 Sentence denotes We then performed two rounds of 3D classification with RELION followed by a consensus 3D classification, yielding two stable, large classes.
T65 10081-10200 Sentence denotes Using these two classes, we then performed a local angular refinement using a cryoSPARC non-uniform 3D reconstruction.
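The ‘consensus’ idea used for the 3D classifications above (keeping only particles that two independent runs place in the same class) can be illustrated with a minimal Python sketch. This is not the Scipion or cryoSPARC implementation: the function name, the dictionary input format and the greedy matching of class labels between runs are all assumptions made for illustration.

```python
from collections import Counter


def consensus_classes(labels_a, labels_b):
    """Return {particle_id: class} for particles on which two runs agree.

    labels_a, labels_b: dicts mapping particle id -> class label from two
    independent classification runs (hypothetical input format). Class
    numbering is arbitrary in each run, so classes of run B are first
    matched to classes of run A by largest overlap.
    """
    shared = labels_a.keys() & labels_b.keys()
    overlap = Counter((labels_a[p], labels_b[p]) for p in shared)
    # Greedy one-to-one matching of classes, largest overlap first.
    b_to_a, used_a = {}, set()
    for (a, b), _count in overlap.most_common():
        if b not in b_to_a and a not in used_a:
            b_to_a[b] = a
            used_a.add(a)
    # Keep only the particles assigned to matching classes in both runs.
    return {p: labels_a[p] for p in shared
            if b_to_a.get(labels_b[p]) == labels_a[p]}
```

Particles that switch class between runs are discarded, which trades data volume for a lower risk of run-specific bias.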
T66 10202-10228 Sentence denotes 2.3. Particle selection
T68 10229-10381 Sentence denotes We found that the micrographs and particles that are used for the 3D reconstruction play a key role in the quality and characteristics of the final map.
T69 10382-10453 Sentence denotes In particular, we used the following two procedures. (i) CTF estimation.
T70 10454-10514 Sentence denotes We estimated the microscope defocus using Gctf and CTFFIND4.
T71 10515-10649 Sentence denotes We required that both estimates were similar (the phase of their corresponding CTFs differed by less than 90°) up to 2.1 Å resolution.
T72 10650-10697 Sentence denotes Only 70% of the micrographs met this criterion.
T73 10698-10875 Sentence denotes We then estimated the CTF envelope using Xmipp CTF (Sorzano et al., 2007 ▸) while keeping the defocus value fixed (calculated as the average of the Gctf and CTFFIND4 estimates).
T74 10876-10954 Sentence denotes We found this step to be very important to retain high-resolution information.
T75 10955-11251 Sentence denotes Using Xmipp CTF, we discovered that most of the micrographs had a non-astigmatic validity of between 3 and 4 Å (meaning that at this resolution the assumption of non-astigmatism broke down for most of the micrographs, and only a minority of 30% reached higher resolution in a non-astigmatic way).
T76 11252-11276 Sentence denotes (ii) Particle selection.
T77 11277-11350 Sentence denotes Two advanced particle-picking algorithms were employed: Xmipp and crYOLO.
T79 11351-11470 Sentence denotes The first identified 1.2 million coordinates possibly pointing to spike particles, while the second identified 730 000.
T80 11471-11582 Sentence denotes We then combined the estimates using Deep Consensus with a threshold of 0.99, resulting in 620 000 coordinates.
T81 11583-11732 Sentence denotes MicrographCleaner was used to rule out particles selected in the carbon edges, aggregations or contaminations, rejecting a total of 50 000 particles.
T82 11733-11961 Sentence denotes After two rounds of cryoSPARC 2D classification with a pixel size of 2.1 Å and an image size of 140 × 140 pixels, we kept 298 000 particles assigned to 2D classes whose centroids clearly corresponded to projections of the spike.
T83 11962-12085 Sentence denotes At this point, we performed two initial volume estimates using cryoSPARC, classifying the input particles into two classes.
T84 12086-12273 Sentence denotes In both executions, one of the structures clearly corresponded to the spike (with 80% of particles), while the other resulted in a 3D structure that clearly corresponded to contamination.
T85 12274-12413 Sentence denotes We calculated the consensus of the two cryoSPARC 3D classifications (those particles that were consistently assigned to the same 3D class).
T86 12414-12503 Sentence denotes Only 203 000 particles belonged to the class that was consistently assigned to the spike.
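The CTF agreement criterion in step (i) above (phases differing by less than 90° up to a given resolution) can be made concrete with a short Python sketch. It assumes two non-astigmatic CTFs that differ only in defocus, in which case the phase difference at spatial frequency s is π·λ·|Δz|·s². The 300 kV default voltage and all function names are assumptions for illustration; the published workflow (Marabini et al., 2015) may compute the comparison differently.

```python
import math


def electron_wavelength_angstrom(kv):
    """Relativistic electron wavelength in Å for an accelerating voltage in kV."""
    v = kv * 1e3
    return 12.2643 / math.sqrt(v * (1.0 + 0.978466e-6 * v))


def agreement_resolution(defocus_a, defocus_b, kv=300.0, max_phase_deg=90.0):
    """Finest resolution (Å) up to which two defocus estimates (Å) agree.

    Assumes non-astigmatic CTFs that differ only in defocus, so that the
    phase difference at frequency s is pi * wavelength * |dz| * s**2.
    """
    dz = abs(defocus_a - defocus_b)
    if dz == 0:
        return 0.0  # identical estimates agree at every frequency
    lam = electron_wavelength_angstrom(kv)
    s_max = math.sqrt(math.radians(max_phase_deg) / (math.pi * lam * dz))
    return 1.0 / s_max


def passes_consensus(defocus_a, defocus_b, target_res=2.1, kv=300.0):
    """True if the two estimates agree at least up to target_res (Å)."""
    return agreement_resolution(defocus_a, defocus_b, kv) <= target_res
```

Under these assumptions, two estimates that differ by 100 Å in defocus still agree to about 2 Å, whereas a 200 Å difference already fails a 2.1 Å target.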
T87 12505-12544 Sentence denotes 2.4. Validation and quality analysis
T89 12545-12790 Sentence denotes To judge the quality of our structural results, we concentrated here on three of the newest approaches: directional local resolution (Vilas et al., 2020 ▸), Q-score (Pintilie et al., 2020 ▸) and FSC-Q (Ramírez-Aportela, Maluenda et al., 2020 ▸).
T90 12791-13046 Sentence denotes The first provides a measure of map quality, while the latter two focus on the relationship between the map and the structural model; in other words, how well the model is supported by the map density, without any other complementary piece of information.
T91 13047-13271 Sentence denotes In terms of map-to-model validation, in Supplementary Figs. S3(a) and S3(b) we present Q-score and FSC-Q metrics, respectively, showing the agreement between the ensemble cryo-EM map and the structural model derived from it.
T93 13272-13466 Sentence denotes In most areas the agreement is very good, with the exception of the receptor-binding domain (RBD) and substantial parts of the N-terminal domain (NTD), as expected from their higher flexibility.
T94 13468-13498 Sentence denotes 2.5. Volume post-processing
T96 13499-13682 Sentence denotes In this work, we used two volume post-processing approaches that both depart substantially from the traditional approach in the field, which is the application of global B-sharpening.
T97 13683-13808 Sentence denotes One of the approaches is our previously introduced LocalDeblur sharpening method (Ramírez-Aportela, Vilas et al., 2020 ▸).
T98 13809-13922 Sentence denotes The second approach is a totally new method based on deep learning (Sanchez-Garcia, Gomez-Blanco et al., 2020 ▸).
T99 13923-14227 Sentence denotes Concentrating on the latter, this method, DeepEMhancer, relies on a common approach in modern pattern recognition in which a convolutional neural network (CNN) is trained on a known data set comprised of pairs of data points and targets, with the aim of predicting the targets for new unseen data points.
T100 14228-14424 Sentence denotes In this case, the training was performed by presenting the CNN with pairs of cryo-EM maps collected from the EMDB and maps derived from the structural models associated with the experimental maps.
T101 14425-14570 Sentence denotes As a result, our CNN learned how to obtain much cleaner and more detailed versions of the experimental cryo-EM maps, improving their interpretability.
T102 14571-14711 Sentence denotes Trying to take advantage of their complementary information, we used the two post-processed maps to trace the atomic model (PDB entry 6zow).
T103 14712-14848 Sentence denotes Some examples of the similar improvement of structure modeling according to these two sharpened maps are shown in Supplementary Fig. S2.
T104 14849-14920 Sentence denotes The sharpened and unsharpened maps have all been deposited in the EMDB.
T105 14922-14944 Sentence denotes 2.6. Model building
T107 14945-15135 Sentence denotes The atomic interpretation of the SARS-CoV-2 spike 3D map (PDB entry 6zow) was performed taking advantage of the modeling tools integrated in Scipion as described in Martínez et al. (2020 ▸).
T108 15136-15501 Sentence denotes Owing to a lack of sufficient density for the ‘up’ conformation of the RBD, we rigidly fitted the structure of chain A (residues 336–525) of the SARS-CoV-2 RBD in complex with CR3022 Fab (PDB entry 6yla; J. Huo, Y. Zhao, J. Ren, D. Zhou, H. M. Ginn, E. E. Fry, R. Owens & D. I. Stuart, unpublished work) to the 3D map using UCSF Chimera (Pettersen et al., 2004 ▸).
T120 15502-15688 Sentence denotes This unmodeled part of the structure was called chain ‘a’ since it was part of chain A in the structure previously inferred from the same data set (PDB entry 6vsb; Wrapp et al., 2020 ▸).
T121 15689-15895 Sentence denotes The rest of the molecule was modeled using the same original structure (PDB entry 6vsb) as a template, as well as another spike ectodomain structure in the open state (PDB entry 6vyb; Walls et al., 2020 ▸).
T122 15896-16061 Sentence denotes The former structure (PDB entry 6vsb) was fitted to the new map and refined using Coot (Emsley et al., 2010 ▸) and phenix_real_space_refine (Afonine et al., 2018 ▸).
T123 16062-16364 Sentence denotes Validation metrics were computed to assess the geometry of the new hybrid model and its correlation with the map using ‘Comprehensive Validation (cryo-EM)’ in Phenix, the EMRinger algorithm (Barad et al., 2015 ▸), Q-score (Pintilie et al., 2020 ▸) and FSC-Q (Ramírez-Aportela, Maluenda et al., 2020 ▸).
T124 16365-16484 Sentence denotes Score values considering the whole hybrid spike and excluding the unmodeled RBD are detailed in Supplementary Table S1.
T125 16485-16540 Sentence denotes The hybrid atomic structures were submitted to the PDB.
T126 16541-16672 Sentence denotes iMODFIT (López-Blanco & Chacón, 2013 ▸) was employed to flexibly fit the hybrid atomic structure to the open and closed class maps.
T127 16674-16710 Sentence denotes 2.7. Principal component analysis
T129 16711-16868 Sentence denotes The principal component analysis used the expectation–maximization (EM) algorithm presented in Tagare et al. (2015 ▸) with the following minor modifications.
T130 16869-17170 Sentence denotes Firstly, in contrast to Tagare et al. (2015 ▸), the images were not Wiener filtered, nor was the projected mean subtracted from the images; instead, the CTF of each image was incorporated into the projection operator of that image and a variable contrast was allowed for the mean volume in each image.
T131 17171-17262 Sentence denotes The extent of the variable contrast was determined by the principal component EM algorithm.
T132 17263-17423 Sentence denotes Secondly, the mean volume was projected along each projection direction and an image mask was constructed with a liberal soft margin to allow for heterogeneity.
T133 17424-17565 Sentence denotes The different masks thus created, with one mask per projection direction, were applied to the images and the masked images were used as data.
T134 17566-17751 Sentence denotes This step corresponds to imposing a form of sparsity on the data, which is known to improve the estimation of principal components in high-dimensional spaces (Johnstone & Paul, 2018 ▸).
T135 17752-17863 Sentence denotes All images were downsampled by a factor of two to improve the signal-to-noise ratio and to speed up processing.
T136 17864-18004 Sentence denotes Finally, during each EM iteration, the principal components were low-pass filtered with a very broad filter whose pass band extended to 4 Å.
T137 18005-18121 Sentence denotes This helped with the convergence of the algorithm without significantly limiting the principal component resolution.
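The per-iteration low-pass filtering of the principal components can be pictured with a generic Fourier-space filter. The sketch below is only an illustration in Python: the raised-cosine roll-off, its width and the function name are assumptions, and the text does not specify the shape of the filter that was actually used.

```python
import numpy as np


def lowpass_volume(vol, pixel_size, cutoff_res, edge_width=0.02):
    """Low-pass filter a cubic volume, keeping frequencies below 1/cutoff_res.

    pixel_size and cutoff_res are in Å; edge_width (1/Å) is the width of the
    raised-cosine roll-off beyond the pass band (an assumed filter shape).
    """
    n = vol.shape[0]
    f = np.fft.fftfreq(n, d=pixel_size)  # spatial frequency in 1/Å
    fx, fy, fz = np.meshgrid(f, f, f, indexing="ij")
    radius = np.sqrt(fx**2 + fy**2 + fz**2)
    cutoff = 1.0 / cutoff_res
    # 1 inside the pass band, cosine roll-off to 0 over edge_width, 0 beyond.
    t = np.clip((radius - cutoff) / edge_width, 0.0, 1.0)
    mask = 0.5 * (1.0 + np.cos(np.pi * t))
    return np.fft.ifftn(np.fft.fftn(vol) * mask).real
```

In an EM loop, such a function would be applied to each principal component volume after the update step, for example `pc = lowpass_volume(pc, pixel_size, 4.0)`.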
T138 18122-18359 Sentence denotes As part of the EM iteration, the algorithm in Tagare et al. (2015 ▸) conveniently estimates the expected amount by which each principal component is present in each image (this is the term E[z_j] in equation 15 of Tagare et al., 2015 ▸).
T139 18360-18401 Sentence denotes Fig. 3(b) shows a scatter plot of E[z_j].
T140 18402-18592 Sentence denotes It is interesting to note that in the algorithm of Tagare et al. (2015 ▸) the latent variables (representing the contributions of the principal components to each particle) are marginalized.
T141 18593-18784 Sentence denotes Because of this marginalization, the number of unknown parameters that need to be estimated (the principal components and variances) is fixed and does not change with the number of particles.
T142 18785-19019 Sentence denotes We have found this feature to be very valuable for relatively small sets of images (say 100 000 images), which is the case in our work, because it prevents the number of parameters to be estimated from growing with the number of particles.
T143 19020-19156 Sentence denotes Statistically speaking, nonmarginalization is known to be a problem when there are few particles, where the estimates can be unreliable.
T144 19157-19256 Sentence denotes Since the method developed by Tagare and coworkers does not suffer from this, we chose this method.
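The role of marginalization can be illustrated with the textbook EM algorithm for probabilistic PCA, written here as a minimal Python sketch. It is a deliberately simplified stand-in and not the method of Tagare et al. (2015): it works on plain vectors and omits the projection operators, CTF, per-image contrast and masks described above. The function name and defaults are assumptions. Note that the estimated parameters (loadings `w` and noise variance `sigma2`) have a fixed size, while the per-sample latent coordinates only appear through their posterior moments.

```python
import numpy as np


def ppca_em(x, n_components, n_iter=200, seed=0):
    """EM for probabilistic PCA on centered data x (n_samples, n_features).

    Returns loadings w (n_features, q), noise variance sigma2, and the
    posterior means E[z] (n_samples, q) of the latent coordinates.
    """
    n, d = x.shape
    q = n_components
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((d, q))
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables, which are
        # integrated out rather than estimated as free parameters.
        m_inv = np.linalg.inv(w.T @ w + sigma2 * np.eye(q))
        ez = x @ w @ m_inv
        ezz = n * sigma2 * m_inv + ez.T @ ez
        # M-step: update the fixed-size parameter set (w, sigma2).
        w = (x.T @ ez) @ np.linalg.inv(ezz)
        resid = (np.sum(x**2) - 2.0 * np.sum(ez * (x @ w))
                 + np.trace(ezz @ w.T @ w))
        sigma2 = resid / (n * d)
    m_inv = np.linalg.inv(w.T @ w + sigma2 * np.eye(q))
    return w, sigma2, x @ w @ m_inv
```

A scatter plot of the columns of the returned E[z] array is the simplified analogue of plotting E[z_j] for each particle image.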
T145 19258-19271 Sentence denotes 3. Results
T147 19272-19363 Sentence denotes With the goal set at analyzing spike flexibility, we describe our key results step by step.
T148 19365-19414 Sentence denotes 3.1. The ensemble map and the way to obtain it
T150 19415-19554 Sentence denotes In the following, we describe the analysis of SARS-CoV-2 spike stabilized in the prefusion state by two proline substitutions in S2 (S-2P).
T151 19555-19885 Sentence denotes We will objectively demonstrate that the flexibility of the spike protein should be understood as a quasi-continuum of conformations, so that when performing a structural analysis on this specimen special care has to be taken with the image-processing workflows, since they may directly affect the interpretability of the results.
T152 19886-20144 Sentence denotes Starting from the original SARS-CoV-2 S-2P data set of Wrapp et al. (2020 ▸), we have completely reanalyzed the data using our public domain software integration platform Scipion (de la Rosa-Trevín et al., 2016 ▸), breaking the global 3 Å resolution barrier.
T153 20145-20444 Sentence denotes A representative view of the new ensemble map and its corresponding global FSC curve is shown in Fig. 1 ▸(a) (EMDB entry EMD-11328); the sequence of a monomer of the S protein is shown on the right to facilitate the further discussion of structure–function relationships (from Wrapp et al., 2020 ▸).
T154 20445-20628 Sentence denotes Figs. 1 ▸(b) and 1 ▸(c) show a comparison between the original map (Wrapp et al., 2020 ▸) with EMDB code EMD-21375 and the newly reconstructed ensemble map corresponding to EMD-11328.
T156 20629-20833 Sentence denotes Clearly, the local resolution (Vilas et al., 2018 ▸), which is shown on the left in Figs. 1 ▸(b) and 1 ▸(c), is increased in the new map, and the anisotropy, which is shown in the center, is much reduced.
T158 20834-21065 Sentence denotes Finally, on the right we present plots of the radially averaged tangential resolution, which is related to the quality of the angular alignment (Vilas et al., 2020 ▸); the steeper the slope, the higher the angular assignment error.
T159 21066-21354 Sentence denotes As can be appreciated, the slope calculated from the newly obtained map is almost zero when compared with that for the map from Wrapp et al. (2020 ▸), indicating that, in relative terms, the particle alignment used to create the new map is better than that used to build the original map.
T160 21355-21420 Sentence denotes The result is an overall quantitative enhancement in map quality.
T161 21421-21776 Sentence denotes In terms of tracing, besides modeling several additional residue side chains and improving the geometry of the carbon skeleton (see Supplementary Fig. S2), one of the most noticeable improvements that we observed in the new map is an extension of the glycan chains that were initially built, particularly throughout the S2 fusion subunit (PDB entry 6zow).
T162 21777-21967 Sentence denotes A quantitative comparison can be made between the length of glycan chains in the new ‘ensemble structure’ with respect to the previous structure (PDB entry 6vsb; see Supplementary Table S2).
T163 21968-22254 Sentence denotes Although the total number of N-linked glycosylation sequons throughout the SARS-CoV-2 S trimer is essentially the same in the new structure (45) and PDB entry 6vsb (44), we have substantially increased the length of the glycan chains, expanding the total number of glycans by about 50%.
T164 22255-22472 Sentence denotes We note the importance of this extensive glycosylation for epitope accessibility and how the accurate determination of this glycan shield will facilitate efforts to rapidly develop effective vaccines and therapeutics.
T165 22473-22673 Sentence denotes Supplementary Fig. S2 shows a representative section of sharpened versions of the ensemble map (EMDB entry EMD-11328) compared with EMDB entry EMD-21375, in which the glycans can now be better traced.
T166 22674-22911 Sentence denotes However, we should not forget that the ensemble map contains images in which the receptor-binding domain (RBD) and N-terminal domain (NTD) are in different positions (see Section 3.2), and consequently these domains appear to be blurred.
T167 22912-23140 Sentence denotes Details of how the tracing was performed can be found in Section 2, while in Supplementary Fig. S3 we present two map-to-model quality figures indicating the good fit in general, with the obvious exception of the variable parts.
T168 23142-23170 Sentence denotes 3.2. Flexibility analysis
T170 23171-23405 Sentence denotes Starting from a carefully selected set of particles obtained from our consensus and cleaning approaches (see Section 2), together with the ensemble map described previously, we subjected the data to the following flexibility analysis.
T171 23406-23578 Sentence denotes The original images that were part of the ensemble map went through a ‘consensus classification’ procedure aimed at separating them into two algorithmically stable classes.
T172 23579-23785 Sentence denotes Essentially, and as described in more detail in Section 2, we performed two independent classifications, further selecting those particles that were consistently together throughout the two classifications.
T173 23786-23852 Sentence denotes In this way, we obtained the two new classes shown in Fig. 2 ▸(a).
T174 23853-24023 Sentence denotes We will refer to these as the ‘closed conformation’ [Fig. 2 ▸(a), Class 1, EMDB entry EMD-11336] and the ‘open conformation’ [Fig. 2 ▸(a), Class 2, EMDB entry EMD-11337].
T175 24024-24183 Sentence denotes The number of images in each class was reduced to 45 000 in one case and 21 000 in the other, with global FSC-based resolutions of 3.1 and 3.3 Å, respectively.
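The FSC-based resolutions quoted here rest on the Fourier shell correlation between two independently reconstructed half maps and the 0.143 threshold. The following Python sketch shows the basic computation with NumPy; it ignores masking and the other corrections applied by packages such as RELION or cryoSPARC, and the function names are assumptions for illustration.

```python
import numpy as np


def fsc_curve(half1, half2):
    """Fourier shell correlation between two cubic half maps of equal size."""
    n = half1.shape[0]
    f1, f2 = np.fft.fftn(half1), np.fft.fftn(half2)
    freq = np.fft.fftfreq(n) * n  # integer frequency index along each axis
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()
    n_shells = n // 2
    keep = shell < n_shells  # drop the corners beyond the Nyquist shell
    idx = shell[keep]
    cross = np.bincount(idx, weights=(f1 * np.conj(f2)).real.ravel()[keep],
                        minlength=n_shells)
    p1 = np.bincount(idx, weights=(np.abs(f1) ** 2).ravel()[keep],
                     minlength=n_shells)
    p2 = np.bincount(idx, weights=(np.abs(f2) ** 2).ravel()[keep],
                     minlength=n_shells)
    return cross / np.sqrt(p1 * p2)


def resolution_at_threshold(fsc, box_size, pixel_size, threshold=0.143):
    """Resolution (Å) of the first shell where the FSC drops below threshold."""
    below = np.nonzero(fsc[1:] < threshold)[0]  # skip the DC shell
    if below.size == 0:
        return 2.0 * pixel_size  # never drops below: report Nyquist
    return box_size * pixel_size / (below[0] + 1)
```

Because flexible regions contribute inconsistent signal to the two half maps, a single global number of this kind can hide large local differences, which is why local-resolution measures are also reported in the text.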
T176 24184-24357 Sentence denotes The open and closed structures depict a clear and concerted movement of the ‘thumb’ formed by the RBD and subdomains 1 and 2 (SD1 and SD2) and the NTD of an adjacent chain.
T177 24358-24448 Sentence denotes The thumb moves away from the central spike axis, exposing the RBD in the up conformation.
T178 24449-24796 Sentence denotes In order to make clearer where the changes are at the level of the Class 1 and Class 2 maps, we have made use of the representation of map local strains in Sorzano et al. (2016 ▸), which helps to very clearly visualize the type of strains needed to relate two maps, whether these are rigid-body rotations or more complex deformations (stretching).
T179 24797-24981 Sentence denotes We have termed the maps resulting from this elastic analysis as ‘1s’ (Class 1, stretching) and ‘1r’ (Class 1, rotations) on the right-hand side of Fig. 2 ▸(a) and the same for Class 2.
T180 24982-25074 Sentence denotes The color scale for both stretching and rotations goes from blue for small to red for large.
T181 25075-25425 Sentence denotes Clearly, the differences among the classes with respect to the NTD and RBD have a very substantial component of pure coordinated rigid-body rotations, while the different RBDs present a much more complex pattern of deformations (stretching), indicating an important structural rearrangement in this area that does not occur elsewhere in the specimen.
T182 25426-25679 Sentence denotes In terms of atomic modeling, we performed a flexible fitting of the ensemble model onto the closed and open forms [see Fig. 2 ▸(a), rightmost map; the PDB code for the open conformation is PDB entry 6zp7, while that for the closed conformation is 6zp5].
T183 25680-25862 Sentence denotes Focusing on rotations, which are the simplest element to follow, we can quantify that the degree of rotation of the thumb in these classes is close to 6°, as shown in Fig. 2 ▸(b).
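One common way to put a number on such a rigid-body rotation is to superpose matched atoms of the domain in the two fitted models with the Kabsch algorithm and read the angle from the resulting rotation matrix. The Python sketch below illustrates this; it is not necessarily how the angle in Fig. 2(b) was measured, and the function name and input format are assumptions.

```python
import numpy as np


def rigid_rotation_angle_deg(coords_a, coords_b):
    """Angle (degrees) of the best-fit rigid rotation relating two point sets.

    coords_a, coords_b: (n_atoms, 3) arrays of matched atom positions, for
    example the C-alpha atoms of one domain in two fitted models.
    """
    # Remove the translation by centering both sets on their centroids.
    a = coords_a - coords_a.mean(axis=0)
    b = coords_b - coords_b.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(u @ vt))
    rot = (u * np.array([1.0, 1.0, d])) @ vt
    # The rotation angle follows from the trace of the rotation matrix.
    cos_angle = np.clip((np.trace(rot) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))
```

The angle is independent of any translation between the two models, so it isolates the rotational part of the domain motion.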
T184 25863-26066 Sentence denotes Given this flexibility, we consider that the best way to correctly present the experimental results is through the movie provided as Supplementary Movie S1, in which maps and atomic models are presented.
T185 26067-26412 Sentence denotes Within the approximation to modeling that a flexible fitting represents, we can appreciate two hinge movements of the RBD–SD1–SD2 domains: one located between amino acids 318–326 and 588–595 that produces most of the displacement, and the other between amino acids 330–335 and 527–531 that accompanies a less pronounced ‘up’ movement of the RBD.
T186 26413-26505 Sentence denotes This thumb motion is completed by the accompanying motion of the NTD from an adjacent chain.
T187 26506-26767 Sentence denotes Also in a collective way, other NTDs and RBDs in the down conformation move slightly, as can better be appreciated in Supplementary Movie S1, where the transition between fitted models overlaps with the interpolation between observed high-resolution class maps.
T188 26768-26862 Sentence denotes To further investigate whether or not the flexibility was continuous, we proceeded as follows.
T189 26863-26994 Sentence denotes Images from the two classes were pooled together and, using the ensemble map, subjected to a 3D principal component analysis (PCA).
T190 26995-27100 Sentence denotes The approach we followed is based on Tagare et al. (2015), with some minor modifications of the method.
T191 27101-27167 Sentence denotes A detailed explanation of the modifications is given in Section 2.
T192 27168-27328 Sentence denotes We initialized the first principal component (PC) to the difference between the open and closed conformations, while the remaining PCs were initialized randomly.
T193 27329-27430 Sentence denotes Upon convergence, the eigenvalue of each PC and the scatter of the images in PC space were calculated.
T194 27431-27483 Sentence denotes The eigenvalues of the PCs are shown in Fig. 3(a).
T195 27484-27529 Sentence denotes Clearly, the first three PCs are significant.
T196 27530-27606 Sentence denotes The scatter plot of the image data in PC1–PC3 space is shown in Fig. 3(b).
T197 27607-27720 Sentence denotes Fig. 3(b) strongly suggests that there is ‘continuous flexibility’ rather than ‘tightly clustered’ flexibility.
T198 27721-27861 Sentence denotes Fig. 3(b) also shows the projection of the maps corresponding to the open and closed conformations on the extremes of the first three PCs.
T199 27862-28037 Sentence denotes It is clear that the open and closed conformations are aligned mostly along the first PC, suggesting that the open/closed classification captures the most significant changes.
T200 28038-28168 Sentence denotes Fig. 3 ▸(c) shows side views of a pair of structures (mean ± 2 × std, where std is the square root of the eigenvalue) for each PC.
T201 28169-28255 Sentence denotes Additional details of these structures are available in Supplementary Figs. S4 and S5.
T203 28256-28405 Sentence denotes Note that PCs are not to be understood as structural pathways with a biological meaning, but as directions that summarize the variance of a data set.
T204 28406-28710 Sentence denotes For instance, the fact that the RBD appears and disappears at the two extremes of PC3 indicates that there is an important variability in these voxels, which is probably indicative of the up and down conformations of the RBD [to be understood in the context of the elastic analysis shown in Fig. 2(b)].
T205 28711-28854 Sentence denotes Through this combination of approaches, we have learnt that the spike conformation fluctuates virtually randomly in a rather continuous manner.
T206 28855-29173 Sentence denotes Additionally, the approach taken to define the two algorithmically stable ‘classes’ has clearly partitioned the data set according to the main axis of variance, PC1, since the projections of the maps of these classes fall almost exclusively along PC1 and are located towards the extremes of the image-projection cloud.
T207 29174-29390 Sentence denotes Note that the fraction of structural flexibility owing to PC2 and PC3 is also important in terms of the total variance of the complete image set, but that classification approaches do not seem to properly explore it.
T208 29391-29547 Sentence denotes Unfortunately, the resolution in PC2 and PC3 is currently limited, so it is difficult to derive clear structural conclusions from these low-resolution maps.
T209 29548-29764 Sentence denotes However, it is clear from these data that the dynamics of the spike are far richer than just a rigid-body closing and opening, and involve more profound rearrangements, especially at the RBD but also at other sites.
T210 29765-29863 Sentence denotes This observation is similar to that of Ke et al. (2020) when working with subtomogram averaging.
T211 29864-30309 Sentence denotes Additionally, the fact that PCA indicates this continuous flexibility to be a key characteristic of the spike dynamics also suggests that many other forms of partitioning (rather than properly ‘classifying’) of this continuous data set could be devised. This is simply a consequence of the intrinsic instability created by forcing a quasi-continuous data distribution, without any clustering structure, to fit into a defined set of clusters.
T212 30310-30677 Sentence denotes In this work, we have clearly forced the classification to go to the extremes of the data distribution, as shown in Fig. 3, probably by enforcing an algorithmically stable classification, but the key result is that any other degree of movement of the spike in between these extremes of PC1 as well as PC2 and PC3 would also be consistent with the experimental data.
T213 30678-30898 Sentence denotes In other words, since the continuum of conformations does not have clear ‘cutting/classification’ points, there is a certain algorithmic uncertainty and instability as to the possible results of a classification process.
T214 30899-31168 Sentence denotes Note that this instability could be exacerbated by the step of particle picking, in the sense that different picking algorithms may have different biases (precisely to minimize this instability, we have performed a ‘consensus’ approach to picking throughout this work).
T215 31169-31439 Sentence denotes Clearly, flexibility is key in this system, so that alterations in its dynamics may cause profound effects, including viral neutralization, and this could be one of the reasons for the neutralization mechanism of antibodies directed against the NTD (Chi et al., 2020).
T216 31441-31445 Sentence denotes 3.3.
T217 31447-31506 Sentence denotes Structure of a biochemically stabilized form of the spike  
T218 31507-31607 Sentence denotes We have also worked with a more recent variant containing six proline substitutions in S2 (HexaPro).
T219 31608-31670 Sentence denotes This second protein was also studied by Hsieh et al. (2020).
T220 31671-31995 Sentence denotes In this case, after going through the same stringent particle-selection process as for the previous specimen, as presented in depth in Section 2, it was impossible to obtain stable classes, so that in Fig. 4 we present a single map (EMDB entry EMD-11341) together with its global FSC curve and a local resolution analysis.
T221 31996-32149 Sentence denotes It is clear that the local resolution has increased in the moving parts (mostly the RBD and NTD), although we did not feel confident in further modeling.
T222 32151-32153 Sentence denotes 4.
T223 32155-32168 Sentence denotes Conclusions  
T224 32169-32432 Sentence denotes In this work, we present a clear example of how the structural discovery process can be greatly accelerated by a wise combination of rapid data sharing and the use of the wave of newly developed algorithms that characterize this phase of the ‘cryo-EM revolution’.
T225 32433-32624 Sentence denotes The reanalysis of the data used in Wrapp et al. (2020), but with new workflows and new tools, has resulted in a rich analysis of the spike flexibility as a key characteristic of the system.
T226 32625-32798 Sentence denotes Essentially, and at least to a first approximation, the spike moves in a continuous manner with no preferential states, as clearly shown in the scatter plots in Fig. 3(b).
T227 32799-32976 Sentence denotes In this way, the result of a particular instance of image-processing analysis, including a 3D classification, should be regarded as a snapshot of this quasi-continuum of states.
T228 32977-33229 Sentence denotes In our case, we have shown that a particular meta image-classification approach, implemented through a consensus among different methods in many steps of the analysis, results in classes that are at the extreme of the main axis of variance in PC space.
T229 33230-33462 Sentence denotes Clearly, PC1, through the analysis of the two extreme classes, reflects a concerted motion of the NTD–RBD–SD1–SD2 thumb, although there are smaller collective movements throughout the spike (see Fig. 2 and Supplementary Movie S1).
T230 33463-33599 Sentence denotes In this case, the RBD moves together with the NTD, with a smaller degree of independent flexibility and always in the ‘up’ conformation.
T231 33600-33853 Sentence denotes The NTD–RBD movement can be characterized to a large degree as a rotation, but the different RBDs present a much more complex pattern of flexibility, indicating an important structural rearrangement [from elastic analysis (Fig. 2) and PCA (Fig. 3)].
T232 33854-34130 Sentence denotes Quasi-solid-body rotation hinges are clearly located between amino acids 318–326 and 588–595, which produce most of the displacement, together with other hinges between amino acids 330–335 and 527–531, which accompany a less pronounced ‘up’ movement of the RBD.
T233 34131-34288 Sentence denotes However, there are other PC axes explaining significant fractions of the inter-image variance that are not properly explored at the level of our two classes.
T234 34289-34479 Sentence denotes PC3 is a clear example, indicating a high variance at the voxels associated with RBD up, which probably suggests large conformational changes in this area that result in the RBD moving down.
T235 34480-34833 Sentence denotes The flexibility analysis performed in this work complements previous analyses showing large rotations together with RBD up–down structural changes (Pinto et al., 2020; Wrapp et al., 2020), in the sense that the different studies present ‘snapshots’ of a continuum of movements obtained by a particular instance of an image-processing classification.
T236 34834-34963 Sentence denotes In a sense, all of these results are correct, but none of them is able to capture the richness of the flexibility of this system.
T237 34964-35201 Sentence denotes This fact reflects the intrinsic instability of segmenting a continuum into defined clusters, which is a clear limitation of the classification approaches that needs to be considered in detailed analysis of any data set from this system.
T238 35202-35483 Sentence denotes An obvious way to increase the resolution of the moving parts of the spike is to reduce their mobility, as is the case, for instance, in the biochemical stabilization of Hsieh et al. (2020) and also in the formation of a complex with an antibody against the NTD (Chi et al., 2020).
T239 35484-35853 Sentence denotes On the other hand, the route towards a more complete analysis of the flexibility of the spike protein necessarily involves the analysis of data sets that are quite substantially larger than those being used in most current SARS-CoV-2 studies, so that all of the main axes of inter-image variability can be explored; this is work that is under development at the moment.
T240 35854-36194 Sentence denotes From a biomedical perspective, the proof that a quasi-continuum of flexibility is a key characteristic of this specimen, a concept that has been implicitly considered in much of the structural work performed so far but never demonstrated, suggests that ways to interfere with this flexibility could be important components of new therapies.
T241 36196-36218 Sentence denotes Supplementary Material
T242 36219-36281 Sentence denotes EMDB reference: SARS-CoV-2 spike in prefusion state, EMD-11328
T243 36282-36393 Sentence denotes EMDB reference: SARS-CoV-2 spike in prefusion state (flexibility analysis, 1-up closed conformation), EMD-11336
T244 36394-36503 Sentence denotes EMDB reference: SARS-CoV-2 spike in prefusion state (flexibility analysis, 1-up open conformation), EMD-11337
T245 36504-36597 Sentence denotes EMDB reference: SARS-CoV-2 stabilized spike in prefusion state (1-up conformation), EMD-11341
T246 36598-36654 Sentence denotes PDB reference: SARS-CoV-2 spike in prefusion state, 6zow
T247 36655-36760 Sentence denotes PDB reference: SARS-CoV-2 spike in prefusion state (flexibility analysis, 1-up closed conformation), 6zp5
T248 36761-36864 Sentence denotes PDB reference: SARS-CoV-2 spike in prefusion state (flexibility analysis, 1-up open conformation), 6zp7
T249 36865-36898 Sentence denotes Supplementary Figures and Tables.
T250 36899-36944 Sentence denotes DOI: 10.1107/S2052252520012725/fq5016sup1.pdf
T253 36982-37005 Sentence denotes Supplementary Movie S1.
T254 37006-37147 Sentence denotes Movie presenting the morphing between the two algorithmically stable classes described in the main text, spanning principal component axis 1.
T255 37148-37193 Sentence denotes DOI: 10.1107/S2052252520012725/fq5016sup2.mp4
T257 37195-37405 Sentence denotes We acknowledge the support from the Advanced Computing and e-Science group at the Institute of Physics of Cantabria (IFCA–CSIC–UC) as well as the Barcelona Supercomputing Center (access project BCV-2020-2-0005).
T258 37406-37539 Sentence denotes The authors acknowledge the support and the use of resources of Instruct, a Landmark ESFRI project (Instruct Access Project ID11775).
T259 37540-37577 Sentence denotes Author contributions were as follows.
T260 37578-37762 Sentence denotes Roberto Melero and COSS performed all of the image analysis in Scipion, while BF performed equivalent work in the principal component analysis and JLV in the local resolution analysis.
T261 37763-37931 Sentence denotes MM and Roberto Marabini were in charge of structural modeling, while Pablo Chacon performed the flexible fittings and incorporated important sections of the manuscript.
T262 37932-38122 Sentence denotes ER-A performed the map-to-model analysis and generated the sharpened cryo-EM maps, while RS-G also worked on new sharpening methods and DH performed the elastic inter-class analysis.
T263 38123-38209 Sentence denotes Pablo Conesa, YF-R, LdC and PL were in charge of the IT hardware and software support.
T264 38210-38282 Sentence denotes JMcL and DW supplied the images and provided advice throughout the work.
T265 38283-38402 Sentence denotes HT, COSS and JMC conceptualized the work, with JMC writing the manuscript, which was complemented by all other authors.
T266 38403-38457 Sentence denotes PC, JMcL, HT and JMC were responsible for the funding.
T267 38458-38503 Sentence denotes The authors declare no conflicts of interest.
T268 38505-38713 Sentence denotes Figure 1 The spike and the ensemble map. (a) A representative view of the new map (EMDB entry EMD-11328), the corresponding FSC curve and the sequence of a monomer of the S protein (from Wrapp et al., 2020).
T269 38714-38854 Sentence denotes The scale bar is 5 nm in length. (b, c) New ensemble cryo-EM map (EMD-11328) compared with that originally presented (EMDB entry EMD-21375).
T270 38855-38936 Sentence denotes The first row (b) corresponds to the new map and the second row (c) to EMD-21375.
T271 38937-39322 Sentence denotes In each row, from left to right: a map representation showing the local resolution (computed with MonoRes; Vilas et al., 2018), a histogram representation of the local directional resolution dispersion (interquartile range between percentiles 17 and 83) and, finally, a plot showing the radial average of the local tangential resolution (analyzed with MonoDir; Vilas et al., 2020).
T272 39323-39513 Sentence denotes Figure 2 Flexibility analysis. (a) A representative view of the new ensemble map and the two new classes showing the ‘open conformation’ in Class 1 and the ‘closed conformation’ in Class 2.
T273 39514-39675 Sentence denotes Note the elastic analysis of deformations performed on the Class 1 and Class 2 maps (see the main text), with 1s referring to ‘stretching’ and 1r to ‘rotations’.
T274 39676-39759 Sentence denotes The color code is from blue for minimal deformation to red for maximal deformation.
T275 39760-39903 Sentence denotes The scale bar is 5 nm in length. (b) Representation of the angles defined by the spike when transitioning between the open and closed states.
T276 39904-39986 Sentence denotes The regions shown in magenta represent the hinges used by the RBD to pivot.
T277 39987-40048 Sentence denotes Note that each hinge encompasses two different chain regions.
T278 40049-40173 Sentence denotes The first hinge spans amino acids 318–326 and 588–595, while the second hinge is defined by amino acids 330–335 and 527–531.
T279 40174-40211 Sentence denotes The angles were measured using PyMOL.
T280 40212-40332 Sentence denotes Figure 3 Principal component analysis of the SARS-CoV-2 spike structure. (a) Eigenvalues of principal components (PCs).
T281 40333-40537 Sentence denotes The first three PCs are significant. (b) Scatter plot of the contribution of the first three PCs to each particle image together with the projection of the open and closed class maps, shown as red points.
T282 40538-40753 Sentence denotes The difference between the projections of the two maps is mostly aligned along principal component 1 (PC1). (c) Side view of the first two PCs shown as mean ± 2 × std, where std is the square root of the eigenvalue.
T283 40754-40840 Sentence denotes Coloring indicates the z-depth of the structure, and is added to assist visualization.
T284 40841-40916 Sentence denotes Supplementary Figs. S4 and S5 contain additional views of these structures.
T286 40917-40949 Sentence denotes The scale bar is 5 nm in length.
T287 40950-41119 Sentence denotes Figure 4 Analysis of a biochemically stabilized form of the spike. (a, b) A representative view of the stabilized form of the spike map and the corresponding FSC curve.
T288 41120-41205 Sentence denotes The scale bar is 5 nm in length. (c) The local resolution map estimated with MonoRes.
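The quantities used in the principal component analysis above (the eigenvalue of each PC, the scatter of the images in PC space, and the pair of structures at mean ± 2 × std, where std is the square root of the eigenvalue) can be illustrated with a minimal sketch. This is not the method of Tagare et al. (2015) and does not operate on projection images: it is plain SVD-based PCA applied to synthetic vectors standing in for maps. The function names `pca_extremes` and `make_toy_data`, the array sizes and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np


def pca_extremes(volumes, n_components=3, n_std=2.0):
    """Plain PCA on flattened volumes (rows = samples, columns = voxels).

    Returns the leading eigenvalues, the per-sample scores in PC space,
    and the two 'extreme' volumes mean -/+ n_std * std along each PC,
    where std is the square root of the eigenvalue.
    """
    mean = volumes.mean(axis=0)
    centered = volumes - mean
    # SVD of the centered data: rows of vt are the principal components.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = (s ** 2) / (volumes.shape[0] - 1)
    components = vt[:n_components]
    eigenvalues = eigenvalues[:n_components]
    # Coordinates of every sample in PC space (the scatter-plot axes).
    scores = centered @ components.T
    std = np.sqrt(eigenvalues)
    low = mean - n_std * std[:, None] * components
    high = mean + n_std * std[:, None] * components
    return eigenvalues, scores, low, high


def make_toy_data(n_samples=500, n_voxels=64, seed=0):
    """Synthetic continuum (assumption): one motion direction, no clusters."""
    rng = np.random.default_rng(seed)
    direction = np.zeros(n_voxels)
    direction[:8] = 1.0
    direction /= np.linalg.norm(direction)
    # Uniformly distributed amplitudes mimic a continuum of states.
    amplitude = rng.uniform(-3.0, 3.0, size=n_samples)
    noise = rng.normal(scale=0.1, size=(n_samples, n_voxels))
    return amplitude[:, None] * direction + noise, direction


if __name__ == "__main__":
    data, direction = make_toy_data()
    eigenvalues, scores, low, high = pca_extremes(data)
    print("leading eigenvalues:", eigenvalues)
    print("alignment of PC1 with the simulated motion:",
          abs(float((high[0] - low[0]) @ direction))
          / float(np.linalg.norm(high[0] - low[0])))
```

In this toy setting the scores along the first PC spread uniformly rather than forming clusters, which is the kind of pattern that a scatter plot in PC space is meant to reveal; any split of such data into two classes is a partition of a continuum rather than a recovery of distinct states.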