PubAnnotation

Id	Subject	Object	Predicate	Lexical cue
T1	0-72	Sentence	denotes	Continuous flexibility analysis of SARS-CoV-2 spike prefusion structures
T2	73-110	Sentence	denotes	SARS-CoV-2 spike prefusion structures
T3	112-222	Sentence	denotes	The flexibility and conformational dynamics of the SARS-CoV-2 spike in the prefusion state have been analyzed.
T4	223-485	Sentence	denotes	An ensemble map with minimum bias was obtained, revealing concerted motions involving the receptor-binding domain (RBD), N-terminal domain and subdomains 1 and 2 around the previously characterized 1-RBD-up state, which have been modeled as elastic deformations.
T5	487-495	Sentence	denotes	Abstract
T6	496-704	Sentence	denotes	Using a new consensus-based image-processing approach together with principal component analysis, the flexibility and conformational dynamics of the SARS-CoV-2 spike in the prefusion state have been analysed.
T7	705-933	Sentence	denotes	These studies revealed concerted motions involving the receptor-binding domain (RBD), N-terminal domain, and subdomains 1 and 2 around the previously characterized 1-RBD-up state, which have been modeled as elastic deformations.
T8	934-1060	Sentence	denotes	It is shown that in this data set there are not well defined, stable spike conformations, but virtually a continuum of states.
T9	1061-1220	Sentence	denotes	An ensemble map was obtained with minimum bias, from which the extremes of the change along the direction of maximal variance were modeled by flexible fitting.
T10	1221-1409	Sentence	denotes	The results provide a warning of the potential image-processing classification instability of these complicated data sets, which has a direct impact on the interpretability of the results.
T11	1411-1413	Sentence	denotes	1.
T12	1415-1429	Sentence	denotes	Introduction
T13	1430-1537	Sentence	denotes	SARS-CoV-2 infects target cells through the interaction of the viral spike (S) protein with cell receptors.
T14	1538-1640	Sentence	denotes	This is an essentially dynamic event that is hard to analyze using most structural biology techniques.
T15	1641-1986	Sentence	denotes	However, cryo-EM offers some unique capabilities that make it a very suitable approach for this task, especially the facts that it can work with noncrystalline samples and, to a certain degree, those with structural flexibility (Dashti et al., 2014 ▸; Maji et al., 2020 ▸; Scheres et al., 2007 ▸; Sorzano et al., 2019 ▸; Tagare et al., 2015 ▸).
T16	1987-2179	Sentence	denotes	In turn, cryo-EM information is complex, being buried in thousands of very noisy movies, making it a real challenge to reveal a three-dimensional (3D) structure from this collection of images.
T17	2180-2386	Sentence	denotes	Furthermore, cryo-EM is in the middle of a methodological and instrumental ‘revolution’ (Kühlbrandt, 2014 ▸) that has already been in progress for several years, with new methods constantly being produced.
T18	2387-2550	Sentence	denotes	In this context, the original data of Wrapp et al. (2020 ▸) have been reanalyzed, applying newer workflows and algorithms, and thus obtaining improved information.
T19	2551-2890	Sentence	denotes	Considering that we were studying a biological system that is characterized by its continuous flexibility, we have not strictly followed the standard multi-class approach (Scheres et al., 2007 ▸), which is very well suited to cases of discrete flexibility, since the mathematical modeling and the biological reality could be too far apart.
T20	2891-3232	Sentence	denotes	Instead, we have calculated a new ‘ensemble’ map at 3 Å global resolution in which the bias has been carefully reduced, followed by both a 3D classification process and a continuous flexibility analysis in 3D principal component (PC) space using a GPU-accelerated and algorithmically improved version of the method of Tagare et al. (2015 ▸).
T21	3233-3279	Sentence	denotes	The ensemble map was used for atomic modeling.
T22	3280-3417	Sentence	denotes	Our aim was to explore a larger part of the structural flexibility present in the data set than is achievable by 3D classification alone.
T23	3418-3704	Sentence	denotes	Using this mixed procedure, and through scatter plots of the projection of the different particle images onto the principal component axes, we have clearly shown how the spike flexibility in this data set should be understood as a continuum of states rather than discrete conformations.
T24	3705-3912	Sentence	denotes	Using maximum-likelihood-based classification, we have obtained two maps that are projected at the extremes of the main principal component on which flexible fitting from the ensemble map has been performed.
T25	3913-4110	Sentence	denotes	However, these extreme maps have an intrinsic blurring in the most flexible areas, since for any class that we may define the images come from a continuum of states and are therefore heterogeneous.
T26	4111-4317	Sentence	denotes	This flexibility is substantially reduced in a recently described biochemically stabilized spike (Hsieh et al., 2020 ▸), as shown by the reduced blurring, which translates into an improved local resolution.
T27	4318-4552	Sentence	denotes	In this work, we describe the new structural information that has been obtained and how it impacts our biological understanding of the system, together with the new workflows and algorithms that have made this accomplishment possible.
T28	4553-4687	Sentence	denotes	We used Scipion 2.0 (de la Rosa-Trevín et al., 2016 ▸) in order to easily combine different software suites in the analysis workflows.
T29	4688-5257	Sentence	denotes	Maps and models have been deposited in public databases [EMPIAR (Iudin et al., 2016 ▸) and EMDB (Lawson et al., 2011 ▸)]: SARS-CoV-2 spike in the prefusion state as EMDB entry EMD-11328 and PDB entry 6zow, SARS-CoV-2 stabilized spike in the prefusion state (1-up conformation) as EMDB entry EMD-11341, SARS-CoV-2 spike in the prefusion state (flexibility analysis, 1-up closed conformation) as EMDB entry EMD-11336 and PDB entry 6zp5, and SARS-CoV-2 spike in the prefusion state (flexibility analysis, 1-up open conformation) as EMDB entry EMD-11337 and PDB entry 6zp7.
T30	5258-5470	Sentence	denotes	All of the data used, the image-processing workflow and the intermediate results were also uploaded to EMPIAR (entries EMPIAR-10514 and EMPIAR-10516) by running the EMPIAR automatic deposition feature in Scipion.
T31	5472-5474	Sentence	denotes	2.
T32	5476-5499	Sentence	denotes	Materials and methods
T33	5501-5505	Sentence	denotes	2.1.
T34	5507-5534	Sentence	denotes	Image-processing workflow
T35	5535-6158	Sentence	denotes	The basic elements of the workflow combine classic cryo-EM algorithms with recent improvements in particle picking (Sanchez-Garcia et al., 2018 ▸; Sanchez-Garcia, Segura et al., 2020 ▸; Wagner et al., 2019 ▸) and the key ideas of meta classifiers, which integrate multiple classifiers by a ‘consensus’ approach (Sorzano et al., 2020 ▸), and finish with a totally new approach to map post-processing based on deep learning that we term Deep cryo-EM Map Enhancer (DeepEMhancer; Sanchez-Garcia, Gomez-Blanco et al., 2020 ▸), which complements our previous proposal on local deblurring (Ramírez-Aportela, Vilas et al., 2020 ▸).
T36	6159-6335	Sentence	denotes	Naturally, map and map–model quality analyses are performed using a variety of tools (Pintilie et al., 2020 ▸; Ramírez-Aportela, Maluenda et al., 2020 ▸; Vilas et al., 2020 ▸).
T37	6336-6577	Sentence	denotes	Conformational variability analysis is carried out by explicitly addressing the continuously flexible nature of the underlying biological reality, in which the SARS-CoV-2 spike explores the conformational space to bind the cellular receptor.
T38	6578-6811	Sentence	denotes	Most of the image processing in this work was performed using the Scipion framework (de la Rosa-Trevín et al., 2016 ▸), which is a public domain image-processing framework that is freely available at http://scipion.i2pc.es.
T39	6812-6928	Sentence	denotes	A graphical representation of the image-processing workflow used in this work can be found in Supplementary Fig. S1.
T40	6930-6934	Sentence	denotes	2.2.
T41	6936-6954	Sentence	denotes	Meta classifiers
T42	6955-7265	Sentence	denotes	With meta classifiers, and as discussed in Sorzano et al. (2020 ▸), the rationale is that a careful analysis of the ratio between algorithmic degrees of freedom and data size shows that cryo-EM may have transitioned from an area characterized by parameter variance to one dominated by possible parameter biases.
T43	7266-7387	Sentence	denotes	In very simple terms, we have a lot of data, so we can counteract the variance in our data if we deal with random errors.
T44	7388-7569	Sentence	denotes	However, whenever there is the possibility of a systematic error, a so-called ‘bias’, artifacts may occur in the maps and, if this is the case, they can be very difficult to detect.
T45	7570-7851	Sentence	denotes	We deal with the problem of introducing bias into the map through ‘consensus’, so that we select those parameters for which several methods, which are as methodologically ‘orthogonal’ as possible, concur on the same answer (sometimes we also use different runs of the same method).
T46	7852-7985	Sentence	denotes	This notion has been used in several different steps of the workflow as listed below. (i) Contrast transfer function (CTF) estimation.
T47	7986-8115	Sentence	denotes	We estimated the microscope defocus using two different programs: Gctf (Zhang, 2016 ▸) and CTFFIND4 (Rohou & Grigorieff, 2015 ▸).
T49	8116-8232	Sentence	denotes	We only selected those micrographs for which both estimates agreed up to 2.1 Å resolution (Marabini et al., 2015 ▸).
T50	8233-8257	Sentence	denotes	(ii) Particle selection.
T51	8258-8367	Sentence	denotes	We used two particle-picking algorithms: Xmipp (Abrishami et al., 2013 ▸) and crYOLO (Wagner et al., 2019 ▸).
T53	8368-8628	Sentence	denotes	We submitted both results to a picking consensus algorithm using deep learning (Sanchez-Garcia et al., 2018 ▸) and also removed all of the coordinates in contaminations, carbon edges etc. using a deep-learning algorithm (Sanchez-Garcia, Segura et al., 2020 ▸).
T54	8629-8845	Sentence	denotes	We then cleaned the set of selected particles using two rounds of cryoSPARC 2D classification (Punjani et al., 2017 ▸; Punjani & Fleet, 2020 ▸) and the consensus of two independent 3D classifications with cryoSPARC.
T55	8846-8867	Sentence	denotes	(iii) Initial volume.
T56	8868-9059	Sentence	denotes	As an initial volume, we selected the major class from the two 3D classifications above and refined it with Xmipp Highres (Sorzano et al., 2018 ▸) with a local refinement of the 3D alignment.
T57	9060-9083	Sentence	denotes	(iv) 3D reconstruction.
T58	9084-9237	Sentence	denotes	We then performed a cryoSPARC non-uniform 3D reconstruction, followed by a local angular refinement using RELION with a 3D mask (Zivanov et al., 2018 ▸).
T59	9238-9498	Sentence	denotes	Particle images were subjected to CTF refinement and Bayesian polishing (Zivanov et al., 2018 ▸), before performing another two rounds of CTF refinement and local angular refinement in RELION, where we improved the resolution versus the first local refinement.
T60	9499-9559	Sentence	denotes	Finally, we performed a non-uniform refinement in cryoSPARC.
T61	9560-9683	Sentence	denotes	The reported nominal resolution of 2.96 Å is based on the gold-standard Fourier shell correlation (FSC) of 0.143 criterion.
T62	9684-9915	Sentence	denotes	Actually, by using Xmipp Highres (Sorzano et al., 2018 ▸) we could improve the resolution to 2.2 Å in the central region of the volume (the region that is not flexible), but at the expense of reducing it more in the flexible areas.
T63	9916-9938	Sentence	denotes	(v) 3D classification.
T64	9939-10080	Sentence	denotes	We then performed two rounds of 3D classification with RELION followed by a consensus 3D classification, yielding two stable, large classes.
T65	10081-10200	Sentence	denotes	Using these two classes, we then performed a local angular refinement using a cryoSPARC non-uniform 3D reconstruction.
T66	10202-10206	Sentence	denotes	2.3.
T67	10208-10228	Sentence	denotes	Particle selection
T68	10229-10381	Sentence	denotes	We found that the micrographs and particles that are used for the 3D reconstruction play a key role in the quality and characteristics of the final map.
T69	10382-10453	Sentence	denotes	In particular, we used the following two procedures. (i) CTF estimation.
T70	10454-10514	Sentence	denotes	We estimated the microscope defocus using Gctf and CTFFIND4.
T71	10515-10649	Sentence	denotes	We required that both estimates were similar (the phase of their corresponding CTFs differed by less than 90°) up to 2.1 Å resolution.
T72	10650-10697	Sentence	denotes	Only 70% of the micrographs met this criterion.
T73	10698-10875	Sentence	denotes	We then estimated the CTF envelope using Xmipp CTF (Sorzano et al., 2007 ▸) while keeping the defocus value fixed (calculated as the average of the Gctf and CTFFIND4 estimates).
T74	10876-10954	Sentence	denotes	We found this step to be very important to retain high-resolution information.
T75	10955-11251	Sentence	denotes	Using Xmipp CTF, we discovered that most of the micrographs had a non-astigmatic validity of between 3 and 4 Å (meaning that at this resolution the assumption of non-astigmatism broke down for most of the micrographs, and only a minority of 30% reached higher resolution in a non-astigmatic way).
T76	11252-11276	Sentence	denotes	(ii) Particle selection.
T77	11277-11350	Sentence	denotes	Two advanced particle-picking algorithms were employed: Xmipp and crYOLO.
T79	11351-11470	Sentence	denotes	The first identified 1.2 million coordinates possibly pointing to spike particles, while the second identified 730 000.
T80	11471-11582	Sentence	denotes	We then combined the estimates using Deep Consensus with a threshold of 0.99, resulting in 620 000 coordinates.
T81	11583-11732	Sentence	denotes	MicrographCleaner was used to rule out particles selected in the carbon edges, aggregations or contaminations, rejecting a total of 50 000 particles.
T82	11733-11961	Sentence	denotes	After two rounds of CryoSPARC 2D classification with a pixel size of 2.1 Å and an image size of 140 × 140 pixels, we kept 298 000 particles assigned to 2D classes whose centroids clearly corresponded to projections of the spike.
T83	11962-12085	Sentence	denotes	At this point, we performed two initial volume estimates using CryoSPARC, classifying the input particles into two classes.
T84	12086-12273	Sentence	denotes	In both executions, one of the structures clearly corresponded to the spike (with 80% of particles), while the other resulted in a 3D structure that clearly corresponded to contamination.
T85	12274-12413	Sentence	denotes	We calculated the consensus of the two CryoSPARC 3D classifications (those particles that were consistently assigned to the same 3D class).
T86	12414-12503	Sentence	denotes	Only 203 000 particles belonged to the class that was consistently assigned to the spike.
T87	12505-12509	Sentence	denotes	2.4.
T88	12511-12544	Sentence	denotes	Validation and quality analysis
T89	12545-12790	Sentence	denotes	To judge the quality of our structural results, we concentrated here on three of the newest approaches: directional local resolution (Vilas et al., 2020 ▸), Q-score (Pintilie et al., 2020 ▸) and FSC-Q (Ramírez-Aportela, Maluenda et al., 2020 ▸).
T90	12791-13046	Sentence	denotes	The first provides a measure of map quality, while the latter two focus on the relationship between the map and the structural model; in other words, how well the model is supported by the map density, without any other complementary piece of information.
T91	13047-13271	Sentence	denotes	In terms of map-to-model validation, in Supplementary Figs. S3(a) and S3(b) we present Q-score and FSC-Q metrics, respectively, showing the agreement between the ensemble cryo-EM map and the structural model derived from it.
T93	13272-13466	Sentence	denotes	In most areas the agreement is very good, with the exception of the receptor-binding domain (RBD) and substantial parts of the N-terminal domain (NTD), as expected from their higher flexibility.
T94	13468-13472	Sentence	denotes	2.5.
T95	13474-13498	Sentence	denotes	Volume post-processing
T96	13499-13682	Sentence	denotes	In this work, we used two volume post-processing approaches that both depart substantially from the traditional approach in the field, which is the application of global B-sharpening.
T97	13683-13808	Sentence	denotes	One of the approaches is our previously introduced LocalDeblur sharpening method (Ramírez-Aportela, Vilas et al., 2020 ▸).
T98	13809-13922	Sentence	denotes	The second approach is a totally new method based on deep learning (Sanchez-Garcia, Gomez-Blanco et al., 2020 ▸).
T99	13923-14227	Sentence	denotes	Concentrating on the latter, this method, DeepEMhancer, relies on a common approach in modern pattern recognition in which a convolutional neural network (CNN) is trained on a known data set comprised of pairs of data points and targets, with the aim of predicting the targets for new unseen data points.
T100	14228-14424	Sentence	denotes	In this case, the training was performed by presenting the CNN with pairs of cryo-EM maps collected from the EMDB and maps derived from the structural models associated with the experimental maps.
T101	14425-14570	Sentence	denotes	As a result, our CNN learned how to obtain much cleaner and more detailed versions of the experimental cryo-EM maps, improving their interpretability.
T102	14571-14711	Sentence	denotes	Trying to take advantage of their complementary information, we used the two post-processed maps to trace the atomic model (PDB entry 6zow).
T103	14712-14848	Sentence	denotes	Some examples of the similar improvement of structure modeling according to these two sharpened maps are shown in Supplementary Fig. S2.
T104	14849-14920	Sentence	denotes	The sharpened and unsharpened maps have all been deposited in the EMDB.
T105	14922-14926	Sentence	denotes	2.6.
T106	14928-14944	Sentence	denotes	Model building
T107	14945-15135	Sentence	denotes	The atomic interpretation of the SARS-CoV-2 spike 3D map (PDB entry 6zow) was performed taking advantage of the modeling tools integrated in Scipion as described in Martínez et al. (2020 ▸).
T108	15136-15501	Sentence	denotes	Owing to a lack of sufficient density for the ‘up’ conformation of the RBD, we rigidly fitted the structure of chain A (residues 336–525) of the SARS-CoV-2 RBD in complex with CR3022 Fab (PDB entry 6yla; J. Huo, Y. Zhao, J. Ren, D. Zhou, H. M. Ginn, E. E. Fry, R. Owens & D. I. Stuart, unpublished work) to the 3D map using UCSF Chimera (Pettersen et al., 2004 ▸).
T120	15502-15688	Sentence	denotes	This unmodeled part of the structure was called chain ‘a’ since it was part of chain A in the structure previously inferred from the same data set (PDB entry 6vsb; Wrapp et al., 2020 ▸).
T121	15689-15895	Sentence	denotes	The rest of the molecule was modeled using the same original structure (PDB entry 6vsb) as a template, as well as another spike ectodomain structure in the open state (PDB entry 6vyb; Walls et al., 2020 ▸).
T122	15896-16061	Sentence	denotes	The former structure (PDB entry 6vsb) was fitted to the new map and refined using Coot (Emsley et al., 2010 ▸) and phenix_real_space_refine (Afonine et al., 2018 ▸).
T123	16062-16364	Sentence	denotes	Validation metrics were computed to assess the geometry of the new hybrid model and its correlation with the map using ‘Comprehensive Validation (cryo-EM)’ in Phenix, the EMRinger algorithm (Barad et al., 2015 ▸), Q-score (Pintilie et al., 2020 ▸) and FSC-Q (Ramírez-Aportela, Maluenda et al., 2020 ▸).
T124	16365-16484	Sentence	denotes	Score values considering the whole hybrid spike and excluding the unmodeled RBD are detailed in Supplementary Table S1.
T125	16485-16540	Sentence	denotes	The hybrid atomic structures were submitted to the PDB.
T126	16541-16672	Sentence	denotes	iMODFIT (López-Blanco & Chacón, 2013 ▸) was employed to flexibly fit the hybrid atomic structure to the open and closed class maps.
T127	16674-16678	Sentence	denotes	2.7.
T128	16680-16710	Sentence	denotes	Principal component analysis
T129	16711-16868	Sentence	denotes	The principal component analysis used the expectation–maximization (EM) algorithm presented in Tagare et al. (2015 ▸) with the following minor modifications.
T130	16869-17170	Sentence	denotes	Firstly, in contrast to Tagare et al. (2015 ▸), the images were not Wiener filtered, nor was the projected mean subtracted from the images; instead, the CTF of each image was incorporated into the projection operator of that image and a variable contrast was allowed for the mean volume in each image.
T131	17171-17262	Sentence	denotes	The extent of the variable contrast was determined by the principal component EM algorithm.
T132	17263-17423	Sentence	denotes	Secondly, the mean volume was projected along each projection direction and an image mask was constructed with a liberal soft margin to allow for heterogeneity.
T133	17424-17565	Sentence	denotes	The different masks thus created, with one mask per projection direction, were applied to the images and the masked images were used as data.
T134	17566-17751	Sentence	denotes	This step corresponds to imposing a form of sparsity on the data, which is known to improve the estimation of principal components in high-dimensional spaces (Johnstone & Paul, 2018 ▸).
T135	17752-17863	Sentence	denotes	All images were downsampled by a factor of two to improve the signal-to-noise ratio and to speed up processing.
T136	17864-18004	Sentence	denotes	Finally, during each EM iteration, the principal components were low-pass filtered with a very broad filter whose pass band extended to 4 Å.
T137	18005-18121	Sentence	denotes	This helped with the convergence of the algorithm without significantly limiting the principal component resolution.
T138	18122-18359	Sentence	denotes	As part of the EM iteration, the algorithm in Tagare et al. (2015 ▸) conveniently estimates the expected amount by which each principal component is present in each image (this is the term E[z_j] in equation 15 of Tagare et al., 2015 ▸).
T139	18360-18401	Sentence	denotes	Fig. 3(b) shows a scatter plot of E[z_j].
T140	18402-18592	Sentence	denotes	It is interesting to note that in the algorithm of Tagare et al. (2015 ▸) the latent variables (representing the contributions of the principal components to each particle) are marginalized.
T141	18593-18784	Sentence	denotes	Because of this marginalization, the number of unknown parameters that need to be estimated (the principal components and variances) is fixed and does not change with the number of particles.
T142	18785-19019	Sentence	denotes	We have found this feature to be very valuable for relatively small sets of images (say 100 000 images), which is the case in our work, in order to prevent the number of parameters to be estimated from growing with the number of particles.
T143	19020-19156	Sentence	denotes	Statistically speaking, nonmarginalization is known to be a problem when there are few particles, where the estimates can be unreliable.
T144	19157-19256	Sentence	denotes	Since the method developed by Tagare and coworkers does not suffer from this, we chose this method.
T145	19258-19260	Sentence	denotes	3.
T146	19262-19271	Sentence	denotes	Results
T147	19272-19363	Sentence	denotes	With the goal set at analyzing spike flexibility, we describe our key results step by step.
T148	19365-19369	Sentence	denotes	3.1.
T149	19371-19414	Sentence	denotes	The ensemble map and the way to obtain it
T150	19415-19554	Sentence	denotes	In the following, we describe the analysis of SARS-CoV-2 spike stabilized in the prefusion state by two proline substitutions in S2 (S-2P).
T151	19555-19885	Sentence	denotes	We will objectively demonstrate that the flexibility of the spike protein should be understood as a quasi-continuum of conformations, so that when performing a structural analysis on this specimen special care has to be taken with the image-processing workflows, since they may directly impact the interpretability of the results.
T152	19886-20144	Sentence	denotes	Starting from the original SARS-CoV-2 S-2P data set of Wrapp et al. (2020 ▸), we have completely reanalyzed the data using our public domain software integration platform Scipion (de la Rosa-Trevín et al., 2016 ▸), breaking the global 3 Å resolution barrier.
T153	20145-20444	Sentence	denotes	A representative view of the new ensemble map and its corresponding global FSC curve is shown in Fig. 1 ▸(a) (EMDB entry EMD-11328); the sequence of a monomer of the S protein is shown on the right to facilitate the further discussion of structure–function relationships (from Wrapp et al., 2020 ▸).
T154	20445-20628	Sentence	denotes	Figs. 1 ▸(b) and 1 ▸(c) show a comparison between the original map (Wrapp et al., 2020 ▸) with EMDB code EMD-21375 and the newly reconstructed ensemble map corresponding to EMD-11328.
T156	20629-20833	Sentence	denotes	Clearly, the local resolution (Vilas et al., 2018 ▸), which is shown on the left in Figs. 1 ▸(b) and 1 ▸(c), is increased in the new map, and the anisotropy, which is shown in the center, is much reduced.
T158	20834-21065	Sentence	denotes	Finally, on the right we present plots of the radially averaged tangential resolution, which is related to the quality of the angular alignment (Vilas et al., 2020 ▸); the steeper the slope, the higher the angular assignment error.
T159	21066-21354	Sentence	denotes	As can be appreciated, the slope calculated from the newly obtained map is almost zero when compared with that for the map from Wrapp et al. (2020 ▸), indicating that, in relative terms, the particle alignment used to create the new map is better than that used to build the original map.
T160	21355-21420	Sentence	denotes	The result is an overall quantitative enhancement in map quality.
T161	21421-21776	Sentence	denotes	In terms of tracing, besides modeling several additional residue side chains and improving the geometry of the carbon skeleton (see Supplementary Fig. S2), one of the most noticeable improvements that we observed in the new map is an extension of the glycan chains that were initially built, particularly throughout the S2 fusion subunit (PDB entry 6zow).
T162	21777-21967	Sentence	denotes	A quantitative comparison can be made between the length of glycan chains in the new ‘ensemble structure’ with respect to the previous structure (PDB entry 6vsb; see Supplementary Table S2).
T163	21968-22254	Sentence	denotes	Although the total number of N-linked glycosylation sequons throughout the SARS-CoV-2 S trimer is essentially the same in the new structure (45) and PDB entry 6vsb (44), we have substantially increased the length of the glycan chains, expanding the total number of glycans by about 50%.
T164	22255-22472	Sentence	denotes	We note the importance of this extensive glycosylation for epitope accessibility and how the accurate determination of this glycan shield will facilitate efforts to rapidly develop effective vaccines and therapeutics.
T165	22473-22673	Sentence	denotes	Supplementary Fig. S2 shows a representative section of sharpened versions of the ensemble map (EMDB entry EMD-11328) compared with EMDB entry EMD-21375, in which the glycans can now be better traced.
T166	22674-22911	Sentence	denotes	However, we should not forget that the ensemble map contains images in which the receptor-binding domain (RBD) and N-terminal domain (NTD) are in different positions (see Section 3.2), and consequently these domains appear to be blurred.
T167	22912-23140	Sentence	denotes	Details of how the tracing was performed can be found in Section 2, while in Supplementary Fig. S3 we present two map-to-model quality figures indicating the good fit in general, with the obvious exception of the variable parts.
T168	23142-23146	Sentence	denotes	3.2.
T169	23148-23170	Sentence	denotes	Flexibility analysis
T170	23171-23405	Sentence	denotes	Starting from a carefully selected set of particles obtained from our consensus and cleaning approaches (see Section 2), together with the ensemble map described previously, we subjected the data to the following flexibility analysis.
T171	23406-23578	Sentence	denotes	The original images that were part of the ensemble map went through a ‘consensus classification’ procedure aimed at separating them into two algorithmically stable classes.
T172	23579-23785	Sentence	denotes	Essentially, and as described in more detail in Section 2, we performed two independent classifications, further selecting those particles that were consistently together throughout the two classifications.
T173	23786-23852	Sentence	denotes	In this way, we obtained the two new classes shown in Fig. 2 ▸(a).
T174	23853-24023	Sentence	denotes	We will refer to these as the ‘closed conformation’ [Fig. 2 ▸(a), Class 1, EMDB entry EMD-11336] and the ‘open conformation’ [Fig. 2 ▸(a), Class 2, EMDB entry EMD-11337].
T175	24024-24183	Sentence	denotes	The number of images in each class was reduced to 45 000 in one case and 21 000 in the other, with global FSC-based resolutions of 3.1 and 3.3 Å, respectively.
T176	24184-24357	Sentence	denotes	The open and closed structures depict a clear and concerted movement of the ‘thumb’ formed by the RBD and subdomains 1 and 2 (SD1 and SD2) and the NTD of an adjacent chain.
T177	24358-24448	Sentence	denotes	The thumb moves away from the central spike axis, exposing the RBD in the up conformation.
T178	24449-24796	Sentence	denotes	In order to make clearer where the changes are at the level of the Class 1 and Class 2 maps, we have made use of the representation of map local strains in Sorzano et al. (2016 ▸), which helps to very clearly visualize the type of strains needed to relate two maps, whether these are rigid-body rotations or more complex deformations (stretching).
T179	24797-24981	Sentence	denotes	We have termed the maps resulting from this elastic analysis ‘1s’ (Class 1, stretching) and ‘1r’ (Class 1, rotations) on the right-hand side of Fig. 2 ▸(a) and the same for Class 2.
T180	24982-25074	Sentence	denotes	The color scale for both stretching and rotations goes from blue for small to red for large.
T181	25075-25425	Sentence	denotes	Clearly, the differences among the classes with respect to the NTD and RBD have a very substantial component of pure coordinated rigid-body rotations, while the different RBDs present a much more complex pattern of deformations (stretching), indicating an important structural rearrangement in this area that does not occur elsewhere in the specimen.
T182	25426-25679	Sentence	denotes	In terms of atomic modeling, we performed a flexible fitting of the ensemble model onto the closed and open forms [see Fig. 2 ▸(a), rightmost map; the PDB code for the open conformation is PDB entry 6zp7, while that for the closed conformation is 6zp5].
T183	25680-25862	Sentence	denotes	Focusing on rotations, which are the simplest element to follow, we can quantify that the degree of rotation of the thumb in these classes is close to 6°, as shown in Fig. 2 ▸(b).
T184	25863-26066	Sentence	denotes	Given this flexibility, we consider that the best way to correctly present the experimental results is through the movie provided as Supplementary Movie S1, in which maps and atomic models are presented.
T185	26067-26412	Sentence	denotes	Within the approximation to modeling that a flexible fitting represents, we can appreciate two hinge movements of the RBD–SD1–SD2 domains: one located between amino acids 318–326 and 588–595 that produces most of the displacement, and the other between amino acids 330–335 and 527–531 that accompanies a less pronounced ‘up’ movement of the RBD.
T186	26413-26505	Sentence	denotes	This thumb motion is completed by the accompanying motion of the NTD from an adjacent chain.
T187	26506-26767	Sentence	denotes	Also in a collective way, other NTDs and RBDs in the down conformation move slightly, as can better be appreciated in Supplementary Movie S1, where the transition between fitted models overlaps with the interpolation between observed high-resolution class maps.
T188	26768-26862	Sentence	denotes	To further investigate whether or not the flexibility was continuous, we proceeded as follows.
T189	26863-26994	Sentence	denotes	Images from the two classes were pooled together and, using the ensemble map, subjected to a 3D principal component analysis (PCA).
T190	26995-27100	Sentence	denotes	The approach we followed is based on Tagare et al. (2015), with some minor modifications of the method.
T191	27101-27167	Sentence	denotes	A detailed explanation of the modifications is given in Section 2.
T192	27168-27328	Sentence	denotes	We initialized the first principal component (PC) to the difference between the open and closed conformations, while the remaining PCs were initialized randomly.
T193	27329-27430	Sentence	denotes	Upon convergence, the eigenvalue of each PC and the scatter of the images in PC space were calculated.
T194	27431-27483	Sentence	denotes	The eigenvalues of the PCs are shown in Fig. 3(a).
T195	27484-27529	Sentence	denotes	Clearly, the first three PCs are significant.
T196	27530-27606	Sentence	denotes	The scatter plot of the image data in PC1–PC3 space is shown in Fig. 3(b).
T197	27607-27720	Sentence	denotes	Fig. 3(b) strongly suggests that there is ‘continuous flexibility’ rather than ‘tightly clustered’ flexibility.
T198	27721-27861	Sentence	denotes	Fig. 3(b) also shows the projection of the maps corresponding to the open and closed conformations on the extremes of the first three PCs.
T199	27862-28037	Sentence	denotes	It is clear that the open and closed conformations are aligned mostly along the first PC, suggesting that the open/closed classification captures the most significant changes.
T200	28038-28168	Sentence	denotes	Fig. 3(c) shows side views of a pair of structures (mean ± 2 × std, where std is the square root of the eigenvalue) for each PC.
T201	28169-28244	Sentence	denotes	Additional details of these structures are available in Supplementary Figs.
T202	28245-28255	Sentence	denotes	S4 and S5.
T203	28256-28405	Sentence	denotes	Note that PCs are not to be understood as structural pathways with a biological meaning, but as directions that summarize the variance of a data set.
T204	28406-28710	Sentence	denotes	For instance, the fact that the RBD appears and disappears at the two extremes of PC3 indicates that there is an important variability in these voxels, which is probably indicative of the up and down conformations of the RBD [to be understood in the context of the elastic analysis shown in Fig. 2(b)].
T205	28711-28854	Sentence	denotes	Through this combination of approaches, we have learnt that the spike conformation fluctuates virtually randomly in a rather continuous manner.
T206	28855-29173	Sentence	denotes	Additionally, the approach taken to define the two algorithmically stable ‘classes’ has clearly partitioned the data set according to the main axis of variance, PC1, since the projections of the maps of these classes fall almost exclusively along PC1 and are located towards the extremes of the image-projection cloud.
T207	29174-29390	Sentence	denotes	Note that the fraction of structural flexibility owing to PC2 and PC3 is also important in terms of the total variance of the complete image set, but that classification approaches do not seem to properly explore it.
T208	29391-29547	Sentence	denotes	Unfortunately, the resolution in PC2 and PC3 is currently limited, so it is difficult to derive clear structural conclusions from these low-resolution maps.
T209	29548-29764	Sentence	denotes	However, it is clear from these data that the dynamics of the spike are far richer than just a rigid-body closing and opening, and involve more profound rearrangements, especially at the RBD but also at other sites.
T210	29765-29863	Sentence	denotes	This observation is similar to that of Ke et al. (2020) when working with subtomogram averaging.
T211	29864-30309	Sentence	denotes	Additionally, the fact that PCA indicates this continuous flexibility to be a key characteristic of the spike dynamics also suggests that many other forms of partitioning (rather than properly ‘classifying’) of this continuous data set could be devised, this fact just being a consequence of the intrinsic instability created by forcing a quasi-continuous data distribution without any clustering structure to fit into a defined set of clusters.
T212	30310-30677	Sentence	denotes	In this work, we have clearly forced the classification to go to the extremes of the data distribution, as shown in Fig. 3, probably by enforcing an algorithmically stable classification, but the key result is that any other degree of movement of the spike in between these extremes of PC1 as well as PC2 and PC3 would also be consistent with the experimental data.
T213	30678-30898	Sentence	denotes	In other words, since the continuum of conformations does not have clear ‘cutting/classification’ points, there is a certain algorithmic uncertainty and instability as to the possible results of a classification process.
T214	30899-31168	Sentence	denotes	Note that this instability could be exacerbated by the step of particle picking, in the sense that different picking algorithms may have different biases (precisely to minimize this instability, we have performed a ‘consensus’ approach to picking throughout this work).
T215	31169-31439	Sentence	denotes	Clearly, flexibility is key in this system, so that alterations in its dynamics may cause profound effects, including viral neutralization, and this could be one of the reasons for the neutralization mechanism of antibodies directed against the NTD (Chi et al., 2020).
T216	31441-31445	Sentence	denotes	3.3.
T217	31447-31506	Sentence	denotes	Structure of a biochemically stabilized form of the spike
T218	31507-31607	Sentence	denotes	We have also worked with a more recent variant containing six proline substitutions in S2 (HexaPro).
T219	31608-31670	Sentence	denotes	This second protein was also studied by Hsieh et al. (2020).
T220	31671-31995	Sentence	denotes	In this case, after going through the same stringent particle-selection process as for the previous specimen, as presented in depth in Section 2, it was impossible to obtain stable classes, so that in Fig. 4 we present a single map (EMDB entry EMD-11341) together with its global FSC curve and a local resolution analysis.
T221	31996-32149	Sentence	denotes	It is clear that the local resolution has increased in the moving parts (mostly the RBD and NTD), although we did not feel confident in further modeling.
T222	32151-32153	Sentence	denotes	4.
T223	32155-32168	Sentence	denotes	Conclusions
T224	32169-32432	Sentence	denotes	In this work, we present a clear example of how the structural discovery process can be greatly accelerated by a wise combination of rapid data sharing and the use of the wave of newly developed algorithms that characterize this phase of the ‘cryo-EM revolution’.
T225	32433-32624	Sentence	denotes	The reanalysis of the data used in Wrapp et al. (2020), but with new workflows and new tools, has resulted in a rich analysis of the spike flexibility as a key characteristic of the system.
T226	32625-32798	Sentence	denotes	Essentially, and at least to a first approximation, the spike moves in a continuous manner with no preferential states, as clearly shown in the scatter plots in Fig. 3(b).
T227	32799-32976	Sentence	denotes	In this way, the result of a particular instance of image-processing analysis, including a 3D classification, should be regarded as a snapshot of this quasi-continuum of states.
T228	32977-33229	Sentence	denotes	In our case, we have shown that a particular meta image-classification approach, implemented through a consensus among different methods in many steps of the analysis, results in classes that are at the extreme of the main axis of variance in PC space.
T229	33230-33462	Sentence	denotes	Clearly, PC1, through the analysis of the two extreme classes, reflects a concerted motion of the NTD–RBD–SD1–SD2 thumb, although there are smaller collective movements throughout the spike (see Fig. 2 and Supplementary Movie S1).
T230	33463-33599	Sentence	denotes	In this case, the RBD moves together with the NTD, with a smaller degree of independent flexibility and always in the ‘up’ conformation.
T231	33600-33853	Sentence	denotes	The NTD–RBD movement can be characterized to a large degree as a rotation, but the different RBDs present a much more complex pattern of flexibility, indicating an important structural rearrangement [from elastic analysis (Fig. 2) and PCA (Fig. 3)].
T232	33854-34130	Sentence	denotes	The presence of quasi-solid body rotation hinges is clearly located between amino acids 318–326 and 588–595, which produce most of the displacement, together with other hinges between amino acids 330–335 and 527–531, which accompany a less pronounced ‘up’ movement of the RBD.
T233	34131-34288	Sentence	denotes	However, there are other PC axes explaining significant fractions of the inter-image variance that are not properly explored at the level of our two classes.
T234	34289-34479	Sentence	denotes	PC3 is a clear example, indicating a high variance at the voxels associated with RBD up, which probably suggests large conformational changes in this area that result in the RBD moving down.
T235	34480-34833	Sentence	denotes	The flexibility analysis performed in this work complements previous analysis showing large rotations together with RBD up–down structural changes (Pinto et al., 2020; Wrapp et al., 2020), in the sense that the different studies present ‘snapshots’ of a continuum of movements obtained by a particular instance of an image-processing classification.
T236	34834-34963	Sentence	denotes	In a sense, all of these results are correct, but none of them is able to capture the richness of the flexibility of this system.
T237	34964-35201	Sentence	denotes	This fact reflects the intrinsic instability of segmenting a continuum into defined clusters, which is a clear limitation of the classification approaches that needs to be considered in detailed analysis of any data set from this system.
T238	35202-35483	Sentence	denotes	An obvious way to increase the resolution of the moving parts of the spike is to reduce their mobility, as is the case, for instance, in the biochemical stabilization of Hsieh et al. (2020) and also in the formation of a complex with an antibody against NTD (Chi et al., 2020).
T239	35484-35853	Sentence	denotes	On the other hand, the route towards a more complete analysis of the flexibility of the spike protein necessarily involves the analysis of data sets that are quite substantially larger than those being used in most current SARS-CoV-2 studies, so that all of the main axes of inter-image variability can be explored; this is work that is under development at the moment.
T240	35854-36194	Sentence	denotes	From a biomedical perspective, the proof that a quasi-continuum of flexibility is a key characteristic of this specimen, a concept that has been implicitly considered in much of the structural work performed so far but never demonstrated, suggests that ways to interfere with this flexibility could be important components of new therapies.
T241	36196-36218	Sentence	denotes	Supplementary Material
T242	36219-36281	Sentence	denotes	EMDB reference: SARS-CoV-2 spike in prefusion state, EMD-11328
T243	36282-36393	Sentence	denotes	EMDB reference: SARS-CoV-2 spike in prefusion state (flexibility analysis, 1-up closed conformation), EMD-11336
T244	36394-36503	Sentence	denotes	EMDB reference: SARS-CoV-2 spike in prefusion state (flexibility analysis, 1-up open conformation), EMD-11337
T245	36504-36597	Sentence	denotes	EMDB reference: SARS-CoV-2 stabilized spike in prefusion state (1-up conformation), EMD-11341
T246	36598-36654	Sentence	denotes	PDB reference: SARS-CoV-2 spike in prefusion state, 6zow
T247	36655-36760	Sentence	denotes	PDB reference: SARS-CoV-2 spike in prefusion state (flexibility analysis, 1-up closed conformation), 6zp5
T248	36761-36864	Sentence	denotes	PDB reference: SARS-CoV-2 spike in prefusion state (flexibility analysis, 1-up open conformation), 6zp7
T249	36865-36898	Sentence	denotes	Supplementary Figures and Tables.
T250	36899-36903	Sentence	denotes	DOI:
T251	36904-36944	Sentence	denotes	10.1107/S2052252520012725/fq5016sup1.pdf
T253	36982-37005	Sentence	denotes	Supplementary Movie S1.
T254	37006-37147	Sentence	denotes	Movie presenting the morphing between the two algorithmically stable classes described in the main text, spanning principal component axis 1.
T255	37148-37152	Sentence	denotes	DOI:
T256	37153-37193	Sentence	denotes	10.1107/S2052252520012725/fq5016sup2.mp4
T257	37195-37405	Sentence	denotes	We acknowledge the support from the Advanced Computing and e-Science group at the Institute of Physics of Cantabria (IFCA–CSIC–UC) as well as the Barcelona Supercomputer Center (access project BCV-2020-2-0005).
T258	37406-37539	Sentence	denotes	The authors acknowledge the support and the use of resources of Instruct, a Landmark ESFRI project (Instruct Access Project ID11775).
T259	37540-37577	Sentence	denotes	Author contributions were as follows.
T260	37578-37762	Sentence	denotes	Roberto Melero and COSS performed all of the image analysis in Scipion, while BF performed equivalent work in the principal component analysis and JLV in the local resolution analysis.
T261	37763-37931	Sentence	denotes	MM and Roberto Marabini were in charge of structural modeling, while Pablo Chacon performed the flexible fittings and incorporated important sections of the manuscript.
T262	37932-38122	Sentence	denotes	ER-A performed the map-to-model analysis and generated the sharpened cryo-EM maps, while RS-G also worked on new sharpening methods and DH performed the elastic inter-class analysis.
T263	38123-38209	Sentence	denotes	Pablo Conesa, YF-R, LdC and PL were in charge of the IT hardware and software support.
T264	38210-38282	Sentence	denotes	JMcL and DW supplied the images and provided advice throughout the work.
T265	38283-38402	Sentence	denotes	HT, COSS and JMC conceptualized the work, with JMC writing the manuscript, which was complemented by all other authors.
T266	38403-38457	Sentence	denotes	PC, JMcL, HT and JMC were responsible for the funding.
T267	38458-38503	Sentence	denotes	The authors declare no conflicts of interest.
T268	38505-38713	Sentence	denotes	Figure 1 The spike and the ensemble map. (a) A representative view of the new map (EMDB entry EMD-11328), the corresponding FSC curve and the sequence of a monomer of the S protein (from Wrapp et al., 2020).
T269	38714-38854	Sentence	denotes	The scale bar is 5 nm in length. (b, c) New ensemble cryo-EM map (EMD-11328) compared with that originally presented (EMDB entry EMD-21375).
T270	38855-38936	Sentence	denotes	The first row (b) corresponds to the new map and the second row (c) to EMD-21375.
T271	38937-39322	Sentence	denotes	In each row, from left to right: a map representation showing the local resolution (computed with MonoRes; Vilas et al., 2018), a histogram representation of the local directional resolution dispersion (interquartile range between percentiles 17 and 83) and, finally, a plot showing the radial average of the local tangential resolution (analyzed with MonoDir; Vilas et al., 2020).
T272	39323-39513	Sentence	denotes	Figure 2 Flexibility analysis. (a) A representative view of the new ensemble map and the two new classes showing the ‘open conformation’ in Class 1 and the ‘closed conformation’ in Class 2.
T273	39514-39675	Sentence	denotes	Note the elastic analysis of deformations performed on the Class 1 and Class 2 maps (see the main text), with 1s referring to ‘stretching’ and 1r to ‘rotations’.
T274	39676-39759	Sentence	denotes	The color code is from blue for minimal deformation to red for maximal deformation.
T275	39760-39903	Sentence	denotes	The scale bar is 5 nm in length. (b) Representation of the angles defined by the spike when transitioning between the opened and closed states.
T276	39904-39986	Sentence	denotes	The regions shown in magenta represent the hinges used by the RBD domain to pivot.
T277	39987-40048	Sentence	denotes	Note that each hinge encompasses two different chain regions.
T278	40049-40173	Sentence	denotes	The first hinge spans amino acids 318–326 and 588–595, while the second hinge is defined by amino acids 330–335 and 527–531.
T279	40174-40211	Sentence	denotes	The angles were measured using PyMOL.
T280	40212-40332	Sentence	denotes	Figure 3 Principal component analysis of the SARS-CoV-2 spike structure. (a) Eigenvalues of principal components (PCs).
T281	40333-40537	Sentence	denotes	The first three PCs are significant. (b) Scatter plot of the contribution of the first three PCs to each particle image together with the projection of the open and closed class maps, shown as red points.
T282	40538-40753	Sentence	denotes	The difference between the projections of the two maps is mostly aligned along principal component 1 (PC1). (c) Side view of the first two PCs shown as mean ± 2 × std, where std is the square root of the eigenvalue.
T283	40754-40840	Sentence	denotes	Coloring indicates the z-depth of the structure, and is added to assist visualization.
T284	40841-40860	Sentence	denotes	Supplementary Figs.
T285	40861-40916	Sentence	denotes	S4 and S5 contain additional views of these structures.
T286	40917-40949	Sentence	denotes	The scale bar is 5 nm in length.
T287	40950-41119	Sentence	denotes	Figure 4 Analysis of a biochemically stabilized form of the spike. (a, b) A representative view of the stabilized form of the spike map and the corresponding FSC curve.
T288	41120-41205	Sentence	denotes	The scale bar is 5 nm in length. (c) The local resolution map estimated with MonoRes.
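The principal component analysis described in sentences T189 to T200 (eigenvalue per PC, scatter of the images in PC space, and a pair of structures at mean ± 2 × std, where std is the square root of the eigenvalue) can be illustrated with a minimal sketch. This is plain PCA on synthetic vectors using NumPy, not the modified method of Tagare et al. (2015) used in the paper, which works from 2D projection images; the function name `pca_extremes`, the synthetic data, and all parameter values are assumptions made only for illustration.

```python
import numpy as np


def pca_extremes(data, n_components=3, n_std=2.0):
    """Plain PCA on row-vector samples (hypothetical helper, for illustration).

    Returns the leading eigenvalues, the principal directions, the coordinates
    of every sample in PC space, and the vectors mean -/+ n_std * std * PC.
    """
    mean = data.mean(axis=0)
    centered = data - mean
    # Rows of vt are the principal directions of the centered data.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Eigenvalues of the sample covariance matrix.
    eigenvalues = (s ** 2) / (data.shape[0] - 1)
    comps = vt[:n_components]
    eigvals = eigenvalues[:n_components]
    # Scatter of the samples in PC space (one row per sample).
    coords = centered @ comps.T
    std = np.sqrt(eigvals)
    low = mean - n_std * std[:, None] * comps
    high = mean + n_std * std[:, None] * comps
    return eigvals, comps, coords, low, high


# Synthetic 'continuum': a single latent coordinate spread uniformly,
# with no clusters, plus a small amount of isotropic noise.
rng = np.random.default_rng(0)
n, d = 500, 64
direction = np.zeros(d)
direction[0] = 1.0
t = rng.uniform(-1.0, 1.0, size=n)
data = 5.0 * t[:, None] * direction + 0.1 * rng.normal(size=(n, d))

eigvals, comps, coords, low, high = pca_extremes(data)
```

In this toy setting the first eigenvalue dominates and the scatter along PC1 is spread evenly rather than clustered, which is the qualitative signature the text attributes to continuous flexibility. Any two 'class' vectors could be projected in the same way with `(vector - data.mean(axis=0)) @ comps.T`.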

T1

0-72

Sentence

denotes

Continuous flexibility analysis of SARS-CoV-2 spike prefusion structures

T2

73-110

Sentence

denotes

SARS-CoV-2 spike prefusion structures

T3

112-222

Sentence

denotes

The flexibility and conformational dynamics of the SARS-CoV-2 spike in the prefusion state have been analyzed.

T4

223-485

Sentence

denotes

An ensemble map with minimum bias was obtained, revealing concerted motions involving the receptor-binding domain (RBD), N-terminal domain and subdomains 1 and 2 around the previously characterized 1-RBD-up state, which have been modeled as elastic deformations.

T5

487-495

Sentence

denotes

Abstract

T6

496-704

Sentence

denotes

Using a new consensus-based image-processing approach together with principal component analysis, the flexibility and conformational dynamics of the SARS-CoV-2 spike in the prefusion state have been analysed.

T7

705-933

Sentence

denotes

These studies revealed concerted motions involving the receptor-binding domain (RBD), N-terminal domain, and subdomains 1 and 2 around the previously characterized 1-RBD-up state, which have been modeled as elastic deformations.

T8

934-1060

Sentence

denotes

It is shown that in this data set there are not well defined, stable spike conformations, but virtually a continuum of states.

T9

1061-1220

Sentence

denotes

An ensemble map was obtained with minimum bias, from which the extremes of the change along the direction of maximal variance were modeled by flexible fitting.

T10

1221-1409

Sentence

denotes

The results provide a warning of the potential image-processing classification instability of these complicated data sets, which has a direct impact on the interpretability of the results.

T11

1411-1413

Sentence

denotes

1.

T12

1415-1429

Sentence

denotes

Introduction

T13

1430-1537

Sentence

denotes

SARS-CoV-2 infects target cells through the interaction of the viral spike (S) protein with cell receptors.

T14

1538-1640

Sentence

denotes

This is an essentially dynamic event that is hard to analyze using most structural biology techniques.

T15

1641-1986

Sentence

denotes

However, cryo-EM offers some unique capabilities that makes it a very suitable approach for this task, especially the facts that it can work with noncrystalline samples and, to a certain degree, those with structural flexibility (Dashti et al., 2014 ▸; Maji et al., 2020 ▸; Scheres et al., 2007 ▸; Sorzano et al., 2019 ▸; Tagare et al., 2015 ▸).

T16

1987-2179

Sentence

denotes

In turn, cryo-EM information is complex, being buried in thousands of very noisy movies, making it a real challenge to reveal a three-dimensional (3D) structure from this collection of images.

T17

2180-2386

Sentence

denotes

Furthermore, cryo-EM is in the middle of a methodological and instrumental ‘revolution’ (Kühlbrandt, 2014 ▸) that has already been in progress for several years, with new methods constantly being produced.

T18

2387-2550

Sentence

denotes

In this context, the original data of Wrapp et al. (2020 ▸) have been reanalyzed, applying newer workflows and algorithms, and thus obtaining improved information.

T19

2551-2890

Sentence

denotes

Considering that we were studying a biological system that is characterized by its continuous flexibility, we have not strictly followed the standard multi-class approach (Scheres et al., 2007 ▸), which is very well suited to cases of discrete flexibility, since the mathematical modeling and the biological reality could be too far apart.

T20

2891-3232

Sentence

denotes

Instead, we have calculated a new ‘ensemble’ map at 3 Å global resolution in which the bias has been carefully reduced, followed by both a 3D classification process and a continuous flexibility analysis in 3D principal component (PC) space using a GPU-accelerated and algorithmically improved version of the method of Tagare et al. (2015 ▸).

T21

3233-3279

Sentence

denotes

The ensemble map was used for atomic modeling.

T22

3280-3417

Sentence

denotes

Our aim was to explore a larger part of the structural flexibility present in the data set than is achievable by 3D classification alone.

T23

3418-3704

Sentence

denotes

Using this mixed procedure, and through scatter plots of the projection of the different particle images onto the principal component axes, we have clearly shown how the spike flexibility in this data set should be understood as a continuum of states rather than discrete conformations.

T24

3705-3912

Sentence

denotes

Using maximum-likelihood-based classification, we have obtained two maps that are projected at the extremes of the main principal component on which flexible fitting from the ensemble map has been performed.

T25

3913-4110

Sentence

denotes

However, these extreme maps have an intrinsic blurring in the most flexible areas, since for any class that we may define the images come from a continuum of states and are therefore heterogeneous.

T26

4111-4317

Sentence

denotes

This flexibility is substantially reduced in a recently described biochemically stabilized spike (Hsieh et al., 2020 ▸), as shown by the reduced blurring, which translates into an improved local resolution.

T27

4318-4552

Sentence

denotes

In this work, we describe the new structural information that has been obtained and how it impacts our biological understanding of the system, together with the new workflows and algorithms that have made this accomplishment possible.

T28

4553-4687

Sentence

denotes

We used Scipion 2.0 (de la Rosa-Trevín et al., 2016 ▸) in order to easily combine different software suites in the analysis workflows.

T29

4688-5257

Sentence

denotes

Maps and models have been deposited in public databases [EMPIAR (Iudin et al., 2016 ▸) and EMDB (Lawson et al., 2011 ▸)]: SARS-CoV-2 spike in the prefusion state as EMDB entry EMD-11328 and PDB entry 6zow, SARS-CoV-2 stabilized spike in the prefusion state (1-up conformation) as EMDB entry EMD-11341, SARS-CoV-2 spike in the prefusion state (flexibility analysis, 1-up closed conformation) as EMDB entry EMD-11336 and PDB entry 6zp5, and SARS-CoV-2 spike in the prefusion state (flexibility analysis, 1-up open conformation) as EMDB entry EMD-11337 and PDB entry 6zp7.

T30

5258-5470

Sentence

denotes

All of the used data, the image-processing workflow and the intermediate results were also uploaded to EMPIAR (entries EMPIAR-10514 and EMPIAR-10516) by running the EMPIAR automatic deposition feature in Scipion.

T31

5472-5474

Sentence

denotes

2.

T32

5476-5499

Sentence

denotes

Materials and methods

T33

5501-5505

Sentence

denotes

2.1.

T34

5507-5534

Sentence

denotes

Image-processing workflow

T35

5535-6158

Sentence

denotes

The basic elements of the workflow combine classic cryo-EM algorithms with recent improvements in particle picking (Sanchez-Garcia et al., 2018 ▸; Sanchez-Garcia, Segura et al., 2020 ▸; Wagner et al., 2019 ▸) and the key ideas of meta classifiers, which integrate multiple classifiers by a ‘consensus’ approach (Sorzano et al., 2020 ▸), and finish with a totally new approach to map post-processing based on deep learning that we term Deep cryo-EM Map Enhancer (DeepEMhancer; Sanchez-Garcia, Gomez-Blanco et al., 2020 ▸), which complements our previous proposal on local deblurring (Ramírez-Aportela, Vilas et al., 2020 ▸).

T36

6159-6335

Sentence

denotes

Naturally, map and map–model quality analyses are performed using a variety of tools (Pintilie et al., 2020 ▸; Ramírez-Aportela, Maluenda et al., 2020 ▸; Vilas et al., 2020 ▸).

T37

6336-6577

Sentence

denotes

Conformational variability analysis is carried out by explicitly addressing the continuously flexible nature of the underlying biological reality, in which the SARS-CoV-2 spike explores the conformational space to bind the cellular receptor.

T38

6578-6811

Sentence

denotes

Most of the image processing performed in this work was performed using the Scipion framework (de la Rosa-Trevín et al., 2016 ▸), which is a public domain image-processing framework that is freely available at http://scipion.i2pc.es.

T39

6812-6928

Sentence

denotes

A graphical representation of the image-processing workflow used in this work can be found in Supplementary Fig. S1.

T40

6930-6934

Sentence

denotes

2.2.

T41

6936-6954

Sentence

denotes

Meta classifiers

T42

6955-7265

Sentence

denotes

With meta classifiers, and as discussed in Sorzano et al. (2020 ▸), the rationale is that a careful analysis of the ratio between algorithmic degrees of freedom and data size shows that cryo-EM may has transitioned from an area characterized by parameter variance to one dominated by possible parameter biases.

T43

7266-7387

Sentence

denotes

In very simple terms, we have a lot of data, so we can counteract the variance in our data if we deal with random errors.

T44

7388-7569

Sentence

denotes

However, whenever there is the possibility of a systematic error, a so-called ‘bias’, artifacts may occur in the maps and, if this is the case, they can be very difficult to detect.

T45

7570-7851

Sentence

denotes

We deal with the problem of introducing bias into the map through ‘consensus’, so that we select those parameters for which several methods, which are as methodologically ‘orthogonal’ as possible, concur on the same answer (sometimes we also use different runs of the same method).

T46

7852-7985

Sentence

denotes

This notion has been used in several different steps of the workflow as listed below.(i) Contrast transfer function (CTF) estimation.

T47

7986-8051

Sentence

denotes

We estimated the microscope defocus using two different programs:

T48

8052-8115

Sentence

denotes

Gctf (Zhang, 2016 ▸) and CTFFIND4 (Rohou & Grigorieff, 2015 ▸).

T49

8116-8232

Sentence

denotes

We only selected those micrographs for which both estimates agreed up to 2.1 Å resolution (Marabini et al., 2015 ▸).

T50

8233-8257

Sentence

denotes

(ii) Particle selection.

T51

8258-8298

Sentence

denotes

We used two particle-picking algorithms:

T52

8299-8367

Sentence

denotes

Xmipp (Abrishami et al., 2013 ▸) and crYOLO (Wagner et al., 2019 ▸).

T53

8368-8628

Sentence

denotes

We submitted both results to a picking consensus algorithm using deep learning (Sanchez-Garcia et al., 2018 ▸) and also removed all of the coordinates in contaminations, carbon edges etc. using a deep-learning algorithm (Sanchez-Garcia, Segura et al., 2020 ▸).

T54

8629-8845

Sentence

denotes

We then cleaned the set of selected particles using two rounds of cryoSPARC 2D classification (Punjani et al., 2017 ▸; Punjani & Fleet, 2020 ▸) and the consensus of two independent 3D classifications with cryoSPARC.

T55

8846-8867

Sentence

denotes

(iii) Initial volume.

T56

8868-9059

Sentence

denotes

As an initial volume, we selected the major class from the two 3D classifications above and refined it with Xmipp Highres (Sorzano et al., 2018 ▸) with a local refinement of the 3D alignment.

T57

9060-9083

Sentence

denotes

(iv) 3D reconstruction.

T58

9084-9237

Sentence

denotes

We then performed a cryoSPARC non-uniform 3D reconstruction, followed by a local angular refinement using RELION with a 3D mask (Zivanov et al., 2018 ▸).

T59

9238-9498

Sentence

denotes

Particle images were subjected to CTF refinement and Bayesian polishing (Zivanov et al., 2018 ▸), before performing another two rounds of CTF refinement and local angular refinement in RELION, where we improved the resolution versus the first local refinement.

T60

9499-9559

Sentence

denotes

Finally, we performed a non-uniform refinement in cryoSPARC.

T61

9560-9683

Sentence

denotes

The reported nominal resolution of 2.96 Å is based on the gold-standard Fourier shell correlation (FSC) of 0.143 criterion.

T62

9684-9915

Sentence

denotes

Actually, by using Xmipp Highres (Sorzano et al., 2018 ▸) we could improve the resolution to 2.2 Å in the central region of the volume (the region that is not flexible), but at the expense of reducing it more in the flexible areas.

T63

9916-9938

Sentence

denotes

(v) 3D classification.

T64

9939-10080

Sentence

denotes

We then performed two rounds of 3D classification with RELION followed by a consensus 3D classification, yielding two stables, large classes.

T65

10081-10200

Sentence

denotes

Using these two classes, we then performed a local angular refinement using a cryoSPARC non-uniform 3D reconstruction.

T66

10202-10206

Sentence

denotes

2.3.

T67

10208-10228

Sentence

denotes

Particle selection

T68

10229-10381

Sentence

denotes

We found that the micrographs and particles that are used for the 3D reconstruction play a key role in the quality and characteristics of the final map.

T69

10382-10453

Sentence

denotes

In particular, we used the following two procedures.(i) CTF estimation.

T70

10454-10514

Sentence

denotes

We estimated the microscope defocus using Gctf and CTFFIND4.

T71

10515-10649

Sentence

denotes

We required that both estimates were similar (the phase of their corresponding CTFs differed by less than 90°) up to 2.1 Å resolution.

T72

10650-10697

Sentence

denotes

Only 70% of the micrographs met this criterion.

T73

10698-10875

Sentence

denotes

We then estimated the CTF envelope using Xmipp CTF (Sorzano et al., 2007 ▸) while keeping the defocus value fixed (calculated as the average of the Gctf and CTFFIND4 estimates).

T74

10876-10954

Sentence

denotes

We found this step to be very important to retain high-resolution information.

T75

10955-11251

Sentence

denotes

Using Xmipp CTF, we discovered that most of the micrographs had a non-astigmatic validity of between 3 and 4 Å (meaning that at this resolution the assumption of non-astigmatism broke down for most of the micrographs, and only a minority of 30% reached higher resolution in a non-astigmatic way).

T76

11252-11276

Sentence

denotes

(ii) Particle selection.

T77

11277-11332

Sentence

denotes

Two advanced particle-picking algorithms were employed:

T78

11333-11350

Sentence

denotes

Xmipp and crYOLO.

T79

11351-11470

Sentence

denotes

The first identified 1.2 million coordinates possibly pointing to spike particles, while the second identified 730 000.

T80

11471-11582

Sentence

denotes

We then combined the estimates using Deep Consensus with a threshold of 0.99, resulting in 620 000 coordinates.

T81

11583-11732

Sentence

denotes

MicrographCleaner was used to rule out particles picked on carbon edges, aggregates or contaminants, rejecting a total of 50 000 particles.

T82

11733-11961

Sentence

denotes

After two rounds of cryoSPARC 2D classification with a pixel size of 2.1 Å and an image size of 140 × 140 pixels, we kept 298 000 particles assigned to 2D classes whose centroids clearly corresponded to projections of the spike.

T83

11962-12085

Sentence

denotes

At this point, we performed two initial volume estimates using cryoSPARC, classifying the input particles into two classes.

T84

12086-12273

Sentence

denotes

In both executions, one of the structures clearly corresponded to the spike (with 80% of particles), while the other resulted in a 3D structure that clearly corresponded to contamination.

T85

12274-12413

Sentence

denotes

We calculated the consensus of the two cryoSPARC 3D classifications (those particles that were consistently assigned to the same 3D class).

T86

12414-12503

Sentence

denotes

Only 203 000 particles belonged to the class that was consistently assigned to the spike.
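The consensus between two classification runs amounts to keeping the particles that land in the spike class both times. The sketch below assumes each run is available as a mapping from a particle identifier to a class label, and that the label of the spike class in each run has been identified by inspection; the data layout and names are hypothetical and do not reflect the Scipion protocol itself.

```python
def consensus_particles(run_a, run_b, spike_label_a, spike_label_b):
    """Return the IDs assigned to the spike class in both runs.

    run_a, run_b: dicts mapping particle ID -> class label.
    Class labels are not assumed to match between runs, so the spike
    label must be given separately for each run.
    """
    return sorted(
        pid
        for pid, label in run_a.items()
        if label == spike_label_a and run_b.get(pid) == spike_label_b
    )

# Toy example: particles 1 and 4 are in the spike class in both runs.
run_1 = {1: "spike", 2: "junk", 3: "spike", 4: "spike"}
run_2 = {1: 0, 2: 1, 3: 1, 4: 0}   # in this run, label 0 is the spike class
kept = consensus_particles(run_1, run_2, "spike", 0)
```

Particles that switch class between runs (particle 3 in the toy example) are discarded, which is what reduces the set from the roughly 80% assigned to the spike in each run to the consistently assigned subset.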

T87

12505-12509

Sentence

denotes

2.4.

T88

12511-12544

Sentence

denotes

Validation and quality analysis

T89

12545-12790

Sentence

denotes

To judge the quality of our structural results, we concentrated here on three of the newest approaches: directional local resolution (Vilas et al., 2020 ▸), Q-score (Pintilie et al., 2020 ▸) and FSC-Q (Ramírez-Aportela, Maluenda et al., 2020 ▸).

T90

12791-13046

Sentence

denotes

The first provides a measure of map quality, while the latter two focus on the relationship between the map and the structural model; in other words, how well the model is supported by the map density, without any other complementary piece of information.

T91

13047-13271

Sentence

denotes

In terms of map-to-model validation, in Supplementary Figs. S3(a) and S3(b) we present Q-score and FSC-Q metrics, respectively, showing the agreement between the ensemble cryo-EM map and the structural model derived from it.

T93

13272-13466

Sentence

denotes

In most areas the agreement is very good, with the exception of the receptor-binding domain (RBD) and substantial parts of the N-terminal domain (NTD), as expected from their higher flexibility.

T94

13468-13472

Sentence

denotes

2.5.

T95

13474-13498

Sentence

denotes

Volume post-processing

T96

13499-13682

Sentence

denotes

In this work, we used two volume post-processing approaches that both depart substantially from the traditional approach in the field, which is the application of global B-sharpening.

T97

13683-13808

Sentence

denotes

One of the approaches is our previously introduced LocalDeblur sharpening method (Ramírez-Aportela, Maluenda et al., 2020 ▸).

T98

13809-13922

Sentence

denotes

The second approach is a totally new method based on deep learning (Sanchez-Garcia, Gomez-Blanco et al., 2020 ▸).

T99

13923-14227

Sentence

denotes

Concentrating on the latter, this method, DeepEMhancer, relies on a common approach in modern pattern recognition in which a convolutional neural network (CNN) is trained on a known data set comprised of pairs of data points and targets, with the aim of predicting the targets for new unseen data points.

T100

14228-14424

Sentence

denotes

In this case, the training was performed by presenting the CNN with pairs of cryo-EM maps collected from the EMDB and maps derived from the structural models associated with the experimental maps.

T101

14425-14570

Sentence

denotes

As a result, our CNN learned how to obtain much cleaner and more detailed versions of the experimental cryo-EM maps, improving their interpretability.

T102

14571-14711

Sentence

denotes

Trying to take advantage of their complementary information, we used the two post-processed maps to trace the atomic model (PDB entry 6zow).

T103

14712-14848

Sentence

denotes

Some examples of the similar improvement of structure modeling according to these two sharpened maps are shown in Supplementary Fig. S2.

T104

14849-14920

Sentence

denotes

The sharpened and unsharpened maps have all been deposited in the EMDB.

T105

14922-14926

Sentence

denotes

2.6.

T106

14928-14944

Sentence

denotes

Model building

T107

14945-15135

Sentence

denotes

The atomic interpretation of the SARS-CoV-2 spike 3D map (PDB entry 6zow) was performed taking advantage of the modeling tools integrated in Scipion as described in Martínez et al. (2020 ▸).

T108

15136-15501

Sentence

denotes

Owing to a lack of sufficient density for the ‘up’ conformation of the RBD, we rigidly fitted the structure of chain A (residues 336–525) of the SARS-CoV-2 RBD in complex with CR3022 Fab (PDB entry 6yla; J. Huo, Y. Zhao, J. Ren, D. Zhou, H. M. Ginn, E. E. Fry, R. Owens & D. I. Stuart, unpublished work) to the 3D map using UCSF Chimera (Pettersen et al., 2004 ▸).

T120

15502-15688

Sentence

denotes

This unmodeled part of the structure was called chain ‘a’ since it was part of chain A in the structure previously inferred from the same data set (PDB entry 6vsb; Wrapp et al., 2020 ▸).

T121

15689-15895

Sentence

denotes

The rest of the molecule was modeled using the same original structure (PDB entry 6vsb) as a template, as well as another spike ectodomain structure in the open state (PDB entry 6vyb; Walls et al., 2020 ▸).

T122

15896-16061

Sentence

denotes

The former structure (PDB entry 6vsb) was fitted to the new map and refined using Coot (Emsley et al., 2010 ▸) and phenix_real_space_refine (Afonine et al., 2018 ▸).

T123

16062-16364

Sentence

denotes

Validation metrics were computed to assess the geometry of the new hybrid model and its correlation with the map using ‘Comprehensive Validation (cryo-EM)’ in Phenix, the EMRinger algorithm (Barad et al., 2015 ▸), Q-score (Pintilie et al., 2020 ▸) and FSC-Q (Ramírez-Aportela, Maluenda et al., 2020 ▸).

T124

16365-16484

Sentence

denotes

Score values considering the whole hybrid spike and excluding the unmodeled RBD are detailed in Supplementary Table S1.

T125

16485-16540

Sentence

denotes

The hybrid atomic structures were submitted to the PDB.

T126

16541-16672

Sentence

denotes

iMODFIT (López-Blanco & Chacón, 2013 ▸) was employed to flexibly fit the hybrid atomic structure to the open and closed class maps.

T127

16674-16678

Sentence

denotes

2.7.

T128

16680-16710

Sentence

denotes

Principal component analysis

T129

16711-16868

Sentence

denotes

The principal component analysis used the expectation–maximization (EM) algorithm presented in Tagare et al. (2015 ▸) with the following minor modifications.

T130

16869-17170

Sentence

denotes

Firstly, in contrast to Tagare et al. (2015 ▸), the images were not Wiener filtered, nor was the projected mean subtracted from the images; instead, the CTF of each image was incorporated into the projection operator of that image and a variable contrast was allowed for the mean volume in each image.

T131

17171-17262

Sentence

denotes

The extent of the variable contrast was determined by the principal component EM algorithm.

T132

17263-17423

Sentence

denotes

Secondly, the mean volume was projected along each projection direction and an image mask was constructed with a liberal soft margin to allow for heterogeneity.

T133

17424-17565

Sentence

denotes

The different masks thus created, with one mask per projection direction, were applied to the images and the masked images were used as data.

T134

17566-17751

Sentence

denotes

This step corresponds to imposing a form of sparsity on the data, which is known to improve the estimation of principal components in high-dimensional spaces (Johnstone & Paul, 2018 ▸).

T135

17752-17863

Sentence

denotes

All images were downsampled by a factor of two to improve the signal-to-noise ratio and to speed up processing.

T136

17864-18004

Sentence

denotes

Finally, during each EM iteration, the principal components were low-pass filtered with a very broad filter whose pass band extended to 4 Å.

T137

18005-18121

Sentence

denotes

This helped with the convergence of the algorithm without significantly limiting the principal component resolution.

T138

18122-18359

Sentence

denotes

As part of the EM iteration, the algorithm in Tagare et al. (2015 ▸) conveniently estimates the expected amount by which each principal component is present in each image (this is the term E[z_j] in equation 15 of Tagare et al., 2015 ▸).

T139

18360-18401

Sentence

denotes

Fig. 3(b) shows a scatter plot of E[z_j].

T140

18402-18592

Sentence

denotes

It is interesting to note that in the algorithm of Tagare et al. (2015 ▸) the latent variables (representing the contributions of the principal components to each particle) are marginalized.

T141

18593-18784

Sentence

denotes

Because of this marginalization, the number of unknown parameters that need to be estimated (the principal components and variances) is fixed and does not change with the number of particles.

T142

18785-19019

Sentence

denotes

We have found this feature to be very valuable for relatively small sets of images (say 100 000 images), which is the case in our work, because it prevents the number of parameters to be estimated from growing with the number of particles.

T143

19020-19156

Sentence

denotes

Statistically speaking, nonmarginalization is known to be a problem when there are few particles, where the estimates can be unreliable.

T144

19157-19256

Sentence

denotes

Since the method developed by Tagare and coworkers does not suffer from this, we chose this method.
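To make the role of the marginalized latent variables concrete, the following is a minimal sketch of EM for ordinary probabilistic PCA on plain vectors. It is deliberately not the algorithm of Tagare et al. (2015): there is no CTF, projection operator, per-image contrast or masking, and all names are illustrative. What it shares with the text above is that the fitted parameters (components `W` and noise variance `sigma2`) have a fixed size, while the per-sample expectations E[z] are a by-product of the E-step.

```python
import numpy as np

def ppca_em(X, n_components, n_iter=200, seed=0):
    """EM for probabilistic PCA. X has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    W = rng.standard_normal((d, n_components))   # random initial components
    sigma2 = 1.0
    eye = np.eye(n_components)
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables z.
        Minv = np.linalg.inv(W.T @ W + sigma2 * eye)
        Ez = Xc @ W @ Minv                       # rows are E[z_n]
        Ezz = n * sigma2 * Minv + Ez.T @ Ez      # sum over n of E[z_n z_n^T]
        # M-step: update components and isotropic noise variance.
        W_new = (Xc.T @ Ez) @ np.linalg.inv(Ezz)
        sigma2 = (
            np.sum(Xc**2)
            - 2.0 * np.sum(Ez * (Xc @ W_new))
            + np.trace(Ezz @ W_new.T @ W_new)
        ) / (n * d)
        W = W_new
    # Expected latent coordinates under the final parameters.
    Minv = np.linalg.inv(W.T @ W + sigma2 * eye)
    Ez = Xc @ W @ Minv
    return mu, W, sigma2, Ez
```

The number of estimated quantities is `d * n_components + 1` regardless of how many samples are supplied, which is the property the text highlights for small image sets.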

T145

19258-19260

Sentence

denotes

3.

T146

19262-19271

Sentence

denotes

Results

T147

19272-19363

Sentence

denotes

With the goal set at analyzing spike flexibility, we describe our key results step by step.

T148

19365-19369

Sentence

denotes

3.1.

T149

19371-19414

Sentence

denotes

The ensemble map and the way to obtain it

T150

19415-19554

Sentence

denotes

In the following, we describe the analysis of SARS-CoV-2 spike stabilized in the prefusion state by two proline substitutions in S2 (S-2P).

T151

19555-19885

Sentence

denotes

We will objectively demonstrate that the flexibility of the spike protein should be understood as a quasi-continuum of conformations, so that when performing a structural analysis on this specimen special care has to be taken with the image-processing workflows, since they may directly affect the interpretability of the results.

T152

19886-20144

Sentence

denotes

Starting from the original SARS-CoV-2 S-2P data set of Wrapp et al. (2020 ▸), we have completely reanalyzed the data using our public domain software integration platform Scipion (de la Rosa-Trevín et al., 2016 ▸), breaking the global 3 Å resolution barrier.

T153

20145-20444

Sentence

denotes

A representative view of the new ensemble map and its corresponding global FSC curve is shown in Fig. 1 ▸(a) (EMDB entry EMD-11328); the sequence of a monomer of the S protein is shown on the right to facilitate the further discussion of structure–function relationships (from Wrapp et al., 2020 ▸).

T154

20445-20628

Sentence

denotes

Figs. 1 ▸(b) and 1 ▸(c) show a comparison between the original map (Wrapp et al., 2020 ▸) with EMDB code EMD-21375 and the newly reconstructed ensemble map corresponding to EMD-11328.

T156

20629-20833

Sentence

denotes

Clearly, the local resolution (Vilas et al., 2018 ▸), which is shown on the left in Figs. 1 ▸(b) and 1 ▸(c), is increased in the new map, and the anisotropy, which is shown in the center, is much reduced.

T158

20834-21065

Sentence

denotes

Finally, on the right we present plots of the radially averaged tangential resolution, which is related to the quality of the angular alignment (Vilas et al., 2020 ▸); the steeper the slope, the higher the angular assignment error.

T159

21066-21354

Sentence

denotes

As can be appreciated, the slope calculated from the newly obtained map is almost zero when compared with that for the map from Wrapp et al. (2020 ▸), indicating that, in relative terms, the particle alignment used to create the new map is better than that used to build the original map.

T160

21355-21420

Sentence

denotes

The result is an overall quantitative enhancement in map quality.

T161

21421-21776

Sentence

denotes

In terms of tracing, besides modeling several additional residue side chains and improving the geometry of the carbon skeleton (see Supplementary Fig. S2), one of the most noticeable improvements that we observed in the new map is an extension of the glycan chains that were initially built, particularly throughout the S2 fusion subunit (PDB entry 6zow).

T162

21777-21967

Sentence

denotes

A quantitative comparison can be made between the length of glycan chains in the new ‘ensemble structure’ with respect to the previous structure (PDB entry 6vsb; see Supplementary Table S2).

T163

21968-22254

Sentence

denotes

Although the total number of N-linked glycosylation sequons throughout the SARS-CoV-2 S trimer is essentially the same in the new structure (45) and PDB entry 6vsb (44), we have substantially increased the length of the glycan chains, expanding the total number of glycans by about 50%.

T164

22255-22472

Sentence

denotes

We note the importance of this extensive glycosylation for epitope accessibility and how the accurate determination of this glycan shield will facilitate efforts to rapidly develop effective vaccines and therapeutics.

T165

22473-22673

Sentence

denotes

Supplementary Fig. S2 shows a representative section of sharpened versions of the ensemble map (EMDB entry EMD-11328) compared with EMDB entry EMD-21375, in which the glycans can now be better traced.

T166

22674-22911

Sentence

denotes

However, we should not forget that the ensemble map combines images in which the receptor-binding domain (RBD) and N-terminal domain (NTD) are in different positions (see Section 3.2), and consequently these domains appear to be blurred.

T167

22912-23140

Sentence

denotes

Details of how the tracing was performed can be found in Section 2, while in Supplementary Fig. S3 we present two map-to-model quality figures indicating the good fit in general, with the obvious exception of the variable parts.

T168

23142-23146

Sentence

denotes

3.2.

T169

23148-23170

Sentence

denotes

Flexibility analysis

T170

23171-23405

Sentence

denotes

Starting from a carefully selected set of particles obtained from our consensus and cleaning approaches (see Section 2), together with the ensemble map described previously, we subjected the data to the following flexibility analysis.

T171

23406-23578

Sentence

denotes

The original images that were part of the ensemble map went through a ‘consensus classification’ procedure aimed at separating them into two algorithmically stable classes.

T172

23579-23785

Sentence

denotes

Essentially, and as described in more detail in Section 2, we performed two independent classifications, further selecting those particles that were consistently together throughout the two classifications.

T173

23786-23852

Sentence

denotes

In this way, we obtained the two new classes shown in Fig. 2 ▸(a).

T174

23853-24023

Sentence

denotes

We will refer to these as the ‘closed conformation’ [Fig. 2 ▸(a), Class 1, EMDB entry EMD-11336] and the ‘open conformation’ [Fig. 2 ▸(a), Class 2, EMDB entry EMD-11337].

T175

24024-24183

Sentence

denotes

The number of images in each class was reduced to 45 000 in one case and 21 000 in the other, with global FSC-based resolutions of 3.1 and 3.3 Å, respectively.

T176

24184-24357

Sentence

denotes

The open and closed structures depict a clear and concerted movement of the ‘thumb’ formed by the RBD and subdomains 1 and 2 (SD1 and SD2) and the NTD of an adjacent chain.

T177

24358-24448

Sentence

denotes

The thumb moves away from the central spike axis, exposing the RBD in the up conformation.

T178

24449-24796

Sentence

denotes

To show more clearly where the Class 1 and Class 2 maps differ, we have made use of the representation of map local strains in Sorzano et al. (2016 ▸), which helps to visualize the type of strains needed to relate two maps, whether these are rigid-body rotations or more complex deformations (stretching).

T179

24797-24981

Sentence

denotes

We have labeled the maps resulting from this elastic analysis ‘1s’ (Class 1, stretching) and ‘1r’ (Class 1, rotations) on the right-hand side of Fig. 2 ▸(a), and similarly for Class 2.

T180

24982-25074

Sentence

denotes

The color scale for both stretching and rotations goes from blue for small to red for large.

T181

25075-25425

Sentence

denotes

Clearly, the differences among the classes with respect to the NTD and RBD have a very substantial component of pure coordinated rigid-body rotations, while the different RBDs present a much more complex pattern of deformations (stretching), indicating an important structural rearrangement in this area that does not occur elsewhere in the specimen.

T182

25426-25679

Sentence

denotes

In terms of atomic modeling, we performed a flexible fitting of the ensemble model onto the closed and open forms [see Fig. 2 ▸(a), rightmost map; the open conformation is PDB entry 6zp7, while the closed conformation is PDB entry 6zp5].

T183

25680-25862

Sentence

denotes

Focusing on rotations, which are the simplest element to follow, we can quantify that the degree of rotation of the thumb in these classes is close to 6°, as shown in Fig. 2 ▸(b).
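One generic way to quantify a rigid-body rotation between two fitted models is a least-squares superposition (Kabsch algorithm) of matched atoms followed by reading the angle off the rotation matrix. The sketch below is an illustration under that assumption; it is not the local-strain analysis of Sorzano et al. (2016) used for the figure, and the function name and input layout (two N × 3 arrays of corresponding coordinates, for example the Cα atoms of the thumb in the closed and open models) are hypothetical.

```python
import numpy as np

def rotation_angle_deg(P, Q):
    """Angle (degrees) of the best-fit rotation taking points P onto points Q.

    P and Q are (N, 3) arrays of corresponding coordinates.
    """
    Pc = P - P.mean(axis=0)                  # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                            # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # optimal proper rotation
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))
```

The angle obtained this way describes the domain as a single rigid body; any residual after superposition would correspond to the non-rigid (stretching) component discussed above.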

T184

25863-26066

Sentence

denotes

Given this flexibility, we consider that the best way to correctly present the experimental results is through the movie provided as Supplementary Movie S1, in which maps and atomic models are presented.

T185

26067-26412

Sentence

denotes

Within the approximation to modeling that a flexible fitting represents, we can appreciate two hinge movements of the RBD–SD1–SD2 domains: one located between amino acids 318–326 and 588–595 that produces most of the displacement, and the other between amino acids 330–335 and 527–531 that accompanies a less pronounced ‘up’ movement of the RBD.

T186

26413-26505

Sentence

denotes

This thumb motion is completed by the accompanying motion of the NTD from an adjacent chain.

T187

26506-26767

Sentence

denotes

Also in a collective way, other NTDs and RBDs in the down conformation move slightly, as can better be appreciated in Supplementary Movie S1, where the transition between fitted models overlaps with the interpolation between observed high-resolution class maps.

T188

26768-26862

Sentence

denotes

To further investigate whether or not the flexibility was continuous, we proceeded as follows.

T189

26863-26994

Sentence

denotes

Images from the two classes were pooled together and, using the ensemble map, subjected to a 3D principal component analysis (PCA).

T190

26995-27100

Sentence

denotes

The approach we followed is based on Tagare et al. (2015 ▸), with some minor modifications of the method.

T191

27101-27167

Sentence

denotes

A detailed explanation of the modifications is given in Section 2.

T192

27168-27328

Sentence

denotes

We initialized the first principal component (PC) to the difference between the open and closed conformation, while the remaining PCs were initialized randomly.

T193

27329-27430

Sentence

denotes

Upon convergence, the eigenvalue of each PC and the scatter of the images in PC space were calculated.

T194

27431-27483

Sentence

denotes

The eigenvalues of the PCs are shown in Fig. 3 ▸(a).

T195

27484-27529

Sentence

denotes

Clearly, the first three PCs are significant.

T196

27530-27606

Sentence

denotes

The scatter plot of the image data in PC1–PC3 space is shown in Fig. 3 ▸(b).

T197

27607-27720

Sentence

denotes

Fig. 3 ▸(b) strongly suggests that there is ‘continuous flexibility’ rather than ‘tightly clustered’ flexibility.

T198

27721-27861

Sentence

denotes

Fig. 3 ▸(b) also shows the projection of the maps corresponding to the open and closed conformations on the extremes of the first three PCs.

T199

27862-28037

Sentence

denotes

It is clear that the open and closed conformations are aligned mostly along the first PC, suggesting that the open/closed classification captures the most significant changes.

T200

28038-28168

Sentence

denotes

Fig. 3 ▸(c) shows side views of a pair of structures (mean ± 2 × std, where std is the square root of the eigenvalue) for each PC.
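The pair of structures for each PC follows directly from the stated formula, mean ± 2 × std with std the square root of the eigenvalue. A minimal sketch, assuming the mean volume and the principal component are NumPy arrays of the same shape and that the component is normalized so that the eigenvalue carries the variance (names are illustrative):

```python
import numpy as np

def extreme_volumes(mean_vol, pc_vol, eigenvalue, n_std=2.0):
    """Return (mean - n_std*std*PC, mean + n_std*std*PC) for one principal component."""
    std = np.sqrt(eigenvalue)          # standard deviation along this PC
    offset = n_std * std * pc_vol
    return mean_vol - offset, mean_vol + offset

# Toy example on a 2 x 2 x 2 grid.
mean_vol = np.ones((2, 2, 2))
pc_vol = np.zeros((2, 2, 2))
pc_vol[0, 0, 0] = 1.0                  # PC that only affects one voxel
low, high = extreme_volumes(mean_vol, pc_vol, eigenvalue=4.0)
```

In the toy example only the voxel touched by the PC changes (to −3 and 5), which mirrors how density can appear and disappear between the two extremes of a component.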

T201

28169-28255

Sentence

denotes

Additional details of these structures are available in Supplementary Figs. S4 and S5.

T203

28256-28405

Sentence

denotes

Note that PCs are not to be understood as structural pathways with a biological meaning, but as directions that summarize the variance of a data set.

T204

28406-28710

Sentence

denotes

For instance, the fact that the RBD appears and disappears at the two extremes of PC3 indicates that there is an important variability in these voxels, which is probably indicative of the up and down conformations of the RBD [to be understood in the context of the elastic analysis shown in Fig. 2 ▸(b)].

T205

28711-28854

Sentence

denotes

Through this combination of approaches, we have learnt that the spike conformation fluctuates virtually randomly in a rather continuous manner.

T206

28855-29173

Sentence

denotes

Additionally, the approach taken to define the two algorithmically stable ‘classes’ has clearly partitioned the data set according to the main axis of variance, PC1, since the projections of the maps of these classes fall almost exclusively along PC1 and are located towards the extremes of the image-projection cloud.

T207

29174-29390

Sentence

denotes

Note that the fraction of structural flexibility owing to PC2 and PC3 is also important in terms of the total variance of the complete image set, but that classification approaches do not seem to properly explore it.

T208

29391-29547

Sentence

denotes

Unfortunately, the resolution in PC2 and PC3 is currently limited, so it is difficult to derive clear structural conclusions from these low-resolution maps.

T209

29548-29764

Sentence

denotes

However, it is clear from these data that the dynamics of the spike are far richer than just a rigid-body closing and opening, and involve more profound rearrangements, especially at the RBD but also at other sites.

T210

29765-29863

Sentence

denotes

This observation is similar to that of Ke et al. (2020 ▸) when working with subtomogram averaging.

T211

29864-30309

Sentence

denotes

Additionally, the fact that PCA indicates this continuous flexibility to be a key characteristic of the spike dynamics also suggests that many other ways of partitioning (rather than properly ‘classifying’) this continuous data set could be devised; this is simply a consequence of the intrinsic instability created by forcing a quasi-continuous data distribution, without any clustering structure, into a defined set of clusters.

T212

30310-30677

Sentence

denotes

In this work, we have clearly forced the classification to go to the extremes of the data distribution, as shown in Fig. 3 ▸, probably by enforcing an algorithmically stable classification, but the key result is that any other degree of movement of the spike in between these extremes of PC1 as well as PC2 and PC3 would also be consistent with the experimental data.

T213

30678-30898

Sentence

denotes

In other words, since the continuum of conformations does not have clear ‘cutting/classification’ points, there is a certain algorithmic uncertainty and instability as to the possible results of a classification process.

T214

30899-31168

Sentence

denotes

Note that this instability could be exacerbated by the step of particle picking, in the sense that different picking algorithms may have different biases (precisely to minimize this instability, we have performed a ‘consensus’ approach to picking throughout this work).

T215

31169-31439

Sentence

denotes

Clearly, flexibility is key in this system, so that alterations in its dynamics may cause profound effects, including viral neutralization, and this could be one of the reasons for the neutralization mechanism of antibodies directed against the NTD (Chi et al., 2020 ▸).

T216

31441-31445

Sentence

denotes

3.3.

T217

31447-31506

Sentence

denotes

Structure of a biochemically stabilized form of the spike

T218

31507-31607

Sentence

denotes

We have also worked with a more recent variant containing six proline substitutions in S2 (HexaPro).

T219

31608-31670

Sentence

denotes

This second protein was also studied by Hsieh et al. (2020 ▸).

T220

31671-31995

Sentence

denotes

In this case, after going through the same stringent particle-selection process as for the previous specimen, as presented in depth in Section 2, it was impossible to obtain stable classes, so that in Fig. 4 ▸ we present a single map (EMDB entry EMD-11341) together with its global FSC curve and a local resolution analysis.

T221

31996-32149

Sentence

denotes

It is clear that the local resolution has increased in the moving parts (mostly the RBD and NTD), although we did not feel confident in further modeling.

T222

32151-32153

Sentence

denotes

4.

T223

32155-32168

Sentence

denotes

Conclusions

T224

32169-32432

Sentence

denotes

In this work, we present a clear example of how the structural discovery process can be greatly accelerated by a wise combination of rapid data sharing and the use of the wave of newly developed algorithms that characterize this phase of the ‘cryo-EM revolution’.

T225

32433-32624

Sentence

denotes

The reanalysis of the data used in Wrapp et al. (2020 ▸), but with new workflows and new tools, has resulted in a rich analysis of the spike flexibility as a key characteristic of the system.

T226

32625-32798

Sentence

denotes

Essentially, and at least to a first approximation, the spike moves in a continuous manner with no preferential states, as clearly shown in the scatter plots in Fig. 3 ▸(b).

T227

32799-32976

Sentence

denotes

In this way, the result of a particular instance of image-processing analysis, including a 3D classification, should be regarded as a snapshot of this quasi-continuum of states.

T228

32977-33229

Sentence

denotes

In our case, we have shown that a particular meta image-classification approach, implemented through a consensus among different methods in many steps of the analysis, results in classes that are at the extreme of the main axis of variance in PC space.

T229

33230-33462

Sentence

denotes

Clearly, PC1, through the analysis of the two extreme classes, reflects a concerted motion of the NTD–RBD–SD1–SD2 thumb, although there are smaller collective movements throughout the spike (see Fig. 2 ▸ and Supplementary Movie S1).

T230

33463-33599

Sentence

denotes

In this case, the RBD moves together with the NTD, with a smaller degree of independent flexibility and always in the ‘up’ conformation.

T231

33600-33853

Sentence

denotes

The NTD–RBD movement can be characterized to a large degree as a rotation, but the different RBDs present a much more complex pattern of flexibility, indicating an important structural rearrangement [from elastic analysis (Fig. 2 ▸) and PCA (Fig. 3 ▸)].

T232

33854-34130

Sentence

denotes

Quasi-rigid-body rotation hinges are clearly located between amino acids 318–326 and 588–595, which produce most of the displacement, together with other hinges between amino acids 330–335 and 527–531, which accompany a less pronounced ‘up’ movement of the RBD.

T233

34131-34288

Sentence

denotes

However, there are other PC axes explaining significant fractions of the inter-image variance that are not properly explored at the level of our two classes.

T234

34289-34479

Sentence

denotes

PC3 is a clear example, indicating a high variance at the voxels associated with RBD up, which probably suggests large conformational changes in this area that result in the RBD moving down.

T235

34480-34833

Sentence

denotes

The flexibility analysis performed in this work complements previous analyses showing large rotations together with RBD up–down structural changes (Pinto et al., 2020 ▸; Wrapp et al., 2020 ▸), in the sense that the different studies present ‘snapshots’ of a continuum of movements obtained by a particular instance of an image-processing classification.

T236

34834-34963

Sentence

denotes

In a sense, all of these results are correct, but none of them is able to capture the richness of the flexibility of this system.

T237

34964-35201

Sentence

denotes

This fact reflects the intrinsic instability of segmenting a continuum into defined clusters, which is a clear limitation of the classification approaches that needs to be considered in detailed analysis of any data set from this system.

T238

35202-35483

Sentence

denotes

An obvious way to increase the resolution of the moving parts of the spike is to reduce their mobility, as is the case, for instance, in the biochemical stabilization of Hsieh et al. (2020 ▸) and also in the formation of a complex with an antibody against NTD (Chi et al., 2020 ▸).

T239

35484-35853

Sentence

denotes

On the other hand, the route towards a more complete analysis of the flexibility of the spike protein necessarily involves the analysis of data sets that are quite substantially larger than those being used in most current SARS-CoV-2 studies, so that all of the main axes of inter-image variability can be explored; this is work that is under development at the moment.

T240

35854-36194

Sentence

denotes

From a biomedical perspective, the proof that a quasi-continuum of flexibility is a key characteristic of this specimen, a concept that has been implicitly considered in much of the structural work performed so far but never demonstrated, suggests that ways to interfere with this flexibility could be important components of new therapies.

T241	36196-36218	Sentence	denotes	Supplementary Material
T242	36219-36281	Sentence	denotes	EMDB reference: SARS-CoV-2 spike in prefusion state, EMD-11328
T243	36282-36393	Sentence	denotes	EMDB reference: SARS-CoV-2 spike in prefusion state (flexibility analysis, 1-up closed conformation), EMD-11336
T244	36394-36503	Sentence	denotes	EMDB reference: SARS-CoV-2 spike in prefusion state (flexibility analysis, 1-up open conformation), EMD-11337
T245	36504-36597	Sentence	denotes	EMDB reference: SARS-CoV-2 stabilized spike in prefusion state (1-up conformation), EMD-11341
T246	36598-36654	Sentence	denotes	PDB reference: SARS-CoV-2 spike in prefusion state, 6zow
T247	36655-36760	Sentence	denotes	PDB reference: SARS-CoV-2 spike in prefusion state (flexibility analysis, 1-up closed conformation), 6zp5
T248	36761-36864	Sentence	denotes	PDB reference: SARS-CoV-2 spike in prefusion state (flexibility analysis, 1-up open conformation), 6zp7
T249	36865-36898	Sentence	denotes	Supplementary Figures and Tables.
T250	36899-36903	Sentence	denotes	DOI:
T251	36904-36944	Sentence	denotes	10.1107/S2052252520012725/fq5016sup1.pdf
T252	36945-36981	Sentence	denotes	Click here for additional data file.
T253	36982-37005	Sentence	denotes	Supplementary Movie S1.
T254	37006-37147	Sentence	denotes	Movie presenting the morphing between the two algorithmically stable classes described in the main text, spanning principal component axis 1.
T255	37148-37152	Sentence	denotes	DOI:
T256	37153-37193	Sentence	denotes	10.1107/S2052252520012725/fq5016sup2.mp4
T257	37195-37405	Sentence	denotes	We acknowledge the support from the Advanced Computing and e-Science group at the Institute of Physics of Cantabria (IFCA–CSIC–UC) as well as the Barcelona Supercomputer Center (access project BCV-2020-2-0005).
T258	37406-37539	Sentence	denotes	The authors acknowledge the support and the use of resources of Instruct, a Landmark ESFRI project (Instruct Access Project ID11775).
T259	37540-37577	Sentence	denotes	Author contributions were as follows.
T260	37578-37762	Sentence	denotes	Roberto Melero and COSS performed all of the image analysis in Scipion, while BF performed equivalent work in the principal component analysis and JLV in the local resolution analysis.
T261	37763-37931	Sentence	denotes	MM and Roberto Marabini were in charge of structural modeling, while Pablo Chacon performed the flexible fittings and incorporated important sections of the manuscript.
T262	37932-38122	Sentence	denotes	ER-A performed the map-to-model analysis as well as generating the sharpened cryo-EM maps, while RS-G also worked in new sharpening methods and DH performed the elastic inter-class analysis.
T263	38123-38209	Sentence	denotes	Pablo Conesa, YF-R, LdC and PL were in charge of the IT hardware and software support.
T264	38210-38282	Sentence	denotes	JMcL and DW supplied the images and provided advice throughout the work.
T265	38283-38402	Sentence	denotes	HT, COSS and JMC conceptualized the work, with JMC writing the manuscript, which was complemented by all other authors.
T266	38403-38457	Sentence	denotes	PC, JMcL, HT and JMC were responsible for the funding.
T267	38458-38503	Sentence	denotes	The authors declare no conflicts of interest.
T268	38505-38713	Sentence	denotes	Figure 1 The spike and the ensemble map. (a) A representative view of the new map (EMDB entry EMD-11328), the corresponding FSC curve and the sequence of a monomer of the S protein (from Wrapp et al., 2020).
T269	38714-38854	Sentence	denotes	The scale bar is 5 nm in length. (b, c) New ensemble cryo-EM map (EMD-11328) compared with that originally presented (EMDB entry EMD-21375).
T270	38855-38936	Sentence	denotes	The first row (b) corresponds to the new map and the second row (c) to EMD-21375.
T271	38937-39322	Sentence	denotes	In each row, from left to right: a map representation showing the local resolution (computed with MonoRes; Vilas et al., 2018 ▸), a histogram representation of the local directional resolution dispersion (interquartile range between percentiles 17 and 83) and, finally, a plot showing the radial average of the local tangential resolution (analyzed with MonoDir; Vilas et al., 2020 ▸).
T272	39323-39513	Sentence	denotes	Figure 2 Flexibility analysis. (a) A representative view of the new ensemble map and the two new classes showing the ‘open conformation’ in Class 1 and the ‘closed conformation’ in Class 2.
T273	39514-39675	Sentence	denotes	Note the elastic analysis of deformations performed on the Class 1 and Class 2 maps (see the main text), with 1s referring to ‘stretching’ and 1r to ‘rotations’.
T274	39676-39759	Sentence	denotes	The color code is from blue for minimal deformation to red for maximal deformation.
T275	39760-39903	Sentence	denotes	The scale bar is 5 nm in length. (b) Representation of the angles defined by the spike when transitioning between the opened and closed states.
T276	39904-39986	Sentence	denotes	The regions shown in magenta represent the hinges used by the RBD domain to pivot.
T277	39987-40048	Sentence	denotes	Note that each hinge encompasses two different chain regions.
T278	40049-40173	Sentence	denotes	The first hinge spans amino acids 318–326 and 588–595, while the second hinge is defined by amino acids 330–335 and 527–531.
T279	40174-40211	Sentence	denotes	The angles were measured using PyMOL.
T280	40212-40332	Sentence	denotes	Figure 3 Principal component analysis of the SARS-CoV-2 spike structure. (a) Eigenvalues of principal components (PCs).
T281	40333-40537	Sentence	denotes	The first three PCs are significant. (b) Scatter plot of the contribution of the first three PCs to each particle image together with the projection of the open and closed class maps, shown as red points.
T282	40538-40753	Sentence	denotes	The difference between the projections of the two maps is mostly aligned along principal component 1 (PC1). (c) Side view of the first two PCs shown as mean ± 2 × std, where std is the square root of the eigenvalue.
T283	40754-40840	Sentence	denotes	Coloring indicates the z-depth of the structure, and is added to assist visualization.
T284	40841-40860	Sentence	denotes	Supplementary Figs.
T285	40861-40916	Sentence	denotes	S4 and S5 contain additional views of these structures.
T286	40917-40949	Sentence	denotes	The scale bar is 5 nm in length.
T287	40950-41119	Sentence	denotes	Figure 4 Analysis of a biochemically stabilized form of the spike. (a, b) A representative view of the stabilized form of the spike map and the corresponding FSC curve.
T288	41120-41205	Sentence	denotes	The scale bar is 5 nm in length. (c) The local resolution map estimated with MonoRes.

PMC:7553147 JSON TXT 15 Projects
