2.7. Principal component analysis

The principal component analysis used the expectation–maximization (EM) algorithm presented in Tagare et al. (2015) with the following minor modifications. Firstly, in contrast to Tagare et al. (2015), the images were not Wiener filtered, nor was the projected mean subtracted from the images; instead, the CTF of each image was incorporated into the projection operator of that image, and a variable contrast was allowed for the mean volume in each image. The extent of the variable contrast was determined by the principal component EM algorithm.

Secondly, the mean volume was projected along each projection direction and an image mask was constructed with a liberal soft margin to allow for heterogeneity (sketched below). The masks thus created, one per projection direction, were applied to the images, and the masked images were used as data. This step corresponds to imposing a form of sparsity on the data, which is known to improve the estimation of principal components in high-dimensional spaces (Johnstone & Paul, 2018).

All images were downsampled by a factor of two to improve the signal-to-noise ratio and to speed up processing (sketched below). Finally, during each EM iteration the principal components were low-pass filtered with a very broad filter whose pass band extended to 4 Å (sketched below). This aided the convergence of the algorithm without significantly limiting the resolution of the principal components.

As part of the EM iteration, the algorithm of Tagare et al. (2015) conveniently estimates the expected amount by which each principal component is present in each image; this is the term E[z_j] in equation 15 of Tagare et al. (2015). Fig. 3(b) shows a scatter plot of E[z_j]; a generic sketch of this computation is given below.

It is interesting to note that in the algorithm of Tagare et al. (2015) the latent variables, which represent the contributions of the principal components to each particle, are marginalized. Because of this marginalization, the number of unknown parameters that need to be estimated (the principal components and the variances) is fixed and does not grow with the number of particles. We have found this feature to be very valuable for relatively small sets of images (say 100 000 images), as is the case in our work. Statistically, estimation without marginalization is known to be problematic when there are few particles, since the estimates can become unreliable. Since the method developed by Tagare and coworkers does not suffer from this, we chose it.
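As an illustration of the per-direction masking step, the following Python sketch builds a soft-edged mask from a projection of the mean volume. The threshold, margin width and edge smoothness below are illustrative assumptions, not values taken from the method, and project() stands for a hypothetical projection routine.

import numpy as np
from scipy.ndimage import binary_dilation, gaussian_filter

def soft_projection_mask(mean_projection, threshold_frac=0.1,
                         dilation_px=8, soft_edge_px=4):
    """Build a soft-edged 2D mask from a projection of the mean volume.

    The dilation adds a liberal margin so that heterogeneous density
    just outside the mean projection is retained; the Gaussian-smoothed
    edge avoids sharp cut-offs in the masked images. All parameter
    values here are illustrative assumptions.
    """
    # Binary support: pixels above a fraction of the projection maximum.
    support = mean_projection > threshold_frac * mean_projection.max()
    # Liberal margin: dilate the support to allow for heterogeneity.
    support = binary_dilation(support, iterations=dilation_px)
    # Soft edge: smooth the binary mask so it falls off gradually.
    mask = gaussian_filter(support.astype(np.float64), sigma=soft_edge_px)
    return np.clip(mask / mask.max(), 0.0, 1.0)

# Usage: one mask per projection direction, applied multiplicatively.
# masked_image = soft_projection_mask(project(mean_volume, angles)) * image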
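The twofold downsampling could be realized, for example, by 2x2 pixel averaging; the source does not specify the scheme, so this choice is an assumption (Fourier cropping would be an equivalent alternative).

import numpy as np

def downsample_by_two(image):
    """Downsample a 2D image by a factor of two via 2x2 pixel averaging.

    Averaging neighbouring pixels raises the per-pixel signal-to-noise
    ratio and quarters the data size.
    """
    h, w = image.shape
    assert h % 2 == 0 and w % 2 == 0, "expects even dimensions"
    return image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))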
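The per-iteration low-pass filtering of the principal components can be sketched as a radial Fourier-space cut-off whose pass band extends to 4 Å. The hard cut-off used here is an assumption; the text only states that the filter is very broad with a pass band extending to 4 Å.

import numpy as np

def lowpass_to_resolution(volume, pixel_size_A, resolution_A=4.0):
    """Low-pass filter a cubic volume so that its pass band extends to
    resolution_A (e.g. 4 Angstrom).

    The hard radial cut-off used here is an illustrative assumption; any
    broad filter with the same pass-band edge would serve the same role
    in the EM iteration.
    """
    n = volume.shape[0]
    # Spatial frequencies in 1/Angstrom along each axis.
    freqs = np.fft.fftfreq(n, d=pixel_size_A)
    fx, fy, fz = np.meshgrid(freqs, freqs, freqs, indexing='ij')
    radius = np.sqrt(fx**2 + fy**2 + fz**2)
    # Keep all frequencies up to 1/resolution_A, zero the rest.
    keep = radius <= 1.0 / resolution_A
    return np.real(np.fft.ifftn(np.fft.fftn(volume) * keep))

# Example: filter each principal-component volume after the M-step.
# pc_filtered = lowpass_to_resolution(pc_volume, pixel_size_A=1.0)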
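Finally, the expected contributions E[z_j] arise as the posterior means of the marginalized latent variables. The sketch below shows this for standard probabilistic PCA on vectorized images; it is a generic illustration, not equation 15 of Tagare et al. (2015), which additionally incorporates the per-image CTF/projection operator and the variable contrast.

import numpy as np

def expected_latents(Y, W, mu, sigma2):
    """Posterior mean E[z] of the marginalized latent variables in
    standard probabilistic PCA: for each image y,

        E[z | y] = (W^T W + sigma^2 I)^{-1} W^T (y - mu).

    Y      : (n_images, n_pixels) masked, downsampled images
    W      : (n_pixels, n_components) principal-component basis
    mu     : (n_pixels,) projected mean
    sigma2 : scalar noise variance
    """
    k = W.shape[1]
    M = W.T @ W + sigma2 * np.eye(k)           # (k, k) posterior precision
    Ez = np.linalg.solve(M, W.T @ (Y - mu).T)  # (k, n_images)
    # Note: only W and sigma2 are model parameters; E[z] is a posterior
    # expectation, so the parameter count does not grow with n_images.
    return Ez.T                                # (n_images, k): E[z_j] per image

# A scatter plot of two components of E[z] (cf. Fig. 3b) could be drawn
# with matplotlib: plt.scatter(Ez[:, 0], Ez[:, 1]).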