PMC:7553147 / 16674-19256

2.7. Principal component analysis
The principal component analysis used the expectation–maximization (EM) algorithm presented in Tagare et al. (2015) with the following minor modifications.
Firstly, in contrast to Tagare et al. (2015), the images were not Wiener filtered, nor was the projected mean subtracted from the images; instead, the CTF of each image was incorporated into the projection operator of that image and a variable contrast was allowed for the mean volume in each image. The extent of the variable contrast was determined by the principal component EM algorithm.
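The per-image model described above can be sketched as follows: each image is compared against the CTF-filtered projection of the mean volume, and the per-image contrast is obtained by least squares. This is a minimal illustration with a toy identity CTF, not the paper's implementation; the function names and the closed-form contrast estimate are assumptions for exposition.

```python
import numpy as np

def apply_ctf(projection, ctf):
    """Apply a CTF, given as a 2-D Fourier-space filter, to a real-space projection."""
    return np.real(np.fft.ifft2(np.fft.fft2(projection) * ctf))

def estimate_contrast(image, ctf_projection):
    """Least-squares estimate of the per-image contrast alpha in the model
    image ~ alpha * CTF(projection of mean) + noise."""
    denom = np.vdot(ctf_projection, ctf_projection).real
    return np.vdot(ctf_projection, image).real / denom

# Toy check with synthetic data and an identity CTF (both are illustrative).
rng = np.random.default_rng(0)
proj = rng.standard_normal((16, 16))
ctf = np.ones((16, 16))
image = 0.7 * apply_ctf(proj, ctf) + 0.01 * rng.standard_normal((16, 16))
alpha = estimate_contrast(image, apply_ctf(proj, ctf))
```

In the actual algorithm the contrast is re-estimated as part of the EM iteration rather than in a single closed-form step.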
Secondly, the mean volume was projected along each projection direction and an image mask was constructed with a liberal soft margin to allow for heterogeneity. The different masks thus created, with one mask per projection direction, were applied to the images and the masked images were used as data.
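A mask of this kind can be sketched as: threshold the projected mean, dilate the support to give the liberal margin, then smooth the edge to make it soft. The threshold, dilation and smoothing values below are illustrative defaults, not values from the paper.

```python
import numpy as np
from scipy import ndimage

def soft_mask_from_projection(projection, threshold_frac=0.2, dilate_px=4, soft_px=3):
    """Build a soft 2-D mask from a projection of the mean volume.

    threshold_frac, dilate_px and soft_px are hypothetical parameters chosen
    for illustration: the support is thresholded, dilated to give a liberal
    margin, then Gaussian-smoothed to soften the edge.
    """
    binary = projection > threshold_frac * projection.max()
    binary = ndimage.binary_dilation(binary, iterations=dilate_px)
    return ndimage.gaussian_filter(binary.astype(float), sigma=soft_px)

# Toy projection: a bright square on an empty background.
proj = np.zeros((32, 32))
proj[12:20, 12:20] = 1.0
mask = soft_mask_from_projection(proj)
masked_image = mask * proj  # masked images are what enter the PCA as data
```

The soft edge avoids introducing sharp artefacts into the Fourier transforms of the masked images.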
This step corresponds to imposing a form of sparsity on the data, which is known to improve the estimation of principal components in high-dimensional spaces (Johnstone & Paul, 2018).
Thirdly, all images were downsampled by a factor of two to improve the signal-to-noise ratio and to speed up processing.
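One simple way to realize a factor-of-two downsampling is 2x2 pixel binning; averaging neighbouring pixels suppresses high-frequency noise (raising the per-pixel signal-to-noise ratio) and quarters the data size. This is a generic sketch; the paper does not specify whether binning or Fourier cropping was used.

```python
import numpy as np

def downsample_by_two(image):
    """Downsample a 2-D image by a factor of two via 2x2 pixel averaging."""
    h, w = image.shape
    trimmed = image[: h - h % 2, : w - w % 2]  # drop odd edge rows/columns
    return trimmed.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

img = np.arange(16.0).reshape(4, 4)
small = downsample_by_two(img)  # 2x2 result; small[0, 0] is mean(0, 1, 4, 5)
```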
Finally, during each EM iteration, the principal components were low-pass filtered with a very broad filter whose pass band extended to 4 Å. This helped the algorithm converge without significantly limiting the resolution of the principal components.
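A low-pass filter of this kind can be sketched as a hard spherical cutoff in Fourier space at spatial frequency 1/(4 Å); the hard-edged mask and the pixel size below are assumptions for illustration (the paper only states that the pass band extended to 4 Å).

```python
import numpy as np

def lowpass_filter(vol, pixel_size, cutoff_resolution):
    """Zero all Fourier components beyond 1/cutoff_resolution.

    pixel_size and cutoff_resolution must share units (e.g. angstroms).
    Works for arrays of any dimension (2-D images or 3-D volumes).
    """
    freqs = np.meshgrid(*[np.fft.fftfreq(n, d=pixel_size) for n in vol.shape],
                        indexing="ij")
    radius = np.sqrt(sum(f ** 2 for f in freqs))
    mask = radius <= 1.0 / cutoff_resolution
    return np.real(np.fft.ifftn(np.fft.fftn(vol) * mask))

rng = np.random.default_rng(1)
vol = rng.standard_normal((16, 16, 16))          # stand-in for a principal component
filtered = lowpass_filter(vol, pixel_size=1.0, cutoff_resolution=4.0)
```

Because the mask is binary, applying the filter twice gives the same result as applying it once, so repeating it at every EM iteration is harmless.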
As part of the EM iteration, the algorithm in Tagare et al. (2015) conveniently estimates the expected amount by which each principal component is present in each image (this is the term E[z_j] in equation 15 of Tagare et al., 2015). Fig. 3(b) shows a scatter plot of E[z_j].
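For intuition, in standard probabilistic PCA (Tipping & Bishop) the analogous posterior mean is E[z_j] = (W^T W + sigma^2 I)^{-1} W^T (x_j - mu), where the columns of W are the principal components. The sketch below computes this standard form; the actual expression in equation 15 of Tagare et al. (2015) additionally incorporates the per-image projection operators and CTFs, which are omitted here.

```python
import numpy as np

def latent_posterior_mean(X, W, mu, sigma2):
    """Posterior mean E[z_j] of the latent coordinates in probabilistic PCA:
    x = W z + mu + noise, noise ~ N(0, sigma2 * I).
    Returns one row of E[z] per image (the quantities scatter-plotted in Fig. 3b
    are the projection/CTF-aware analogue of these)."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W.T @ (X - mu).T).T

# Synthetic sanity check: with low noise, E[z] recovers the true latents.
rng = np.random.default_rng(2)
d, k, n = 50, 2, 200
W = rng.standard_normal((d, k))
Z = rng.standard_normal((n, k))
X = Z @ W.T + 0.01 * rng.standard_normal((n, d))
Ez = latent_posterior_mean(X, W, mu=np.zeros(d), sigma2=0.01 ** 2)
```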
It is interesting to note that in the algorithm of Tagare et al. (2015) the latent variables (representing the contributions of the principal components to each particle) are marginalized. Because of this marginalization, the number of unknown parameters that need to be estimated (the principal components and variances) is fixed and does not change with the number of particles. We have found this feature to be very valuable for relatively small sets of images (say 100 000 images), which is the case in our work, because it prevents the number of parameters to be estimated from growing with the number of particles.
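The parameter-count argument can be made concrete with a small tally: marginalizing leaves only the components and the noise variance as unknowns, whereas treating the latents as parameters adds one k-vector per image. The voxel and component counts below are illustrative, not from the paper.

```python
def parameter_counts(n_images, n_voxels, n_components):
    """Unknowns to estimate with and without marginalizing the latent variables.

    Marginalized (as in Tagare et al.): the components plus one noise variance.
    Non-marginalized: additionally one latent coordinate vector per image.
    """
    marginalized = n_voxels * n_components + 1
    non_marginalized = marginalized + n_images * n_components
    return marginalized, non_marginalized

# Hypothetical numbers: a 128^3 volume, 4 components.
m_small, _ = parameter_counts(1_000, 128 ** 3, 4)
m_large, nm_large = parameter_counts(100_000, 128 ** 3, 4)
```

The marginalized count is identical for 1 000 and 100 000 particles, while the non-marginalized count grows by k parameters per added particle.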
Statistically speaking, estimation without marginalization is known to be problematic when there are few particles, as the estimates can become unreliable. Since the method developed by Tagare and coworkers does not suffer from this problem, we chose it.