PMC:7252096 / 102220-126093

LitCovid-PD-MONDO

{"project":"LitCovid-PD-MONDO","denotations":[{"id":"T372","span":{"begin":3699,"end":3703},"obj":"Disease"},{"id":"T373","span":{"begin":6474,"end":6482},"obj":"Disease"},{"id":"T374","span":{"begin":6483,"end":6495},"obj":"Disease"},{"id":"T375","span":{"begin":6931,"end":6943},"obj":"Disease"},{"id":"T376","span":{"begin":12790,"end":12809},"obj":"Disease"},{"id":"T377","span":{"begin":12958,"end":12967},"obj":"Disease"},{"id":"T378","span":{"begin":14030,"end":14039},"obj":"Disease"},{"id":"T379","span":{"begin":15220,"end":15229},"obj":"Disease"},{"id":"T380","span":{"begin":15336,"end":15345},"obj":"Disease"}],"attributes":[{"id":"A372","pred":"mondo_id","subj":"T372","obj":"http://purl.obolibrary.org/obo/MONDO_0014280"},{"id":"A373","pred":"mondo_id","subj":"T373","obj":"http://purl.obolibrary.org/obo/MONDO_0004980"},{"id":"A374","pred":"mondo_id","subj":"T374","obj":"http://purl.obolibrary.org/obo/MONDO_0021166"},{"id":"A375","pred":"mondo_id","subj":"T375","obj":"http://purl.obolibrary.org/obo/MONDO_0018076"},{"id":"A376","pred":"mondo_id","subj":"T376","obj":"http://purl.obolibrary.org/obo/MONDO_0005812"},{"id":"A377","pred":"mondo_id","subj":"T377","obj":"http://purl.obolibrary.org/obo/MONDO_0005812"},{"id":"A378","pred":"mondo_id","subj":"T378","obj":"http://purl.obolibrary.org/obo/MONDO_0005812"},{"id":"A379","pred":"mondo_id","subj":"T379","obj":"http://purl.obolibrary.org/obo/MONDO_0005812"},{"id":"A380","pred":"mondo_id","subj":"T380","obj":"http://purl.obolibrary.org/obo/MONDO_0005812"}],"text":"Quantification and Statistical Analysis\n\nNon-Human Primate Lung and Ileum\nLibraries corresponding to 7 animals (variable number of tissues per animal) were sequenced using Illumina NextSeq. Reads were aligned to the M. mulatta genome assembly 8.0.1 annotation version 102 and processed according to the Drop-Seq Computational Protocol v2.0 (https://github.com/broadinstitute/Drop-seq). 
Data was normalized and scaled using the Seurat R package v2.3.4 (https://satijalab.org/seurat/): transforming the data to loge(UMI+1) and applying a scale factor of 10,000. To identify major axes of variation within our data, we first examined only highly variable genes across all cells, yielding approximately 1,000-3,000 variable genes with average expression \u003e 0.1 log-normalized UMI across all cells. An approximate principal component analysis was applied to the cells to generate 100 principal components (PCs). Using the JackStraw function within Seurat, we identified significant PCs to be used for subsequent clustering and further dimensionality reduction. For 2D visualization and cell type clustering, we used a Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction technique (https://github.com/lmcinnes/umap) with “min_dist” set to 0.5 and “n_neighbors” set to 30. To identify clusters of transcriptionally similar cells, we employed unsupervised clustering as described above using the FindClusters tool within the Seurat R package with default parameters and k.param set to 10 and resolution set to 0.5. Each cluster was sub-clustered to identify more granular cell types, requiring each cell type to express \u003e 25 significantly upregulated genes by differential expression test (FindMarkers implemented in Seurat, setting “test.use” to “bimod,” Bonferroni-adjusted p value cutoff \u003c 0.001). Differential expression tests between cells from ACE2 + versus ACE2 - Type II Pneumocytes were conducted using the SCDE R package with default parameters (Kharchenko et al., 2014). Expression data for epithelial cells and enterocytes included in this dataset can be visualized and downloaded here: https://singlecell.broadinstitute.org/single_cell/study/SCP807?scpbr=the-alexandria-project#study-summary.\n\nHuman Lung Tissue\nLibraries corresponding to 8 donors were sequenced using Illumina NextSeq. 
Reads were aligned to the hg19 genome assembly and processed according to the Drop-Seq Computational Protocol v2.0 (https://github.com/broadinstitute/Drop-seq). Data was normalized and scaled using the Seurat R package v3.1.0 (https://satijalab.org/seurat/), transforming the data to loge(UMI+1) and applying a scale factor of 10,000. For each array, we assessed the quality of constructed libraries by examining the distribution of reads, genes and transcripts per cell. Variable gene selection, principal components analysis, and selection of significant principal components was performed as above. We visualized our results in a two-dimensional space using UMAP (https://github.com/lmcinnes/umap), and annotated each cluster based on the identification of highly expressed genes. To further characterize substructure within cell types (for example, T cells), we performed dimensionality reduction (PCA) and clustering over those cells alone. Sub-clusters (i.e., clusters within broad cell type classifications) were annotated by cross-referencing cluster-defining genes with curated gene lists and online databases SaVanT (http://newpathways.mcdb.ucla.edu/savant-dev/) and GSEA/MsigDB (https://www.gsea-msigdb.org/gsea/msigdb/index.jsp). Proliferating cells from the human lung (Figure 2C) express high levels of mitotic markers, such as MKI67, and represent primarily T cells (CD3D, CD3E), B cells/antibody-secreting cells (IGJ, MZB1, IGHG1), and myeloid cells (CD14, APOE) and represent a composite cell cluster. Differential expression analysis between ACE2+ TMPRSS2+ and negative type II pneumocytes was performed in Seurat using a likelihood-ratio test (FindMarkers implemented in Seurat, setting “test.use” to bimod). 
Expression data for epithelial cells included in this dataset can be visualized and downloaded here: https://singlecell.broadinstitute.org/single_cell/study/SCP814?scpbr=the-alexandria-project#study-summary.\n\nHuman Ileum\nLibraries corresponding to 13 donors were sequenced using Illumina NovaSeq S2 with a Read 1 26bp, Read 2 91bp, Index 1 8bp configuration before reads were aligned to GRCh38. Each sample was filtered individually for low quality cells and genes by analyzing distributions of reads, transcripts, percent reads mapped to mitochondrial genes, and complexity per cell, then merged as an outer join to create a single dataset. Clustering and differential expression tests were processed using Seurat v3.1.0 (https://satijalab.org/seurat/). Normalization and variable gene selection was processed with SCTransform (https://github.com/ChristophH/sctransform). Clustering for major cell types was performed using Louvain clustering on dimensionally reduced PCA space with resolution set via grid search optimizing for maximum average silhouette score. Due to the scale of the dataset, a randomized subsampling from across the dataset was used to calculate the silhouette score. We annotated clusters based on highly expressed genes, then sub-clusters were characterized by performing PCA dimensionality reduction and clustering over those cells alone, and annotated based on highly expressed genes found via one-versus-rest differential expression test (Wilcoxon) within the major cell type. Differential expression analysis between ACE2 + TMPRSS2 + and negative epithelial cells was performed in Seurat using a Wilcoxon test and Bonferroni p value correction. 
Expression data for epithelial cells included in this dataset can be visualized and downloaded here: https://singlecell.broadinstitute.org/single_cell/study/SCP812?scpbr=the-alexandria-project#study-summary.\n\nHuman Adult Nasal Mucosa\nSample processing, sequencing, and analysis was performed as in (Ordovas-Montanes et al., 2018). Briefly, scRNA-seq cell suspensions were freshly processed using Seq-Well v1 and Seurat v2.3.4 was utilized for computational analyses presented here (Butler et al., 2018, Satija et al., 2015). Cell by gene matrix and R code for initialization of object available to download as Supplemental Data and Supplementary Tables here https://www.nature.com/articles/s41586-018-0449-8 and here:\nhttp://shaleklab.com/resource/mapping-allergic-inflammation/? and visualized here: https://singlecell.broadinstitute.org/single_cell/study/SCP253?scpbr=the-alexandria-project#study-summary. Scores for various cytokines acting on human airway epithelial cells were calculated based on gene lists derived for (Ordovas-Montanes et al., 2018), calculated using AddModuleScore function Seurat, and effect size calculated by Cohen’s d, as previously reported.\n\nGranulomatous Tissue from Mycobacterium Tuberculosis Infected NHPs\nLibraries corresponding to 10 animals (variable number of tissues/animal) were sequenced using Illumina NovaSeq S2. Data was aligned using the Dropseq-tools pipeline on Terra (app.terra.bio) to M. fascicularis reference genome assembly 5, annotation version 101. Clustering was performed using Leiden clustering in the Scanpy (scanpy.readthedocs.io) package (Wolf et al., 2018). Cell type labels were assigned using known marker genes. In this analysis, we include all epithelial cell subsets (secretory, multiciliated, type II pneumocytes, and type I pneumocytes) from all samples. 
Differential expression between ACE2 + TMPRSS2 + cells and other cells of the matched cell subtype (e.g., Secretory Cells) were performed using the “bimod” likelihood-ratio test within each cell subtype and filtered on Benjamini-Hochberg-corrected p value \u003c 0.05. Expression data for epithelial cells included in this dataset can be visualized and downloaded here:\nhttps://singlecell.broadinstitute.org/single_cell/study/SCP806?scpbr=the-alexandria-project#study-summary.\n\nBasal Cell Cytokine Stimulation\nLibraries corresponding to 279 populations were sequenced using Illumina NextSeq. Reads were aligned to the hg19 or mm10 genome assembly using the cumulus platform https://cumulus-doc.readthedocs.io/en/0.12.0/smart_seq_2.html and output as TPM using RSEM v1.3.2. Populations were transformed to transcripts per 10K reads and log2(1+TP10K) transformed. ACE2 expression by stimulation condition and dose were assessed using one-way ANOVA with post hoc testing using a Bonferroni correction. Plots were generated using ggplot2, and transcriptome-wide differential expression was calculated using the Seurat R package v3.1.0 (https://satijalab.org/seurat), function FindMarkers with test.use = ”bimod.” Expression data can be visualized and downloaded here:\nhttps://singlecell.broadinstitute.org/single_cell/study/SCP822?scpbr=the-alexandria-project.\n\nInterferon Treatment of Mouse Nasal Mucosa\nLibraries corresponding to 4 mice, with 2 Seq-Well arrays per mouse were sequenced using Illumina NextSeq as described (Gierahn et al., 2017, Hughes et al., 2019). Reads were aligned to the mm10 genome and processed according to the Drop-Seq Computational Protocol v2.0 (https://github.com/broadinstitute/Drop-seq). Data was normalized and scaled using the Seurat R package v2.3.4 (https://satijalab.org/seurat/): transforming the data to loge(UMI+1) and applying a scale factor of 10,000. Cells with fewer than 1000 UMIs and 500 unique genes were removed. 
To identify major axes of variation within our data, we first examined only highly variable genes across all cells, yielding approximately 5,000 variable genes. An approximate principal component analysis was applied to the cells to generate 200 principal components (PCs). Using a combination of the Jackstraw function in Seurat and observing the “elbow” of the standard deviations of PCs, we chose the top 70 PCs for subsequent clustering and visualization. For 2D visualization, we used a Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction technique (https://github.com/lmcinnes/umap) with “min_dist” set to 0.3 and “n_neighbors” set to 50. To identify clusters of transcriptionally similar cells, we employed unsupervised clustering as described above using the FindClusters tool within the Seurat R package with default parameters and k.param set to 10. Resolution was chosen based on maximization of the average silhouette width across all cells. Clusters were merged if a cell type expressed fewer than 25 significantly upregulated genes by differential expression test (FindAllMarkers implemented in Seurat, setting “test.use” to “bimod,” Bonferroni-adjusted p value cutoff \u003c 0.001). Differential expression tests between cells from saline-treated or IFNa-treated mice were assessed using the FindMarkers function with “test.use” set to “bimod. This dataset can be visualized and downloaded here:\nhttps://singlecell.broadinstitute.org/single_cell/study/SCP832?scpbr=the-alexandria-project#study-summary.\n\nLung from MHV68-Infected WT and IFNγR KO Mice\nLibraries corresponding to 14 mice were aligned to a custom reference genome encompassing both murine (mm10) and herpes virus genes: 84 known genes from MHV68 were retrieved from NCBI (NCBI: txid33708) and added to the mm10 mouse genome. Reads were aligned to the custom joint genome and processed according to the Drop-Seq Computational Protocol v2.0 (https://github.com/broadinstitute/Drop-seq). 
Barcodes with \u003c 200 unique genes, \u003e 20,000 UMI counts, and \u003e 30% of transcript counts derived from mitochondrially encoded genes were discarded. Data analysis was performed using the Scanpy Package following the common procedure, the expression matrices were normalized using scran’s size factor based approach and log transformed via scanpy’s pp.log1p() function (Lun et al., 2016, Wolf et al., 2018). SoupX was utilized to reduce ambient RNA bias, using default parameters with pCut set to 0.3, and was applied to each sample before merging the count matrices (Young and Behjati, 2020). UMI per cell and cell cycle were regressed out. Highly variable genes were selected by running pp.highly_variable_genes() for each sample separately, returning the top 4,000 variable genes per sample, and genes identified in variable in \u003e 5 samples were retained, yielding 14,305 genes. Next, only Epcam+ cells were considered, principal components (PCs) were calculated using only the selected variable genes, and 6 PCs were used to perform unsupervised Louvain clustering. Type I Pneumocytes were excluded from this analysis based on uniformly negative expression of Ace2, resulting in a final dataset subset of 5,558 cells. Cells were identified as infected if at least one viral read was detected.\n\nNasal Washes during Influenza Infection\nSample processing, sequencing, and analysis was performed as in (Cao et al., 2020). Reads were aligned to the GRCh37 reference genome combined with influenza genomes. Mapped reads from each sample were then corrected for Drop-seq barcode synthesis error using the Drop-seq core computational tools developed by the McCarroll Lab (Macosko et al., 2015). Genes were quantified using End Sequence Analysis Toolkit (ESAT, github/garber-lab/ESAT) with parameters -wlen 100 -wOlap 50 -wExt 0 -scPrep (Derr et al., 2016). 
Finally, UMIs that likely result from sequencing errors were corrected by merging any UMIs that were observed only once and have 1 hamming distance from a UMI detected by two or more aligned reads. Only cell barcodes with more than 1,000 UMIs were analyzed. Cell barcodes with mostly erythrocyte genes (HBA, HBB) were removed. From here on, the remaining cell barcodes in the matrix would be referred to as cells. The final gene by cell matrix was normalized using the scran package v3.10 (Lun et al., 2016). The normalized matrix was used for dimensionality reduction by first selecting variable genes that had a high coefficient of variance (CV) and were expressed (\u003e = 1 UMI) by more than three cells. Influenza viral genes, interferon stimulated genes, and cell cycle related genes were removed from the variable gene list in order to minimize the impact of viral responses and mitosis on clustering and cell type identification. This resulted in the selection of 2484 variable genes. t-distributed stochastic neighbor embedding (tSNE) was applied to the first ten principal components (PCs), which explained 95% of the total data variance. Density clustering (Rodriguez and Laio, 2014) was performed on the resulting tSNE coordinates and identified four major clusters: epithelial cells, neutrophils, macrophages and leukocytes. The epithelial cell cluster and the leukocyte cluster were then re-clustered independently, as described above, to identify populations within each metacluster. Specifically, the epithelial cell cluster was re-embedded using 2629 variable genes selected by the same criteria mentioned in the previous section and 13 PCs that explained 95% of the variance. Density clustering over the epithelial cell subset revealed ten clusters. Differential gene expression analysis using edgeR (Robinson et al., 2010) was performed to identify marker genes for each cluster. 
Influenza-infected and bystander cells were identified after correcting for sample-specific distribution of ambient influenza mRNA contamination and predicted cells most likely to be infected identified using a hurdle zero inflated negative binomial (ZINB) model and a support vector machine (SVM) classifier.\n\nPower Calculations for Detection of Rare Transcripts\nWe conducted the following statistical analysis to estimate the effects of various factors on our ability to make confident claims regarding the presence/absence of transcripts of interest (e.g., ACE2), both within individual cells and clusters (Figure S6). Specifically, we investigated the roles of capture/reverse transcription efficiency, ACE2 expression level, sequencing depth, and cell numbers. Taken together, the results of this power analysis are in agreement with other efforts to model biological and technical sources of zero-inflation within scRNA-seq data (e.g., https://satijalab.org/howmanycells and Kharchenko et al., 2014, Svensson, 2020).\nWe began by quantifying how likely we are to capture and transcribe at least one ACE2 mRNA molecule, as a function of the number ACE2 mRNA molecules per cell and a protocol’s efficiency (Figure S6A). Drop-Seq has a capture/transcription efficiency of ∼10% (as estimated using ERCC spike ins; see (Macosko et al., 2015), and the experimental platforms used in this study are either equivalent (e.g., Seq-Well v1, (Gierahn et al., 2017) or superior (e.g., 10-fold better unique molecule detection, 5-fold better gene detection using Seq-Well S3;(Hughes et al., 2019)). Most relevant to this context, inferior turbinate scrapings were processed using both Seq-Well v1 and Seq-Well S3 (Figure S3B). 
Importantly, Seq-Well S3 provided \u003e two-fold increase in the detection frequency of rare ACE2 transcripts (i.e., ACE2+: 4.7% for v1 versus 9.8% for S3), making it reasonable to expect that such improvements in single-cell experimental technologies have yielded corresponding improvements in capture and transcription efficiency. Based on Drop-Seq’s 10% efficiency, even if ACE2 is expressed at the low level of 5 mRNA molecules per cell (a reasonable order-of-magnitude estimate, given that non-human primate ileum cells had a maximum of 10 ACE2 unique molecules per cell observed via sequencing and an average of 1.93 molecules per cell in expressing cells, see Figures 3B and 3C), our experimental platforms have a minimum likelihood of 41% to capture and reverse transcribe at least one ACE2 mRNA molecule in any given individual cell. This likelihood rapidly increases if we estimate higher efficiencies for improved scRNA-seq technologies (e.g., 67% likelihood within any individual cell at 20% capture/transcription efficiency, 76% likelihood at 25% efficiency, Figure S6A). Thus, while transcript drop-out may reduce the fraction of positive cells, with the capture and transcription efficiencies of improved single-cell technologies, the impact is likely to be minor (reads are likely underestimated by up to a factor of ∼2.5x), given a sufficient depth of sequencing (see below). We note that this impacts both clusters deemed to contain and not contain ACE2+ cells, and suggests our percentages are likely lower bounds for true expression (within a factor of ∼2.5x).\nNext, we examined the probability of sequencing an ACE2 transcript as a function of read depth and ACE2’s fractional abundance in each single cell within our sequencing libraries. 
First, across two different tissues (non-human primate ileum and lung, representing a high expresser of ACE2 and low expresser, respectively), we calculated the proportion of unique ACE2 molecules in our ACE2+ cells (defined as any cell with at least 1 UMI aligning to ACE2) as a fraction of total reads within individual cells to provide an order-of-magnitude estimate for average ACE2 abundance in our single-cell sequencing libraries (i.e., the probability that a read within a cell corresponds to a unique molecule of ACE2, Figure S6B). We highlight that by calculating probabilities based on ACE2 unique molecules divided by an individual cell’s total reads, we are providing a conservative estimate for the probability of observing ACE2 as a function of sequencing depth (e.g., as compared to basing these probabilities on ACE2 non-UMI-collapsed reads divided by total reads). Next, we obtained information on the number of reads in these cell populations to provide estimates of average sequencing depths (Figure S6C). Using the mean fractional abundances of ACE2 from each tissue (Figure S6B) and the mean read depths for all genes (Figure S6C), we calculated the probability of detecting at least 1 ACE2 molecule (i.e., P(detecting \u003e 0 ACE2 molecules) = 1 - (1 - ACE2 fractional abundance)Read depth). This results in a 93.7% probability in ileum-derived cell libraries that contain ACE2, and a 76.0% probability for lung-derived cell libraries, indicating that our sequencing depths are sufficient to detect ACE2+ cells (Figure S6D).\nTo further evaluate whether our ability to detect ACE2+ cells was an artifact of sequencing depth, we compared the number of ACE2+ cells in a cluster to the mean number of reads across all cells in that same cluster (Figure S6E). 
We did not observe any significant correlation: the ileum cell cluster with the highest number of ACE2+ cells had the lowest sequencing depth of all ileum clusters, and the lung cell cluster with the highest number of ACE2+ cells was approximately average in its read depth (on a log-log scale, Pearson’s r = −0.31, non-significant). Further, when comparing ACE2+ cells to ACE2- cells within a given tissue, we did not observe a positive correlation between read depth and ACE2 status (i.e., mean ± standard error of the mean, SEM, reads among all lung cells = 28,512 ± 344; mean ± SEM reads among ACE2+ lung cells = 28,553 ± 2,988; mean ± SEM reads among all ileum cells = 14,864 ± 288; mean ± SEM reads among ACE2+ ileum cells = 10,591 ± 441, full statistics on cell depth among ACE2+ cells compared to ACE2- cells of the same cell type can be found in Table S9). Thus, we can be confident that the observed differences in ACE2+ proportions across clusters are not driven by differences in sequencing depth.\nFinally, we investigated how observed differences in ACE2+ proportions across clusters might be affected by cell sampling. Using the proportion of ACE2+ cells in a “typical” cluster annotated as being ACE2 positive (i.e., 6.8% in non-human primate type II pneumocytes, Figure 1), we calculated the cluster sizes needed to be confident that the probability of observing zero to a few positive cells is unlikely to have arisen by random chance (probabilities calculated under a negative binomial distribution with parameter p = 0.068, Figure S6E). We found that as cluster sizes approach and exceed 100 cells, the probability of observing zero to a few positive cells rapidly approaches zero, if we assume 6.8% of cells are positive. 
Further, to examine our confidence in estimating an approximate upper bound (ignoring the impact of protocol inefficiencies discussed above) for the fraction of cells positive in a cluster as a function of the number of cells in that cluster, we also calculated the probability of observing zero (and its complement, probability of observing at least 1) ACE2+ cells as a function of cluster size across true positive proportions ranging from 0.1% to 10% (probabilities calculated under a negative binomial distribution with parameter p = 0.001 to 0.1, representing hypothetical proportions of ACE2+ cells Figure S6F). Given our typical cluster sizes (on the order of hundreds of cells, exact values provided in Table S9), we find that for us to observe 0 ACE2+ cells in a cluster due to sampling artifacts, the fraction of true positives must be ∼1% or less. Thus, these complementary approaches demonstrate that our observed variations in ACE2+ cell proportions across clusters likely reflect underlying biological differences, rather than random chance.\n\nStatistical Testing\nParameters such as sample size, number of replicates, number of independent experiments, measures of center, dispersion, and precision (mean ± SEM) and statistical significances are reported in Figures and Figure Legends. A p value less than 0.05 was considered significant. Where appropriate, a Bonferroni or FDR correction was used to account for multiple tests, alternative correction methods are noted in the figure legends or Methods. All statistical tests corresponding to differential gene expression are described above and completed using R language for Statistical Computing."}
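
The power calculations quoted in the text above reduce to a few closed-form probabilities, so a short worked example may help readers check the reported figures. The sketch below is illustrative only and is not the authors' code: the function names are hypothetical, and the sampling step uses a plain binomial model as a simplifying assumption, whereas the text reports probabilities under a negative binomial distribution. The first helper covers both the capture step (per-molecule efficiency, molecules per cell) and the read-depth formula P(detecting > 0 ACE2 molecules) = 1 - (1 - ACE2 fractional abundance)^(read depth). The fractional abundances and read depths themselves are not given in the text, so no values are assumed for them here.

```python
import math


def p_at_least_one(per_trial_probability: float, trials: float) -> float:
    """P(at least one success) = 1 - (1 - p)^n for n independent trials.

    Capture step: p = capture/transcription efficiency, n = mRNA molecules per cell.
    Read-depth step: p = fractional abundance of the transcript, n = read depth.
    """
    return 1.0 - (1.0 - per_trial_probability) ** trials


def p_at_most_k_positive(cluster_size: int, positive_fraction: float, k: int) -> float:
    """P(X <= k) positive cells in a cluster, assuming independent cells.

    Assumption: a binomial model is used here for illustration; the source
    text states that a negative binomial distribution was used.
    """
    return sum(
        math.comb(cluster_size, i)
        * positive_fraction ** i
        * (1.0 - positive_fraction) ** (cluster_size - i)
        for i in range(k + 1)
    )


# Capture and reverse transcription of at least one of 5 ACE2 mRNA molecules
# per cell, at the three efficiencies discussed in the text.
for efficiency in (0.10, 0.20, 0.25):
    print(f"efficiency {efficiency:.0%}: {p_at_least_one(efficiency, 5):.0%}")

# Probability of observing zero ACE2+ cells by chance if 6.8% of cells are
# truly positive, for a few hypothetical cluster sizes.
for cluster_size in (10, 50, 100, 500):
    print(cluster_size, p_at_most_k_positive(cluster_size, 0.068, 0))
```

With 5 molecules per cell, the first loop prints 41%, 67% and 76%, matching the likelihoods quoted for 10%, 20% and 25% efficiency. Under the binomial assumption, the probability of seeing zero positive cells in a 100-cell cluster at a true proportion of 6.8% is below 0.001, consistent with the statement that this probability rapidly approaches zero as clusters approach and exceed 100 cells.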

2_test

{"project":"2_test","denotations":[{"id":"32413319-24836921-20790576","span":{"begin":1992,"end":1996},"obj":"24836921"},{"id":"32413319-30135581-20790577","span":{"begin":6042,"end":6046},"obj":"30135581"},{"id":"32413319-29608179-20790578","span":{"begin":6215,"end":6219},"obj":"29608179"},{"id":"32413319-25867923-20790579","span":{"begin":6236,"end":6240},"obj":"25867923"},{"id":"32413319-30135581-20790580","span":{"begin":6769,"end":6773},"obj":"30135581"},{"id":"32413319-29409532-20790581","span":{"begin":7330,"end":7334},"obj":"29409532"},{"id":"32413319-28192419-20790582","span":{"begin":9073,"end":9077},"obj":"28192419"},{"id":"32413319-27909575-20790583","span":{"begin":11855,"end":11859},"obj":"27909575"},{"id":"32413319-29409532-20790584","span":{"begin":11874,"end":11878},"obj":"29409532"},{"id":"32413319-26000488-20790585","span":{"begin":13156,"end":13160},"obj":"26000488"},{"id":"32413319-27470110-20790586","span":{"begin":13318,"end":13322},"obj":"27470110"},{"id":"32413319-27909575-20790587","span":{"begin":13827,"end":13831},"obj":"27909575"},{"id":"32413319-24970081-20790588","span":{"begin":14510,"end":14514},"obj":"24970081"},{"id":"32413319-19910308-20790589","span":{"begin":15157,"end":15161},"obj":"19910308"},{"id":"32413319-24836921-20790590","span":{"begin":16220,"end":16224},"obj":"24836921"},{"id":"32413319-31937974-20790591","span":{"begin":16236,"end":16240},"obj":"31937974"},{"id":"32413319-26000488-20790592","span":{"begin":16556,"end":16560},"obj":"26000488"},{"id":"32413319-28192419-20790593","span":{"begin":16672,"end":16676},"obj":"28192419"}],"text":"Quantification and Statistical Analysis\n\nNon-Human Primate Lung and Ileum\nLibraries corresponding to 7 animals (variable number of tissues per animal) were sequenced using Illumina NextSeq. Reads were aligned to the M. 
mulatta genome assembly 8.0.1 annotation version 102 and processed according to the Drop-Seq Computational Protocol v2.0 (https://github.com/broadinstitute/Drop-seq). Data was normalized and scaled using the Seurat R package v2.3.4 (https://satijalab.org/seurat/): transforming the data to loge(UMI+1) and applying a scale factor of 10,000. To identify major axes of variation within our data, we first examined only highly variable genes across all cells, yielding approximately 1,000-3,000 variable genes with average expression \u003e 0.1 log-normalized UMI across all cells. An approximate principal component analysis was applied to the cells to generate 100 principal components (PCs). Using the JackStraw function within Seurat, we identified significant PCs to be used for subsequent clustering and further dimensionality reduction. For 2D visualization and cell type clustering, we used a Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction technique (https://github.com/lmcinnes/umap) with “min_dist” set to 0.5 and “n_neighbors” set to 30. To identify clusters of transcriptionally similar cells, we employed unsupervised clustering as described above using the FindClusters tool within the Seurat R package with default parameters and k.param set to 10 and resolution set to 0.5. Each cluster was sub-clustered to identify more granular cell types, requiring each cell type to express \u003e 25 significantly upregulated genes by differential expression test (FindMarkers implemented in Seurat, setting “test.use” to “bimod,” Bonferroni-adjusted p value cutoff \u003c 0.001). Differential expression tests between cells from ACE2 + versus ACE2 - Type II Pneumocytes were conducted using the SCDE R package with default parameters (Kharchenko et al., 2014). 
Expression data for epithelial cells and enterocytes included in this dataset can be visualized and downloaded here: https://singlecell.broadinstitute.org/single_cell/study/SCP807?scpbr=the-alexandria-project#study-summary.\n\nHuman Lung Tissue\nLibraries corresponding to 8 donors were sequenced using Illumina NextSeq. Reads were aligned to the hg19 genome assembly and processed according to the Drop-Seq Computational Protocol v2.0 (https://github.com/broadinstitute/Drop-seq). Data was normalized and scaled using the Seurat R package v3.1.0 (https://satijalab.org/seurat/), transforming the data to loge(UMI+1) and applying a scale factor of 10,000. For each array, we assessed the quality of constructed libraries by examining the distribution of reads, genes and transcripts per cell. Variable gene selection, principal components analysis, and selection of significant principal components was performed as above. We visualized our results in a two-dimensional space using UMAP (https://github.com/lmcinnes/umap), and annotated each cluster based on the identification of highly expressed genes. To further characterize substructure within cell types (for example, T cells), we performed dimensionality reduction (PCA) and clustering over those cells alone. Sub-clusters (i.e., clusters within broad cell type classifications) were annotated by cross-referencing cluster-defining genes with curated gene lists and online databases SaVanT (http://newpathways.mcdb.ucla.edu/savant-dev/) and GSEA/MsigDB (https://www.gsea-msigdb.org/gsea/msigdb/index.jsp). Proliferating cells from the human lung (Figure 2C) express high levels of mitotic markers, such as MKI67, and represent primarily T cells (CD3D, CD3E), B cells/antibody-secreting cells (IGJ, MZB1, IGHG1), and myeloid cells (CD14, APOE) and represent a composite cell cluster. 
Differential expression analysis between ACE2+ TMPRSS2+ and negative type II pneumocytes was performed in Seurat using a likelihood-ratio test (FindMarkers implemented in Seurat, setting “test.use” to bimod). Expression data for epithelial cells included in this dataset can be visualized and downloaded here: https://singlecell.broadinstitute.org/single_cell/study/SCP814?scpbr=the-alexandria-project#study-summary.\n\nHuman Ileum\nLibraries corresponding to 13 donors were sequenced using Illumina NovaSeq S2 with a Read 1 26bp, Read 2 91bp, Index 1 8bp configuration before reads were aligned to GRCh38. Each sample was filtered individually for low quality cells and genes by analyzing distributions of reads, transcripts, percent reads mapped to mitochondrial genes, and complexity per cell, then merged as an outer join to create a single dataset. Clustering and differential expression tests were processed using Seurat v3.1.0 (https://satijalab.org/seurat/). Normalization and variable gene selection was processed with SCTransform (https://github.com/ChristophH/sctransform). Clustering for major cell types was performed using Louvain clustering on dimensionally reduced PCA space with resolution set via grid search optimizing for maximum average silhouette score. Due to the scale of the dataset, a randomized subsampling from across the dataset was used to calculate the silhouette score. We annotated clusters based on highly expressed genes, then sub-clusters were characterized by performing PCA dimensionality reduction and clustering over those cells alone, and annotated based on highly expressed genes found via one-versus-rest differential expression test (Wilcoxon) within the major cell type. Differential expression analysis between ACE2 + TMPRSS2 + and negative epithelial cells was performed in Seurat using a Wilcoxon test and Bonferroni p value correction. 
Expression data for epithelial cells included in this dataset can be visualized and downloaded here: https://singlecell.broadinstitute.org/single_cell/study/SCP812?scpbr=the-alexandria-project#study-summary.\n\nHuman Adult Nasal Mucosa\nSample processing, sequencing, and analysis was performed as in (Ordovas-Montanes et al., 2018). Briefly, scRNA-seq cell suspensions were freshly processed using Seq-Well v1 and Seurat v2.3.4 was utilized for computational analyses presented here (Butler et al., 2018, Satija et al., 2015). Cell by gene matrix and R code for initialization of object available to download as Supplemental Data and Supplementary Tables here https://www.nature.com/articles/s41586-018-0449-8 and here:\nhttp://shaleklab.com/resource/mapping-allergic-inflammation/? and visualized here: https://singlecell.broadinstitute.org/single_cell/study/SCP253?scpbr=the-alexandria-project#study-summary. Scores for various cytokines acting on human airway epithelial cells were calculated based on gene lists derived for (Ordovas-Montanes et al., 2018), calculated using AddModuleScore function Seurat, and effect size calculated by Cohen’s d, as previously reported.\n\nGranulomatous Tissue from Mycobacterium Tuberculosis Infected NHPs\nLibraries corresponding to 10 animals (variable number of tissues/animal) were sequenced using Illumina NovaSeq S2. Data was aligned using the Dropseq-tools pipeline on Terra (app.terra.bio) to M. fascicularis reference genome assembly 5, annotation version 101. Clustering was performed using Leiden clustering in the Scanpy (scanpy.readthedocs.io) package (Wolf et al., 2018). Cell type labels were assigned using known marker genes. In this analysis, we include all epithelial cell subsets (secretory, multiciliated, type II pneumocytes, and type I pneumocytes) from all samples. 
Differential expression between ACE2 + TMPRSS2 + cells and other cells of the matched cell subtype (e.g., Secretory Cells) were performed using the “bimod” likelihood-ratio test within each cell subtype and filtered on Benjamini-Hochberg-corrected p value \u003c 0.05. Expression data for epithelial cells included in this dataset can be visualized and downloaded here:\nhttps://singlecell.broadinstitute.org/single_cell/study/SCP806?scpbr=the-alexandria-project#study-summary.\n\nBasal Cell Cytokine Stimulation\nLibraries corresponding to 279 populations were sequenced using Illumina NextSeq. Reads were aligned to the hg19 or mm10 genome assembly using the cumulus platform https://cumulus-doc.readthedocs.io/en/0.12.0/smart_seq_2.html and output as TPM using RSEM v1.3.2. Populations were transformed to transcripts per 10K reads and log2(1+TP10K) transformed. ACE2 expression by stimulation condition and dose were assessed using one-way ANOVA with post hoc testing using a Bonferroni correction. Plots were generated using ggplot2, and transcriptome-wide differential expression was calculated using the Seurat R package v3.1.0 (https://satijalab.org/seurat), function FindMarkers with test.use = ”bimod.” Expression data can be visualized and downloaded here:\nhttps://singlecell.broadinstitute.org/single_cell/study/SCP822?scpbr=the-alexandria-project.\n\nInterferon Treatment of Mouse Nasal Mucosa\nLibraries corresponding to 4 mice, with 2 Seq-Well arrays per mouse were sequenced using Illumina NextSeq as described (Gierahn et al., 2017, Hughes et al., 2019). Reads were aligned to the mm10 genome and processed according to the Drop-Seq Computational Protocol v2.0 (https://github.com/broadinstitute/Drop-seq). Data was normalized and scaled using the Seurat R package v2.3.4 (https://satijalab.org/seurat/): transforming the data to loge(UMI+1) and applying a scale factor of 10,000. Cells with fewer than 1000 UMIs and 500 unique genes were removed. 
To identify major axes of variation within our data, we first examined only highly variable genes across all cells, yielding approximately 5,000 variable genes. An approximate principal component analysis was applied to the cells to generate 200 principal components (PCs). Using a combination of the Jackstraw function in Seurat and observing the “elbow” of the standard deviations of PCs, we chose the top 70 PCs for subsequent clustering and visualization. For 2D visualization, we used a Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction technique (https://github.com/lmcinnes/umap) with “min_dist” set to 0.3 and “n_neighbors” set to 50. To identify clusters of transcriptionally similar cells, we employed unsupervised clustering as described above using the FindClusters tool within the Seurat R package with default parameters and k.param set to 10. Resolution was chosen based on maximization of the average silhouette width across all cells. Clusters were merged if a cell type expressed fewer than 25 significantly upregulated genes by differential expression test (FindAllMarkers implemented in Seurat, setting “test.use” to “bimod,” Bonferroni-adjusted p value cutoff \u003c 0.001). Differential expression tests between cells from saline-treated or IFNa-treated mice were assessed using the FindMarkers function with “test.use” set to “bimod. This dataset can be visualized and downloaded here:\nhttps://singlecell.broadinstitute.org/single_cell/study/SCP832?scpbr=the-alexandria-project#study-summary.\n\nLung from MHV68-Infected WT and IFNγR KO Mice\nLibraries corresponding to 14 mice were aligned to a custom reference genome encompassing both murine (mm10) and herpes virus genes: 84 known genes from MHV68 were retrieved from NCBI (NCBI: txid33708) and added to the mm10 mouse genome. Reads were aligned to the custom joint genome and processed according to the Drop-Seq Computational Protocol v2.0 (https://github.com/broadinstitute/Drop-seq). 
Barcodes with \u003c 200 unique genes, \u003e 20,000 UMI counts, and \u003e 30% of transcript counts derived from mitochondrially encoded genes were discarded. Data analysis was performed using the Scanpy Package following the common procedure, the expression matrices were normalized using scran’s size factor based approach and log transformed via scanpy’s pp.log1p() function (Lun et al., 2016, Wolf et al., 2018). SoupX was utilized to reduce ambient RNA bias, using default parameters with pCut set to 0.3, and was applied to each sample before merging the count matrices (Young and Behjati, 2020). UMI per cell and cell cycle were regressed out. Highly variable genes were selected by running pp.highly_variable_genes() for each sample separately, returning the top 4,000 variable genes per sample, and genes identified in variable in \u003e 5 samples were retained, yielding 14,305 genes. Next, only Epcam+ cells were considered, principal components (PCs) were calculated using only the selected variable genes, and 6 PCs were used to perform unsupervised Louvain clustering. Type I Pneumocytes were excluded from this analysis based on uniformly negative expression of Ace2, resulting in a final dataset subset of 5,558 cells. Cells were identified as infected if at least one viral read was detected.\n\nNasal Washes during Influenza Infection\nSample processing, sequencing, and analysis was performed as in (Cao et al., 2020). Reads were aligned to the GRCh37 reference genome combined with influenza genomes. Mapped reads from each sample were then corrected for Drop-seq barcode synthesis error using the Drop-seq core computational tools developed by the McCarroll Lab (Macosko et al., 2015). Genes were quantified using End Sequence Analysis Toolkit (ESAT, github/garber-lab/ESAT) with parameters -wlen 100 -wOlap 50 -wExt 0 -scPrep (Derr et al., 2016). 
Finally, UMIs that likely result from sequencing errors were corrected by merging any UMIs that were observed only once and have 1 hamming distance from a UMI detected by two or more aligned reads. Only cell barcodes with more than 1,000 UMIs were analyzed. Cell barcodes with mostly erythrocyte genes (HBA, HBB) were removed. From here on, the remaining cell barcodes in the matrix would be referred to as cells. The final gene by cell matrix was normalized using the scran package v3.10 (Lun et al., 2016). The normalized matrix was used for dimensionality reduction by first selecting variable genes that had a high coefficient of variance (CV) and were expressed (\u003e = 1 UMI) by more than three cells. Influenza viral genes, interferon stimulated genes, and cell cycle related genes were removed from the variable gene list in order to minimize the impact of viral responses and mitosis on clustering and cell type identification. This resulted in the selection of 2484 variable genes. t-distributed stochastic neighbor embedding (tSNE) was applied to the first ten principal components (PCs), which explained 95% of the total data variance. Density clustering (Rodriguez and Laio, 2014) was performed on the resulting tSNE coordinates and identified four major clusters: epithelial cells, neutrophils, macrophages and leukocytes. The epithelial cell cluster and the leukocyte cluster were then re-clustered independently, as described above, to identify populations within each metacluster. Specifically, the epithelial cell cluster was re-embedded using 2629 variable genes selected by the same criteria mentioned in the previous section and 13 PCs that explained 95% of the variance. Density clustering over the epithelial cell subset revealed ten clusters. Differential gene expression analysis using edgeR (Robinson et al., 2010) was performed to identify marker genes for each cluster. 
Influenza-infected and bystander cells were identified after correcting for sample-specific distribution of ambient influenza mRNA contamination and predicted cells most likely to be infected identified using a hurdle zero inflated negative binomial (ZINB) model and a support vector machine (SVM) classifier.\n\nPower Calculations for Detection of Rare Transcripts\nWe conducted the following statistical analysis to estimate the effects of various factors on our ability to make confident claims regarding the presence/absence of transcripts of interest (e.g., ACE2), both within individual cells and clusters (Figure S6). Specifically, we investigated the roles of capture/reverse transcription efficiency, ACE2 expression level, sequencing depth, and cell numbers. Taken together, the results of this power analysis are in agreement with other efforts to model biological and technical sources of zero-inflation within scRNA-seq data (e.g., https://satijalab.org/howmanycells and Kharchenko et al., 2014, Svensson, 2020).\nWe began by quantifying how likely we are to capture and transcribe at least one ACE2 mRNA molecule, as a function of the number ACE2 mRNA molecules per cell and a protocol’s efficiency (Figure S6A). Drop-Seq has a capture/transcription efficiency of ∼10% (as estimated using ERCC spike ins; see (Macosko et al., 2015), and the experimental platforms used in this study are either equivalent (e.g., Seq-Well v1, (Gierahn et al., 2017) or superior (e.g., 10-fold better unique molecule detection, 5-fold better gene detection using Seq-Well S3;(Hughes et al., 2019)). Most relevant to this context, inferior turbinate scrapings were processed using both Seq-Well v1 and Seq-Well S3 (Figure S3B). 
Importantly, Seq-Well S3 provided \u003e two-fold increase in the detection frequency of rare ACE2 transcripts (i.e., ACE2+: 4.7% for v1 versus 9.8% for S3), making it reasonable to expect that such improvements in single-cell experimental technologies have yielded corresponding improvements in capture and transcription efficiency. Based on Drop-Seq’s 10% efficiency, even if ACE2 is expressed at the low level of 5 mRNA molecules per cell (a reasonable order-of-magnitude estimate, given that non-human primate ileum cells had a maximum of 10 ACE2 unique molecules per cell observed via sequencing and an average of 1.93 molecules per cell in expressing cells, see Figures 3B and 3C), our experimental platforms have a minimum likelihood of 41% to capture and reverse transcribe at least one ACE2 mRNA molecule in any given individual cell. This likelihood rapidly increases if we estimate higher efficiencies for improved scRNA-seq technologies (e.g., 67% likelihood within any individual cell at 20% capture/transcription efficiency, 76% likelihood at 25% efficiency, Figure S6A). Thus, while transcript drop-out may reduce the fraction of positive cells, with the capture and transcription efficiencies of improved single-cell technologies, the impact is likely to be minor (reads are likely underestimated by up to a factor of ∼2.5x), given a sufficient depth of sequencing (see below). We note that this impacts both clusters deemed to contain and not contain ACE2+ cells, and suggests our percentages are likely lower bounds for true expression (within a factor of ∼2.5x).\nNext, we examined the probability of sequencing an ACE2 transcript as a function of read depth and ACE2’s fractional abundance in each single cell within our sequencing libraries. 
First, across two different tissues (non-human primate ileum and lung, representing a high expresser of ACE2 and low expresser, respectively), we calculated the proportion of unique ACE2 molecules in our ACE2+ cells (defined as any cell with at least 1 UMI aligning to ACE2) as a fraction of total reads within individual cells to provide an order-of-magnitude estimate for average ACE2 abundance in our single-cell sequencing libraries (i.e., the probability that a read within a cell corresponds to a unique molecule of ACE2, Figure S6B). We highlight that by calculating probabilities based on ACE2 unique molecules divided by an individual cell’s total reads, we are providing a conservative estimate for the probability of observing ACE2 as a function of sequencing depth (e.g., as compared to basing these probabilities on ACE2 non-UMI-collapsed reads divided by total reads). Next, we obtained information on the number of reads in these cell populations to provide estimates of average sequencing depths (Figure S6C). Using the mean fractional abundances of ACE2 from each tissue (Figure S6B) and the mean read depths for all genes (Figure S6C), we calculated the probability of detecting at least 1 ACE2 molecule (i.e., P(detecting \u003e 0 ACE2 molecules) = 1 - (1 - ACE2 fractional abundance)^(Read depth)). This results in a 93.7% probability in ileum-derived cell libraries that contain ACE2, and a 76.0% probability for lung-derived cell libraries, indicating that our sequencing depths are sufficient to detect ACE2+ cells (Figure S6D).\nTo further evaluate whether our ability to detect ACE2+ cells was an artifact of sequencing depth, we compared the number of ACE2+ cells in a cluster to the mean number of reads across all cells in that same cluster (Figure S6E). 
We did not observe any significant correlation: the ileum cell cluster with the highest number of ACE2+ cells had the lowest sequencing depth of all ileum clusters, and the lung cell cluster with the highest number of ACE2+ cells was approximately average in its read depth (on a log-log scale, Pearson’s r = −0.31, non-significant). Further, when comparing ACE2+ cells to ACE2- cells within a given tissue, we did not observe a positive correlation between read depth and ACE2 status (i.e., mean ± standard error of the mean, SEM, reads among all lung cells = 28,512 ± 344; mean ± SEM reads among ACE2+ lung cells = 28,553 ± 2,988; mean ± SEM reads among all ileum cells = 14,864 ± 288; mean ± SEM reads among ACE2+ ileum cells = 10,591 ± 441, full statistics on cell depth among ACE2+ cells compared to ACE2- cells of the same cell type can be found in Table S9). Thus, we can be confident that the observed differences in ACE2+ proportions across clusters are not driven by differences in sequencing depth.\nFinally, we investigated how observed differences in ACE2+ proportions across clusters might be affected by cell sampling. Using the proportion of ACE2+ cells in a “typical” cluster annotated as being ACE2 positive (i.e., 6.8% in non-human primate type II pneumocytes, Figure 1), we calculated the cluster sizes needed to be confident that the probability of observing zero to a few positive cells is unlikely to have arisen by random chance (probabilities calculated under a negative binomial distribution with parameter p = 0.068, Figure S6E). We found that as cluster sizes approach and exceed 100 cells, the probability of observing zero to a few positive cells rapidly approaches zero, if we assume 6.8% of cells are positive. 
Further, to examine our confidence in estimating an approximate upper bound (ignoring the impact of protocol inefficiencies discussed above) for the fraction of cells positive in a cluster as a function of the number of cells in that cluster, we also calculated the probability of observing zero (and its complement, probability of observing at least 1) ACE2+ cells as a function of cluster size across true positive proportions ranging from 0.1% to 10% (probabilities calculated under a negative binomial distribution with parameter p = 0.001 to 0.1, representing hypothetical proportions of ACE2+ cells, Figure S6F). Given our typical cluster sizes (on the order of hundreds of cells, exact values provided in Table S9), we find that for us to observe 0 ACE2+ cells in a cluster due to sampling artifacts, the fraction of true positives must be ∼1% or less. Thus, these complementary approaches demonstrate that our observed variations in ACE2+ cell proportions across clusters likely reflect underlying biological differences, rather than random chance.\n\nStatistical Testing\nParameters such as sample size, number of replicates, number of independent experiments, measures of center, dispersion, and precision (mean ± SEM) and statistical significances are reported in Figures and Figure Legends. A p value less than 0.05 was considered significant. Where appropriate, a Bonferroni or FDR correction was used to account for multiple tests; alternative correction methods are noted in the figure legends or Methods. All statistical tests corresponding to differential gene expression are described above and were completed using the R language for Statistical Computing."}
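
The three probabilities used in the power calculations above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code. The function names are hypothetical, and each step is assumed to be a set of independent trials: molecules captured independently, reads drawn independently, cells sampled independently. In particular, the cluster-size step here uses a plain binomial zero-count probability, which may differ from the negative binomial computation named in the text.

```python
def p_at_least_one(n_molecules: int, efficiency: float) -> float:
    """Probability of capturing and reverse transcribing at least one of
    n_molecules mRNA copies, assuming each copy is captured independently
    with probability `efficiency` (assumption of this sketch)."""
    return 1.0 - (1.0 - efficiency) ** n_molecules


def p_detect_by_depth(fractional_abundance: float, read_depth: int) -> float:
    """P(detecting > 0 molecules) = 1 - (1 - fractional abundance)^(read depth),
    the formula stated in the text."""
    return 1.0 - (1.0 - fractional_abundance) ** read_depth


def p_zero_positive_cells(cluster_size: int, true_fraction: float) -> float:
    """Probability of observing no positive cells in a cluster when a fraction
    `true_fraction` of cells is truly positive (simple binomial sampling,
    an assumption of this sketch)."""
    return (1.0 - true_fraction) ** cluster_size


if __name__ == "__main__":
    # 5 ACE2 molecules per cell at the efficiencies quoted in the text.
    for eff in (0.10, 0.20, 0.25):
        print(f"efficiency {eff:.0%}: {p_at_least_one(5, eff):.2f}")

    # Hypothetical abundance and depth, chosen only to exercise the formula.
    print(f"depth example: {p_detect_by_depth(1e-4, 10_000):.3f}")

    # 6.8% positive cells, as quoted for type II pneumocytes.
    for size in (10, 50, 100, 500):
        print(f"cluster of {size}: P(zero positives) = "
              f"{p_zero_positive_cells(size, 0.068):.2e}")
```

With 5 molecules per cell, the first function gives about 41%, 67%, and 76% at 10%, 20%, and 25% efficiency, matching the values quoted in the text. The tissue-specific inputs behind the 93.7% and 76.0% figures are not given in the text, so the depth example uses placeholder inputs.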
