Sample processing, sequencing, and analysis were performed as described previously (Cao et al., 2020). Reads were aligned to the GRCh37 reference genome combined with influenza genomes. Mapped reads from each sample were then corrected for Drop-seq barcode synthesis errors using the Drop-seq core computational tools developed by the McCarroll Lab (Macosko et al., 2015). Genes were quantified using the End Sequence Analysis Toolkit (ESAT, github/garber-lab/ESAT) with parameters -wlen 100 -wOlap 50 -wExt 0 -scPrep (Derr et al., 2016). Finally, UMIs that likely resulted from sequencing errors were corrected by merging any UMI that was observed only once and was within a Hamming distance of 1 of a UMI detected by two or more aligned reads. Only cell barcodes with more than 1,000 UMIs were analyzed. Cell barcodes expressing mostly erythrocyte genes (HBA, HBB) were removed. From here on, the remaining cell barcodes in the matrix are referred to as cells. The final gene-by-cell matrix was normalized using the scran package v3.10 (Lun et al., 2016). The normalized matrix was used for dimensionality reduction by first selecting variable genes that had a high coefficient of variation (CV) and were expressed (≥ 1 UMI) in more than three cells. Influenza viral genes, interferon-stimulated genes, and cell cycle-related genes were removed from the variable gene list to minimize the impact of viral responses and mitosis on clustering and cell type identification. This resulted in the selection of 2484 variable genes. t-distributed stochastic neighbor embedding (tSNE) was applied to the first ten principal components (PCs), which explained 95% of the total data variance. Density clustering (Rodriguez and Laio, 2014) was performed on the resulting tSNE coordinates and identified four major clusters: epithelial cells, neutrophils, macrophages, and leukocytes.
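The UMI error-correction rule described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the per-gene, per-cell-barcode input dictionary, and the tie-breaking rule (merge into the candidate with the most reads) are assumptions made for illustration only.

```python
from collections import Counter


def hamming(a, b):
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def merge_singleton_umis(umi_counts):
    """Merge UMIs seen once into a neighbor seen two or more times.

    umi_counts: dict mapping UMI sequence -> number of aligned reads,
    for one cell barcode and one gene (hypothetical input layout).
    """
    # UMIs supported by two or more reads are treated as reliable.
    trusted = [u for u, n in umi_counts.items() if n >= 2]
    merged = Counter()
    for umi, n in umi_counts.items():
        if n == 1:
            parents = [
                t for t in trusted
                if len(t) == len(umi) and hamming(t, umi) == 1
            ]
            if parents:
                # Assumption: ties go to the parent with the most reads.
                best = max(parents, key=lambda t: (umi_counts[t], t))
                merged[best] += 1
                continue
        # Singletons without a close trusted neighbor are kept as-is.
        merged[umi] += n
    return dict(merged)
```

In this sketch, a singleton such as `AAAT` is folded into `AAAA` when `AAAA` has at least two reads, whereas a singleton with no trusted neighbor at distance 1 is left as its own UMI.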
The epithelial cell cluster and the leukocyte cluster were then re-clustered independently, as described above, to identify populations within each metacluster. Specifically, the epithelial cell cluster was re-embedded using 2629 variable genes selected by the same criteria described above and 13 PCs that explained 95% of the variance. Density clustering over the epithelial cell subset revealed ten clusters. Differential gene expression analysis using edgeR (Robinson et al., 2010) was performed to identify marker genes for each cluster. Influenza-infected and bystander cells were identified after correcting for the sample-specific distribution of ambient influenza mRNA contamination; the cells most likely to be infected were predicted using a hurdle zero-inflated negative binomial (ZINB) model and a support vector machine (SVM) classifier.
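The variable-gene criteria and the choice of the number of PCs used for both the full matrix and the epithelial subset can be sketched as follows. This is an illustrative outline under stated assumptions, not the study's implementation: the function names are hypothetical, `cv_cutoff` is a placeholder because the text specifies only a "high" CV, and a single matrix is used for both the expression filter and the CV for simplicity.

```python
from statistics import mean, pstdev


def select_variable_genes(matrix, genes, excluded, cv_cutoff, min_cells=3):
    """Return genes with CV above cv_cutoff, expressed in > min_cells cells.

    matrix: one row per gene, one non-negative value per cell.
    excluded: genes to drop (e.g. viral, interferon-stimulated, cell cycle).
    cv_cutoff: hypothetical threshold; the source does not state a value.
    """
    selected = []
    for gene, row in zip(genes, matrix):
        if gene in excluded:
            continue
        # "Expressed" is taken to mean a value of at least 1 in a cell.
        n_expressing = sum(1 for v in row if v >= 1)
        if n_expressing <= min_cells:
            continue
        cv = pstdev(row) / mean(row)  # coefficient of variation
        if cv > cv_cutoff:
            selected.append(gene)
    return selected


def n_pcs_for_variance(pc_variances, target=0.95):
    """Smallest number of leading PCs whose variance share reaches target."""
    total = sum(pc_variances)
    running = 0.0
    for k, v in enumerate(pc_variances, start=1):
        running += v
        if running / total >= target:
            return k
    return len(pc_variances)
```

Given the per-PC variances from a principal component analysis, `n_pcs_for_variance` returns the count of leading PCs to pass to the embedding step, which corresponds to the 95% criterion stated above.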