2. Materials and Methods

2.1. Genome Sequences
The SARS-CoV-2 genome sequences from China, Italy, Spain (Valencia), and those for MERS, SARS, OC43 and 229E were obtained from NCBI (GenBank: NC_045512.2, LC528232.1, MT066156.1, MT198652.2, KT225476.2, NC_004718.3, NC_006213.1, NC_002645.1, respectively). The SARS-CoV-2 genome sequences from England (hCoV-19/England/20136087804/2020|EPI_ISL_420910, no treatment) and Turkey (hCoV-19/Turkey/GLAB-CoV008/2020) were obtained from the China National Bioinformatics Center, GISAID database [23] (https://www.gisaid.org). In addition, the sequence of a SARS-CoV-2 strain isolated from a Turkish patient and propagated in Vero E6 cells to passage 4 (hCoV-19/Turkey/ERAGEM-001/2020) was used for alignment studies with the miRBase mature miRNA search tool.

2.2. miR Prediction
The miRTarget and miRBase programmes were used to predict similarities between the SARS-CoV-2 genome and human miRs; hits with an e-value <10 and a score >70 were considered significant. DianaTools miRPath V3 was then used to create heat maps of the pathways affected by the selected miRs, focusing on the microT-CDS version 5.0 database. The p-value threshold was 0.05 and the microT threshold was 0.8. Heat map analysis was performed with pathway intersection [24].
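The significance criteria above can be illustrated with a minimal Python sketch. It assumes that each hit carries both an e-value and a score, as in miRBase search output, and that both criteria are applied to the same record; the `Hit` record, the function name, and the example values are hypothetical and are not taken from the study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    mirna: str     # miRNA identifier (hypothetical placeholder names below)
    evalue: float  # e-value reported for the alignment
    score: float   # alignment score reported for the hit


def significant_hits(hits, max_evalue=10.0, min_score=70.0):
    """Keep hits with e-value < max_evalue and score > min_score."""
    return [h for h in hits if h.evalue < max_evalue and h.score > min_score]


# Illustrative values only; these are not results from the paper.
hits = [
    Hit("miR-example-1", evalue=4.2, score=85.0),   # passes both criteria
    Hit("miR-example-2", evalue=12.0, score=90.0),  # fails the e-value cut-off
    Hit("miR-example-3", evalue=0.5, score=65.0),   # fails the score cut-off
]
kept = significant_hits(hits)
```

Both inequalities are strict, matching the "<10" and ">70" wording of the criteria.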

2.3. Mutational Analysis of Potential miRNA Sites
Viral genome sequence data were obtained from the GISAID database (https://www.gisaid.org) and analysed as multiple sequence alignments using Clustal Omega at EBI (www.ebi.ac.uk/Tools/msa/clustalo/).
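Once an alignment is available, one possible way to inspect a candidate miRNA site for mutations is to list the alignment columns at which the isolates differ. The sketch below assumes the aligned sequences have already been read into a dictionary of equal-length, gapped strings; the function name, isolate labels, and toy sequences are hypothetical, and this is not necessarily the procedure used in the study.

```python
def variable_positions(aligned, start, end):
    """Return 0-based alignment columns in [start, end) where sequences differ.

    `aligned` maps an isolate label to its gapped, aligned sequence.
    A gap character ('-') counts as a difference from a nucleotide.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    columns = []
    for i in range(start, end):
        residues = {seq[i].upper() for seq in aligned.values()}
        if len(residues) > 1:
            columns.append(i)
    return columns


# Toy alignment for illustration only (not real viral sequence).
aligned = {
    "isolate_A": "ACGTTGCA-A",
    "isolate_B": "ACGTTGCATA",
    "isolate_C": "ACGATGCATA",
}
sites = variable_positions(aligned, 0, 10)
```

In this toy alignment, column 3 holds a substitution and column 8 a gap, so those two columns are reported.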

2.4. Pathway Analysis
Transcriptome data were obtained from BioProject PRJNA615032, which includes lung biopsies from SARS-CoV-2-infected patients and healthy volunteers as well as mock and SARS-CoV-2-transfected NHBE and A549 cell lines. The data are available under BioProject accession number PRJNA615032 in the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/). All selected data were reanalysed with Rosalind (https://rosalind.onramp.bio/), with a HyperScale architecture developed by OnRamp BioInformatics, Inc. (San Diego, CA, USA). Differential expression was assessed using a 1.5-fold change threshold between untransfected and transfected cells, calculated on data pooled from both cell lines, at a significance level of p < 0.05. Reads were trimmed using cutadapt [25]. Quality scores were assessed using FastQC [26]. Reads were aligned to the Homo sapiens genome build GRCh38 using STAR [27]. Individual sample reads were quantified using HTseq [28] and normalized via relative log expression (RLE) using the DESeq2 R library [29]. Read distribution percentages, violin plots, identity heatmaps, and sample MDS plots were generated as part of the QC step using RSeQC [30]. DESeq2 was also used to calculate fold changes and p-values and to perform optional covariate correction. Genes were clustered for the final heatmap of differentially expressed genes with the PAM (partitioning around medoids) method, using the fpc R library (https://cran.r-project.org/web/packages/fpc/index.html). The hypergeometric distribution was used to analyze the enrichment of pathways, gene ontology, domain structure, and other ontologies. The topGO R library [31] was used to determine local similarities and dependencies between GO terms in order to perform Elim pruning correction.
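The idea behind RLE normalization can be sketched in a few lines of Python. This is a simplified median-of-ratios calculation written for illustration, not the DESeq2 implementation; the function name and the toy count table are hypothetical.

```python
import math
from statistics import median


def rle_size_factors(counts):
    """Estimate per-sample size factors by the median-of-ratios (RLE) idea.

    `counts` maps a gene name to a list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c <= 0 for c in row):
            continue
        # Log of the gene's geometric mean across samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor = median ratio of a sample's counts to the geometric means.
    return [math.exp(median(r)) for r in log_ratios]


# Toy counts: the second sample is sequenced exactly twice as deeply.
counts = {
    "gene1": [10, 20],
    "gene2": [100, 200],
    "gene3": [5, 10],
    "gene4": [0, 3],  # skipped: contains a zero
}
factors = rle_size_factors(counts)
normalized = {g: [c / f for c, f in zip(row, factors)] for g, row in counts.items()}
```

Dividing each sample's counts by its size factor puts the samples on a common scale, so that fold changes reflect expression differences rather than sequencing depth.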
Several database sources were referenced for enrichment analysis, including InterPro [32], NCBI [33], MSigDB [34,35], REACTOME [36], and WikiPathways [37]. Enrichment was calculated relative to a set of background genes relevant to the experiment.
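As an illustration of a hypergeometric enrichment test against a background gene set, the following Python sketch computes the upper-tail probability of seeing at least the observed overlap between a gene list and a pathway. The function name, argument names, and numbers are hypothetical; the Rosalind pipeline may differ in detail, for example in multiple-testing correction.

```python
from math import comb


def enrichment_pvalue(background, in_pathway, selected, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    background: number of genes in the background set
    in_pathway: background genes annotated to the pathway
    selected:   differentially expressed genes drawn from the background
    overlap:    selected genes that are annotated to the pathway
    """
    upper = min(in_pathway, selected)
    total = comb(background, selected)
    favourable = sum(
        comb(in_pathway, i) * comb(background - in_pathway, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


# Toy numbers: 10 background genes, 4 in the pathway, 3 selected, 2 overlapping.
p = enrichment_pvalue(background=10, in_pathway=4, selected=3, overlap=2)
```

A small p-value indicates that the overlap is larger than expected if the selected genes were drawn at random from the background.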