CORD-19:e95fc123e6fd12a0293fb0f13255b0233cd32410

A metagenomics study for the identification of respiratory viruses in mixed clinical specimens: an application of the iterative mapping approach

Abstract

Metagenomic approaches to detect viral genomes and variants in clinical samples face various challenges, including low viral titers and bacterial and human genome contamination. To address these limitations, we examined a next-generation sequencing (NGS) and iterative mapping approach for virus detection in clinical samples. We analyzed 40 clinical specimens from hospitalized children diagnosed with acute bronchiolitis, croup, or respiratory tract infections in which virus identification by viral culture or polymerase chain reaction (PCR) was unsuccessful. For our NGS data analysis pipeline, clinical samples were pooled into two NGS groups to reduce sequencing costs, and the depth and coverage of assembled contigs were effectively increased using an iterative mapping approach. PCR was individually performed for each specimen according to the NGS-predicted viral type. We successfully detected previously unidentified respiratory viruses in 26 of 40 specimens using our proposed NGS pipeline. The two dominant populations within the detected viruses were human rhinoviruses (HRVs; n = 14) and human coronavirus NL63 (n = 8), followed by human parainfluenza virus (HPIV), human parechovirus, influenza A virus, respiratory syncytial virus (RSV), and human metapneumovirus. This is the first study reporting the complete genome sequences of HRV-A101, HRV-C3, HPIV-4a, and RSV, as well as an analysis of their genetic variants, in Taiwan. These results demonstrate that this NGS pipeline can detect, directly from clinical samples, viruses that were not identified by routine diagnostic assays.

Virus identification using traditional polymerase chain reaction (PCR)-based methods is challenging [1, 2], particularly when viral loads are low.
Genetic diversity of viruses can also lead to mismatches between viral target sequences and PCR primers or probes, resulting in incorrect PCR results [3]. To address these challenges, unbiased next-generation sequencing (NGS) techniques have been developed to improve viral discovery. These methods have been shown to be effective for the identification and genomic characterization of influenza A viruses (IAVs) [4], hepatitis C virus [5], and other respiratory viruses [6]. Additionally, not all viruses can be cultured using common cell lines, e.g. human rhinovirus (HRV) type C [7], human bocavirus (HBoV) [8], and the human coronaviruses (HCoV) NL63 and HKU1 [3, 9]. Therefore, NGS techniques can detect viruses that are difficult to culture or that may yield erroneous PCR results.

Electronic supplementary material: The online version of this article (doi:10.1007/s00705-017-3367-4) contains supplementary material, which is available to authorized users.

In metagenomics, modern genomic techniques are applied to characterize communities of microbial organisms directly from their natural environments, without the isolation and cultivation of individual species [10]. NGS technology has been applied to metagenomes to detect the presence of viral pathogens from single non-cultured specimens [11], including influenza virus identification and whole-genome sequencing from swab specimens [12-14] and respiratory virus identification from nasopharyngeal aspirate specimens [15]. However, the direct recovery of viral genomes from clinical specimens using NGS methods has challenges, including noise from host or microbiota cells and limited viral RNA quantities [16]. To separate viruses from host or background flora, a novel enrichment technique called NetoVIR has been developed for high-throughput sample preparation [17].
Notably, viral etiologies were unknown in ~25% of 120 clinical specimens collected from children with bronchiolitis [18], ~45% of 336 hospitalized children (<5 years old) with lower respiratory tract infections (RTIs) [19], and ~60% of 2259 clinical specimens from patients with community-acquired pneumonia [20]. Although NGS pipelines [21, 22] have been proposed to identify previously unidentified viruses from clinical specimens, sequencing costs remain an issue. In our previous study, to reduce sequencing costs, we applied NGS to pooled samples [6]. An NGS data analysis pipeline was proposed for the detection of unidentified viruses, including human parechovirus (HPeV), using an iterative mapping approach to obtain genome sequences. In this study, clinical specimens suspected of harboring viruses were collected from hospitalized patients in Taiwan. Our NGS data analysis method was applied to identify unknown viral pathogens directly from these clinical specimens and to obtain their genome sequences by iterative mapping. Furthermore, we explored genetic variation within the samples to clarify the molecular evolution of the observed Taiwanese strains.

This study was approved by the Institutional Review Board of Chang Gung Medical Foundation, Linkou Medical Center, Taoyuan, Taiwan (approval no. 100-4378B).

Specimen collection, pretreatment, and RNA extraction

A total of 592 children hospitalized between 2008 and 2010 with respiratory symptoms were recruited from Chang Gung Memorial Hospital, Taiwan. Among the patients, 105, 124, and 363 were diagnosed with croup, acute bronchiolitis, and RTIs, respectively. Specimens from these patients were examined by routine culture for further viral/bacterial identification and by real-time PCR for viral identification. Negative results were obtained for 20 croup, 29 acute bronchiolitis, and 138 RTI cases according to routine culture and real-time PCR.
Thus, these patients were suspected to have infections caused by unknown/untested viruses. Forty specimens were randomly selected from these samples for further NGS analyses, including ten, nine, and 21 samples from patients diagnosed with croup, acute bronchiolitis, and RTIs, respectively. Among these, 34, five, and one clinical specimens were collected via throat swabs, nasopharyngeal swabs, and sputum, respectively. Specimens (0.5 mL each) were divided into two groups; for each group, 20 samples were pooled in a single tube, and the mixed specimens were filtered through 0.22-µm filters to improve analytical sensitivity and reduce background contamination prior to subsequent analyses. For viral particle enrichment, filtered specimens underwent ultracentrifugation using a SW41/Ti rotor at 28,800 × g and 4°C overnight. Viral RNA extraction from 140 µL of the specimens was performed using the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany). For nucleic acid extraction, the carrier RNA included in the QIAamp Viral RNA Mini Kit was replaced with linear acrylamide (Life Technologies, Carlsbad, CA, USA) as a precipitation reagent, according to previously published methods [11]. Extracted RNA was eluted using 30 µL of elution buffer and stored at -70°C.

The 40 specimens were pooled into two NGS groups, and cDNA and libraries of the mixed samples were prepared using the Ovation RNA-Seq System V2 and Ovation Ultralow Library Systems (NuGEN, San Carlos, CA, USA), according to previously published methods [11]. The Illumina MiSeq system was used to obtain paired-end reads (2 × 250 bp). Approximately 4 GB of raw data were generated for each of NGS 1 and 2 (an average throughput of 0.2 GB for each sample), providing 5,996,746 and 8,140,721 paired-end reads, respectively. All raw reads were deposited in the Sequence Read Archive (SRA) of NCBI under accession number SRP100814.
The NGS data analysis pipeline [6] was used to identify unknown pathogens, assemble whole genomes, and investigate viral diversity from mixed clinical samples. Briefly, the pipeline consisted of the following steps: 1) the Illumina system was used to collect data from clinical samples; 2) data were preprocessed to generate NGS reads and determine homologs for the assembled contigs using BLASTN (default settings with an E-value threshold of 10 and word size of 20) and then BLASTX (default settings with an E-value threshold of 10); 3) PCR validation was performed; 4) iterative mapping was performed to obtain viral genomes; 5) reference mapping was performed to identify genetic variants. Six genomes obtained in this study were deposited in GenBank: HPeV-1 (accession number KY460513), human parainfluenza virus 4a (HPIV-4a, KY460514), HRV-A101 (KY460515), and HRV-C3 (KY460516) detected in NGS 1, and HPIV-4a (KY460517) and RSV (KY460519) detected in NGS 2.

To investigate the read depth and detect viral genomic variants, reference mapping was performed with genomic templates using Bowtie2 version 2.2.5 [23] with default settings. Position-specific read counts and nucleotide compositions obtained after read mapping were examined. A genetic variant was identified when mutant nucleotides accounted for more than 25% of the total reads and the read depth was at least 20 [24, 25]. In addition, viral genomes of different genotypes from the same species usually share high sequence homology. Consequently, one NGS read might redundantly map to different templates if each of them was mapped independently. To avoid multiple mappings of single reads in pooled samples, individual templates from different viral genotypes (e.g. three HRV genomes G1, G2, and G3) were concatenated into a longer template (G1 + G2 + G3), which served as an initial template in our iterative mapping approach.

Each of the 40 clinical specimens was tested by PCR to confirm the results of the NGS data analysis.
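The template concatenation described above (G1 + G2 + G3) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy sequences, and the interval bookkeeping are assumptions made for illustration, and the source does not say whether any spacer was placed between genomes. The sketch joins the genomes end to end and translates a position on the combined template back to the genome it came from.

```python
def build_concatenated_template(genomes):
    """Join (name, sequence) pairs end to end.

    Returns the combined sequence and a list of (name, start, end)
    intervals, 0-based and end-exclusive, on the combined template.
    """
    parts, intervals, offset = [], [], 0
    for name, seq in genomes:
        parts.append(seq)
        intervals.append((name, offset, offset + len(seq)))
        offset += len(seq)
    return "".join(parts), intervals


def locate(position, intervals):
    """Translate a 0-based template position to (genome name, local position)."""
    for name, start, end in intervals:
        if start <= position < end:
            return name, position - start
    raise IndexError(f"position {position} is outside the template")


# Toy sequences standing in for three HRV genomes (hypothetical data).
genomes = [("G1", "ACGTACGT"), ("G2", "TTTTCCCC"), ("G3", "GGGAAA")]
template, intervals = build_concatenated_template(genomes)
print(len(template), locate(8, intervals))
```

Because each read is aligned once against the single combined template, a mapped position is attributed to exactly one genotype, which is the point of the concatenation.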
Virus-specific primers and probes for enterovirus (EV), HCoV-229E, -NL63 and -OC43, human metapneumovirus type 2 (hMPV-2), HPIV-3 and -4, HPeV, HRV, IAV, RSV, and rotavirus (Rota) were used for viral RNA amplification and detection, as listed in Supplementary File 1. HRV and EV were genotyped using BLASTN searches against the NCBI nucleotide database.

Forty clinical specimens were collected and pooled into two NGS groups; 21,333 and 39,458 contigs (defined as a set of overlapping reads, representing a consensus sequence of a partial or whole genome) were assembled from NGS experiments 1 and 2, respectively. These contigs were primarily catalogued as viruses, bacteria, and eukaryotes based on BLASTN results. Viral contigs identified by BLASTN are summarized in Supplementary File 2, and Figure 1 shows the read classification according to the BLASTN results for the assembled contigs. The dominant populations in NGS 1 and 2 were viral (86%) and bacterial (69%) reads, respectively. Table 1 shows 16 potential viral types identified by BLASTN, including HCoV-NL63, HPeV-1, HPIV-4a, HRV-A101, -C3, -C4, and -C40, and IAV from NGS 1, as well as HCoV-NL63, HPeV-1, HRV-B92, -C26, and -C40, HPIV-4a, IAV, and RSV from NGS 2. We then calculated the average depths and genome coverages (%) for the mapped contigs. As shown in Table 1, in NGS 1, the genomic coverage ranged from 48.4% (HPIV-4a) to 100% [HCoV-NL63, HRV-C40, and the polymerase acidic (PA), neuraminidase (NA), and matrix protein (MP) genes of IAV] and the contig depths ranged from 6.4 (the NA gene of IAV) to 690.0 (HRV-C40). In NGS 2, the genomic coverage ranged from 4.0% (HPeV-1) to 64.3% (HRV-C40) and the contig depth ranged from 2.2 (HPeV-1) to 167.0 (HPIV-4a). Contig coverages presented in Table 1 were estimated as the covered region of contig(s) divided by the length of the coding sequence.
Contig depths were calculated as the average number of times each base in the contig was sequenced from different reads. Only 255 and 552 contigs in NGS 1 and 2, respectively, did not show any matches with database sequences; these were further examined using BLASTX searches. Among them, 153 NGS 1 and 380 NGS 2 contigs matched sequences in the database, and only two NGS 1 and four NGS 2 contigs belonged to the virus group (unclassified RNA viruses or bacteriophages, but not respiratory viruses), as shown in Supplementary File 2.

Increasing genomic depths and coverages via iterative mapping

Although a few long contigs might completely or nearly span the full-length genome, many short contigs were mapped only partially to detected viruses (one HRV-C40 and nine HRV-C4 contigs encompassed 100% and 56.3% of the genomes, respectively). An iterative mapping approach was used to include as many NGS reads as possible for reference mapping. Briefly, a consensus sequence was generated from one or more short contigs and reference genomes. Because the short contigs might cover only part of the viral genome, the reference genome was used to fill in the missing regions and generate an initial template. Subsequently, the iterative process was initiated based on this combined genome. Table 1 compares the average depth and coverage using the iterative mapping approach (for various iterations) and de novo assembly. Following three iterations, the genomic coverage of the aforementioned HRV-C4 genome increased from 56.3% to 95.2%, and the genomic depth increased from 10.3 to 16.9. The coverages of detected viruses in NGS 1 included nearly complete genomes, except for HRV-C4, with 95.1% coverage, and each of the eight IAV genes, with >77.0% coverage. Furthermore, the contig depths of viral genomes generated by iterative mapping were generally greater than the original contig depths.
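The coverage and depth definitions given above can be made concrete with a short Python sketch. This is an illustration under stated assumptions rather than the pipeline's code: the input is a hypothetical list holding the read depth at each position of the coding sequence, a position counts as covered when its depth is above zero, and the mean depth is averaged over covered positions only.

```python
def coverage_and_mean_depth(per_base_depth):
    """Return (coverage in %, mean depth over covered positions).

    per_base_depth: read depth at each position of the coding sequence;
    0 means no contig or read covers that position (an assumption of
    this sketch, not a definition taken from the paper).
    """
    if not per_base_depth:
        raise ValueError("per_base_depth must not be empty")
    covered = [d for d in per_base_depth if d > 0]
    coverage_pct = 100.0 * len(covered) / len(per_base_depth)
    mean_depth = sum(covered) / len(covered) if covered else 0.0
    return coverage_pct, mean_depth


# Toy example: a 10-base "coding sequence" with two covered stretches.
toy_depths = [0, 0, 5, 5, 10, 10, 0, 0, 4, 6]
coverage_pct, mean_depth = coverage_and_mean_depth(toy_depths)
print(f"coverage = {coverage_pct:.1f}%, mean depth = {mean_depth:.2f}")
```

Recomputing both values after each round of mapping is one simple way to see whether an additional iteration still recruits reads.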
These results demonstrated that, by using the iterative mapping process, the depths and genomic content of the terminal templates in NGS 1 and 2 were increased compared with those of their respective initial contigs. Read-coverage distributions along the reported viral genomes in NGS 1 (Fig. 2A-2O) and NGS 2 (Fig. 2P-2W) are shown in Figure 2. NGS 1 showed depth distributions with >77.0% coverage and average depths from 7.1 to 7001.1. NGS 2 showed depth […]

Following the NGS analysis of mixed samples, virus-specific molecular diagnoses for each of the 40 specimens were determined using the predicted viruses shown in Table 1. PCR was performed to validate the presence of viruses, including EV, HCoV-229E, -NL63, and -OC43, hMPV-2, HPIV-3 and -4, HPeV, HRV, IAV, RSV, and Rota, using specific PCR primers (Table 2 and Supplementary File 1). PCR results agreed with NGS predictions for NGS 1; however, in NGS 2, for sample IDs 40, 22, and 35, PCR analysis detected hMPV, HRV-C3, and HRV-C (without a specific type), respectively, but these could not be identified by NGS. By contrast, HPeV-1 was detected by NGS, with a genomic depth and coverage of 1.6 and 19.4%, respectively, but was not identified by PCR.

In terms of clinical symptoms, the 40 specimens used in this study were grouped into ten croup, nine acute bronchiolitis, and 21 RTI cases. The viruses detected in the hospitalized children with acute bronchiolitis included four HRVs of types A101, C3, C4, and C40, followed by two each of HCoV-NL63 and IAV and one each of HPIV-4a and hMPV. The viruses detected in the hospitalized children with croup included six HCoV-NL63, followed by one each of HRV-B92, HRV-C40, and HPeV-1. The viruses detected in the hospitalized children with RTIs included seven HRV-Cs, followed by one each of HRV-B92, HPIV-4a, and RSV.
Notably, the negative detection rate of 52.4% (11 of 21) for hospitalized children with RTIs was much greater than the 11% and 20% rates observed for acute bronchiolitis and croup cases, respectively. A limitation of this study was the inability to distinguish NGS reads from different samples with the same viral type; however, HRV-C3, HRV-A101, HPIV-4a, and HPeV-1 in NGS 1 and RSV and HPIV-4a in NGS 2 showed single viral types with complete coverages in their mixed samples.

Read mapping was performed to assess the overall depth distribution along the genome and to identify genetic variants exhibiting heterogeneous nucleotide compositions. A genetic variant was defined as a nucleotide position where the major nucleotide appeared in <75% of the mapped reads and the read depth was at least 20. For the aforementioned viruses, ten, one, three, and five nonsynonymous mutations were detected for HRV-C3, HRV-A101, HPIV-4a, and HPeV-1 of NGS 1, respectively, and 17 and three for RSV and HPIV-4a of NGS 2, respectively (Table 3). These six genomes were generated by iterative mapping. In addition to genetic variants, we observed two insertions located at positions 904-906 (codon ATA) and 988-990 (codon CCA) in the VP2 gene in our Taiwanese HRV-A101 strain. Compared with three genomes (accession numbers GQ415051, JQ245965, and GQ415052) downloaded from GenBank in October 2016, the first insertion was absent in GQ415051 and JQ245965, and the second insertion was absent in GQ415052.

More than 25% of clinical samples harbour undetected viral pathogens [18-20]; therefore, new diagnostic tests, such as NGS-based analyses, are required for viral identification. Here, we collected 40 clinical samples for which viruses were not detected by routine viral culture or PCR methods.
After applying an NGS data analysis pipeline, we successfully identified viral pathogens from 26 clinical samples (including three co-infections in sample IDs 10, 29, and 40) in two NGS groups. Detected viruses included HCoV-NL63, HPeV-1, HPIV-4a, HRV-A101, -C3, -C4, and -C40, and IAV from NGS 1 and HCoV-NL63, hMPV, HPeV-1, HPIV-4a, HRV-B92, -C26, and -C40, IAV, and RSV from NGS 2. The two dominant viral populations were HRV (n = 13) and HCoV-NL63 (n = 8). Reads of assembled contigs were categorized as viruses, bacteria, and eukaryotes (Fig. 1). The two pooled samples exhibited different taxonomic compositions; for example, 86% of total reads belonged to the virus group in NGS 1, but only 6% in NGS 2. This difference might be explained by differences in viral copies in mixed samples. Nevertheless, a detection rate of 60% in NGS 2 was obtained using our NGS data analysis pipeline, approaching 70% in NGS 1.

In addition to low viral copies, challenges to the recovery of viral genomes from clinical samples using NGS platforms include bacterial and human genome contamination. In this study, we applied an NGS data analysis pipeline using iterative mapping to improve the recovery of virus-derived reads in mixed clinical samples, in order to identify their genomic sequences. Genomic depths and coverages in the two NGS samples were effectively increased using this process; however, these increases were not evident for viral isolates [6]. A decrease in contamination with human and bacterial genomes increases the depth and coverage of viral contigs from viral isolates. In this study, viral genomes were identified following ≤6 iterations (as reference mappings) using this approach.

NGS has various advantages over PCR-based methods. Using this technology, whole genomes can be obtained for viral detection, genotyping, and diversity analyses without designing primers or prior knowledge of viral presence [1, 6].
Correlations between NGS reads and PCR cycle threshold values have been observed, suggesting that NGS reads are suitable indicators of viral copy number [2, 26]. In this study, NGS applied to pooled samples offered advantages, including reduced sequencing costs and the ability to screen multiple epidemiological samples for potential outbreaks. However, the use of mixed samples had some limitations, including the inability to assign viral candidates to each clinical sample or to calculate the association between NGS reads and Ct values when multiple viruses of the same type were pooled in a single NGS group.

In this study, 40 specimens were collected from ten, nine, and 21 hospitalized children diagnosed with croup, acute bronchiolitis, and RTIs, respectively. The dominant viral population in the acute bronchiolitis cases was HRV (including types A101, C3, C4, and C40), which is inconsistent with previous studies indicating that RSV is the dominant population in patients with bronchiolitis [18, 27]. HRV type C was first reported in 2007 and could not be grown by standard cell culture methods [28]. A systematic approach has been proposed to culture the virus in differentiated epithelial cells of the human airway at the air-liquid interface [29]; however, this method is not suitable for routine viral diagnosis in clinical laboratories. Accordingly, the prevalence of HRV-C in acute bronchiolitis cases has likely been underestimated. HCoV-NL63, a global respiratory tract pathogen, is a primary cause of respiratory disease in children admitted to hospital in Taiwan, specifically in patients diagnosed with croup [30]. This agrees with our finding that six out of ten croup cases were diagnosed as NL63 infections. One study demonstrated that newly discovered viruses might fail to be amplified by RT-PCR assays designed for known viruses, due to primer sequence mismatches [3].
Therefore, NGS analysis, which does not require specific primers and probes, is superior to these diagnostic assays for the detection of viruses harboring novel mutations in PCR primer binding sites. Viruses detected in patients diagnosed with RTIs included seven HRV type C, and one each of HRV type B, RSV, and HPIV-4a. The negative detection rate for patients with RTIs was 52.4% (11 of 21), which was greater than that for acute bronchiolitis cases (11.1%; one of nine) and croup cases (20%; two of ten). Further investigations are needed to identify currently unknown human viruses in patients diagnosed with RTIs.

HRVs, including 11 type C, two B, and one A, were the dominant viruses detected in this study. HRV-C causes more severe illnesses than HRV-A and -B [31, 32]; however, few HRV-C genomes have been published due to the lack of availability of culture systems in clinical laboratory settings [33]. More detailed viral typing information, for viruses such as HRV, HEV, hMPV, and RSV, can be obtained by NGS than by PCR-based methods [26]. Our results demonstrated that NGS effectively characterizes respiratory viral infections, specifically those involving HRV; however, two HRVs (one C3 and the other untyped) and one hMPV were not identified by NGS in NGS 2 of this study. It was difficult to discriminate between all HRV genotypes in this mixed NGS sample because NGS 2 contained eight HRVs that shared similar genetic segments. Additionally, hMPV might exhibit a low viral load in sample ID 40, which was co-infected with HCoV-NL63. In other words, most of the reads in NGS 2 were derived from viruses with high titers. However, one HPeV-1 in NGS 2 was not identified by PCR. This HPeV exhibited a genomic coverage of 19.4% in our NGS analysis. Further investigations are needed to explain this low coverage (e.g. low viral titers or bleed-through contamination from the HPeV-1 in NGS 1 [34]).
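The negative detection rates quoted above follow directly from the case counts. The minimal Python sketch below reproduces the arithmetic using only numbers reported in the text; the dictionary layout and variable names are illustrative choices, not part of the study.

```python
# (specimens tested, specimens with no virus detected) per diagnosis,
# taken from the counts reported in the text.
cases = {
    "croup": (10, 2),
    "acute bronchiolitis": (9, 1),
    "RTIs": (21, 11),
}

# Negative detection rate per diagnosis, in percent, rounded to one decimal.
negative_rates = {
    diagnosis: round(100 * negative / total, 1)
    for diagnosis, (total, negative) in cases.items()
}

total_specimens = sum(total for total, _ in cases.values())
total_positive = sum(total - negative for total, negative in cases.values())
overall_detection_rate = 100 * total_positive / total_specimens

print(negative_rates)
print(total_positive, total_specimens, overall_detection_rate)
```

The per-diagnosis negatives sum to 14 of 40 specimens, which matches the 26 positive specimens reported for the whole cohort.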
In this metagenomics study, an NGS data analysis pipeline was used to identify viral pathogens from 40 hospitalized patients. No viruses were detected in these clinical samples using routine culture and PCR methods; however, our NGS pipeline with PCR confirmation resulted in a detection rate of 65% (26 of 40 samples). With respect to nucleotide diversity in the detected viruses, 39 polymorphic genetic variants were detected from six whole genomes using mapped NGS reads. Further analyses are needed to determine the biological consequences of this genetic variation. Our results indicated that the NGS method was able to detect viruses that cannot be directly identified by routine culture or PCR methods targeting divergent or other viruses. Furthermore, this NGS method could be used to reveal genetic sequences and detect diversity from mixed samples using an iterative mapping approach. Our results provide the first complete genomes for HRV-A101, HRV-C3, HPIV-4a, and RSV in Taiwan. In future work, we will apply this method to the rapid identification of emerging viruses directly from clinical samples during potential outbreaks as well as to the detection of genetic variants, to further explore the associations between viral genotypes and disease severity.
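The criterion behind the polymorphic sites reported in this study (major nucleotide in <75% of mapped reads, read depth of at least 20) can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code: the per-position base counts are assumed to come from some pileup step after read mapping, and the function name, constants, and example counts are hypothetical.

```python
MIN_DEPTH = 20             # minimum read depth, as stated in the text
MAX_MAJOR_FRACTION = 0.75  # major nucleotide must be below this fraction


def is_variant_site(base_counts):
    """Decide whether one genome position counts as a genetic variant.

    base_counts: mapping from nucleotide to the number of mapped reads
    showing it at this position (hypothetical pileup output).
    """
    depth = sum(base_counts.values())
    if depth < MIN_DEPTH:
        return False
    major_fraction = max(base_counts.values()) / depth
    return major_fraction < MAX_MAJOR_FRACTION


# Hypothetical positions illustrating each branch of the rule.
examples = {
    "mixed site": {"A": 70, "G": 30},        # major 70% at depth 100
    "near-clonal site": {"A": 80, "G": 20},  # major 80% at depth 100
    "shallow site": {"A": 10, "G": 9},       # depth 19, below threshold
    "boundary site": {"A": 15, "G": 5},      # major exactly 75% at depth 20
}
calls = {label: is_variant_site(counts) for label, counts in examples.items()}
print(calls)
```

The two phrasings used in the text (mutant nucleotides above 25% of reads, or major nucleotide below 75%) give the same call, since all non-major reads together make up the remainder of the depth.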
