CORD-19:2f8c155ac0b65122bf485baf2bd6c2fe78e21373

Id Subject Object Predicate Lexical cue
TextSentencer_T1 0-129 Sentence denotes Characterization of the Viral Microbiome in Patients with Severe Lower Respiratory Tract Infections, Using Metagenomic Sequencing
TextSentencer_T2 131-139 Sentence denotes Abstract
TextSentencer_T3 140-205 Sentence denotes The human respiratory tract is heavily exposed to microorganisms.
TextSentencer_T4 206-345 Sentence denotes Viral respiratory tract pathogens, like RSV, influenza and rhinoviruses cause major morbidity and mortality from respiratory tract disease.
TextSentencer_T5 346-505 Sentence denotes Furthermore, as viruses have limited means of transmission, viruses that cause pathogenicity in other tissues may be transmitted through the respiratory tract.
TextSentencer_T6 506-578 Sentence denotes It is therefore important to chart the human virome in this compartment.
TextSentencer_T7 579-669 Sentence denotes We have studied nasopharyngeal aspirate samples diagnosis of respiratory tract infections.
TextSentencer_T8 670-797 Sentence denotes We have used a metagenomic sequencing strategy to characterize viruses, as this provides the most unbiased view of the samples.
TextSentencer_T9 798-976 Sentence denotes Virus enrichment followed by 454 sequencing resulted in a total of 703,790 reads, of which 110,931 were found to be of viral origin by an automated classification pipeline.
TextSentencer_T10 977-1097 Sentence denotes The snapshot of the respiratory tract virome of these 210 patients revealed 39 species and many more strains of viruses.
TextSentencer_T11 1098-1228 Sentence denotes Most of the viral sequences were classified into one of three major families; Paramyxoviridae, Picornaviridae or Orthomyxoviridae.
TextSentencer_T12 1229-1379 Sentence denotes The study also identified one novel type of Rhinovirus C, and identified a number of previously undescribed viral genetic fragments of unknown origin.
TextSentencer_T13 1381-1524 Sentence denotes Respiratory tract infections account for great morbidity and mortality in the human population and caused almost 4 million deaths in 2008 [1] .
TextSentencer_T14 1525-1611 Sentence denotes A large proportion of these infections have viral etiology, in particular in children.
TextSentencer_T15 1612-1890 Sentence denotes While previous studies have identified a number of viral etiologic agents, such as rhinovirus, coronavirus, influenzavirus, parainfluenzavirus, respiratory syncytial virus and adenovirus, approximately 30% of all presumed viral cases fail diagnostic tests for these agents [2] .
TextSentencer_T16 1891-2033 Sentence denotes Thus, the tests are either inefficient or the causative agent is unrelated to any of the known viruses associated with respiratory infections.
TextSentencer_T17 2034-2261 Sentence denotes In fact, since 2001, several previously undescribed viruses have been identified by analysis of the human respiratory tract, including metapneumovirus [3] , severe acute respiratory syndrome (SARS) [4] and human bocavirus [5] .
TextSentencer_T18 2262-2373 Sentence denotes Viruses have limited means of transmission between organisms, but the respiratory tract is one important route.
TextSentencer_T19 2374-2573 Sentence denotes Many viruses that are primarily associated with non-respiratory infections, for example, herpes viruses, enteroviruses and parvovirus B19 [6, 7] , are still transmitted through the respiratory tract.
TextSentencer_T20 2574-2727 Sentence denotes Therefore, the respiratory tract is an excellent starting point for an in-depth characterization of the human virome and to identify novel human viruses.
TextSentencer_T21 2728-2915 Sentence denotes In recent years, viral metagenomics has become an established method both for finding novel viruses and for detecting the presence of known viruses in new environments [5, 8, 9, 10, 11] .
TextSentencer_T22 2916-3100 Sentence denotes We have sequenced and characterized the virome, in respiratory tract secretions from hospitalized patients, mainly infants and children, with severe lower respiratory tract infections.
TextSentencer_T23 3101-3313 Sentence denotes While many pathogens of the respiratory tract are of bacterial origin, due to chemical enrichment for virus, the bacterial sequences found in this study are likely biased and not representative of these patients.
TextSentencer_T24 3314-3448 Sentence denotes Even so, we have provided a crude characterization of the bacterial content found in these samples along with contigs of other origin.
TextSentencer_T25 3449-3552 Sentence denotes We confirmed that the lower respiratory tract was a milieu that was rich in viruses, in these patients.
TextSentencer_T26 3553-3674 Sentence denotes Many known pathogens were identified, but we also found unexpected virus families as well as one novel rhinovirus C type.
TextSentencer_T27 3675-3965 Sentence denotes In analysis of 210 pool patient samples we have developed an inhouse metagenomic sequence analysis pipeline for screening the sequencing reads, assembling the reads into contigs and finally to carry out extensive homology searches to classify contigs and singletons (outlined in Figure 1 ).
TextSentencer_T28 3966-4128 Sentence denotes The analysis pipeline comprised a series of Shell, Perl and Python scripts and C++ programs, and is available for download at http://www.ifm.liu.se/bioinfo/.
TextSentencer_T29 4129-4180 Sentence denotes Pre-processing of data and de novo genome assembly.
TextSentencer_T30 4181-4266 Sentence denotes Metagenomic data typically include sequences from a multitude of species and strains.
TextSentencer_T31 4267-4405 Sentence denotes In order to reduce the complexity of the data, putative human sequences and repetitive sequences were removed before the assembly process.
TextSentencer_T32 4406-4543 Sentence denotes This pre-processing step improved the accuracy of the assembled genomes; see Methods for a complete description of the analysis pipeline.
TextSentencer_T33 4544-4735 Sentence denotes The pre-assembly screening removed almost 60% of the 454 reads (see Table S1), and the RNA-derived library contained fewer human and repetitive reads than the DNA-derived library.
TextSentencer_T34 4736-5020 Sentence denotes This likely reflected the nature of the input genetic material, where more human and bacterial genomic DNA was carried through to the DNA pool, whereas cellular RNA contributed to a lesser extent to the RNA pools, possibly due to lower stability of cellular RNA compared to viral RNA.
TextSentencer_T35 5021-5115 Sentence denotes All remaining reads were subsequently used for de novo assembly with the MIRA assembler [12] .
TextSentencer_T36 5116-5244 Sentence denotes Assembling the sequencing reads into longer contigs significantly increased the accuracy by which sequences could be classified.
TextSentencer_T37 5245-5396 Sentence denotes Approximately 35% of the reads did not assemble into contigs, but in the following discussion these singletons are also included in the term contig.
TextSentencer_T38 5397-5456 Sentence denotes Assembly details and statistics are described in Table S2 .
TextSentencer_T39 5457-5502 Sentence denotes Inferring homology for the assembled contigs.
TextSentencer_T40 5503-5667 Sentence denotes In the final homology search phase, each contig was searched against the NCBI nt (minimally non-redundant nucleotide) and nr (nonredundant protein) databases [13] .
TextSentencer_T41 5668-5780 Sentence denotes Local alignments were produced using BLAST [14] and the highest scoring hit was assigned as the closest homolog.
TextSentencer_T42 5781-5936 Sentence denotes Based on the inferred homology, the sequences were divided into the following categories: viruses, bacteria, mammals, others and undefined (see Figure 2 ).
TextSentencer_T43 5937-6169 Sentence denotes The 'undefined' category included sequences that lacked a known homolog, or for which almost equally 'close' homologs were found in more than one category; see Materials and Methods for a detailed description of category assignment.
TextSentencer_T44 6170-6283 Sentence denotes All sequences that were reliably classified as non-human were submitted to Genbank, as a GenomeProject, ID 64629.
TextSentencer_T45 6285-6322 Sentence denotes Quantification of the sample content.
TextSentencer_T46 6323-6605 Sentence denotes For the purpose of characterizing the sample content we have compared the number of reads, derived from the assembled contigs, rather than comparing the number of contigs, since the amount of sequenced reads is more directly correlated with the amount of DNA in the original sample.
TextSentencer_T47 6606-6734 Sentence denotes The reason for this is the variation in copy numbers and sizes of the various genomes that were present in the original samples.
TextSentencer_T48 6735-6916 Sentence denotes This was evident from Figure 2 for example, where almost 40% of the sequence reads were of viral origin but after assembly only 4% of the assembled contigs were classified as viral.
TextSentencer_T49 6917-6952 Sentence denotes Contigs showing non-viral homology.
TextSentencer_T50 6953-7177 Sentence denotes The largest nonviral portion of the libraries, even after the initial removal of most human sequences, consisted of contigs of putative mammalian origin ( Figure 2A ) and we expected that these were almost exclusively human.
TextSentencer_T51 7178-7260 Sentence denotes Based on the closest homolog, 98% of the mammalian sequences were of human origin.
TextSentencer_T52 7261-7426 Sentence denotes Except for the mitochondrion, which had higher coverage, the distribution across all human chromosomes was even, considering the size of each chromosome (Table S3) .
TextSentencer_T53 7427-7565 Sentence denotes For the remaining 2% of the mammalian sequences, the closest homolog originated from other primates or other mammals, for example rodents.
TextSentencer_T54 7566-7684 Sentence denotes A possible explanation for this is that the human homolog has in some cases not yet been reported in public databases.
TextSentencer_T55 7685-7836 Sentence denotes Another possibility is the introduction of small amounts of animal DNA through the molecular biology reagents used in the library construction process.
TextSentencer_T56 7837-7959 Sentence denotes Bacterial sequences made up the second largest non-viral portion of the data set, 23% of the sequence reads ( Figure 2A ).
TextSentencer_T57 7960-8077 Sentence denotes The bacterial contigs were split further into classes, as defined by their closest homolog (summarized in Figure 3 ).
TextSentencer_T58 8078-8175 Sentence denotes To gain further insight, the sample was split into putative species by using the closest homolog.
TextSentencer_T59 8176-8260 Sentence denotes Bacterial species for which more than 30 sequences were found are shown in Table 1 .
TextSentencer_T60 8261-8492 Sentence denotes These included Haemophilus influenzae, Streptococcus pneumoniae and Moraxella catarrhalis, which are known to frequently colonize the nasopharynx of infants and children and are also common pathogens in the respiratory tract [15] .
TextSentencer_T61 8493-8678 Sentence denotes However, since chemical and physical purification for virus was employed prior to sequencing all bacterial findings are likely to be biased and are not representative for these samples.
TextSentencer_T62 8679-8828 Sentence denotes To further investigate this likely bias the bacterial content was analyzed further in regard to ribosomal RNA (rRNA) or genomic origin, see Table 2 .
TextSentencer_T63 8829-8960 Sentence denotes As seen in this table, 61% of the leakage of bacterial sequences into the RNA pool was rRNA while rRNA was only 5% of the DNA pool.
TextSentencer_T64 8961-9102 Sentence denotes A small portion (1.4%) of the contigs (Figure 2A ) was classified as being from other organisms, besides bacteria, mammals or animal viruses.
TextSentencer_T65 9103-9228 Sentence denotes A further split into NCBI taxonomy divisions [13] defined by closest homolog of these sequences was performed, see Table S4 .
TextSentencer_T66 9229-9318 Sentence denotes The sequences included hits to various NCBI divisions of life including phages and fungi.
TextSentencer_T67 9319-9422 Sentence denotes Considering all sequences, all NCBI divisions turned out to be represented by at least a few sequences.
TextSentencer_T68 9423-9578 Sentence denotes Finally, approximately 12% of the sequences could not be classified, since no homolog was found (e-value > 10^-3) or there were contradicting database hits.
TextSentencer_T69 9579-9636 Sentence denotes In Figure 2A this category is referred to as 'undefined'.
TextSentencer_T70 9637-9768 Sentence denotes The ambiguous part of 'undefined' (23974/35113) was split into taxonomy divisions, using the closest homolog (as if non-ambiguous).
TextSentencer_T71 9769-9876 Sentence denotes The resulting, uncertain, major divisions were environmental samples, primate and bacteria (see Table S5 ).
TextSentencer_T72 9877-9908 Sentence denotes Contigs showing viral homology.
TextSentencer_T73 9909-10097 Sentence denotes Viruses made up 39% (n = 110,931) of the reads derived from contigs and thus represented the largest portion of the samples after human and repetitive sequences were removed ( Figure 2A ).
TextSentencer_T74 10098-10282 Sentence denotes Contigs found to be of viral origin were further divided into virus families based on their closest homolog ( Figure 4A ) and Table 3 shows the complete list of the identified viruses.
TextSentencer_T75 10283-10442 Sentence denotes To provide a more reliable list of families and species the list was manually curated to address yet unclassified strains which dilute the species designation.
TextSentencer_T76 10443-10541 Sentence denotes Furthermore, alignments assigned an e-value ≥ 1e-5, 184 sequences, could not be reliably classified.
TextSentencer_T77 10542-10738 Sentence denotes The initial overview showed, as expected, that the main virus species that have previously been associated with LRTI (lower respiratory tract infections) in children were present in these samples.
TextSentencer_T78 10739-10876 Sentence denotes For example, families such as Paramyxoviridae, Orthomyxoviridae and Picornaviridae constituted a large proportion of the viral sequences.
TextSentencer_T79 10877-11076 Sentence denotes Within these families several species/types were found and especially within the Picornaviridae family many divergent sequences were found, suggesting the presence of potential new types (see below).
TextSentencer_T80 11077-11224 Sentence denotes Furthermore, we also identified human bocavirus and a multitude of less common viruses, such as measles virus, circovirus and human picobirnavirus.
TextSentencer_T81 11225-11308 Sentence denotes Human picobirnavirus has not previously been described in nasopharyngeal aspirates.
TextSentencer_T82 11309-11590 Sentence denotes Amongst the species found at high titer, such as HRV (human rhinovirus) from Picornaviridae, there was clear evidence of strains that were represented only by a few sequence reads, and we therefore concluded that not all viral sequences were covered by the four 454 sequencing runs.
TextSentencer_T83 11591-11744 Sentence denotes Even so, the analysis clearly showed that a number of hitherto unknown potential pathogens could be and have been discovered using our strategy [5, 16] .
TextSentencer_T84 11746-12007 Sentence denotes Common causes for severe lower respiratory tract infections in children, and to some extent in adults, are human respiratory syncytial virus (hRSV), human metapneumovirus (hMPV), human parainfluenza virus (hPIV), influenza virus, and human rhinovirus [17, 18] .
TextSentencer_T85 12008-12163 Sentence denotes These viruses belong to three families of RNA viruses; Paramyxoviridae (hRSV, hMPV, and hPIV), Orthomyxoviridae (influenza virus) and Picornaviridae (HRV).
TextSentencer_T86 12164-12265 Sentence denotes These three families together comprised almost 90% of the viral contigs in the samples ( Figure 4A ).
TextSentencer_T87 12266-12282 Sentence denotes Paramyxoviridae.
TextSentencer_T88 12283-12411 Sentence denotes The most abundant virus family in these samples was Paramyxoviridae, which accounted for 38% of the viral content ( Figure 4A ).
TextSentencer_T89 12412-12564 Sentence denotes The sequences from this family included 80% human respiratory syncytial virus (hRSV) related reads and 15% human metapneumovirus (hMPV), see Figure 4B .
TextSentencer_T90 12565-12700 Sentence denotes This confirmed previous studies of children with severe lower respiratory tract infections, where both hRSV and hMPV were common [17] .
TextSentencer_T91 12701-12890 Sentence denotes Approximately half of the hRSV homologs were contigs of more than one read and the nucleotide identity to known strains of hRSV varied from 82-100%, for alignments covering at least 100 bp.
TextSentencer_T92 12891-13085 Sentence denotes The contigs homologous to hRSV with identity below 90% could potentially represent new types of hRSV and are spread amongst several genes, including L (large), N (nucleoprotein) and G (glycoprotein).
TextSentencer_T93 13086-13269 Sentence denotes However, all of these are single read contigs where read quality could affect the results and without longer contigs it is impossible to investigate the possibility of new hRSV types.
TextSentencer_T94 13270-13421 Sentence denotes Similarly, approximately half of the hMPV homologs were also contigs of several reads and the nucleotide identity to known strains varied from 88-100%.
TextSentencer_T95 13422-13537 Sentence denotes The RNA-derived library contained 52 contigs (103 reads, see Figure 4B ) that were homologous to the measles virus.
TextSentencer_T96 13538-13658 Sentence denotes The nucleotide identity towards known sequences varied from 91% to 100% (for local alignments covering at least 100 bp).
TextSentencer_T97 13659-13730 Sentence denotes Due to high vaccination coverage, measles outbreaks are rare in Sweden.
TextSentencer_T98 13731-13890 Sentence denotes However, measles cases were reported during the sampling time period, and the measles sequences were most likely derived from one or more actual measles cases.
TextSentencer_T99 13891-13983 Sentence denotes In order to verify this, the sequences were compared to the MMR vaccine strain (AF266290.1).
TextSentencer_T100 13984-14218 Sentence denotes 34 out of 42 contigs (only accounting for contigs which had an alignment towards a database sequence that covered at least 100 bp) showed lower nucleotide identity compared to the vaccine strain than to other measles database entries.
TextSentencer_T101 14219-14295 Sentence denotes Six out of these 34 showed an alignment identity difference of more than 3%.
TextSentencer_T102 14296-14504 Sentence denotes All these six were alignments of more than 200 bp, among them the longest contig of 402 bp, which showed 90.7% identity to the vaccine strain while having 95% identity towards the T11wild strain (AB481087.1).
TextSentencer_T103 14505-14660 Sentence denotes This indicated that the measles virus sequences that were found in these samples are derived from wildtype measles rather than exposure to the MMR vaccine.
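The vaccine-strain comparison described in the preceding sentences can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the `ContigHit` record, its field names, and all example values except the 402 bp contig (90.7% vs. 95% identity) are hypothetical. The 100 bp minimum alignment length and the 3% identity-difference threshold come from the text; treating that difference as percentage points is an assumption.

```python
from dataclasses import dataclass

MIN_ALIGNMENT_BP = 100   # minimum alignment length considered (from the text)
MIN_IDENTITY_GAP = 3.0   # identity difference, assumed to be percentage points


@dataclass
class ContigHit:
    """Hypothetical record for one measles-like contig."""
    name: str
    alignment_bp: int
    identity_vaccine: float     # % identity to the vaccine strain (AF266290.1)
    identity_best_other: float  # % identity to the best other measles entry


def closer_to_wildtype(hits):
    """Contigs with a long enough alignment that match the vaccine strain worse."""
    eligible = [h for h in hits if h.alignment_bp >= MIN_ALIGNMENT_BP]
    return [h for h in eligible if h.identity_vaccine < h.identity_best_other]


def clearly_wildtype(hits):
    """Subset whose identity gap exceeds the threshold."""
    return [
        h for h in closer_to_wildtype(hits)
        if h.identity_best_other - h.identity_vaccine > MIN_IDENTITY_GAP
    ]


hits = [
    ContigHit("contig_402bp", 402, 90.7, 95.0),    # values reported in the text
    ContigHit("hypothetical_a", 150, 98.0, 99.0),  # made-up example
    ContigHit("hypothetical_b", 80, 90.0, 99.0),   # made-up, alignment too short
    ContigHit("hypothetical_c", 220, 99.5, 99.0),  # made-up, closer to vaccine
]

print([h.name for h in closer_to_wildtype(hits)])
print([h.name for h in clearly_wildtype(hits)])
```

Applied to real data, the two functions would correspond to the "34 out of 42" and "six out of these 34" counts, respectively.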
TextSentencer_T104 14661-14676 Sentence denotes Picornaviridae.
TextSentencer_T105 14677-14795 Sentence denotes The second most abundant virus family was Picornaviridae, which accounted for 31% of the sequence reads ( Figure 4A ).
TextSentencer_T106 14796-14909 Sentence denotes The family was split further into rhinovirus A (65%), rhinovirus B (0.1%) and rhinovirus C (35%), see Figure 4B .
TextSentencer_T107 14910-15052 Sentence denotes The rhinovirus A and B homologs showed more than 90% residue identity to known strains while the rhinovirus C homologs showed more divergence.
TextSentencer_T108 15053-15153 Sentence denotes Among the rhinovirus C sequences, two long contigs, spanning the whole VP1 gene, could be assembled.
TextSentencer_T109 15154-15312 Sentence denotes Both contigs shared less than 80% amino acid identity to known rhinoviruses, which likely suggests the presence of novel types of rhinovirus C in our samples.
TextSentencer_T110 15313-15458 Sentence denotes Human Rhinovirus C (HRV-C) of the Enterovirus genus is a recently discovered species that is associated with severe respiratory infections [19] .
TextSentencer_T111 15459-15566 Sentence denotes The first of the two sequences consisted of 6,858 bp in 7,671 reads thus spanning almost the entire genome.
TextSentencer_T112 15567-15860 Sentence denotes Following the guidelines recently proposed as demarcation criteria for novel types of the HRV-C species [20, 21], this genome was deposited into GenBank under the accession number JF436925 and reported to the Picornaviridae Study Group (2010-10-19).
TextSentencer_T113 15861-15977 Sentence denotes The study group has tentatively designated the provided sequence as the prototype sequence of a novel type, HRV-C35.
TextSentencer_T114 15978-16136 Sentence denotes The phylogenetic relationship between HRV-C35, other reported members of the HRV-C and representative types of the HRV-A and -B species is shown in Figure 5 .
TextSentencer_T115 16137-16336 Sentence denotes The second HRV-C sequence which also covered almost the entire genome (6,482 bp in 898 reads) was identical to HRV-C34 (unpublished results), which we previously extracted from these patient samples.
TextSentencer_T116 16337-16527 Sentence denotes The metagenomic sequence showed two nucleotide substitutions (A to C and W to A) when aligned to the Sanger sequenced and PCR verified sequence previously extracted and submitted as HRV-C34.
TextSentencer_T117 16528-16545 Sentence denotes Orthomyxoviridae.
TextSentencer_T118 16546-16679 Sentence denotes The third most abundant family was Orthomyxoviridae, which accounted for 21% of the viral sequences of the sample pool ( Figure 4A ).
TextSentencer_T119 16680-16866 Sentence denotes The majority (96%) of contigs from this family were homologous to the influenza A virus, and the remaining contigs belonged to influenza B (2.3%) and to influenza C (1.7%) ( Figure S1 ).
TextSentencer_T120 16867-17018 Sentence denotes All of the influenza A homologs, except two, were most closely related to the H3N2 subtype, while the two exceptions were most closely related to H9N2.
TextSentencer_T121 17019-17127 Sentence denotes For these two exceptions, the difference in identity between the H9N2 and H3N2 database sequences was small.
TextSentencer_T122 17128-17358 Sentence denotes In context of the vast amounts of H3N2 in the samples and since H9N2 is a bird strain of influenza [22] , although human infections have been described [23] , it is more likely that these two sequences were in fact of H3N2 origin.
TextSentencer_T123 17359-17446 Sentence denotes H3N2 was the predominant influenza A subtype circulating in Stockholm in 2004 and 2005.
TextSentencer_T124 17447-17680 Sentence denotes In addition to the more prevalent viruses, we also observed a multitude of less well-represented viruses, ranging from known pathogens with/without confirmed human respiratory tract pathogenicity to likely environmental contaminants.
TextSentencer_T125 17681-17781 Sentence denotes The results are summarized in Figure 4A and Table 3 and we will highlight some of the findings here.
TextSentencer_T126 17782-17813 Sentence denotes Human bocavirus (Parvoviridae).
TextSentencer_T127 17814-18035 Sentence denotes The sample contained 161 contigs that were homologous to Parvoviridae, and the vast majority (154) were close homologs to human bocavirus.
TextSentencer_T128 18036-18163 Sentence denotes Table 3. The contigs are split by virus species, defined through the closest homolog, and sorted by descending total number of derived reads and contigs. The species list has been manually curated and grouped in cases where as yet unclassified strains dilute the species designation.
TextSentencer_T129 18164-18219 Sentence denotes Species producing a single read have also been removed.
TextSentencer_T130 18220-18463 Sentence denotes Furthermore, all alignments with an e-value at or above 1e-5 were ignored (this excluded 184 sequences, for which no species designation is provided). doi:10.1371/journal.pone.0030875.t003
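The curation rules stated in the Table 3 caption (ignore alignments with an e-value at or above 1e-5, drop species represented by a single read, sort by descending read and contig totals) can be sketched as below. This is a minimal sketch, not the authors' code: the input tuple layout, the function name, and the example species labels are hypothetical, and sorting by reads first and contigs second is an assumption about how the caption's ordering was applied. The manual curation and grouping step is not modeled.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5  # alignments at or above this e-value are ignored (from the caption)


def summarize_by_species(hits):
    """Summarize contig hits per species.

    `hits` is an iterable of (species, e_value, n_reads) tuples, one per contig
    (a hypothetical layout). Returns (species, total_reads, total_contigs) rows.
    """
    reads = defaultdict(int)
    contigs = defaultdict(int)
    for species, e_value, n_reads in hits:
        if e_value >= E_VALUE_CUTOFF:
            continue  # no species designation for weak alignments
        reads[species] += n_reads
        contigs[species] += 1
    # Drop species that produced only a single read.
    rows = [(s, reads[s], contigs[s]) for s in reads if reads[s] > 1]
    # Assumed ordering: total reads first, then number of contigs, both descending.
    rows.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return rows


example_hits = [
    ("virus_a", 1e-20, 10),
    ("virus_a", 1e-8, 5),
    ("virus_b", 1e-30, 1),   # single read, removed
    ("virus_c", 1e-5, 50),   # e-value at the cutoff, ignored
    ("virus_d", 1e-12, 40),
]

for row in summarize_by_species(example_hits):
    print(row)
```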
TextSentencer_T131 18464-18612 Sentence denotes Among these, all but two contigs shared the highest nucleotide identity with two previously described strains, st1 and st2 [5] (DQ000495, DQ000496).
TextSentencer_T132 18613-18725 Sentence denotes The samples from which human bocavirus was originally discovered were included in the studied sample pools [5] .
TextSentencer_T133 18726-18814 Sentence denotes Thus, st1 and st2 were likely to be the dominant bocavirus sequences in the DNA library.
TextSentencer_T134 18815-18961 Sentence denotes The two longest human bocavirus contigs were both homologous to st2 (in total 3,860 reads) and together they covered almost the entire st2 genome.
TextSentencer_T135 18962-19122 Sentence denotes Due to the high degree of similarity between the st1 and the st2 isolates it was impossible to reliably assign the shorter contigs to either of the two strains.
TextSentencer_T136 19123-19153 Sentence denotes Polyomavirus (Polyomaviridae).
TextSentencer_T137 19154-19306 Sentence denotes 12 contigs of 25 reads were identified as KI polyomavirus, a human polyomavirus that was originally discovered from a sample included in the pool [16] .
TextSentencer_T138 19307-19395 Sentence denotes The low number of reads suggested a very low genome copy number in the original samples.
TextSentencer_T139 19396-19475 Sentence denotes There is no evidence indicating that KI polyomavirus is a respiratory pathogen.
TextSentencer_T140 19476-19637 Sentence denotes Various human polyomaviruses appear to occur at low copy numbers in the respiratory tract, although their pathogenic role may lie in other organs [24] .
TextSentencer_T141 19638-19672 Sentence denotes Torque teno virus (Anelloviridae).
TextSentencer_T142 19673-19795 Sentence denotes The torque teno virus (TTV) was first discovered in a search for potential causative agents of non-A to G hepatitis [25] .
TextSentencer_T143 19796-19975 Sentence denotes TTV has since proven to be an entire family of viruses with remarkable sequence heterogeneity [26] , which also includes short anelloviruses called torque teno mini virus (TTMV).
TextSentencer_T144 19976-20089 Sentence denotes TTV and related anelloviruses have been studied extensively, but they have not been shown to be pathogenic [27] .
TextSentencer_T145 20090-20142 Sentence denotes TTV can be detected in >90% of healthy adults [28] .
TextSentencer_T146 20143-20317 Sentence denotes The sequence identities for TTV/TTMV homologs in our sample ranged down to 35% amino acid identity (for local alignments covering more than 100 bp).
TextSentencer_T147 20318-20437 Sentence denotes This indicates that the samples could possibly contain even more distant anelloviruses that were not detected by BLAST.
TextSentencer_T148 20438-20478 Sentence denotes Human picobirnavirus (Picobirnaviridae).
TextSentencer_T149 20479-20617 Sentence denotes The RNA-derived library contained a single contig with 70% amino acid identity to the RNA-dependent RNA polymerase of human picobirnavirus.
TextSentencer_T150 20618-20745 Sentence denotes It could be amplified directly from one nasopharyngeal aspirate sample, which confirmed that it originated from a human sample.
TextSentencer_T151 20746-20849 Sentence denotes However, the copy number appeared to be low and the sequence could not immediately be extended further.
TextSentencer_T152 20850-20986 Sentence denotes Human picobirnavirus is the common designation of a range of variable double-stranded RNA-viruses frequently found in human feces [29] .
TextSentencer_T153 20987-21077 Sentence denotes Human picobirnaviruses are poorly studied, and nothing is known about their pathogenicity.
TextSentencer_T154 21078-21175 Sentence denotes It is not even clear whether humans or intestinal microorganisms are the hosts for these viruses.
TextSentencer_T155 21176-21336 Sentence denotes The sequence we recovered was relatively distant from previously reported picobirnavirus sequences, and its classification as a virus is not completely certain.
TextSentencer_T156 21337-21372 Sentence denotes Bell pepper virus (Endornaviridae).
TextSentencer_T157 21373-21479 Sentence denotes A single contig nearly identical to Bell Pepper Virus (252/253 identical nucleotides, DQ242514) was found.
TextSentencer_T158 21480-21696 Sentence denotes Previously, the occurrence of Pepper Mild Mottle Virus has been described as an indicator of fecal pollution of water as the virus is ingested with food and passes through the gastro-intestinal tract of humans [30] .
TextSentencer_T159 21697-21803 Sentence denotes It is thus likely that in this case, the virus may have contaminated the nasopharynx of a sampled patient.
TextSentencer_T160 21804-21827 Sentence denotes Nanoviridae-like virus.
TextSentencer_T161 21828-21951 Sentence denotes A single read of 252 bp showed weak similarity to the replication initiation protein of banana bunchy top virus (AAG44003).
TextSentencer_T162 21952-22052 Sentence denotes The two sequences were 45.2% identical throughout the amino acid alignment with an e-value of 5e-10.
TextSentencer_T163 22053-22132 Sentence denotes In comparison, the lowest e-value obtained when excluding viral hits was 7e-04.
TextSentencer_T164 22133-22187 Sentence denotes Nanoviridae are currently only known to infect plants.
TextSentencer_T165 22188-22426 Sentence denotes However, sequence homology between Nanoviridae and porcine circovirus suggests that porcine circovirus is distantly related to nanovirus [31] and we cannot exclude the possibility that this fragment could originate from a mammalian virus.
TextSentencer_T166 22427-22500 Sentence denotes Densovirus-like and circovirus-like contigs (Parvoviridae, Circoviridae).
TextSentencer_T167 22501-22670 Sentence denotes Four contigs from the Parvoviridae family were homologous to various species of the subfamily Densovirinae, and likely represent a hitherto undescribed densovirus species.
TextSentencer_T168 22671-22721 Sentence denotes Densovirus species are known to infect arthropods.
TextSentencer_T169 22722-22855 Sentence denotes Presence of the densovirus-like sequences was linked to the use of a specific DNA extraction kit (QIAamp DNA Blood Mini Kit, Qiagen).
TextSentencer_T170 22856-22995 Sentence denotes The densovirus-like sequences were found in all samples, including water, when extracted by this kit, but not when extracted by other kits.
TextSentencer_T171 22996-23105 Sentence denotes It was concluded that the densovirus-like sequences were not present in the samples, but were reagent-derived.
TextSentencer_T172 23106-23310 Sentence denotes The sample contained 11 contigs where the closest homolog was found within the Circoviridae family, with the amino acid identity ranging from 30% up to 78% (for local alignments covering at least 100 bp).
TextSentencer_T173 23311-23442 Sentence denotes For many of these contigs the closest homologs were circovirus-like genomes that were recently found in reclaimed wastewater [32] .
TextSentencer_T174 23443-23546 Sentence denotes The sequence diversity indicates that the contigs may be derived from more than one circovirus species.
TextSentencer_T175 23547-23625 Sentence denotes One of these contigs was further investigated through PCR of original samples.
TextSentencer_T176 23626-23765 Sentence denotes The circovirus-like sequence was not repeatedly amplifiable from any of the original samples, but gave intermittently positive PCR results.
TextSentencer_T177 23766-23847 Sentence denotes This was interpreted as low copy number presence of the circovirus-like sequence.
TextSentencer_T178 23848-23955 Sentence denotes The experience from the densovirus-like sequence prompted an investigation of different extraction methods.
TextSentencer_T179 23956-24047 Sentence denotes Intermittent PCR positivity appeared only when QIAamp DNA Blood Mini Kit (Qiagen) was used.
TextSentencer_T180 24048-24153 Sentence denotes We concluded that this sequence was likely reagent-derived and did not pursue any further investigations.
TextSentencer_T181 24154-24286 Sentence denotes We have conducted a metagenomic analysis of both the DNA and RNA viromes in patients with severe lower respiratory tract infections.
TextSentencer_T182 24287-24423 Sentence denotes Approximately 700,000 sequence reads were produced using 454 sequencing, corresponding to 110 Mbp when combining the RNA and DNA sample.
TextSentencer_T183 24424-24576 Sentence denotes The samples showed a great diversity in the viral flora, with a total of 4,757 contigs originating from 39 species and an even larger number of strains.
TextSentencer_T184 24577-24743 Sentence denotes We have previously shown that this method is sensitive and thus has high potential for virus identification, even when a small number of sequences were produced [5].
TextSentencer_T185 24744-24864 Sentence denotes It is clear that the sensitivity of the current protocol is greatly increased, thanks to the capacity of 454 sequencing.
TextSentencer_T186 24865-25035 Sentence denotes While it is possible that a viral genome can go undetected, it is likely that most viruses in the samples are represented in the sequence data, even those with low titer.
TextSentencer_T187 25036-25181 Sentence denotes However, we cannot rule out that some viruses were excluded prior to sequencing, for example by the filtering steps in the viral enrichment protocol.
TextSentencer_T188 25182-25429 Sentence denotes We found that by reducing our datasets by removing repetitive reads and reads of human origin pre-assembly, the total time required for sequence assembly and analysis could be significantly reduced and the complexity of the assembly was decreased.
TextSentencer_T189 25430-25589 Sentence denotes This also reduced the risk of mis-assemblies [33] and in our case most notably the risk of chimeric contigs consisting of reads of both human and viral origin.
TextSentencer_T190 25590-25770 Sentence denotes The filtering criteria were set so that a very high degree of homology to repeats and human sequences was required for removal, in order to avoid the loss of sequences of interest.
TextSentencer_T191 25771-25927 Sentence denotes Viral metagenomics assembly is a non-trivial computational task, even after pre-assembly screening, and there are no specific metagenomic assembly programs.
TextSentencer_T192 25928-26021 Sentence denotes However, most genome assemblers appear to solve metagenomic assemblies relatively well [34].
TextSentencer_T193 26022-26259 Sentence denotes We have used the MIRA program [12], since Newbler [35] incorrectly tagged clearly non-repetitive viral sequences as repetitive, most likely due to the variation in coverage caused by uneven titers of the viruses in the original samples.
TextSentencer_T194 26260-26418 Sentence denotes We used a more complex method for classification of sequences than in previous studies, where native BLASTx homolog classification was used for the most part.
TextSentencer_T195 26419-26520 Sentence denotes We improved on this by using thorough nucleotide and translated nucleotide comparison in combination.
TextSentencer_T196 26521-26639 Sentence denotes Also, an additional comparison step was added after BLAST hits were identified in order to improve the classification.
TextSentencer_T197 26640-26735 Sentence denotes In this step, the scores of the viral hits were compared with the scores of the non-viral hits.
TextSentencer_T198 26736-26922 Sentence denotes As an example, this measurement could reveal the difference between a distinctly viral RNA-dependent RNA polymerase (RNAP) and an RNAP that also shared high identity to a bacterial RNAP.
TextSentencer_T199 26923-27062 Sentence denotes Viral sequences that were not sufficiently distinct from other categories, most commonly bacteria, could therefore be given lower priority.
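The score comparison described in the sentences above can be sketched as follows. This is only an illustration under stated assumptions: each BLAST hit is assumed to have been labelled with a broad category and to carry a bit score, and the function name, the category labels, and the `min_ratio` cut-off are hypothetical. The text does not state the exact measure that was used.

```python
from typing import Iterable, Tuple


def classify_priority(hits: Iterable[Tuple[str, float]], min_ratio: float = 1.5) -> str:
    """Compare the best viral score with the best non-viral score.

    hits: (category, bit_score) pairs, where category is e.g. "virus" or "bacteria".
    min_ratio: hypothetical cut-off; a viral hit counts as distinct only if its
    score is at least min_ratio times the best non-viral score.
    """
    hits = list(hits)
    viral = [score for category, score in hits if category == "virus"]
    other = [score for category, score in hits if category != "virus"]
    if not viral:
        return "non-viral"
    if not other:
        return "viral-high"  # no competing non-viral hit at all
    return "viral-high" if max(viral) >= min_ratio * max(other) else "viral-low"
```

In this sketch, a contig whose polymerase hit scores almost as well against a bacterial sequence as against a viral one would come out as `viral-low` and could be given lower priority.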
TextSentencer_T200 27063-27352 Sentence denotes The vast majority of viruses identified in this study belonged to three abundant families that were dominated by four virus species, namely Paramyxoviridae (hRSV and hPIV), Orthomyxoviridae (influenza virus) and Picornaviridae (HRV), all known to be present in the human respiratory tract.
TextSentencer_T201 27353-27600 Sentence denotes We also found other viruses known to replicate in these tissues, including human bocavirus, human coronavirus and measles virus, as well as confirmed human viruses for which no pathogenicity has been described, such as TTV and KI polyomavirus.
TextSentencer_T202 27601-27747 Sentence denotes Thus, regarding previously known viruses, the results of this study confirmed previous studies of human respiratory tract viruses [3, 6, 15, 17].
TextSentencer_T203 27748-27876 Sentence denotes In addition, our results expanded the number of identified strains and possibly types and species of anellovirus and rhinovirus.
TextSentencer_T204 27877-28037 Sentence denotes In particular, we identified one likely new type of human rhinovirus C designated by the Study Group of Picornaviruses as the prototype of human rhinovirus C35.
TextSentencer_T205 28038-28219 Sentence denotes While a vast multitude of viruses was found, we cannot rule out that the viral enrichment protocol excluded some viruses, particularly large ones, that may be relevant human pathogens.
TextSentencer_T206 28220-28410 Sentence denotes A large proportion of the sequences from viruses and bacteria have been confirmed to be a part of the human microflora and many were in fact from known pathogens of the nasopharyngeal tract.
TextSentencer_T207 28411-28627 Sentence denotes However, while the sampled population suffered from severe lower respiratory tract infections, the nasopharyngeal aspirates collected are likely to contain viruses that mainly replicate in the upper respiratory tract.
TextSentencer_T208 28628-28724 Sentence denotes The mucosa may also contain temporary microorganisms that are not part of the normal microflora.
TextSentencer_T209 28725-28807 Sentence denotes They may come from the environment, for example from dust and from food and water.
TextSentencer_T210 28808-28886 Sentence denotes It is likely that a proportion of the sequences came from such microorganisms.
TextSentencer_T211 28887-28990 Sentence denotes In addition, the reagents used for sample processing may contribute both viral and bacterial sequences.
TextSentencer_T212 28991-29143 Sentence denotes Furthermore, as no healthy controls were sampled, the significance of any finding as a cause of lower respiratory tract infection cannot be established.
TextSentencer_T213 29144-29330 Sentence denotes We have attempted to provide a complete picture of the viral content of these samples and have applied stringent quality filters to ensure correct classification of the metagenomic data.
TextSentencer_T214 29331-29468 Sentence denotes Even so, incomplete databases made accurate classification problematic in some cases where no, or only very distant, homologs were found.
TextSentencer_T215 29469-29544 Sentence denotes We anticipate that this problem will decrease as the public databases grow.
TextSentencer_T216 29545-29604 Sentence denotes A related problem was low scores caused by short sequences.
TextSentencer_T217 29605-29729 Sentence denotes Some of our virus findings will need to be confirmed by additional sequencing to provide the remainder of the virus genomes.
TextSentencer_T218 29730-29827 Sentence denotes We have also provided a brief overview of the bacterial contigs where we found several pathogens.
TextSentencer_T219 29828-30010 Sentence denotes While these results confirm previous findings [15], the viral purification performed means that we cannot provide an unbiased characterization of the bacterial content in these patients.
TextSentencer_T220 30011-30091 Sentence denotes It is clear that viral metagenomics provides a crucial tool for virus discovery.
TextSentencer_T221 30092-30253 Sentence denotes With this approach, no a priori information is needed, unlike in directed PCR assays, and viruses that are difficult to propagate in cell culture can be discovered.
TextSentencer_T222 30254-30439 Sentence denotes Furthermore, our reconfirmation of the sequence of HRV-C34 shows the remarkable accuracy of the metagenomic assembly and its ability to recover an HRV-C isolate in the presence of other isolates.
TextSentencer_T223 30440-30611 Sentence denotes Our results highlight the strength of the method to not only identify novel viruses, but also to identify viruses that were likely to be missed by ordinary clinical tests.
TextSentencer_T224 30612-30736 Sentence denotes As sequencing continuously becomes more available and inexpensive, it could also become a viable clinical diagnostic method.
TextSentencer_T225 30737-30870 Sentence denotes A schematic overview of the entire process, from sample collection and preparation through data analysis, is given in Figure 1.
TextSentencer_T226 30871-30996 Sentence denotes The first two steps describe sample collection up to sequencing and the following steps (3 to 6) describe in silico analyses.
TextSentencer_T227 30997-31100 Sentence denotes All patient samples were analyzed anonymously and the study was approved by the local ethics committee.
TextSentencer_T228 31101-31204 Sentence denotes Two hundred and ten randomly selected, anonymized, nasopharyngeal aspirates were included in the study.
TextSentencer_T229 31205-31376 Sentence denotes The samples were originally submitted to the Karolinska University Laboratory, Stockholm, Sweden from March 2004 to May 2005 for diagnosis of respiratory tract infections.
TextSentencer_T230 31377-31449 Sentence denotes The majority of samples were from the children's hospital at Karolinska.
TextSentencer_T231 31450-31510 Sentence denotes 70% of the samples were derived from children under 7 years.
TextSentencer_T232 31511-31648 Sentence denotes The remaining 30% of samples were mainly from adults (mean age 44 years, range 8-92), and collected for diagnosis of suspected influenza.
TextSentencer_T233 31649-31755 Sentence denotes The symptoms of the individual patients were not recorded due to the study design with anonymized samples.
TextSentencer_T234 31756-31884 Sentence denotes However, the general policy of the children's hospital is to sample only inpatients and not outpatients for respiratory viruses.
TextSentencer_T235 31885-32024 Sentence denotes It is therefore reasonable to assume that the majority of samples are from patients with symptoms severe enough to require hospitalization.
TextSentencer_T236 32025-32148 Sentence denotes The most common symptoms reported in children hospitalized for respiratory tract infections are fever, cough, and wheezing.
TextSentencer_T237 32149-32208 Sentence denotes The project is based on analysis of human clinical samples.
TextSentencer_T238 32209-32323 Sentence denotes In order to avoid ethical complications, samples were anonymized and cannot be traced back to individual patients.
TextSentencer_T239 32324-32485 Sentence denotes The study was approved by the local ethics committee at the Karolinska Institute, The Regional Ethical Review Board, Stockholm, Dnr 02-212, 02-422, and 04-836/4.
TextSentencer_T240 32486-32613 Sentence denotes Since the samples are completely anonymous, the ethical board determined that no informed consent from the patients was needed.
TextSentencer_T241 32614-32726 Sentence denotes The samples were processed in 13 pools with 8-24 samples each, using our previously published protocol [5, 16].
TextSentencer_T242 32727-32910 Sentence denotes In brief, samples were pooled and each pool was divided into two aliquots, which were filtered through 0.22- and 0.45-µm-pore-size disc filters (Millex GV/HV; Millipore), respectively.
TextSentencer_T243 32911-32999 Sentence denotes Both aliquots were ultracentrifuged at 41,000 rpm in an SW41 rotor (Beckman) for 90 min.
TextSentencer_T244 33000-33112 Sentence denotes The resulting pellet was recovered, resuspended, and treated with DNase before DNA and RNA were extracted [36].
TextSentencer_T245 33113-33188 Sentence denotes Extracted DNA and RNA were amplified separately by "random PCR" [5, 37].
TextSentencer_T246 33189-33477 Sentence denotes The primer sequences were then removed from the amplification products by restriction enzyme digest before the amplification products were separated on an agarose gel, and fragments between approximately 400 and 1,500 bp in length were cut out and purified for use as sequencing template.
TextSentencer_T247 33478-33666 Sentence denotes This resulted in a total of 13 DNA libraries and 13 cDNA libraries, which were in turn pooled to two libraries representing the DNA content and the RNA content of the sample, respectively.
TextSentencer_T248 33667-33768 Sentence denotes The DNA and RNA-derived libraries were sequenced separately, using the 454 sequencing platform [35].
TextSentencer_T249 33769-33907 Sentence denotes The first sequencing run was performed on a GS20 instrument and the second sequencing run was performed on the enhanced GS FLX instrument.
TextSentencer_T250 33908-33969 Sentence denotes Both runs were of two plates each, one DNA and one RNA plate.
TextSentencer_T251 33970-34055 Sentence denotes The sequencing produced 703,790 reads in total; the runs are summarized in Table S1.
TextSentencer_T252 34056-34163 Sentence denotes This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession AFAP00000000.
TextSentencer_T253 34164-34235 Sentence denotes The version described in this paper is the first version, AFAP01000000.
TextSentencer_T254 34236-34336 Sentence denotes In addition, two new Rhinovirus C variants have been deposited, accession numbers JF436925-JF436926.
TextSentencer_T255 34337-34435 Sentence denotes Initial studies of the samples suggested that the RNA and DNA sequencing runs shared some content.
TextSentencer_T256 34436-34582 Sentence denotes Due to this overlap in sample content it was beneficial to analyze the 454 sequencing runs together, rather than as separate RNA and DNA analyses.
TextSentencer_T257 34583-34747 Sentence denotes By doing so, longer contigs could be formed by combining the reads from the two pools but it was still possible to deduce the origin of each read from the assembly.
TextSentencer_T258 34748-34843 Sentence denotes For example, each assembled contig was tagged with its origin: DNA, RNA, or a mixture of both.
TextSentencer_T259 34844-34917 Sentence denotes These in silico steps are shown in Figure 1 as 'Step 3' through 'Step 6'.
TextSentencer_T260 34918-34978 Sentence denotes Pre-assembly screening for low complexity and human content.
TextSentencer_T261 34979-35065 Sentence denotes The pre-assembly screening was performed using RepeatMasker [38] and NCBI BLAST [14].
TextSentencer_T262 35066-35198 Sentence denotes The screening process was performed in three steps where at each step any read which did not fulfill a pass criterion was discarded.
TextSentencer_T263 35199-35318 Sentence denotes The first step involved running the reads through RepeatMasker to produce masked FASTA files used for further analyses.
TextSentencer_T264 35319-35492 Sentence denotes Reads were discarded if more than 70% of the nucleotides were masked, or if the highest-scoring stretch (given a 1/-5 scoring model for non-N and N, respectively) was shorter than 50 bp.
TextSentencer_T265 35493-35605 Sentence denotes The 50 bp threshold was chosen as a cut-off as shorter sequences were both rare and provided little information.
TextSentencer_T266 35606-35748 Sentence denotes The thresholds for the repetitive classification model were determined empirically with the goal of purging only heavily repetitive sequences.
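The repeat screen described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes masked positions appear as N in the RepeatMasker output and interprets the "highest scoring stretch" as the maximal-scoring contiguous segment under the +1/-5 model. The function names `best_stretch` and `keep_read` are hypothetical.

```python
def best_stretch(masked_seq, non_n_score=1, n_score=-5):
    """Return (score, length) of the maximal-scoring contiguous stretch.

    Each non-N base scores +1 and each N scores -5 (assumed reading of the
    '1/-5 model'); the scan is a standard maximum-subarray pass.
    """
    best_score, best_len = 0, 0
    cur_score, cur_start = 0, 0
    for i, base in enumerate(masked_seq):
        cur_score += n_score if base in "Nn" else non_n_score
        if cur_score <= 0:
            # The running stretch is no longer worth extending: restart.
            cur_score, cur_start = 0, i + 1
        elif cur_score > best_score:
            best_score, best_len = cur_score, i - cur_start + 1
    return best_score, best_len


def keep_read(masked_seq, max_masked_fraction=0.70, min_stretch=50):
    """Apply both pass criteria: masked fraction and stretch length."""
    if not masked_seq:
        return False
    masked = sum(1 for base in masked_seq if base in "Nn")
    if masked / len(masked_seq) > max_masked_fraction:
        return False
    _, length = best_stretch(masked_seq)
    return length >= min_stretch
```

Under this reading, a short run of masked bases inside a long unmasked read does not break the stretch, whereas a long masked run does.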
TextSentencer_T267 35749-35935 Sentence denotes The next two steps of the screening process were performed using NCBI BLAST, searching first against the NCBI Human Genome Transcripts database and then against the Human Genome database [13].
TextSentencer_T268 35936-36082 Sentence denotes At each consecutive step further reads were discarded if a homolog was found at or above 90% identity covering at least 80% of the query sequence.
TextSentencer_T269 36083-36286 Sentence denotes These thresholds were set after manual inspection of the sequence identity and coverage range, with the goal of removing reads that would still be classified as being of human origin after a full pipeline run.
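The identity and coverage criterion can be illustrated with a short sketch. It assumes that coverage is computed per alignment from the query start and end coordinates; the source does not say whether several alignments to the same subject were combined. The `Hit` type and function names are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    percent_identity: float  # e.g. 93.5
    query_start: int         # 1-based, inclusive
    query_end: int           # 1-based, inclusive


def query_coverage(hit, query_length):
    """Fraction of the query spanned by a single alignment."""
    span = abs(hit.query_end - hit.query_start) + 1
    return span / query_length


def has_close_homolog(hits, query_length, min_identity=90.0, min_coverage=0.80):
    """True if any hit reaches both the identity and coverage thresholds."""
    return any(
        hit.percent_identity >= min_identity
        and query_coverage(hit, query_length) >= min_coverage
        for hit in hits
    )
```

With `min_coverage=0.70`, the same check expresses the dismissal rule used later in the homolog searches.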
TextSentencer_T270 36287-36311 Sentence denotes De novo genome assembly.
TextSentencer_T271 36312-36435 Sentence denotes Sequence assembly was performed using the MIRA 3.0.5 software [12] with the parameters '-job=denovo,genome,accurate,454'.
TextSentencer_T272 36436-36533 Sentence denotes The resulting ACE-file was further analyzed and contig statistics were extracted for each contig.
TextSentencer_T273 36534-36671 Sentence denotes The following information was extracted: number of reads, contig coverage (min, max and mean) as well as sample origin (DNA or RNA pool).
TextSentencer_T274 36672-36793 Sentence denotes The final result of this process was regular FASTA output with all extracted information added to each FASTA header line.
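As an illustration of how the extracted contig statistics might be attached to a FASTA header, consider the sketch below. The header layout and the function name `contig_header` are purely hypothetical; the source does not specify the format that was actually used.

```python
def contig_header(name, n_reads, coverages, origin):
    """Build a FASTA header carrying per-contig statistics.

    coverages: per-position read depth for the contig (assumed input).
    origin:    sample pool label, 'DNA' or 'RNA'.
    """
    cov_min = min(coverages)
    cov_max = max(coverages)
    cov_mean = sum(coverages) / len(coverages)
    return (
        f">{name} reads={n_reads} cov_min={cov_min} "
        f"cov_max={cov_max} cov_mean={cov_mean:.2f} origin={origin}"
    )
```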
TextSentencer_T275 36794-36811 Sentence denotes Homolog searches.
TextSentencer_T276 36812-37008 Sentence denotes The homolog searches conceptually consisted of searching against both a nucleotide and a protein database, with the nucleotide sequences translated in all six possible frames for the protein search (see 'Step 5' of Figure 1).
TextSentencer_T277 37009-37190 Sentence denotes The similarity search was partitioned into three levels of homology search, where at each level any sequence that could be reliably classified was removed from downstream analyses.
TextSentencer_T278 37191-37436 Sentence denotes For nucleotide homology search, the NCBI nt (minimally non-redundant nucleotide) database (Jan 2010) [13] was used and for the six-frame translated nucleotide homology search the NCBI nr (non-redundant protein) database (Jan 2010) [13] was used.
TextSentencer_T279 37437-37677 Sentence denotes At the first level, the search pipeline consisted of a nucleotide BLAST search using the MegaBLAST algorithm [39] with parameter settings optimized for finding homology around 90% identity (2/-3 reward/penalty and 5/2 gap open/extend cost).
TextSentencer_T280 37678-37840 Sentence denotes At each level, sequences for which a homolog was found with at least 90% identity covering 70% or more of the query sequence were dismissed from further searches.
TextSentencer_T281 37841-38045 Sentence denotes The second level of homology searches was performed using the BLASTn algorithm with parameter settings optimized for finding homologs of lower identity (4/-5 reward/penalty and 12/8 gap open/extend cost).
TextSentencer_T282 38046-38134 Sentence denotes The final search step was performed using the BLASTx algorithm with default parameters.
TextSentencer_T283 38135-38279 Sentence denotes All BLAST searches were performed using the default e-value cut-off of 10, while more stringent thresholds were employed in downstream analysis.
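The three search levels can be summarized as command lines. The sketch below uses option names from the current NCBI BLAST+ suite, which is an assumption: the study may have used the legacy BLAST tools, whose flags differ. File and database names are placeholders, and the tabular output format is chosen here only for convenience.

```python
# Per-level scoring settings taken from the text; level 3 uses defaults.
LEVEL_SETTINGS = {
    1: ["blastn", "-task", "megablast",
        "-reward", "2", "-penalty", "-3", "-gapopen", "5", "-gapextend", "2"],
    2: ["blastn", "-task", "blastn",
        "-reward", "4", "-penalty", "-5", "-gapopen", "12", "-gapextend", "8"],
    3: ["blastx"],
}


def blast_command(level, query_fasta, database, output_path):
    """Return the argument list for one search level (not executed here)."""
    return LEVEL_SETTINGS[level] + [
        "-query", query_fasta,
        "-db", database,
        "-out", output_path,
        "-evalue", "10",   # default cut-off, as stated in the text
        "-outfmt", "6",    # tabular output (an assumption of this sketch)
    ]
```

For example, `blast_command(1, "contigs.fasta", "nt", "level1.tsv")` yields a list that could be passed to `subprocess.run` on a machine with BLAST+ installed.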
TextSentencer_T284 38280-38404 Sentence denotes For each supplied query sequence, the BLAST program returns a pre-defined number (n) of likely homologs in descending order.
TextSentencer_T285 38405-38526 Sentence denotes Thus, potentially interesting homologs that were not within these n most likely homologs were not included in the result.
TextSentencer_T286 38527-38699 Sentence denotes For example, a bacterial RNA-polymerase homolog might not be within these n most likely homologs if there were n better (often very similar) viral homologs in the database.
TextSentencer_T287 38700-38936 Sentence denotes This complicates query analysis. To address the problem, each search step was partitioned into searches against four distinct subsets consisting of mammalian sequences, bacterial sequences, viral sequences and 'all other' sequences.
TextSentencer_T288 38937-38990 Sentence denotes These four subsets will here be denoted 'categories'.
TextSentencer_T289 38991-39158 Sentence denotes By dividing the database into these categories the highest scoring homologs of each group were identified, regardless of the score of homologs within other categories.
TextSentencer_T290 39159-39297 Sentence denotes For each query, at each level of database searches, the top-three hits of each category (mammals, bacteria, viruses and others) were kept.
TextSentencer_T291 39298-39429 Sentence denotes The hits were then ranked using bit-score instead of e-value, in order to avoid the database size bias introduced into the e-value.
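Keeping the three best hits per category, ranked by bit-score, might look like the following sketch. The dictionary keys `category` and `bit_score` and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def top_hits_per_category(hits, n=3):
    """Group hits by category and keep the n highest bit-scores in each.

    hits: iterable of dicts with at least 'category' and 'bit_score' keys.
    """
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["category"]].append(hit)
    return {
        category: sorted(group, key=lambda h: h["bit_score"], reverse=True)[:n]
        for category, group in grouped.items()
    }
```

Because each category is ranked separately, a strong bacterial hit is retained even when many viral hits score higher.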
TextSentencer_T292 39430-39463 Sentence denotes Comparative analysis of homologs.
TextSentencer_T293 39464-39575 Sentence denotes Chimeric reads may result from unspecific PCR reactions and such reads may in turn cause mis-assembled contigs.
TextSentencer_T294 39576-39722 Sentence denotes Therefore, upon completion of the second level of database searches (thorough search against nt using BLASTn) the hits of each query were analyzed.
TextSentencer_T295 39723-39910 Sentence denotes For each category (mammals, bacteria, viruses and others), a list of the same length as the query, called a score map, was created in which the highest bit-score covering each position was recorded.
TextSentencer_T296 39911-40081 Sentence denotes These score maps for each category were analyzed to allow splitting a sequence if two or more parts of the sequence had their closest homolog within different categories.
TextSentencer_T297 40082-40126 Sentence denotes The algorithm consisted of four major steps:
TextSentencer_T298 40127-40129 Sentence denotes 1.
TextSentencer_T299 40130-40352 Sentence denotes First, parts of the query sequence were identified, each being a continuous sub-sequence of the query, longer than 50 bp, for which the highest scoring (bit-score) homolog at every position belonged to the same category.
TextSentencer_T300 40353-40355 Sentence denotes 2.
TextSentencer_T301 40356-40536 Sentence denotes These parts were then analyzed and, if any two parts were found to match different categories, a potential split (in between the relevant two 'parts') of the sequence was evaluated.
TextSentencer_T302 40537-40539 Sentence denotes 3.
TextSentencer_T303 40540-40832 Sentence denotes The sequence was split into sub-sequences if, for both sub-sequences, the bit-score of the relevant category was on average at least 2 times as high as the bit-score of any other category covering the same sub-sequence, i.e. if the mean bit-score ratios for both sub-sequences were at least 2.
TextSentencer_T304 40833-40835 Sentence denotes 4.
TextSentencer_T305 40836-40980 Sentence denotes Finally, the resulting sub-sequences were re-queried against the database and the sub-sequences replaced the original sequence in the query-set.
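Steps 1 and 2 above rest on the per-category score maps. The sketch below builds the maps and extracts candidate parts. It is illustrative only: the hit tuple layout, the tie-breaking by category order, and the function names are assumptions.

```python
CATEGORIES = ("mammals", "bacteria", "viruses", "others")


def score_maps(query_length, hits):
    """Record, per category, the highest bit-score covering each position.

    hits: iterable of (category, query_start, query_end, bit_score) with
    1-based inclusive query coordinates (assumed layout).
    """
    maps = {category: [0.0] * query_length for category in CATEGORIES}
    for category, start, end, bit_score in hits:
        row = maps[category]
        for pos in range(start - 1, end):
            if bit_score > row[pos]:
                row[pos] = bit_score
    return maps


def find_parts(maps, min_length=50):
    """Return (category, start, end) runs longer than min_length.

    Coordinates are 0-based and half-open. Positions with no hit belong to
    no category; ties go to the first category in CATEGORIES.
    """
    length = len(next(iter(maps.values())))
    parts = []
    run_category, run_start = None, 0
    for pos in range(length + 1):
        category = None
        if pos < length:
            best = max(CATEGORIES, key=lambda c: maps[c][pos])
            if maps[best][pos] > 0:
                category = best
        if category != run_category:
            if run_category is not None and pos - run_start > min_length:
                parts.append((run_category, run_start, pos))
            run_category, run_start = category, pos
    return parts
```

A contig whose first half is best explained by viral hits and whose second half by bacterial hits yields two parts, which makes it a candidate for splitting.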
TextSentencer_T306 40981-41239 Sentence denotes To provide a more stable range for the bit-score ratios calculated in step 3 above, and to prevent low-scoring alignments from splitting sequences, a maximum ratio of 10.0 and a minimum bit-score of 20.0 were imposed at each position.
TextSentencer_T307 41240-41365 Sentence denotes Thus, the top scoring alignment bit-score for a category $n$ at position $p$, $s_{n,p}$, is normalized to $S_{n,p} = \max(s_{n,p}, 20.0)$.
TextSentencer_T308 41366-41504 Sentence denotes Furthermore, the bit-score ratio for category $n$ at position $p$, $r_{n,p} = S_{n,p} / \max_{k \neq n} S_{k,p}$, is normalized to $R_{n,p} = \min(r_{n,p}, 10.0)$.
TextSentencer_T309 41505-41683 Sentence denotes These thresholds were set after manual classification of 15 chimeric sequences so that the pipeline would split all of them (with these thresholds, the pipeline identified 90 contigs in total).
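The normalization and the split criterion can be written out as a short sketch. It assumes score maps stored as a dictionary of per-position lists keyed by category, and parts given as (category, start, end) with 0-based half-open coordinates; these layouts and the function names are hypothetical.

```python
MIN_BIT_SCORE = 20.0  # floor applied to every per-position bit-score
MAX_RATIO = 10.0      # cap applied to every per-position ratio


def normalized_ratio(scores_at_position, category):
    """Floor the scores, divide by the best other category, cap the ratio."""
    floored = {c: max(s, MIN_BIT_SCORE) for c, s in scores_at_position.items()}
    best_other = max(s for c, s in floored.items() if c != category)
    return min(floored[category] / best_other, MAX_RATIO)


def mean_ratio(maps, category, start, end):
    """Mean normalized ratio for one category over positions start..end-1."""
    ratios = [
        normalized_ratio({c: maps[c][pos] for c in maps}, category)
        for pos in range(start, end)
    ]
    return sum(ratios) / len(ratios)


def should_split(maps, part_a, part_b, min_mean_ratio=2.0):
    """Split only if both parts favor their own category by the threshold."""
    return all(
        mean_ratio(maps, category, start, end) >= min_mean_ratio
        for category, start, end in (part_a, part_b)
    )
```

The floor keeps positions without any hit from producing a division by zero, and the cap prevents a few extreme positions from dominating the mean.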
TextSentencer_T310 41684-41794 Sentence denotes After completion of all three levels of homology searches, the results were compiled into a final result set.
TextSentencer_T311 41795-41903 Sentence denotes Each sequence was assigned to the category of the highest scoring (bit-score) hit from homology searches if:
TextSentencer_T312 41904-41906 Sentence denotes 1.
TextSentencer_T313 41907-41960 Sentence denotes The e-value of the hit was below or equal to $10^{-3}$.
TextSentencer_T314 41961-42096 Sentence denotes 2. The mean position-wise bit-score ratio between the suggested category and other categories, over the entire sequence, was at least 1.15.
TextSentencer_T315 42097-42099 Sentence denotes 3.
TextSentencer_T316 42100-42333 Sentence denotes If the bit-score of the top-scoring hit was at least 33% higher than that of the second highest scoring hit, the mean position-wise bit-score ratio was ignored, thus introducing a top-scoring hit 'veto' which would cancel the second requirement.
TextSentencer_T317 42334-42594 Sentence denotes If these conditions were not met, the particular sequence was instead assigned to the 'undefined' category, which thus contained sequences for which no close homolog was found or for which homologs of seemingly indistinguishable importance suggested different categories.
TextSentencer_T318 42595-42736 Sentence denotes The e-value threshold of $10^{-3}$ was used as in previous studies [8], while the thresholds for conflicting homology were determined empirically.
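The assignment rule, including the top-hit 'veto', can be condensed into one function. The sketch assumes the caller supplies the bit-score of the second highest scoring hit; the source does not state whether that hit must come from a different category. All names are hypothetical.

```python
def assign_category(top_category, top_evalue, top_bit_score,
                    runner_up_bit_score, mean_ratio,
                    max_evalue=1e-3, min_mean_ratio=1.15, veto_margin=0.33):
    """Return the category of the top hit, or 'undefined'.

    Requirement 1: e-value of the top hit at or below max_evalue.
    Requirement 2: mean position-wise bit-score ratio at least min_mean_ratio.
    Veto: requirement 2 is waived when the top bit-score exceeds the
    runner-up by at least veto_margin (33%).
    """
    if top_evalue > max_evalue:
        return "undefined"
    veto = top_bit_score >= (1.0 + veto_margin) * runner_up_bit_score
    if veto or mean_ratio >= min_mean_ratio:
        return top_category
    return "undefined"
```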
TextSentencer_T319 42737-42898 Sentence denotes Finally, through the use of the NCBI taxonomy database [13] each sequence could be further partitioned at any taxonomic level by considering the closest homolog.
TextSentencer_T320 42899-43036 Sentence denotes All of the categories were first automatically split according to NCBI taxonomy division [13], for example as in Table S2 and Table S3.
TextSentencer_T321 43037-43169 Sentence denotes To gain a more granular grouping, specific parts of the results were further partitioned, for example as in Figure 4 and Figure S1.
TextSentencer_T322 43170-43223 Sentence denotes Phylogenetic analysis of novel HRV-C genome sequence.
TextSentencer_T323 43224-43321 Sentence denotes Multiple sequence alignments and the phylogenetic tree were prepared using ClustalX 2.0.12 [40].
TextSentencer_T324 43322-43410 Sentence denotes Genetic distances were calculated using the Kimura 2-parameter model and a transition/transversion (Ts/Tv) ratio of 2.0.
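For reference, the standard Kimura 2-parameter distance between two aligned sequences is d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the observed fractions of transitions and transversions. The sketch below computes it directly. It is not the ClustalX implementation, it skips any column containing a gap or ambiguous base, and it does not model the Ts/Tv setting mentioned above.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance for two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```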
TextSentencer_T325 43411-43532 Sentence denotes The phylogenetic tree was constructed using the neighbor-joining method and evaluated by 1,000 bootstrap pseudoreplicates.
TextSentencer_T326 43533-43576 Sentence denotes The tree was plotted using NJplot 2.3 [41].
TextSentencer_T327 43577-43626 Sentence denotes Figure S1 Species of the Orthomyxoviridae family.
TextSentencer_T328 43627-43751 Sentence denotes The Orthomyxoviridae homolog sequences split by species (manually curated, only alignments with an e-value < 1e-5 considered).
TextSentencer_T329 43752-43803 Sentence denotes The numbers are the derived number of reads. (TIFF)
TextSentencer_T330 43804-43847 Sentence denotes Table S1 Sequencing and screen information.
TextSentencer_T331 43848-44033 Sentence denotes Sequencing runs performed using both the GS20 and the GS FLX 454 sequencing instruments, as well as screening efforts removing repetitive sequences and/or sequences of human origin. (DOC)
TextSentencer_T332 44034-44064 Sentence denotes Table S2 Assembly information.
TextSentencer_T333 44065-44180 Sentence denotes The number of reads, sequences, longest sequences and total bases for assembled sequences (contigs) and singletons.
TextSentencer_T334 44181-44387 Sentence denotes Note that 10,951 of the sequenced reads did not form contigs and were too short to be included as singletons for further analysis (the exclusion process is described further in Materials and Methods). (DOC)