PMC:7033720 / 5249-6735

Pathogen discovery and characterization

To identify potential pathogens in the mNGS data, a pathogen discovery pipeline was applied to the sequencing reads. Briefly, reads containing adaptor sequences or low-complexity regions were removed from the dataset. Human reads were also removed by mapping against the human reference genome. All remaining non-human, non-repeat reads were then compared with a reference virus database (downloaded from https://ftp.ncbi.nih.gov/blast/db/ref_viruses_rep_genomes.tar.gz) using blastn, and with the non-redundant protein database (nr) using DIAMOND blastx [4]. Taxonomic lineage information was obtained for each BLAST hit by matching its accession number against the taxonomy database, and this lineage was then used to identify reads of viral origin. Bacterial pathogens were identified using MetaPhlAn2 [5]. Reads were also assembled de novo using MEGAHIT [6], and viral genomes were identified among the assembled contigs using the BLAST procedure described above. To validate the assembled genome sequences, reads were mapped back to the genomes and a majority consensus sequence was determined for each sample. Minor variants were called after mapping using the Geneious software package, with a minimum coverage of 20 and a minimum variant frequency of 0.05. In addition to mapping, the viral genomes were confirmed by Sanger sequencing using primers designed from the NGS sequences.
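
The step that turns BLAST hits into "reads of viral origin" can be illustrated with a minimal sketch. This is not the authors' code. It assumes that each read is assigned by its single highest-bitscore hit, and that the hit's accession is mapped to a taxid. One plausible source for this mapping is NCBI's accession2taxid files, with `prot.accession2taxid` serving the DIAMOND blastx hits against nr. The lineage is then walked up a parent table such as the one in NCBI's `nodes.dmp`. The helper names (`Hit`, `lineage`, `viral_reads`) and the toy accessions and taxids 900001/900002 are hypothetical. Taxids 1 (root), 2 (Bacteria) and 10239 (Viruses) are the real NCBI identifiers.

```python
from typing import Dict, List, NamedTuple, Set

VIRUSES_TAXID = 10239  # NCBI Taxonomy ID of the "Viruses" superkingdom


class Hit(NamedTuple):
    read_id: str     # query (read) identifier
    accession: str   # subject accession reported by blastn / DIAMOND blastx
    bitscore: float  # alignment bit score


def lineage(taxid: int, parents: Dict[int, int]) -> List[int]:
    """Return taxid followed by all of its ancestors up to the root."""
    path = [taxid]
    seen = {taxid}
    while taxid in parents:
        parent = parents[taxid]
        if parent in seen:  # the root points to itself; also guards against cycles
            break
        path.append(parent)
        seen.add(parent)
        taxid = parent
    return path


def viral_reads(hits: List[Hit], acc2taxid: Dict[str, int],
                parents: Dict[int, int]) -> Set[str]:
    """Assumption: a read is viral if its best-scoring hit lies under 'Viruses'."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.read_id not in best or hit.bitscore > best[hit.read_id].bitscore:
            best[hit.read_id] = hit
    result: Set[str] = set()
    for read_id, hit in best.items():
        taxid = acc2taxid.get(hit.accession)
        if taxid is None:  # accession missing from the mapping table: leave unassigned
            continue
        if VIRUSES_TAXID in lineage(taxid, parents):
            result.add(read_id)
    return result


if __name__ == "__main__":
    # Hypothetical toy taxonomy: 900001 sits under Viruses, 900002 under Bacteria.
    parents = {1: 1, 2: 1, 10239: 1, 900001: 10239, 900002: 2}
    acc2taxid = {"ACC_V1": 900001, "ACC_B1": 900002}
    hits = [
        Hit("r1", "ACC_V1", 250.0),
        Hit("r1", "ACC_B1", 90.0),
        Hit("r2", "ACC_B1", 300.0),
        Hit("r3", "ACC_UNKNOWN", 100.0),
    ]
    print(sorted(viral_reads(hits, acc2taxid, parents)))  # ['r1']
```

A best-hit rule keeps the example short. Real pipelines often use a lowest-common-ancestor assignment over several near-equal hits instead, and this sketch does not implement that.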

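The consensus and minor-variant thresholds can be made concrete with a short sketch. Only the two stated thresholds come from the methods: a minimum coverage of 20 and a minimum variant frequency of 0.05. Geneious's own variant caller has further options and logic that are not modelled here. The sketch assumes per-position base counts have already been tallied from reads mapped back to an assembled genome. It calls the majority consensus by simple majority, with ties broken arbitrarily and uncovered positions written as `N`. It then reports any non-consensus base that passes both thresholds. The names `majority_consensus`, `call_minor_variants` and `MinorVariant` are hypothetical.

```python
from collections import Counter
from typing import List, NamedTuple, Sequence

MIN_COVERAGE = 20        # minimum read depth at a position (value from the text)
MIN_VARIANT_FREQ = 0.05  # minimum fraction of reads carrying the minor base (value from the text)


class MinorVariant(NamedTuple):
    position: int        # 1-based position in the consensus
    consensus_base: str
    variant_base: str
    depth: int
    frequency: float


def majority_consensus(pileup: Sequence[Counter]) -> str:
    """Most frequent base at each position; 'N' where no reads cover it."""
    bases = []
    for counts in pileup:
        if sum(counts.values()) == 0:
            bases.append("N")
        else:
            bases.append(counts.most_common(1)[0][0])
    return "".join(bases)


def call_minor_variants(pileup: Sequence[Counter],
                        min_coverage: int = MIN_COVERAGE,
                        min_freq: float = MIN_VARIANT_FREQ) -> List[MinorVariant]:
    """Report non-consensus bases at well-covered positions above min_freq."""
    consensus = majority_consensus(pileup)
    variants: List[MinorVariant] = []
    for i, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_coverage:
            continue  # too few reads for a reliable frequency estimate
        for base, n in sorted(counts.items()):
            freq = n / depth
            if base != consensus[i] and freq >= min_freq:
                variants.append(MinorVariant(i + 1, consensus[i], base, depth, freq))
    return variants


if __name__ == "__main__":
    # Hypothetical base counts for a five-position genome.
    pileup = [
        Counter(A=30),         # no variation
        Counter(C=90, T=10),   # T at 10% with depth 100: reported
        Counter(G=15, A=3),    # depth 18 < 20: skipped
        Counter(T=97, C=3),    # C at 3% < 5%: not reported
        Counter(),             # uncovered position
    ]
    print(majority_consensus(pileup))  # ACGTN
    for variant in call_minor_variants(pileup):
        print(variant)
```

Applying the depth filter before the frequency filter keeps low-coverage positions from producing spurious high-frequency calls. For example, one discordant read out of five would otherwise pass a 5% cut-off.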