PMC:7033720 / 5249-6735

Id Subject Object Predicate Lexical cue
T50 0-39 Sentence denotes Pathogen discovery and characterization
T51 40-170 Sentence denotes To identify potential pathogens from the mNGS sequencing results, a pathogen discovery pipeline was carried out on sequenced data.
T52 171-269 Sentence denotes Briefly, reads containing adaptor sequences and low-complex regions were removed from the dataset.
T53 270-346 Sentence denotes Human reads were also removed by mapping against the reference human genome.
T54 347-626 Sentence denotes All non-human and non-repeat sequence reads were then compared to a reference virus database (downloaded from https://ftp.ncbi.nih.gov/blast/db/ref_viruses_rep_genomes.tar.gz) and the non-redundant protein database (nr) using blastn and diamond blastx programs [4], respectively.
T55 627-812 Sentence denotes Taxonomy lineage information was obtained for each blast hits by matching the accession number with the taxonomy database, which was subsequently used to identify reads of virus origin.
T56 813-899 Sentence denotes Bacterial pathogen identification was carried out by using the Metaphlan2 program [5].
T57 900-1031 Sentence denotes Reads were also assembled de novo using Megahit [6], with the virus genome identified based on the blast procedure described above.
T58 1032-1189 Sentence denotes To validate the assembled genome sequences, reads were subsequently mapped to the genomes and a majority consensus sequences were determined for each sample.
T59 1190-1350 Sentence denotes Minor variation calling was performed after mapping using Genious software package, with a minimum coverage set to 20 and minimum variant frequency set to 0.05.
T60 1351-1486 Sentence denotes In addition to mapping, the virus genomes were also confirmed with Sanger sequencing using primers designed based on the NGS sequences.
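
The consensus and minor-variant steps described in T58 and T59 can be illustrated with a minimal sketch. It assumes a simplified input, a plain string of the read bases observed at one genome position, and applies the two thresholds stated in the text (minimum coverage 20, minimum variant frequency 0.05). The function name `call_site` and the input format are hypothetical; this is not the algorithm of the software package named in T59, only an illustration of how such thresholds filter candidate variants.

```python
from collections import Counter

MIN_COVERAGE = 20        # minimum read depth, as stated in T59
MIN_VARIANT_FREQ = 0.05  # minimum variant frequency, as stated in T59


def call_site(bases):
    """Return (consensus_base, minor_variants) for one pileup column.

    `bases` is a hypothetical input: a string of the read bases observed
    at a single genome position, e.g. "AAAAAG".
    """
    depth = len(bases)
    if depth == 0:
        # No reads cover this position, so nothing can be called.
        return None, {}

    counts = Counter(bases)
    # Majority consensus: the most frequently observed base.
    # Ties are broken by first occurrence, a simplification.
    consensus = counts.most_common(1)[0][0]

    minor = {}
    # Minor variants are only reported at sufficiently covered positions.
    if depth >= MIN_COVERAGE:
        for base, n in counts.items():
            freq = n / depth
            if base != consensus and freq >= MIN_VARIANT_FREQ:
                minor[base] = freq
    return consensus, minor
```

For example, a position covered by 18 reads with A and 2 reads with G meets both thresholds, so G would be reported as a minor variant at frequency 0.1. The same 9:1 ratio at a depth of 10 would yield only the consensus base, because the coverage threshold is not met.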