PMC:7565482 / 7438-11920



    {"project":"LitCovid-PD-FMA-UBERON","denotations":[{"id":"T31","span":{"begin":346,"end":356},"obj":"Body_part"},{"id":"T32","span":{"begin":405,"end":416},"obj":"Body_part"},{"id":"T33","span":{"begin":480,"end":490},"obj":"Body_part"},{"id":"T34","span":{"begin":676,"end":686},"obj":"Body_part"},{"id":"T35","span":{"begin":819,"end":829},"obj":"Body_part"},{"id":"T36","span":{"begin":864,"end":873},"obj":"Body_part"},{"id":"T37","span":{"begin":1254,"end":1257},"obj":"Body_part"},{"id":"T38","span":{"begin":1477,"end":1488},"obj":"Body_part"},{"id":"T39","span":{"begin":1587,"end":1590},"obj":"Body_part"},{"id":"T40","span":{"begin":1918,"end":1929},"obj":"Body_part"},{"id":"T41","span":{"begin":2112,"end":2121},"obj":"Body_part"},{"id":"T42","span":{"begin":2187,"end":2196},"obj":"Body_part"},{"id":"T43","span":{"begin":2295,"end":2306},"obj":"Body_part"},{"id":"T44","span":{"begin":2587,"end":2594},"obj":"Body_part"},{"id":"T45","span":{"begin":3035,"end":3043},"obj":"Body_part"},{"id":"T46","span":{"begin":3149,"end":3157},"obj":"Body_part"},{"id":"T47","span":{"begin":3174,"end":3180},"obj":"Body_part"},{"id":"T48","span":{"begin":3405,"end":3412},"obj":"Body_part"},{"id":"T49","span":{"begin":3502,"end":3512},"obj":"Body_part"},{"id":"T50","span":{"begin":4179,"end":4182},"obj":"Body_part"},{"id":"T51","span":{"begin":4291,"end":4295},"obj":"Body_part"},{"id":"T52","span":{"begin":4340,"end":4347},"obj":"Body_part"},{"id":"T53","span":{"begin":4442,"end":4445},"obj":"Body_part"}],"attributes":[{"id":"A31","pred":"fma_id","subj":"T31","obj":""},{"id":"A32","pred":"fma_id","subj":"T32","obj":""},{"id":"A33","pred":"fma_id","subj":"T33","obj":""},{"id":"A34","pred":"fma_id","subj":"T34","obj":""},{"id":"A35","pred":"fma_id","subj":"T35","obj":""},{"id":"A36","pred":"fma_id","subj":"T36","obj":""},{"id":"A37","pred":"fma_id","subj":"T37","obj":""},{"id":"A38","pred":"fma_id","subj":"T38","obj":""},{"id":"A39","pred":"fma_id","subj":"T39","obj":""},{"id":"A40"
,"pred":"fma_id","subj":"T40","obj":""},{"id":"A41","pred":"fma_id","subj":"T41","obj":""},{"id":"A42","pred":"fma_id","subj":"T42","obj":""},{"id":"A43","pred":"fma_id","subj":"T43","obj":""},{"id":"A44","pred":"fma_id","subj":"T44","obj":""},{"id":"A45","pred":"fma_id","subj":"T45","obj":""},{"id":"A46","pred":"fma_id","subj":"T46","obj":""},{"id":"A47","pred":"fma_id","subj":"T47","obj":""},{"id":"A48","pred":"fma_id","subj":"T48","obj":""},{"id":"A49","pred":"fma_id","subj":"T49","obj":""},{"id":"A50","pred":"fma_id","subj":"T50","obj":""},{"id":"A51","pred":"fma_id","subj":"T51","obj":""},{"id":"A52","pred":"fma_id","subj":"T52","obj":""},{"id":"A53","pred":"fma_id","subj":"T53","obj":""}],"text":"2. Methods\n\n2.1. Consensus Sequence ORF Generation and Entropy Calculation\nA total of 1731 full-length SARS-CoV-2 sequences were downloaded from NCBI (30 April 2020, txid2697049, minimum length = 29,000 bp) and aligned using MAFFT [44]. The alignment was visually inspected and curated using Genbank NC_045512.2 as a coordinate reference [45]. A nucleotide consensus sequence was generated by keeping all nucleotides present in at least 25% of the sequences in the alignment. The amino acid consensus sequence was then created by using NC_045512.2 annotated Open Reading Frames (ORFs) plus additional ORFs described in Finkel et al. [46] using the Biostrings R package. Mixed nucleotide positions were either resolved if they were synonymous or flagged for downstream analysis. Positional entropy was calculated at the amino acid level both as the standard and 22-aminoacid-normalized Shannon entropy for every ORF using Bio3d R package on the alignment [47], and afterward, the mean OLP normalized entropy was calculated.\n\n2.2. Overlapping Peptide Set Design and Variability Plots\nFor the automated design of overlapping peptides with variable length, we used the previously described Peptgen algorithm available at the Los Alamos National Laboratories HIV Immunology database [48]. This OLP generator allows predefining peptide length and level of the desired overlap between adjacent OLP. Peptgen is also set up to exclude from the C-terminal end of OLP certain “forbidden” amino acids (G, P, E, D, Q, N, T, S and C) that are rarely seen to serve as the C-terminal anchor position of HLA class I presented epitopes [49]. Using this optional modification can lead to length variation in the OLP set, which can be controlled by limiting the maximal length of an OLP in regions with numerous serial “forbidden” residues. The settings used for the present SARS-CoV-2 consensus OLP design were a) OLP length of 15 or 18 amino acids, with maximal extension or truncation of up to ±3 residues to avoid forbidden C-terminal residues. In addition, the overlap between adjacent OLP was set at 10 or 11 residues. The no-glutamine at N-terminal setting was applied to prevent OLP starting with a glutamine residue as this can lead to complications with peptide synthesis. For positions where two or more amino acids were present above 25% of the sequences in the alignment, two or more sequence variants for those OLPs were generated. Sequence logos were generated for these cases with the ggseqlogo R package [50].\n\n2.3. Detection of Conserved Peptides Among Coronavirus\nIn an attempt to detect protein fragments that are conserved across a wide range of members of the coronavirus family, full-length consensus ORF from SARS-CoV-2 were aligned with other coronavirus sequences. Three alignments were performed based on different sequence selection criteria: (i) 50 reference sequences (RefSeq) with the lowest E-values resulting from a pBLAST search [51] using the ORF-specific consensus sequences (pan-coronavirus alignment) (ii) homologous proteins from 17 viruses representing the Betacoronavirus taxon (beta-coronavirus alignment) or, (iii) homologous proteins from the 7 full-genome sequenced human coronaviruses (including SARS-CoV, MERS-CoV, and common cold species OC43, NL63, 229E, HKU1, human-coronavirus alignment). Selected sequences were aligned using the MUSCLE algorithm in MEGA X [52]. Conserved protein fragments were identified using BioEdit with the following criteria: minimum length of 8 amino acid, maximum average entropy of 0.25, maximum entropy per position of 1 and limiting the search to 1 gap per segment. Sequence logos were generated for the aligned peptides on Weblogo [53].\n\n2.4. Identification of Previously Described Epitopes in CoV-2 Conserved Regions\nTo identify previously reported epitopes in the conserved regions of coronaviruses (pan-coronavirus, betacoronaviruses, and human coronaviruses), and match them with the SARS-CoV-2 consensus sequence, searches for experimentally described epitopes were carried out in the Immune Epitope Database [54]. The search criteria were as follows: “linear peptide; blast option: 90%; Host: Homo sapiens; Any MHC restriction; Positive assays only; All assays; Any disease”. The search yielded 141 epitopes, of which 14 B-cell epitopes and 2 epitopes from a hypothetical protein were removed. The remaining identified epitopes were subsequently used to generate an epitope map of the respective conserved regions."}
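Section 2.1 builds the nucleotide consensus by keeping every residue present in at least 25% of aligned sequences. Any column where more than one residue passes that cutoff is a "mixed" position. The Python sketch below illustrates that thresholding rule only. It assumes an already-aligned list of equal-length strings. The function name `consensus_with_mixed` is hypothetical, and this is not the authors' Biostrings code.

```python
from collections import Counter


def consensus_with_mixed(alignment, threshold=0.25):
    """Return (consensus, mixed) for equal-length aligned sequences.

    In each column, every residue found in at least `threshold` of the
    sequences is kept. The most frequent kept residue (ties broken
    alphabetically) goes into the consensus string. Columns with more than
    one kept residue are reported in `mixed` as {column_index: residues}.
    """
    n = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    consensus, mixed = [], {}
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        kept = sorted(
            (r for r, c in counts.items() if c / n >= threshold),
            key=lambda r: (-counts[r], r),
        )
        if not kept:  # no residue reaches the cutoff: fall back to the most common one
            kept = [counts.most_common(1)[0][0]]
        consensus.append(kept[0])
        if len(kept) > 1:  # mixed position: resolve downstream or flag it
            mixed[i] = kept
    return "".join(consensus), mixed
```

For example, `consensus_with_mixed(["ACGT", "ACGA", "ACTT", "ACGT"])` gives the consensus `"ACGT"`. Columns 2 and 3 are flagged as mixed because a minority residue occurs in exactly 25% of sequences.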


    {"project":"LitCovid-PD-UBERON","denotations":[{"id":"T1","span":{"begin":1944,"end":1953},"obj":"Body_part"}],"attributes":[{"id":"A1","pred":"uberon_id","subj":"T1","obj":""}],"text":"2. Methods\n\n2.1. Consensus Sequence ORF Generation and Entropy Calculation\nA total of 1731 full-length SARS-CoV-2 sequences were downloaded from NCBI (30 April 2020, txid2697049, minimum length = 29,000 bp) and aligned using MAFFT [44]. The alignment was visually inspected and curated using Genbank NC_045512.2 as a coordinate reference [45]. A nucleotide consensus sequence was generated by keeping all nucleotides present in at least 25% of the sequences in the alignment. The amino acid consensus sequence was then created by using NC_045512.2 annotated Open Reading Frames (ORFs) plus additional ORFs described in Finkel et al. [46] using the Biostrings R package. Mixed nucleotide positions were either resolved if they were synonymous or flagged for downstream analysis. Positional entropy was calculated at the amino acid level both as the standard and 22-aminoacid-normalized Shannon entropy for every ORF using Bio3d R package on the alignment [47], and afterward, the mean OLP normalized entropy was calculated.\n\n2.2. Overlapping Peptide Set Design and Variability Plots\nFor the automated design of overlapping peptides with variable length, we used the previously described Peptgen algorithm available at the Los Alamos National Laboratories HIV Immunology database [48]. This OLP generator allows predefining peptide length and level of the desired overlap between adjacent OLP. Peptgen is also set up to exclude from the C-terminal end of OLP certain “forbidden” amino acids (G, P, E, D, Q, N, T, S and C) that are rarely seen to serve as the C-terminal anchor position of HLA class I presented epitopes [49]. Using this optional modification can lead to length variation in the OLP set, which can be controlled by limiting the maximal length of an OLP in regions with numerous serial “forbidden” residues. The settings used for the present SARS-CoV-2 consensus OLP design were a) OLP length of 15 or 18 amino acids, with maximal extension or truncation of up to ±3 residues to avoid forbidden C-terminal residues. In addition, the overlap between adjacent OLP was set at 10 or 11 residues. The no-glutamine at N-terminal setting was applied to prevent OLP starting with a glutamine residue as this can lead to complications with peptide synthesis. For positions where two or more amino acids were present above 25% of the sequences in the alignment, two or more sequence variants for those OLPs were generated. Sequence logos were generated for these cases with the ggseqlogo R package [50].\n\n2.3. Detection of Conserved Peptides Among Coronavirus\nIn an attempt to detect protein fragments that are conserved across a wide range of members of the coronavirus family, full-length consensus ORF from SARS-CoV-2 were aligned with other coronavirus sequences. Three alignments were performed based on different sequence selection criteria: (i) 50 reference sequences (RefSeq) with the lowest E-values resulting from a pBLAST search [51] using the ORF-specific consensus sequences (pan-coronavirus alignment) (ii) homologous proteins from 17 viruses representing the Betacoronavirus taxon (beta-coronavirus alignment) or, (iii) homologous proteins from the 7 full-genome sequenced human coronaviruses (including SARS-CoV, MERS-CoV, and common cold species OC43, NL63, 229E, HKU1, human-coronavirus alignment). Selected sequences were aligned using the MUSCLE algorithm in MEGA X [52]. Conserved protein fragments were identified using BioEdit with the following criteria: minimum length of 8 amino acid, maximum average entropy of 0.25, maximum entropy per position of 1 and limiting the search to 1 gap per segment. Sequence logos were generated for the aligned peptides on Weblogo [53].\n\n2.4. Identification of Previously Described Epitopes in CoV-2 Conserved Regions\nTo identify previously reported epitopes in the conserved regions of coronaviruses (pan-coronavirus, betacoronaviruses, and human coronaviruses), and match them with the SARS-CoV-2 consensus sequence, searches for experimentally described epitopes were carried out in the Immune Epitope Database [54]. The search criteria were as follows: “linear peptide; blast option: 90%; Host: Homo sapiens; Any MHC restriction; Positive assays only; All assays; Any disease”. The search yielded 141 epitopes, of which 14 B-cell epitopes and 2 epitopes from a hypothetical protein were removed. The remaining identified epitopes were subsequently used to generate an epitope map of the respective conserved regions."}
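Section 2.1 also reports positional Shannon entropy, H = -Σ p·ln p over the residue frequencies in an alignment column. It then averages a normalized version over each OLP. The sketch below divides H by ln(22), the maximum entropy over a 22-symbol alphabet, so that scores fall in [0, 1]. That normalization is an assumption, since Bio3d's own normalized score may be defined differently. The function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET_SIZE = 22  # assumed: 20 amino acids plus gap and an "other/unknown" symbol


def column_entropy(column, normalize=False):
    """Shannon entropy (natural log) of one alignment column."""
    n = len(column)
    h = sum(-(c / n) * math.log(c / n) for c in Counter(column).values())
    return h / math.log(ALPHABET_SIZE) if normalize else h


def positional_entropy(alignment, normalize=False):
    """Entropy for every column of an alignment of equal-length strings."""
    return [column_entropy([seq[i] for seq in alignment], normalize)
            for i in range(len(alignment[0]))]


def mean_olp_entropy(norm_entropy, start, end):
    """Mean normalized entropy over the OLP covering positions [start, end), 0-based."""
    window = norm_entropy[start:end]
    return sum(window) / len(window)
```

For the toy alignment `["MKV", "MKL", "MRL", "MRV"]`, the first column is fully conserved (entropy 0). The other two columns each split 50/50, giving entropy ln 2 and a normalized score of ln 2 / ln 22.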


    {"project":"LitCovid-PD-MONDO","denotations":[{"id":"T34","span":{"begin":103,"end":111},"obj":"Disease"},{"id":"T35","span":{"begin":1855,"end":1863},"obj":"Disease"},{"id":"T36","span":{"begin":2713,"end":2721},"obj":"Disease"},{"id":"T37","span":{"begin":3222,"end":3230},"obj":"Disease"},{"id":"T38","span":{"begin":3246,"end":3257},"obj":"Disease"},{"id":"T39","span":{"begin":3950,"end":3958},"obj":"Disease"}],"attributes":[{"id":"A34","pred":"mondo_id","subj":"T34","obj":""},{"id":"A35","pred":"mondo_id","subj":"T35","obj":""},{"id":"A36","pred":"mondo_id","subj":"T36","obj":""},{"id":"A37","pred":"mondo_id","subj":"T37","obj":""},{"id":"A38","pred":"mondo_id","subj":"T38","obj":""},{"id":"A39","pred":"mondo_id","subj":"T39","obj":""}],"text":"2. Methods\n\n2.1. Consensus Sequence ORF Generation and Entropy Calculation\nA total of 1731 full-length SARS-CoV-2 sequences were downloaded from NCBI (30 April 2020, txid2697049, minimum length = 29,000 bp) and aligned using MAFFT [44]. The alignment was visually inspected and curated using Genbank NC_045512.2 as a coordinate reference [45]. A nucleotide consensus sequence was generated by keeping all nucleotides present in at least 25% of the sequences in the alignment. The amino acid consensus sequence was then created by using NC_045512.2 annotated Open Reading Frames (ORFs) plus additional ORFs described in Finkel et al. [46] using the Biostrings R package. Mixed nucleotide positions were either resolved if they were synonymous or flagged for downstream analysis. Positional entropy was calculated at the amino acid level both as the standard and 22-aminoacid-normalized Shannon entropy for every ORF using Bio3d R package on the alignment [47], and afterward, the mean OLP normalized entropy was calculated.\n\n2.2. Overlapping Peptide Set Design and Variability Plots\nFor the automated design of overlapping peptides with variable length, we used the previously described Peptgen algorithm available at the Los Alamos National Laboratories HIV Immunology database [48]. This OLP generator allows predefining peptide length and level of the desired overlap between adjacent OLP. Peptgen is also set up to exclude from the C-terminal end of OLP certain “forbidden” amino acids (G, P, E, D, Q, N, T, S and C) that are rarely seen to serve as the C-terminal anchor position of HLA class I presented epitopes [49]. Using this optional modification can lead to length variation in the OLP set, which can be controlled by limiting the maximal length of an OLP in regions with numerous serial “forbidden” residues. The settings used for the present SARS-CoV-2 consensus OLP design were a) OLP length of 15 or 18 amino acids, with maximal extension or truncation of up to ±3 residues to avoid forbidden C-terminal residues. In addition, the overlap between adjacent OLP was set at 10 or 11 residues. The no-glutamine at N-terminal setting was applied to prevent OLP starting with a glutamine residue as this can lead to complications with peptide synthesis. For positions where two or more amino acids were present above 25% of the sequences in the alignment, two or more sequence variants for those OLPs were generated. Sequence logos were generated for these cases with the ggseqlogo R package [50].\n\n2.3. Detection of Conserved Peptides Among Coronavirus\nIn an attempt to detect protein fragments that are conserved across a wide range of members of the coronavirus family, full-length consensus ORF from SARS-CoV-2 were aligned with other coronavirus sequences. Three alignments were performed based on different sequence selection criteria: (i) 50 reference sequences (RefSeq) with the lowest E-values resulting from a pBLAST search [51] using the ORF-specific consensus sequences (pan-coronavirus alignment) (ii) homologous proteins from 17 viruses representing the Betacoronavirus taxon (beta-coronavirus alignment) or, (iii) homologous proteins from the 7 full-genome sequenced human coronaviruses (including SARS-CoV, MERS-CoV, and common cold species OC43, NL63, 229E, HKU1, human-coronavirus alignment). Selected sequences were aligned using the MUSCLE algorithm in MEGA X [52]. Conserved protein fragments were identified using BioEdit with the following criteria: minimum length of 8 amino acid, maximum average entropy of 0.25, maximum entropy per position of 1 and limiting the search to 1 gap per segment. Sequence logos were generated for the aligned peptides on Weblogo [53].\n\n2.4. Identification of Previously Described Epitopes in CoV-2 Conserved Regions\nTo identify previously reported epitopes in the conserved regions of coronaviruses (pan-coronavirus, betacoronaviruses, and human coronaviruses), and match them with the SARS-CoV-2 consensus sequence, searches for experimentally described epitopes were carried out in the Immune Epitope Database [54]. The search criteria were as follows: “linear peptide; blast option: 90%; Host: Homo sapiens; Any MHC restriction; Positive assays only; All assays; Any disease”. The search yielded 141 epitopes, of which 14 B-cell epitopes and 2 epitopes from a hypothetical protein were removed. The remaining identified epitopes were subsequently used to generate an epitope map of the respective conserved regions."}
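Section 2.2 describes OLP tiling with these settings:

- a nominal length of 15 (or 18) residues;
- a ±3 residue adjustment so that no peptide ends on a "forbidden" C-terminal residue (G, P, E, D, Q, N, T, S, C);
- an overlap of 11 (or 10) residues between neighbours;
- no glutamine at the N-terminus.

The Python sketch below is a simplified, hypothetical re-implementation of those rules for illustration. It is not the Peptgen code. In particular, how the end is chosen among candidates and how a starting Q is handled are assumptions.

```python
FORBIDDEN_C_TERM = set("GPEDQNTSC")  # residues avoided at the OLP C-terminus (listed in 2.2)


def design_olps(seq, length=15, overlap=11, tolerance=3):
    """Tile `seq` with overlapping peptides (hypothetical simplification of Peptgen).

    Each peptide nominally spans `length` residues. Its end may move by up to
    `tolerance` residues, preferring the smallest shift, so that it does not
    end on a forbidden residue. The next peptide starts `overlap` residues
    before the chosen end. A start on glutamine (Q) is shifted right.
    """
    if length - tolerance <= overlap:
        raise ValueError("length - tolerance must exceed overlap so the tiling advances")
    peptides, start = [], 0
    while start + length < len(seq):
        nominal_end = start + length
        lo, hi = nominal_end - tolerance, min(len(seq) - 1, nominal_end + tolerance)
        # Candidate ends ordered by distance from the nominal end: 0, -1, +1, -2, +2, ...
        candidates = sorted(range(lo, hi + 1), key=lambda e: (abs(e - nominal_end), e))
        end = next((e for e in candidates if seq[e - 1] not in FORBIDDEN_C_TERM), nominal_end)
        peptides.append(seq[start:end])
        start = end - overlap
        while seq[start] == "Q" and start < end - 1:  # avoid an N-terminal glutamine
            start += 1
    peptides.append(seq[start:])  # last peptide runs to the protein C-terminus
    return peptides
```

With a 40-residue stretch of alanines, the defaults give seven 15-mers followed by a final 12-mer. If residue 15 is a glycine, the first peptide is trimmed to 14 residues so that it ends on an allowed residue.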


    {"project":"LitCovid-PD-CLO","denotations":[{"id":"T61","span":{"begin":75,"end":76},"obj":""},{"id":"T62","span":{"begin":315,"end":316},"obj":""},{"id":"T63","span":{"begin":339,"end":341},"obj":""},{"id":"T64","span":{"begin":344,"end":345},"obj":""},{"id":"T65","span":{"begin":861,"end":863},"obj":""},{"id":"T66","span":{"begin":1041,"end":1048},"obj":""},{"id":"T67","span":{"begin":1122,"end":1130},"obj":""},{"id":"T68","span":{"begin":1279,"end":1281},"obj":""},{"id":"T69","span":{"begin":1322,"end":1329},"obj":""},{"id":"T70","span":{"begin":1892,"end":1893},"obj":""},{"id":"T71","span":{"begin":1915,"end":1917},"obj":""},{"id":"T72","span":{"begin":2092,"end":2094},"obj":""},{"id":"T73","span":{"begin":2185,"end":2186},"obj":""},{"id":"T74","span":{"begin":2244,"end":2251},"obj":""},{"id":"T75","span":{"begin":2536,"end":2544},"obj":""},{"id":"T76","span":{"begin":2631,"end":2632},"obj":""},{"id":"T77","span":{"begin":2927,"end":2928},"obj":""},{"id":"T78","span":{"begin":2992,"end":2995},"obj":""},{"id":"T79","span":{"begin":3052,"end":3059},"obj":""},{"id":"T80","span":{"begin":3191,"end":3196},"obj":""},{"id":"T81","span":{"begin":3290,"end":3295},"obj":""},{"id":"T82","span":{"begin":3362,"end":3368},"obj":""},{"id":"T83","span":{"begin":3362,"end":3368},"obj":""},{"id":"T84","span":{"begin":3362,"end":3368},"obj":""},{"id":"T85","span":{"begin":3362,"end":3368},"obj":""},{"id":"T86","span":{"begin":3390,"end":3392},"obj":""},{"id":"T87","span":{"begin":3610,"end":3617},"obj":""},{"id":"T88","span":{"begin":3673,"end":3681},"obj":""},{"id":"T89","span":{"begin":3864,"end":3867},"obj":""},{"id":"T90","span":{"begin":3904,"end":3909},"obj":""},{"id":"T91","span":{"begin":4127,"end":4134},"obj":""},{"id":"T92","span":{"begin":4161,"end":4173},"obj":""},{"id":"T93","span":{"begin":4289,"end":4295},"obj":""},{"id":"T94","span":{"begin":4325,"end":4326},"obj":""}],"text":"2. Methods\n\n2.1. Consensus Sequence ORF Generation and Entropy Calculation\nA total of 1731 full-length SARS-CoV-2 sequences were downloaded from NCBI (30 April 2020, txid2697049, minimum length = 29,000 bp) and aligned using MAFFT [44]. The alignment was visually inspected and curated using Genbank NC_045512.2 as a coordinate reference [45]. A nucleotide consensus sequence was generated by keeping all nucleotides present in at least 25% of the sequences in the alignment. The amino acid consensus sequence was then created by using NC_045512.2 annotated Open Reading Frames (ORFs) plus additional ORFs described in Finkel et al. [46] using the Biostrings R package. Mixed nucleotide positions were either resolved if they were synonymous or flagged for downstream analysis. Positional entropy was calculated at the amino acid level both as the standard and 22-aminoacid-normalized Shannon entropy for every ORF using Bio3d R package on the alignment [47], and afterward, the mean OLP normalized entropy was calculated.\n\n2.2. Overlapping Peptide Set Design and Variability Plots\nFor the automated design of overlapping peptides with variable length, we used the previously described Peptgen algorithm available at the Los Alamos National Laboratories HIV Immunology database [48]. This OLP generator allows predefining peptide length and level of the desired overlap between adjacent OLP. Peptgen is also set up to exclude from the C-terminal end of OLP certain “forbidden” amino acids (G, P, E, D, Q, N, T, S and C) that are rarely seen to serve as the C-terminal anchor position of HLA class I presented epitopes [49]. Using this optional modification can lead to length variation in the OLP set, which can be controlled by limiting the maximal length of an OLP in regions with numerous serial “forbidden” residues. The settings used for the present SARS-CoV-2 consensus OLP design were a) OLP length of 15 or 18 amino acids, with maximal extension or truncation of up to ±3 residues to avoid forbidden C-terminal residues. In addition, the overlap between adjacent OLP was set at 10 or 11 residues. The no-glutamine at N-terminal setting was applied to prevent OLP starting with a glutamine residue as this can lead to complications with peptide synthesis. For positions where two or more amino acids were present above 25% of the sequences in the alignment, two or more sequence variants for those OLPs were generated. Sequence logos were generated for these cases with the ggseqlogo R package [50].\n\n2.3. Detection of Conserved Peptides Among Coronavirus\nIn an attempt to detect protein fragments that are conserved across a wide range of members of the coronavirus family, full-length consensus ORF from SARS-CoV-2 were aligned with other coronavirus sequences. Three alignments were performed based on different sequence selection criteria: (i) 50 reference sequences (RefSeq) with the lowest E-values resulting from a pBLAST search [51] using the ORF-specific consensus sequences (pan-coronavirus alignment) (ii) homologous proteins from 17 viruses representing the Betacoronavirus taxon (beta-coronavirus alignment) or, (iii) homologous proteins from the 7 full-genome sequenced human coronaviruses (including SARS-CoV, MERS-CoV, and common cold species OC43, NL63, 229E, HKU1, human-coronavirus alignment). Selected sequences were aligned using the MUSCLE algorithm in MEGA X [52]. Conserved protein fragments were identified using BioEdit with the following criteria: minimum length of 8 amino acid, maximum average entropy of 0.25, maximum entropy per position of 1 and limiting the search to 1 gap per segment. Sequence logos were generated for the aligned peptides on Weblogo [53].\n\n2.4. Identification of Previously Described Epitopes in CoV-2 Conserved Regions\nTo identify previously reported epitopes in the conserved regions of coronaviruses (pan-coronavirus, betacoronaviruses, and human coronaviruses), and match them with the SARS-CoV-2 consensus sequence, searches for experimentally described epitopes were carried out in the Immune Epitope Database [54]. The search criteria were as follows: “linear peptide; blast option: 90%; Host: Homo sapiens; Any MHC restriction; Positive assays only; All assays; Any disease”. The search yielded 141 epitopes, of which 14 B-cell epitopes and 2 epitopes from a hypothetical protein were removed. The remaining identified epitopes were subsequently used to generate an epitope map of the respective conserved regions."}
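Section 2.3 selects conserved fragments using four criteria:

- a minimum length of 8 residues;
- a mean entropy of at most 0.25;
- an entropy of at most 1 at every position;
- no more than 1 gap per segment.

The Python sketch below applies those thresholds as a greedy scan over precomputed per-column entropies and a per-column gap flag. The scan strategy and the `conserved_segments` name are hypothetical; this is not BioEdit's implementation.

```python
def conserved_segments(entropy, gaps, min_len=8, max_mean=0.25, max_pos=1.0, max_gaps=1):
    """Return non-overlapping (start, end) segments, 0-based and end-exclusive.

    From each start column the window is extended while every column has
    entropy <= max_pos and at most max_gaps gapped columns are included.
    The longest such window that is at least min_len long and has a mean
    entropy <= max_mean is kept. Scanning then resumes after it.
    """
    segments, i, n = [], 0, len(entropy)
    while i < n:
        best_end, total, n_gaps = None, 0.0, 0
        for j in range(i, n):
            if entropy[j] > max_pos:
                break
            n_gaps += gaps[j]  # bool counts as 0 or 1
            if n_gaps > max_gaps:
                break
            total += entropy[j]
            width = j + 1 - i
            if width >= min_len and total / width <= max_mean:
                best_end = j + 1
        if best_end is None:
            i += 1
        else:
            segments.append((i, best_end))
            i = best_end
    return segments
```

In this sketch, a single column with entropy above 1 splits the scan into separate candidate segments. A second gap likewise stops a window from growing.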


    {"project":"LitCovid-PubTator","denotations":[{"id":"97","span":{"begin":103,"end":113},"obj":"Species"},{"id":"101","span":{"begin":1855,"end":1865},"obj":"Species"},{"id":"102","span":{"begin":2112,"end":2121},"obj":"Chemical"},{"id":"103","span":{"begin":2187,"end":2196},"obj":"Chemical"},{"id":"118","span":{"begin":2662,"end":2673},"obj":"Species"},{"id":"119","span":{"begin":2713,"end":2723},"obj":"Species"},{"id":"120","span":{"begin":2748,"end":2759},"obj":"Species"},{"id":"121","span":{"begin":2996,"end":3007},"obj":"Species"},{"id":"122","span":{"begin":3077,"end":3092},"obj":"Species"},{"id":"123","span":{"begin":3100,"end":3116},"obj":"Species"},{"id":"124","span":{"begin":3191,"end":3196},"obj":"Species"},{"id":"125","span":{"begin":3197,"end":3210},"obj":"Species"},{"id":"126","span":{"begin":3222,"end":3230},"obj":"Species"},{"id":"127","span":{"begin":3232,"end":3240},"obj":"Species"},{"id":"128","span":{"begin":3290,"end":3307},"obj":"Species"},{"id":"129","span":{"begin":3278,"end":3282},"obj":"Species"},{"id":"130","span":{"begin":3253,"end":3257},"obj":"Disease"},{"id":"131","span":{"begin":3284,"end":3288},"obj":"CellLine"},{"id":"133","span":{"begin":3756,"end":3761},"obj":"Species"},{"id":"141","span":{"begin":3849,"end":3862},"obj":"Species"},{"id":"142","span":{"begin":3868,"end":3879},"obj":"Species"},{"id":"143","span":{"begin":3881,"end":3898},"obj":"Species"},{"id":"144","span":{"begin":3904,"end":3909},"obj":"Species"},{"id":"145","span":{"begin":3910,"end":3923},"obj":"Species"},{"id":"146","span":{"begin":3950,"end":3960},"obj":"Species"},{"id":"147","span":{"begin":4161,"end":4173},"obj":"Species"}],"attributes":[{"id":"A97","pred":"tao:has_database_id","subj":"97","obj":"Tax:2697049"},{"id":"A101","pred":"tao:has_database_id","subj":"101","obj":"Tax:2697049"},{"id":"A102","pred":"tao:has_database_id","subj":"102","obj":"MESH:D005973"},{"id":"A103","pred":"tao:has_database_id","subj":"103","obj":"MESH:D005973"},{"id":"A118","pred"
:"tao:has_database_id","subj":"118","obj":"Tax:11118"},{"id":"A119","pred":"tao:has_database_id","subj":"119","obj":"Tax:2697049"},{"id":"A120","pred":"tao:has_database_id","subj":"120","obj":"Tax:11118"},{"id":"A121","pred":"tao:has_database_id","subj":"121","obj":"Tax:11118"},{"id":"A122","pred":"tao:has_database_id","subj":"122","obj":"Tax:694002"},{"id":"A123","pred":"tao:has_database_id","subj":"123","obj":"Tax:694002"},{"id":"A124","pred":"tao:has_database_id","subj":"124","obj":"Tax:9606"},{"id":"A125","pred":"tao:has_database_id","subj":"125","obj":"Tax:11118"},{"id":"A126","pred":"tao:has_database_id","subj":"126","obj":"Tax:694009"},{"id":"A127","pred":"tao:has_database_id","subj":"127","obj":"Tax:1335626"},{"id":"A128","pred":"tao:has_database_id","subj":"128","obj":"Tax:694448"},{"id":"A129","pred":"tao:has_database_id","subj":"129","obj":"Tax:11137"},{"id":"A130","pred":"tao:has_database_id","subj":"130","obj":"MESH:D000067390"},{"id":"A131","pred":"tao:has_database_id","subj":"131","obj":"CVCL:B526"},{"id":"A133","pred":"tao:has_database_id","subj":"133","obj":"Tax:2697049"},{"id":"A141","pred":"tao:has_database_id","subj":"141","obj":"Tax:11118"},{"id":"A142","pred":"tao:has_database_id","subj":"142","obj":"Tax:11118"},{"id":"A143","pred":"tao:has_database_id","subj":"143","obj":"Tax:694002"},{"id":"A144","pred":"tao:has_database_id","subj":"144","obj":"Tax:9606"},{"id":"A145","pred":"tao:has_database_id","subj":"145","obj":"Tax:11118"},{"id":"A146","pred":"tao:has_database_id","subj":"146","obj":"Tax:2697049"},{"id":"A147","pred":"tao:has_database_id","subj":"147","obj":"Tax:9606"}],"namespaces":[{"prefix":"Tax","uri":""},{"prefix":"MESH","uri":""},{"prefix":"Gene","uri":""},{"prefix":"CVCL","uri":""}],"text":"2. Methods\n\n2.1. Consensus Sequence ORF Generation and Entropy Calculation\nA total of 1731 full-length SARS-CoV-2 sequences were downloaded from NCBI (30 April 2020, txid2697049, minimum length = 29,000 bp) and aligned using MAFFT [44]. The alignment was visually inspected and curated using Genbank NC_045512.2 as a coordinate reference [45]. A nucleotide consensus sequence was generated by keeping all nucleotides present in at least 25% of the sequences in the alignment. The amino acid consensus sequence was then created by using NC_045512.2 annotated Open Reading Frames (ORFs) plus additional ORFs described in Finkel et al. [46] using the Biostrings R package. Mixed nucleotide positions were either resolved if they were synonymous or flagged for downstream analysis. Positional entropy was calculated at the amino acid level both as the standard and 22-aminoacid-normalized Shannon entropy for every ORF using Bio3d R package on the alignment [47], and afterward, the mean OLP normalized entropy was calculated.\n\n2.2. Overlapping Peptide Set Design and Variability Plots\nFor the automated design of overlapping peptides with variable length, we used the previously described Peptgen algorithm available at the Los Alamos National Laboratories HIV Immunology database [48]. This OLP generator allows predefining peptide length and level of the desired overlap between adjacent OLP. Peptgen is also set up to exclude from the C-terminal end of OLP certain “forbidden” amino acids (G, P, E, D, Q, N, T, S and C) that are rarely seen to serve as the C-terminal anchor position of HLA class I presented epitopes [49]. Using this optional modification can lead to length variation in the OLP set, which can be controlled by limiting the maximal length of an OLP in regions with numerous serial “forbidden” residues. The settings used for the present SARS-CoV-2 consensus OLP design were a) OLP length of 15 or 18 amino acids, with maximal extension or truncation of up to ±3 residues to avoid forbidden C-terminal residues. In addition, the overlap between adjacent OLP was set at 10 or 11 residues. The no-glutamine at N-terminal setting was applied to prevent OLP starting with a glutamine residue as this can lead to complications with peptide synthesis. For positions where two or more amino acids were present above 25% of the sequences in the alignment, two or more sequence variants for those OLPs were generated. Sequence logos were generated for these cases with the ggseqlogo R package [50].\n\n2.3. Detection of Conserved Peptides Among Coronavirus\nIn an attempt to detect protein fragments that are conserved across a wide range of members of the coronavirus family, full-length consensus ORF from SARS-CoV-2 were aligned with other coronavirus sequences. Three alignments were performed based on different sequence selection criteria: (i) 50 reference sequences (RefSeq) with the lowest E-values resulting from a pBLAST search [51] using the ORF-specific consensus sequences (pan-coronavirus alignment) (ii) homologous proteins from 17 viruses representing the Betacoronavirus taxon (beta-coronavirus alignment) or, (iii) homologous proteins from the 7 full-genome sequenced human coronaviruses (including SARS-CoV, MERS-CoV, and common cold species OC43, NL63, 229E, HKU1, human-coronavirus alignment). Selected sequences were aligned using the MUSCLE algorithm in MEGA X [52]. Conserved protein fragments were identified using BioEdit with the following criteria: minimum length of 8 amino acid, maximum average entropy of 0.25, maximum entropy per position of 1 and limiting the search to 1 gap per segment. Sequence logos were generated for the aligned peptides on Weblogo [53].\n\n2.4. Identification of Previously Described Epitopes in CoV-2 Conserved Regions\nTo identify previously reported epitopes in the conserved regions of coronaviruses (pan-coronavirus, betacoronaviruses, and human coronaviruses), and match them with the SARS-CoV-2 consensus sequence, searches for experimentally described epitopes were carried out in the Immune Epitope Database [54]. The search criteria were as follows: “linear peptide; blast option: 90%; Host: Homo sapiens; Any MHC restriction; Positive assays only; All assays; Any disease”. The search yielded 141 epitopes, of which 14 B-cell epitopes and 2 epitopes from a hypothetical protein were removed. The remaining identified epitopes were subsequently used to generate an epitope map of the respective conserved regions."}
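In these annotation records, each denotation's `span` gives 0-based, end-exclusive character offsets into `text`. For example, span 103 to 113 covers "SARS-CoV-2". Each `attributes` entry links a denotation id (`subj`) to a database identifier (`obj`). The small Python sketch below joins the two. The helper name `resolve_annotations` and the trimmed example record are hypothetical, built only to mirror the structure of the records above.

```python
import json


def resolve_annotations(record_json):
    """Return (denotation_id, covered_text, type, database_id) for every denotation."""
    record = json.loads(record_json)
    text = record["text"]
    # Map each denotation id to the identifier attached to it (empty if none).
    ids = {a["subj"]: a["obj"] for a in record.get("attributes", [])}
    return [
        (d["id"], text[d["span"]["begin"]:d["span"]["end"]], d["obj"], ids.get(d["id"], ""))
        for d in record["denotations"]
    ]


# Hypothetical, trimmed record in the same shape as the LitCovid-PubTator entry.
EXAMPLE = json.dumps({
    "project": "demo",
    "denotations": [{"id": "97", "span": {"begin": 10, "end": 20}, "obj": "Species"}],
    "attributes": [{"id": "A97", "pred": "tao:has_database_id", "subj": "97", "obj": "Tax:2697049"}],
    "text": "Sequences SARS-CoV-2 were aligned.",
})
```

Because the offsets index directly into `text`, any change to the wording of `text` would require recomputing every `begin` and `end` that follows the change.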


    {"project":"LitCovid-PD-GO-BP","denotations":[{"id":"T11","span":{"begin":2244,"end":2261},"obj":""},{"id":"T12","span":{"begin":2252,"end":2261},"obj":""}],"text":"2. Methods\n\n2.1. Consensus Sequence ORF Generation and Entropy Calculation\nA total of 1731 full-length SARS-CoV-2 sequences were downloaded from NCBI (30 April 2020, txid2697049, minimum length = 29,000 bp) and aligned using MAFFT [44]. The alignment was visually inspected and curated using Genbank NC_045512.2 as a coordinate reference [45]. A nucleotide consensus sequence was generated by keeping all nucleotides present in at least 25% of the sequences in the alignment. The amino acid consensus sequence was then created by using NC_045512.2 annotated Open Reading Frames (ORFs) plus additional ORFs described in Finkel et al. [46] using the Biostrings R package. Mixed nucleotide positions were either resolved if they were synonymous or flagged for downstream analysis. Positional entropy was calculated at the amino acid level both as the standard and 22-aminoacid-normalized Shannon entropy for every ORF using Bio3d R package on the alignment [47], and afterward, the mean OLP normalized entropy was calculated.\n\n2.2. Overlapping Peptide Set Design and Variability Plots\nFor the automated design of overlapping peptides with variable length, we used the previously described Peptgen algorithm available at the Los Alamos National Laboratories HIV Immunology database [48]. This OLP generator allows predefining peptide length and level of the desired overlap between adjacent OLP. Peptgen is also set up to exclude from the C-terminal end of OLP certain “forbidden” amino acids (G, P, E, D, Q, N, T, S and C) that are rarely seen to serve as the C-terminal anchor position of HLA class I presented epitopes [49]. Using this optional modification can lead to length variation in the OLP set, which can be controlled by limiting the maximal length of an OLP in regions with numerous serial “forbidden” residues. 
The settings used for the present SARS-CoV-2 consensus OLP design were a) OLP length of 15 or 18 amino acids, with maximal extension or truncation of up to ±3 residues to avoid forbidden C-terminal residues. In addition, the overlap between adjacent OLP was set at 10 or 11 residues. The no-glutamine at N-terminal setting was applied to prevent OLP starting with a glutamine residue as this can lead to complications with peptide synthesis. For positions where two or more amino acids were present above 25% of the sequences in the alignment, two or more sequence variants for those OLPs were generated. Sequence logos were generated for these cases with the ggseqlogo R package [50].\n\n2.3. Detection of Conserved Peptides Among Coronavirus\nIn an attempt to detect protein fragments that are conserved across a wide range of members of the coronavirus family, full-length consensus ORF from SARS-CoV-2 were aligned with other coronavirus sequences. Three alignments were performed based on different sequence selection criteria: (i) 50 reference sequences (RefSeq) with the lowest E-values resulting from a pBLAST search [51] using the ORF-specific consensus sequences (pan-coronavirus alignment) (ii) homologous proteins from 17 viruses representing the Betacoronavirus taxon (beta-coronavirus alignment) or, (iii) homologous proteins from the 7 full-genome sequenced human coronaviruses (including SARS-CoV, MERS-CoV, and common cold species OC43, NL63, 229E, HKU1, human-coronavirus alignment). Selected sequences were aligned using the MUSCLE algorithm in MEGA X [52]. Conserved protein fragments were identified using BioEdit with the following criteria: minimum length of 8 amino acid, maximum average entropy of 0.25, maximum entropy per position of 1 and limiting the search to 1 gap per segment. Sequence logos were generated for the aligned peptides on Weblogo [53].\n\n2.4. 
Identification of Previously Described Epitopes in CoV-2 Conserved Regions\nTo identify previously reported epitopes in the conserved regions of coronaviruses (pan-coronavirus, betacoronaviruses, and human coronaviruses), and match them with the SARS-CoV-2 consensus sequence, searches for experimentally described epitopes were carried out in the Immune Epitope Database [54]. The search criteria were as follows: “linear peptide; blast option: 90%; Host: Homo sapiens; Any MHC restriction; Positive assays only; All assays; Any disease”. The search yielded 141 epitopes, of which 14 B-cell epitopes and 2 epitopes from a hypothetical protein were removed. The remaining identified epitopes were subsequently used to generate an epitope map of the respective conserved regions."}
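The 25% threshold rule in Section 2.1 can be illustrated one alignment column at a time. Every residue that reaches the threshold is kept. A column where more than one residue qualifies is a mixed position, which would then be resolved or flagged. The following is a minimal Python sketch under two assumptions: the sequences are already aligned to equal length, and gap characters count like any other symbol. The original pipeline used MAFFT output and the Biostrings R package, not this code.

```python
from collections import Counter


def column_residues(column, threshold=0.25):
    """Residues found in at least `threshold` of the sequences, most frequent first."""
    counts = Counter(column)
    n = len(column)
    kept = [res for res, c in counts.most_common() if c / n >= threshold]
    # If no residue reaches the threshold, fall back to the most frequent one.
    return kept or [counts.most_common(1)[0][0]]


def consensus(alignment, threshold=0.25):
    """Return (consensus string, {position: residues}) for equal-length aligned sequences.

    Positions where more than one residue reaches the threshold are reported as mixed.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    chars, mixed = [], {}
    for i in range(length):
        kept = column_residues([seq[i] for seq in alignment], threshold)
        chars.append(kept[0])
        if len(kept) > 1:
            mixed[i] = kept
    return "".join(chars), mixed


if __name__ == "__main__":
    print(consensus(["ACGT", "ACGA", "ACTT", "ACGT"]))
    # ('ACGT', {2: ['G', 'T'], 3: ['T', 'A']})
```

Raising `threshold` to 0.5 in this toy example removes both mixed positions. This shows how the 25% cut-off determines how many OLP sequence variants are generated downstream.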

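Section 2.1 reports both the standard and a 22-letter-normalized Shannon entropy for each position, followed by a mean value per OLP. The sketch below computes the per-column entropy H = -Σ p_i ln p_i and divides it by ln 22, the maximum for a 22-symbol alphabet. This normalization and the helper names are assumptions for illustration. Bio3d's own normalized score may use a different scale or orientation.

```python
import math
from collections import Counter

# Assumed 22-symbol alphabet (e.g. 20 amino acids plus gap and unknown symbols).
ALPHABET_SIZE = 22


def shannon_entropy(column):
    """Standard Shannon entropy (natural log) of one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in Counter(column).values())


def normalized_entropy(column, alphabet_size=ALPHABET_SIZE):
    """Entropy divided by its maximum, ln(alphabet_size), giving a value in [0, 1]."""
    return shannon_entropy(column) / math.log(alphabet_size)


def mean_olp_entropy(alignment, start, end):
    """Mean normalized entropy over columns start..end-1 spanned by one OLP."""
    columns = [[seq[i] for seq in alignment] for i in range(start, end)]
    return sum(normalized_entropy(col) for col in columns) / len(columns)


if __name__ == "__main__":
    aln = ["MKLV", "MKLA", "MRLV", "MKLV"]
    print(round(mean_olp_entropy(aln, 0, 4), 3))
```

Fully conserved columns contribute 0 to the mean, so an OLP's score rises only with the number and diversity of its variable positions.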

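The OLP settings in Section 2.2 can be approximated with a sliding window. Those settings are:

- 15- or 18-mers.
- An overlap of 10 or 11 residues.
- Up to ±3 residues of length adjustment to avoid a "forbidden" C-terminal residue.
- No N-terminal glutamine.

The function below is a hypothetical sketch, not the Peptgen implementation. Two of its rules are illustrative choices rather than documented Peptgen behavior: the order in which length adjustments are tried, and shifting the start past a leading Q.

```python
# C-terminal residues to avoid, as listed in Section 2.2.
FORBIDDEN_C_TERM = frozenset("GPEDQNTSC")


def design_olps(seq, length=15, overlap=10, max_shift=3):
    """Hypothetical overlapping-peptide generator (not the Peptgen code).

    Windows advance by `length - overlap`. Each window's C-terminal end is moved by
    at most `max_shift` residues to avoid a forbidden residue, and a leading Q is
    skipped by moving the start to the right (an illustrative rule).
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    if not seq:
        return []
    step = length - overlap
    # Try the nominal length first, then +1, -1, +2, -2, ... up to max_shift.
    offsets = [0] + [d for k in range(1, max_shift + 1) for d in (k, -k)]
    peptides = []
    start = 0
    while True:
        s = start
        while s < start + max_shift and s < len(seq) - 1 and seq[s] == "Q":
            s += 1
        end = min(start + length, len(seq))  # fallback if no allowed end is found
        for d in offsets:
            e = start + length + d
            if s < e <= len(seq) and seq[e - 1] not in FORBIDDEN_C_TERM:
                end = e
                break
        peptides.append(seq[s:end])
        if start + length >= len(seq):
            break
        start += step
    return peptides
```

To generate the 18-mer set, the same sketch would be called as `design_olps(seq, length=18, overlap=11)`. The final window may come out shorter than nominal, because it cannot extend past the end of the consensus ORF.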
    {"project":"LitCovid-sentences","denotations":[{"id":"T39","span":{"begin":0,"end":2},"obj":"Sentence"},{"id":"T40","span":{"begin":3,"end":10},"obj":"Sentence"},{"id":"T41","span":{"begin":12,"end":16},"obj":"Sentence"},{"id":"T42","span":{"begin":17,"end":74},"obj":"Sentence"},{"id":"T43","span":{"begin":75,"end":236},"obj":"Sentence"},{"id":"T44","span":{"begin":237,"end":343},"obj":"Sentence"},{"id":"T45","span":{"begin":344,"end":475},"obj":"Sentence"},{"id":"T46","span":{"begin":476,"end":669},"obj":"Sentence"},{"id":"T47","span":{"begin":670,"end":777},"obj":"Sentence"},{"id":"T48","span":{"begin":778,"end":1022},"obj":"Sentence"},{"id":"T49","span":{"begin":1024,"end":1028},"obj":"Sentence"},{"id":"T50","span":{"begin":1029,"end":1081},"obj":"Sentence"},{"id":"T51","span":{"begin":1082,"end":1283},"obj":"Sentence"},{"id":"T52","span":{"begin":1284,"end":1391},"obj":"Sentence"},{"id":"T53","span":{"begin":1392,"end":1623},"obj":"Sentence"},{"id":"T54","span":{"begin":1624,"end":1820},"obj":"Sentence"},{"id":"T55","span":{"begin":1821,"end":2028},"obj":"Sentence"},{"id":"T56","span":{"begin":2029,"end":2104},"obj":"Sentence"},{"id":"T57","span":{"begin":2105,"end":2262},"obj":"Sentence"},{"id":"T58","span":{"begin":2263,"end":2425},"obj":"Sentence"},{"id":"T59","span":{"begin":2426,"end":2506},"obj":"Sentence"},{"id":"T60","span":{"begin":2508,"end":2512},"obj":"Sentence"},{"id":"T61","span":{"begin":2513,"end":2562},"obj":"Sentence"},{"id":"T62","span":{"begin":2563,"end":2770},"obj":"Sentence"},{"id":"T63","span":{"begin":2771,"end":3319},"obj":"Sentence"},{"id":"T64","span":{"begin":3320,"end":3394},"obj":"Sentence"},{"id":"T65","span":{"begin":3395,"end":3626},"obj":"Sentence"},{"id":"T66","span":{"begin":3627,"end":3698},"obj":"Sentence"},{"id":"T67","span":{"begin":3700,"end":3704},"obj":"Sentence"},{"id":"T68","span":{"begin":3705,"end":3779},"obj":"Sentence"},{"id":"T69","span":{"begin":3780,"end":4081},"obj":"Sentence"},{"id":"T70","span":{"begin"
:4082,"end":4149},"obj":"Sentence"},{"id":"T71","span":{"begin":4150,"end":4160},"obj":"Sentence"},{"id":"T72","span":{"begin":4161,"end":4243},"obj":"Sentence"},{"id":"T73","span":{"begin":4244,"end":4361},"obj":"Sentence"},{"id":"T74","span":{"begin":4362,"end":4482},"obj":"Sentence"}],"namespaces":[{"prefix":"_base","uri":""}],"text":"2. Methods\n\n2.1. Consensus Sequence ORF Generation and Entropy Calculation\nA total of 1731 full-length SARS-CoV-2 sequences were downloaded from NCBI (30 April 2020, txid2697049, minimum length = 29,000 bp) and aligned using MAFFT [44]. The alignment was visually inspected and curated using Genbank NC_045512.2 as a coordinate reference [45]. A nucleotide consensus sequence was generated by keeping all nucleotides present in at least 25% of the sequences in the alignment. The amino acid consensus sequence was then created by using NC_045512.2 annotated Open Reading Frames (ORFs) plus additional ORFs described in Finkel et al. [46] using the Biostrings R package. Mixed nucleotide positions were either resolved if they were synonymous or flagged for downstream analysis. Positional entropy was calculated at the amino acid level both as the standard and 22-aminoacid-normalized Shannon entropy for every ORF using Bio3d R package on the alignment [47], and afterward, the mean OLP normalized entropy was calculated.\n\n2.2. Overlapping Peptide Set Design and Variability Plots\nFor the automated design of overlapping peptides with variable length, we used the previously described Peptgen algorithm available at the Los Alamos National Laboratories HIV Immunology database [48]. This OLP generator allows predefining peptide length and level of the desired overlap between adjacent OLP. Peptgen is also set up to exclude from the C-terminal end of OLP certain “forbidden” amino acids (G, P, E, D, Q, N, T, S and C) that are rarely seen to serve as the C-terminal anchor position of HLA class I presented epitopes [49]. 
Using this optional modification can lead to length variation in the OLP set, which can be controlled by limiting the maximal length of an OLP in regions with numerous serial “forbidden” residues. The settings used for the present SARS-CoV-2 consensus OLP design were a) OLP length of 15 or 18 amino acids, with maximal extension or truncation of up to ±3 residues to avoid forbidden C-terminal residues. In addition, the overlap between adjacent OLP was set at 10 or 11 residues. The no-glutamine at N-terminal setting was applied to prevent OLP starting with a glutamine residue as this can lead to complications with peptide synthesis. For positions where two or more amino acids were present above 25% of the sequences in the alignment, two or more sequence variants for those OLPs were generated. Sequence logos were generated for these cases with the ggseqlogo R package [50].\n\n2.3. Detection of Conserved Peptides Among Coronavirus\nIn an attempt to detect protein fragments that are conserved across a wide range of members of the coronavirus family, full-length consensus ORF from SARS-CoV-2 were aligned with other coronavirus sequences. Three alignments were performed based on different sequence selection criteria: (i) 50 reference sequences (RefSeq) with the lowest E-values resulting from a pBLAST search [51] using the ORF-specific consensus sequences (pan-coronavirus alignment) (ii) homologous proteins from 17 viruses representing the Betacoronavirus taxon (beta-coronavirus alignment) or, (iii) homologous proteins from the 7 full-genome sequenced human coronaviruses (including SARS-CoV, MERS-CoV, and common cold species OC43, NL63, 229E, HKU1, human-coronavirus alignment). Selected sequences were aligned using the MUSCLE algorithm in MEGA X [52]. 
Conserved protein fragments were identified using BioEdit with the following criteria: minimum length of 8 amino acid, maximum average entropy of 0.25, maximum entropy per position of 1 and limiting the search to 1 gap per segment. Sequence logos were generated for the aligned peptides on Weblogo [53].\n\n2.4. Identification of Previously Described Epitopes in CoV-2 Conserved Regions\nTo identify previously reported epitopes in the conserved regions of coronaviruses (pan-coronavirus, betacoronaviruses, and human coronaviruses), and match them with the SARS-CoV-2 consensus sequence, searches for experimentally described epitopes were carried out in the Immune Epitope Database [54]. The search criteria were as follows: “linear peptide; blast option: 90%; Host: Homo sapiens; Any MHC restriction; Positive assays only; All assays; Any disease”. The search yielded 141 epitopes, of which 14 B-cell epitopes and 2 epitopes from a hypothetical protein were removed. The remaining identified epitopes were subsequently used to generate an epitope map of the respective conserved regions."}
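Section 2.3 lists four BioEdit criteria for a conserved segment: minimum length 8, mean entropy ≤ 0.25, per-position entropy ≤ 1, and at most one gap per segment. These criteria can be expressed as a scan over per-column entropies. The sketch below is a greedy Python version, not BioEdit's own algorithm. Its inputs are precomputed entropy values and a boolean gap flag per alignment column. It assumes those entropies are on the same scale that the thresholds were defined on.

```python
def find_conserved_segments(entropy, gap_columns, min_len=8, max_mean=0.25,
                            max_pos=1.0, max_gaps=1):
    """Greedy scan returning half-open (start, end) column ranges that satisfy:
    length >= min_len, mean entropy <= max_mean, every column <= max_pos,
    and at most max_gaps gap columns. Not BioEdit's own algorithm."""
    if len(entropy) != len(gap_columns):
        raise ValueError("entropy and gap_columns must have the same length")
    n = len(entropy)
    segments = []
    start = 0
    while start < n:
        best_end = None
        total, gaps = 0.0, 0
        for end in range(start, n):
            if entropy[end] > max_pos:
                break  # per-position limit violated; segment cannot extend further
            if gap_columns[end]:
                gaps += 1
                if gaps > max_gaps:
                    break
            total += entropy[end]
            length = end - start + 1
            if length >= min_len and total / length <= max_mean:
                best_end = end + 1  # remember the longest qualifying end so far
        if best_end is None:
            start += 1
        else:
            segments.append((start, best_end))
            start = best_end  # resume after the reported segment
    return segments
```

The mean-entropy criterion is not monotonic: a segment can fail at one length and qualify again after more conserved columns are added. For that reason, the sketch keeps extending until a hard limit (per-position entropy or gap count) is hit, rather than stopping at the first failure.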