PMC:7519301 / 34579-37176

LitCovid-PD-FMA-UBERON

Id Subject Object Predicate Lexical cue fma_id
T113 2156-2164 Body_part denotes Appendix http://purl.org/sig/ont/fma/fma14542

LitCovid-PD-UBERON

Id Subject Object Predicate Lexical cue uberon_id
T3 1560-1563 Body_part denotes tip http://purl.obolibrary.org/obo/UBERON_2001840

LitCovid-PD-MONDO

Id Subject Object Predicate Lexical cue mondo_id
T180 39-47 Disease denotes SARS-CoV http://purl.obolibrary.org/obo/MONDO_0005091
T181 39-43 Disease denotes SARS http://purl.obolibrary.org/obo/MONDO_0005091

LitCovid-PD-CLO

Id Subject Object Predicate Lexical cue
T277 90-92 http://purl.obolibrary.org/obo/CLO_0050510 denotes 18
T278 1169-1171 http://purl.obolibrary.org/obo/CLO_0050509 denotes 27
T279 1362-1363 http://purl.obolibrary.org/obo/CLO_0001020 denotes a
T280 1617-1620 http://purl.obolibrary.org/obo/NCBITaxon_314295 denotes ape
T281 1635-1637 http://purl.obolibrary.org/obo/CLO_0001407 denotes 52
T282 2171-2173 http://purl.obolibrary.org/obo/CLO_0050050 denotes S1
T283 2176-2177 http://purl.obolibrary.org/obo/CLO_0001020 denotes A

LitCovid-PD-CHEBI

Id Subject Object Predicate Lexical cue chebi_id
T97 1474-1479 Chemical denotes gamma http://purl.obolibrary.org/obo/CHEBI_30212
T98 2153-2155 Chemical denotes SI http://purl.obolibrary.org/obo/CHEBI_90326

LitCovid-PubTator

Id Subject Object Predicate Lexical cue tao:has_database_id
493 653-656 Gene denotes Hu1 Gene:3215
494 39-49 Species denotes SARS-CoV-2 Tax:2697049
495 1104-1108 Species denotes CoV2 Tax:2697049
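The Subject column in these tables gives character offsets into the annotated passage. The offsets are zero-based and end-exclusive, which is consistent with sentence T210 (0-34) covering the 34-character heading. The minimal Python check below recovers the lexical cues of T180, T181 and 494 from the opening of sentences T210 and T211. `TEXT_PREFIX` is copied from those sentences. `covered_text` is a hypothetical helper for illustration, not part of any PubAnnotation API.

```python
# Opening of the annotated passage, copied from sentences T210 and T211.
TEXT_PREFIX = "Sequence Processing and Filtering. All SARS-CoV-2 sequences"

def covered_text(text: str, begin: int, end: int) -> str:
    """Return the substring a span covers (zero-based, end-exclusive offsets)."""
    if not 0 <= begin <= end <= len(text):
        raise ValueError(f"span {begin}-{end} lies outside the text")
    return text[begin:end]

# (annotation id, begin, end, lexical cue as listed in the tables)
SPANS = [
    ("T210", 0, 34, "Sequence Processing and Filtering."),
    ("T180", 39, 47, "SARS-CoV"),
    ("T181", 39, 43, "SARS"),
    ("494", 39, 49, "SARS-CoV-2"),
]

for span_id, begin, end, cue in SPANS:
    assert covered_text(TEXT_PREFIX, begin, end) == cue, span_id
```

Note that T180, T181 and 494 overlap and share a start offset: each project annotates the text independently, so nested and overlapping spans are expected.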

LitCovid-sentences

Id Subject Object Predicate Lexical cue
T210 0-34 Sentence denotes Sequence Processing and Filtering.
T211 35-255 Sentence denotes All SARS-CoV-2 sequences available on GISAID as of May 18, 2020 (n = 27,989) were downloaded and deduplicated where possible, and those missing accurate dates (that is, only recording the month and/or year) were removed.
T212 256-337 Sentence denotes Sequences were processed using the Biostrings package (version 2.48.0) in R (49).
T213 338-533 Sentence denotes Sequences known to be linked through direct transmission were removed, and only the sample with the earliest date (chosen at random when multiple samples were taken on the same day) was retained.
T214 534-696 Sentence denotes Sequences were then aligned with Mafft v7.467 using the -addfragments option to align to the reference sequence (Wuhan-Hu1, GISAID accession EPI_ISL_402125) (50).
T215 697-872 Sentence denotes Insertions relative to Wuhan-Hu-1 were removed, and the 5′ and 3′ ends of sequences (where coverage was low) were excised, resulting in an alignment consisting of the 10 ORFs.
T216 873-1177 Sentence denotes Any sequences with less than 95% coverage of the ORFs (i.e., >5% gaps) were removed, and 30 homoplasic sites likely due to sequencing artifacts identified by de Maio et al. were masked (https://github.com/W-L/ProblematicSites_SARS-CoV2/blob/master/archived_vcf/problematic_sites_sarsCov2.2020-05-27.vcf).
T217 1178-1499 Sentence denotes To identify individual sequences that were much more divergent than expected, given their sampling date, which likely reflected sequencing artifacts rather than evolution, we obtained a tree using FastTree v2.10.1 compiled with double precision under the general time reversible (GTR) model with gamma heterogeneity (51).
T218 1500-1643 Sentence denotes This tree was rooted at the reference sequence, and root-to-tip regression was performed following TempEst using the ape package in R (52, 53).
T219 1644-1743 Sentence denotes Outliers were defined as sequences that had studentized residuals greater than 3, and were removed.
T220 1744-1873 Sentence denotes Sequences from the United Kingdom corresponded to nearly half of the sequences (n = 12,157/25,671, 47%) of this filtered dataset.
T221 1874-2175 Sentence denotes To avoid overrepresentation of the UK sequences and bias in subsequent analyses, we investigated the effect of downsampling sequences on the mean Hamming distance and identified the minimum number of sequences required to recover the mean corresponding to the full distribution (SI Appendix, Fig. S1).
T222 2176-2375 Sentence denotes A subsample of 5,000 sequences satisfied these criteria, and also ensured that there were fewer sequences from the United Kingdom than from the United States (n = 5,398), reflecting the epidemiology.
T223 2376-2498 Sentence denotes These 5,000 sequences were sampled randomly, with weight proportional to the number of UK sequences collected on that day.
T224 2499-2597 Sentence denotes After these filtering steps, the alignment used for subsequent analyses included 18,514 sequences.
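Sentence T211 describes two record-level filters. The sketch below is an illustration only; per T212, the authors' processing used Biostrings in R. It makes several assumptions. It uses a hypothetical `Record` type. It treats records that share a `strain` name as duplicates, because the text does not say what deduplication matched on. It also keeps only dates written as a valid ISO year-month-day.

```python
import re
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Record:
    """Hypothetical metadata record; field names are assumptions."""
    strain: str           # virus name, used here as the duplicate key
    collection_date: str  # as recorded, e.g. "2020-03-14" or "2020-03"
    sequence: str

_ISO_DAY = re.compile(r"\d{4}-\d{2}-\d{2}")

def has_exact_date(rec: Record) -> bool:
    """True only if the date gives year, month and day (not e.g. "2020-03")."""
    if not _ISO_DAY.fullmatch(rec.collection_date):
        return False
    try:
        date.fromisoformat(rec.collection_date)  # rejects days like 2020-02-31
    except ValueError:
        return False
    return True

def filter_dates_and_duplicates(records):
    """Drop imprecisely dated records, then keep the first record per strain."""
    kept, seen = [], set()
    for rec in records:
        if not has_exact_date(rec) or rec.strain in seen:
            continue
        seen.add(rec.strain)
        kept.append(rec)
    return kept

records = [
    Record("sample-A", "2020-03-14", "ACGT"),
    Record("sample-A", "2020-03-14", "ACGT"),  # duplicate entry
    Record("sample-B", "2020-03", "ACGA"),     # month only
    Record("sample-C", "2020", "ACGG"),        # year only
]
print([r.strain for r in filter_dates_and_duplicates(records)])  # ['sample-A']
```

The date filter runs before deduplication. This way, an undated copy of a sample cannot displace a properly dated one.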
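Sentence T213 keeps one sequence per group linked by direct transmission: the earliest sample, with ties broken at random. The sketch below assumes each sample carries a hypothetical `cluster_id` label, with `None` meaning the sample is not known to be linked. The text does not say how such links were recorded.

```python
import random
from collections import defaultdict
from datetime import date

def keep_earliest_per_cluster(samples, rng):
    """samples: iterable of (sample_id, cluster_id, collection_date).

    cluster_id is a hypothetical transmission-group label (None = unlinked).
    Returns the ids of retained samples.
    """
    kept = []
    clusters = defaultdict(list)
    for sample_id, cluster_id, day in samples:
        if cluster_id is None:
            kept.append(sample_id)  # unlinked samples are always kept
        else:
            clusters[cluster_id].append((day, sample_id))
    for members in clusters.values():
        earliest = min(day for day, _ in members)
        tied = [sid for day, sid in members if day == earliest]
        kept.append(rng.choice(tied))  # random pick among same-day samples
    return kept

samples = [
    ("a", None, date(2020, 3, 1)),
    ("b", "cluster-1", date(2020, 3, 5)),
    ("c", "cluster-1", date(2020, 3, 2)),
    ("d", "cluster-2", date(2020, 4, 1)),
    ("e", "cluster-2", date(2020, 4, 1)),
]
print(keep_earliest_per_cluster(samples, random.Random(42)))
```

Passing the random generator in explicitly keeps the tie-breaking reproducible.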
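Sentence T216 drops sequences that cover less than 95% of the ORF alignment and masks problematic sites. The minimal sketch below rests on three assumptions. Gaps are written as `-`. Masked sites become `N`. The de Maio et al. VCF positions have already been converted from Wuhan-Hu-1 reference coordinates to 1-based columns of the trimmed ORF alignment; that conversion is not shown.

```python
GAP = "-"

def orf_coverage(aligned_seq: str) -> float:
    """Fraction of alignment columns that are not gaps."""
    if not aligned_seq:
        return 0.0
    return 1.0 - aligned_seq.count(GAP) / len(aligned_seq)

def mask_sites(aligned_seq: str, sites, mask_char: str = "N") -> str:
    """Replace the given 1-based alignment columns with mask_char."""
    chars = list(aligned_seq)
    for pos in sites:
        if 1 <= pos <= len(chars):
            chars[pos - 1] = mask_char
    return "".join(chars)

def coverage_filter_and_mask(alignment, sites, min_coverage=0.95):
    """alignment: dict of id -> aligned ORF sequence (equal lengths assumed).

    Keeps sequences with coverage >= min_coverage (i.e. <= 5% gaps by
    default) and masks the listed columns in the survivors.
    """
    return {
        seq_id: mask_sites(seq, sites)
        for seq_id, seq in alignment.items()
        if orf_coverage(seq) >= min_coverage
    }

alignment = {
    "s1": "ACGTACGTACGTACGTACGT",  # no gaps
    "s2": "AC--ACGTACGTACGTACGT",  # 2/20 = 10% gaps, removed
}
print(coverage_filter_and_mask(alignment, sites=[3]))
```
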
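Sentences T217 to T219 describe a root-to-tip regression in which sequences with a studentized residual above 3 are removed. Tree inference (FastTree) and distance extraction (ape) are out of scope here. The sketch below assumes that root-to-tip distances and sampling dates (for example, decimal years) are already available. It fits distance on date by ordinary least squares and computes externally studentized residuals, the quantity R's `rstudent()` returns. The text states neither the residual type nor whether the cutoff is one- or two-sided, so the two-sided cutoff here is an assumption. `rstudent_outliers` is a hypothetical name.

```python
import math

def rstudent_outliers(dates, distances, ids, threshold=3.0):
    """Return ids whose externally studentized residual from the OLS fit
    distance ~ date exceeds threshold in absolute value (two-sided)."""
    n = len(dates)
    if n < 4:
        raise ValueError("need at least 4 points")
    x_bar = sum(dates) / n
    y_bar = sum(distances) / n
    sxx = sum((x - x_bar) ** 2 for x in dates)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dates, distances))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(dates, distances)]
    sse = sum(e * e for e in residuals)
    p = 2  # fitted parameters: intercept and slope
    flagged = []
    for x, e, seq_id in zip(dates, residuals, ids):
        h = 1.0 / n + (x - x_bar) ** 2 / sxx  # leverage of this point
        # Residual variance of the fit with this point deleted.
        s2_del = (sse - e * e / (1.0 - h)) / (n - p - 1)
        if s2_del <= 0:
            continue  # degenerate case (remaining points fit exactly): skipped
        t = e / math.sqrt(s2_del * (1.0 - h))
        if abs(t) > threshold:
            flagged.append(seq_id)
    return flagged

# Toy data: a clock-like trend with small alternating noise and one outlier.
dates = list(range(12))
dists = [1e-3 * i + (1e-4 if i % 2 == 0 else -1e-4) for i in range(12)]
dists[6] += 5e-3
print(rstudent_outliers(dates, dists, [f"s{i}" for i in range(12)]))
```

Sequences that are "more divergent than expected" sit above the regression line. A one-sided variant would therefore test `t > threshold` instead of `abs(t) > threshold`.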
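Sentence T221 picks a subsample size by checking when the mean Hamming distance of a downsample recovers the full-data mean. The text does not give the tolerance, the number of replicates, or how ambiguous bases were handled. The `rel_tol` and `replicates` parameters and the skipped characters below are therefore hypothetical choices, not the authors' settings.

```python
import itertools
import random

def hamming(a: str, b: str, skip: str = "N-") -> int:
    """Differences between two aligned sequences, ignoring columns where
    either sequence has a character in skip (assumed: N and gaps)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and x not in skip and y not in skip)

def mean_pairwise_hamming(seqs) -> float:
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    total = sum(hamming(a, b) for a, b in itertools.combinations(seqs, 2))
    return total / (n * (n - 1) / 2)

def min_subsample_size(seqs, sizes, rel_tol=0.01, replicates=20, seed=0):
    """Smallest candidate size whose replicate subsample means all lie within
    rel_tol of the full-data mean; None if no candidate qualifies."""
    rng = random.Random(seed)
    target = mean_pairwise_hamming(seqs)
    for size in sorted(s for s in sizes if 2 <= s <= len(seqs)):
        means = [mean_pairwise_hamming(rng.sample(seqs, size)) for _ in range(replicates)]
        if all(abs(m - target) <= rel_tol * target for m in means):
            return size
    return None

print(mean_pairwise_hamming(["AAAA", "AAAT", "AATT"]))  # distances 1, 2, 1
```

All-pairs comparison is quadratic in the number of sequences. On tens of thousands of genomes, the full-data mean would in practice need a vectorized or sampled computation.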
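Sentence T223 draws the 5,000 UK sequences at random, with weight proportional to the number of UK sequences collected on the same day. Read literally, each sequence's weight is its day's count. The sketch below implements that reading using the Efraimidis-Spirakis method for weighted sampling without replacement; this method is chosen here and is not named in the text. The `(sequence_id, collection_date)` input shape and the function names are assumptions.

```python
import random
from collections import Counter

def weighted_sample_without_replacement(items, weights, k, rng):
    """Efraimidis-Spirakis: keep the k items with the largest u ** (1 / w)."""
    if k > len(items):
        raise ValueError("k is larger than the population")
    keyed = []
    for item, w in zip(items, weights):
        if w <= 0:
            raise ValueError("weights must be positive")
        keyed.append((rng.random() ** (1.0 / w), item))
    keyed.sort(key=lambda pair: pair[0], reverse=True)  # compare keys only
    return [item for _, item in keyed[:k]]

def downsample_uk(uk_records, k=5000, seed=None):
    """uk_records: list of (sequence_id, collection_date).

    Each record's weight is the number of UK sequences sharing its
    collection date (one reading of the text).
    """
    rng = random.Random(seed)
    per_day = Counter(day for _, day in uk_records)
    weights = [per_day[day] for _, day in uk_records]
    ids = [seq_id for seq_id, _ in uk_records]
    return weighted_sample_without_replacement(ids, weights, k, rng)

records = [("s1", "2020-03-01"), ("s2", "2020-03-01"), ("s3", "2020-03-02")]
print(downsample_uk(records, k=2, seed=1))
```

Under this reading, busy days are favored roughly quadratically, because each of their sequences also carries a larger weight. If the intent was only to preserve each day's share, uniform sampling of sequences already achieves that in expectation.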
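The counts in T220, T222 and T224 agree if the 5,000-sequence subsample replaces the 12,157 UK sequences in the 25,671-sequence filtered set. The text implies this but does not state it.

```latex
25{,}671 - 12{,}157 + 5{,}000 = 18{,}514,
\qquad
\frac{12{,}157}{25{,}671} \approx 0.474 \approx 47\%
```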

2_test

Id Subject Object Predicate Lexical cue
32868447-23329690-132542412 692-694 23329690 denotes 50
32868447-20224823-132542413 1495-1497 20224823 denotes 51
32868447-27774300-132542414 1635-1637 27774300 denotes 52
32868447-30016406-132542415 1639-1641 30016406 denotes 53