PMC:7519301 / 34579-37176

Annotations

LitCovid-PD-FMA-UBERON

Id	Subject	Object	Predicate	Lexical cue	fma_id
T113	2156-2164	Body_part	denotes	Appendix	http://purl.org/sig/ont/fma/fma14542

LitCovid-PD-UBERON

Id	Subject	Object	Predicate	Lexical cue	uberon_id
T3	1560-1563	Body_part	denotes	tip	http://purl.obolibrary.org/obo/UBERON_2001840

LitCovid-PD-MONDO

Id	Subject	Object	Predicate	Lexical cue	mondo_id
T180	39-47	Disease	denotes	SARS-CoV	http://purl.obolibrary.org/obo/MONDO_0005091
T181	39-43	Disease	denotes	SARS	http://purl.obolibrary.org/obo/MONDO_0005091

LitCovid-PD-CLO

Id	Subject	Object	Predicate	Lexical cue
T277	90-92	http://purl.obolibrary.org/obo/CLO_0050510	denotes	18
T278	1169-1171	http://purl.obolibrary.org/obo/CLO_0050509	denotes	27
T279	1362-1363	http://purl.obolibrary.org/obo/CLO_0001020	denotes	a
T280	1617-1620	http://purl.obolibrary.org/obo/NCBITaxon_314295	denotes	ape
T281	1635-1637	http://purl.obolibrary.org/obo/CLO_0001407	denotes	52
T282	2171-2173	http://purl.obolibrary.org/obo/CLO_0050050	denotes	S1
T283	2176-2177	http://purl.obolibrary.org/obo/CLO_0001020	denotes	A

LitCovid-PD-CHEBI

Id	Subject	Object	Predicate	Lexical cue	chebi_id
T97	1474-1479	Chemical	denotes	gamma	http://purl.obolibrary.org/obo/CHEBI_30212
T98	2153-2155	Chemical	denotes	SI	http://purl.obolibrary.org/obo/CHEBI_90326

LitCovid-PubTator

Id	Subject	Object	Predicate	Lexical cue	tao:has_database_id
493	653-656	Gene	denotes	Hu1	Gene:3215
494	39-49	Species	denotes	SARS-CoV-2	Tax:2697049
495	1104-1108	Species	denotes	CoV2	Tax:2697049

LitCovid-sentences

Id	Subject	Object	Predicate	Lexical cue
T210	0-34	Sentence	denotes	Sequence Processing and Filtering.
T211	35-255	Sentence	denotes	All SARS-CoV-2 sequences available on GISAID as of May 18, 2020 (n = 27,989) were downloaded and deduplicated where possible, and those missing accurate dates (that is, only recording the month and/or year) were removed.
T212	256-337	Sentence	denotes	Sequences were processed using the Biostrings package (version 2.48.0) in R (49).
T213	338-533	Sentence	denotes	Sequences known to be linked through direct transmission were removed, and only the sample with the earliest date (chosen at random when multiple samples were taken on the same day) was retained.
T214	534-696	Sentence	denotes	Sequences were then aligned with Mafft v7.467 using the -addfragments option to align to the reference sequence (Wuhan-Hu1, GISAID accession EPI_ISL_402125) (50).
T215	697-872	Sentence	denotes	Insertions relative to Wuhan-Hu-1 were removed, and the 5′ and 3′ ends of sequences (where coverage was low) were excised, resulting in an alignment consisting of the 10 ORFs.
T216	873-1177	Sentence	denotes	Any sequences with less than 95% coverage of the ORFs (i.e., >5% gaps) were removed, and 30 homoplasic sites likely due to sequencing artifacts identified by de Maio et al. were masked (https://github.com/W-L/ProblematicSites_SARS-CoV2/blob/master/archived_vcf/problematic_sites_sarsCov2.2020-05-27.vcf).
T217	1178-1499	Sentence	denotes	To identify individual sequences that were much more divergent than expected, given their sampling date, which likely reflected sequencing artifacts rather than evolution, we obtained a tree using FastTree v2.10.1 compiled with double precision under the general time reversible (GTR) model with gamma heterogeneity (51).
T218	1500-1643	Sentence	denotes	This tree was rooted at the reference sequence, and root-to-tip regression was performed following TempEst using the ape package in R (52, 53).
T219	1644-1743	Sentence	denotes	Outliers were defined as sequences that had studentized residuals greater than 3, and were removed.
T220	1744-1873	Sentence	denotes	Sequences from the United Kingdom corresponded to nearly half of the sequences (n = 12,157/25,671, 47%) of this filtered dataset.
T221	1874-2175	Sentence	denotes	To avoid overrepresentation of the UK sequences and bias in subsequent analyses, we investigated the effect of downsampling sequences on the mean Hamming distance and identified the minimum number of sequences required to recover the mean corresponding to the full distribution (SI Appendix, Fig. S1).
T222	2176-2375	Sentence	denotes	A subsample of 5,000 sequences satisfied these criteria, and also ensured that there were fewer sequences from the United Kingdom than from the United States (n = 5,398), reflecting the epidemiology.
T223	2376-2498	Sentence	denotes	These 5,000 sequences were sampled randomly, with weight proportional to the number of UK sequences collected on that day.
T224	2499-2597	Sentence	denotes	After these filtering steps, the alignment used for subsequent analyses included 18,514 sequences.
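
Two of the filtering rules quoted in sentences T216 and T219 (dropping sequences with less than 95% ORF coverage, and dropping sequences whose studentized residual in the root-to-tip regression exceeds 3) can be sketched as below. This is a minimal illustration, not the authors' code: the source describes an R workflow (Biostrings, ape), while this sketch uses plain Python. The function names, the choice to count only "-" as a gap, and the use of internally studentized residuals are all assumptions, since the text does not specify them.

```python
from math import sqrt


def coverage(seq, gap_chars="-"):
    """Fraction of aligned positions that are not gaps.

    Assumption: only '-' counts as a gap; the source does not say
    whether ambiguous bases such as 'N' were also counted.
    """
    if not seq:
        return 0.0
    gaps = sum(1 for base in seq if base in gap_chars)
    return 1.0 - gaps / len(seq)


def filter_by_coverage(aligned, min_coverage=0.95):
    """Keep sequences with at least `min_coverage` of positions covered."""
    return {name: seq for name, seq in aligned.items()
            if coverage(seq) >= min_coverage}


def studentized_residuals(x, y):
    """Internally studentized residuals of the least-squares line y ~ x.

    x: sampling dates as numbers, y: root-to-tip distances.
    Assumption: 'studentized' is taken to mean internally studentized.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least three points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom.
    s = sqrt(sum(e * e for e in resid) / (n - 2))
    out = []
    for xi, e in zip(x, resid):
        leverage = 1.0 / n + (xi - mean_x) ** 2 / sxx
        out.append(e / (s * sqrt(1.0 - leverage)))
    return out


def flag_outliers(x, y, threshold=3.0):
    """Indices whose studentized residual is greater than `threshold`.

    The comparison is signed, following the wording 'greater than 3';
    whether the magnitude was intended is not stated in the source.
    """
    return [i for i, r in enumerate(studentized_residuals(x, y))
            if r > threshold]
```

For example, `filter_by_coverage({"s1": "ACGT" * 10, "s2": "ACGT" * 9 + "----"})` keeps only `s1`, because `s2` has 10% gaps. `flag_outliers` expects the dates and root-to-tip distances in the same order and returns the positions of the sequences to drop.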

2_test

Id	Subject	Object	Predicate	Lexical cue
32868447-23329690-132542412	692-694	23329690	denotes	50
32868447-20224823-132542413	1495-1497	20224823	denotes	51
32868447-27774300-132542414	1635-1637	27774300	denotes	52
32868447-30016406-132542415	1639-1641	30016406	denotes	53
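
The Subject column in every table above appears to hold zero-based, end-exclusive character offsets into the section text, with consecutive sentences joined by a single space (T210 ends at 34 and T211 starts at 35). Under that assumption, a row can be checked against the text as sketched below. The helper names are hypothetical, and the text used here is only the opening of the section, taken from sentences T210 and T211.

```python
def parse_span(subject):
    """Split a 'begin-end' Subject value into two integers."""
    begin, end = subject.split("-")
    return int(begin), int(end)


def cue_matches(text, row):
    """True if the row's span selects exactly its 'Lexical cue' in `text`.

    Assumes tab-separated columns in the order used above:
    Id, Subject, Object, Predicate, Lexical cue, (optional identifier).
    """
    fields = row.split("\t")
    begin, end = parse_span(fields[1])
    return text[begin:end] == fields[4]


# Opening of the section text (T210 followed by the start of T211).
text = ("Sequence Processing and Filtering. "
        "All SARS-CoV-2 sequences available on GISAID as of May 18, 2020")

rows = [
    "T180\t39-47\tDisease\tdenotes\tSARS-CoV\thttp://purl.obolibrary.org/obo/MONDO_0005091",
    "T181\t39-43\tDisease\tdenotes\tSARS\thttp://purl.obolibrary.org/obo/MONDO_0005091",
    "494\t39-49\tSpecies\tdenotes\tSARS-CoV-2\tTax:2697049",
    "T277\t90-92\thttp://purl.obolibrary.org/obo/CLO_0050510\tdenotes\t18",
]

results = [cue_matches(text, row) for row in rows]
```

A check like this only confirms that offsets and cues agree; it says nothing about whether the assigned ontology term is appropriate for the cue.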