PMC:7890668 / 1499-3810

{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/7890668","sourcedb":"PMC","sourceid":"7890668","source_url":"https://www.ncbi.nlm.nih.gov/pmc/7890668","text":"2 Materials and methods\nWe used COVID-19 Open Research Dataset (CORD-19, https://allenai.org/data/cord-19) containing over 60 000 full-text research papers with ontological tagging provided by the SciBiteAI group (https://github.com/SciBiteLabs/CORD19). We parsed CORD-19 data into a format compatible with the ROBOKOP’s knowledge graph by extracting, sentence by sentence, the counts of ontological terms and tag co-occurrences. This resulted in 800 000 new edges in the COVID-KOP knowledge graph. In addition, we used the SciGraph tool (https://github.com/SciGraph/SciGraph), which also allows biomedical ontological term tagging and tag co-occurrence counts at the paper rather than sentence level, leading to 4.5 million new edges.\nGene Ontology Annotation data for all viral proteins, including those of SARS-CoV-2, were downloaded from the EBI FTP site (see https://github.com/TranslatorIIPrototypes/ViralProteome for details). The knowledge graph integration tool KGX (https://github.com/NCATS-Tangerine/kgx) was used to merge the GOA data and create a ROBOKOP-formatted graph. In total, the COVID-KOP database and knowledge graph comprise nodes for 40 000 proteins, 4000 NCBITaxon (Federhen, 2012) terms, 1300 GO annotations (Ashburner et al., 2000) and 232 000 new edges (labeled as ‘related_to’) on top of those in ROBOKOP. We use bidirectional edges for these linkages because the connection directionality is not provided in primary sources.\nA set of 26 SARS-CoV-2 symptoms was identified from various resources (https://www.cebm.net/covid-19/covid-19-signs-and-symptoms-tracker/; https://covid.cd2h.org/N3C; https://www.hematology.org/covid-19/covid-19-and-coagulopathy) and a recent commentary (Schett et al., 2020). 
This information was manually entered into the COVID-KOP database as edges between the COVID-19 and its phenotypes.\nDue to multiple identifier systems used by different databases for the same entities, we utilized the Data Translator Node Normalization API (https://github.com/TranslatorIIPrototypes/NodeNormalization) for data integration. COVID-KOP is powered by the knowledge graph database Neo4J (https://neo4j.com/), which uses Cypher to enable complex graph database queries. The fully integrated COVID-KOP KG can be mined in the same way as ROBOKOP KG (Bizon et al., 2019).","divisions":[{"label":"title","span":{"begin":0,"end":23}},{"label":"p","span":{"begin":24,"end":735}},{"label":"p","span":{"begin":736,"end":1453}},{"label":"p","span":{"begin":1454,"end":1846}}],"tracks":[]}
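
The methods text in the record above describes extracting, sentence by sentence, counts of ontological terms and tag co-occurrences. A minimal Python sketch of that counting step follows. It is not the authors' pipeline: the function name `cooccurrence_counts`, the input layout (one list of tag identifiers per sentence), and the `EX:` identifiers are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tagged_units):
    """Count term occurrences and unordered term-pair co-occurrences.

    `tagged_units` is an iterable of tag lists, one list per text unit
    (a sentence here). This input layout is an assumption, not the
    CORD-19 or SciBite format.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for tags in tagged_units:
        # Deduplicate within a unit and sort so each pair has one canonical order.
        unique = sorted(set(tags))
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return term_counts, pair_counts


# Hypothetical tagged sentences; the EX: identifiers are placeholders.
sentences = [
    ["EX:0001", "EX:0002"],
    ["EX:0001", "EX:0003", "EX:0002"],
    ["EX:0003"],
]

term_counts, pair_counts = cooccurrence_counts(sentences)
for (a, b), n in sorted(pair_counts.items()):
    print(a, b, n)
```

Passing one tag list per paper instead of one per sentence would give paper-level counts, the coarser granularity the text attributes to the SciGraph-based pass. Each pair is stored once in sorted order, which fits undirected co-occurrence where the sources give no direction.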