Table 1  Resources for text mining researchers and practitioners, including embeddings, reusable annotated datasets, knowledge graphs and pretrained COVID-19 domain language models, which can be incorporated into downstream systems. Under ‘Affiliation’, we use  for industry,  for nonprofit and no symbol for academic affiliations; where no affiliation is given, the work was conducted by independent researchers.
| Resource type | Resource name | Data/model used | Affiliation | Link | Description |
|---|---|---|---|---|---|
| Embeddings | SPECTER CORD-19 embeddings | CORD-19 | Allen Institute for AI | https://www.semanticscholar.org/cord19 | SPECTER embeddings [19] for CORD-19 papers |
| | COVID-19 Concept embeddings | CORD-19, SNOMED-CT | The Ohio State University | https://slate.cse.ohio-state.edu/JET/COVID-19/ | JET embeddings [60] for clinical entities (SNOMED-CT) in the CORD-19 corpus |
| | CORD-19 SeVeN embeddings | CORD-19 | Cardiff University | https://github.com/luisespinosaanke/cord-19-seven | SeVeN [28] word embeddings trained on CORD-19 |
| | Co-occurrence network embeddings [63] | CORD-19-on-FHIR | Mayo Clinic | https://github.com/shenfc/COVID-19-network-embeddings | Network co-occurrence embeddings trained on the semantically annotated version of CORD-19 (CORD-19-on-FHIR) |
| Annotations | CODA-19 [35] | CORD-19 | Penn State University, UCSF, Carnegie Mellon University | https://github.com/windx0303/CODA-19 | Crowdsourced dataset of research aspect annotations for papers in CORD-19 |
| | CORD-19-on-FHIR | CORD-19, FHIR | Mayo Clinic | https://github.com/fhircat/CORD-19-on-FHIR | FHIR RDF version of CORD-19 with annotations of condition, medication, and procedure clinical entities |
| | COVID-19 DistillerSR | CORD-19, ClinicalTrials.gov | Evidence Partners | https://www.evidencepartners.com/resources/covid-19-resources/ | Links between clinical trial identifiers and documents in CORD-19 |
| | SciBite COVID-19 annotations | CORD-19 | SciBite | https://github.com/SciBiteLabs/CORD19 | Sentence and entity co-occurrence annotations; annotations of entities from MeSH, GO, HPO, HGNC, ChEMBL, and more |
| Knowledge graph | CovidGraph | CORD-19, Lens, Ensembl, NCBI Gene, Gene Ontology, experimental data, Johns Hopkins 2019-nCoV dataset | Many academic and industry organizations | https://covidgraph.org/ | A knowledge graph of COVID-19 papers, case statistics, genes and functions, and molecular data |
| | KGTK COVID-19 KnowledgeGraph [36] | CORD-19, Wikidata, CTD, Blender Lab COVID-KG | USC, Pontifícia Universidade Católica do Rio de Janeiro | – | A knowledge graph that integrates the CORD-19 corpus with gene, chemical, disease and taxonomic information from the Wikidata and CTD databases and from the Blender Lab COVID-KG (http://blender.cs.illinois.edu/covid19/) |
| | Blender Lab COVID-KG [99] | CORD-19 | UIUC | http://blender.cs.illinois.edu/covid19/ | Knowledge graph of genes, diseases, chemicals and organisms (and their subtypes), derived from relations found in the text and in figures/captions of the literature |
| | COVID-19 KnowledgeGraph [105] | CORD-19, Comprehend Medical [8] | Amazon Web Services (AWS) | https://aws.amazon.com/cn/covid-19-data-lake/ | COVID-19-specific knowledge graph; its graph embeddings power AWS CORD-19 search |
| | COVID-KOP [44] | ROBOKOP, GO annotations, SciBite CORD-19 annotations | UNC Chapel Hill | https://covidkop.renci.org/ | Combines the ROBOKOP biomedical knowledge graph with information extracted from the SciBite CORD-19 annotations |
| Language model | CovidBERT | CORD-19, BioBERT, ClinicalBERT | – | https://github.com/manueltonneau/covid-berts | BioBERT [49] and ClinicalBERT [2] fine-tuned on CORD-19 |
| | GreenCovidSQuADBERT [68] | CORD-19, Word2vec, SQuADBERT | LMU Munich, Siemens | – | An inexpensive and performant way to adapt BERT models to a new domain, achieved by training Word2vec on CORD-19 and aligning the Word2vec embeddings to BERT wordpiece embeddings |
SeVeN, semantic vector networks; JET, jointly embedding entities and text; CTD, Comparative Toxicogenomics Database.
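The GreenCovidSQuADBERT entry adapts BERT by aligning Word2vec embeddings to BERT wordpieces. The sketch below illustrates one way such an alignment step can work. It assumes an unconstrained linear map fit by least squares on the tokens shared by both vocabularies. The toy vectors, the token `covid19` and the helper names (`solve`, `fit_alignment`, `project`) are hypothetical. Real inputs would be a Word2vec model trained on CORD-19 and BERT's wordpiece embedding matrix. It is written in plain Python so that it has no dependencies.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A Z = B for Z by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [A | B]; rows are copied so the inputs are not modified.
    aug = [list(row_a) + list(row_b) for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular; need more shared tokens")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    # The left block is now the identity; the right block holds Z.
    return [row[n:] for row in aug]


def fit_alignment(w2v: Dict[str, Vector], bert: Dict[str, Vector]) -> Matrix:
    """Hypothetical helper: least-squares W minimizing sum ||x_t W - y_t||^2 over shared tokens t."""
    shared = sorted(set(w2v) & set(bert))
    if not shared:
        raise ValueError("no tokens shared by the two vocabularies")
    xs = [w2v[t] for t in shared]
    ys = [bert[t] for t in shared]
    dx, dy = len(xs[0]), len(ys[0])
    # Normal equations: (X^T X) W = X^T Y
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(dx)] for i in range(dx)]
    xty = [[sum(x[i] * y[j] for x, y in zip(xs, ys)) for j in range(dy)] for i in range(dx)]
    return solve(xtx, xty)


def project(vec: Vector, w: Matrix) -> Vector:
    """Map a Word2vec vector into the wordpiece embedding space (row vector times W)."""
    return [sum(vec[i] * w[i][j] for i in range(len(vec))) for j in range(len(w[0]))]


if __name__ == "__main__":
    # Toy 2-d "Word2vec" and 3-d "BERT" vectors (hypothetical values).
    w2v = {"virus": [1.0, 0.0], "cell": [0.0, 1.0], "protein": [1.0, 1.0], "covid19": [2.0, -1.0]}
    bert = {"virus": [1.0, 2.0, 0.0], "cell": [0.0, 1.0, 3.0], "protein": [1.0, 3.0, 3.0]}
    w = fit_alignment(w2v, bert)
    # "covid19" has no wordpiece vector, so it gets a projected one.
    print(project(w2v["covid19"], w))  # → [2.0, 3.0, -3.0]
```

Once the map is fit, a domain term that is missing from the BERT vocabulary can receive an input vector in the wordpiece embedding space. A production version would more likely use a linear-algebra library such as NumPy than a hand-written solver. Whether [68] uses exactly this unconstrained least-squares form should be checked against that paper.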