Table 1 Resources for text mining researchers and practitioners, including embeddings, reusable annotated datasets, knowledge graphs and pretrained COVID-19-domain language models. These resources can be incorporated into downstream systems. Under 'Affiliation', separate symbols denote industry and nonprofit affiliations, and no symbol denotes an academic affiliation; if no affiliation is provided, the work was conducted by independent researchers.

| Resource type | Resource name | Data/model used | Affiliation | Link | Description |
| --- | --- | --- | --- | --- | --- |
| Embeddings | SPECTER CORD-19 embeddings | CORD-19 | Allen Institute for AI | https://www.semanticscholar.org/cord19 | SPECTER embeddings [19] for CORD-19 papers |
| Embeddings | COVID-19 Concept embeddings | CORD-19, SNOMED-CT | The Ohio State University | https://slate.cse.ohio-state.edu/JET/COVID-19/ | JET embeddings [60] for clinical entities (SNOMED-CT) in the CORD-19 corpus |
| Embeddings | CORD-19 SeVeN embeddings | CORD-19 | Cardiff University | https://github.com/luisespinosaanke/cord-19-seven | SeVeN [28] word embeddings trained on CORD-19 |
| Embeddings | Co-occurrence network embeddings [63] | CORD-19-on-FHIR | Mayo Clinic | https://github.com/shenfc/COVID-19-network-embeddings | Network co-occurrence embeddings trained on a semantically annotated version of CORD-19 (CORD-19-on-FHIR) |
| Annotations | CODA-19 [35] | CORD-19 | Penn State University, UCSF, Carnegie Mellon University | https://github.com/windx0303/CODA-19 | Crowdsourced dataset of research aspect annotations for papers in CORD-19 |
| Annotations | CORD-19-on-FHIR | CORD-19, FHIR | Mayo Clinic | https://github.com/fhircat/CORD-19-on-FHIR | FHIR RDF version of CORD-19 with annotations of condition, medication and procedure clinical entities |
| Annotations | COVID-19 DistillerSR | CORD-19, ClinicalTrials.gov | Evidence Partners | https://www.evidencepartners.com/resources/covid-19-resources/ | Links between clinical trial identifiers and documents in CORD-19 |
| Annotations | SciBite COVID-19 annotations | CORD-19 | SciBite | https://github.com/SciBiteLabs/CORD19 | Sentence and entity co-occurrence annotations; annotation of entities from MeSH, GO, HPO, HGNC, ChEMBL and more |
| Knowledge graph | CovidGraph | CORD-19, Lens, Ensembl, NCBI Gene, Gene Ontology, experimental data, Johns Hopkins 2019-nCoV dataset | Many academic and industry organizations | https://covidgraph.org/ | A knowledge graph of COVID-19 papers, case statistics, genes and functions, and molecular data |
| Knowledge graph | KGTK COVID-19 KnowledgeGraph [36] | CORD-19, Wikidata, CTD, Blender Lab COVID-KG | USC, Pontificia Universidade Católica Rio de Janeiro | – | A knowledge graph that integrates the CORD-19 corpus with gene, chemical, disease and taxonomic information from the Wikidata and CTD databases and the Blender Lab COVID-KG (http://blender.cs.illinois.edu/covid19/) |
| Knowledge graph | Blender Lab COVID-KG [99] | CORD-19 | UIUC | http://blender.cs.illinois.edu/covid19/ | Knowledge graph with entity types genes, diseases, chemicals and organisms, and subtypes derived from the text and figure/caption relations in the literature |
| Knowledge graph | COVID-19 KnowledgeGraph [105] | CORD-19, Comprehend Medical [8] | Amazon Web Services (AWS) | https://aws.amazon.com/cn/covid-19-data-lake/ | COVID-19-specific knowledge graph; graph embeddings are used to power AWS CORD-19 search |
| Knowledge graph | COVID-KOP [44] | ROBOKOP, GO annotations, SciBite CORD-19 annotations | UNC Chapel Hill | https://covidkop.renci.org/ | Combines the ROBOKOP biomedical knowledge graph with information extracted from SciBite CORD-19 annotations |
| Language model | CovidBERT | CORD-19, BioBERT, ClinicalBERT | – | https://github.com/manueltonneau/covid-berts | BioBERT [49] and ClinicalBERT [2] fine-tuned on CORD-19 |
| Language model | GreenCovidSQuADBERT [68] | CORD-19, Word2vec, SQuADBERT | LMU Munich, Siemens | – | A cheap and performant way to achieve domain adaptation for BERT models; achieves this by training Word2vec and aligning Word2vec embeddings to BERT wordpieces |

SeVeN indicates semantic vector networks; JET, jointly embedding entities and text; CTD, Comparative Toxicogenomics Database.