FSU-PRGE | | A new broad-coverage corpus composed of 3,306 MEDLINE abstracts dealing with gene and protein mentions.
The annotation process was semi-automatic.
Publication: http://aclweb.org/anthology/W/W10/W10-1838.pdf | 59.5 K | CALBC Project | Yue Wang | 2017-03-08 | Released | |
GlyCosmos600-docs | | A random collection of 600 PubMed abstracts from 6 glycobiology-related journals: Glycobiology, Glycoconjugate journal, The Journal of biological chemistry, Journal of proteome research, Journal of proteomics, and Carbohydrate research. The full set of PMIDs was collected on June 11, 2019, and 100 PMIDs were randomly sampled from each journal. | 0 | | Jin-Dong Kim | 2019-06-11 | Released | |
DisGeNET5_variant_disease | | The file contains variant-disease associations obtained by text mining MEDLINE abstracts using the BeFree system, including the variant and disease offsets. | 144 K | IBI Group | Yue Wang | 2020-02-01 | Released | |
CORD-19_All_docs | | All the documents in the whole CORD-19 dataset.
The documents in this project will be updated as the CORD-19 dataset grows.
See the COVID DATASET LICENSE AGREEMENT. | 0 | | Jin-Dong Kim | 2020-03-23 | Released | |
CORD-19_bioRxiv_medRxiv_subset | | The bioRxiv/medRxiv subset of the CORD-19 dataset: pre-prints that are not peer reviewed.
The documents in this project will be updated as the CORD-19 dataset grows.
See the COVID DATASET LICENSE AGREEMENT. | 0 | | Jin-Dong Kim | 2020-03-23 | Released | |
CORD-19_Commercial_use_subset | | The Commercial use subset of the CORD-19 dataset.
The documents in this project will be updated as the CORD-19 dataset grows.
See the COVID DATASET LICENSE AGREEMENT. | 0 | | Jin-Dong Kim | 2020-03-23 | Released | |
CORD-19_Non-commercial_use_subset | | The Non-commercial use subset of the CORD-19 dataset.
The documents in this project will be updated as the CORD-19 dataset grows.
See the COVID DATASET LICENSE AGREEMENT. | 0 | | Jin-Dong Kim | 2020-03-23 | Released | |
LitCovid-PubTatorCentral | | Named entities for the documents in the LitCovid dataset. Annotations were automatically predicted by the PubTatorCentral tool (https://www.ncbi.nlm.nih.gov/research/pubtator/). | 4.64 K | | zebet | 2020-04-01 | Released | |
CORD-19_Custom_license_subset | | The Custom license subset of the CORD-19 dataset.
The documents in this project will be updated as the CORD-19 dataset grows.
See the COVID DATASET LICENSE AGREEMENT. | 5.08 M | | Jin-Dong Kim | 2020-04-10 | Released | |
CORD-19-PD-UBERON | | PubDictionaries annotation for UBERON terms - updated on 2020-04-30
Anatomy term annotation based on Uberon.
The terms in Uberon are uploaded to PubDictionaries (Uberon), with which the annotations in this project are produced.
The parameter configuration used for this project is here.
Note that this is an automatically generated dictionary-based annotation. It will be updated periodically as documents are added and the dictionary is improved. | 1.42 M | | Jin-Dong Kim | 2020-04-30 | Released | |
CORD-19-PD-HP | | PubDictionaries annotation for HP terms - updated on 2020-04-30
Phenotype term annotation based on HP.
Version 2020-04-20.
The terms in HP are loaded in PubDictionaries, with which the annotations in this project are produced. The parameter configuration used for this project is here.
Note that this is an automatically generated dictionary-based annotation. It will be updated periodically as documents are added and the dictionary is improved. | 1.15 M | | Jin-Dong Kim | 2020-05-12 | Released | |
bionlp-st-ge-2016-reference-tees | | NER and event extraction produced by TEES (with the default GE11 model) for the 20 full papers used in the BioNLP 2016 GE task reference corpus. | 14.6 K | Nico Colic | Nico Colic | 2020-09-13 | Released | |
bionlp-st-ge-2016-spacy-parsed | | Dependency parses produced by the spaCy parser and part-of-speech tags produced by the Stanford tagger (with the wsj-0-18-left3words-nodistsim model). The exact procedure is described here. The data set contains the 34 full-paper articles used in the BioNLP 2016 GE task. | 225 K | Nico Colic | Nico Colic | 2020-10-02 | Released | |
bionlp-st-ge-2016-test-tees | | NER and event extraction produced by TEES (with the default GE11 model) for the 14 full papers used in the BioNLP 2016 GE task test corpus. | 9.17 K | Nico Colic | Nico Colic | 2020-10-02 | Released | |
craft-sa-dev | | Development data for CRAFT SA shared task. This project contains the development (training) annotations for the Structural Annotation task of the CRAFT Shared Task 2019. This particular set contains token and sentence annotations with tokens linked via dependency relations. These dependency relations were automatically generated using the manually curated CRAFT constituency treebank files as input. | 490 K | University of Colorado Anschutz Medical Campus | craft-st | 2020-10-02 | Released | |
LitCovid-PD-FMA-UBERON-v1 | | PubDictionaries annotation for anatomy terms - updated on 2020-04-20
Anatomy term annotation based on FMA and Uberon. Version 2020-04-20.
The terms in FMA and Uberon are loaded in PubDictionaries (FMA and Uberon), with which the annotations in this project are produced.
The parameter configuration used for this project is here for FMA and there for Uberon.
Note that this is an automatically generated dictionary-based annotation. It will be updated periodically as documents are added and the dictionary is improved. | 4.3 K | | Jin-Dong Kim | 2020-11-20 | Released | |
LitCovid-PD-HP-v1 | | PubDictionaries annotation for human phenotype terms - updated on 2020-04-20
Phenotype term annotation based on HP.
Version 2020-04-20.
The terms in HP are loaded in PubDictionaries, with which the annotations in this project are produced. The parameter configuration used for this project is here.
Note that this is an automatically generated dictionary-based annotation. It will be updated periodically as documents are added and the dictionary is improved. | 3.03 K | | Jin-Dong Kim | 2020-11-20 | Released | |
LitCovid-PD-MONDO-v1 | | PubDictionaries annotation for disease terms - updated on 2020-04-20
It is based on MONDO Version 2020-04-20.
The terms in MONDO are loaded in PubDictionaries, with which the annotations in this project are produced. The parameter configuration used for this project is here.
Note that this is an automatically generated dictionary-based annotation. It will be updated periodically as documents are added and the dictionary is improved. | 13.4 K | | Jin-Dong Kim | 2020-11-20 | Released | |
LitCovid-sentences-v1 | | Sentence segmentation of all the texts in the LitCovid literature. The segmentation is automatically obtained using the TextSentencer annotation service developed and maintained by DBCLS. | 16.5 K | | Jin-Dong Kim | 2021-01-17 | Released | |
PubMed_ArguminSci | | Predictions for PubMed automatically extracted with the ArguminSci tool (https://github.com/anlausch/ArguminSci). | 777 K | | zebet | 2021-03-10 | Released | |