PubMed-2017 | | Abstracts published in 2017. | 0 | | Jin-Dong Kim | 2023-11-24 | Developing | |
Trait_curation150825 | | Trait curation | 0 | Sachiko_Shirasawa | Sachiko Shirasawa | 2023-11-29 | Testing | |
Computational_Biology | | | 0 | | Sophie Nam | 2023-11-29 | | |
Training_Data_French_fr_en | | | 0 | | wmtbio | 2023-11-27 | Developing | |
test-integbio | | | 0 | | yucca | 2016-08-03 | | |
ichiharatest_150825_2 | | test | 0 | ichihara_hisako | Hisako Ichihara | 2015-09-11 | Testing | |
SPECIES800_autotagged | | This project comprises the SPECIES800 corpus documents automatically annotated by the Jensenlab tagger.
Annotated entity types are:
Genes/proteins from the mentioned organisms (and any human ones)
PubChem Compound identifiers
NCBI Taxonomy entries
Gene Ontology cellular component terms
BRENDA Tissue Ontology terms
Disease Ontology terms
Environment Ontology terms
The SPECIES 800 (S800) corpus comprises 800 PubMed abstracts. In its original form, species mentions were manually identified and mapped to the corresponding NCBI Taxonomy identifiers.
Described in:
The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text.
Pafilis E, Frankild SP, Fanini L, Faulwetter S, Pavloudi C, et al. (2013). PLoS ONE 8(6): e65390. doi:10.1371/journal.pone.0065390.
The manually annotated corpus is also available as a PubAnnotation project.
| 0 | Evangelos Pafilis, Sampo Pyysalo, Lars Juhl Jensen | evangelos | 2015-11-20 | Testing | |
Training_Data_English_fr_en | | | 0 | | wmtbio | 2023-11-29 | Developing | |
Training_Data_English_zh_en | | | 0 | | wmtbio | 2023-11-29 | Developing | |
BLAH2015_Annotations_Adderall | | | 0 | nestoralvaro | nestoralvaro | 2023-11-29 | Testing | |
Training_Data_English_es_en | | | 0 | | wmtbio | 2023-11-29 | Developing | |
Disease_Markers | | | 0 | | Sophie Nam | 2023-11-29 | | |
Genomics_fulltext | | | 0 | | Sophie Nam | 2023-11-29 | | |
Training_Data_English_ja_en | | | 0 | | wmtbio | 2023-11-26 | Developing | |
test5 | | | 0 | | glennq | 2016-02-06 | | |
CORD-19_bioRxiv_medRxiv_subset | | The bioRxiv/medRxiv subset of the CORD-19 dataset: preprints that have not been peer reviewed.
The documents in this project will be updated as the CORD-19 dataset grows.
See the COVID DATASET LICENSE AGREEMENT.
| 0 | | Jin-Dong Kim | 2023-11-29 | Released | |
Frame annotation ver1 | | | 0 | Younggyun Hahm | kaist_nlp | 2023-11-29 | Testing | |
Microarrays | | | 0 | | Sophie Nam | 2023-11-29 | | |
CORD-19_Non-commercial_use_subset | | The non-commercial use subset of the CORD-19 dataset.
The documents in this project will be updated as the CORD-19 dataset grows.
See the COVID DATASET LICENSE AGREEMENT. | 0 | | Jin-Dong Kim | 2023-11-29 | Released | |
Training_Data_Spanish_es_en | | | 0 | | wmtbio | 2023-11-28 | Developing | |