pmc-enju-pas | | Predicate-argument structure annotation produced by Enju.
This data set was initially produced as a supporting resource for the BioNLP-ST 2016 GE task.
As such, it currently includes the 34 full-paper articles in the benchmark data sets of the GE 2016 task, namely the reference data set (bionlp-st-ge-2016-reference) and the test data set (bionlp-st-ge-2016-test), but it will be extended to include more papers from the PubMed Central Open Access subset (PMCOA).
| 205 K | DBCLS | Jin-Dong Kim | 2023-11-28 | Developing | |
bionlp-st-ge-2016-test | | The benchmark test data set of the BioNLP-ST 2016 GE task. It includes Genia-style event annotations for 14 full-paper articles about NFκB proteins. For testing purposes, however, all annotations are blinded: users cannot see the annotations in this project. Instead, annotations in any other project can be compared against the hidden annotations in this project, and they are then automatically evaluated based on that comparison.
A participant in the GE task can obtain an evaluation of their automatic annotation results through the following process:
Create a new project.
Import documents from the project bionlp-st-ge-2016-test-proteins into your project.
Import annotations from the project bionlp-st-ge-2016-test-proteins into your project.
At this point, you may want to compare your project to this project, the benchmark data set. The comparison will show that the protein annotations in your project are 100% correct, but that other annotations, e.g., events, score 0%.
Using your system, produce event annotations on top of the protein annotations.
Upload your event annotations to your project.
Compare your project to this project to obtain the evaluation.
The GE 2016 benchmark data set is provided as multi-layer annotations, which include:
bionlp-st-ge-2016-reference: benchmark reference data set
bionlp-st-ge-2016-test: benchmark test data set (this project)
bionlp-st-ge-2016-test-proteins: protein annotation to the benchmark test data set
The following are supporting resources:
bionlp-st-ge-2016-coref: coreference annotation
bionlp-st-ge-2016-uniprot: protein annotation with UniProt IDs
pmc-enju-pas: dependency parsing result produced by Enju
UBERON-AE: annotation for anatomical entities as defined in UBERON
ICD10: annotation for disease names as defined in ICD10
GO-BP: annotation for biological process names as defined in GO
GO-CC: annotation for cellular component names as defined in GO
A SPARQL-driven search interface is provided at http://bionlp.dbcls.jp/sparql. | 7.99 K | DBCLS | Jin-Dong Kim | 2023-11-29 | Released | |
bionlp-st-ge-2016-test-proteins | | Protein annotations to the benchmark test data set of the BioNLP-ST 2016 GE task.
A participant in the GE task may import the documents and annotations of this project into their own project as a starting point for producing event annotations.
For more details, please refer to the benchmark test data set (bionlp-st-ge-2016-test).
| 4.34 K | DBCLS | Jin-Dong Kim | 2023-11-27 | Released | |
ICD10 | | Annotation for disease names as defined in ICD10 | 1.6 K | DBCLS | Jin-Dong Kim | 2023-11-29 | Developing | |
sentences | | Sentence segmentation annotation.
Automatic annotation by TextSentencer. | 6.96 M | DBCLS | Jin-Dong Kim | 2023-11-24 | Developing | |
GO-CC | | Annotation for cellular components as defined in the "Cellular Component" subtree of Gene Ontology | 17.6 K | DBCLS | Jin-Dong Kim | 2023-11-30 | Developing | |
GO-BP | | Annotation for biological processes as defined in the "Biological Process" subset of Gene Ontology | 35.4 K | DBCLS | Jin-Dong Kim | 2023-11-29 | Developing | |
Colil | | Colil (Comments on Literature in Literature) is a search service for citation contexts utilized in the biomedical domain. Colil searches for a cited paper in the Colil database and then returns a list of the citation contexts for it and its relevant papers based on co-citations. | 3.34 K | DBCLS | Toyofumi Fujiwara | 2023-11-28 | Testing | |
bionlp-st-ge-2016-coref | | Coreference annotation for the benchmark data set (reference and test) of the BioNLP-ST 2016 GE task.
For detailed information, please refer to the benchmark reference data set (bionlp-st-ge-2016-reference) and benchmark test data set (bionlp-st-ge-2016-test). | 853 | DBCLS | Jin-Dong Kim | 2024-06-17 | Released | |
GO-MF | | Annotation for molecular functions as defined in the "Molecular Function" subtree of Gene Ontology | 19.7 K | DBCLS | Jin-Dong Kim | 2023-12-04 | Testing | |
bionlp-st-ge-2016-uniprot | | UniProt protein annotation for the benchmark data set of the BioNLP-ST 2016 GE task: the reference data set (bionlp-st-ge-2016-reference) and the test data set (bionlp-st-ge-2016-test).
The annotations are produced from a dictionary that was semi-automatically compiled for the 34 full-paper articles included in the benchmark data set (20 in the reference data set + 14 in the test data set).
For detailed information about the BioNLP-ST GE 2016 task data sets, please refer to the benchmark reference data set (bionlp-st-ge-2016-reference) and the benchmark test data set (bionlp-st-ge-2016-test).
| 16.2 K | DBCLS | Jin-Dong Kim | 2023-11-29 | Beta | |
PGR-NEG | | Identification of Negative Relations
| 23 | Diana Sousa | dpavot | 2023-11-28 | Developing | |
PGR-UNK | | Identification of Unknown Relations
| 91 | Diana Sousa | dpavot | 2023-11-29 | Developing | |
PGR-FAL | | Identification of False Relations | 128 | Diana Sousa | dpavot | 2023-11-29 | Developing | |
ENG_RE | | Entity and relation annotations from the following ontologies: Disease Ontology ('DO'), Gene Ontology ('GO'), Human Phenotype Ontology ('HPO'), and ChEBI ontology ('CHEBI'). | 224 | Diana Sousa | dpavot | 2023-11-29 | Developing | |
pubmed-enju-pas | | Annotating PubMed abstracts for predicate-argument structure (PAS). Enju 2.4.2 is used to automatically compute PAS. | 19.1 M | Enju | Jin-Dong Kim | 2023-11-24 | Developing | |
Erin_test | | @ Yonsei University | 0 | Erin | ErinHJ_Kim | 2023-11-29 | Testing | |
SPECIES800_autotagged | | This project comprises the SPECIES800 corpus documents automatically annotated by the Jensenlab tagger.
Annotated entity types are:
Genes/proteins from the mentioned organisms (and any human ones)
PubChem Compound identifiers
NCBI Taxonomy entries
Gene Ontology cellular component terms
BRENDA Tissue Ontology terms
Disease Ontology terms
Environment Ontology terms
The SPECIES 800 (S800) corpus comprises 800 PubMed abstracts. In its original form, species mentions were manually identified and mapped to the corresponding NCBI Taxonomy identifiers.
Described in:
The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text.
Pafilis E, Frankild SP, Fanini L, Faulwetter S, Pavloudi C, et al. (2013). PLoS ONE 8(6): e65390. doi:10.1371/journal.pone.0065390.
The manually annotated corpus is also available as a PubAnnotation project (SPECIES800).
| 0 | Evangelos Pafilis, Sampo Pyysalo, Lars Juhl Jensen | evangelos | 2015-11-20 | Testing | |
SPECIES800 | | SPECIES 800 (S800): an abstract-based manually annotated corpus. S800 comprises 800 PubMed abstracts in which organism mentions were identified and mapped to the corresponding NCBI Taxonomy identifiers.
Described in:
The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text.
Pafilis E, Frankild SP, Fanini L, Faulwetter S, Pavloudi C, et al. (2013). PLoS ONE 8(6): e65390. doi:10.1371/journal.pone.0065390. | 3.71 K | Evangelos Pafilis, Sune P. Frankild, Lucia Fanini, Sarah Faulwetter, Christina Pavloudi, Aikaterini Vasileiadou, Christos Arvanitidis, Lars Juhl Jensen | evangelos | 2023-11-28 | Released | |
2015-BEL-Sample | | An attempt to upload 295 BEL statements, i.e. the sample set used for the 2015 BioCreative challenge.
| 58 | Fabio Rinaldi | Fabio Rinaldi | 2023-11-29 | Testing | |