sentences | | Sentence segmentation annotation.
Automatic annotation by TextSentencer. | 6.96 M | DBCLS | Jin-Dong Kim | 2023-11-24 | Developing | |
Test-Documents | | | 1 | | Jin-Dong Kim | 2023-11-24 | | |
PA-LLM | | 🖐️ LLMs for biomedical text summarisation | 0 | | Nico Colic | 2024-01-19 | Developing | |
PA-LLM-test | | | 0 | | Nico Colic | 2024-01-19 | Testing | |
MyTest | | | 9.81 M | | Jin-Dong Kim | 2023-11-24 | Testing | |
silkworm_test | | test project | 457 | | yaguchi | 2023-11-29 | Testing | |
CORD-19_Custom_license_subset | | The Custom license subset of the CORD-19 dataset.
The documents in this project will be updated as the CORD-19 dataset grows.
See the COVID DATASET LICENSE AGREEMENT. | 5.08 M | | Jin-Dong Kim | 2023-11-24 | Released | |
PubCasesHPO | | HPO annotation in PubCases | 3.18 M | | Toyofumi Fujiwara | 2023-11-24 | Beta | |
Epistemic_Statements | | The goal of this work is to identify epistemic statements in the scientific literature. An epistemic statement is a statement of unknowns, hypotheses, speculations, or uncertainties, including claims, questions, explanations, future opportunities, surprises, issues, or concerns expressed within a sentence. The unit of an epistemic statement is an automatically parsed sentence. The classification is binary: epistemic statement or not. We label only epistemic statements, so any statement that is not labeled can be assumed not to be an epistemic statement.
The classifier is a CRF trained on gold-standard annotations of epistemic statements; the annotation effort is ongoing. We report an F-measure of 0.91 from 5-fold cross-validation on a test set of 914 statements, and an F-measure of 0.9 on a held-out document with 130 statements. This project is still under development and has been submitted for use in the CovidLit project and its associated hackathon.
Please contact Mayla if you have any questions. | 1.42 M | | mboguslav | 2023-11-24 | Developing | |
LitCovid-PD-MONDO-v1 | | PubDictionaries annotation for disease terms, updated on 2020-04-20.
It is based on MONDO version 2020-04-20.
The terms in MONDO are loaded into PubDictionaries, which is used to produce the annotations in this project. The parameter configuration used for this project is here.
Note that this is an automatically generated, dictionary-based annotation. It will be updated periodically as documents are added and the dictionary is improved. | 13.4 K | | Jin-Dong Kim | 2023-11-29 | Released | |
LitCoin-PubTator-for-Tuning | | A set of randomly selected PubMed articles with PubTator annotation.
The labels of PubTator annotations are converted to corresponding labels for LitCoin as follows:
'Gene' -> 'GeneOrGeneProduct',
'Disease' -> 'DiseaseOrPhenotypicFeature',
'Chemical' -> 'ChemicalEntity',
'Species' -> 'OrganismTaxon',
'Mutation' -> 'SequenceVariant',
'CellLine' -> 'CellLine' | 14.2 K | | Jin-Dong Kim | 2023-11-29 | | |
example-dialog | | | 0 | | Jin-Dong Kim | 2023-11-27 | Testing | |
speech-test | | | 6 | | Jin-Dong Kim | 2023-11-26 | Testing | |
ENG_NER_NEL | | Annotations of COVID-19-related PubMed abstracts with terms from the following ontologies: Disease Ontology ("do"), Gene Ontology ("go"), Human Phenotype Ontology ("hpo"), ChEBI ontology ("chebi"), MeSH | 493 | LASIGE-DeST | pruas_18 | 2023-11-26 | Developing | |
LitCovid-PD-HP | | | 922 K | | Jin-Dong Kim | 2023-11-28 | Beta | |
LitCovid-PD-FMA-UBERON-v1 | | PubDictionaries annotation for anatomy terms, updated on 2020-04-20.
Anatomy term annotation based on FMA and Uberon, version 2020-04-20.
The terms in FMA and Uberon are loaded into PubDictionaries as separate FMA and Uberon dictionaries, which are used to produce the annotations in this project.
The parameter configuration used for this project is here for FMA and there for Uberon.
Note that this is an automatically generated, dictionary-based annotation. It will be updated periodically as documents are added and the dictionary is improved. | 4.3 K | | Jin-Dong Kim | 2023-11-27 | Released | |
LitCovid-PubTatorCentral | | Named entities for the documents in the LitCovid dataset. Annotations were automatically predicted by the PubTatorCentral tool (https://www.ncbi.nlm.nih.gov/research/pubtator/). | 4.64 K | | zebet | 2023-11-27 | Released | |
LappsTest | | Project to test posting annotations directly from the Language Applications Grid | 2.67 K | Keith Suderman | ksuderman | 2023-11-27 | Developing | |
pmc-enju-pas | | Predicate-argument structure annotation produced by Enju.
This data set was initially produced as a supporting resource for the BioNLP-ST 2016 GE task.
Accordingly, it currently includes the 34 full-paper articles in the benchmark data sets of the GE 2016 task, namely the reference data set (bionlp-st-ge-2016-reference) and the test data set (bionlp-st-ge-2016-test). It will be extended to include more papers from the PubMed Central Open Access subset (PMCOA). | 205 K | DBCLS | Jin-Dong Kim | 2023-11-28 | Developing | |
NCBITAXON | | Annotation for the NCBI Taxonomy.
Automatic annotation by PD-NCBITaxon. | 1.1 M | | Jin-Dong Kim | 2024-09-18 | Developing | |