PubCasesHPO | | HPO annotation in PubCases | 3.18 M | | Toyofumi Fujiwara | 2023-11-24 | Beta | |
PubCasesORDO | | ORDO annotation in PubCases | 865 K | | Toyofumi Fujiwara | 2023-11-24 | Beta | |
PubMed-2000 | | abstracts published in 2000. | 0 | | Jin-Dong Kim | 2023-11-29 | Developing | |
pubmed-2016 | | abstracts published in 2016. | 0 | | Jin-Dong Kim | 2023-11-28 | | |
PubMed-2017 | | abstracts published in 2017. | 0 | | Jin-Dong Kim | 2023-11-24 | Developing | |
PubMed_ArguminSci | | Predictions for PubMed abstracts, automatically produced with the ArguminSci tool (https://github.com/anlausch/ArguminSci). | 777 K | | zebet | 2023-11-24 | Released | |
pubmed-enju-pas | | Predicate-argument structure (PAS) annotation of PubMed abstracts. The PAS is computed automatically with Enju 2.4.2. | 19.1 M | Enju | Jin-Dong Kim | 2023-11-24 | Developing | |
PubMed-French-test | | A collection of PubMed abstracts written in French | 0 | | Jin-Dong Kim | 2023-11-29 | Developing | |
PubMed-German-test | | A collection of PubMed abstracts written in German | 0 | | Jin-Dong Kim | 2023-11-24 | Developing | |
PubmedHPO | | Human phenotype annotation of PubMed abstracts, based on the Human Phenotype Ontology (HPO) | 12.4 M | Tudor Groza | tudor | 2023-11-24 | Beta | |
pubmed-sentences-benchmark | | A benchmark dataset for segmenting text into sentences.
The source of the annotation is the GENIA treebank (GTB) v1.0.
The following process was used:
Began with the GENIA treebank v1.0.
Sentence annotations were extracted and converted to PubAnnotation JSON.
The annotations were uploaded; 12 abstracts failed alignment.
Of the 12 failed abstracts, 4 had a period ('.') where there should have been a colon (':'). They were fixed manually and then uploaded successfully: 7903907, 8053950, 8508358, 9415639.
The remaining 8 failed abstracts were "250-word truncation" cases. They were fixed manually and uploaded successfully. During the fix, annotations were added manually for the missing pieces of text.
30 abstracts had extra text at the end, a copyright statement such as "Copyright 1998 Academic Press." This text was annotated as a sentence in GTB, but it no longer exists in PubMed. Therefore, the extra text was removed, together with its sentence annotation.
| 18.4 K | GENIA project | Jin-Dong Kim | 2023-11-28 | Released | |
PubMed_Structured_Abstracts | | Sections (zones) as retrieved from PubMed. | 131 K | | zebet | 2023-11-28 | Released | |
pubmed_test | | | 0 | | | 2023-11-29 | | |
PubTator4TogoVar | | | 198 K | PubTator | Yasunori Yamamoto | 2024-01-10 | Developing | |
pubtator-sample | | Sample PubTator annotation produced by Zhiyong Lu et al. | 28 | Zhiyong Lu | Jin-Dong Kim | 2023-11-27 | Testing | |
QFMC_MEDLINE | | Quaero French Medical Corpus: annotation of MEDLINE titles | 5.9 K | Aurélie Névéol | Pierre Zweigenbaum | 2023-11-29 | Beta | |
RDoCTask1SampleData | | Each annotation file contains an abstract annotated with an RDoC category. In this sample data, the title span of each abstract is annotated with the related RDoC construct, although the RDoC category applies to the entire abstract. The annotation data are provided as JSON files. For a more detailed description of the JSON format, see http://www.pubannotation.org/docs/annotation-format/. | 20 | | mmanani1s | 2023-11-29 | Released | |
RDoCTask2SampleData | | Each annotation file contains an abstract in which the most relevant sentence is annotated with the RDoC category name. The annotation data are provided as JSON files. For a more detailed description of the JSON format, see http://www.pubannotation.org/docs/annotation-format/. | 10 | | mmanani1s | 2023-11-29 | Released | |
RELASIGEBLAH7hhaider5 | | | 277 | | hhaider5 | 2023-11-29 | Developing | |
RELISH-DB | | Abstracts contained in the RELISH-DB data (https://relishdb.ict.griffith.edu.au), made available for download here.
The data were downloaded from https://figshare.com/projects/RELISH-DB/60095
Related publication: https://academic.oup.com/database/article/doi/10.1093/database/baz085/5608006#200722023 | 0 | | | 2023-11-29 | Released | |