glycogenes | | | 2.75 K | 2023-11-30 | Testing | |
LitCovid-sample-docs | | LitCovid, a comprehensive literature resource on COVID-19, is curated by NCBI:
https://www.ncbi.nlm.nih.gov/research/coronavirus/
The LitCovid project@PubAnnotation is a collection of the titles and abstracts of the LitCovid dataset, intended for people who want to perform text-mining analysis. Please note that if you annotate the documents in this project and contribute the annotations back to PubAnnotation, they will become publicly available together with contributions from other people.
If you want to contribute your annotation to PubAnnotation, please refer to the documentation page:
http://www.pubannotation.org/docs/submit-annotation/
The list of PMIDs is sourced from here.
Below is a notice from the original LitCovid dataset:
PUBLIC DOMAIN NOTICE
National Center for Biotechnology Information
This software/database is a "United States Government Work" under the
terms of the United States Copyright Act. It was written as part of
the author's official duties as a United States Government employee and
thus cannot be copyrighted. This software/database is freely available
to the public for use. The National Library of Medicine and the U.S.
Government have not placed any restriction on its use or reproduction.
Although all reasonable efforts have been taken to ensure the accuracy
and reliability of the software and data, the NLM and the U.S.
Government do not and cannot warrant the performance or results that
may be obtained by using this software or data. The NLM and the U.S.
Government disclaim all warranties, express or implied, including
warranties of performance, merchantability or fitness for any particular
purpose.
Please cite the authors in any work or product based on this material:
Chen Q, Allot A, & Lu Z. (2020) Keep up with the latest coronavirus research, Nature 579:193
| 0 | 2023-11-29 | Uploading | |
KoreanFN-example | | Korean FrameNet example | 6 | 2023-11-29 | Developing | |
genia-medco-coref | | Coreference annotation made to the Genia corpus, following the MUC annotation scheme. It is a product of the collaboration between the Genia and the MedCo projects. | 45.9 K | 2023-11-24 | Developing | |
tutorial1 | | | 5 | 2023-11-29 | Testing | |
bionlp-st-ge-2016-coref | | Coreference annotation of the benchmark data set (reference and test) of the BioNLP-ST 2016 GE task.
For detailed information, please refer to the benchmark reference data set (bionlp-st-ge-2016-reference) and the benchmark test data set (bionlp-st-ge-2016-test). | 853 | 2024-06-17 | Released | |
pubmed-sentences-benchmark | | A benchmark dataset for segmenting text into sentences.
The source of the annotation is the GENIA treebank (GTB) v1.0.
The following process was used:
Began with the GENIA treebank v1.0.
Extracted the sentence annotations and converted them to PubAnnotation JSON.
Uploaded the annotations. Alignment failed for 12 abstracts.
Of the 12 failed abstracts, 4 had a period ('.') where there should have been a colon (':'). They were manually fixed and then successfully uploaded: 7903907, 8053950, 8508358, 9415639.
The other 8 failed abstracts were "250-word truncation" cases. They were manually fixed and successfully uploaded. During the fixing, manual annotations were added for the missing pieces of text.
30 abstracts had extra text at the end, namely a copyright statement, e.g., "Copyright 1998 Academic Press." GTB annotated this text as a sentence, but the text no longer exists in PubMed. Therefore, the extra text was removed, together with its sentence annotation.
| 18.4 K | 2023-11-28 | Released | |
epi-statement-test | | | 2 | 2023-11-30 | Testing | |
test_lasige | | | 494 | 2023-12-02 | Testing | |
LitCovid-PD-UBERON | | | 540 K | 2023-11-29 | | |