| Age_blah | | | 1.9 K | | slee7268 | 2023-11-29 | Beta | |
| Ab3P-abbreviations | | This corpus was developed during the creation of the Ab3P abbreviation definition identification tool. It includes 1250 manually annotated MEDLINE records. This gold standard includes 1221 abbreviation-definition pairs.
Abbreviation definition identification based on automatic precision estimates
Sunghwan Sohn, Donald C Comeau, Won Kim and W John Wilbur
BMC Bioinformatics 2008, 9:402
DOI: 10.1186/1471-2105-9-402 | 2.33 K | Sunghwan Sohn, Donald C Comeau, Won Kim and W John Wilbur | comeau | 2023-11-29 | Beta | |
| GoldHamster | | | 285 K | | zebet | 2023-11-29 | Beta | |
| Mabel_2 | | | 0 | | Grati | 2025-07-16 | Beta | |
| IU_X-Ray_Raw | | IU X-Ray data selected for the Hidden-RAD shared task.
It consists of 1,085 reports selected according to the criteria below:
1. Text field condition: at least one of the four fields (MeSH, Problems, Findings, Impression) must contain one term from 36 selected key terms defined by Hidden-RAD.
2. Key Term Constraints: Each field must contain no more than 3 key terms; at least one key term must be present across all fields; one key term must be present in either 'Findings' or 'Impression'.
3. Exclusion Criteria: Reports containing any of the following clinically ambiguous or non-diagnostic terms must be excluded: blunt, calcifications, fractures, hazy, infiltrate, status post operation, thickening, tortuous | 0 | | kenkim2 | 2025-09-15 | Beta | |
| IMDB-NLP | | Annotations for chunking and semantic role labeling based on in-memory databases. | 0 | | | 2016-05-06 | Uploading | |
| OryzaGP1 | | A dataset for named entity recognition of rice genes | 0 | Huy Do, Pierre Larmande | | 2019-01-31 | Uploading | |
| OryzaGP | | A dataset for named entity recognition of rice genes | 29.1 K | Huy Do and Pierre Larmande | Yue Wang | 2023-11-24 | Uploading | |
| LitCovid-sample-Pubtator | | | 3.86 K | | Jin-Dong Kim | 2023-11-28 | Uploading | |
| Bioinformatics_fulltext | | | 0 | | Sophie Nam | 2023-11-28 | Uploading | |
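| FA_Top100Plus-GeneProtein | | The Top 100 plus 7 reviews that should originally have been in the Top 100: 101 of the 107 reviews in total.
5414, 6076, 6930, 8403, 9643, and 18544 have 0 denotations, so the documents themselves are not registered.
See the config for details of the attributes.
Documents whose source DB is @AikoHIRAKI will be replaced once the typo corrections are reflected in the official PubAnnotation FirstAuthors documents.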
| 10.4 K | | yucca | 2023-11-29 | Uploading | |
| AnEM_full-texts | | 250 documents selected randomly from full-text papers
Entity types: organism subdivision, anatomical system, organ, multi-tissue structure, tissue, cell, developing anatomical structure, cellular component, organism substance, immaterial anatomical entity and pathological formation
Together with AnEM_abstracts, it is probably the largest manually annotated corpus on anatomical entities. | 687 | NaCTeM | Yue Wang | 2023-11-29 | Uploading | |
| LitCovid-sample-docs | | A comprehensive literature resource on the subject of Covid-19 is collected by NCBI:
https://www.ncbi.nlm.nih.gov/research/coronavirus/
The LitCovid project@PubAnnotation is a collection of the titles and abstracts in the LitCovid dataset, intended for people who want to perform text mining analysis. Please note that if you annotate the documents in this project and contribute the annotations back to PubAnnotation, they will become publicly available together with contributions from other people.
If you want to contribute your annotation to PubAnnotation, please refer to the documentation page:
http://www.pubannotation.org/docs/submit-annotation/
The list of PMIDs is sourced from here
Below is a notice from the original LitCovid dataset:
PUBLIC DOMAIN NOTICE
National Center for Biotechnology Information
This software/database is a "United States Government Work" under the
terms of the United States Copyright Act. It was written as part of
the author's official duties as a United States Government employee and
thus cannot be copyrighted. This software/database is freely available
to the public for use. The National Library of Medicine and the U.S.
Government have not placed any restriction on its use or reproduction.
Although all reasonable efforts have been taken to ensure the accuracy
and reliability of the software and data, the NLM and the U.S.
Government do not and cannot warrant the performance or results that
may be obtained by using this software or data. The NLM and the U.S.
Government disclaim all warranties, express or implied, including
warranties of performance, merchantability or fitness for any particular
purpose.
Please cite the authors in any work or product based on this material:
Chen Q, Allot A, & Lu Z. (2020) Keep up with the latest coronavirus research, Nature 579:193
| 0 | | Jin-Dong Kim | 2023-11-29 | Uploading | |
| CoGe_Citation_Annotations | | Annotated PMC abstracts and full articles that cite the "CoGe" papers (PMID: 18952863, 18269575).
Total Num Citations: 165
Total Num Unique Citations: 141
Total Num Abstracts: 165
Total Num Whole Articles: 165 | 0 | Heather Lent | hclent | 2023-11-29 | Uploading | |
| ENG_NER_NEL_Diana | | | 461 | | dpavot | 2023-11-29 | Uploading | |
| chemicals | | | 0 | | pruas_18 | 2023-11-29 | Uploading | |
| PT_NER_NEL_pruas | | | 334 | Pedro Ruas | pruas_18 | 2023-11-30 | Uploading | |
| tagtog | | OpenAccess annotations coming from tagtog.net | 0 | tagtog | tagtog | 2015-02-23 | Developing | |
| guideline annotations | | 5 guideline annotations with custom vocab | 0 | | Tiffany Leung | 2015-11-07 | Developing | |
| Annotation-Euglena-Enzymes | | | 0 | | Shuichi Kawashima | 2016-06-13 | Developing | |