Projects

Name	Description	# Annotations	Author	Maintainer	Updated	Status

Showing 1-20 of 663 projects
GlyCosmos15-docs	Analytical_Chemistry Biochim_Biophys_Acta Carbohydrate_Research Cell Glycobiology Glycoconjugate_Journal J_Am_Chem_Soc Journal_of_Biological_Chemistry Journal_of_Proteome_Research Journal_of_Proteomics Molecular_and_Cellular_Proteomics Nature_Biotechnology Nature_Communications Nature_Methods Scientific_Reports	0		Jin-Dong Kim	2025-04-20	Released
NCBI-Disease-Corpus-All	All data (train, develop, and test sets) combined	6.89 K	Rezarta Islamaj Doğan,Robert Leaman,Zhiyong Lu	Kenkim	2025-02-21	Released
NCBI-Disease-Train	The NCBI disease corpus is fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community.	5.15 K	Rezarta Islamaj Doğan,Robert Leaman,Zhiyong Lu	Kenkim	2025-01-17	Released
NCBI-Disease-Test	The NCBI disease corpus is fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community.	960	Rezarta Islamaj Doğan,Robert Leaman,Zhiyong Lu	Kenkim	2025-01-17	Released
NCBI-Disease-Develop	The NCBI disease corpus is fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community.	787	Rezarta Islamaj Doğan,Robert Leaman,Zhiyong Lu	Kenkim	2025-01-17	Released
bionlp-st-ge-2016-coref	Coreference annotations for the benchmark data sets (reference and test) of the BioNLP-ST 2016 GE task. For details, please refer to the benchmark reference data set (bionlp-st-ge-2016-reference) and the benchmark test data set (bionlp-st-ge-2016-test).	853	DBCLS	Jin-Dong Kim	2024-06-17	Released
NCBIDiseaseCorpus	The NCBI disease corpus is fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community.	6.85 K	Rezarta Islamaj Doğan,Robert Leaman,Zhiyong Lu	Chih-Hsuan Wei	2023-11-29	Released
LitCovid-OGER	Entities detected with OGER (http://www.ontogene.org/resources/oger) using 10 different vocabularies	9.31 K	Fabio Rinaldi	Nico Colic	2023-11-29	Released
RELISH-DB	Abstracts contained in the RELISH-DB data (https://relishdb.ict.griffith.edu.au). The data was downloaded from https://figshare.com/projects/RELISH-DB/60095. Related publication: https://academic.oup.com/database/article/doi/10.1093/database/baz085/5608006#200722023	0			2023-11-29	Released
craft-ca-core-dev	Development data for the CRAFT CA shared task, core concepts only. This project contains the development (training) annotations for the Concept Annotation task of the CRAFT Shared Task 2019. This particular set of concept annotations is the "core" set. See the task description for details; this set contains only annotations to concepts that appear in the original 10 Open Biomedical Ontologies used for annotation. (That is, it does not contain any annotations to extension classes.)	59.8 K	University of Colorado Anschutz Medical Campus	craft-st	2023-11-29	Released
bionlp-st-ge-2016-test-tees	NER and event extraction results produced by TEES (with the default GE11 model) for the 14 full papers used in the BioNLP 2016 GE task test corpus.	9.17 K	Nico Colic	Nico Colic	2023-11-29	Released
bionlp-st-ge-2016-test	This is the benchmark test data set of the BioNLP-ST 2016 GE task. It includes GENIA-style event annotations for 14 full-text articles about NFκB proteins. For testing purposes, however, all annotations are blinded, so users cannot see the annotations in this project. Instead, the annotations in any other project can be compared against the hidden annotations in this project, and the other project's annotations are then evaluated automatically based on the comparison. A participant in the GE task can obtain an evaluation of their automatic annotation results through the following process: (1) Create a new project. (2) Import documents from the project bionlp-st-ge-2016-test-proteins into your project. (3) Import annotations from the project bionlp-st-ge-2016-test-proteins into your project. At this point, you may want to compare your project to this project, the benchmark data set; it will show that the protein annotations in your project are 100% correct, but other annotations, e.g., events, are 0%. (4) Produce event annotations with your system on top of the protein annotations. (5) Upload your event annotations to your project. (6) Compare your project to this project to get the evaluation. The GE 2016 benchmark data set is provided as multi-layer annotations, which include: bionlp-st-ge-2016-reference (benchmark reference data set); bionlp-st-ge-2016-test (benchmark test data set, this project); bionlp-st-ge-2016-test-proteins (protein annotations for the benchmark test data set). The following supporting resources are also provided: bionlp-st-ge-2016-coref (coreference annotations); bionlp-st-ge-2016-uniprot (protein annotations with UniProt IDs); pmc-enju-pas (dependency parsing results produced by Enju); UBERON-AE (annotations for anatomical entities as defined in UBERON); ICD10 (annotations for disease names as defined in ICD10); GO-BP (annotations for biological process names as defined in GO); GO-CC (annotations for cellular component names as defined in GO). A SPARQL-driven search interface is provided at http://bionlp.dbcls.jp/sparql.	7.99 K	DBCLS	Jin-Dong Kim	2023-11-29	Released
bionlp-st-ge-2016-reference-tees	NER and event extraction results produced by TEES (with the default GE11 model) for the 20 full papers used in the BioNLP 2016 GE task reference corpus.	14.6 K	Nico Colic	Nico Colic	2023-11-29	Released
CORD-19_bioRxiv_medRxiv_subset	The bioRxiv/medRxiv subset of the CORD-19 dataset: preprints that have not been peer-reviewed. The documents in this project will be updated as the CORD-19 dataset grows. See the COVID DATASET LICENSE AGREEMENT.	0		Jin-Dong Kim	2023-11-29	Released
CORD-19_Non-commercial_use_subset	The Non-commercial use subset of the CORD-19 dataset. The documents in this project will be updated as the CORD-19 dataset grows. See the COVID DATASET LICENSE AGREEMENT.	0		Jin-Dong Kim	2023-11-29	Released
CORD-19_Commercial_use_subset	The Commercial use subset of the CORD-19 dataset. The documents in this project will be updated as the CORD-19 dataset grows. See the COVID DATASET LICENSE AGREEMENT.	0		Jin-Dong Kim	2023-11-29	Released
CORD-19_All_docs	All the documents in the whole CORD-19 dataset. The documents in this project will be updated as the CORD-19 dataset grows. See the COVID DATASET LICENSE AGREEMENT.	0		Jin-Dong Kim	2023-11-29	Released
BioLarkPubmedHPO	228 abstracts manually annotated with Human Phenotype Ontology (HPO) concepts and harmonized by three curators, which can be used as a reference standard for free text annotation of human phenotypes. For more info, please see Groza et al. "Automatic concept recognition using the human phenotype ontology reference and test suite corpora", 2015.	7.16 K	Tudor Groza	simon	2023-11-29	Released
AnEM_abstracts	250 documents selected randomly from citation abstracts. Entity types: organism subdivision, anatomical system, organ, multi-tissue structure, tissue, cell, developing anatomical structure, cellular component, organism substance, immaterial anatomical entity, and pathological formation. Together with AnEM_full-texts, it is probably the largest manually annotated corpus of anatomical entities.	1.91 K	NaCTeM	Yue Wang	2023-11-29	Released
GENIAcorpus	multi_cell (1,782) mono_cell (222) virus (2,136) protein_family_or_group (8,002) protein_complex (2,394) protein_molecule (21,290) protein_subunit (942) protein_substructure (129) protein_domain_or_region (1,044) protein_other (97) peptide (521) amino_acid_monomer (784) DNA_family_or_group (332) DNA_molecule (664) DNA_substructure (2) DNA_domain_or_region (39) DNA_other (16) RNA_family_or_group (1,545) RNA_molecule (554) RNA_substructure (106) RNA_domain_or_region (8,237) RNA_other (48) polynucleotide (259) nucleotide (243) lipid (2,375) carbohydrate (99) other_organic_compound (4,113) body_part (461) tissue (706) cell_type (7,473) cell_component (679) cell_line (4,129) other_artificial_source (211) inorganic (258) atom (342) other (21,056)	78.9 K	GENIA Project	Yue Wang	2023-11-29	Released