Projects

Name	Description	# Ann.	Author	Maintainer	Updated_at	Status

Ab3P-abbreviations	This corpus was developed during the creation of the Ab3P abbreviation definition identification tool. It includes 1250 manually annotated MEDLINE records. This gold standard includes 1221 abbreviation-definition pairs. Abbreviation definition identification based on automatic precision estimates. Sunghwan Sohn, Donald C Comeau, Won Kim and W John Wilbur. BMC Bioinformatics 2008, 9:402. DOI: 10.1186/1471-2105-9-402	2.33 K	Sunghwan Sohn, Donald C Comeau, Won Kim and W John Wilbur	comeau	2023-11-29	Beta
NEUROSES	This corpus is composed of PubMed articles containing mentions of cognitive enhancers and antidepressant drugs. The selected sentences are automatically annotated using the NCBO Annotator with the Chemical Entities of Biological Interest (CHEBI) and Phenotypic Quality Ontology (PATO) ontologies. We also produced annotations using the PhenoMiner ontology via a dictionary-based tagger.	2.14 M		nestoralvaro	2023-11-24	Beta
CHEMDNER-training-test	The training subset of the CHEMDNER corpus	29.4 K	Martin Krallinger et al.	Jin-Dong Kim	2023-11-27	Testing
jnlpba-st-training	The training data used in the task came from the GENIA version 3.02 corpus. This was formed from a controlled search on MEDLINE using the MeSH terms "human", "blood cells" and "transcription factors". From this search, 1,999 abstracts were selected and hand annotated according to a small taxonomy of 48 classes based on a chemical classification. Among the classes, 36 terminal classes were used to annotate the GENIA corpus. For the shared task, only the classes protein, DNA, RNA, cell line and cell type were used. The first three incorporate several subclasses from the original taxonomy, while the last two are of interest in order to make the task realistic for post-processing by a potential template filling application. The publication years of the training set range from 1990 to 1999.	51.1 K	GENIA	Yue Wang	2023-11-26	Released
bionlp-st-gro-2013-training	The training data set of the BioNLP-ST 2013 GRO task, including 150 MEDLINE abstracts that are annotated with concepts and relations of the Gene Regulation Ontology (GRO; http://www.ebi.ac.uk/Rebholz-srv/GRO/GRO.html)	8.02 K	Jung-jae Kim	Jung-jae Kim	2023-11-29	Testing
bionlp-st-pc-2013-training	The training dataset from the pathway curation (PC) task in the BioNLP Shared Task 2013. The entity types defined in the PC task are simple chemical, gene or gene product, complex and cellular component.	7.86 K	NaCTeM and KISTI	Yue Wang	2023-11-27	Released
bionlp-st-id-2011-training	The training dataset from the infectious diseases (ID) task in the BioNLP Shared Task 2011. Entity types: - Genes and gene products: gene, RNA, and protein name mentions. - Two-component systems: mentions of the names of two-component regulatory systems, frequently embedding the names of the two proteins forming the system. - Chemicals: mentions of chemical compounds such as "NaCL". - Organisms: mentions of organism names or organism specification through specific properties (e.g. "graRS mutant"). - Regulons/Operons: mentions of names of specific regulons and operons.	5.61 K	University of Tokyo Tsujii Laboratory, NaCTeM and Biocomplexity Institute of Virginia Tech	Yue Wang	2023-11-28	Released
bionlp-st-epi-2011-training	The training dataset from the Epigenetics and Post-translational Modifications (EPI) task in the BioNLP Shared Task 2011. The core entities of the task are genes and gene products (RNA and proteins), identified in the data simply as "Protein" annotations.	7.59 K	GENIA	Yue Wang	2023-11-29	Released
bionlp-st-cg-2013-training	The training dataset from the cancer genetics task in the BioNLP Shared Task 2013. Composed of anatomical and molecular entities.	10.9 K	NaCTeM	Yue Wang	2023-11-28	Released
GlycoConjugate-collection	The PubMed entries (titles and abstracts) from the journal of GlycoConjugate	0		Jin-Dong Kim	2023-11-28	Developing
PIR-corpus2	The protein tag was used to tag proteins, or protein-associated or protein-related objects, such as domains, pathways, and gene expression. Annotation guideline: http://pir.georgetown.edu/pirwww/about/doc/manietal.pdf	5.52 K	University of Delaware and Georgetown University Medical Center	Yue Wang	2023-11-29	Released
PIR-corpus1	The Protein Information Resource (PIR) is not biased towards any particular biomedical domain, and is expected to provide more diverse protein names in a given sample size. Annotation category: protein, compound-protein, acronym.	4.44 K	University of Delaware and Georgetown University Medical Center	Yue Wang	2023-11-27	Released
PennBioIE	The PennBioIE corpus (0.9) covers two domains of biomedical knowledge. One is the inhibition of the cytochrome P450 family of enzymes (CYP450 or CYP for short), and the other is the molecular genetics of cancer (oncology or onco for short).	23.8 K	UPenn Biomedical Information Extraction Project	Yue Wang	2023-11-26	Released
NCBIDiseaseCorpus	The NCBI disease corpus is fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community.	6.85 K	Rezarta Islamaj Doğan, Robert Leaman, Zhiyong Lu	Chih-Hsuan Wei	2023-11-29	Released
LocText	The manually annotated corpus consists of 100 PubMed abstracts annotated for proteins, subcellular localizations, organisms and relations between them. The focus of the corpus is on annotation of proteins and their subcellular localizations.	2.29 K	Goldberg et al.	Shrikant Vinchurkar	2023-11-29	Released
Epistemic_Statements	The goal of this work is to identify epistemic statements in the scientific literature. An epistemic statement is a statement of unknowns, hypotheses, speculations, or uncertainties, including statements of claims, hypotheses, questions, explanations, future opportunities, surprises, issues, or concerns within a sentence. The unit of an epistemic statement is an automatically parsed sentence. The classification is binary: epistemic statement or not. We label epistemic statements only, so one can assume that if a statement is not labeled, it is not an epistemic statement. The classifier is a CRF, trained on gold standard annotations of epistemic statements; the annotation effort is still ongoing. We report an F-measure of 0.91 after 5-fold cross-validation on a test set with 914 statements and an F-measure of 0.9 on a held-out document with 130 statements. This project is still under development and is submitted to be used for the CovidLit project and associated Hackathon. Please contact Mayla if you have any questions.	1.42 M		mboguslav	2023-11-24	Developing
DisGeNET5_variant_disease	The file contains variant-disease associations obtained by text mining MEDLINE abstracts using the BeFree system, including the variant and disease offsets.	144 K	IBI Group	Yue Wang	2023-11-24	Released
DisGeNET5_gene_disease	The file contains gene-disease associations obtained by text mining MEDLINE abstracts using the BeFree system, including the gene and disease offsets.	2.04 M	IBI Group	Yue Wang	2023-11-24	Released
bionlp-st-gro-2013-development	The development data set of the BioNLP-ST 2013 GRO task, including 50 MEDLINE abstracts that are annotated with concepts and relations of the Gene Regulation Ontology (GRO; http://www.ebi.ac.uk/Rebholz-srv/GRO/GRO.html)	2.66 K	Jung-jae Kim	Jung-jae Kim	2023-11-29	Testing
AIMed	The AIMed corpus is one of the most widely used corpora for protein-protein interaction extraction. The protein annotations are either parts of the protein interaction annotations, or are uninvolved in any protein interaction annotation. Publication: http://www.cs.utexas.edu/~ml/papers/bionlp-aimed-04.pdf	4.04 K	The University of Texas at Austin	Yue Wang	2023-11-27	Testing
