Projects

| Name | Description | # Annotations | Author | Maintainer | Updated at | Status |
|---|---|---|---|---|---|---|
| SPECIES800 | SPECIES 800 (S800): an abstract-based manually annotated corpus. S800 comprises 800 PubMed abstracts in which organism mentions were identified and mapped to the corresponding NCBI Taxonomy identifiers. Described in: The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text. Pafilis E, Frankild SP, Fanini L, Faulwetter S, Pavloudi C, et al. (2013). PLoS ONE, 2013, 8(6): e65390. doi:10.1371/journal.pone.0065390 | 3.71 K | Evangelos Pafilis, Sune P. Frankild, Lucia Fanini, Sarah Faulwetter, Christina Pavloudi, Aikaterini Vasileiadou, Christos Arvanitidis, Lars Juhl Jensen | evangelos | 2015-11-20 | Released |
| PennBioIE | The PennBioIE corpus (0.9) covers two domains of biomedical knowledge. One is the inhibition of the cytochrome P450 family of enzymes (CYP450 or CYP for short), and the other is the molecular genetics of cancer (oncology or onco for short). | 23.9 K | UPenn Biomedical Information Extraction Project | Yue Wang | 2016-12-06 | Released |
| NCBIDiseaseCorpus | The NCBI disease corpus is fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community. | 6.88 K | Rezarta Islamaj Doğan, Robert Leaman, Zhiyong Lu | Chih-Hsuan Wei | 2015-08-06 | Released |
| bionlp-st-bb3-2016-training | Entity (bacteria, habitats and geographical places) annotation to the training dataset of the BioNLP-ST 2016 BB task. For more information, please refer to bionlp-st-bb3-2016-development and bionlp-st-bb3-2016-test. Bacteria: Bacteria entities are annotated as contiguous spans of text that contain a full unambiguous prokaryote taxon name; the type label is Bacteria. The Bacteria type is a taxon, at any taxonomic level from phylum (Eubacteria) to strain. The category that the text entities have to be assigned to is the most specific and unique category of the NCBI taxonomy resource. If a given strain, or a group of strains, is not referenced by NCBI, it is assigned the closest taxid in the taxonomy. Habitat: Habitat entities are annotated as spans of text that contain a complete mention of a potential habitat for bacteria; the type label is Habitat. Habitat entities are assigned one or several concepts from the habitat subpart of the OntoBiotope ontology. The assigned concepts are as specific as possible. OntoBiotope defines most relevant microorganism habitats from all areas considered by microbial ecology (hosts, natural environment, anthropized environments, food, medical, etc.). Habitat entities are rarely referential entities; they are usually noun phrases including properties and modifiers. There are rare cases of habitats referred to with adjectives or verbs. The spans are generally contiguous, but some of them are discontinuous in order to cope with conjunctions. Geographical: Geographical entities are geographical and organization places denoted by official names. | 1.29 K | INRA | Yue Wang | 2017-05-22 | Released |
| CellFinder | CellFinder corpus | 4.75 K | Mariana Neves, Alexander Damaschun, Andreas Kurtz, Ulf Leser | Mariana Neves | 2015-11-25 | Released |
| bionlp-st-ge-2016-spacy-parsed | Dependency parses produced by the spaCy parser, and part-of-speech tags produced by the Stanford tagger (with the wsj-0-18-left3words-nodistsim model). The exact procedure is described here. The data set contains the 34 full paper articles used in the BioNLP 2016 GE task. | 226 K | Nico Colic | Nico Colic | 2016-05-25 | Released |
| BioLarkPubmedHPO | 228 abstracts manually annotated with Human Phenotype Ontology (HPO) concepts and harmonized by three curators, which can be used as a reference standard for free text annotation of human phenotypes. For more info, please see Groza et al. "Automatic concept recognition using the human phenotype ontology reference and test suite corpora", 2015. | 7.24 K | Tudor Groza | simon | 2017-03-28 | Released |
| PIR-corpus1 | The Protein Information Resource (PIR) is not biased towards any particular biomedical domain, and is expected to provide more diverse protein names in a given sample size. Annotation category: protein, compound-protein, acronym. | 4.44 K | University of Delaware and Georgetown University Medical Center | Yue Wang | 2016-11-14 | Released |
| FSU-PRGE | A new broad-coverage corpus composed of 3,306 MEDLINE abstracts dealing with gene and protein mentions. The annotation process was semi-automatic. Publication: http://aclweb.org/anthology/W/W10/W10-1838.pdf | 59.5 K | CALBC Project | Yue Wang | 2017-03-08 | Released |
| DisGeNET | Disease-Gene association annotation. | 3.12 M | Nuria Queralt | Jin-Dong Kim | 2016-01-28 | Beta |
| QFMC_MEDLINE | Quaero French Medical Corpus: Annotation of MEDLINE titles | 5.97 K | Aurélie Névéol | Pierre Zweigenbaum | 2016-02-03 | Beta |
| NEUROSES | This corpus is composed of PubMed articles containing mentions of cognitive enhancer and antidepressant drugs. The selected sentences are automatically annotated using the NCBO Annotator with the Chemical Entities of Biological Interest (CHEBI) and Phenotypic Quality Ontology (PATO) ontologies; annotations were also produced using the PhenoMiner ontology via a dictionary-based tagger. | 2.15 M | | nestoralvaro | 2016-02-24 | Beta |
| bionlp-st-ge-2016-uniprot | UniProt protein annotation to the benchmark data set of the BioNLP-ST 2016 GE task: reference data set (bionlp-st-ge-2016-reference) and test data set (bionlp-st-ge-2016-test). The annotations are produced based on a dictionary which is semi-automatically compiled for the 34 full paper articles included in the benchmark data set (20 in the reference data set + 14 in the test data set). For detailed information about the BioNLP-ST GE 2016 task data sets, please refer to the benchmark reference data set (bionlp-st-ge-2016-reference) and benchmark test data set (bionlp-st-ge-2016-test). | 16.2 K | DBCLS | Jin-Dong Kim | 2016-05-22 | Beta |
| Ab3P-abbreviations | This corpus was developed during the creation of the Ab3P abbreviation definition identification tool. It includes 1250 manually annotated MEDLINE records. This gold standard includes 1221 abbreviation-definition pairs. Reference: Abbreviation definition identification based on automatic precision estimates. Sunghwan Sohn, Donald C Comeau, Won Kim and W John Wilbur. BMC Bioinformatics 2008, 9:402. DOI: 10.1186/1471-2105-9-402 | 2.34 K | Sunghwan Sohn, Donald C Comeau, Won Kim and W John Wilbur | comeau | 2016-07-29 | Beta |
| CRAFT-treebank | Penn Treebank markup for each sentence of the Colorado Richly Annotated Full Text Corpus (CRAFT). | 844 K | UColorado | Jin-Dong Kim | 2015-11-19 | Beta |
| PubmedHPO | Human phenotype annotation to PubMed abstracts, based on the HPO ontology | 12.4 M | Tudor Groza | tudor | 2016-12-06 | Beta |
| craft | Test bed for PubAnnotation query development. | 53 K | Kevin Bretonnel Cohen | KevinBretonnelCohen | 2015-10-13 | Beta |
| IMDB-NLP | Annotations for chunking and semantic role labeling based on in-memory databases. | 0 | | | 2016-05-06 | Uploading |
| CoGe_Citation_Annotations | Annotated PMC abstracts and full articles that cite the "CoGe" papers (PMID: 18952863, 18269575). Total Num Citations: 165; Total Num Unique Citations: 141; Total Num Abstracts: 165; Total Num Whole Articles: 165 | 0 | Heather Lent | hclent | 2016-10-11 | Uploading |
| AnEM_full-texts | 250 documents selected randomly from full-text papers. Entity types: organism subdivision, anatomical system, organ, multi-tissue structure, tissue, cell, developing anatomical structure, cellular component, organism substance, immaterial anatomical entity and pathological formation. Together with AnEM_abstracts, it is probably the largest manually annotated corpus on anatomical entities. | 689 | NaCTeM | Yue Wang | 2016-07-27 | Uploading |