
Projects

1-20 / 541

| Name | Description | # Ann. | Author | Maintainer | Updated_at | Status |
|---|---|---|---|---|---|---|
| LitCovid-OGER | Using OGER (http://www.ontogene.org/resources/oger) to detect entities from 10 different vocabularies. | 9.31 K | Fabio Rinaldi | Nico Colic | 2022-09-01 | Released |
| CORD-19-PD-MONDO | PubDictionaries annotation for MONDO terms (updated on 2020-04-30). It is a disease term annotation based on MONDO, version 2020-04-20. The terms in MONDO are loaded into PubDictionaries, with which the annotations in this project are produced. The parameter configuration used for this project is here. Note that it is an automatically generated, dictionary-based annotation. It will be updated periodically as the number of documents grows and the dictionary is improved. | 6.32 M | | Jin-Dong Kim | 2022-06-14 | Released |
| Inflammaging | Inflammation axis | 23.4 M | | alo33 | 2022-06-08 | Released |
| geneset_names | | 0 | | alo33 | 2022-04-26 | Released |
| TEST-DiseaseOrPhenotypicFeature | Annotated by Mesh_All_FN | 795 | | Eisuke Dohi | 2021-12-23 | Released |
| Zoonoses_partialAnnotation | A part of the Zoonoses project, used by PanZoora. The Zoonoses project provides the complete manually annotated data, whereas this project contains only part of it. | 266 | | AikoHIRAKI | 2021-12-01 | Released |
| LitCovid-OGER-BB | Using OGER (www.ontogene.com) and BioBERT to obtain annotations for 10 different vocabularies. | 308 K | Fabio Rinaldi | Nico Colic | 2021-10-18 | Released |
| LitCovid-docs-s | | 0 | | Jin-Dong Kim | 2021-10-18 | Released |
| bionlp-st-ge-2016-reference | The benchmark reference data set of the BioNLP-ST 2016 GE task. It includes Genia-style event annotations to 20 full paper articles about NFκB proteins. The task is to develop an automatic annotation system that produces annotations as similar as possible to those in this data set. To evaluate a participating system, the system needs to produce annotations to the documents in the benchmark test data set (bionlp-st-ge-2016-test). The GE 2016 benchmark data set is provided as multi-layer annotations, which include: bionlp-st-ge-2016-reference: benchmark reference data set (this project); bionlp-st-ge-2016-test: benchmark test data set (annotations are blinded); bionlp-st-ge-2016-test-proteins: protein annotation to the benchmark test data set. The following are supporting resources: bionlp-st-ge-2016-coref: coreference annotation; bionlp-st-ge-2016-uniprot: protein annotation with UniProt IDs; pmc-enju-pas: dependency parsing result produced by Enju; UBERON-AE: annotation for anatomical entities as defined in UBERON; ICD10: annotation for disease names as defined in ICD10; GO-BP: annotation for biological process names as defined in GO; GO-CC: annotation for cellular component names as defined in GO. A SPARQL-driven search interface is provided at http://bionlp.dbcls.jp/sparql. | 14.4 K | DBCLS | Jin-Dong Kim | 2021-07-28 | Released |
| 2015-BEL-Sample-2 | The 295 BEL statements of the sample set used for the 2015 BioCreative challenge. | 11.4 K | Fabio Rinaldi | Nico Colic | 2021-03-11 | Released |
| DisGeNET5_gene_disease | The file contains gene-disease associations obtained by text mining MEDLINE abstracts using the BeFree system, including the gene and disease offsets. | 2.04 M | IBI Group | Yue Wang | 2021-03-11 | Released |
| spacy-test | Random set of articles used for testing during the development of the RESTful spaCy parsing web service. Since development is now finished, they are released for the community to use. | 131 K | Nico Colic | Nico Colic | 2021-03-10 | Released |
| PubMed_ArguminSci | Predictions for PubMed automatically extracted with the ArguminSci tool (https://github.com/anlausch/ArguminSci). | 777 K | | zebet | 2021-03-10 | Released |
| NCBIDiseaseCorpus | The NCBI disease corpus is fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community. | 6.85 K | Rezarta Islamaj Doğan, Robert Leaman, Zhiyong Lu | Chih-Hsuan Wei | 2021-03-10 | Released |
| GENIAcorpus | multi_cell (1,782), mono_cell (222), virus (2,136), protein_family_or_group (8,002), protein_complex (2,394), protein_molecule (21,290), protein_subunit (942), protein_substructure (129), protein_domain_or_region (1,044), protein_other (97), peptide (521), amino_acid_monomer (784), DNA_family_or_group (332), DNA_molecule (664), DNA_substructure (2), DNA_domain_or_region (39), DNA_other (16), RNA_family_or_group (1,545), RNA_molecule (554), RNA_substructure (106), RNA_domain_or_region (8,237), RNA_other (48), polynucleotide (259), nucleotide (243), lipid (2,375), carbohydrate (99), other_organic_compound (4,113), body_part (461), tissue (706), cell_type (7,473), cell_component (679), cell_line (4,129), other_artificial_source (211), inorganic (258), atom (342), other (21,056) | 78.9 K | GENIA Project | Yue Wang | 2021-03-10 | Released |
| pubmed-sentences-benchmark | Benchmark data for segmenting text into sentences. The source of annotation is the GENIA treebank v1.0. The process was as follows. (1) Began with the GENIA treebank v1.0. (2) Sentence annotations were extracted and converted to PubAnnotation JSON. (3) The annotations were uploaded; 12 abstracts failed alignment. (4) Among the 12 failed abstracts, 4 had a dot ('.') character where there should be a colon (':'). They were manually fixed and then successfully uploaded: 7903907, 8053950, 8508358, 9415639. (5) Among the 12 failed abstracts, 8 were "250 word truncation" cases. They were manually fixed and successfully uploaded. During the fixing, manual annotations were added for the missing pieces of text. (6) 30 abstracts had extra text at the end indicating a copyright statement, e.g., "Copyright 1998 Academic Press." These were annotated as sentences in GTB, but the text no longer existed in PubMed. Therefore, the extra text was removed, together with its sentence annotations. | 18.4 K | GENIA project | Jin-Dong Kim | 2021-03-10 | Released |
| jnlpba-st-training | The training data used in the task came from the GENIA version 3.02 corpus. This was formed from a controlled search on MEDLINE using the MeSH terms "human", "blood cells" and "transcription factors". From this search, 1,999 abstracts were selected and hand annotated according to a small taxonomy of 48 classes based on a chemical classification. Among the classes, 36 terminal classes were used to annotate the GENIA corpus. For the shared task, only the classes protein, DNA, RNA, cell line and cell type were used. The first three incorporate several subclasses from the original taxonomy, while the last two are interesting in order to make the task realistic for post-processing by a potential template filling application. The publication year of the training set ranges over 1990~1999. | 51.1 K | GENIA | Yue Wang | 2021-03-10 | Released |
| PubMed_Structured_Abstracts | Sections (zones) as retrieved from PubMed. | 131 K | | zebet | 2021-03-10 | Released |
| PennBioIE | The PennBioIE corpus (0.9) covers two domains of biomedical knowledge. One is the inhibition of the cytochrome P450 family of enzymes (CYP450 or CYP for short), and the other is the molecular genetics of cancer (oncology or onco for short). | 23.8 K | UPenn Biomedical Information Extraction Project | Yue Wang | 2021-03-10 | Released |
| AnEM_abstracts | 250 documents selected randomly from citation abstracts. Entity types: organism subdivision, anatomical system, organ, multi-tissue structure, tissue, cell, developing anatomical structure, cellular component, organism substance, immaterial anatomical entity and pathological formation. Together with AnEM_full-texts, it is probably the largest manually annotated corpus on anatomical entities. | 1.91 K | NaCTeM | Yue Wang | 2021-03-10 | Released |