Projects

501-520 / 592

| Name | Description | # Ann. | Author | Maintainer | Updated_at | Status |
| --- | --- | --- | --- | --- | --- | --- |
| GlyCosmos15-Glycan | | 21.4 K | | Jin-Dong Kim | 2024-09-19 | Developing |
| uniprot-human | UniProt proteins for human | 21.8 K | Jin-Dong Kim | Jin-Dong Kim | 2023-11-29 | Testing |
| PennBioIE | The PennBioIE corpus (0.9) covers two domains of biomedical knowledge. One is the inhibition of the cytochrome P450 family of enzymes (CYP450 or CYP for short), and the other is the molecular genetics of cancer (oncology or onco for short). | 23.8 K | UPenn Biomedical Information Extraction Project | Yue Wang | 2023-11-26 | Released |
| Goldhamster2_Cellosaurus | | 27.5 K | | zebet | 2023-11-29 | Developing |
| Glycosmos15-GlycoEpitope | | 27.8 K | | Jin-Dong Kim | 2024-09-18 | Developing |
| ASCO_abstracts | ASCO abstracts sample dataset | 28 K | | alo33 | 2023-11-29 | Testing |
| OryzaGP | A dataset for named entity recognition of rice genes | 29.1 K | Huy Do and Pierre Larmande | Yue Wang | 2023-11-24 | Uploading |
| CHEMDNER-training-test | The training subset of the CHEMDNER corpus | 29.4 K | Martin Krallinger et al. | Jin-Dong Kim | 2023-11-27 | Testing |
| GlycoBiology-NCBITAXON | NCBITAXON-based annotation to GlycoBiology abstracts | 32.7 K | | shuo50 | 2023-11-29 | Testing |
| Genomics_Informatics | Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. A text corpus for this journal, annotated with various levels of linguistic information, would be a valuable resource because information extraction requires syntactic, semantic, and higher levels of natural language processing. In this study, we publish our new corpus, GNI Corpus version 1.0, extracted and annotated from full texts of Genomics & Informatics with an NLTK (Natural Language ToolKit)-based text mining script. The preliminary version of the corpus could be used as a training and testing set for a system that serves a variety of functions in future biomedical text mining. | 35.3 K | Hyun-Seok Park | ewha-bio | 2023-11-29 | Beta |
| GO-BP | Annotation for biological processes as defined in the "Biological Process" subset of Gene Ontology | 35.4 K | DBCLS | Jin-Dong Kim | 2023-11-29 | Developing |
| biosemtest | Test of submitting Peregrine annotations | 35.6 K | Mark Thompson | markthompson | 2023-11-29 | Testing |
| FirstAuthor_s | UniProt IDs and Nikkaji (Japan Chemical Substance Dictionary) IDs of drugs and other compounds in First Author's reviews of newly published papers that mention disease names (Japanese) | 39.3 K | | AikoHIRAKI | 2023-11-29 | Developing |
| OryzaGP_2022 | | 41.3 K | | larmande | 2023-11-24 | |
| genia-medco-coref | Coreference annotation made to the Genia corpus, following the MUC annotation scheme. It is a product of the collaboration between the Genia and the MedCo projects. | 45.9 K | MedCo project & Genia project | Jin-Dong Kim | 2023-11-24 | Developing |
| jnlpba-st-training | The training data used in the task came from the GENIA version 3.02 corpus. This was formed from a controlled search on MEDLINE using the MeSH terms "human", "blood cells" and "transcription factors". From this search, 2,000 abstracts were selected and hand annotated according to a small taxonomy of 48 classes based on a chemical classification. Among the classes, 36 terminal classes were used to annotate the GENIA corpus. For the shared task, only the classes protein, DNA, RNA, cell line and cell type were used. The first three incorporate several subclasses from the original taxonomy, while the last two are included to make the task realistic for post-processing by a potential template filling application. The publication years of the training set range from 1990 to 1999. | 51.1 K | GENIA | Yue Wang | 2023-11-26 | Released |
| Preeclampsia | A collection of titles and abstracts of "Preeclampsia"-related papers. They were extracted from PubMed using the MeSH term "Preeclampsia" and specifying the language to be "English", on 11th September, 2017. The texts were then annotated by PubDictionaries using the dictionary "Preeclampsia". | 58.7 K | | callahan_tiff | 2023-11-29 | Developing |
| PMA_MER | PMAs annotated using MERpy. | 58.9 K | Stefano Rensi | therightstef | 2023-11-29 | Developing |
| FSU-PRGE | A new broad-coverage corpus composed of 3,306 MEDLINE abstracts dealing with gene and protein mentions. The annotation process was semi-automatic. Publication: http://aclweb.org/anthology/W/W10/W10-1838.pdf | 59.5 K | CALBC Project | Yue Wang | 2023-11-26 | Released |
| craft-ca-core-dev | Development data for CRAFT CA shared task, core concepts only. This project contains the development (training) annotations for the Concept Annotation task of the CRAFT Shared Task 2019. This particular set of concept annotations is the "core" set. See the task description for details, but this set contains only annotations to concepts that appear in the original 10 Open Biomedical Ontologies used for annotation. (That is to say, it does not contain any annotations to extension classes.) | 59.8 K | University of Colorado Anschutz Medical Campus | craft-st | 2023-11-29 | Released |