Projects

Name | Description | # Ann. | Author | Maintainer | Updated_at | Status

Projects 61-80 of 381
bionlp-st-id-2011-training | The training dataset from the infectious diseases (ID) task in the BioNLP Shared Task 2011. Entity types: (1) Genes and gene products: gene, RNA, and protein name mentions; (2) Two-component systems: mentions of the names of two-component regulatory systems, frequently embedding the names of the two proteins forming the system; (3) Chemicals: mentions of chemical compounds such as "NaCL"; (4) Organisms: mentions of organism names or organism specification through specific properties (e.g. "graRS mutant"); (5) Regulons/Operons: mentions of names of specific regulons and operons. | 5.61 K | University of Tokyo Tsujii Laboratory, NaCTeM and Biocomplexity Institute of Virginia Tech | Yue Wang | 2020-09-17 | Released
bionlp-st-ge-2016-coref | Coreference annotation to the benchmark data set (reference and test) of the BioNLP-ST 2016 GE task. For detailed information, please refer to the benchmark reference data set (bionlp-st-ge-2016-reference) and the benchmark test data set (bionlp-st-ge-2016-test). | 853 | DBCLS | Jin-Dong Kim | 2020-09-18 | Released
bionlp-st-ge-2016-test | The benchmark test data set of the BioNLP-ST 2016 GE task. It includes Genia-style event annotations to 14 full paper articles about NFκB proteins. For testing purposes, however, the annotations are all blinded: users cannot see the annotations in this project. Instead, annotations in any other project can be compared to the hidden annotations in this project, and the annotations in that project are then automatically evaluated based on the comparison. A participant in the GE task can get an evaluation of his/her automatic annotation results through the following process: (1) Create a new project. (2) Import documents from the project bionlp-st-ge-2016-test-proteins to your project. (3) Import annotations from the project bionlp-st-ge-2016-test-proteins to your project. At this point, you may want to compare your project to this project, the benchmark data set; the comparison will show that the protein annotations in your project are 100% correct, but other annotations, e.g., events, are 0%. (4) Produce event annotations, using your system, upon the protein annotations. (5) Upload your event annotations to your project. (6) Compare your project to this project to get the evaluation. The GE 2016 benchmark data set is provided as multi-layer annotations, which include: bionlp-st-ge-2016-reference (benchmark reference data set); bionlp-st-ge-2016-test (benchmark test data set, this project); bionlp-st-ge-2016-test-proteins (protein annotation to the benchmark test data set). The following are supporting resources: bionlp-st-ge-2016-coref (coreference annotation); bionlp-st-ge-2016-uniprot (protein annotation with UniProt IDs); pmc-enju-pas (dependency parsing result produced by Enju); UBERON-AE (annotation for anatomical entities as defined in UBERON); ICD10 (annotation for disease names as defined in ICD10); GO-BP (annotation for biological process names as defined in GO); GO-CC (annotation for cellular component names as defined in GO). A SPARQL-driven search interface is provided at http://bionlp.dbcls.jp/sparql. | 7.99 K | DBCLS | Jin-Dong Kim | 2020-09-18 | Released
bionlp-st-ge-2016-test-tees | NER and event extraction produced by TEES (with the default GE11 model) for the 14 full papers used in the BioNLP 2016 GE task test corpus. | 9.17 K | Nico Colic | Nico Colic | 2020-09-18 | Released
bionlp-st-ge-2016-spacy-parsed | Dependency parses produced by the spaCy parser, and part-of-speech tags produced by the Stanford tagger (with the wsj-0-18-left3words-nodistsim model). The exact procedure is described here. The data set contains the 34 full paper articles used in the BioNLP 2016 GE task. | 225 K | Nico Colic | Nico Colic | 2020-09-18 | Released
bionlp-st-ge-2016-test-proteins | Protein annotations to the benchmark test data set of the BioNLP-ST 2016 GE task. A participant in the GE task may import the documents and annotations of this project into his/her own project as a starting point for producing event annotations. For more details, please refer to the benchmark test data set (bionlp-st-ge-2016-test). | 4.34 K | DBCLS | Jin-Dong Kim | 2020-09-18 | Released
craft-ca-core-dev | Development data for the CRAFT CA shared task, core concepts only. This project contains the development (training) annotations for the Concept Annotation task of the CRAFT Shared Task 2019. This particular set of concept annotations is the "core" set. See the task description for details, but this set contains only annotations to concepts that appear in the original 10 Open Biomedical Ontologies used for annotation. (That is to say, it does not contain any annotations to extension classes.) | 59.8 K | University of Colorado Anschutz Medical Campus | craft-st | 2020-09-21 | Released
craft-ca-core-ex-dev | Development data for the CRAFT CA shared task, core concepts + EXTENSIONS. This project contains the development (training) annotations for the Concept Annotation task of the CRAFT Shared Task 2019. This particular set of concept annotations is the "core+extensions" set. See the task description for details, but this set contains annotations to concepts that appear in the original 10 Open Biomedical Ontologies used for annotation PLUS annotations to extension classes created using the core concepts. | 90.2 K | University of Colorado Anschutz Medical Campus | craft-st | 2020-09-21 | Released
craft-sa-dev | Development data for the CRAFT SA shared task. This project contains the development (training) annotations for the Structural Annotation task of the CRAFT Shared Task 2019. This particular set contains token and sentence annotations with tokens linked via dependency relations. These dependency relations were automatically generated using the manually curated CRAFT constituency treebank files as input. | 490 K | University of Colorado Anschutz Medical Campus | craft-st | 2020-09-21 | Released
DisGeNET | Disease-gene association annotation. | 3.12 M | Nuria Queralt | Jin-Dong Kim | 2016-01-28 | Beta
Ab3P-abbreviations | This corpus was developed during the creation of the Ab3P abbreviation definition identification tool. It includes 1250 manually annotated MEDLINE records. This gold standard includes 1221 abbreviation-definition pairs. Reference: Sunghwan Sohn, Donald C Comeau, Won Kim and W John Wilbur. Abbreviation definition identification based on automatic precision estimates. BMC Bioinformatics 2008, 9:402. DOI: 10.1186/1471-2105-9-402 | 2.34 K | Sunghwan Sohn, Donald C Comeau, Won Kim and W John Wilbur | comeau | 2016-07-29 | Beta
PubCasesHPO | HPO annotation in PubCases | 3.2 M | - | Toyofumi Fujiwara | 2017-09-06 | Beta
PubCasesORDO | ORDO annotation in PubCases | 869 K | - | Toyofumi Fujiwara | 2017-09-14 | Beta
QFMC_MEDLINE | Quaero French Medical Corpus: annotation of MEDLINE titles | 5.97 K | Aurélie Névéol | Pierre Zweigenbaum | 2018-01-24 | Beta
Genomics_Informatics | Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. A text corpus for this journal annotated with various levels of linguistic information would be a valuable resource, as the process of information extraction requires syntactic, semantic, and higher levels of natural language processing. In this study, we publish our new corpus, called GNI Corpus version 1.0, extracted and annotated from full texts of Genomics & Informatics with an NLTK (Natural Language ToolKit)-based text mining script. The preliminary version of the corpus could be used as a training and testing set for a system that serves a variety of functions in future biomedical text mining. | 35.3 K | Hyun-Seok Park | ewha-bio | 2018-11-27 | Beta
LitCovid-PAS-Enju | Predicate-argument structure annotation produced by the Enju parser. | 125 K | - | Jin-Dong Kim | 2020-03-25 | Beta
PubmedHPO | Human phenotype annotation to PubMed abstracts, based on the HPO ontology | 12.4 M | Tudor Groza | tudor | 2020-04-01 | Beta
blah6_medical_device | BLAH6 hackathon project to annotate medical device indications in premarket approval statement summaries. The documents in this project serve as a corpus of premarket approval (PMA) statements that have undergone quality control. In particular, we have (1) removed non-ASCII characters, (2) fixed some text segmentation errors, and (3) fixed some capitalization errors. | 0 | Stefano Rensi | therightstef | 2020-08-04 | Beta
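FA_Top107-forWeb | *** DATA PROCESSING REQUIRED: For web links, these results are used after further processing. For any other use, the issues described below must be resolved separately. *** This project covers 99 of 107 reviews: the Top 100 plus 7 reviews that should originally have been in the Top 100. Reviews 5414, 6076, 6930, 8403, 9643, 12112, 18544, and 18829 had 0 denotations, so the documents themselves are not registered. @AikoHIRAKI is the review folder with typos corrected. See the config for details of the attributes. *** CAUTION: Because of a constraint on the web-link side, a selected string cannot correspond to multiple UniProt IDs. Example: when the text reads "Protein1~Protein7", meaning Protein1, 2, 3, 4, 5, 6, 7, only the UniProt IDs for 1 and 7 are retrieved, even if all seven have UniProt IDs. The "~" stands for Protein2, 3, 4, 5, 6, but because the search is by string rather than by position, and because of the display specification, none of these IDs are retrieved. (By contrast, in GeneProtein the "~" carried the IDs for 2-6.) Affected reviews: 14898 (~ = MAPK2, MAPK3, MAPK4, MAPK5, MAPK6), 10471 (~ = Ago2, Ago3). Example: when the text reads "ProteinAB...ProteinCD...ProteinB...ProteinD", ProteinAB has the Lexical_Cue entries ProteinA and B, and likewise ProteinCD has ProteinC and D. Within this review it is clear that a bare "B" or "D" refers to ProteinB or ProteinD, but for use outside the review it is inappropriate to assign the corresponding UniProt IDs to "B" and "D". Affected review: 11957 (β4 = itgb4, β1 = itgb1, β5 = itgb5, β3 = itgb3). Further examples will be added here as they are found. For the time being, these need to be removed. We will consider whether to add a deletion flag as an attribute, or whether this will be resolved once Jake's feature is implemented in TextAE, and will make these cases identifiable in some way. Example: when the text reads "ProteinA/B", the web link assigns UniProtID-A to "ProteinA" and UniProtID-B to "/B" (a constraint on the link side). For uses other than web links, Relations must be used, as was done in the separate project FA_Top100Plus-GeneProtein, so that UniProtID-B corresponds to "ProteinB" rather than "/B". With the current approach, the places that require Relations cannot be recovered. The lexical cue is "/B", but "ProteinB" is kept in the Object, so please refer to the Object. However, this does not cover cases such as language processing where positions are needed. Affected reviews: 11935 (/4 = BMP4), 14898 (/2 = LATS2), 7412 (/2 = TSC2), 4629 (/2 = CtBP2). (This works for web links because each review is self-contained, so it is enough that "/B" has no other meaning within that review, and because links are attached by string matching.) *** Merging of Relations is already possible with existing TextAE functionality. | 10.3 K | - | AikoHIRAKI | 2020-09-01 | Beta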
Age_blah | - | 1.45 K | - | slee7268 | 2020-09-17 | Beta
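
The bionlp-st-ge-2016-test entry describes comparing a project against a hidden benchmark, with protein annotations scoring 100% and missing event annotations scoring 0%. The following is a minimal sketch of that kind of comparison, not the comparison the site itself performs. It assumes annotations are available as JSON-like dictionaries with a "denotations" list whose items carry a "span" (begin, end) and an "obj" label, and it uses exact span matching per label. The function names and the toy data are hypothetical, and relations between annotations are ignored.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Key = Tuple[int, int]  # (begin, end) character offsets


def spans_by_type(annotation: dict) -> Dict[str, Set[Key]]:
    """Group denotation spans by their 'obj' label."""
    grouped: Dict[str, Set[Key]] = defaultdict(set)
    for d in annotation.get("denotations", []):
        grouped[d["obj"]].add((d["span"]["begin"], d["span"]["end"]))
    return grouped


def evaluate_by_type(predicted: dict, gold: dict) -> Dict[str, Dict[str, float]]:
    """Exact-match precision and recall per label (an assumed scoring rule)."""
    pred, ref = spans_by_type(predicted), spans_by_type(gold)
    scores = {}
    for label in sorted(set(pred) | set(ref)):
        p, g = pred.get(label, set()), ref.get(label, set())
        tp = len(p & g)  # spans present in both sets
        scores[label] = {
            "precision": tp / len(p) if p else 0.0,
            "recall": tp / len(g) if g else 0.0,
        }
    return scores


# Toy data (hypothetical): the gold set has a protein and an event trigger;
# the predicted set only has the imported protein annotation.
gold = {
    "text": "p65 phosphorylation",
    "denotations": [
        {"id": "T1", "span": {"begin": 0, "end": 3}, "obj": "Protein"},
        {"id": "T2", "span": {"begin": 4, "end": 19}, "obj": "Phosphorylation"},
    ],
}
predicted = {
    "text": "p65 phosphorylation",
    "denotations": [
        {"id": "T1", "span": {"begin": 0, "end": 3}, "obj": "Protein"},
    ],
}

scores = evaluate_by_type(predicted, gold)
for label, s in scores.items():
    print(label, s)
```

With the toy data, the Protein label scores 1.0 on both measures and the Phosphorylation label scores 0.0, mirroring the situation described after the import steps, before any event annotations are uploaded.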