Projects

Name	Description	# Ann.	Author	Maintainer	Updated_at	Status

NCBIDiseaseCorpus	The NCBI disease corpus is fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community.	6.85 K	Rezarta Islamaj Doğan,Robert Leaman,Zhiyong Lu	Chih-Hsuan Wei	2023-11-29	Released
bionlp-st-ge-2016-coref	Coreference annotation to the benchmark data set (reference and test) of BioNLP-ST 2016 GE task. For detailed information, please refer to the benchmark reference data set (bionlp-st-ge-2016-reference) and benchmark test data set (bionlp-st-ge-2016-test).	853	DBCLS	Jin-Dong Kim	2024-06-17	Released
consensus_PMA_Age_Indications		1.7 K		laurenc	2023-11-28	Beta
blah6_medical_device	BLAH6 hackathon project to annotate medical device indications in premarket approval statement summaries. The documents in this project serve as a corpus of premarket approval (PMA) statements that have undergone quality control. In particular, we have (1) removed non-ASCII characters, (2) fixed some text segmentation errors, and (3) fixed some capitalization errors.	0	Stefano Rensi	therightstef	2023-11-29	Beta
QFMC_MEDLINE	Quaero French Medical Corpus: Annotation of MEDLINE titles	5.9 K	Aurélie Névéol	Pierre Zweigenbaum	2023-11-29	Beta
LitCovid_Glycan-Motif-Structure	PubDictionaries annotation for glycan-Motif terms.	6.51 K		ISSAKU YAMADA	2023-11-29	Beta
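FA_Top107-forWeb	*** DATA PROCESSING REQUIRED! For web links, these results are used after further processing. For any other use, the issues described at the end must be resolved separately. DATA PROCESSING REQUIRED! *** Contains 99 of the 107 reviews (the Top 100 plus 7 reviews that should originally have been in the Top 100). Reviews 5414, 6076, 6930, 8403, 9643, 12112, 18544, and 18829 have 0 denotations, so the documents themselves are not registered. @AikoHIRAKI is the folder of reviews with typos corrected. See the config for details of the attributes. *** CAUTION! Because of a constraint on the web-link side, a selected string cannot correspond to multiple UniProt IDs. (Example) For "Protein1～Protein7", which refers to Protein1, 2, 3, 4, 5, 6, and 7, only the UniProt IDs for 1 and 7 are retrieved, even if all seven have UniProt IDs. The "～" stands for Protein2, 3, 4, 5, and 6, but because the search is done by string rather than by position, and because of how the display is specified, none of these IDs are retrieved. (By contrast, in GeneProtein the "～" carried the IDs for 2-6.) Affected reviews: 14898 (～=MAPK2, MAPK3, MAPK4, MAPK5, MAPK6), 10471 (～=Ago2, Ago3) --------------------------------------- (Example) For "ProteinAB ... ProteinCD ... ProteinB ... ProteinD", ProteinAB is annotated with the Lexical_Cues "ProteinA" and "B", and ProteinCD likewise with "ProteinC" and "D". Within this review it is clear that "B" and "D" alone refer to ProteinB and ProteinD, but for any other use it is inappropriate to assign the corresponding UniProt IDs to "B" and "D". Affected review: 11957 (β4=itgb4, β1=itgb1, β5=itgb5, β3=itgb3). Further examples will be added here as they come up. For the time being, these need to be deleted. We will consider whether to add a deletion flag as an attribute, or whether this will be resolved once Jake's feature is implemented in TextAE, and will mark them in some way. (Example) For "ProteinA/B", the web links attach UniProtID-A to "ProteinA" and UniProtID-B to "/B" (a constraint on the link side). For uses other than web links, a Relation, as was done in the separate project FA_Top100Plus-GeneProtein, must be used so that UniProtID-B corresponds to "ProteinB" rather than "/B". With the current approach, the places that need a Relation cannot be recovered. The lexical cue reads "/B", but the Object retains "ProteinB", so please refer to the Object. However, this does not cover cases where positions are needed, such as language processing. Affected reviews: 11935 (/4=BMP4), 14898 (/2=LATS2), 7412 (/2=TSC2), 4629 (/2=CtBP2). (This works for web links because each review is self-contained, so it holds as long as "/B" has no other meaning in that review, and because links are attached by string matching.) CAUTION! *** Merging Relations is already possible with existing TextAE functionality.	10.3 K		AikoHIRAKI	2023-11-29	Beta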
Genomics_Informatics	Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. Text corpus for this journal annotated with various levels of linguistic information would be a valuable resource as the process of information extraction requires syntactic, semantic, and higher levels of natural language processing. In this study, we publish our new corpus called GNI Corpus version 1.0, extracted and annotated from full texts of Genomics & Informatics, with NLTK (Natural Language ToolKit)-based text mining script. The preliminary version of the corpus could be used as a training and testing set of a system that serves a variety of functions for future biomedical text mining.	35.3 K	Hyun-Seok Park	ewha-bio	2023-11-29	Beta
Age_blah		1.9 K		slee7268	2023-11-29	Beta
Ab3P-abbreviations	This corpus was developed during the creation of the Ab3P abbreviation definition identification tool. It includes 1250 manually annotated MEDLINE records. This gold standard includes 1221 abbreviation-definition pairs. Reference: Abbreviation definition identification based on automatic precision estimates. Sunghwan Sohn, Donald C Comeau, Won Kim and W John Wilbur. BMC Bioinformatics 2008, 9:402. DOI: 10.1186/1471-2105-9-402	2.33 K	Sunghwan Sohn, Donald C Comeau, Won Kim and W John Wilbur	comeau	2023-11-29	Beta
GoldHamster		285 K		zebet	2023-11-29	Beta
OryzaGP1	A dataset for Named Entity Recognition of rice genes	0	Huy Do, Pierre Larmande		2019-01-31	Uploading
Bioinformatics_fulltext		0		Sophie Nam	2023-11-28	Uploading
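FA_Top100Plus-GeneProtein	Contains 101 of the 107 reviews (the Top 100 plus 7 reviews that should originally have been in the Top 100). Reviews 5414, 6076, 6930, 8403, 9643, and 18544 have 0 denotations, so the documents themselves are not registered. See the config for details of the attributes. Documents whose source DB is @AikoHIRAKI will be replaced once the typo corrections are reflected in the official FirstAuthors documents on PubAnnotation.	10.4 K		yucca	2023-11-29	Uploading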
AnEM_full-texts	250 documents selected randomly from full-text papers. Entity types: organism subdivision, anatomical system, organ, multi-tissue structure, tissue, cell, developing anatomical structure, cellular component, organism substance, immaterial anatomical entity, and pathological formation. Together with AnEM_abstracts, it is probably the largest manually annotated corpus on anatomical entities.	687	NaCTeM	Yue Wang	2023-11-29	Uploading
LitCovid-sample-docs	NCBI collects a comprehensive literature resource on the subject of Covid-19: https://www.ncbi.nlm.nih.gov/research/coronavirus/ The LitCovid project@PubAnnotation is a collection of the titles and abstracts of the LitCovid dataset, for people who want to perform text mining analysis. Please note that if you produce annotations for the documents in this project and contribute them back to PubAnnotation, they will become publicly available together with contributions from other people. If you want to contribute your annotations to PubAnnotation, please refer to the documentation page: http://www.pubannotation.org/docs/submit-annotation/ The list of PMIDs is sourced from here. Below is a notice from the original LitCovid dataset: PUBLIC DOMAIN NOTICE National Center for Biotechnology Information This software/database is a "United States Government Work" under the terms of the United States Copyright Act. It was written as part of the author's official duties as a United States Government employee and thus cannot be copyrighted. This software/database is freely available to the public for use. The National Library of Medicine and the U.S. Government have not placed any restriction on its use or reproduction. Although all reasonable efforts have been taken to ensure the accuracy and reliability of the software and data, the NLM and the U.S. Government do not and cannot warrant the performance or results that may be obtained by using this software or data. The NLM and the U.S. Government disclaim all warranties, express or implied, including warranties of performance, merchantability or fitness for any particular purpose. Please cite the authors in any work or product based on this material: Chen Q, Allot A, & Lu Z. (2020) Keep up with the latest coronavirus research, Nature 579:193	0		Jin-Dong Kim	2023-11-29	Uploading
CoGe_Citation_Annotations	Annotated PMC abstracts and full articles that cite the "CoGe" papers (PMID: 18952863, 18269575). Total Num Citations: 165; Total Num Unique Citations: 141; Total Num Abstracts: 165; Total Num Whole Articles: 165	0	Heather Lent	hclent	2023-11-29	Uploading
ENG_NER_NEL_Diana		461		dpavot	2023-11-29	Uploading
PT_NER_NEL_pruas		334	Pedro Ruas	pruas_18	2023-11-30	Uploading
tagtog	OpenAccess annotations coming from tagtog.net	0	tagtog	tagtog	2015-02-23	Developing
