Annotation of Human Phenotype-Gene Relations - Identification of Negative, False, and Unknown Relations | | Accessible negative results are relevant to researchers and clinicians, not only to limit their search space but also to prevent the costly re-exploration of a hypothesis. However, most biomedical relation extraction data sets do not distinguish between a false and a negative relation. A false relation should express a context in which the entities are not related. In contrast, a negative relation should express a context that affirms there is no association between the two entities. Furthermore, data sets created using distant supervision techniques also contain some false negative relations, which constitute undocumented/unknown relations. Unknown relations are good candidates for further exploration by researchers and clinicians. We propose to improve the distinction between false and negative relations by revising the false relations of the PGR corpus with regular expressions. | 2020-02-21 | |
LASIGE: Annotating a multilingual COVID-19-related corpus for BLAH7 | | The overall motivation is the creation of parallel multilingual datasets for text mining systems applied to COVID-19-related literature. Tracking the most recent advances in COVID-19-related research is essential given the novelty of the disease and its impact on society. Still, the pace of publication requires automatic approaches to access and organize the knowledge that is produced every day. Text mining pipelines are needed to assist in that task, and developing them is only possible with evaluation datasets. However, COVID-19-related datasets are scarce, even more so in languages other than English. The expected contribution of the project is the annotation of a multilingual parallel dataset (EN-PT), made available to the community to improve text mining research on COVID-19-related literature. | 2021-02-17 | |