PMC:7371427 / 56324-60550

Annotations

Id Subject Object Predicate Lexical cue
T309 0-74 Sentence denotes Unstructured biomedical knowledge synthesis and triangulation capabilities
T310 75-258 Sentence denotes In order to capture biomedical literature-based associations, the nferX platform defines two scores: a ‘local score’ and a ‘global score’, as described previously (Park et al., 2020).
T311 259-595 Sentence denotes Briefly, the local score is obtained by applying a traditional natural language processing technique that captures the strength of association between two concepts in a selected corpus of biomedical literature, based on the frequency of their co-occurrence normalized by the frequency of each individual concept throughout the corpus.
T312 596-786 Sentence denotes A higher local score between Concept X and Concept Y indicates that these concepts are mentioned in close proximity to each other more frequently than would be expected by chance.
T313 787-934 Sentence denotes The global score, on the other hand, draws on the recent neural network-based advances in Natural Language Processing (NLP).
T314 935-1065 Sentence denotes To compute global scores, all tokens (e.g., words and phrases) are projected into a high-dimensional vector space of word embeddings.
T315 1066-1165 Sentence denotes These vectors serve to represent the ‘neighborhood’ of concepts which occur around a given concept.
T316 1166-1391 Sentence denotes The cosine similarity between any two vectors measures the similarity of these neighborhoods and is the basis for our global score metric, where concepts which are more similar in this vector space have a higher global score.
T317 1392-1632 Sentence denotes While the global scores in this work are computed in the embedding space of a word2vec model, they can also be computed in the embedding space of any deep learning model, including recent transformer-based models such as BERT (Devlin et al., 2019).
T318 1633-1794 Sentence denotes Such embeddings may offer benefits complementary to those of word2vec embeddings because they are context-sensitive, assigning different vectors to the same token in different sentence contexts.
T319 1795-2033 Sentence denotes However, despite the context-sensitive nature of BERT embeddings, a global score computation for a phrase may still be of value, given that the score is computed across sentence embeddings that capture the context-sensitive nature of those phrases.
T320 2034-2260 Sentence denotes From a visualization perspective, the local score and global score (‘Signals’) are represented in the platform using bubbles where bubble size corresponds to the local score and color intensity corresponds to the global score.
T321 2261-2386 Sentence denotes This allows users to rapidly determine the strength of association between any two concepts throughout biomedical literature.
T322 2387-2545 Sentence denotes We consider concepts which show both high local and global scores to be ‘concordant’ and have found that these typically recapitulate well-known associations.
T323 2546-2699 Sentence denotes One key aspect of the nferX platform is that it allows the user to query associated concepts for a virtually unbounded number of possible query concepts.
T324 2700-2742 Sentence denotes This is achieved by means of two features:
T325 2743-2995 Sentence denotes Firstly, the nferX platform allows the user to compose queries that combine any number of biomedical concepts using the logical AND, OR and NOT operators, with each combination amounting to a broad or nuanced composite biomedical concept.
T326 2996-3498 Sentence denotes Secondly, since logical combinations yield a virtually unbounded number of biomedical concepts that can be queried, the nferX platform implements a completely dynamic method of computing local scores on the fly. It uses novel high-performance parallel and distributed algorithms that, in real time, scan hundreds of millions of documents to locate text fragments related to the user query and count co-occurring biomedical concepts, from which strength-of-association scores and their statistical significance are computed.
T327 3499-3847 Sentence denotes The platform further leverages statistical inference to calculate ‘enrichments’ based on structured data, thus enabling real-time triangulation of signals from the unstructured biomedical knowledge graph with various other structured databases (e.g., curated ontologies, RNA-sequencing datasets, human genetic associations, protein-protein interactions).
T328 3848-4019 Sentence denotes This facilitates unbiased hypothesis-free learning and faster pattern recognition, and it allows users to more holistically determine the veracity of concept associations.
T329 4020-4226 Sentence denotes Finally, the platform allows the user to identify and further examine the documents and textual fragments from which the knowledge synthesis signals are derived using the Documents and Signals applications.
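
The scoring ideas described in the sentences above can be illustrated with a minimal Python sketch. It is not the nferX implementation, whose exact formulas and algorithms are not given here. It rests on several assumptions: the local score is approximated by pointwise mutual information over document-level co-occurrence (the text only says that co-occurrence is normalized by individual concept frequencies), the global score is a plain cosine similarity between embedding vectors, and composite queries are modelled as predicates over the set of concepts in a document. All names (`concept`, `AND`, `OR`, `NOT`, `local_score`, `global_score`, `toy_corpus`) and the toy data are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Set

Doc = Set[str]                 # a document reduced to the concepts it mentions
Query = Callable[[Doc], bool]  # a (possibly composite) concept as a predicate


def concept(name: str) -> Query:
    """Atomic query: true if the document mentions `name`."""
    return lambda doc: name in doc


def AND(*queries: Query) -> Query:
    return lambda doc: all(q(doc) for q in queries)


def OR(*queries: Query) -> Query:
    return lambda doc: any(q(doc) for q in queries)


def NOT(query: Query) -> Query:
    return lambda doc: not query(doc)


def local_score(query: Query, target: str, corpus: List[Doc]) -> float:
    """PMI-style association (an assumed stand-in for the local score).

    Computes log2 of observed co-occurrence over the co-occurrence
    expected if the query and the target concept were independent.
    """
    n = len(corpus)
    n_query = sum(1 for doc in corpus if query(doc))
    n_target = sum(1 for doc in corpus if target in doc)
    n_both = sum(1 for doc in corpus if query(doc) and target in doc)
    if n_both == 0:
        return float("-inf")  # never co-occur in this corpus
    return math.log2((n_both * n) / (n_query * n_target))


def global_score(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy corpus: each document is the set of concepts it mentions (made-up data).
toy_corpus: List[Doc] = [
    {"ACE2", "lung"},
    {"ACE2", "lung", "kidney"},
    {"ACE2", "kidney"},
    {"lung", "asthma"},
    {"asthma"},
    {"kidney"},
]

# Composite concept: documents mentioning ACE2 but not kidney.
ace2_not_kidney = AND(concept("ACE2"), NOT(concept("kidney")))
print(local_score(ace2_not_kidney, "lung", toy_corpus))
print(global_score([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

A positive local score in this sketch means the two concepts co-occur more often than independence would predict, mirroring the "more frequently than would be expected by chance" interpretation. A real system would count co-occurrence within text fragments rather than whole documents and would take vectors from a trained embedding model rather than hand-written lists.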