PubAnnotation

Id	Subject	Object	Predicate	Lexical cue
T309	0-74	Sentence	denotes	Unstructured biomedical knowledge synthesis and triangulation capabilities
T310	75-258	Sentence	denotes	In order to capture biomedical literature-based associations, the nferX platform defines two scores: a ‘local score’ and a ‘global score’, as described previously (Park et al., 2020).
T311	259-595	Sentence	denotes	Briefly, the local score is obtained from applying a traditional natural language processing technique which captures the strength of association between two concepts in a selected corpus of biomedical literature based on the frequency of their co-occurrence normalized by the frequency of each individual concept throughout the corpus.
T312	596-786	Sentence	denotes	A higher local score between Concept X and Concept Y indicates that these concepts are frequently mentioned in close proximity to each other more frequently than would be expected by chance.
T313	787-934	Sentence	denotes	The global score, on the other hand, is based on the neural network renaissance that has recently taken place in Natural Language Processing (NLP).
T314	935-1065	Sentence	denotes	To compute global scores, all tokens (e.g. words and phrases) are projected in a high-dimensional vector space of word embeddings.
T315	1066-1165	Sentence	denotes	These vectors serve to represent the ‘neighborhood’ of concepts which occur around a given concept.
T316	1166-1391	Sentence	denotes	The cosine similarity between any two vectors measures the similarity of these neighborhoods and is the basis for our global score metric, where concepts which are more similar in this vector space have a higher global score.
T317	1392-1632	Sentence	denotes	While the global scores in this work are computed in the embedding space of word2vec model, it can also be computed in the embedding space of any deep learning model including recent transformer-based models like BERT (Devlin et al., 2019).
T318	1633-1794	Sentence	denotes	These may have complementary benefits to word2vec embeddings since the embeddings are context sensitive having different vectors for different sentence contexts.
T319	1795-2033	Sentence	denotes	However, despite the context sensitive nature of BERT embeddings a global score computation for a phrase may still be of value given the score is computed across sentence embeddings capturing the context sensitive nature of those phrases.
T320	2034-2260	Sentence	denotes	From a visualization perspective, the local score and global score (‘Signals’) are represented in the platform using bubbles where bubble size corresponds to the local score and color intensity corresponds to the global score.
T321	2261-2386	Sentence	denotes	This allows users to rapidly determine the strength of association between any two concepts throughout biomedical literature.
T322	2387-2545	Sentence	denotes	We consider concepts which show both high local and global scores to be ‘concordant’ and have found that these typically recapitulate well-known associations.
T323	2546-2699	Sentence	denotes	One key aspect of the nferX platform is that it allows the user to query associated concepts for a virtually unbounded number of possible query concepts.
T324	2700-2742	Sentence	denotes	This is achieved by means of two features:
T325	2743-2995	Sentence	denotes	Firstly, the nferX platform allows the user to compose queries using the logical AND, OR and NOT operators to logically combine any number of biomedical concepts in a query, each combination amounting to a gross or nuanced composite biomedical concept.
T326	2996-3498	Sentence	denotes	Secondly, since logical combinations yield a virtually unbounded number of biomedical concepts that can be queries, the nferX platform implements a completely dynamic method of computing local scores on the fly by using novel high performance parallel and distributed algorithms that, in real time, scan hundreds of millions of documents to quickly locate user query related text fragments and count co-occurring biomedical concepts for computing strength of association scores and their significances.
T327	3499-3847	Sentence	denotes	The platform further leverages statistical inference to calculate ‘enrichments’ based on structured data, thus enabling real-time triangulation of signals from the unstructured biomedical knowledge graph various other structured databases (e.g. curated ontologies, RNA-sequencing datasets, human genetic associations, protein-protein interactions).
T328	3848-4019	Sentence	denotes	This facilitates unbiased hypothesis-free learning and faster pattern recognition, and it allows users to more holistically determine the veracity of concept associations.
T329	4020-4226	Sentence	denotes	Finally, the platform allows the user to identify and further examine the documents and textual fragments from which the knowledge synthesis signals are derived using the Documents and Signals applications.
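
The two scores described in sentences T311 and T316 can be illustrated with a minimal sketch. This is not the nferX implementation, whose exact formulas are not given in the annotated text. It assumes that the local score behaves like pointwise mutual information (co-occurrence frequency normalized by the individual concept frequencies) computed at document level, and that the global score is the plain cosine similarity between two embedding vectors. The function names `local_score` and `global_score`, the document-as-set representation, and the toy data are hypothetical.

```python
import math


def local_score(docs, x, y):
    """PMI-style association between concepts x and y (an assumed stand-in).

    docs is a list of sets, each holding the concepts mentioned in one
    document. The score is log2 of observed co-occurrence over the
    co-occurrence expected if x and y were independent.
    """
    n = len(docs)
    count_x = sum(1 for d in docs if x in d)
    count_y = sum(1 for d in docs if y in d)
    count_xy = sum(1 for d in docs if x in d and y in d)
    if count_xy == 0:
        # The concepts never co-occur, so there is no association to score.
        return float("-inf")
    return math.log2((count_xy * n) / (count_x * count_y))


def global_score(u, v):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; treat similarity as 0 by convention.
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus: each set is the concepts found in one document (made-up data).
toy_docs = [{"ACE2", "lung"}, {"ACE2", "lung"}, {"kidney"}, {"heart"}]
print(local_score(toy_docs, "ACE2", "lung"))
print(global_score([1.0, 0.0], [0.0, 1.0]))
```

A positive local score here means the two concepts co-occur more often than chance would predict, which matches the reading given in sentence T312. In practice the embedding vectors would come from a trained model such as word2vec rather than being written by hand.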

PMC:7371427 / 56324-60550 JSON TXT 6 Projects
