Figure 1—figure supplement 1. Validation of metrics used to assess literature-derived associations. (A)-(B) 1-d logistic models predicting true vs. random concept pair associations from (A) the cosine score and (B) the Exponential Local Score. The degree of separation between true associations (green) and random associations (red) indicates how well each score captures known concept associations. (C) Normalized histograms of co-occurrence counts (on a logarithmic scale) for high-cosine vs. low-cosine token pairs. (D) Distribution of cosines between gene-disease token vector pairs compared with null distributions of cosines between pairs of random 300-d vectors. (E) Null distribution of the cosine between two random vectors as the dimension of the vectors varies.
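
The null distributions in panels (D) and (E) can be illustrated with a minimal simulation sketch. It is not the authors' code: the caption does not say how the random vectors were drawn, so the sketch assumes i.i.d. standard normal components, and the function name `null_cosines`, the number of pairs, and the seed are hypothetical choices. Under that assumption the cosine between two random vectors has mean 0 and standard deviation 1/sqrt(d), so the null distribution narrows as the dimension d grows.

```python
import numpy as np

def null_cosines(dim, n_pairs=5000, seed=0):
    """Cosines between n_pairs pairs of random dim-dimensional vectors.

    Assumption (not stated in the caption): components are drawn
    i.i.d. from a standard normal distribution.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n_pairs, dim))
    v = rng.standard_normal((n_pairs, dim))
    # Row-wise dot products divided by the product of the row norms.
    dots = np.einsum("ij,ij->i", u, v)
    norms = np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1)
    return dots / norms

# Compare the empirical spread of the null cosines with 1/sqrt(dim).
for dim in (10, 100, 300, 1000):
    cos = null_cosines(dim)
    print(f"dim={dim:5d}  mean={cos.mean():+.4f}  "
          f"std={cos.std():.4f}  1/sqrt(dim)={1 / np.sqrt(dim):.4f}")
```

At d = 300 the expected null standard deviation is about 0.058, which gives a rough reference scale for reading cosines between 300-d token vectors against the null.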