PubMed-MEDLINE 2014 Corpus

The CR process was performed on the 2014 release of the PubMed-MEDLINE corpus. The corpus contains 22,376,811 articles, of which 13,262,617 have a valid title and abstract (most of the remaining entries are articles in languages other than English, for which only the title is listed). MEDLINE abstracts are annotated with medical subject headings (MeSH); the main headings (descriptors) provide a schematic description of the topic of the article. The descriptors are divided into 16 categories, including category C, “diseases.” Category C contains 4,620 unique entries, and we refer to it here as “MeSH diseases.” Although MeSH category C is described as comprising diseases, many of its 4,620 terms do not refer to specific diseases. For instance, some terms describe general categories, such as “brain diseases” (MeSH: D001927); some describe veterinary diseases, such as “brucellosis, bovine” (MeSH: D002007); and some describe other entities, such as “cadaver” (MeSH: D002102). Others represent phenotypic features of diseases rather than actual disease entities; one example is “Cheyne-Stokes respiration” (MeSH: D002639), an abnormal breathing pattern that can be observed in diseases such as central sleep apnea syndrome. We excluded such MeSH entries by careful manual curation, leaving a total of 3,145 MeSH category C descriptors that we judged to represent specific diseases. Only these entries were used for the analysis described in this manuscript. We filtered the 13,262,617 abstracts on the basis of their MeSH terms, retaining only those annotated with at least one of the 3,145 curated disease descriptors, and then processed them with the Bio-LarK Concept Recognizer. In some cases, a single abstract was annotated with multiple MeSH disease terms, some of which were also flagged as major topics of the article.
For the purpose of this analysis, we included all abstracts regardless of the number of associated MeSH terms or whether those terms were flagged as major topics.
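
The selection rule described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the `Abstract` record, the `select_for_cr` function, and the descriptor IDs `DX00001` and `DX00002` are hypothetical names introduced only for illustration, and the actual curated list of 3,145 descriptors is not reproduced here. The sketch assumes each record carries its MeSH descriptors as (identifier, major-topic flag) pairs.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple


@dataclass
class Abstract:
    """Hypothetical in-memory record for one MEDLINE entry."""
    pmid: str
    title: str
    text: str
    # Each MeSH annotation is (descriptor_id, is_major_topic).
    mesh: List[Tuple[str, bool]] = field(default_factory=list)


def select_for_cr(records: Iterable[Abstract], curated_ids: Set[str]) -> List[Abstract]:
    """Keep records that have a title and abstract and carry at least one
    curated disease descriptor. The major-topic flag and the number of
    descriptors are deliberately ignored, as stated in the text."""
    selected = []
    for rec in records:
        if not rec.title.strip() or not rec.text.strip():
            continue  # title-only entries are not usable
        descriptor_ids = {descriptor for descriptor, _is_major in rec.mesh}
        if descriptor_ids & curated_ids:
            selected.append(rec)
    return selected


# Placeholder IDs standing in for curated disease descriptors (assumption).
curated = {"DX00001", "DX00002"}

records = [
    Abstract("1", "Title", "Abstract text", [("DX00001", True)]),
    # Curated term present only as a non-major topic: still kept.
    Abstract("2", "Title", "Abstract text", [("DX00002", False), ("D001927", True)]),
    # Only a term removed during curation (Cheyne-Stokes respiration): dropped.
    Abstract("3", "Title", "Abstract text", [("D002639", True)]),
    # Curated term but no abstract text: dropped.
    Abstract("4", "Title", "", [("DX00001", True)]),
]

selected = select_for_cr(records, curated)
print([rec.pmid for rec in selected])  # ['1', '2']
```

Using a set intersection keeps the membership test independent of how many descriptors an abstract carries, which mirrors the decision to include all abstracts regardless of the number of associated terms.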