The algorithms we developed to derive disease models from the annotation patterns of PubMed abstracts combined several components, including (1) semantic CR (Bio-LarK40); (2) an adaptation of the TFIDF method, in which diseases take the place of documents and the “document frequency” of an individual HPO term is calculated from the number of abstracts containing the term; (3) an evaluation of the IC of individual HPO terms for calculating the semantic similarity84,85 between terms; and (4) a heuristic graph clustering method that attempts to extend seed terms with particularly high TFIDF values to create a dense phenotypic network. This allowed us to develop annotations for over 3,000 common, complex diseases, and we demonstrated the potential utility of the resource by analyzing the phenotypic overlap between common and rare diseases, as well as between complex diseases that share one or more genetic associations. The platform we have made available, together with the data, is itself a valuable resource for the community. In addition to providing a way to download the data in tab-separated form or to access it programmatically via application programming interfaces, the website also enables phenotype- and disorder-centric browsing of MEDLINE abstracts and browsing within the CDN (Figure S7). This resource could be useful for physicians caring for persons who have a given disease and present with a particular manifestation or complication of that disease (denoted by an HPO term). The browser presents all PubMed abstracts that were identified in our study and that describe both the disease and the phenotypic manifestation, which might provide information helpful in clinical management.
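
To make the TFIDF adaptation in component (2) concrete, the following is a minimal sketch rather than the implementation used in the study. It assumes that each abstract has already been assigned to one disease and annotated with a set of HPO terms, that the term frequency is the number of a disease's abstracts mentioning the term, and that the inverse frequency is the natural logarithm of the total number of abstracts divided by the number of abstracts containing the term. The function name `disease_tfidf`, the input layout, and the identifiers in the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def disease_tfidf(abstracts):
    """Score HPO terms per disease with a TFIDF-style weight.

    abstracts: iterable of (disease_id, hpo_terms) pairs, one pair per abstract.
    Assumption (not from the paper): tf = number of the disease's abstracts
    that mention the term; idf = ln(total abstracts / abstracts with the term).
    """
    abstracts = list(abstracts)
    n_abstracts = len(abstracts)

    term_freq = defaultdict(Counter)  # disease -> term -> abstracts of that disease with the term
    abstract_freq = Counter()         # term -> abstracts (any disease) with the term

    for disease, terms in abstracts:
        for term in set(terms):       # count each term at most once per abstract
            term_freq[disease][term] += 1
            abstract_freq[term] += 1

    return {
        disease: {
            term: count * math.log(n_abstracts / abstract_freq[term])
            for term, count in counts.items()
        }
        for disease, counts in term_freq.items()
    }


# Toy input with hypothetical disease and term labels.
toy_abstracts = [
    ("disease_1", {"term_A", "term_B"}),
    ("disease_1", {"term_A"}),
    ("disease_2", {"term_B"}),
    ("disease_2", {"term_B", "term_C"}),
]
toy_scores = disease_tfidf(toy_abstracts)
# Terms with the highest score for a disease would be candidate seed terms.
top_term = max(toy_scores["disease_1"], key=toy_scores["disease_1"].get)
```

In this toy input, a term that appears in most abstracts regardless of disease receives a low weight, whereas a term concentrated in the abstracts of a single disease receives a high weight, which is the property that makes high-scoring terms plausible seeds for the clustering step in component (4).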