Statistical Significance of the CDN

To test the statistical significance of the distribution of phenotypic similarity among diseases within the same disease category or between different categories, we introduced the concept of the gray-edge fraction (GEF). Edges between nodes (diseases) that do not share any of the 13 general disease categories were visualized as gray edges. The GEF was defined as the proportion of gray edges among all edges in the CDN. The lower the GEF, the better the phenotypic clustering of diseases agrees with their classification into the 13 categories. The original CDN (CDN-o) comprised 3,547 edges, 998 of which were gray edges, corresponding to a GEF of 0.246 (red arrow in Figure S4A). We tested two randomization procedures, edge randomization (er) and annotation randomization (ar).

The edge-randomization procedure retains the number of edges and the degree distribution of the network.43 Two edges, A-B and X-Y, are chosen at random and reshuffled to create the edges A-Y and X-B. Reshuffling is skipped if the edges A-Y and X-B already exist. Reshuffling is performed 10,000 times, resulting in an edge-randomized version of CDN-o, which we call CDN-er and for which we can again compute the GEF. We constructed 1,000 versions of CDN-er and plotted the distribution of the resulting GEF values in Figure S4A. None of the edge-randomized CDNs achieved a GEF equal to or smaller than that of the original CDN, so the p value of the CDN is less than 0.001.

We additionally performed a test in which we randomized the HPO terms associated with each disease (ar). For this, we randomly selected 50% of the terms associated with each disease and replaced them with randomly selected HPO terms. We then computed the randomized CDN (called CDN-ar) by applying the procedures described above for constructing CDN-o. We repeated this procedure 100 times and computed the GEF for each CDN-ar.
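The GEF and the edge-swap randomization described above can be sketched in Python as follows. This is a minimal illustration, not the code used for the analysis. The function names, the representation of edges as node pairs, and the `categories` mapping from each disease to its set of categories are assumptions made for the example. The sketch also assumes that the 10,000 reshuffling steps count attempted swaps, that a swap is skipped if either new edge already exists, and that swaps creating self-loops are skipped.

```python
import random


def gray_edge_fraction(edges, categories):
    """Return the fraction of edges whose endpoints share no category.

    edges: iterable of (disease, disease) pairs.
    categories: dict mapping each disease to a set of category labels
    (hypothetical data layout chosen for this sketch).
    """
    edges = list(edges)
    gray = sum(1 for a, b in edges if not (categories[a] & categories[b]))
    return gray / len(edges)


def edge_randomize(edges, n_swaps=10_000, seed=None):
    """Degree-preserving randomization by repeated pairwise edge swaps.

    Two edges A-B and X-Y are drawn at random and rewired to A-Y and X-B.
    Every node keeps its degree and the number of edges stays constant.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    for _ in range(n_swaps):  # assumption: n_swaps counts attempts
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (x, y) = edge_list[i], edge_list[j]
        new1, new2 = frozenset((a, y)), frozenset((x, b))
        # Assumption: skip swaps that would create a self-loop.
        if len(new1) < 2 or len(new2) < 2:
            continue
        # Assumption: skip if either new edge is already in the network.
        if new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((x, y))}
        present |= {new1, new2}
        edge_list[i], edge_list[j] = (a, y), (x, b)
    return edge_list
```

Under these assumptions, one CDN-er would correspond to a single call of `edge_randomize` on the CDN-o edge list, and its GEF to a call of `gray_edge_fraction` on the result with the unchanged category mapping.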
Note that a CDN-ar does not necessarily have the same number of nodes and edges as CDN-o. With the same simcut (2.0) that was used for constructing CDN-o, we obtained much smaller networks (fewer than 100 nodes). The distribution of GEF values of CDN-ar with a simcut of 2.0 is shown in Figure S4B. No CDN-ar achieved a GEF less than or equal to the CDN-o GEF, which corresponds to a p value of less than 0.01. We therefore also used a simcut of 1.4, which leads to CDN-ar versions with approximately the same number of nodes as CDN-o. The distribution of the resulting GEF values is shown in Figure S4C. Again, not a single CDN-ar constructed with a simcut of 1.4 achieved a GEF less than or equal to the CDN-o GEF, which corresponds to a p value of less than 0.01.
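The p values quoted above follow from counting how many randomized networks reach a GEF at least as low as the observed one. A minimal sketch of that calculation is given below. The function name and return layout are assumptions made for illustration, and the numbers in the usage note are invented toy values, not results from this study.

```python
def empirical_p_value(observed_gef, randomized_gefs):
    """Estimate a one-sided empirical p value for a low GEF.

    Returns a pair (estimate, resolution):
    estimate is the share of randomized GEFs <= observed_gef,
    resolution is 1 / n, the smallest nonzero estimate that
    n randomizations can produce.
    """
    n = len(randomized_gefs)
    hits = sum(1 for g in randomized_gefs if g <= observed_gef)
    return hits / n, 1 / n
```

For example, `empirical_p_value(0.25, [0.5, 0.6, 0.55, 0.2])` counts one randomized value at or below 0.25 among four. When no randomized network reaches the observed GEF, the estimate is 0 and the result is reported as p below the resolution, which is 0.001 for 1,000 randomizations and 0.01 for 100.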