Investigation of the interaction data

We investigated the data set collected in this work. We measured the similarity of the selected targets with the Smith-Waterman score (Figure 3A) and found that the similarities of the vast majority of targets are low (<0.2), indicating that the homology among the selected targets is weak. X-ray and other biological studies suggest that a number of proteins contain more than one ligand-binding site. For example, some enzymes possess two or more binding sites, one for the substrate and another for an activator or inhibitor. We therefore constructed a site-ligand interaction network as a bipartite graph and examined the degree distributions of both binding sites and ligands (Figure 3B and 3C). Figure 3B shows that most binding sites bind only one ligand, which is consistent with the specificity of target-ligand binding. Figure 3C shows that more than 95% of ligands interact with only one site. Overall, the targets in the data set are low in homology, the site-ligand bipartite graph is sparsely connected, and the average degree of binding sites is larger than that of ligands.

Figure 3. Investigation of the data set. A) Distribution of target sequence similarities. B) Degree distribution of the targets. C) Degree distribution of the ligands.
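The degree analysis of the site-ligand bipartite graph can be illustrated with a minimal sketch. It is not the authors' code: the function name `degree_distributions`, the `protein:site` labelling scheme, and the toy interactions are all hypothetical and chosen only to show how the degree histograms and average degrees of the two node sets are obtained from a list of site-ligand pairs.

```python
from collections import Counter


def degree_distributions(edges):
    """Summarise a bipartite site-ligand graph given as (site, ligand) pairs.

    Returns (site_hist, ligand_hist, mean_site_degree, mean_ligand_degree),
    where each histogram maps a degree to the number of nodes with that degree.
    """
    unique_edges = set(edges)  # count each site-ligand pair only once
    if not unique_edges:
        raise ValueError("edge list is empty")

    # Degree of a node = number of distinct partners on the other side.
    site_degree = Counter(site for site, _ in unique_edges)
    ligand_degree = Counter(ligand for _, ligand in unique_edges)

    # Histogram: how many nodes have degree 1, 2, ...
    site_hist = Counter(site_degree.values())
    ligand_hist = Counter(ligand_degree.values())

    # Both sides share the same edge count, so each mean is edges / nodes.
    mean_site = len(unique_edges) / len(site_degree)
    mean_ligand = len(unique_edges) / len(ligand_degree)
    return site_hist, ligand_hist, mean_site, mean_ligand


# Hypothetical toy interactions (assumption: not taken from the data set).
toy_edges = [
    ("P1:siteA", "L1"),
    ("P1:siteB", "L2"),
    ("P2:siteA", "L3"),
    ("P2:siteA", "L4"),
    ("P3:siteA", "L5"),
]

site_hist, ligand_hist, mean_site, mean_ligand = degree_distributions(toy_edges)
print(dict(site_hist))         # degree -> number of binding sites
print(dict(ligand_hist))       # degree -> number of ligands
print(mean_site, mean_ligand)  # average degree on each side
```

Because every edge has exactly one endpoint on each side, the average site degree exceeds the average ligand degree exactly when the graph contains fewer binding sites than ligands. In this toy example, four sites and five ligands share five edges, giving averages of 1.25 and 1.0.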