The first synthetic data set consists of 140,000 simulated intergenic regions, representing the orthologous promoter regions of 10,000 genes in each of the fourteen species. Each sequence is 500 bp long and contains two planted Crp sites generated from the Crp motif model (Figure 1A). The second data set is the same except that it contains "1/2-strength Crp" sites, for which the average number of bits of information across the positions of the Crp motif is cut in half. The third data set contains "1/3-strength Crp" sites. The fourth data set is a negative control and contains no planted transcription factor binding sites. See the Methods and Figure 1 for more information.
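
To make the construction concrete, the following is a minimal Python sketch of how a single simulated region with planted sites might be generated, and of how motif strength can be measured as the average information per position. It is an illustration under stated assumptions, not the procedure used in this study: the matrix `TOY_PWM` is a hypothetical four-position motif rather than the Crp model, the background is assumed to be uniform over the four bases, the phylogenetic relationship among orthologous sequences is ignored, and the `weaken` function (mixing each motif column with the uniform distribution) is only one possible way to reduce information content. All function names are introduced here for illustration.

```python
import math
import random

BASES = "ACGT"

# Hypothetical toy motif (NOT the Crp model): one row per position,
# giving probabilities for A, C, G, T.
TOY_PWM = [
    [0.85, 0.05, 0.05, 0.05],
    [0.05, 0.05, 0.05, 0.85],
    [0.05, 0.85, 0.05, 0.05],
    [0.40, 0.10, 0.10, 0.40],
]


def mean_information(pwm):
    """Average bits per position, assuming a uniform background (max 2 bits)."""
    total = 0.0
    for row in pwm:
        entropy = -sum(p * math.log2(p) for p in row if p > 0)
        total += 2.0 - entropy
    return total / len(pwm)


def mix_with_uniform(pwm, weight):
    """Blend every column with the uniform distribution (0.25 per base)."""
    return [[(1 - weight) * p + weight * 0.25 for p in row] for row in pwm]


def weaken(pwm, fraction, tol=1e-9):
    """Assumed weakening scheme: bisect on the mixing weight until the mean
    information equals `fraction` of the original motif's mean information."""
    target = fraction * mean_information(pwm)
    lo, hi = 0.0, 1.0  # weight 0 = original motif, weight 1 = uniform
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_information(mix_with_uniform(pwm, mid)) > target:
            lo = mid  # still too informative, mix in more background
        else:
            hi = mid
    return mix_with_uniform(pwm, (lo + hi) / 2)


def sample_site(pwm, rng):
    """Draw one binding site, position by position, from the motif model."""
    return "".join(rng.choices(BASES, weights=row)[0] for row in pwm)


def simulate_region(pwm, rng, length=500, n_sites=2):
    """Uniform random background with `n_sites` non-overlapping planted sites.

    With n_sites=0 this yields a negative-control sequence.
    """
    seq = rng.choices(BASES, k=length)
    width = len(pwm)
    starts = []
    while len(starts) < n_sites:
        s = rng.randrange(length - width + 1)
        if all(abs(s - t) >= width for t in starts):
            starts.append(s)
    for s in starts:
        seq[s:s + width] = sample_site(pwm, rng)
    return "".join(seq), sorted(starts)


if __name__ == "__main__":
    rng = random.Random(0)
    half_strength = weaken(TOY_PWM, 0.5)
    print("full strength:", mean_information(TOY_PWM), "bits/position")
    print("half strength:", mean_information(half_strength), "bits/position")
    region, starts = simulate_region(half_strength, rng)
    print("planted at:", starts, "in a region of length", len(region))
```

Under these assumptions, calling `weaken(TOY_PWM, 1/3)` would give the analogue of a 1/3-strength motif, and `simulate_region(TOY_PWM, rng, n_sites=0)` would give a negative-control region with no planted sites.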