In our tests on simulated data, we use five transcription factor datasets whose motifs vary in their level of conservation, as measured by their IC (Table 3). We generate background sequences from a uniform nucleotide distribution as well as from increasingly biased distributions. For each binding site, the background sequence has the same length as that site's upstream region (up to 600 bp). At each position, a base is drawn at random from a distribution in which G is chosen with probability pr(G) and each of the other three bases with probability (1 - pr(G))/3; the uniform case corresponds to pr(G) = 0.25.
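The background model described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in our experiments: the function name `background_sequence`, its parameters, and the reading of "up to 600 bp" as a hard cap on the generated length are all assumptions made for the example.

```python
import random

BASES = "ACGT"  # order matters: the weights below follow this order


def background_sequence(length, pr_g=0.25, rng=None, max_length=600):
    """Draw an i.i.d. background sequence biased toward G.

    G is drawn with probability pr_g; A, C and T each with (1 - pr_g) / 3.
    The length is capped at max_length (assumed interpretation of the
    "up to 600 bp" limit on upstream regions).
    """
    if not 0.0 <= pr_g <= 1.0:
        raise ValueError("pr_g must lie in [0, 1]")
    rng = rng or random.Random()
    n = min(length, max_length)
    other = (1.0 - pr_g) / 3.0
    weights = [other, other, pr_g, other]  # A, C, G, T
    return "".join(rng.choices(BASES, weights=weights, k=n))


# Hypothetical usage: a uniform background and a strongly G-biased one.
uniform_bg = background_sequence(600, pr_g=0.25, rng=random.Random(1))
biased_bg = background_sequence(600, pr_g=0.70, rng=random.Random(1))
```

Passing a seeded `random.Random` instance keeps the simulated backgrounds reproducible across runs, which is convenient when comparing results at different values of pr(G).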