To perform the real-data tests, three databases representing the reference species clade were generated for scanning: (1) a database containing the 2379 E. coli intergenic regions of interest; (2) a database containing only E. coli data ("E. coli reduced"), in which 1662 E. coli intergenic regions were reduced in sequence space by alignment with orthologous S. typhi data, plus an additional 836 E. coli sequences for which no orthologous S. typhi data were available; and (3) a database containing 1662 aligned orthologous E. coli-S. typhi intergenic regions, plus an additional 836 E. coli sequences for which no orthologous S. typhi data were available.
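
The composition of the three databases can be illustrated with a minimal Python sketch. It is not the pipeline actually used here: the function names (`reduce_by_alignment`, `build_databases`), the dictionary-based data layout, and the toy sequences are all hypothetical. In particular, the sketch assumes that "reduced in sequence space" means keeping only the E. coli positions that are aligned to an S. typhi residue (no gap in either row), which is one plausible reading rather than a stated definition. The toy input does not reproduce the region counts reported above.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: a pairwise alignment is (E. coli row, S. typhi row),
# both rows of equal length, with '-' marking gaps.
Alignment = Tuple[str, str]


def reduce_by_alignment(alignment: Alignment) -> str:
    """Keep only E. coli residues aligned to an S. typhi residue.

    ASSUMPTION: this is one possible interpretation of "reduced in
    sequence space by alignment"; the source does not define it.
    """
    ecoli_row, typhi_row = alignment
    if len(ecoli_row) != len(typhi_row):
        raise ValueError("alignment rows must have equal length")
    return "".join(
        e for e, t in zip(ecoli_row, typhi_row) if e != "-" and t != "-"
    )


def build_databases(
    ecoli: Dict[str, str],
    alignments: Dict[str, Alignment],
) -> Tuple[Dict[str, str], Dict[str, str], Dict[str, List[str]]]:
    """Assemble the three scan databases from E. coli regions and alignments."""
    # Regions with no orthologous S. typhi data are carried over unchanged.
    unaligned = {k: s for k, s in ecoli.items() if k not in alignments}

    # (1) E. coli intergenic regions of interest, as given.
    db_ecoli = dict(ecoli)

    # (2) "E. coli reduced": reduced aligned regions plus unaligned regions.
    db_reduced = {k: reduce_by_alignment(a) for k, a in alignments.items()}
    db_reduced.update(unaligned)

    # (3) Aligned ortholog pairs plus unaligned E. coli regions.
    db_aligned: Dict[str, List[str]] = {
        k: [a[0], a[1]] for k, a in alignments.items()
    }
    db_aligned.update({k: [s] for k, s in unaligned.items()})

    return db_ecoli, db_reduced, db_aligned


# Toy example: one region with an ortholog alignment, one without.
toy_ecoli = {"region1": "ACGTAC", "region2": "TTGA"}
toy_alignments = {"region1": ("ACGTAC", "AC--AC")}
db1, db2, db3 = build_databases(toy_ecoli, toy_alignments)
```

In this layout, databases (2) and (3) always contain the same region identifiers; they differ only in whether an aligned region is stored as a single reduced E. coli sequence or as the pair of aligned rows.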