Discussion

A novel kernel density method to measure oligomeric frequency in iterative maps of biological sequences (Chaos Game Representation, or its generalization to alphabets with more than 4 units, Universal Sequence Maps) was described and briefly illustrated. However, the illustration would not be complete without mapping the promoter regions of Bacillus subtilis and recognizing the TATA box in the same sequences used in the preceding report [15], which motivated the development described here. This discussion will therefore focus on the representation and decomposition of sequence conservation, which can be detected either by the unlikely repetition of a conserved segment or because the conserved segment has an unlikely composition in the context of the remaining sequence. Accordingly, the illustration in Figure 4 uses the same 20 upstream promoter regions of B. subtilis, each 100 units long, obtained from [23,24], all of which have a known promoter sequence constituted by the sub-string TTGACA-(space)-TATAAT with at most one substitution (known as the TATA box). The entropic properties of these sequences were discussed in the preceding work [15], where they were designated by the symbol Es. For reference, the Es concatenation is embedded in the software library provided with this report and is retrieved by the illustrative function paper_fig(4), which reproduces Figure 4 (the same function can also reproduce the other three figures, see Methods). The volume under the density distribution is, by definition, unitary (the normalized height is obtained by dividing H, equation 3, by N, the total number of sequence units). Because the map is defined over the unit square, the average value of the matrix underneath the 3D bar plot in Figure 4 is also unitary and sets the scale for the representation (the height axis of the 3D view in Figure 4 is expressed in this scaled unit).
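The normalization argument above can be checked numerically. The sketch below is not the kernel of equation 3, which is not reproduced in this section; it substitutes a plain histogram of CGR coordinates on a 2^L × 2^L grid (a box kernel with no smoothing). The corner assignment A=(0,0), C=(0,1), G=(1,1), T=(1,0) and the function names `cgr_points` and `normalized_grid` are assumptions made for illustration, not part of the library that provides paper_fig. Dividing each count by N and by the cell area gives a grid whose volume is one, so its mean cell value is also one.

```python
# Assumed CGR corner convention (x, y); the paper's library may order corners differently.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(seq):
    """Chaos Game Representation coordinates of a DNA sequence."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway towards the base's corner
        points.append((x, y))
    return points


def normalized_grid(seq, L):
    """Histogram of CGR points on a 2^L x 2^L grid, scaled to unit volume.

    Each count is divided by N (number of sequence units) and by the cell
    area 4^-L, so the grid integrates to 1 over the unit square and its mean is 1.
    """
    side = 2 ** L
    points = cgr_points(seq)
    weight = side * side / len(points)  # equals 1 / (N * cell_area)
    grid = [[0.0] * side for _ in range(side)]
    for x, y in points:
        i = min(int(x * side), side - 1)  # cell index along x
        j = min(int(y * side), side - 1)  # cell index along y
        grid[i][j] += weight
    return grid


grid = normalized_grid("TTGACAATTGACATATAAT" * 5, 3)
mean = sum(map(sum, grid)) / len(grid) ** 2
print(round(mean, 6))  # 1.0
```

Note that the first L − 1 points still carry the starting position (0.5, 0.5) in their finer coordinates, so at resolution L they do not correspond to complete L-long words; for long sequences this edge effect is negligible.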
Two important issues for pattern recognition in sequences are raised by this illustration and warrant discussion even if they fall outside the strict reporting of a kernel density distribution method. First, for any fixed resolution, L, every conserved segment longer than L will have its L-long sub-segments represented as peaks scattered throughout the distribution. As a consequence, the smoothing parameter, S, should be set so as to maximize the recognition of an objective quantity, such as information content. When scanning different scales by using various values of L, the optimal value of S would also differ, as it depends on the information content encoded at each scale. Second, the shorter sub-segments of a conserved segment of length L will set the base height for the quadrants where the conserved L-long segment is inserted. Therefore, the availability of a density distribution kernel for the projection of sequences onto a continuous space also creates the opportunity to devise de-embedding schemes that pinpoint the location of conservation for arbitrary target resolutions.
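Both issues can be illustrated with discrete cell addresses. In CGR the most recent base selects the coarsest quadrant, so the cell of an L-long word at resolution L is nested inside the cell of its (L − 1)-long suffix at resolution L − 1. The sketch below assumes the corner convention A=(0,0), C=(0,1), G=(1,1), T=(1,0); all function names are hypothetical. `sub_segment_cells` shows that the L-long sub-segments of a longer conserved segment land in separate, scattered cells, and `kmer_cells` lets one check that a cell at resolution L never holds more counts than its parent cell, which is the sense in which shorter sub-segments set the base height. `conditional_heights`, which divides each child count by its parent count, is only one hypothetical de-embedding heuristic and not a scheme proposed in this report.

```python
from collections import Counter

# Assumed CGR corner convention (x, y) for each base.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cell_of_suffix(word):
    """Cell (i, j) on the 2^len(word) grid that holds CGR points ending in `word`."""
    i = j = 0
    for base in reversed(word):  # most recent base picks the coarsest quadrant
        cx, cy = CORNERS[base]
        i, j = 2 * i + cx, 2 * j + cy
    return i, j


def sub_segment_cells(segment, L):
    """Cells at resolution L of every L-long sub-segment of a longer segment."""
    return [cell_of_suffix(segment[k:k + L]) for k in range(len(segment) - L + 1)]


def kmer_cells(seq, L):
    """Number of L-long windows of seq falling in each cell at resolution L."""
    return Counter(cell_of_suffix(seq[k:k + L]) for k in range(len(seq) - L + 1))


def conditional_heights(seq, L):
    """Hypothetical de-embedding (L >= 2): each cell's count over its parent's count.

    The parent cell (i >> 1, j >> 1) at resolution L - 1 holds the (L - 1)-long
    suffix, so the ratio removes the base height contributed by that shorter word.
    """
    child = kmer_cells(seq, L)
    parent = kmer_cells(seq, L - 1)
    return {(i, j): n / parent[(i >> 1, j >> 1)] for (i, j), n in child.items()}


# The four 3-long sub-segments of TTGACA occupy four separate cells.
print(sub_segment_cells("TTGACA", 3))  # [(7, 4), (3, 2), (1, 5), (0, 2)]
```

Every L-long window has its (L − 1)-long suffix among the shorter windows, so a parent count is never zero where a child count exists, and each ratio lies in (0, 1].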