Methods

Algorithms and libraries
All algorithms and figures were implemented with original code developed using the programming environment MATLAB 7, Mathworks Inc. The resulting toolbox is and is made freely available at the GeneChaos resource [22] with no restrictions to use or modification. To assist in understanding the proposed algorithms, a function was included that produces the figures presented in this manuscript, e.g. paper_fig(1) will produce Figure 1, etc. This function therefore also serves as a tutorial to the usage and interplay of the remaining functions.

Terminology
This report, and the iterative mapping field in general, mixes terminology from two distinct approaches to sequence analysis which are noteworthy elaboration for the sake of clarity. "Scale" and "resolution" are used as generic terms for a concept that is sometimes precised as "sequence length" or Markovian "order". "Length" is the term used in word-statistics and corresponds to the length of the L-tuple. "Order" describes the same concept but is more commonly used in the context of Markov models. To add to the confusion, L-tuple/word "length" is one unit smaller than "order". For example a simple 4 × 4 transition matrix between nucleotides resolves Markovian succession with order 1 and the conditional probabilities in each of the 16 squares correspond to the frequency of all possible dinucleotides (length 2). Another example, the "scale of L-tuple distribution" designates the length of the tuple for which all frequencies where determined. A variation on this theme is the use of "alphabet size" to access scale: it designates the number of unique symbols available for use in by the sequence. Along the same line of thought, "vocabulary" (not used in this report) would designate the number of possible L-tuples of a given length.
At the origin of this terminology confusion is the fact that both terms, "order" and "length", are originally defined in the context of integer sequence resolution. However the CGR/USM techniques are not restricted to integer resolutions also allow for fractal order/length. Therefore the more generic use of "scale" and "resolution" to overcome the integer presumption. The generalization of scale achieved by iterative maps of discrete sequences was object of some discussion in the early 90's, for example contrast [1] with [2], a topic revised and discussed in [3,9].

List of symbols
uf : coordinate in the forward iterative map. The dimension and sequence unit represented are indicated by sub and supra-indexes, ujf(i) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWG1bqDdaqhaaWcbaGaemOAaOgabaGaemOzayMaeiikaGIaemyAaKMaeiykaKcaaaaa@340B@ represents the position in the jth dimension of the iterative map for the ith unit of the sequence.
ub : same as ub but for iterative coordinates in the backward map, that is, obtained by iterating from the end to the beginning of the sequence. For either map, the coordinates fall within the [0,1] interval.
Uj(i) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGvbqvdaqhaaWcbaGaemOAaOgabaGaeiikaGIaemyAaKMaeiykaKcaaaaa@3276@: value of the jth binary digit assigned to the ith unit of the sequence. This positions each unit of the alphabet at an edge of a unitary hypercube [14].
D : number of dimensions of each unidirectional map.
N : length of the sequence being represented.
K : density kernel, K(u) indicates the height of the density distribution in map coordinate u.
L : memory length resolution, that is, the length of the segments being resolved. It is equivalent to Markovian order added one unit.
S : kernel smoothing parameter, see equation 3 for definition. The value of S varie between 0, for uniform density, and +∞, where the density distribution is exactly equivalent to a Markov transition table.