Seeding the iterative USM function

The iterative USM procedure, described graphically in the previous section and in Figure 1, is formally defined by Equation 1 for an arbitrary sequence of N units built from an alphabet of M possible symbols.

$$
\begin{cases}
u_j^f(0) = u_j^b(2)\\
u_j^f(i) = u_j^f(i-1) + \tfrac{1}{2}\left(U_j(i) - u_j^f(i-1)\right) = \tfrac{1}{2}u_j^f(i-1) + \tfrac{1}{2}U_j(i)\\
u_j^b(N+1) = u_j^f(N-1)\\
u_j^b(i) = \tfrac{1}{2}u_j^b(i+1) + \tfrac{1}{2}U_j(i)\\
U_j(i) \in \{0, 1\}\\
i \in \{1, 2, \ldots, N\}\\
j \in \{1, 2, \ldots, D\}
\end{cases}
\qquad \text{(Equation 1)}
$$

Each of the M unique units of the alphabet is represented by a unique binary vector which, graphically, positions it at a unique vertex of a unit hypercube with D = log2(M) dimensions, rounded up to the nearest integer when M is not a power of two [14]. The CGR/USM procedure is revisited here to highlight the novel seeding procedure: by $u_j^b(2)$ for the forward iteration and by $u_j^f(N-1)$ for the backward coordinate iteration.

Why not seed at 1/2

In the original CGR proposition [1], the midpoint, 1/2, is invariably used as the initial position. Because this position cannot be mapped back to a real sequence, it at first appeared to be a reasonable choice, even if not fundamentally superior to other candidate positions such as the boundaries 0 and 1.
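To make the midpoint seeding concrete, the conventional forward iteration can be sketched in Python as below. This is a minimal illustration under assumptions, not the original implementation: the vertex assignment (symbol k receives the D-bit binary code of k, so that "A" in the alphabet "ACGT" sits at the origin) and the function names `binary_encoding` and `cgr_midpoint` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def binary_encoding(alphabet: Sequence[str]) -> Dict[str, List[int]]:
    """Map each of the M symbols to a distinct vertex of the unit hypercube.

    Hypothetical assignment: symbol k gets the D-bit binary code of k,
    with D = log2(M) rounded up (at least one dimension).
    """
    d = max(1, math.ceil(math.log2(len(alphabet))))
    return {
        symbol: [(k >> (d - 1 - j)) & 1 for j in range(d)]
        for k, symbol in enumerate(alphabet)
    }


def cgr_midpoint(sequence: str, alphabet: Sequence[str]) -> List[List[float]]:
    """Forward iteration u(i) = u(i-1)/2 + U(i)/2, seeded at 1/2 in every dimension."""
    code = binary_encoding(alphabet)
    d = len(next(iter(code.values())))
    u = [0.5] * d  # the midpoint seed used by conventional CGR
    coords = []
    for unit in sequence:
        bits = code[unit]
        u = [0.5 * u[j] + 0.5 * bits[j] for j in range(d)]  # new list each step
        coords.append(u)
    return coords
```

With this seed, `cgr_midpoint("A", "ACGT")` returns `[[0.25, 0.25]]` and the last coordinate for `"AA"` is `[0.125, 0.125]`; over all one-unit sequences each coordinate can only take the values 1/4 and 3/4.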
However, seeding all iterations identically causes an artifactual conservation of the beginning of every sequence, which biases sequence entropy calculations based on map coordinates [15], particularly for short sequences: in each dimension the first iteration can only produce one of two coordinates, 1/4 or 3/4, the second iteration one of four, 1/8, 3/8, 5/8 or 7/8, and so on. This produces an artifactually high density at those positions.

Other approaches to seeding iterative maps

A possible way to seed within the domain of possible sequences would be to start from a position drawn at random from a uniform distribution, as was indeed done in the original USM paper [14]. However, that too causes a bias, this time towards missing the conservation of the initial units of a sequence when such conservation exists. A negligible number of false negatives may be an acceptable outcome for pattern recognition, and it would have no effect elsewhere in the sequence. However, it falls short of what is required for a kernel that generates a truly scale-independent density distribution of patterns.

The solution proposed here

The solution proposed by Equation 1 is to seed the iterative mapping with the reverse coordinates: the first forward coordinate is seeded with the next-to-last backward coordinate for the same dimension, and vice versa. Note that the first forward coordinate to be iterated, $u_j^f(1)$, and the last backward coordinate to be iterated, $u_j^b(1)$, both correspond to the first unit of the sequence, i.e. i = 1. Similarly, the last forward coordinate and the first backward coordinate, $u_j^f(N)$ and $u_j^b(N)$, are both assigned to the last unit of the sequence, i = N. Therefore, the new seeding solution can be interpreted as considering each sequence to be preceded and succeeded by its mirror images for the purpose of studying local properties.

If the sequence is long enough that the numerical resolution of $u_j^f(N)$ is insensitive to the seed value, the seed value can be determined in practice by simply iterating the last few tens of units of the reverse sequence, starting from an arbitrary value. For very short sequences, however, Equation 1 has to go through more than one circular iteration, starting from an arbitrary seed value, until the coordinate values converge. This solution gives each unique sequence a unique, scale-independent distribution of patterns whose statistical characteristics can be studied with no need to rebuild the original sequence. It also implies that the coordinates of iterative maps of sequences, as defined by Equation 1, are fundamentally steady-state solutions.
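The circular iteration of Equation 1 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the symbol-to-vertex assignment (symbol k receives the D-bit binary code of k), the starting seed of 1/2, the tolerance, and the names `binary_encoding`, `_one_pass`, `mirror_seeded_dimension` and `mirror_seeded_usm` are all hypothetical. Each dimension j is solved independently: a forward sweep seeded with $u_j^f(0)$, a backward sweep seeded with $u_j^b(N+1) = u_j^f(N-1)$, and then the forward seed is replaced by $u_j^b(2)$ until it stops changing. The sketch requires N ≥ 2, because with a single unit the seeding relation reduces to $u_j^f(0) = u_j^f(0)$ and the loop cannot contract.

```python
import math
from typing import Dict, List, Sequence, Tuple


def binary_encoding(alphabet: Sequence[str]) -> Dict[str, List[int]]:
    """Hypothetical assignment: symbol k -> D-bit binary code of k, D = ceil(log2 M)."""
    d = max(1, math.ceil(math.log2(len(alphabet))))
    return {s: [(k >> (d - 1 - j)) & 1 for j in range(d)] for k, s in enumerate(alphabet)}


def _one_pass(u: List[int], seed: float) -> Tuple[List[float], List[float]]:
    """One forward and one backward sweep of Equation 1 for a single dimension.

    u is 1-based (u[0] is unused); seed is the forward seed u^f(0).
    """
    n = len(u) - 1
    f = [seed] + [0.0] * n
    for i in range(1, n + 1):
        f[i] = 0.5 * f[i - 1] + 0.5 * u[i]
    b = [0.0] * (n + 2)
    b[n + 1] = f[n - 1]  # backward seed: u^b(N+1) = u^f(N-1)
    for i in range(n, 0, -1):
        b[i] = 0.5 * b[i + 1] + 0.5 * u[i]
    return f, b


def mirror_seeded_dimension(bits: Sequence[int], tol: float = 1e-12,
                            max_passes: int = 200) -> Tuple[List[float], List[float]]:
    """Steady-state u^f(i) and u^b(i), i = 1..N, for one dimension (requires N >= 2)."""
    if len(bits) < 2:
        raise ValueError("this sketch requires at least two units")
    u = [0] + list(bits)
    seed = 0.5  # arbitrary start; for N >= 2 the steady state does not depend on it
    for _ in range(max_passes):
        f, b = _one_pass(u, seed)
        new_seed = b[2]  # forward seed: u^f(0) = u^b(2)
        converged = abs(new_seed - seed) <= tol
        seed = new_seed
        if converged:
            break
    f, b = _one_pass(u, seed)  # final sweep with the converged seed
    return f[1:], b[1:-1]


def mirror_seeded_usm(sequence: str, alphabet: Sequence[str]):
    """Forward and backward coordinates (one D-vector per unit) for a whole sequence."""
    code = binary_encoding(alphabet)
    d = len(code[alphabet[0]])
    dims = [mirror_seeded_dimension([code[s][j] for s in sequence]) for j in range(d)]
    n = len(sequence)
    forward = [[dims[j][0][i] for j in range(d)] for i in range(n)]
    backward = [[dims[j][1][i] for j in range(d)] for i in range(n)]
    return forward, backward
```

For example, `mirror_seeded_usm("AA", "ACGT")` converges to coordinates at (0, 0), and for `"ABABABABAB"` over the alphabet `"AB"` the forward coordinates alternate between 1/3 and 2/3 to within floating-point tolerance. Each circular pass shrinks the seed error by a factor of $4^{-(N-1)}$, so for sequences of a few tens of units the seed is already fixed to double precision after one pass, in line with the practical shortcut described above.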
A simple but dramatic example of where this matters is the positioning of the sequences "A" and "AA" in Figure 1. In the conventional CGR procedure they would be positioned at coordinates (1/4, 1/4) and (1/8, 1/8), respectively, which would place them next to very different, much more heterogeneous sequences. In contrast, seeding as described in Equation 1 correctly produces the coordinate (0, 0). Similarly, a sequence with a regular alternation of two units, say "ABABABABAB", should produce well-defined density peaks at only two positions, 1/3 and 2/3, which is in fact the steady-state solution produced by Equation 1. Both CGR and randomly seeded USM, on the other hand, would produce two trails of values converging towards those solutions without quite reaching them. The fully self-referenced nature of the modified USM construction is also reflected in the observation that the steady-state solutions invariably satisfy $u_j^f(1) = u_j^b(1)$ and $u_j^f(N) = u_j^b(N)$. However, exploring the bidirectional density distributions is beyond the scope of this report.
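The two endpoint identities can be checked directly from Equation 1: once the seeding relations hold, as they do at the steady state, a single step of each recursion yields them. A short derivation, offered here as a supporting sketch:

```latex
\begin{aligned}
u_j^f(1) &= \tfrac{1}{2}u_j^f(0) + \tfrac{1}{2}U_j(1)
          = \tfrac{1}{2}u_j^b(2) + \tfrac{1}{2}U_j(1) = u_j^b(1),\\
u_j^b(N) &= \tfrac{1}{2}u_j^b(N+1) + \tfrac{1}{2}U_j(N)
          = \tfrac{1}{2}u_j^f(N-1) + \tfrac{1}{2}U_j(N) = u_j^f(N).
\end{aligned}
```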