PMC:1459173 / 22182-23692 JSONTXT

{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/1459173","sourcedb":"PMC","sourceid":"1459173","source_url":"https://www.ncbi.nlm.nih.gov/pmc/1459173","text":"Our steepest-descent paradigm performs a number of phases, each consisting of the selection of the pattern to be used for compression, followed by the actual substitution and encoding. The process stops when no further compression is achieved. The sequence representation at the outset is finally pipelined into some of the popular encoders, and the best among the overall scores thus achieved is retained. Clearly, at any stage it is impossible to choose the motif on the basis of the actual compression eventually conveyed by that motif. The decision must be based on an estimate that takes into account the mechanics of encoding. In practice, we estimate at log(i) the number of bits needed to encode the integer i (we refer to, e.g., [4] for reasons that justify this choice). In one scheme [10], one eliminates all occurrences of m and records in succession m, its length, and the total number of its occurrences, followed by the actual list of such occurrences. Letting |m| denote the length of m, D_m the number of extensible characters in m, f_m the number of occurrences of m in the text string, s_m the number of characters occupied by the motif m in all its occurrences on s, |Σ| the cardinality of the alphabet, and n the size of the input string, the compression brought about by m is estimated by subtracting from the s_m log |Σ| bits originally occupied by this motif on s the expression |m| log |Σ| + log |m| + f_m D_m log D + f_m log n + log f_m charged by encoding, thereby obtaining:","tracks":[]}
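
The estimate described in the text above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation. The function name `estimated_gain` and its parameter names are hypothetical. Base-2 logarithms are assumed because the text counts bits. The quantity D appears in the charged expression without being defined in this passage, so it is treated here as a plain input parameter.

```python
import math


def estimated_gain(m_len, d_m, f_m, s_m, sigma, n, d):
    """Estimated saving, in bits, from compressing with motif m.

    m_len : |m|, length of the motif
    d_m   : D_m, number of extensible characters in m
    f_m   : f_m, number of occurrences of m in the text string
    s_m   : s_m, characters occupied by m over all its occurrences
    sigma : |Sigma|, cardinality of the alphabet
    n     : size of the input string
    d     : D, used but not defined in the passage (assumption: a
            positive integer supplied by the caller)
    """
    log = math.log2  # assumption: log(i) is taken base 2 (bits)

    # Bits originally occupied by the motif on s.
    original_bits = s_m * log(sigma)

    # Bits charged by encoding: the motif itself, its length, the
    # extensible characters per occurrence, the occurrence positions,
    # and the number of occurrences.
    encoding_bits = (
        m_len * log(sigma)
        + log(m_len)
        + f_m * d_m * log(d)
        + f_m * log(n)
        + log(f_m)
    )
    return original_bits - encoding_bits


if __name__ == "__main__":
    # Toy values chosen only to make the arithmetic easy to follow.
    print(estimated_gain(m_len=8, d_m=1, f_m=16, s_m=128, sigma=4, n=1024, d=4))
```

A positive return value suggests that substituting the motif is expected to save bits under this estimate. A selection phase could rank candidate motifs by this value, in line with the steepest-descent description.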