2 Methods

In this part, we first introduce in section 2.1 the FMCI and see the limits of naive approaches to the corresponding numerical computations. The main results are given in section 2.3, where we propose two effective algorithms able to compute general FMCI p-values (algorithm 1) or complementary p-values (algorithm 2). The theoretical background for these algorithms is given in section 2.2.

2.1 Finite Markov Chain Imbedding

Let us consider $X = X_1, \ldots, X_n$ a sequence of Bernoulli or Markov observations and $E_n$ an event depending on the sequence $X$. We suppose that it is possible to build from $X$ an order-one Markov chain $Z = Z_1, \ldots, Z_n$ on a finite state space of size $L$. This space contains (in this order): $k$ starting states denoted $s_1, \ldots, s_k$, some intermediate states, and one final absorbing state $f$. The Markov chain is designed such that

$$\mathbb{P}(E_n \mid Z_1 = s_i) = \mathbb{P}(Z_n = f \mid Z_1 = s_i) = \Pi^{n-1}(s_i, f) \qquad (1)$$

where $\Pi$ is the transition matrix of $Z$. Since $f$ is absorbing, $\Pi$ has the block structure

$$\Pi = \begin{pmatrix} R & v \\ 0 & 1 \end{pmatrix} \qquad (2)$$

where $R$ is the $(L-1) \times (L-1)$ sub-matrix of transitions between the non-absorbing states and $v$ is the size $L-1$ column vector of transition probabilities from the non-absorbing states to $f$. If $\mu$ is the starting distribution of $Z_1$, we hence get

$$\mathbb{P}(E_n) = \sum_{i=1}^{k} \mu(s_i) \, \Pi^{n-1}(s_i, f). \qquad (3)$$

Using this approach (and a binary decomposition of $n-1$), it is possible to compute the p-value with $\mathcal{O}(\log_2(n) \times L^2)$ memory complexity and $\mathcal{O}(\log_2(n) \times L^3)$ time complexity. As $L$ usually grows very fast when we consider more complex events $E_n$, these complexities are a major drawback of the method. Moreover, numerical precision prevents this approach from giving accurate results when the relation $\mathbb{P}(\overline{E_n}) = 1 - \mathbb{P}(E_n)$ is used to compute the p-value of the complementary event, as the absolute error is then equal to the relative precision of the computations.

2.2 Effective computations

Proposition 1. For all $n \geq 1$ we have

$$\Pi^n = \begin{pmatrix} R^n & y_{n-1} \\ 0 & 1 \end{pmatrix} \qquad \text{with} \qquad y_{n-1} = \sum_{j=0}^{n-1} R^j v. \qquad (4)$$

Proof. This is trivial to establish by induction using block matrix multiplications. □

We hence get the following:

Corollary 2 (direct p-value). For all $n \geq 1$ and all $1 \leq i \leq k$ we have

$$\mathbb{P}(E_n \mid Z_1 = s_i) = y_{n-2}(s_i) \qquad \text{and} \qquad \mathbb{P}(E_n) = \sum_{i=1}^{k} \mu(s_i) \, y_{n-2}(s_i) \qquad (5)$$

with $y_{n-2}$ computable through the following recurrence relations:

$$x_0 = y_0 = v \qquad \text{and, for all } j \geq 0, \qquad x_{j+1} = R x_j \quad \text{and} \quad y_{j+1} = y_j + x_{j+1}. \qquad (6)$$

Proof. Simply use Proposition 1 to rewrite equations (1) and (3). The recurrence relations are then obvious to establish. □

And we also get the following:

Corollary 3 (complementary p-value). For all $n \geq 1$ and all $1 \leq i \leq k$ we have

$$\mathbb{P}(\overline{E_n} \mid Z_1 = s_i) = x_{n-1}(s_i) \qquad \text{and} \qquad \mathbb{P}(\overline{E_n}) = \sum_{i=1}^{k} \mu(s_i) \, x_{n-1}(s_i) \qquad (7)$$

where $x_0$ is the size $L-1$ column vector filled with ones and $x_{n-1} = R^{n-1} x_0$ is computable through the following recurrence relation:

$$\text{for all } j \geq 0, \qquad x_{j+1} = R x_j. \qquad (8)$$

Proof. $\Pi$ being a stochastic matrix, $\Pi^{n-1}$ is also stochastic; the sum of $R^{n-1}$ over its columns therefore gives $1 - y_{n-2}$, and the corollary is proved. □

Using these two corollaries, it is therefore possible to accurately compute the p-value of the event or of its complement with a complexity of $\mathcal{O}(L + \zeta)$ in memory and $\mathcal{O}(n \times \zeta)$ in time, where $\zeta$ is the number of non-zero terms in the matrix $R$. In the worst case, $\zeta = (L-1)^2$, but the FMCI technique usually leads to a very sparse structure for $R$. One should note that these dramatic improvements over the naive approach could be pushed even further by exploiting the structure of $R$ itself, but this has to be done specifically for each considered problem. We will give detailed examples of this in both our application parts but, for the moment, we focus on the general case, for which we give algorithms.
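To illustrate Proposition 1 and Corollary 2 concretely, here is a minimal numerical sanity check in Python (the toy chain, its dimensions and all variable names are ours, purely for illustration): it builds a small random transition matrix with the block structure (2) and verifies that the recurrence (6) reproduces $\Pi^{n-1}(s_i, f)$.

```python
import numpy as np

# Toy example (illustrative only): L = 4 states, k = 2 starting states,
# last state f absorbing, sequence length n = 50.
L, k, n = 4, 2, 50
rng = np.random.default_rng(0)

# Random transition matrix with the block structure of equation (2):
# the last row is (0, ..., 0, 1) since f is absorbing.
Pi = rng.random((L, L))
Pi[-1, :] = 0.0
Pi[-1, -1] = 1.0
Pi /= Pi.sum(axis=1, keepdims=True)  # make every row stochastic

R = Pi[:-1, :-1]  # (L-1) x (L-1) transitions between non-absorbing states
v = Pi[:-1, -1]   # transitions from the non-absorbing states to f

# Recurrence (6): x_0 = y_0 = v, then x_{j+1} = R x_j, y_{j+1} = y_j + x_{j+1}
x, y = v.copy(), v.copy()
for _ in range(n - 2):
    x = R @ x
    y += x  # after the loop, y = y_{n-2} = sum_{j=0}^{n-2} R^j v

# Check Corollary 2 against the naive matrix power of equation (1):
# y_{n-2}(s_i) must equal Pi^{n-1}(s_i, f).
Pi_pow = np.linalg.matrix_power(Pi, n - 1)
assert np.allclose(y[:k], Pi_pow[:k, -1])
```

Running the same loop on the all-ones vector, as in Corollary 3, yields the complementary p-value directly, thus avoiding the cancellation error of computing $1 - \mathbb{P}(E_n)$.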
2.3 Algorithms

Using Corollary 2 we get a simple algorithm to compute $p = \mathbb{P}(E_n)$:

algorithm 1: direct p-value
$x$ is a real column vector of size $L-1$ and $y$ a real column vector of size $k$
initialization: $x = (v_1, \ldots, v_{L-1})'$ and $y = (v_1, \ldots, v_k)'$
main loop: for $i = 1 \ldots n-2$ do
    • $x = R \times x$ (sparse product)
    • $y = y + (x_1, \ldots, x_k)'$
end
return $p = \sum_{i=1}^{k} \mu(s_i) \, y_i$

and using Corollary 3 we get an even simpler algorithm to compute $q = 1 - p = \mathbb{P}(\overline{E_n})$:

algorithm 2: complementary p-value
$x$ is a real column vector of size $L-1$
initialization: $x = (1, \ldots, 1)'$
main loop: for $i = 1 \ldots n-1$ do
    • $x = R \times x$ (sparse product)
end
return $q = \sum_{i=1}^{k} \mu(s_i) \, x_i$

The most critical stage of both algorithms is the sparse product of the matrix $R$ by a column vector, which can be done efficiently with $\zeta$ operations.

It is interesting to point out that these algorithms do not require the stationarity of the underlying Markov chain. More surprisingly, it is also possible to relax the homogeneity assumption on the random sequence. Indeed, if our transition matrix $\Pi$ depends on the position $i$ in the sequence, we simply have to replace $R$ in the algorithms with the corresponding $R_i$ (which may require a significant amount of additional memory, depending on its expression as a function of $i$). For the complementary p-value, we need to compute $R_1 R_2 \ldots R_{n-2} R_{n-1} x$, which is easily done recursively starting from the right. In the direct p-value case, however, it seems more difficult since we need to compute $x + R_1 x + R_1 R_2 x + \ldots + R_1 R_2 \ldots R_{n-3} R_{n-2} x$. Fortunately, this sum can be rewritten as $x + R_1(x + R_2\{\ldots [x + R_{n-3}(x + R_{n-2} x)] \ldots\})$, which is again easy to compute recursively starting from the right; the sketch below illustrates both recursions. The resulting complexities in the heterogeneous case are hence the same as in the homogeneous one (assuming that the number of non-zero terms in $R_i$ remains approximately constant). This remarkable property of the FMCI is worth remembering, especially in the biological field, where most sequences are known to have complex heterogeneous structures which are often difficult to take into account.
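To make the heterogeneous remark concrete, here is a minimal Python sketch (function names and the dense matrix representation are ours; in practice the $R_i$ would be stored and multiplied in sparse form): both quantities are accumulated from the right, exactly as described above.

```python
import numpy as np

def complementary_pvalue(Rs, mu):
    """Heterogeneous version of algorithm 2: computes
    mu . (R_1 R_2 ... R_m 1) by applying the factors from the right.
    Rs = [R_1, ..., R_m] (here m = n - 1), mu = starting distribution
    over the k starting states."""
    x = np.ones(Rs[0].shape[0])  # column vector of ones, size L - 1
    for R in reversed(Rs):       # rightmost factor is applied first
        x = R @ x                # in practice: sparse product, zeta ops
    return float(mu @ x[: len(mu)])

def direct_pvalue(Rs, v, mu):
    """Heterogeneous version of algorithm 1: evaluates
    v + R_1(v + R_2(... (v + R_m v) ...)) from the right
    (here m = n - 2), which equals v + R_1 v + ... + R_1...R_m v."""
    y = v.copy()
    for R in reversed(Rs):
        y = v + R @ y
    return float(mu @ y[: len(mu)])
```

As a sanity check, passing the same matrix everywhere recovers the homogeneous algorithms: `direct_pvalue([R] * (n - 2), v, mu)` and `complementary_pvalue([R] * (n - 1), mu)` should sum to 1 (up to rounding).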