Any two sequencing traces in S are related in one of two ways: directly, through sharing the same immediate template (traces 5 and 6 in figure 1, for example), or indirectly, through the ancestor, α, with each trace having a distinct immediate template. These two alternatives correspond to two different likelihoods. In the first, the differences between the two traces are attributable to sequencing error only, and must be consistent with the reported position-specific quality scores. In the second, the divergence of the underlying genes from their common ancestor provides an additional source of differences. The difference in description lengths under these two models provides a criterion for choosing between them. In fact, the criterion we will use for determining the final set of assemblies is the minimization of the total description length of S under the model in figure 1. Briefly, the description length is the information required to encode the data together with the values of the model parameters. (See [30] for a complete treatment of description length techniques for inference.) For any sub-model for pairs of traces that involves mutations from the common ancestor, the description length must therefore account for the information required to encode the mutation rate. If y is the length, in nucleotides, of the overlap between the two sequencing traces, the cost of encoding the mutation rate is log y.
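To make the pairwise comparison concrete, the following is a minimal sketch of how the two description lengths might be computed for a pair of aligned, overlapping traces. It is an illustration under stated assumptions, not the procedure used in this work: quality scores are assumed to be on the Phred scale, sequencing errors are assumed to substitute each of the three other bases with equal probability, only the agree/disagree pattern across the overlap is encoded, logarithms are taken base 2, and the mutation rate is assumed to be chosen from the y values k/y so that stating it costs log y bits. All function names are hypothetical.

```python
import math


def phred_to_error(q):
    """Convert a Phred-scaled quality score to an error probability (assumed scale)."""
    return 10.0 ** (-q / 10.0)


def p_agree_same(pa, pb):
    """P(two reads agree | same underlying base): both correct, or both wrong alike."""
    return (1 - pa) * (1 - pb) + pa * pb / 3.0


def p_agree_diff(pa, pb):
    """P(two reads agree | different underlying bases), uniform substitution errors."""
    return (1 - pa) * pb / 3.0 + pa * (1 - pb) / 3.0 + 2.0 * pa * pb / 9.0


def data_bits(a, b, qa, qb, m):
    """Bits needed to encode the agree/disagree pattern of the overlap.

    m is the assumed per-site probability that the two underlying templates
    differ; m = 0 corresponds to a shared immediate template.
    """
    bits = 0.0
    for base_a, base_b, q1, q2 in zip(a, b, qa, qb):
        pa, pb = phred_to_error(q1), phred_to_error(q2)
        agree = (1 - m) * p_agree_same(pa, pb) + m * p_agree_diff(pa, pb)
        p = agree if base_a == base_b else 1 - agree
        bits += -math.log2(p)
    return bits


def dl_same_template(a, b, qa, qb):
    """Description length when differences are due to sequencing error only."""
    return data_bits(a, b, qa, qb, 0.0)


def dl_distinct_templates(a, b, qa, qb):
    """Description length with divergence from the ancestor as an extra source.

    The rate is picked from the grid k/y, k = 1..y (an assumption), and the
    parameter cost log2(y) is added to the best data cost.
    """
    y = len(a)
    best = min(data_bits(a, b, qa, qb, k / y) for k in range(1, y + 1))
    return best + math.log2(y)


def prefer_same_template(a, b, qa, qb):
    """True if the shared-template model gives the shorter description."""
    return dl_same_template(a, b, qa, qb) <= dl_distinct_templates(a, b, qa, qb)
```

Under these assumptions, two traces that agree everywhere at high quality are described more cheaply by the shared-template model, because the distinct-template model must still pay the log y parameter cost. Several high-quality mismatches reverse the preference, since explaining each of them as a sequencing error becomes more expensive than encoding a mutation rate.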