Measures for finding repeats
Three measures can be used to give partitions of repeated regions.

Quasiperiodicty
Wan and Song proposed a measure in which all the repeated copies (except the last one) have the same length [6]. For this measure, a linear time and space algorithm was given [6].

Approximate periods
Sim et al. [7] introduced a notion of approximation periods (approximate period) using edit distance or relative edit distance. The problem in general is defined as follows: given a string x, find a repeated pattern p such that x can be partitioned as x = p1p2...pk and  is minimized. Here d(p, pl) is the relative edit distance which is  the edit distance, where L = (|p| + |pl|)/2 is the average length of the two strings p and pl. Note that, the normalization of the edit distance is important for finding repeated patterns since otherwise, one can give a partition in which each pattern has one letter and the edit distance is at most 1 (small). The problem in general is NP-hard [7]. When the repeated pattern p is assumed to be a substring of x, The problem can be solved in O(|x|4) time. Note that the second measure is more general than the first since it allows insertions and deletions. Both measures in [7] and [6] use the bottleneck function that finds the repeated pattern p and assumes that each copy pi in the long string is close to the repeated pattern p, i.e., d(pi, p) ≤ δ and δ is minimized. However, in biological sequences, copies of the repeated patterns may change gradually so that some repeats in the region may have very little in common. For example, it is well-known that the N-terminal non-globular region of Thermus thermophilus seryl-tRNA synthetase (PDB:1SRY) [1,8] has weak 7-residue repeats. See Table 1. The similarity score between two consecutive patterns is calculated using Blosum62 matrix and the gap penalty is set to be -4. The repeated patterns gradually changes from the 4-th unit LDLEALLA to the 13-th unit KEARLE. The average similarity score for the nine pairs of consecutive patterns is 4.56. But the similarity score between the 4-th unit and the 8-th unit is -11. In this case, the algorithms based on the bottleneck function may fail to find the multiple repeats.
Table 1  Pseudo periodic repeats of 1SRY (matrix:blosum62, gap penalty: -4)
Unit  Pseudo-periodic unit  Length  Similarity with previous unit
1  MVDLKRLR  8
2  QEPEVFHR  8  -5
3  AIREKGVA  8  -10
4  LDLEALLA  8  -1
5  LDREVQEL  8  7
6  KKRLQEVQ  8  6
7  TERNQVA  7  6
8  KRVPKAP  7  -4
9  PEEKEAL  7  -2
10  IARGKAL  7  3
11  GEEAKRL  7  3
12  EEALRE  6  10
13  KEARLE  6  12
14  ALLLQV  6  -6
15  PLPP  4  -8

Pseudo-periodic repeats
Li et al. [1] gave the first measure that allows gradual changes of patterns and changes of pattern lengths in the region. The repeats they defined are called the pseudo-periodic repeats. Given a repeated region (a string) x and a partition X = s1s2...sk, the pseudo periodic score is
where d(·) is the edit distance, |si| is the length of si, and c is a factor that control the penalty of the two ends of the partition. Li et al. [1] gave a O(|x|2) algorithm to compute an optimal partition of a given repeated region x. It was shown that the pseudo-periodic score can accurately give partitions for tandem repeated regions, where the repeated patterns are weakly similar.
Example: The example is from [1]. The sequence of the LbH domain of members of the LpxA family consists of the imperfect tandem repetition of hexapeptide units [9-11]. These imperfect tandem repeats (partitions) have been accurately detected by the algorithm using the pseudo periodic score [1]. (See Table 2).
Table 2  Pseudo periodic repeats of LPXA_ECOLI (matrix:blosum62, gap penalty: -4)
Unit  Pseudo-periodic unit  Length  Similarity with previous unit
1  MIDKSAFVHPTAIVEEGA  18
2  SIGANAHIGPFCIVGPHV  18  14
3  EIGEGTVLKSHVVVNGHT  18  16
4  KIGRDNEIYQFASI  14  -7
5  GEVNQDLKYAGEPTR  15  -3
6  VEIGDRNRIRESVTI  15  2
7  HRGTVQGGGL  10  -14
8  TKVGSDNLLMINAHIAHD  18  -24
9  CTVGNRCILANNATLAGH  18  20
10  VSVDDFAIIGGMTAVHQF  18  4
11  CIIGAHVMVGGCSGV  15  4 In sequence analysis, we may have a long sequence s and only a substring t (or a few substrings) of s contains the consecutive repeats. The problem here is to find out the substring t and give an optimal pseudo-periodic partition. We call this problem the local pseudo-periodic problem. In this paper, we define the maximization version of the pseudo-periodic partition and develop an algorithm that solves the local pseudo-periodic problem in O(n2) time, where n is the length of the input sequence s.