### Counting Distinct Squares

A square is any repetition of order at 2 of a word. In the English language the words hotshots, tartar, or murmur, represent squares. In , Fraenkel and Simpson asked the following question:

#### Given a word of length n, what is the maximum number of distinct squares that can occur as factors of the word?

Conjecture: For every word of length n, the maximum number of distinct squares that occur as factors of it is bounded by n.

### State of the art

In  the authors also give a proof of the fact that this number is bounded in fact by 2n, by making use of the so called Three squares Lemma, from , which says that if three squares with primitive roots start all at the same position, then there exists a later occurrence of the shortest one. Thus, using this fact the authors consider only last occurrences of each of the squares, and get the bound. Later on, in , Ilie simplifies the existing proof for the 2n result, and then, in , he improves the bound using a simple combinatorial argument, to 2n-log(n). Next we present another proof, that was communicated to Mike Müller by Dan Gusfield, as being done by Dean Hickerson of Davis, California.

Proof: Let the lengths of the 3 squares be 2a, 2b, and 2c with a < b < c. Let s(x) denote the symbol in position x in the string, with position 0 being the leftmost. Thus:
s(x) = s(x+a) for 0 ≤ x < a   (0)
s(x) = s(x+b) for 0 ≤ x < b   (1)
s(x) = s(x+c) for 0 ≤ x < c   (2)
If c ≥ 2a, then (2) implies that s(x) = s(x+c) for 0 ≤ x < 2a, so the square of length 2a also occurs in positions c to c + 2a - 1, and the proof is complete in this case.
For the rest of the proof, assume that c < 2a. This implies that
0 < c - b < 2a - b < a < b < c < 2a < a + b
I claim that s(x) = s(x+b-a) for 0 ≤ x < 2a   (3).
Assuming this, the square of length 2a also occurs in positions b-a to a+b-1, completing the proof.
To prove (3), we consider 3 cases:
A) If 0 ≤ x < 2a - b. Then s(x) = s(x+b) since 0 ≤ x < b equals s(x+b-a) since 0 ≤ x + b - a < a, and hence s(x) = s(x+b-a).
B) If a ≤ x < 2a. Then s(x) = s(x-a) since 0 ≤ x-a < a, thus s(x)= s(x+b-a) since 0 ≤ x-a < b, and hence s(x) = s(x+b-a).
C) If 2a - b ≤ x < a, we proceed by induction on x. The base case relies on A).
Assume that s(y) = s(y+b-a) for all y such that 0 ≤ y < x   (4)
Then, s(x) = s(x+b) since 0 ≤ x < b, thus s(x) = s(x+b-c) since 0 ≤ x + b - c < c, thus s(x) = s(x+2b-a-c) by (4) with 0 ≤ y = x + b - c < x, thus s(x) = s(x+2b-a) since 0 ≤ x + 2b - a - c < c, thus s(x) = s(x+b-a) since 0 ≤ x + b - a < b, so s(x) = s(x+b-a), and the induction is complete.
The three cases cover the range 0 ≤ x < 2a, so (3) is proved, and therefore we can conclude.   ◼

The problem was further investigated in [4-7] with all papers presenting different possible strategies or techniques. In , the authors consider different types of squares by looking at the number of squares starting at the same position. They call two squares starting at the same position an FS-double-square, and bound the number of these to 5n/6. As a result, they manage to obtain the best known bound so far (11n/6) for the number of distinct squares that are factors of a word. Another important breakthrough came in 2015 when Manea and Seki show in  that the conjectured fact that the maximum ratio between the length of the word and the number of distinct squares that it contains as factors is actually obtained on a binary alphabet.

As a final remark, note that the problem has been further studied also in the context of partial words. Thus in  the authors show that there is an exponential dependency between the number of squares and the number of holes a word has, while in , the authors prove that when you consider binary words with only one hole, the bound is 7n/2 (this result was also proved independently in ).

### Bibliography:

1. A.S. Fraenkel, J. Simpson: How many squares can a string contain? Journal of Combinatorial Theory, Series A 82 (1998), pp 112–120
2. M. Crochemore, W. Rytter: Squares, cubes, and time–space efficient string searching, Algorithmica 13 (1995), pp 405–425
3. L. Ilie: A simple proof that a word of length n has at most 2n distinct squares. Journal of Combinatorial Theory, Series A 112(1) (2005), pp 163–164
4. L. Ilie: A note on the number of squares in a word. Theoretical Computer Science 380 (2007), pp 373–376
5. A. Deza, F. Franek, A. Thierry: How many double squares can a string contain? Discrete Applied Mathematics 180 (2015), pp 52–69
6. A. Deza, F. Franek, M. Jiang: A d-step approach for distinct squares in strings. CPM 2011, LNCS 6661, pp 77–89
7. N. Jonoska, F. Manea, and S. Seki: Stronger Square Conjecture on Binary Words. SOFSEM 2014, LNCS 8327, pp 339-350
8. F. Manea, and S. Seki: Square-Density Increasing Mappings . WORDS 2015, LNCS 9304, pp 160-169
9. F. Blanchet-Sadri, R. Mercaş, and G. Scott: Counting distinct squares in partial words. Acta Cybernetica 19(2), pp 465-477
10. F. Blanchet-Sadri, and R. Mercaş: A note on the number of squares in a partial word with one hole. RAIRO - Theoretical Informatics and Applications 43(4), pp 767-774
11. Vesa Halava, Tero Harju, Tomi Kärki: On the number of squares in partial words. RAIRO - Theoretical Informatics and Applications 44(1), pp 125-138