Counting Distinct Squares

A square is any repetition of order at 2 of a word. In the English language the words hotshots, tartar, or murmur, represent squares. In [1], Fraenkel and Simpson asked the following question:

Given a word of length n, what is the maximum number of distinct squares that can occur as factors of the word?

Conjecture: For every word of length n, the maximum number of distinct squares that occur as factors of it is bounded by n.


State of the art

In [1] the authors also give a proof of the fact that this number is bounded in fact by 2n, by making use of the so called Three squares Lemma, from [2], which says that if three squares with primitive roots start all at the same position, then there exists a later occurrence of the shortest one. Thus, using this fact the authors consider only last occurrences of each of the squares, and get the bound. Later on, in [3], Ilie simplifies the existing proof for the 2n result, and then, in [4], he improves the bound using a simple combinatorial argument, to 2n-log(n). Next we present another proof, that was communicated to Mike Müller by Dan Gusfield, as being done by Dean Hickerson of Davis, California.

Proof: Let the lengths of the 3 squares be 2a, 2b, and 2c with a < b < c. Let s(x) denote the symbol in position x in the string, with position 0 being the leftmost. Thus:
s(x) = s(x+a) for 0 ≤ x < a   (0)
s(x) = s(x+b) for 0 ≤ x < b   (1)
s(x) = s(x+c) for 0 ≤ x < c   (2)
If c ≥ 2a, then (2) implies that s(x) = s(x+c) for 0 ≤ x < 2a, so the square of length 2a also occurs in positions c to c + 2a - 1, and the proof is complete in this case.
For the rest of the proof, assume that c < 2a. This implies that
0 < c - b < 2a - b < a < b < c < 2a < a + b
I claim that s(x) = s(x+b-a) for 0 ≤ x < 2a   (3).
Assuming this, the square of length 2a also occurs in positions b-a to a+b-1, completing the proof.
To prove (3), we consider 3 cases:
A) If 0 ≤ x < 2a - b. Then s(x) = s(x+b) since 0 ≤ x < b equals s(x+b-a) since 0 ≤ x + b - a < a, and hence s(x) = s(x+b-a).
B) If a ≤ x < 2a. Then s(x) = s(x-a) since 0 ≤ x-a < a, thus s(x)= s(x+b-a) since 0 ≤ x-a < b, and hence s(x) = s(x+b-a).
C) If 2a - b ≤ x < a, we proceed by induction on x. The base case relies on A).
Assume that s(y) = s(y+b-a) for all y such that 0 ≤ y < x   (4)
Then, s(x) = s(x+b) since 0 ≤ x < b, thus s(x) = s(x+b-c) since 0 ≤ x + b - c < c, thus s(x) = s(x+2b-a-c) by (4) with 0 ≤ y = x + b - c < x, thus s(x) = s(x+2b-a) since 0 ≤ x + 2b - a - c < c, thus s(x) = s(x+b-a) since 0 ≤ x + b - a < b, so s(x) = s(x+b-a), and the induction is complete.
The three cases cover the range 0 ≤ x < 2a, so (3) is proved, and therefore we can conclude.   ◼

The problem was further investigated in [4-7] with all papers presenting different possible strategies or techniques. In [5], the authors consider different types of squares by looking at the number of squares starting at the same position. They call two squares starting at the same position an FS-double-square, and bound the number of these to 5n/6. As a result, they manage to obtain the best known bound so far (11n/6) for the number of distinct squares that are factors of a word. Another important breakthrough came in 2015 when Manea and Seki show in [8] that the conjectured fact that the maximum ratio between the length of the word and the number of distinct squares that it contains as factors is actually obtained on a binary alphabet.

As a final remark, note that the problem has been further studied also in the context of partial words. Thus in [9] the authors show that there is an exponential dependency between the number of squares and the number of holes a word has, while in [10], the authors prove that when you consider binary words with only one hole, the bound is 7n/2 (this result was also proved independently in [11]).


Bibliography:

  1. A.S. Fraenkel, J. Simpson: How many squares can a string contain? Journal of Combinatorial Theory, Series A 82 (1998), pp 112–120
  2. M. Crochemore, W. Rytter: Squares, cubes, and time–space efficient string searching, Algorithmica 13 (1995), pp 405–425
  3. L. Ilie: A simple proof that a word of length n has at most 2n distinct squares. Journal of Combinatorial Theory, Series A 112(1) (2005), pp 163–164
  4. L. Ilie: A note on the number of squares in a word. Theoretical Computer Science 380 (2007), pp 373–376
  5. A. Deza, F. Franek, A. Thierry: How many double squares can a string contain? Discrete Applied Mathematics 180 (2015), pp 52–69
  6. A. Deza, F. Franek, M. Jiang: A d-step approach for distinct squares in strings. CPM 2011, LNCS 6661, pp 77–89
  7. N. Jonoska, F. Manea, and S. Seki: Stronger Square Conjecture on Binary Words. SOFSEM 2014, LNCS 8327, pp 339-350
  8. F. Manea, and S. Seki: Square-Density Increasing Mappings . WORDS 2015, LNCS 9304, pp 160-169
  9. F. Blanchet-Sadri, R. Mercaş, and G. Scott: Counting distinct squares in partial words. Acta Cybernetica 19(2), pp 465-477
  10. F. Blanchet-Sadri, and R. Mercaş: A note on the number of squares in a partial word with one hole. RAIRO - Theoretical Informatics and Applications 43(4), pp 767-774
  11. Vesa Halava, Tero Harju, Tomi Kärki: On the number of squares in partial words. RAIRO - Theoretical Informatics and Applications 44(1), pp 125-138


If you have any further information regarding this problem, or would like me to fix any typos/errors here, please feel free to contact me.