Mathematisches Forschungsinstitut Oberwolfach › ~berthe › › Articles › OWR_2010_37.pdf · Mathematisches Forschungsinstitut Oberwolfach Report No. 37/2010 DOI: 10.4171

Mathematisches Forschungsinstitut Oberwolfach

Report No. 37/2010

DOI: 10.4171/OWR/2010/37

Mini-Workshop: Combinatorics on Words

Organised byValerie Berthe, MontpellierJuhani Karhumaki, TurkuDirk Nowotka, Stuttgart

Jeffrey Shallit, Waterloo

August 22nd – August 28th, 2010

Abstract. The area of combinatorics on words is concerned with propertiesof sequences of symbols. It is characteristic to the field that questions arisefrom various mathematical problems, and hence, many fundamental resultson words have been established in different areas. Over the last two decadesthe theory has developed into a quickly growing topic of its own. This work-shop was dedicated to reflect on the current status of the field, discuss theimpact of recent results, and provide new research challenges. This is a reporton the meeting and presentation of extended abstracts of the lectures.

Mathematics Subject Classification (2000): 68R15 (main), 03D03, 03D40, 11N60, 20M05.

Introduction by the Organisers

Combinatorics on words studies properties of sequences of symbols, either finiteor infinite, taken from a finite alphabet. The focus on words might be algebraic,combinatorial, or algorithmic. It is characteristic to the field that the motiva-tion to study properties of sequences has arisen from very different mathematicalproblems. As a result many fundamental properties of words, e.g., on avoidablepatterns, unavoidable regularities, and word equations have been established invarious mathematical areas. Over the last two decades the theory has developedinto a quickly growing topic of its own.

In this workshop the current status of the field was identified, the impact ofrecent breakthrough results like on unavoidable repetitions in infinite words, wordequations, the structure of finite words of small index, and the transcendence of

2196 Oberwolfach Report 37/2010

certain morphic words was discussed, as well as, perspectives for research on thisnew and challenging field were created.

The talks of this workshop addressed a large portion of topics in combinatoricson words. They ranged from complexity questions (inconstancy, palindromic andfactor complexity, rich words) over pattern avoidance, word equations, specialfactorizations and periods to topics touching on number theory and graph theory.One session was dedicated to the perspectives of combinatorics on words and openproblems.

It has to be noted that one could experience a very productive atmosphere dur-ing the whole workshop. All talks were accompanied with interesting comments,proposed conjectures and (sometimes) solutions, connections with other fields werediscovered, and lively discussions were held far beyond the time of the lectures.Just to name two examples of discussed topics: (1) The Halbeisen-Hungerbuehler-Pirillo-Varicchio problem asks for a word over some finite alphabet 0, 1, ..., ksuch that two consecutive factors of the same length never have the same sum.New approaches to this problem were raised and investigated during the workshop.(2) A new idea on the connection between solutions of word equations and semi-linear sets was discussed and yields a fresh approach to the investigation of thecumulative defect effect that may eventually lead to new progress on the Culik-Karhumaki conjecture which has withstood a solution for about three decades.

The work of the participants at the workshop, as documented by the abstractsin this report, shows that combinatorics on words is an active field with manyfacets and surprising connections. We are grateful to all participants for theircontributions to this successful workshop as well as to the staff of the MFO fortheir perfect service.

Mini-Workshop: Combinatorics on Words 2197

Mini-Workshop: Combinatorics on Words

Table of Contents

Jean-Paul Allouche (joint with Laurence Maillard-Teyssier)Inconstancy and complexity of finite and infinite sequences . . . . . . . . . . . . 2199

Valerie BertheWord combinatorics, S-adic sequences and multidimensional continuedfractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2202

Julien CassaigneWords of very low factor complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2204

James D. CurriePower-free Sequences: Topology, Reachability and Curling Numbers . . . . 2205

Aldo de LucaOn a palindromization map on free monoids . . . . . . . . . . . . . . . . . . . . . . . . 2207

Amy GlenOn (almost) rich words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2209

Tero Harju (joint with Volker Diekert, Dirk Nowotka)Weinbaum Factorizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2212

Stepan Holub (joint with Dirk Nowotka)Periods and Unbordered Factors: The Ehrenfeucht-Silberger Problem . . . 2213

Juhani KarhumakiIndependent Systems of Word Equations and Related Topics . . . . . . . . . . 2215

Dirk Nowotka (joint with Bastian Bischoff)Word periods under involution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2219

Pascal OchemRepetition-free colorings of trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2222

Elena V. PribavkinaOn the Minimal Uncompletable Word Problem . . . . . . . . . . . . . . . . . . . . . . 2224

Eric Rowland (joint with Bobbe Cooper, Doron Zeilberger)Ambiguity in a certain context-free grammar . . . . . . . . . . . . . . . . . . . . . . . . 2226

Kalle SaariThe enumeration of squares and runs in the Fibonacci words revisited . . 2229

Jeffrey ShallitOn Patterns and Pattern Avoidance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2230


Thomas StollOn the sum of digits of n and nh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2237

Luca Q. ZamboniA Note on Coloring Factors of Words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2240


Abstracts

Inconstancy and complexity of finite and infinite sequences

Jean-Paul Allouche

(joint work with Laurence Maillard-Teyssier)

This text is an extended abstract of a paper of the same authors [4].

1. Introduction

What is a “complicated” or “complex” (finite or infinite) sequence? The readerwill probably agree that the sequence 0000... is very “simple”, that the sequence010101... is also simple though may be less simple, that an eventually periodicsequence with a very long preperiod might be very complicated, and that a “ran-dom” sequence must be complicated. One can imagine of several ways of definingcomplexity in mathematical terms. Classical approaches include

• algorithmic complexities: in particular Kolmogorov-Solomonoff-Chaitincomplexity and its relation to compressibility: see, e.g., [16];

• combinatorial complexities: in particular block – or factor – complexity,see, e.g., [1, 11], repetition complexity, see, e.g., [15], palindrome complex-ity, see, e.g., [3], arithmetical complexity, see, e.g., [6], maximal patterncomplexity, see, e.g., [12, 13]. Also see the paper [14], and the talk of J.Cassaigne at this mini-workshop. We also mention quasi-periodicity, re-currence and uniform recurrence, some definitions of pseudo-randomness,e.g., [17], the study of certain subsequences of the given sequence – in par-ticular the measure of automaticity introduced by Shallit et al., see, e.g.,Chapter 15 of [5];

• number-theoretical complexity: in relation with periodicity, but also withalgebraicity properties of real numbers, formal power series or continuedfractions associated with a given sequence, see, e.g., [2].

Now what is a fluctuating (finite or infinite) sequence? How is it possible todetect and define sequences that admit large variations or fluctuations? A classicalcriterion is the residual variance of a sequence: this is a measure of the “distance”between the piecewise affine curve associated with the sequence and its regressionline. Residual variance does not discriminate between a sequence that oscillateswildly and a sequence that grows very rapidly. We thus propose to bring to light anold result of Cauchy and Crofton, in order to define what we call the inconstancyof a sequence. This definition is based upon the idea that a complicated curve iscut by a “random” straight line in many more points than a “quasi-affine” curve.

2. Cauchy-Crofton’s theorem. Inconstancy of a sequence

Let Γ be a plane curve. Let ℓ(Γ) denote its length and let δ(Γ) denote theperimeter of its convex hull. Any straight line in the plane can be defined as theset of (x, y) such that x cos θ + y sin θ − ρ = 0, where θ belongs to [0, π) and ρ


is a real number, and hence is completely determined by (ρ, θ). Letting µ denotethe Lebesgue measure on the set (ρ, θ), ρ ≥ 0, θ ∈ [0, π), the average numberof intersection points between the curve Γ and straight lines is defined to be thequantity ∫

D∈Ω(Γ)

♯(Γ ∩D)dρ dθ

µ(Ω(Γ))

where Ω(Γ) is the set of straight lines which intersect Γ.

The following result can be found in [10, p. 184–185], see also the papers ofCauchy [8, 9].

Theorem 1 (Cauchy-Crofton). The average number of intersection points betweenthe curve Γ and the straight lines in Ω(Γ) satisfies the equality

∫

D∈Ω(Γ)

♯(Γ ∩D)dρ dθ

µ(Ω(Γ))=

2ℓ(Γ)

δ(Γ)·

Remark 2. The reader will have noted the relation between this theorem and theBuffon needle problem (see [7, p. 100–104]).

The theorem of Cauchy-Crofton leads us to the following definition.

Definition 3. Let Γ be a plane curve. Let ℓ(Γ) be its length and δ(Γ) the perimeterof its convex hull. The inconstancy of the curve Γ, denoted I(Γ), is defined by

I(Γ) := 2ℓ(Γ)

δ(Γ)·

Remark 4. The minimal value of I(Γ) is 1. It is obtained in particular when Γ isa segment.

3. First results

In [4] we compare inconstancy with residual variance for very simple sequences.Then we compute the inconstancy of classical infinite sequences. We give some ofour results below.

Theorem 5. Let (un)n≥0 be an infinite sequence taking two values 0 and h >0, with u0 = 0. We make the assumption that the frequencies of occurrencesof the blocks 00, hh, 0h, h0 in the sequence exist and are respectively equal toF00,Fhh,F0h,Fh0. Then

I((un)n≥0) = F00+Fhh+(√1 + h2)(F0h+Fh0) = 1+(

√1 + h2− 1)(F0h+Fh0).

Remark 6. A similar result holds with sequences taking any finite number of values.

Corollary 7. We have in particular the following results for binary sequences(taking only values 0 and 1). Let ((mn)n≥0) be the Thue-Morse sequence; let


((rn)n≥0) be the Shapiro-Rudin sequence; let ((zn)n≥0) be the regular paperfoldingsequence.

I((01)∞) =√2 = 1.414...

I((021)∞) = 1+2√2

3 = 1.276...

I((mn)n≥0) = 1+2√2

3 = 1.276...

I((031)∞) = 1+√2

2 = 1.207...

I((rn)n≥0) = 1+√2

2 = 1.207...

I((zn)n≥0) = 1+√2

2 = 1.207...

I((un)n≥0) = 1+√2

2 = 1.207... (for almost all sequences u)I(0∞)) = 1.

4. Possible applications

We began checking whether inconstancy is a pertinent measure of fluctuation,or even a prediction tool in different domains: variations of BMI (body mass index)and metabolic syndrome (in relation with cardio-vascular diseases, see, e.g., [18]),smoothness of musical themes, and fluctuations of the stockmarket.

References

[1] J.-P. Allouche, Sur la complexite des suites infinies, Bull. Belg. Math. Soc. 1 (1994) 133–143.[2] J.-P. Allouche, Automates et algebricites, J. Theor. Nombres Bordeaux 17 (2005) 1–11.[3] J.-P. Allouche, M. Baake, J. Cassaigne, D. Damanik, Palindrome complexity, Theoret. Com-

put. Sci. 292 (2003) 9–31.[4] J.-P. Allouche, L. Maillard-Teyssier, Inconstancy of finite and infinite sequences, Preprint

2009, http://arxiv.org/abs/0910.1173[5] J.-P. Allouche, J. Shallit, Automatic sequences. Theory, applications, generalizations, Cam-

bridge University Press, Cambridge, 2003.[6] S. V. Avgustinovich, D. G. Fon-Der-Flaass, A. E. Frid, Arithmetical complexity of infinite

words, in M. Ito and T. Imaoka, editors, Words, Languages & Combinatorics III, ICWLC2000, Kyoto, Japan, March 14-18, 2000, World Scientific Publishing, Singapore, 2003, pp.51–62.

[7] G. L. Leclerc Buffon, Histoire naturelle generale et particuliere, Supplement, Tome qua-trieme, Imprimerie Royale, 1777.

[8] A. Cauchy, Notes sur divers theoremes relatifs a la rectification des courbes, et a la quadra-ture des surfaces, C. R. Acad. Sci. Paris 13 (1841) 1060–1063. (Also in Œuvres completes6, Gauthier-Villars, Paris, pp. 369–375, 1888.)

[9] A. Cauchy, Memoire sur la rectification des courbes et la quadrature des surfaces courbes,Mem. Acad. Sci. Paris 22 (1850) 3–15. (Also in Œuvres completes 2, Gauthier-Villars,Paris, pp. 167–177, 1908.)

[10] M. W. Crofton, On the theory of local probability, applied to straight lines drawn at random

in a plane; the methods used being also extended to the proof of certain new theorems inthe Integral Calculus, Philos. Trans. R. Soc. Lond. 158 (1868) 181–199.

[11] S. Ferenczi, Z. Kasa, Complexity for finite factors of infinite sequences, Theoret. Comput.Sci. 218 (1999) 177–195.

[12] T. Kamae, L. Q. Zamboni, Sequence entropy and the maximal pattern complexity of infinitewords, Ergodic Theory Dynam. Systems 22 (2002) 1191–1199.

[13] T. Kamae, L. Q. Zamboni, Maximal pattern complexity of discrete dynamical systems,Ergodic Theory Dynam. Systems 22 (2002) 1201–1214.


[14] L. Ilie, Combinatorial Complexity Measures for Strings, in Recent advances in formal lan-guages and applications, Studies in Computational Intelligence, 2006, Volume 25/2006, pp.149–170.

[15] L. Ilie, S. Yu, K. Zhang, Word complexity and repetitions in words, Computing and Com-binatorics Conference—COCOON’02, Internat. J. Found. Comput. Sci. 15 (2004) 41–55.

[16] M. Li, P. Vitanyi, An introduction to Kolmogorov complexity and its applications, Thirdedition, Springer Verlag, 2008.

[17] C. Mauduit, A. Sarkozy, On finite pseudorandom binary sequences. I. Measure of pseudo-randomness, the Legendre symbol, Acta Arith. 82 (1997) 365–377.

[18] A.-C. Vergnaud, S. Bertrais, J.-M. Oppert, L. Maillard-Teyssier, P. Galan, S. Hercberg, S.Czernichow, Weight fluctuations and risk for metabolic syndrome in an adult cohort, Int.J. Obesity 32 (2008) 315–321.

Word combinatorics, S-adic sequences and multidimensionalcontinued fractions

Valerie Berthe

This survey lecture is about the numerous occurrences of Euclid’s algorithm andcontinued fraction algorithms (in their usual form, or in generalized versions) inword combinatorics. One motivation is the well-known and particularly fruitfulinteraction between Sturmian sequences, rotations of T1, and regular continuedfractions. For generalizations, see also the survey [4].

As another motivation let us recall Fine and Wilf’s theorem. This theorem givesa condition on the length of the periods a finite word can have. More precisely, ifw is a word having periods p and q with length greater than or equal to p + q −gcd(p, q), then w has period gcd(p, q). Asume now p and q coprime. The familyof words with length p+ q− 2 that are p and q periodic is particularly interesting.Such extremal words (with respect to Fine and Wilf’s theorem) are known to beparticular factors of Sturmian words, and their study involves once again Euclid’salgorithm. For more details, see [9] and the references therein.

There exist two natural types of generalizations of Fine and Wilf’s theorem,either by extending the size of the alphabet [7], or by considering multidimensionalwords [12]. Extremal words for these generalizations can also be described in termsof multidimensional continued fraction algorithms. In particular, in the formercase, an algorithm in the flavour of the fully subtractive algorithm [11] allows theconstruction of extremal words [13]. See also the lecture by A. de Luca whichevokes duality properties for Christoffel words and generalizations, obtained byreversing the corresponding generalized Euclid’s algorithm.

The connection between word combinatorics and multidimensional continuedfractions is particularly striking within the so-called S-adic framework. Let usrecall that a substitution is a non-erasing morphism of the free monoid. A se-quence is said to be S-adic if it is generated by an infinite composition of a finitenumber of substitutions. S-adic sequences generalize in a natural way substitutivesequences. The S-adic expansion of a Sturmian word can be described thanks tothe continued fraction expansion of its slope [5]. More generally, infinite wordshaving an at most linear number of factors of a given length (they are said to be


of linear complexity) are known to be S-adic, if they are furthermore assumed tobe minimal [8]. For more details, see also the lecture by J. Cassaigne. This coversvarious families of infinite words with a rich dynamical behaviour. In order tounderstand the geometric and symbolic nature of the dynamical systems that aregenerated by such infinite words, we are mainly interested in the two followingproblems: first, finding geometric interpretations of various symbolic dynamicalsystems including those generated by substitutions or by S-adic generation, andsecondly, developing multidimensional continued fraction algorithms reflecting thedynamics of the systems.

Several combinatorial questions can be formuated in an efficient way in thisS-adic/continued fraction framework. Given an S-adic sequence, one can askwhether this sequence is substitutive, that is, whether it is a letter-to-letter projec-tion of a fixed point of a substitution. Substitutive Sturmian sequences correspondto quadratic angles (for more details, see [5]). This result can be considered asa version of Galois’ theorem for continued fraction expansions. See also [6] for aconnected result: if all the parameters of an interval exchange belong to the samequadratic extension, the sequence of induced interval exchanges (by performingalways the same induction process) is ultimately periodic.

More generally, convergence issues (and Diophantine approximation properties)for a multidimensional continued fractions algorithm underlying a family of infinitewords correspond to the question of convergence toward frequencies of factors,which can themselves be expressed in measure-theoretic terms (in particular if onehas unique ergodicity).

The study of S-adic words thus leads to numerous questions that are of acombinatorial, arithmetic or else dynamical nature. Among them, the so-calledS-adic conjecture aims at finding a characterization of infinite words having linearcomplexity in S-adic terms. This conjecture is still open. Note that during thelecture, the following interesting decision problem has been raised by J. Shallit:Given a finite set of substitutions φ1, φ2, · · · , φn, and a word w, decide if w appearsas a factor of some S-adic infinite word generated by the φi.

References

[1] P. Arnoux, S. Ito, Pisot substitutions and Rauzy fractals, Bull. Bel. Math. Soc. Simon Stevin8 (2001), 181–207.

[2] P. Arnoux, S. Ito, Y. Sano, Higher dimensional extensions of substitutions and their dualmaps, J. Anal. Math. 83 (2001), 183–206.

[3] P. Arnoux, G. Rauzy, Representation geometrique de suites de complexite 2n+1, Bull. Soc.Math. France 119 (1991), 199–215.

[4] V. Berthe, S. Ferenczi, L.Q. Zamboni Interactions between dynamics, arithmetics, andcombinatorics: the good, the bad, and the ugly, Algebraic and Topological Dynamics, S.Kolyada, T. Manin, and T. Ward eds., Contemporary Mathematics (CONM), AMS, Amer-ican Mathematical Society, 385 (2005), 333–364.

[5] V. Berthe, C. Holton, L. Q. Zamboni, Initial powers of Sturmian sequences, Acta Arith.122 (2006), 315–347.

[6] M. D. Boshernitzan, C. R. Carroll, An extension of Lagrange’s theorem to interval exchangetransformations over quadratic fields, J. Anal. Math. 72 (1997), 21–44.


[7] M. G. Castelli, F. Mignosi, A. Restivo, Fine and Wilf ’s theorem for three periods and agenereralization of Sturmian words, Theoret. Comput. Sci. 218 (1999), 83–94.

[8] S. Ferenczi, Rank and symbolic complexity, Ergodic Theory Dynam. Systems 16 (1996)663–682.

[9] M. Lothaire, Algebraic combinatorics on words, Cambridge University Press (2002).[10] N. Pytheas Fogg, Substitutions in dynamics, arithmetics and combinatorics, Lecture Notes

in Mathematics, 1794, Edited by V. Berthe, S Ferenczi, C.Mauduit and A. Siegel, Springer-Verlag (2002).

[11] F. Schweiger, Multi-dimensional continued fractions, Oxford Science Publications, OxfordUniv. Press, Oxford (2000).

[12] R.J. Simpson, R.Tijdeman Multi-dimensional versions of a theorem of Fine and Wilf anda formula of Sylvester, Proc. Amer. Math. Soc. 131 (2003), 1661-1667.

[13] R. Tijdeman, L. Q. Zamboni, Fine and Wilf words for any periods II., Theor. Comput. Sci.410 (2009), 3027–3034.

Words of very low factor complexity

Julien Cassaigne

Let u ∈ AN be an infinite word. The factor complexity of u is the functionp : N → N defined by: p(n) is the number of words of length n occurring in u(factors of u). Morse and Hedlund [5] proved that p(n) ≥ n+1 for non-eventually-periodic words. Words for which p(n) = n+ 1 are called Sturmian words. Wordsfor which p(n) = n + c for some constant c can be deduced from them [3]. Weare interested in words “just above” this, roughly n + 1 ≤ p(n) ≤ 2n. Let αu =

lim inf p(n)n and βu = lim sup p(n)

n , and Ω = (αu, βu) : u ∈ AN ⊆ (R+ ∪ +∞)2.Then the general problem, essentially open, is: what is the structure of Ω?

Heinis proved [4] that β − α ≥ (2−α)(α−1)α . In particular, 1 < α = β < 2

is impossible.1 Aberkane [1] contructed a sequence of points of Ω converging to(1, 1); on the other hand (32 ,

53 ) seems to be an isolated point.

The main tool to study these words is the sequence of Rauzy graphs : Γn isthe directed graph with vertices Ln(u) (the factors of length n of u) and edgesLn+1(u), with an edge from x to y labelled with z if and only if z ∈ xA ∩ Ay.For Sturmian words, only two shapes of graphs are possible. For recurrent wordswith p(n) ≤ 4

3n+ 1, two new shapes appear. Such a word is then defined (up toshift, etc.) by a path in the “graph of graphs”, and some of its properties (α andβ, frequencies, etc.) may be deduced from this path. The path also provides ans-adic representation (infinite composition of substitutions), which can be viewedas a generalized continued fraction expansion.

References

[1] Ali Aberkane, Words whose complexity satisfies limp(n)n

= 1, Theoret. Comput. Sci 307

(2003), 31–46.[2] Julien Cassaigne and Francois Nicolas, Factor complexity, in Combinatorics, automata and

number theory, ed. V. Berthe et M. Rigo, Cambridge University Press, 2010, 163–247.

1this is generalized in [2]: if limp(n)n

exists, then it must be an integer.


[3] Ethan M. Coven, Sequences with minimal block growth II, Math. Systems Theory 8 (1975),376–382.

[4] Alex Heinis, The P (n)/n function for bi-infinite words, Theoret. Comput. Sci 273 (2002),35–46.

[5] Marston Morse and Gustav A. Hedlund, Symbolic Dynamics II. Sturmian trajectories, Amer-ican J. Math. 62 (1940), 1–42.

Power-free Sequences: Topology, Reachability and Curling Numbers

James D. Currie

A word is repetitive if it contains two identical blocks. A word containing k con-secutive identical blocks is said to contain a k power. Dejean [5] also introducedthe study of words containing fractional k powers.

Thue [11] showed that there are infinite words over the three letter alphabeta, b, c which are non-repetitive. Infinite nonrepetitive words (sequences) havebeen used to build counter-examples in algebra [8], ordered sets [12], symbolicdynamics [7] and other areas. A non-empty set L of infinite words is perfect iffor any u ∈ S and any n there is a word v ∈ L, v 6= u such that u and v have acommon prefix of length at least n.

Open problem 1. Let L be the set of infinite words over Σ which avoid patternp. Is L perfect?

Abusing notation, consider L to contain also the finite factors of its words. Weconsider the partial order where u < v iff u is a prefix of v. The diagram of Lunder this order is a tree with root ǫ, the empty word. We will identify L withthis tree. The meet of finite words u, v is their longest common prefix, denotedby u ∧ v.

Given words u ≤ v in L, the closed interval [u, v] is the set w ∈ L : u ≤ w ≤ v.For notational convenience, we also define [u,∞] = w ∈ L : u ≤ w. Supposethat u < v and

(1) [v,∞] is infinite for at most one upper cover v of v.(2) [u,∞]− [v,∞] is finite.

In this case any path in L from u to ∞ must traverse the vertices of [u, v]. Werefer to the set Bv(u, v) = B(u, v) = [u,∞]− [v,∞] as a bottleneck. The lengthof B = B(u, v) is |B| = |v| − |u|+ 1. The index of B is ι(B) = |u|. Suppose thatB(u, v) is a bottleneck and u < v are elements of [u, v]. It follows that at mostone cover of v has an infinite extension and we can form a bottleneck B(u, v).

We consider the case where L is the language of square-free words over a three-letter alphabet. Long bottlenecks in L must occur far out, i.e. for a bottleneck Bof L

(1) ι(B) ≥ f(|B|), whenever |B| > N0.

Here N0 is some constant, while f is eventually increasing and unbounded. Letg be a function such that for any non-negative integer M we have f(x) > Mwhenever x > g(M).


Theorem 8. Language L is infinite if and only if it contains a word of lengthmax(N0, g(0)).

This is a special case (u = ǫ) of the following:

Theorem 9. Word u ∈ L is a prefix of infinitely many words in L if and only ifL contains a word v > u, with |v| − |u| = max(N0, g(|u|)).

Inequality (1) can also be used to show that L is perfect.

Theorem 10. If L is infinite, then L is perfect.

Thus the infinite tree L is constantly branching. The proof is readily sharpenedto put a bound on how far L can go without branching.

Theorem 11. Suppose that u has an infinite extension in L. Then there is aword v > u, |v| ≤ |u| − 1+ g(|u|) such that v is the meet of two infinite extensionsof u in L.

Function f can be taken to have the form f(x) = ax3/2, some positive constanta. Thus g can be taken to be g(x) = max(N + 1, (x/a)2/3). This implies thefollowing:

Corollary 12. The set of nonrepetitive words over 1, 2, 3 of length n growsexponentially.

A striking aspect is that all of this structural information about L is demon-strated non-constructively! [2, 3, 4] We will argue that these non-constructivemethods should be used to attack the following three open problems:

Open problem 2. (Restivo/Salemi) A reachability problem [9]: Suppose that uis a prefix of infinitely many words of L, and v is a suffix of infinitely many wordsof L. Does L contain a word of the form uwv?

Open problem 3. (Sloane et al.) Curling numbers [1]: Given a finite word w,the curling number of w is the largest integer n such that w can be written uvn

for some words u and v. Starting with any word w, we can form the curlingnumber sequence of w: We let w0 = w, and wi+1 is formed by appending to wi

its curling number. The conjecture is that if one starts with any finite word andbegins to form the curling number sequence, one will eventually reach a 1.

Open problem 4. (Guay-Paquet and Shallit) Lexicographically least words [6]:It is conjectured that the lexicographically least infinite word over N avoiding 5/2powers uses only three letters.

References

[1] Benjamin Chaffin & N. J. A. Sloane, The Curling Number Conjecture, arXiv:0912.2382.[2] James. D. Currie, On the structure and extendibility of k-power free words, Eur. J. Comb.

(1995) 16, 111–124.[3] James. D. Currie and Robert O. Shelton, Cantor sets and Dejean’s conjecture, J. Automata,

Languages and Combin. 1 (1996), 113–128.


[4] James. D. Currie and Robert O. Shelton, On the structure and extendibility of k-power freewords II, submitted.

[5] Francoise Dejean, Sur un theoreme de Thue, J. Combin. Theory Ser. A 13 (1972), 90–99.[6] Mathieu Guay-Paquet & Jeffrey Shallit, Avoiding squares and overlaps over the natural

numbers, Discrete Mathematics 309 (2009), 6245-6254.[7] Marston Morse & Gustav A. Hedlund, Symbolic dynamics I, II, Amer. J. Math. 60 (1938),

815–866; 62 (1940) 142; MR 1, 123d.[8] P. S. Novikov & S. I. Adjan, Infinite periodic groups I, II, III, Izv. Akad. Nauk. SSSR Ser.

Mat. 32 (1968), 212–244;251–524;709–731;MR 39 #1532a–c.[9] Antonio Restivo & Sergio Salemi Words and Patterns Developments in Language Theory

2001 117–129.[10] Robert O. Shelton, On the structure and extendibility of square-free words, Combinatorics

on Words: Progress and Perspectives (1983) Academic Press, 101–188.

[11] Axel Thue, Uber unendliche Zeichenreihen, Norske Vid. Selsk. Skr. I. Mat. Nat. Kl. Chris-tiana (1906), 1–22.

[12] William T. Trotter & Peter Winkler, Arithmetic Progressions in Partially Ordered Sets,Order 4 (1987) 37–42.

On a palindromization map on free monoids

Aldo de Luca

The right-palindromic closure of a word u of a free monoid A∗ over the alphabetA, is the shortest palindrome u(+) of A∗ having u as a prefix. In a seminal paper[6] of 1997, the author introduced in the case of a binary alphabet, an injectivemap, called palindromization map, associating to each word v a palindrome byan iterated application, ‘directed’ by the word v, of the right palindromic closureoperator. More precisely, the palindromization map ψ : A∗ → PAL, where PALis the set of palindromes of A∗, is inductively defined as: ψ(ε) = ε (ε denotes theempty word) and for all u ∈ A∗ and x ∈ A,

ψ(ux) = (ψ(u)x)(+) .

For all words v if u is a prefix of v, then ψ(u) is a palindromic prefix (and suffix)of ψ(v) and, conversely, every palindromic prefix of ψ(v) is of the form ψ(u) forsome prefix u of v. For any w ∈ ψ(A∗) the unique word u such that ψ(u) = w iscalled the directive word of w. For instance, if A = a, b, c and v = aabc, one hasψ(a) = a, ψ(aa) = aa, ψ(aab) = (aab)(+) = aabaa, and ψ(aabc) = (aabaac)(+) =aabaacaabaa.

It was proved in [6] that if the palindromization map is extended to infinitebinary directive words such that each letter occurs infinitely many times in them,then one can construct all standard Sturmian words.

If one extends the action of palindromization map to infinite words over arbi-trary finite alphabets, one can generate a wider class of words, called standardepisturmian, introduced in 2001 by X. Droubay, J. Justin, and G. Pirillo in [12].This class includes standard Sturmian words and Arnoux-Rauzy words [1]. In thisextension of Sturmian words some properties are lost (for instance, the balanceproperty) and other are preserved (for instance, the richness in palindromes of


their factors). In any case episturmian words satisfy very interesting combinato-rial and structural properties and, in fact, many papers have been written on thesubject (see, for instance, the overview papers [2, 13]).

In this theory a key role is played by the class of palindromic prefixes of allstandard episturmian words over a given alphabet called epicentral words, andsimply central in the case of a binary alphabet [15]. These words are precisely theimages of a finitely generated free monoid by the palindromization map. Epicen-tral words satisfy interesting combinatorial properties since they can have severaldifferent representations. In fact, besides directive words which are related to thepalindromization map, epicentral words can be represented by periods (period vec-tor) and composition (Parikh vector). Further important representations can bedone by using matrices, trees, and graphs [10, 11].

The previous representations are also useful for the problem of counting theepicentral words and the palindromes of any length in all episturmian words overa given alphabet [5]. For any k, the map Pk which counts for any n the epicentralwords of length n on the k-letter alphabet, is a suitable extension to the casek > 2 of Euler’s totient function ϕ. Indeed, for k = 2, P2(n) = φ(n + 2)[9]; fork > 2 a general arithmetic interpretation for Pk(n) in terms of a multidimensionalgeneralization of the Euclidean algorithm is in [16] (see also [10]). The behavior ofthe map Pk is quite irregular and oscillating. Some conjectures based on a tableof numerical values of Pk (3 ≤ k ≤ 6 and 1 ≤ n ≤ 500) are formulated. In [5]a formula for the map gk counting for any n the palindromes of length n in allepisturmian words over a k letter alphabet is given. This formula for gk, extendinga result found in [7] in the case k = 2, depends on the map Pk.

The palindromization map has been recently extended to the case of free-groupF2 by C. Kassel and C. Reutenauer [14]. Moreover, in [8] a (right) θ-palindromicclosure operator, where ϑ is any involutory antimorphism of a free monoid, hasbeen introduced. The fixed points of ϑ are called ϑ-palindromes. For any word uthe ϑ-palindromic closure of u is the shortest ϑ-palindrome having u as a prefix.Similarly to the case of the reversal operator, a ϑ-palindromization map can bedefined; it associates to each word v a ϑ-palindrome by an iterated applicationof ϑ-closure operator ‘directed’ by the word v. By acting with this operator onany infinite directive word one obtains a class of words larger than the class ofstandard episturmian, that we called θ-standard words (or simply pseudostandardif one does not refer to a particular ϑ); when θ is the reversal operator one obtainsthe class of standard episturmian words.

In [4, 3] two more general families of words have been introduced. The firstis the family of the ϑ-standard words with seeds, that is the words obtained byiteration of the operator of θ-palindromic closure starting with a non-empty wordcalled seed. A second family of words called generalized pseudostandard words, isformed by the pseudostandard words directed by 2-words: the directive word anda word describing the antimorphism to use at each iteration.


References

[1] P. Arnoux, G. Rauzy. Representation geometrique de suites de complexite 2n+1, Bull. Soc.Math. France 119 (1991) 199–215

[2] J. Berstel, Sturmian and Episturmian words (A survey of some recent results), Lecture Notesin Computer Science, vol. 4728, Springer-Verlag Berlin 2007, pp. 23–47

[3] M. Bucci, A. de Luca, A. De Luca, L. Zamboni, On different generalizations of episturmianwords, Theoretical Computer Science 393 (2008) 23–36]

[4] M. Bucci, A. de Luca, A. De Luca, L.Q. Zamboni, On some problems related to palindromeclosure, Theoretical Informatics and Applications 42 (2008) 679–700

[5] M. Bucci, A. de Luca, A. De Luca, On the number of episturmian palindromes, TheoreticalComputer Science 411 (2010) 3668–3684

[6] A. de Luca, Sturmian words: Structure, Combinatorics, and their Arithmetics, TheoreticalComputer Science 183 (1997) 45–82

[7] A. de Luca, A. De Luca, Combinatorial properties of Sturmian palindromes, InternationalJournal of Foundations of Computer Science 17 (2006) 557–573

[8] A. de Luca, A. De Luca, Pseudopalindrome closure operators in free monoids, TheoreticalComputer Science 362 (2006) 282–300

[9] A. de Luca, F. Mignosi, Some combinatorial properties of Sturmian words, TheoreticalComputer Science 136 (1994) 361–385

[10] A. de Luca, L. Q. Zamboni, On graphs of central episturmian words, Theoretical ComputerScience 411 (2010) 70–90

[11] A. de Luca, L. Q. Zamboni, Involutions of epicentral words, European Journal of Combina-torics 31 (2010) 867–886

[12] X. Droubay, J. Justin, and G. Pirillo, Episturmian words and some constructions of de Lucaand Rauzy, Theoretical Computer Science 255 (2001) 539–553

[13] A. Glen, J. Justin, Episturmian words: A survey, Theoretical Informatics and Applications43 (2009) 403–442

[14] C. Kassel, C. Reutenauer, A palindromization map for the free group, Theoretical ComputerScience 409 (2008) 461–470

[15] M. Lothaire, Algebraic Combinatorics on Words, Encyclopedia of Mathematics and itsApplications, vol. 90, Cambridge University Press (Cambridge, 2002)

[16] F. Mignosi, L.Q. Zamboni, On the number of Arnoux-Rauzy words, Acta Arithmetica 101

(2002) 121–129

On (almost) rich words

Amy Glen

In recent years there has been growing interest in palindromes in the field of combi-natorics on words, especially since the work of A. de Luca [8] and also X. Droubayand G. Pirillo [10], who showed that the well-known Sturmian words are charac-terised by their palindromic complexity [1, 3, 5]. A strong motivation for the studyof palindromes, and in particular infinite words containing arbitrarily long palin-dromes, stems from applications to the modelling of quasicrystals in theoreticalphysics (see for instance [7, 17]) and to Diophantine approximation (e.g., see [14]).

In [9], X. Droubay, J. Justin, and G. Pirillo observed that any finite wordw of length |w| contains at most |w|+1 distinct palindromes (including the emptyword). Even further, they proved that a word w contains exactly |w| + 1 distinctpalindromes if and only if the longest palindromic suffix of any prefix p of woccurs exactly once in p (i.e., every prefix of w has Property Ju [9]). Such words


are ‘rich’ in palindromes in the sense that they contain the maximum number ofdifferent palindromic factors. Accordingly, we say that a finite word w is rich if itcontains exactly |w| + 1 distinct palindromes (or equivalently, if every prefix of whas Property Ju). Naturally, an infinite word is rich if all of its factors are rich. Inindependent work, P. Ambroz, C. Frougny, Z. Masakova, and E. Pelantova haveconsidered the same class of words which they call full words in [2], following theearlier work of S. Brlek, S. Hamel, M. Nivat, and C. Reutenauer in [5].

In [9], X. Droubay et al. also showed that the family of episturmianwords [9, 18], which includes the well-known Sturmian words, comprises a spe-cial class of rich infinite words. Specifically, they proved that if an infinite wordw is episturmian, then any factor u of w contains exactly |u| + 1 distinct palin-dromic factors. (See [4, 15, 19] for recent surveys on the theory of Sturmianand episturmian words.) Another special class of rich words consists of S. Fis-chler’s sequences with “abundant palindromic prefixes”, which were introducedand studied in [13] in relation to Diophantine approximation (see also [14]). Otherexamples of rich words have appeared in many different contexts; they includethe complementation-symmetric Rote sequences [1], certain words associated withβ-expansions where β is a simple Parry number [2, 6], and symbolic codings oftrajectories of symmetric interval exchange transformations [11, 12].

The first unified study of combinatorial and structural properties of richwords was carried out by myself and J. Justin, published in a joint paper withS. Widmer and L.Q. Zamboni who studied a wider class of words for which suc-cessive occurrences of any letter are separated by palindromes (called weakly richwords) – see [16].

In this talk, I will begin by giving a brief overview of some fundamentalproperties of rich words. In particular, I will show that rich words are charac-terised by the property that all complete returns to any palindromic factor arepalindromes. I will also give a more explicit description of periodic rich infinitewords. I will then discuss so-called almost rich words: they are infinite words forwhich only a finite number of prefixes do not satisfy Property Ju. Such words canalso be defined in terms of the defect of a finite word w, which is the differencebetween |w|+1 and the number of distinct palindromic factors of w (see the workof Brlek et al. in [5] where periodic infinite words with finite defect are charac-terised). With respect to this notion, rich words are those with defect 0 and almostrich words are infinite words with finite defect.

Lastly, I will consider the action of morphisms on (almost) rich words, withparticular interest in morphisms that preserve (almost) richness. We have shownthat such morphisms belong to the class of P -morphisms that was introduced byA. Hof, O. Knill, and B. Simon in [17] (see also the nice survey on palindromiccomplexity by J. Allouche et al. [1]), but it remains an open problem to char-acterise them. This is related to the following long-standing open question posedin [17]: are there (uniformly recurrent) infinite words containing arbitrarily longpalindromes that arise from primitive morphisms, none of which belongs to class


P? The answer is believed to be no. Up until now, it has only been shown to holdin the periodic case (see [1]) and also in the 2-letter case (see [20]).

References

[1] J.-P. Allouche, M. Baake, J. Cassaigne, D. Damanik, Palindrome complexity, Theoret. Com-put. Sci. 292 (2003) 9–31.

[2] P. Ambroz, C. Frougny, Z. Masakova, E. Pelantova, Palindromic complexity of infinite wordsassociated with simple Parry numbers, Ann. Inst. Fourier (Grenoble) 56 (2006) 2131–2160.

[3] P. Balazi, Z. Masakova, E. Pelantova, Factor versus palindromic complexity of uniformlyrecurrent infinite words, Theoret. Comput. Sci. 380 (2007) 266–275.

[4] J. Berstel, Sturmian and episturmian words (A survey of some recent results), in: Proceed-ings of CAI 2007, Lecture Notes in Computer Science, vol. 4728, 2007, pp. 23–47.

[5] S. Brlek, S. Hamel, M. Nivat, C. Reutenauer, On the palindromic complexity of infinitewords, Internat. J. Found. Comput. Sci. 15 (2004) 293–306.

[6] M. Bucci, A. De Luca, A. Glen, L.Q. Zamboni, A connection between palindromic and factorcomplexity using return words, Adv. in Appl. Math. 42 (2009) 60–74.

[7] D. Damanik, L.Q. Zamboni, Combinatorial properties of Arnoux-Rauzy subshifts and ap-plications to Schrodinger operators, Rev. Math. Phys. 15 (2003) 745–763.

[8] A. de Luca, Sturmian words: structure, combinatorics and their arithmetics, Theoret. Com-put. Sci. 183 (1997) 45–82.

[9] X. Droubay, J. Justin, G. Pirillo, Episturmian words and some constructions of de Luca andRauzy, Theoret. Comput. Sci. 255 (2001) 539–553.

[10] X. Droubay, G. Pirillo, Palindromes and Sturmian words, Theoret. Comput. Sci. 223 (1999)73–85.

[11] S. Ferenczi, L.Q. Zamboni, Language of k-interval exchange transformations, Bull. LondonMath. Soc. 40 (2008) 705–714.

[12] S. Ferenczi, L.Q. Zamboni, Structure of k-interval exchange transformations: induction,trajectories, and distance theorems, Preprint (2008), http://iml.univ-mrs.fr/~ferenczi/fz1.pdf

[13] S. Fischler, Palindromic prefixes and episturmian words, J. Combin. Theory Ser. A 113(2006) 1281–1304.

[14] S. Fischler, Palindromic prefixes and diophantine approximation, Monatsh. Math. 151 (2007)11–37.

[15] A. Glen, J. Justin, Episturmian words: a survey, Theoret. Inform. Appl. 43 (2009) 402–433.[16] A. Glen, J. Justin, S. Widmer, L.Q. Zamboni, Palindromic Richness, European J. Combin.

30 (2009) 510–531.[17] A. Hof, O. Knill, B. Simon, Singular continuous spectrum for palindromic Schrodinger

operators, Comm. Math. Phys. 174 (1995) 149–159.[18] J. Justin, G. Pirillo, Episturmian words and episturmian morphisms, Theoret. Comput. Sci.

276 (2002) 281–313.[19] M. Lothaire, Algebraic combinatorics on words, Encyclopedia of Mathematics and its Ap-

plications, vol. 90, Cambridge University Press, 2002.[20] B. Tan, Mirror substitutions and palindromic sequences, Theoret. Comput. Sci. 389 (2007)

118–124.


Weinbaum Factorizations

Tero Harju

(joint work with Volker Diekert, Dirk Nowotka)

C. M. Weinbaum [2] proved that for any primitive word w and a letter a occurringin w, there exists a conjugate of w that has a decomposition as w′ = uv such that(1) u ∈ aA∗ ∩A∗a but v /∈ aA∗ ∪A∗a, and (2) u and v have unique positions in was cyclic factors, i.e., there is exactly one conjugate of w having u as a prefix andonly one conjugate of w having v as a prefix.

Taken alone both conditions (1) and (2) are easily seen to be satisfied. Forinstance, given that w = baababbabaa, we find that the decompositions w′ =(aa)(babbabaab) and w′ = (aaba)(bbabaab) both satisfy (1) but not (2). On theother hand, the decomposition w′ = (abba)(baabaab) satisfies both conditions(1) and (2).

The following shows that the condition (2) is always satisfied.

Lemma 13. Let w = uv be a Lyndon word where v is the maximum suffix of w.Then u and v are uniquely positioned in w. Moreover, if v′ is a cyclic factor of wsuch that v ⊳ v′, then v ≤p v

′.

A simple proof of Weinbaum’s result is given in Diekert, Harju and Nowotka [1],where also the following generalization of the result are proven.

Let w be a primitive word, and f , g its factors. A factorization w′ = uv ofa conjugate of w is a Weinbaum factorization of w for f and g, if u and v areuniquely positioned in cyclic w and

u ∈ (fA∗ ∩ A∗f) \ (gA∗ ∪ A∗g),

v ∈ (gA∗ ∩A∗g) \ (fA∗ ∪ A∗f).

Note that if w′ = uv is a Weinbaum factorization of w in the original setting,then it is a Weinbaum factorization of w for a and v.

Let now w be a primitive word, f a proper factor of w, and define

G(f) =g | |fg| ≤ |w|, fgf is a cyclic factor of w2, fgf 6∈ A+fA+ ,R(f) =g ∈ G(f) | g does not occur in any other element of G(f), and

f and g do not intersect in w.A word f is called a Weinbaum factor of w, if R(f) 6= ∅.

Theorem 14 ([1]). Let w be a primitive word and let f be a Weinbaum factorof w with g ∈ R(f). Then w has a Weinbaum factorization for f and g.

Let then Ri denote the i-th iteration of the operation R. We can show thateither Ri(g) = Ri+2(g) or the set Ri+2(g) contains a word having length at leasttwice the length of a word in Ri(g). Hence, Ri(g) = Ri+2(g) for some i ≤ 2 log2(n).The bound can be improved so that we obtain the following result, where Φ =(1 +

√5)/2 is the golden ratio.


Theorem 15. Let w be a primitive word with a Weinbaum factor f , and letg ∈ R(f). If 2ℓ ≥ logΦ(n), then R2ℓ−1(g) 6= ∅, for every u ∈ R2ℓ−1(g), the setR(u) is a singleton; and for R(u) = v we obtain R(v) = u and a Weinbaumfactorization of w = uv for f and g.

References

[1] V. Diekert, T. Harju, and D. Nowotka, Weinbaum factorizations for primitive words, Izv.Vyssh. Uchebn. Zaved. Mat. (2010), no. 1, 21–33, In English: Russian Mathematics (Iz VUZ)54 (1), 16-25. MR 2664483

[2] C.M. Weinbaum, Unique subwords in nonperiodic words, Proc. Amer. Math. Soc. 109 (1990),no. 3, 615–619.

Periods and Unbordered Factors: The Ehrenfeucht-Silberger Problem

Stepan Holub

(joint work with Dirk Nowotka)

The period of a word w, denoted by π(w), is the length of the shortest word u suchthat w is a prefix of the infinite word uuu . . . . Obviously, such a shortest word uis primitive, that is, it is not a power of a shorter word. An extremal situation iswhen π(w) = |w|, where |w| denotes the length of w. In such a case, the word wis called unbordered, otherwise it is called bordered. The name is justified by thefact that a word w is bordered if and only if w = uvu for some nonempty word u,called a border of w. More generally, a border of w is any nonempty word u thatis both a prefix and a suffix of w, and u 6= w allowing a border to be longer thanthe half of the length of w. However, it is easy to conclude (see the picture below)that any bordered word has a border shorter than the half of its length. It is alsoobvious that the shortest border of a word is itself unbordered.

a b a b b a b a b b a b a b

z zz

y y

If w = uv, then w = vu = u−1wu is called a conjugate of w. Any primitiveword w has an unbordered conjugate w′; it is enough to consider a conjugate ofw that is a Lyndon word, defined as the minimal conjugate w.r.t. to a chosenlexicographic order ⊳. The proof of the fact that any Lyndon word is unborderedillustrates how elegant and efficient proofs can be when using the properties oflexicographic orders. Suppose that w = uvu is a Lyndon word w.r.t to ⊳. Bydefinition, the word w is primitive, whence uv 6= vu by the well known Periodicitylemma. If uv ⊳ vu, then also u(uv) ⊳ u(vu); if, on the other hand, vu ⊳ uv, then(vu)u ⊳ (uv)u; a contradiction in both cases.

The above property of Lyndon words in particular shows that any word uu,with u primitive, contains an unbordered factor of length |u|. It is also obviousthat any longer factor of uu is bordered. Therefore, we have τ(uu) = π(uu),


where τ(w) denotes the length of the longest unbordered factor of w. As alreadysuggested, the inequality τ(w) ≤ π(w) is obvious. Previous considerations alsoshow that τ(w) = π(w) as soon as |w| ≥ 2π(w). Moreover, the multiplicativeconstant 2 is optimal: for the word w = anban+1bban+1ban we have π(w) = 2n+6,τ(w) = 2n+ 5, and |w| = 2π(w)− 4.

It turns out that the question is much more difficult if we replace π(w) by τ(w)obtaining the following problem:

The Ehrenfeucht-Silberger Problem: What is the smallest number c suchthat |w| ≥ c τ(w) implies τ(w) = π(w)?

History. The problem was raised by Ehrenfeucht and Silberger in 1979 [3]. Theyconjectured that c = 2, which was falsified shortly thereafter by Assous andPouzet [1] by the following example:

w = anban+1banban+2banban+1ban

where n > 1 and τ(w) = 3n + 6 and π(w) = 4n + 7 and |w| = 7n + 10, thatis, τ(w) < π(w) and |w| = 7

3 τ(w) − 4 > 2τ(w). Assous and Pouzet in turnconjectured that c = 3. Duval [2] established in 1982 that c ≤ 4 and made aconjecture for a special case: if w possesses an unbordered prefix of length τ(w),then |w| ≥ 2τ(w) implies τ(w) = π(w). Duval’s conjecture was only solved in 2004[4] with a new proof given in [5]. The proof of Duval’s conjecture lowered the boundfor Ehrenfeucht and Silberger’s problem to c ≤ 3 reaching the value conjecturedby Assous and Pouzet. However, there remained a gap of 2

3 between that boundand the largest known example represented by the above Assous-Pouzet words. In2009, Holub and Nowotka [6] proved that the bound 7

3 is in fact optimal, solvingthe original Ehrenfeucht-Silberger problem.

The proof strategy. The proof has two main ideas. The first one is to factorize thestudied word w as

v′uzuv,

where |u| is the maximum length of an unbordered factor of w with at least twooccurrences in w, and |z| is the maximum distance between two such occurrences.In other words, uzu is the longest word with the longest possible shortest border.

The second idea of the proof is the concept of the α-critical suffix, which is bestexplained by the naive pattern matching algorithm. The algorithm scans a wordw looking for an occurrence of α. At each position, the scanning continues until amismatch with α is observed (or until α is found). When a mismatch occurs, thealgorithm backtracks and starts to scan the next possible position. The α-criticalsuffix of w is defined as the position of the last mismatch.

The critical suffixes combined with the above mentioned factorization are anefficient tool for finding long unbordered factors of w. The machinery is likely tobe useful even outside the Ehrenfeucht-Silberger problem. The factors found in theprocess induce seven constraints, which are shown to force either |w| ≤ 7

3 (τ(w)−1)or τ(w) = π(w).


Open problems and challenges. Although the multiplicative constant 73 is

optimal, there remains space for improvement of the additive constant, boundedby −4 due to the Assous-Pouzet words.

Open problem 1: Does |w| > 73τ(w) − 4 imply τ(w) = π(w)?

More generally:

Open problem 2: Find

infd | there is a word with |w| ≥ 73τ(w) − d and τ(w) < π(w).

The methods of the proof by Holub and Nowotka allow quite strong an insightinto the structure of words satisfying τ(w) < π(w) and |w| .= 7

3τ(w). The con-straints suggest that Assous-Pouzet words can be the only words achieving theextremal bound. Whence the following problem.

Open problem 3: Is it true that any word satisfying τ(w) < π(w) and |w| ≥73τ(w) − 4 is an Assout-Pouzet word?

Or, again more generally:

Open problem 4: For given δ, describe all words satisfying τ(w) < π(w) and|w| ≥ 7

3τ(w) − δ. What is the smallest δ such that there is more than one such aword for arbitrary fixed length?

References

[1] R. Assous and M. Pouzet, Une caracterisation des mots periodiques, Discrete Math. 25

(1979), no. 1, 1–5.

[2] J.-P. Duval, Relationship between the period of a finite word and the length of its unborderedsegments, Discrete Math. 40 (1982), no. 1, 31–44.

[3] A. Ehrenfeucht and D. M. Silberger, Periodicity and unbordered segments of words, DiscreteMath. 26 (1979), no. 2, 101–109.

[4] T. Harju and D. Nowotka, Periodicity and unbordered words: A proof of the extended Duvalconjecture, J. ACM 54 (2007), no. 4.

[5] S. Holub, A proof of the extended Duval’s conjecture, Theoret. Comput. Sci. 339 (2005),no. 1, 61–67.

[6] Stepan Holub and Dirk Nowotka, On the relation between periodicity and unbordered factorsof finite words, Int. J. Found. Comput. Sci. 21 (2010), no. 4, 633–645.

Independent Systems of Word Equations and Related Topics

Juhani Karhumaki

Theory of word equations is a fundamental topic in Combinatorics on Words. Inone hand it provides deep results – some jewels of discrete mathematics – and onthe other hand amazing simply formulated problems. As a challenging example Iurge the reader to conclude that the equation x2y3x2 = u2v3u2 has only periodicsolutions, i.e. in any solution all unknowns are powers of a common word.

In this abstract we consider one fundamental property of word equations and,in particular, open problems related to that. The property is so-called Ehrenfeuchtcompactness property, formulated as:


”Any system of word equations with a finite number of unknownsover a free monoid Σ∗ is equivalent to some of its finite subsys-tems.”

The property was formulated by A. Ehrenfeucht in 1970’s in slightly differentterms of formal languages. It was shown to hold by M. Albert and J. Lawrence [1]and simultaneously by G.S. Guba [6]. Actually, the result is a consequence of twowell known facts: the existence of embeddings of free monoids into multiplicativemonoids of integer matrices, and Hilbert’s Basis Theorem for polynomial ideals,see e.g. [7].

The compactness property immediately proposes a question

”What is the size of the above equivalent subsystem. Or moreconcretely can it be bounded by any function on the number ofunknowns?”

This is the fundamental problem we are discussing here.For clarity, we next recall the necessary terminology. Let Ξ be a finite set of

variables and Σ a finite alphabet. We denote by Σ∗ and Σ+ the free monoidand free semigroup generated by Σ, respectively. Now, an equation is a paire = (u, v) ∈ Ξ+ × Ξ+ usually written as u = v. A solution of equation in Σ∗ is amorphism h : Ξ∗ → Σ∗ satisfying h(u) = h(v). These notions extend in a naturalway to systems of equations.

Finally, we say that two systems of equations are equivalent if they possessexactly the same set of solutions, and a system is independent if it is not equivalentto any of its proper subsystems.

With these notions the compactness property can be reformulated:

”Each independent system of equations with a finite number ofunknowns over Σ∗ is finite.”

Accordingly, the second question, which is the main question in this presentation,asks:

”Is the cardinality of the maximal independent system of equationson n unknowns bounded by a function on n?”

Actually, this question can be asked separately for each value of n. Really amaz-ingly we do not know the answer even in the case n = 3! The case n = 2 istrivial.

Before we continue a few remarks are in order. Of course, the above questionscan be asked for any semigroup or monoid, and not only for free ones. And theanswer to the compactness question depends on the semigroup, see [8]. Particu-larly interesting is the case of commutative semigroups, which are in some sensecomplete opposites to free ones where no elements commute. For commutativemonoids the compactness property does hold, but even in the case of only oneunknown no bound for the size of the maximal independent system of equationsexists, that is there are arbitrarily large, but finite, such systems, see [10]. Thisalso implies indirectly that the known methods of using Hilbert’s Basis Theoremto prove the compactness property cannot give a solution to our main question.


We start the other remark with an example.

Example 16. Let Ξ = x, y, z and S the pair of equations S : xyz = zyx, xyyz =zyyx.

We leave it to the reader to figure out that this pair is independent. It alsohas a nonperiodic solution x = z = a and y = b. A question is can we add intoS a third equation with Ξ as the set of unknowns, such that it would still remainindependent and possess a nonperiodic solution

In this case this is not possible, but we have the following amazing open problemfrom [2]:

”Does there exist an independent system of three equations withthree unknowns such that it possesses a nonperiodic solution?”

This problem has been studied quite intensively, see [4], [9], [5] or [3]. However, itis only conjectured that the answer is ”no”. If this would be true it would followeasily that independent systems of three unknown equations are of cardinality atmost 3 – a solution to our main question in the case n = 3.

However, as an indication of our very poor knowledge on word equations, it isnot known whether the maximal size of independent systems of equations exist –even in case n = 3!

Nontrivial lower bounds for the size of maximal independent systems of equa-tions with n unknowns were given in [10]. One of the examples was as follows:

Example 17. Let Ξ = xi, yi, ui, wi, vi | i = 1, . . . , n and S the following set ofequations

S : xiujwkvjyi = yiujwkvjxi for i, j, k = 1, . . . , n.

Hence S contains 5n unknowns and is of cardinality n3. We claim that S isindependent over free semigroup Σ+. Let us fix i, j, k and denote the correspondingequation in S by S(i, j, k). Next we consider the morphism h defined by

h(xt) =

b2ab if t = i

a otherwiseh(up) =

ba if p = j

bab otherwiseh(wq) =

bab2 if q = k

b otherwise

h(yt) =

b if t = i

a otherwiseh(vp) =

ba if p = j

a otherwise

Straightforward calculations show that h is not a solution of the equation S(i, j, k),but is that of any other equation of S. This means that S is independent.

The conclusion of the above example is:

”The size of maximal independent system of equations with nunknowns is Ω(n3) in the free semigroup Σ+.”

Similarly, it is shown in [10] that in free monoids we can do a bit better:

”The size of maximal independent system of equations with nunknowns is Ω(n4) in the free monoid Σ∗.”


In [11] the hidden constants of the last result are slightly improved.The above asymptotic lower bound does not give anything for small values of n,

say n = 3. As we hinted in this case we know only that independent systems cancontain three equations, and that all such systems are finite! A simple example ofan independent three unknown system of equations is: x2 = y, y2 = z and z2 = x.

We conclude with a related problem. Let Sol(S) denote the set of all solutionsof a system S of equations. For a sequence s1, . . . , sm, we say that it is descendingchain of equations if

Solsi | i ≤ j * Solsi | i < j for all j = 1, . . . ,m.

This means that whenever a new equation is introduced the set of solutions de-creases. Therefore we are formalizing the question how many constrains for a setof n words we can introduce such that in each step the set of words satisfying theseconstrains becomes smaller.

An obvious connection between independent systems and descending chains ofequations is that any nonrepetitive sequence of equations from an independentsystem is a descending chain. Apart from this no interesting connections seem tobe known.

In fact, the results for descending chains are similar to independent systems.We state:

”Any descending chain of equations over Σ∗ or Σ+ is finite.”

The known asymptotic lower bounds for the maximal lengths of such chains arethose obtained for independent systems of equations. However, in the case of threeunknowns we can do better. As shown in [11] descending chain of length sevencan be constructed:

Example 18. The following sequence of seven equations provides a descendingchain. Here on the right a solution which is not anymore a solution of the nextone is shown.

xyz = zxy, x = a, y = b, z = abab

xyxzyz = zxzyxy, x = a, y = b, z = ab

xz = zx, x = a, y = b, z = 1

xy = yx, x = a, y = a, z = a

x = 1, x = 1, y = b, z = a

y = 1, x = 1, y = 1, z = a

z = 1, x = 1, y = 1, z = 1.

To conclude our knowledge on descending chains, we recall that all such chainsare finite, but we do not know whether they can be longer than seven.

References

[1] M. H. Albert and J. Lawrence, A proof of Ehrenfeucht’s conjecture, Theoret. Comput. Sci.41 (1985), no. 1, 121–123.


[2] Karel Culik, II and Juhani Karhumaki, Systems of equations over a free monoid and Ehren-feucht’s conjecture, Discrete Math. 43 (1983), no. 2-3, 139–153.

[3] Elena Czeizler, Stepan Holub, Juhani Karhumaki, and Markku Laine, Intricacies of simpleword equations: an example, Internat. J. Found. Comput. Sci. 18 (2007), no. 6, 1167–1175.

[4] Elena Czeizler and Juhani Karhumaki, On non-periodic solutions of independent systemsof word equations over three unknowns, Internat. J. Found. Comput. Sci. 18 (2007), no. 4,873–897.

[5] Elena Czeizler and Wojciech Plandowski, On systems of word equations over three unknownswith at most six occurrences of one of the unknowns, Theoret. Comput. Sci. 410 (2009),no. 30-32, 2889–2909.

[6] V. S. Guba, Equivalence of infinite systems of equations in free groups and semigroups tofinite subsystems, Mat. Zametki 40 (1986), no. 3, 321–324.

[7] Tero Harju and Juhani Karhumaki, Morphisms, Handbook of Formal Languages (GrzegorzRozenberg and Arto Salomaa, eds.), vol. 1, Springer-Verlag, 1997, pp. 439–510.

[8] Tero Harju, Juhani Karhumaki, and Wojciech Plandowski, Independent systems of equa-tions, Algebraic Combinatorics on Words (M. Lothaire, ed.), Cambridge University Press,2002, pp. 443–472.

[9] Tero Harju and Dirk Nowotka, On the independence of equations in three variables, Theoret.Comput. Sci. 307 (2003), no. 1, 139–172.

[10] Juhani Karhumaki and Wojciech Plandowski, On the size of independent systems of equa-tions in semigroups, Theoret. Comput. Sci. 168 (1996), no. 1, 105–119.

[11] Juhani Karhumaki and Aleksi Saarela, On maximal chains of systems of equations, Manu-script.

Word periods under involution

Dirk Nowotka

(joint work with Bastian Bischoff)

This talk addresses questions about unbordered words, local and global periodsgeneralized by considering involutions. Apart from general interest, this topicdraws its motivation from combinatorial questions in computational biology andDNA computing. The complementation of a single stranded DNA, understood asa word over the alphabet A,C,G, T , constitutes what we call an antimorphicinvolution where A is mapped to T and C to G and the order of letters is reversed;see also the work of Lila Kari et.al. [2, 5, 6]. The presented material results fromwork in progress.

Let A denote an alphabet and A∗ the free monoid of all finite words overA. Let w ∈ A∗ denote a word over A. Let |w| denote the length of w. Letθ denote an involution on A∗, that is, θ(θ(w)) = w. Then θ is called morphic,if θ(uv) = θ(u)θ(v), and antimorphic, if θ(uv) = θ(v)θ(u). A word w is calledprimitive, if w = ui implies i = 1 for any u ∈ A∗. The primitive root of w is theshortest word u such that w = ui for some i ≥ 1. A word w is called θ-primitive,if w ∈ u, θ(u)i implies i = 1 for any u ∈ A∗. The θ-primitive root of w is theshortest word u such that w ∈ u, θ(u)i for some i ≥ 1. If w is a prefix of uω

for some u then |u| is called (global) period of w. Let Π(w) denote the set of allperiods of w, and let π(w) denote the shortest period of w.


The notion of periodicity under involution can be defined in several ways. Ifw is a prefix of u, θ(u)ω for some u then |u| is called θ-period of w. Let Πθ(w)denote the set of all θ-periods of w, and let πθ(w) denote the shortest θ-period ofw. If w is a prefix of (u θ(u))ω for some u then |u| is called alternating θ-periodof w. Let Πalt

θ (w) denote the set of all alternating θ-periods of w, and let πaltθ (w)

denote the shortest alternating θ-period of w.Another θ-generalization of the notion of periodicity is the following. We say

that a natural p is called weak θ-period of w, if w[i] = w[i+p] or w[i] = θ(w[i+p])

for all 1 ≤ i ≤ |w|− p where w[j] denotes the jth letter of w. Let Πweakθ (w) denote

the set of all weak θ-periods of w. However, the following result shows that weakθ-periods do not seem to imply anything different than ordinary periods.

Theorem 19. Let θ be an involution on A∗ and w ∈ A∗. Let ψ be a suitablesubstitution for θ. Then Πweak

θ (w) = Π(ψ(w)).

A suitable substitution ψ for θ is a substitution with the following properties: LetA′ be an alphabet such that (1) a ∈ A′ or θ(a) ∈ A′ for all a ∈ A and (2) a ∈ A′

and a 6= θ(a) implies θ(a) 6∈ A′. Then ψ is a substitution where ψ(a) = a, ifa ∈ A′, and ψ(a) = θ(a), if a 6∈ A′. We consider only (alternating) θ-periods inthe following.

A very natural question regarding periods of words is the effects caused byoverlaps. The classical result by Fine and Wilf [4] considers periods without invo-lutions.

Theorem 20 ([4]). Let p, q ∈ Π(w) for some word w. If |w| ≥ p+ q − gcdp, qthen gcdp, q ∈ Π(w).

Czeizler et.al. consider in [2] the general antimorphic case and state:

Theorem 21 ([2]). Let θ be an anti-morphic involution on A∗ and w ∈ A∗ andp, q ∈ Πθ(w) where p > q. Let u and v be prefixes of w of length p and q,respectively. If |w| ≥ 2p + q − gcdp, q then u and v have the same θ-primitiveroot, that is they have a common period not longer than q.

We add the following for the morphic case.

Theorem 22. Let θ be a morphic involution on A∗ and w ∈ A∗.If |w| ≥ p+ q − gcdp, q with p, q ∈ Πθ(w) then gcdp, q ∈ Πθ(w).If |w| ≥ p+ q with p, q ∈ Πalt

θ (w) then gcdp, q ∈ Πaltθ (w).

All bounds given so far are tight. The only open case is the one for alternatingθ-periods where θ is antimorphic.

Theorem 23. Let θ be an antimorphic involution on A∗ and w ∈ A∗.If |w| ≥ p+ q with p, q ∈ Πalt

θ (w) then gcdp, q ∈ Πaltθ (w).

This bound however could not shown to be tight. We conjecture that the alter-nating antimorphic case has actually the bound |w| ≥ p + q − gcdp, q. Thisconjecture is still open.


The Critical Factorization theorem (CFT) is fundamental in the investigationof local periods. Let u ∼p v denote the fact that either u is a prefix of v or viceversa. Similarly, we write u ∼s v, if u is a suffix of v or vice versa. Consider afactorization w = uv, then x is called a repetition word for this factorization of w,if u ∼s x and v ∼p x. The length of x is called local period for the factorization uv.The smallest local period is denoted by π(u, v). It is straightforward to see thatπ(u, v) ≤ π(w). A factorization is called critical if π(u, v) = π(w). The criticalfactorization theorem was developed in several papers, see [7, 1, 3], and can bestated as follows.

Theorem 24 (CFT). Among any π(w) many consecutive factorizations uv ofa word w exists at least one that is critical, that is, where π(u, v) = π(w).

It is a natural generalization to consider local periods under an involution as wedid for the global periods. Given a factorization uv of w, we call x a θ-repetitionword, if u ∼s x and v ∼p θ(x). The length of x is called local θ-period for thefactorization uv. The smallest local θ-period is denoted by πθ(u, v). However, thisnotion does not yield a structural property like the CFT, neither when πθ(u, v)is related to the ordinary nor the θ-period of w, and neither for the morphic norantimorphic case, and neither for the alternating nor non-alternating case. Let thefollowing propositions exemplify this for the case of alternating θ-periods where θis morphic.

Proposition 25. Let |A| ≥ 3 and θ be a morphic involution (not the identity)on A∗. Then there exists for all p ≥ 1 a word w such that p = πalt

θ (w) andp = πθ(u, v) for all factorizations uv of w.

Proposition 26. Let θ be a morphic involution (not the identity) on A∗. Thenthere exists for all p 6= 3 a word w such that p = πalt

θ (w) and πθ(u, v) ∈ 1, 2 forall factorizations uv of w.

The inability to directly transfer a strong result on local periods to the caseof local θ-periods suggests a deeper investigation of the local θ-periodic structureof words. Unbordered factors are an obvious subject of interest here. A word isbordered if there exists a proper prefix that is also a suffix. Otherwise, a word iscalled unbordered. Let τ(w) denote the maximal length of unbordered factors in w.The following are straightforward observations: τ(w) ≤ π(w), the shortest borderof a bordered word is unbordered, a shortest local repetition word is unbordered.Moreover, a short argument using Lyndon words establishes that, if |w| ≥ 2π(w)−1then τ(w) = π(w). A more involved proof shows that, if |w| ≥ 7/3 τ(w) thenτ(w) = π(w) (see the abstract by Stepan Holub on page 2213). How does thatproperty of unbordered factors relate to the θ-bordered case? A word w is calledθ-bordered, if w has a proper prefix u such that θ(u) is a suffix of w. Otherwise, wis called θ-unbordered. Consider the morphic case and alternating θ-periods. Thefollowing sequence of words has a ration of word length to the maximal length ofunbordered factors that approaches 3 and in the limit yet τθ(wi) 6= πalt

θ (w) thereby


showing that the 7/3 τθ(w) bound does not hold. Let

wi = (ab)iabb(ab)iaab(ab)ia

for any i ≥ 2, and let θ(a) = b and θ(b) = a. Then |wi| = 6(i + 1) + 1 andτθ(wi) = 2i+ 4 and πalt

θ (wi) = 4i+ 5. We have |wi| ≥ 7/3 τθ(wi) for all i ≥ 2 andlimi→∞ |wi|/τθ(wi) = 3. We conjecture that this example is the best possible.

Conjecture 27. Let θ be a morphic involution on A∗ and w ∈ A∗.If |w| ≥ 3τθ(w) then τθ(w) = πalt

θ (w).

References

[1] Y. Cesari and M. Vincent, Une caracterisation des mots periodiques, C. R. Acad. Sci. ParisSer. A 286 (1978), 1175–1177.

[2] E. Czeizler, L. Kari, and S. Seki, On a special class of primitive words, Theoret. Comput.Sci. 411 (2010), no. 3, 617–630.

[3] J.-P. Duval, Periodes et repetitions des mots du monoıde libre, Theoret. Comput. Sci. 9

(1979), no. 1, 17–26.[4] N. J. Fine and H. S. Wilf, Uniqueness theorem for periodic functions, Proc. Amer. Math.

Soc. 16 (1965), 109–114.[5] L. Kari and K. Mahalingam, Involutively bordered words, Internat. J. Found. Comput. Sci.

18 (2007), no. 5, 1089–1106.[6] L. Kari and K. Mahalingam, Watson–Crick palindromes in DNA computing, Natural Com-

puting 9 (2010), 297–316.[7] M.-P. Schutzenberger, A property of finitely generated submonoids of free monoids, Algebraic

theory of semigroups (Proc. Sixth Algebraic Conf., Szeged, 1976) (Amsterdam), Colloq. Math.Soc. Janos Bolyai, vol. 20, North-Holland, 1979, pp. 545–576.

Repetition-free colorings of trees

Pascal Ochem

The notion of square-freeness of words naturally extends to colored graphs. A fac-tor of a colored graph is given by the sequence of colors on a non-intersectingpath. A coloring of a graph is non-repetitive if and only if none of these factorsis a square. The non-repetitive chromatic number of a graph class is the smallestinteger k such that every graph in the class has a non-repetitive coloring using atmost k colors.The famous result of Thue [7] that there exists an infinite square-free word over a3-letter alphabet is equivalent to the statement that the non-repetitive chromaticnumber of paths is 3. A major open question in this area is whether the non-repetitive chromatic number of planar graphs is bounded [4]. It is known tobe O(∆2) for graphs with maximum degree ∆ [2] and at most 4t for graphs withtreewidth k [5]. So planar graphs form an interesting class to study in this respect,since it is a small and natural class such that both the maximum degree and thetreewidth are unbounded. The non-repetitive chromatic number of planar graphsis at least 10, and this lower bound already holds for the weaker star coloring [1],such that the only forbidden squares are aa and abab for any letters a and b.


In this talk, I focus on trees, which form an intermediate class between pathsand planar graphs. First, a negative result about the list version of non-repetitivecoloring. Given an integer ℓ and a list assignment on the vertices such that eachlist contains exactly ℓ colors, we wonder whether we can obtain a non-repetitivecoloring of the graph such that the color of a vertex is chosen from its associatedlist.

Theorem 28 ([3]). For every integer ℓ, there exists a tree T and a list assignmentL with lists of size ℓ, such that every coloring of T obtained from L containsa square.

Theorem 28 implies that some classical methods will fail to prove that the non-repetitive chromatic number of planar graphs is bounded: those methods thatproduce the same upper bound for both the ”list” and the ”non-list” version ofthe studied coloring.

In the case of words or paths, we do not only know that the non-repetitivechromatic number is 3. Now that Dejean’s conjecture is proved, we know therepetition threshold of words over a k letter alphabet for every k ≥ 2. For aninteger k and graph class C, we similarly define the repetition threshold RT (k, C).The next result gives the repetition thresholds for the class of trees T .

Theorem 29 ([6]).

(1) RT (2, T ) = 72

(2) RT (3, T ) = 52

(3) RT (k, T ) = 32 , for k ≥ 4.

References

[1] M.O. Albertson, G.G. Chappell, H. A. Kierstead, A. Kndgen, and R. Ramamurthi. Coloringwith no 2-Colored P4’s, Electron. J. Comb. 11 (2004), #R26

[2] N. Alon, J. Grytczuk, M. Ha luszczak, and O. Riordan. Non-repetitive colourings of graphs,Random Struct. Alg. 21 (2002), 336-346.

[3] F. Fiorenzi, P. Ochem, P. Ossona de Mendez, and X. Zhu. Thue choosability of trees sub-mitted

[4] J. Grytczuk. Nonrepetitive colorings of graphs - a survey, Int. J. Math. Math. Sci. (2007),Art. ID 74639, 10 pp.

[5] A. Kundgen and M. J. Pelsmajer. Nonrepetitive colourings of graphs of bounded treewidth,Discrete Math. 308(19) (2008), Simonovits ’06, pp. 4473–4478.

[6] P. Ochem and E. Vaslet. Repetition thresholds for subdivided graphs and trees, Journeesmontoises 2010.

[7] A. Thue, Uber unendliche Zeichenreichen, Norske Vid. Selsk. Skr., I Mat. Nat. Kl., Chris-tiania, 7 (1906), 1-22.


On the Minimal Uncompletable Word Problem

Elena V. Pribavkina

A finite set S of (finite) words over an alphabet Σ is said to be complete if Fact(S∗),the set of factors of S∗, is equal to Σ∗, that is, if every word of Σ∗ is a factor of, orcan be completed by multiplication on the left and on the right as, a word of S∗.If S is not complete, Σ∗ \Fact(S∗) is not empty and a word in this set of minimallength is called a minimal uncompletable word (with respect to the non-completeset S).

Here we state some open questions related to the notion of a non-complete setand give a brief overview of partial results obtained so far towards solving thesequestions. The first natural question related to complete sets is the following:

Question 30. Is it decidable whether a given set S is complete?

To answer this question we can associate with the set S a non-deterministic

finite-state automaton F (S) recognizing S∗ in such a way that testing the propertyof completeness of the set S is equivalent to testing the synchronizability propertyof the associated automaton. For more details on this question see the paper [2].Once Question 30 is solved, another natural question is whether there is an efficientalgorithm of testing Fact(S∗) = Σ∗. In other terms,

Question 31. What is the computational complexity of testing Fact(S∗) = Σ∗?

Some results related to Question 31 were obtained by Rampersad, Shallit andXu in [3]. Given a language L they studied the computational complexity ofproblems Pref(L) = Σ∗, Suff(L) = Σ∗ and Fact(L) = Σ∗. They showed thatif L is given by a DFA M , then testing Pref(L(M)) = Σ∗ can be performed inlinear time, the decision problem Suff(L(M)) = Σ∗ is PSPACE-complete and theproblem Fact(L(M)) = Σ∗ is solvable in polynomial time. In contrast, if thelanguage L is given by an NFA, all three problems become PSPACE-complete. In[3] it is also proved, that in particular case L = S∗ one can test Pref(S∗) = Σ∗

and Suff(S∗) = Σ∗ in linear time, while the computational complexity of testingFact(S∗) = Σ∗ is still unknown.

Another rather natural question that might give some hint on solving the Ques-tion 31 is about the possible length of words that cannot be completed in S.Formally we state it as follows:

Question 32. Given a non-complete set S, what is the minimal length uwl(S) ofwords in Σ∗ \ Fact(S∗)?

From the connection between the set S and synchronizability property of the

associated automaton F (S) one can deduce an exponential upper bound on thevalue uwl(S):

uwl(S) ≤ 2‖S‖−m+1,

where m is the number of elements in S and ‖S‖ is the size of S, i.e. the sum oflengths of all elements in S. However this bound is not likely to be precise.


The Question 32 was introduced by Restivo in 1981. In his paper [4] he con-jectured that a non-complete set S always possesses an uncompletable word wof length at most 2k2, where k is the maximal length of words in S, and w isof the form w = uv1uv2 · · ·uvk−1u, where u /∈ S, |u| = k and |vi| ≤ k for alli = 1, 2, . . . , k − 1. This conjecture is appeared to be false by means of a coun-terexample found in [1]. Namely, let k > 6 and let

Sk = Σk \ ak−2bb ∪ Σbak−4Σ ∪ Σba ∪ b4 ∪ Jk

where Jk =⋃k−3

i=1 (baiΣ ∪ aib). In [1] the authors computed for 7 ≤ k ≤ 12 that

the word

w = (ak−2bb)ak−1(ak−2bb)bak−4((ak−2bb)ba(ak−2bb)bbak−5)k−6

(ak−2bb)ab(ak−2bb)bbak−3(ak−2bb)bak−3(ak−2bb)

is a minimal uncompletable word for Sk, and its length is |w| = 3k2 − 9k + 1.Nevertheless it is not proved that such a word is a minimal uncompletable (or evenuncompletable) for each k > 6. Thus the possible directions of the future worktowards solving the Question 32 are the following:

• show that the word w is a minimal uncompletable word for the set Sk foreach k > 6;

• find another infinite series of non-complete sets having minimal uncom-pletable words of length greater that 2k2;

• perform a series of computational experiments to find non-complete setsamong all such sets with a fixed parameter k having the maximal possiblelength of minimal uncompletable words;

• give some (polynomial, if possible) upper bound on the length of minimaluncompletable word in terms of k.

References

[1] G. Fici, E. Pribavkina, J. Sakarovitch. On the Minimal Uncompletable Word Problem,CoRR, http://arxiv.org/abs/1002.1928, 2010.

[2] E. V. Pribavkina. Slowly synchronizing automata with zero and incomplete sets, CoRR,http://arxiv.org/abs/0907.4576, 2009.

[3] N. Rampersad, J. Shallit, Z. Xu The computational complexity of universalityproblems for prefxes, suffixes, factors, and subwords of regular languages, CoRR,http://arxiv.org/abs/0907.0159, 2009.

[4] A. Restivo. Some remarks on complete subsets of a free monoid, Quaderni de ”La ricercascientifica”, CNR Roma 109 (1981) 19–25.


Ambiguity in a certain context-free grammar

Eric Rowland

(joint work with Bobbe Cooper, Doron Zeilberger)

Let G be the context-free grammar with start symbols 0, 1, 2 and formation rules0 → 12, 0 → 21, 1 → 02, 1 → 20, 2 → 01, 2 → 10. An n-leaf tree T parses alength-n word w on 0, 1, 2 if T is a valid derivation tree for w under the grammarG; that is, there is a labeling of the vertices of T compatible with the formationrules such that the leaves of T , from left to right, are labeled with the letters ofw. For example, the tree

parses the word 0110212:

1

0 2

1 0

2

1 0

1

0

2 1

2

The grammar G is ambiguous — there exist distinct trees that parse the sameword; for example, the trees

both parse 010. However, something much stronger can be said about this gram-mar.

Theorem 33. Let n ≥ 1, and let T1 and T2 be n-leaf binary trees. Then T1 andT2 parse a common word under G.

Kauffman [4] proved this theorem by showing that it is equivalent to the fourcolor theorem — the statement that every planar graph is four-colorable. The fourcolor theorem was proved by Appel, Haken, and Koch [1, 2] and employed a largecase analysis carried out by machine. The hope of the present authors [3] is thata direct proof will be shorter than the known proofs of the four color theorem,thereby providing a shorter proof of the four color theorem. In this direction, weenumerate the common parse words for some infinite families of tree pairs anddiscuss ways to reduce the problem of finding a parse word for a pair of trees tothat for a smaller pair.

Let ParseWords(T1, T2) be the set of equivalence classes, under permutationsof the alphabet 0, 1, 2, of words parsed by both trees T1 and T2. We will abusenotation slightly by writing a representative of each equivalence class.


A path tree is a binary tree with at most two vertices in each level. The 5-leafpath trees are as follows.

lll llr lrl lrr rll rlr rrl rrr

The illustrated bijection between n-leaf path trees and length-(n− 2) words onl, r can be used to define families of trees. For example, let LeftCombTree(n) bethe n-leaf path tree corresponding to the word ln−2, and let RightCombTree(n)be the n-leaf path tree corresponding to rn−2. There is only one equivalence classof words parsed by both a left comb tree and a right comb tree.

Theorem 34. ParseWords(LeftCombTree(n),RightCombTree(n)) =

01n−22

if n ≥ 2 is even01n−20

if n ≥ 3 is odd.

Similar results for other pairs of one-parameter families of trees can be estab-lished.

A simple two-parameter family is the (m + n)-leaf path tree corresponding tolmrn−2, which we call LeftTurnTree(m,n). Let RightTurnTree(m,n) be the left–right reflection of LeftTurnTree(m,n) — the tree corresponding to rmln−2. Thenext three theorems collectively determine the number of parse words of LeftTurn-Tree(m,n) and RightTurnTree(k,m+ n− k).

Theorem 35. For m ≥ 1, k ≥ 1, and max(2, k −m+ 2) ≤ n ≤ k,

|ParseWords(LeftTurnTree(m,n),RightTurnTree(k,m+ n− k))| = 1.

Let

a(m, k) = |ParseWords(LeftTurnTree(m, k + 1),RightTurnTree(k,m+ 1))|.By considering the left–right reflections of these two trees, we see that a(m, k) =a(k,m).

Theorem 36. For m ≥ 1 and k ≥ 1,

a(m+ 3, k)− 2a(m+ 2, k)− a(m+ 1, k) + 2a(m, k) = 0.

This recurrence can be written

(M − 2)(M − 1)(M + 1) a(m, k) = 0,

where M is the forward shift operator in m, so for fixed k the solution a(m, k)is a linear combination of 2m, 1, (−1)m. Unfortunately, we do not know a simplecombinatorial proof of the recurrence.

Theorem 37. For m ≥ 1, k ≥ 1, and n ≥ k + 2,

|ParseWords(LeftTurnTree(m,n),RightTurnTree(k,m+ n− k))| = 2a(m, k).


In addition to enumerating parse words, we are interested in pursuing existenceresults. There are several ways to reduce the problem of finding a parse word fora pair of trees to finding parse words for smaller pairs. Here we describe two —one easily provable and one conjectural.

If T1 and T2 are n-leaf trees that have a common branch system in the sameposition, then we can decompose the pair into two smaller pairs. For example, the8-leaf trees

T1 = T2 =

share the branch system

S =

in the second through fifth leaves, which we may remove to obtain the 5-leaf trees

.

Given a common parse word w1w2w3w4w5 of this pair of 5-leaf trees, we can find acommon parse word of the original pair of 8-leaf trees by taking any valid labelingof S and permuting the alphabet so that the root receives the label w2.

More generally, to decompose a pair of trees we only require a vertex in T1with dangling subtree S1 and a vertex in T2 with dangling subtree S2 such thatsame set of leaves appears in S1 and S2. A parse word for T1 and T2 can then beconstructed from a parse word for S1 and S2 and a parse word for the choppedtrees T ′

1 and T ′2.

A pair of n-leaf trees T1 and T2 is mutually crooked if there is no pair of consec-utive leaves that have an uncle–nephew relationship in both trees. For example,

are mutually crooked, while the following are not.

We conjecture that a pair of trees that is not mutually crooked has a commonparse word in which the uncle and nephew leaves receive the same label. There isreason to believe that finding this parse word is reducible to finding a parse wordfor the pair obtained by deleting the structure holding one of the two leaves.


References

[1] Kenneth Appel and Wolfgang Haken, Every planar map is four colorable I: discharging,Illinois Journal of Mathematics 21 (1977) 429–490.

[2] Kenneth Appel, Wolfgang Haken, and John Koch, Every planar map is four colorable II:reducibility, Illinois Journal of Mathematics 21 (1977) 491–567.

[3] Bobbe Cooper, Eric Rowland, and Doron Zeilberger, Toward a language theoretic proof ofthe four color theorem, available at http://arxiv.org/abs/1006.1324.

[4] Louis Kauffman, Map coloring and the vector cross product, Journal of Combinatorial The-ory, Series B 48 (1990) 145–154.

The enumeration of squares and runs in the Fibonacci words revisited

Kalle Saari

The problem of enumerating repetitions of different types in finite words is a re-current research topic in combinatorics on words. Here are three seminal results inthis area: Crochemore [1] showed that the number of primitively rooted squares,counting multiplicities, in a word of length n is in O(n log n); Fraenkel and Simp-son [2] showed that a word of length n contains no more than 2n distinct squares;Kolpakov and Kucherov [4] showed that the number of runs, counting multiplic-ities, in a word is linearly bounded. For more information about repetitions inwords and resent results, consult the surveys [5, Ch. 8] and [6, Ch. 8] and thewebsite [7].

The exact number of repetitions in interesting classes of words is naturally alsoof interest. Let us recall that Fibonacci words fn for n ≥ 0 are defined by

f0 = 0, f1 = 01, fn = fn−1fn−2 (n ≥ 2);

their connection to Fibonacci numbers F0 = 0, F1 = 1, . . . is that the length of fnequals Fn+2.

The number of squares and runs in Fibonacci words were worked out by Fraenkeland Simpson [3] and Kolpakov and Kucherov [4], respectively. The number ofdistinct squares in fn is given by the formula 2(Fn − 1) for all n ≥ 4; the numberof runs in fn, counting multiplicities, is precisely 2Fn − 3, and this holds for alln ≥ 3.

The purpose of this abstract is to outline a novel way for enumerating squaresand runs, both distinct and with multiplicities, in finite Fibonacci words. Ourtechnique uses properties of central and singular factors of the infinite Fibonacciword limn→∞ fn. Central factors, denoted by pn, are obtained from fn by erasingthe last two letters; singular factors, denoted by cn, are obtained from fn byremoving the last letter and appending its complement to the beginning. Thus,for all n ≥ 1, we have fn = pnab and cn = apna, where ab ∈ 01, 10. It turns outthat runs occurring within a central factor pn coincide with occurrences of factorsof the form ck+1ckck+1 and ckck+1ck. Also, the runs that occur as prefixes orsuffixes of pn are the central factors p4, p5, . . . , pn. Furthermore, each occurrenceof ck+1 in pn extends to an occurrence of ckck+1ck. Finally, it can be shown that


the number of occurrences of a singular factor ck in a central factor pn equalsFn−k − 1.

Using these observations, counting the number of squares and runs in a centralfactor pn is straightforward, and extending these formulas for fn is easy. In par-ticular, it can be verified that the number of squares with multiplicities in fn isgiven by

R(n) :=2

5(n− 5)(Fn+2 + Fn) +

4

5Fn + n+ 2 (n ≥ 3).

A formula for R(n) was first derived by Fraenkel and Simpson in [3], but theirformula has a misprint: it differs from ours by 3Fn.

References

[1] Maxime Crochemore, An optimal algorithm for computing the repetitions in a word, Inf.Process. Lett. 12 (1981), no. 5, 244–250.

[2] Aviezri S. Fraenkel and Jamie Simpson, How many squares can a string contain?, J. Comb.Theory, Ser. A 82 (1998), no. 1, 112–120.

[3] Aviezri S. Fraenkel and Jamie Simpson, The exact number of squares in Fibonacci words,Theor. Comput. Sci. 218 (1999), no. 1, 95–106.

[4] Roman M. Kolpakov and Gregory Kucherov, On maximal repetitions in words, FCT (GabrielCiobanu and Gheorghe Paun, eds.), Lecture Notes in Computer Science, vol. 1684, Springer,1999, pp. 374–385.

[5] M. Lothaire, Algebraic combinatorics on words, Encyclopedia of Mathematics and its Appli-

cations, vol. 90, Cambridge University Press, Cambridge, 2002.[6] M. Lothaire, Applied combinatorics on words, Encyclopedia of Mathematics and its Appli-

cations, vol. 105, Cambridge University Press, Cambridge, 2005.[7] L. Tinta M. Crochemore, L. Ilie, The “runs” conjecture, http://www.csd.uwo.ca/faculty/

ilie/runs.html, September 2010.

On Patterns and Pattern Avoidance

Jeffrey Shallit

This talk centered around four themes, all concerned with patterns and patternavoidance: (1) the complexity of matching words specified by abstract machines topatterns; (2) some aspects of the Thue-Morse word; (3) avoiding powers overN, thenatural numbers; and (4) the Pirillo-Varicchio-Halbeisen-Hungerbuhler problem.

1. Complexity of pattern matching

Let Σ be an alphabet, i.e., a nonempty, finite set of symbols. By Σ∗ we denotethe set of all finite words over Σ, and by ǫ, the empty word. If w = xyz, then y issaid to be a factor of w.

A pattern is a non-empty word p over a pattern alphabet ∆. The letters of ∆ arecalled variables. A morphism is a map h : Σ∗ → ∆∗ such that h(xy) = h(x)h(y)for all x, y ∈ Σ; a morphism is non-erasing if h(a) 6= ǫ for all a ∈ Σ. A wordw ∈ Σ∗ matches a pattern p if there exists a non-erasing morphism h : ∆∗ → Σ∗

such that h(p) = w. For example, the word w = oberwolfachmatches the patternxyxz.


Some patterns play special roles in combinatorics on words. For example, wordsmatching the pattern xx are called squares. An example in German is nennen,and in French is chercher. Words matching xxx are called cubes; an example isthe English sort-of-word shshsh.

An overlap is a word of the form axaxa where a is a single symbol and x is a(possibly empty) word. No single pattern captures the notion of overlap, but itis easy to see that a word w has a factor that is an overlap iff some factor of wmatches either of the two patterns xxx and xyxyx.

Much study has been devoted to fractional powers. We say a word x is an exactpq -power, for integers p > q ≥ 1, if x can be written in the form yey′ for some

e ≥ 1, and y′ is a prefix of y, and |x| = p, |y| = q. For example, the German wordschematische is a 12

8 -power. If x has a factor that is an exact pq −power, for some

pq ≥ α, we say that x has an occurrence of an α-power.

This suggests the following:

Problem 38. Let α > 1 be a real number that is not an integer. Is there any setof patterns P , finite or infinite, such that x has an occurrence of an α-power iffsome factor of x matches some pattern p ∈ P?

Remark. After my talk both James Currie and Julien Cassaigne suggested waysto show this is not possible, at least for some restricted ranges of α. But somefurther work is still needed to cover all α > 1.

The complexity of pattern matching is an old problem, going back to funda-mental results of Ehrenfeucht and Rozenberg [9] and (independently) Angluin [4].They showed that the problem of determining if an arbitrary word w matches anarbitrary pattern p is NP-complete.

Recently I (and co-authors) have explored the computational complexity ofother kinds of pattern matching, where we replace a word by a set of words specifiedby some abstract machine M (e.g., DFA, NFA, or PDA). For example, Andersonet al. [3] showed that the problem of deciding if some word in L(M) matches agiven pattern p is PSPACE-complete, if M is an NFA. Furthermore, the problemremains PSPACE-complete even the NFA is deterministic and the pattern p isrestricted to belong the class of patterns xx, xxx, xxxx, . . .. The idea behindthe proof is very simple. Given NFA’s M1,M2, . . .Mn, we construct an NFAaccepting L = L(M1)# · · ·#L(Mn)#. Then a word of L matches xn if and only ifx ∈ L(M1) ∩ · · · ∩ L(Mn). This shows PSPACE-hardness. It is slightly harderto verify that the problem is in PSPACE. Here the idea is to show that if a matchoccurs, it cannot be too larger. One way to show membership in PSPACE is toguess, for each variable xi in the pattern, a boolean matrix Bi that specifies thetransition table in the automaton induced by xi. Then we simply need to verifythat indeed each Bi has a corresponding word, and multiply the matrices together.

On the other hand, if M is a PDA (or CFG), then the problem becomes unde-cidable, even if p is the single fixed pattern xx. The idea here is to reduce fromthe Post correspondence problem.


It seems reasonable to guess that at least some of this problem’s complexitycomes from the fact that L(M) can be infinite. So we also explored finite analoguesof the problem. Here we are given a DFA or NFA M accepting a finite language,and we ask if some word of L(M) matches a given pattern p. In this case werecently showed [13] that the problem is NP-complete.

In the case thatM is a PDA (or CFG), the finite problem is PSPACE-complete,even in the case where p = ww. Again, the hard part is showing membership inPSPACE. The idea is to guess the configuration C after reading the first copy ofw. This configuration consists of a state and the contents of the stack, so we needto make sure that the stack contents cannot be arbitrarily large.

2. Around the Thue-Morse sequence

The Thue-Morse sequence t = t0t1t2 · · · = 0110100110010110 · · · is a familiarobject in combinatorics on words. For one thing, it avoids overlaps. RecentlyI showed (with co-authors) that t is fragile with respect to this property. Moreprecisely, t is overlap-free, but if the bits in any finite nonempty set of positionsare flipped (0 becoming 1 and vice versa), then the resulting sequence t′ is nolonger overlap-free [7].

This suggests considering the same problem for squares. For example,

Problem 39. Are there infinite words over 0, 1, 2 that are fragile with respectto the property of squarefreeness?

My guess is that you can get such a word by the usual trick of counting thenumber of 1’s between consecutive 0’s in the Thue-Morse word, and then droppingthe first symbol, but I haven’t proved it yet.

One can also consider words at the opposite spectrum. Call an infinite word wrobust with respect to property P if

(1) w has property P ; and(2) there are infinitely many disjoint finite sets S of positions, such that chang-

ing every symbol at the positions in S to some other symbol results in aword that still has property P .

This suggests

Problem 40. Are there infinite words over 0, 1, 2 that are robust with respectto the property of squarefreeness?

Remarks. After my talk Pascal Ochem showed how to find such words. Forexample, Brinkhuis [6] constructs a substitution (in the genuine sense of theoreticalcomputer science!) on three letters as follows:

h(0) = 0120210201021201020120210, 0120212010210120102120210h(1) = 1201021012102012101201021, 1201020121021201210201021h(2) = 2012102120210120212012102, 2012101202102012021012102


Then hω(0) is an infinite set of infinite squarefree words, and furthermore byreplacing any aligned block of size 25 with its corresponding block retains theproperty of squarefreeness.

James Currie also found a solution to Problem 40.

T. Cusick recently asked me a very interesting problem related to Thue-Morse.Let s2(n) denote the number of 1’s in the binary expansion of n (not reducedmodulo 2). For integers k, t ≥ 1 define

γk(t) =1

2k|n : 0 ≤ n < 2k and s2(n+ t) ≥ s2(n)|.

It is not hard to see that as k → ∞, the values of γk(t) stabilize to some fixedvalue γ∞(t). (In fact, it suffices to pick k ≥ 2⌈log2 t⌉.) So γ∞(t) measures thefraction of binary numbers n for which adding t gives more 1’s in n+ t than in n.It is now not so difficult to prove that γ∞(1) = 3/4, and indeed γ∞(2t) = γ∞(t).Cusick asked,

Problem 41. Is it true that γ∞(t) > 1/2 for all t ≥ 1? What is lim inft→∞ γ∞(t)?

We might also ask:

Problem 42. Find a simple way to compute γ∞(t).

Finally, I close this section with another problem related to Thue-Morse —more precisely, the overlap-free property. Let a = (ai)i≥0 be an infinite sequenceof integers. The Hankel determinant of order n beginning at position k is thedeterminant ∣∣∣∣∣∣∣∣∣

ak ak+1 · · · ak+n−1

ak+1 ak+2 · · · ak+n

......

. . ....

ak+n−1 ak+n · · · ak+2n−2

∣∣∣∣∣∣∣∣∣.

Evidently if a has an overlap cxcxc, then there is a 0 Hankel determinant —namely, the one corresponding to a matrix whose first and last rows are cxc. Sowe can ask,

Problem 43. Is there an infinite sequence a over a finite subset of Z with theproperty that the Hankel determinants of all orders, beginning at all positions, arenonzero?

I’ve done some computations on this problem. A simple backtracking algorithmshows that there is no such sequence over a subset of size 2, so we need at leastthree letters. I have constructed what appears to be the lexicographically leastsequence with hundreds of letters over 1, 2, 3, namely:

1121122121121231121122121122322112231121122121121231121122121122322112 · · ·

Interestingly, no backtracking was required to generate this!


I have also constructed a morphism that appears to generate such a sequence,namely:

1 → 12

2 → 23

3 → 14

4 → 32

I have tested this for hundreds of terms, and all Hankel determinants were nonzero.

Problem 44. Can you show that this morphism generates a sequence with allHankel determinants nonzero?

A natural approach to this kind of problem would be to consider the Hankeldeterminants modulo m for some integer m ≥ 2. However, this approach does notseem to work. In fact, I conjecture that the following is true:

Conjecture 45. For all integer moduli m ≥ 2, there is no infinite sequence overa finite subset of Z such that all Hankel determinants are nonzero (modulo m).

3. Avoiding patterns over the natural numbers

Suppose we try to avoid squares, but don’t allow backtracking. Having gener-ated w, at each new position we write down the least natural number i such thatwi is squarefree. It seems to be part of the folklore, and not too hard to prove,that the resulting word

w2 = (ci)i≥1 = 01020103010201014 · · ·is the so-called “ruler sequence”, defined by ci = ν2(i), the exponent of the largestpower of 2 dividing i. This is the lexicographically least sequence over N avoidingsquares (and also, by the way, the lexicographically least sequence avoiding abeliansquares). It can be generated by the morphism that maps i to 0, (i+ 1).

Recently Guay-Paquet and I [10] considered the same question for overlap-freesequences over N. Here the lexicographically least sequence

w2+ = 0010011001002 · · ·has a much more complicated structure. For example, we find that

0 occurs for the first time at position 11 occurs for the first time at position 32 occurs for the first time at position 133 occurs for the first time at position 794 occurs for the first time at position 633

and the sequence (di)i≥1 = 1, 3, 13, 79, 633, . . . satisfies the recurrence di+1 = 2idi+1. These numbers are also the coefficients of the exponential generating functionex/(1− 2x). Furthermore, they are given by the formula ⌊2n · n! · (√e− 1)⌋. Theword w2+ is a natural example of a word with transcendental letter frequencies.


The word w2+ can be written as ϕω(0) where

ϕ(0) = 001

ϕ(1) = 1001002

ϕ(2) = 200100110010020010011001003

...

where |ϕ(i)| = 2di+1 + 1. Here

ϕ(h) = (S(ϕh(00)) · (h+ 1)

where S(xc) is the shift operator defined by S(xc) = cx for x ∈ Σ∗, c ∈ Σ.We proved many other results about the wordw2+ and the morphism ϕ. Similar

results can be proved for words avoiding k’th powers and k+-powers, for any integerk ≥ 2.

However, similar results for fractional powers have so far eluded us. For exam-ple, we computed the first 1, 000, 000 terms of the lexicographically least sequenceavoiding 5

2 -powers and did not find any term greater than 3. Nor were we able tocharacterize this sequence.

This leads to the following problem:

Problem 46. Characterize the lexicographically least infinite α-power-free se-quence over N for some fractional α.

Remarks. After my talk Eric Rowland observed that the lexicographically least32 -power-free sequence was “pseudoperiodic” with period 10, and we were able toshow [14] that this sequence is in fact 6-regular in the sense of Allouche and Shallit[1, 2]. Rowland observed similar results for 4

3 but this has yet to be proved.

4. The Pirillo-Varricchio-Halbeisen-Hungerbuhler problem

Pirillo & Varricchio [12], and, independently, Halbeisen & Hungerbuhler [11]posed the following problem:

Problem 47. Is there an infinite seqence over a finite subset of Z with the propertythat it has no factors of the form xx′ with |x| = |x′| and ∑

x =∑x′?

We might call such a factor xx′ a PVHH-square. Recently there has been someattention paid to this problem [8], but it is still open.

Recently we found some results related to this problem [5]. For example, al-though with Van der Waerden’s theorem one can show that for any m it is impos-sible to avoid xx′ with |x| = |x′| and ∑

x =∑x′ (mod m), it becomes possible if

we allow the congruence to hold only if both sides are 0 (mod m).We also considered the analogous problem for cubes instead of squares. Thomas

Stoll and I performed some extensive calculations that suggest that a word avoidingPVHH cubes exists. We conjecture


Conjecture 48. The fixed point generated by the morphism

0 → 03

1 → 43

3 → 1

4 → 01

avoids xx′x′′ with |x| = |x′| = |x′′| and ∑x =

∑x′ =

∑x′′.

However, we have not yet been able to prove this.

Remarks. During the workshop Julien Cassaigne suggested another strategy whichmay suffice to prove the conjecture.

References

[1] J.-P. Allouche and J. Shallit. The ring of k-regular sequences. Theoret. Comput. Sci. 98

(1992), 163–197.[2] J.-P. Allouche and J. Shallit. The ring of k-regular sequences, II. Theoret. Comput. Sci. 307

(2003), 3–29.[3] T. Anderson, J. Loftus, N. Rampersad, N. Santean, and J. Shallit, Detecting palindromes,

patterns and borders in regular languages, Inform. and Comput. 207 (2009), 1096–1118.[4] D. Angluin, Finding patterns common to a set of strings, J. Comput. System Sci. 21 (1980)

46–62.[5] Y.-H. Au, A. Robertson, and J. Shallit, Van der Waerden’s theorem and avoidability in

words, preprint, http://arxiv.org/abs/0812.2466 .[6] J. Brinkhuis, Non-repetitive sequences on three symbols, Quart. J. Math. Oxford 34 (1983),

145–149.[7] S. Brown, N. Rampersad, J. Shallit, and T. Vasiga, Squares and overlaps in the Thue-Morse

sequence and some variants, RAIRO-Info. Theor. Appl. 40 (2006), 473–484.[8] J. Cassaigne, G. Richomme, K. Saari, and L. Q. Zamboni, Avoiding Abelian powers in binary

words with bounded Abelian complexity, preprint, http://arxiv.org/abs/1005.2514 .[9] A. Ehrenfeucht and G. Rozenberg, Finding a homomorphism between two words is NP-

complete, Inform. Process. Lett. 9 (1979) 86–88.[10] M. Guay-Paquet and J. Shallit, Avoiding squares and overlaps over the natural numbers,

Discrete Math. 309 (2009), 6245–6254.[11] L. Halbeisen and N. Hungerbuhler, An application of Van der Waerden’s theorem in ad-

ditive number theory, INTEGERS: Elect. Journ. Comb. Number Theory 0 (2000), #A7(electronic), http://www.integers-ejcnt.org/vol0.html .

[12] G. Pirillo and S. Varricchio, On uniformly repetitive semigroups, Semigroup Forum 49

(1994), 125–129.[13] N. Rampersad and J. Shallit, Detecting patterns in finite regular and context-free languages,

Inf. Process. Lett. 110 (2010), 108–112.

[14] E. Rowland J. Shallit, Avoiding 32

-powers over the natural numbers, preprint, September 22010.


On the sum of digits of n and nh

Thomas Stoll

1. Introduction

Let q ≥ 2 and denote by sq(n) the sum of digits in the q-ary representation of aninteger n. In recent years, much effort has been made to get a better understandingof the distribution properties of sq regarding certain subsequences of the positiveintegers. We mention the classical work by Gelfond [5], and the recents papers byC. Mauduit and J. Rivat on the distribution in arithmetic progressions of sq ofprimes [10] and of squares [11]. These questions are intimately connected to findthe frequency of letters −1 and +1 in the classical Thue-Morse sequence

(tn)n∈N = ((−1)s2(n))n∈N = +1,−1,−1,+1,−1,+1,+1,−1, . . .

with respect to special subsequences of indices (e.g., n ≡ a mod k, n = p prime,or n = m2). In the case of indices of the form P (n), where P (n) denotes a fixedinteger-valued polynomial of degree h ≥ 3, very little is known. For the currentstate of knowledge, we refer to the work of C. Dartyge and G. Tenenbaum [2], whoprovided some lower density estimates for the evaluation of sq(P (n)) in arithmeticprogressions.

2. Oberwolfach talk

A related problem, however, of a more elementary nature, is to study extremalproperties of sq(P (n)). In the binary case when q = 2, B. Lindstrom [9] showedthat

(1) lim supn→∞

s2(P (n))

log2 n= h.

The special case P (n) = n2 of (1) has been reproved by M. Drmota and J. Rivat [4]with constructions due to J. Cassaigne and G. Baron. Moreover, it is well-known [3,13] that the average order of magnitude of sq(n) and sq(n

h) is

(2)∑

n<N

sq(n) ∼1

h

∑

n<N

sq(nh) ∼ q − 1

2 log qN logN,

so that the average value of sq(nh) is h times larger than the average value of

sq(n). It is an interesting question how often the ratio sq(nh)/sq(n) can be very

large or very small and to quantify these notions. By naive methods, it can bequite hard to find even a single value n such that s2(n

h) < s2(n) for some h.For instance, an extremely brute force calculation shows that the minimal n suchthat s2(n

3) < s2(n) is n = 407182835067 ≈ 239 where one finds s2(n3) = 28 and

s2(n) = 29.In my Oberwolfach talk, I reported on recent work by K. G. Hare, S. Laishram

and myself [7, 8] that settles an old conjecture of Stolarsky [14] from 1978. We find(up to a multiplicative absolute constant) the exact extremal orders of magnitudeof the ratio sq(n

h)/sq(n).


Theorem ([7]). There exist c1 and c2, depending at most on q and h, such thatfor all n ≥ 2,

c2logn

≤ sq(nh)

sq(n)≤ c1(logn)

1−1/h.

This is best possible in that there exist c′1 and c′2, depending at most on q and h,such that

sq(nh)

sq(n)> c′1(logn)

1−1/h,

respectively,sq(n

h)

sq(n)<

c′2logn

infinitely often.

The constants c1, c2, c′1 and c′2 are explicitly computable. The proofs involve

the Bose-Chowla theorem on sets with the distinct sum property [1] and a combi-natorial argument using a certain polynomial of degree four. For any ε > 0 we geta bound on the minimal n such that the ratio sq(n

h)/sq(n) < ε. For the examplegiven above, the approach allows to show that

minn : s2(n3) < s2(n) < 2178.

In the talk I also outlined the answer to the analogous question in the case of theZeckendorf representation of integers [15].

3. Oberwolfach research

During the stay in Oberwolfach I worked with Jeffrey Shallit on the followingrelated problem for the Thue-Morse word: Let k ≥ 1, and denote Nk = n : tkn =1 and f(k) = minn : n ∈ Nk. In other words, this is the first multiple of k suchthat we see a “+1” in the subword (tkn)n≥1 of the Thue-Morse word. The firstfew values of (f(k))k≥1 are given by

1, 1, 7, 1, 5, 7, 1, 1, 9, 5, 1, 7, 1, 1, 19, 1, 17, 9, 1, 5 . . .

From a well-known theorem of Gelfond [5] we get that f(k) <∞ for all k, but canwe give a sharp upper bound for f(k) in terms of k? J.-P. Allouche and J. Shallithad already discussed the problem for some time, but the conjecture f(k) ≤ k+4was still open. In Oberwolfach we tried to make some progress but we were finallyleft with some cases for k where we could not find such an n. After the Oberwolfachweek we made some fresh effort with Johannes Morgenbesser (Marseille) to handlethese tedious cases, too. So, we finally came up with a complete solution to theproblem (joint work in progress).

Theorem 49. For all k ≥ 1 we have f(k) ≤ k + 4. Moreover, we have

(1) f(k) = k + 4 if and only if k = 22r − 1 for some r ≥ 1,(2) There are no k’s with f(k) = k + 3 or f(k) = k + 2.(3) f(k) = k + 1 if and only if k = 6.(4) f(k) = k if and only if k = 1 or k = 2r + 1 for some r ≥ 2.


In fact, by construction, for all k there is an n ∈ Nk with n ≤ k + 4 ands2(n) ≤ 3. We currently try to generalize our approach to a more general settingfor sq(n).

4. Open questions

We end this report with some challenging open questions in this field.

• Determine the frequency of +1 and −1 in the Thue-Morse word for thesubsequence of indices P (n) = n3. More generally, find the distribution inarithmetic progressions of sq(n

h), n ≥ 1, for fixed h ≥ 3 (see [5]).• Find 1

N

∑n<N s2(n

h)/s2(n) (see [14]).

• Find #n < N, n odd : s2(n2) = s2(n) (see [12]).

• Fix k ∈ 9, 10, 11, 14, 15. Decide whether the set

n < N, n odd : s2(n2) = s2(n) = k

is finite or infinite (see [8]).

References

[1] R. C. Bose, S. Chowla, Theorems in the additive theory of numbers, Comm. Math. Helv. 37(1962/63), 141–147.

[2] C. Dartyge, G. Tenenbaum, Congruences de sommes de chiffres de valeurs polynomiales, Bull.London Math. Soc. 38 (2006), no. 1, 61–69.

[3] H. Delange, Sur la fonction sommatoire de la fonction “somme des chiffres”, Enseign. Math.21 (1975), 31–47.

[4] M. Drmota, J. Rivat, The sum-of-digits function of squares, J. London Math. Soc. (2) 72

(2005), no. 2, 273–292.[5] A. O. Gel’fond, Sur les nombres qui ont des proprietes additives et multiplicatives donnees,

Acta Arith. 13 (1967/1968), 259–265.[6] H. Halberstam, K. F. Roth, Sequences, Second edition. Springer-Verlag, New York-Berlin,

1983.[7] K. G. Hare, S. Laishram, T. Stoll, Stolarsky’s conjecture and the sum of digits of polynomial

values, Proc. Amer. Math. Soc., to appear (2010), arXiv:1001.4169.[8] K. G. Hare, S. Laishram, T. Stoll, The sum of digits of n and n2, submitted (2010),

arXiv:1001.4170.[9] B. Lindstrom, On the binary digits of a power, J. Number Theory 65 (1997), 321–324.[10] C. Mauduit, J. Rivat, Sur un probleme de Gelfond: la somme des chiffres des nombres

premiers, Ann. Math. 171 (2010), no.3, 1591–1646.[11] C. Mauduit, J. Rivat, La somme des chiffres des carres, Acta Math. 203 (2009), 107–148.[12] G. Melfi, On simultaneous binary expansions of n and n2, J. Number Theory 111 (2005),

no. 2, 248–256.

[13] M. Peter, The summatory function of the sum-of-digits function on polynomial sequences,Acta Arith. 104 (2002), no. 1, 85–96.

[14] K. B. Stolarsky, The binary digits of a power, Proc. Amer. Math. Soc. 71 (1978), 1–5.[15] T. Stoll, Extremal orders of the Zeckendorf sum of digits of powers, submitted (see author’s

webpage).


A Note on Coloring Factors of Words

Luca Q. Zamboni

1. A Theorem of Ramsey

For each set A, we denote by A2 the set of all subsets of A consisting of 2elements. We recall a well known theorem of Ramsey:

Theorem 50. Let N be an infinite subset of the natural numbers N, C a finitenon-empty set (the set of colors), and c : N 2 → C. Then there exists an elementB ∈ C and an infinite subset N ∗ of N , such that c(X) = B for all X ∈ N ∗

2.

As a consequence of Theorem 50, we deduce the following:

Corollary 51. Let A be a non-empty set (not necessarily finite), and let W =

W1W2W3 · · · ∈ AN be an infinite word on the alphabet A. Let C be a finite non-empty set, and c : F+(W ) → C a coloring of the set of all non-empty factors of W.Then there exists a factorization of W of the form W = V U0U1U2 · · · such thatc(Ui) = c(Uj) for all i and j.

We now consider two variants of Corollary 51: Let W be a non-periodic wordon a finite alphabet.

Question 52. Does there exist a finite coloring

c : F+(W ) → Cwith the property that for any factoring W = U0U1U2 · · · , there exists i 6= j forwhich c(Ui) 6= C(Uj) ?

Question 53. Let c : F+(W ) → C be any (finite) coloring of the set of all non-empty factors of W, and k a positive integer. Then does W contain a factor of theform U1U2 . . . Uk with c(Ui) = c(Uj) and |Ui| = |Uj | for all 1 ≤ i, j ≤ k ?

We note that both questions are easily answered in caseW is periodic: the answeris negative in the case of Question 52 and positive in the case of Question 53.Indeed ifW = UUUU · · · then given any finite coloring c : F+(W ) → C, relative tothe factorizationW = U0U1U2 · · · , where each Ui = U, we have that c(Ui) = c(Uj)for all i, j ≥ 0.

2. On Question 52

Let A be a non-empty finite set, and W =W1W2W3 · · · ∈ AN. It is easy to seethat there exist non recurrent infinite wordsW, and colorings of F+(W ) such thatfor any factoringW = U0U1U2 · · · , there exists i 6= j for which c(Ui) 6= C(Uj). Forinstance, consider the infinite wordW = 10∞. Given a factor U ofW, set c(U) = 1if U contains 1, and c(U) = 0 otherwise. Then for any factoringW = U0U1U2 · · · ,we have c(U0) = 1 while c(Ui) = 0 for all i > 0.

We next show the existence of recurrent infinite words W and a finite coloringof F+(W ) such that for every factoring of W = U0U1U2 · · · , there exists i 6= jsuch that c(Ui) 6= c(Uj).


Proposition 54. Let W be any square free infinite word. Define c : F+(W ) →blue, green by

• c(U) = green if U is a prefix of W,• c(U) = blue otherwise.

Then for any factoring W = U0U1U2 · · · we have that c(Ui) 6= c(Uj) for somei 6= j.

Proof. Consider any factoring W = U0U1U2 · · · . Suppose to the contrary thateach Ui has the same color. Then as U0 is a prefix of W, it follows that each Ui iscolored green, i.e., each Ui is a prefix of W. If |Ui| ≤ |Ui+1| for some i ≥ 0, thenUi would be a prefix of Ui+1 and W would contain the factor UiUi, contradictingthat W is square free. Thus, we must have that |Ui+1| < |Ui|, a contradiction.

Let W ∈ 0, 1N be the Thue-Morse infinite word beginning in 0, that is thefixed point beginning in 0 of the morphism

0 7→ 01 and 1 7→ 10.

It is well known that W does not contain overlaps, i.e., factors of the form UUuwith u a non-empty prefix of U. Unlike in the previous example,W may be writtenas a concatenation of prefixes of W (e.g., W may be expressed as a concatenationof the prefixes 0, 01 and 011); however, we will show that:

Lemma 55. Let W be the Thue-Morse infinite word beginning in 0, and W =U0U1U2 · · · any factoring of W with each Ui a non-empty prefix of W. Then thereexist indices i 6= j for which Ui and Uj terminate in a different letter.

Proof. Suppose to the contrary that each Ui ends in the same letter a ∈ 0, 1. Itfollows that |Un| > |Un+1| for every n ≥ 1. In fact, if |Un+1| ≥ |Un|, then W wouldcontain the factor aUnUn as Un is a prefix of Un+1. Since Un ends in the letter a,we have that W contains an overlap, a contradiction.

Proposition 56. Let W be the Thue Morse infinite word beginning in 0, anddefine

c : F+(W ) → red, blue, greenby

• c(U) = red if U is a prefix of W ending in 0,• c(U) = blue if U is a prefix of W ending in 1,• c(U) = green otherwise.

Then, for any factoring of W = U0U1U2 · · · we have that c(Ui) 6= c(Uj) for somei 6= j.

Proof. Let W = U0U1U2 · · · be a factoring of W and suppose to the contrary thatc(Ui) = c(Uj) for all i, j. Then, since U0 is a prefix, it follows that each Ui is aprefix ending in the same letter as U0. This contradicts the result of the previouslemma.


Remark 57. A variation of the previous example applies to any binary overlap-freeword W.

We now show that the same coloring scheme as in Proposition 56 may be used toshow that a characteristic Sturmian word cannot be factored as a concatenation ofwords all having the same color. The following lemma is an analogue of Lemma 55for characteristic Sturmian words:

Lemma 58. Let W be a characteristic Sturmian word and W = U0U1U2 · · · anyfactoring of W with each Ui a non-empty prefix of W. Then there exist indicesi 6= j for which Ui and Uj terminate in a different letter.

Proposition 59. Let W ∈ 0, 1∞ be a characteristic Sturmian word and let

c : F+(W ) → red, blue, greenby

• c(U) = red if U is a prefix of W ending in 0,• c(U) = blue if U is a prefix of W ending in 1,• c(U) = green otherwise.

Then for every factorization W = U0U1U2 · · · , we have that c(Ui) 6= c(Uj) forsome i 6= j.

Proof. Suppose to the contrary that there exists a characteristic Sturmian wordWadmitting a factorization W = U0U1U2 · · · for which c(Ui) = c(Uj) for all i, j ≥ 0.Then since U0 is a prefix of W, it follows that each Ui is a prefix of W ending inthe same letter contradicting the result of the previous lemma.

The above naturally suggest the following questions:

Question 60. Given a Sturmian word W, does there exist a finite coloring

c : F+(W ) → Cwith the property that for any factoring W = U0U1U2 · · · , there exists i 6= j forwhich c(Ui) 6= C(Uj) ? If so, what is the smallest cardinality of C ?

Question 61. Let W be any aperiodic infinite word. Does there exist a 2-coloring

c : F+(W ) → red, bluesuch that for any factoring of W = U0U1U2 · · · there exists i 6= j with c(Ui) 6=c(Uj).

References

[1] J. Berstel, Sturmian and Episturmian words (A survey of some recent results), Lecture Notesin Computer Science, vol. 4728, Springer-Verlag Berlin 2007, pp. 23-47

[2] R. Graham, B. Rothshild, J. Spencer, Ramsey Theory, Wiley 1980.,[3] M.-P. Schutzenberger, Quelques problemes combinatoires de la theorie des automates, Cours

professe a l’Institut de Programmation en 1966/1967.

Reporter: Dirk Nowotka


Participants

Prof. Dr. Jean-Paul Allouche

Laboratoire de Rech. InformatiqueCNRS, Universite Paris Sud (Paris XI)Centre d’Orsay, Bat. 490F-91405 Orsay Cedex

Prof. Dr. Valerie Berthe

LIRMMCNRS, Universite Montpellier II161, rue AdaF-34392 Montpellier Cedex 5

Prof. Dr. Julien Cassaigne

Institut de Mathematiques de LuminyCNRSCase 907 - LuminyF-13288 Marseille Cedex 9

Prof. Dr. James Currie

Department of Mathematics & StatisticsUniversity of WinnipegWinnipeg MB R3B 2E9CANADA

Dr. Amy Glen

School of Chemical & MathematicalScien.Murdoch UniversitySouth StreetMurdoch , WA 6150AUSTRALIA

Prof. Dr. Tero Harju

Department of MathematicsUniversity of TurkuFIN-20014 Turku

Prof. Dr. Stepan Holub

Faculty of Mathematics and PhysicsCharles UniversitySokolovska 83186 75 Praha 8CZECH REPUBLIC

Prof. Dr. Juhani Karhumaki


Prof. Dr. Aldo de Luca

Dip. di Matematica e ApplicazioniUniversita di Napoli, Federico IIComplesso Monte S. AngeloVia CintiaI-80126 Napoli

Dr. habil. Dirk Nowotka

Institute for Formal Methods inComputer Science (FMI)Universitat StuttgartUniversitatsstr. 3870569 Stuttgart

Prof. Dr. Pascal Ochem

Laboratoire de Rech. InformatiqueCNRS, Universite Paris Sud (Paris XI)Centre d’Orsay, Bat. 490F-91405 Orsay Cedex

Dr. Elena Pribavkina

Department of Algebra & Discrete Math.Faculty of Mathematics & MechanicsUral State UniversityLenina 51620083 EkaterinburgRUSSIA


Dr. Eric Rowland

Department of MathematicsTulane UniversityNew Orleans , LA 70118USA

Dr. Kalle Saari


Prof. Dr. Jeffrey Shallit

School of Computer ScienceUniversity of WaterlooWaterloo ONT N2L 3G1CANADA

Dr. Thomas Stoll

Institut de Mathematiques de LuminyCNRSCase 907 - LuminyF-13288 Marseille Cedex 9

Prof. Dr. Luca Zamboni

Batiment Braconnier 34Boulevard du 11 Novembre 1918F-69622 Villeurbanne Cedex

Documents

Mathematisches Forschungsinstitut Oberwolfach › ~berthe › › Articles › OWR_2010_37.pdf · Mathematisches Forschungsinstitut Oberwolfach Report No. 37/2010 DOI: 10.4171