Jeffrey Shallit - University of Waterloo

Pub Categories

Computer Science - Discrete Mathematics (25)
Mathematics - Combinatorics (24)
Mathematics - Number Theory (10)
Mathematics - Commutative Algebra (1)
Mathematics - General Topology (1)
Computer Science - Data Structures and Algorithms (1)
Computer Science - Computational Complexity (1)
Computer Science - Logic in Computer Science (1)

Publications Authored By Jeffrey Shallit

We study the following natural variation on the classical universality problem: given an automaton $M$ of some type (DFA/NFA/PDA), does there exist an integer $\ell \geq 0$ such that $\Sigma^\ell \subseteq L(M)$? The case of an NFA has been an open problem since 2009. Here, using a novel and deep construction, we prove that the problem is NEXPTIME-complete, and the smallest such $\ell$ can be doubly exponential in the number of states. In the case of a DFA the problem is NP-complete, and there exist examples for which the smallest such $\ell$ is of the form $e^{\sqrt{n \log n} (1+o(1))}$, which is best possible, where $n$ is the number of states. Read More
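
To make the DFA case concrete, here is a minimal brute-force sketch, not the construction from the paper. It assumes a complete DFA given as a transition dictionary; the function name `least_universal_length` and the input encoding are hypothetical. It tracks the set of states reached by all words of length $\ell$ and stops when that set repeats, which may take very many steps on bad inputs.

```python
def least_universal_length(delta, start, accepting, alphabet):
    """Return the least l with alphabet^l contained in L(M), or None.

    delta: dict mapping (state, letter) -> state; assumed total (complete DFA).
    accepting: set of accepting states.
    """
    current = frozenset([start])  # states reached by the words of length l
    seen = set()
    length = 0
    while current not in seen:
        # Every word of length l is accepted iff every reached state accepts.
        if current <= accepting:
            return length
        seen.add(current)
        current = frozenset(delta[(q, a)] for q in current for a in alphabet)
        length += 1
    # The sequence of state sets has entered a cycle without success.
    return None
```

For a DFA that accepts exactly the words whose length is congruent to 2 modulo 3, the function returns 2; for a DFA accepting the words that end in a fixed letter, it returns None.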

Using a novel rewriting problem, we show that several natural decision problems about finite automata are undecidable (i.e., recursively unsolvable). Read More

The discriminator of an integer sequence s = (s(i))_{i>=0}, introduced by Arnold, Benkoski, and McCabe in 1985, is the function D_s(n) that sends n to the least integer m such that the numbers s(0), s(1), ... Read More

An abelian square is the concatenation of two words that are anagrams of one another. A word of length $n$ can contain at most $\Theta(n^2)$ distinct factors, and there exist words of length $n$ containing $\Theta(n^2)$ distinct abelian-square factors, that is, distinct factors that are abelian squares. This motivates us to study infinite words such that the number of distinct abelian-square factors of length $n$ grows quadratically with $n$. Read More
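
The definition of an abelian square can be checked directly. The sketch below is illustrative only: the function names are hypothetical, and it assumes the empty word is not counted as an abelian square.

```python
from collections import Counter


def is_abelian_square(w):
    """True if w = uv with |u| = |v| >= 1 and v an anagram of u."""
    if len(w) == 0 or len(w) % 2 == 1:
        return False
    half = len(w) // 2
    return Counter(w[:half]) == Counter(w[half:])


def distinct_abelian_square_factors(w):
    """Set of distinct nonempty factors of w that are abelian squares."""
    return {
        w[i:j]
        for i in range(len(w))
        for j in range(i + 2, len(w) + 1, 2)  # even lengths only
        if is_abelian_square(w[i:j])
    }
```

On the word `abba`, the abelian-square factors found this way are `bb` and `abba` itself.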

We determine the minimum possible critical exponent for all palindromes over finite alphabets. Read More

The continued logarithm algorithm was introduced by Gosper around 1978, and recently studied by Borwein, Calkin, Lindstrom, and Mattingly. In this note I show that the continued logarithm algorithm terminates in at most 2 log_2 p + O(1) steps on input a rational number p/q >= 1. Furthermore, this bound is tight, up to an additive constant. Read More
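
The abstract does not spell out the algorithm. The sketch below assumes the usual binary formulation attributed to Gosper: on input x >= 1, halve x while x >= 2, replace x by 1/(x - 1) when 1 < x < 2, and stop at x = 1. Whether the step count here matches the paper's exact convention is an assumption; `continued_log_steps` is a hypothetical name.

```python
from fractions import Fraction


def continued_log_steps(p, q):
    """Count steps of the (assumed) binary continued logarithm on p/q >= 1."""
    x = Fraction(p, q)
    if x < 1:
        raise ValueError("input must be at least 1")
    steps = 0
    while x != 1:
        # Halving step when x >= 2, reciprocal step when 1 < x < 2.
        x = x / 2 if x >= 2 else 1 / (x - 1)
        steps += 1
    return steps
```

Exact rational arithmetic via `Fraction` avoids the rounding issues that would make the stopping test unreliable with floats.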

The discriminator of an integer sequence s = (s(i))_{i >=0}, introduced by Arnold, Benkoski, and McCabe in 1985, is the map D_s(n) that sends n >= 1 to the least positive integer m such that the n numbers s(0), s(1), ... Read More

We discuss several two-dimensional generalizations of the familiar Lyndon-Schutzenberger periodicity theorem for words. We consider the notion of primitive array (as one that cannot be expressed as the repetition of smaller arrays). We count the number of m x n arrays that are primitive. Read More

We give an explicit evaluation, in terms of products of Jacobsthal numbers, of the Hankel determinants of order a power of two for the period-doubling sequence. We also explicitly give the eigenvalues and eigenvectors of the corresponding Hankel matrices. Similar considerations give the Hankel determinants for other orders. Read More

A rectifier network is a directed acyclic graph with distinguished sources and sinks; it is said to compute a Boolean matrix $M$ that has a $1$ in the entry $(i,j)$ iff there is a path from the $j$th source to the $i$th sink. The smallest number of edges in a rectifier network computing $M$ is a classic complexity measure on matrices, which has been studied for more than half a century. We explore two well-known techniques that have hitherto found little to no applications in this theory. Read More

We investigate the behavior of the periods and border lengths of random words over a fixed alphabet. We show that the asymptotic probability that a random word has a given maximal border length $k$ is a constant, depending only on $k$ and the alphabet size $\ell$. We give a recurrence that allows us to determine these constants with any required precision. Read More

We prove that the property of being closed (resp., palindromic, rich, privileged, trapezoidal, balanced) is expressible in first-order logic for automatic (and some related) sequences. It therefore follows that the characteristic function of those n for which an automatic sequence x has a closed (resp. Read More

We consider the real number $\sigma$ with continued fraction expansion $[a_0, a_1, a_2,\ldots] = [1,2,1,4,1,2,1,8,1,2,1,4,1,2,1,16,\ldots]$, where $a_i$ is the largest power of $2$ dividing $i+1$. We compute the irrationality measure of $\sigma^2$ and demonstrate that $\sigma^2$ (and $\sigma$) are both transcendental numbers. We also show that certain partial quotients of $\sigma^2$ grow doubly exponentially, thus confirming a conjecture of Hanna and Wilson. Read More
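
The partial quotients $a_i$ are easy to generate, and exact convergents of $\sigma$ follow from them. This is a small illustrative sketch; the names `partial_quotient` and `convergent` are hypothetical.

```python
from fractions import Fraction


def partial_quotient(i):
    """a_i = largest power of 2 dividing i + 1."""
    m = i + 1
    return m & -m  # isolates the lowest set bit of m


def convergent(k):
    """Exact value of the finite continued fraction [a_0, a_1, ..., a_k]."""
    value = Fraction(partial_quotient(k))
    for i in range(k - 1, -1, -1):
        value = partial_quotient(i) + 1 / value
    return value
```

The first sixteen values of `partial_quotient` reproduce the expansion listed in the abstract, and `float(convergent(k))` gives numerical approximations to $\sigma$ as k grows.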

We prove a number of results on the structure and enumeration of palindromes and antipalindromes. In particular, we study conjugates of palindromes, palindromic pairs, rich words, and the counterparts of these notions for antipalindromes. Read More

We consider several novel aspects of unique factorization in formal languages. We reprove the familiar fact that the set uf(L) of words having unique factorization into elements of L is regular if L is regular, and from this deduce a quadratic upper and lower bound on the length of the shortest word not in uf(L). We observe that uf(L) need not be context-free if L is context-free. Read More

We implement a decision procedure for answering questions about a class of infinite words that might be called (for lack of a better name) "Tribonacci-automatic". This class includes, for example, the famous Tribonacci word T = 0102010010202 .. Read More

We implement a decision procedure for answering questions about a class of infinite words that might be called (for lack of a better name) "Fibonacci-automatic". This class includes, for example, the famous Fibonacci word f = 01001010.. Read More

We consider a measure of similarity for infinite words that generalizes the notion of asymptotic or natural density of subsets of natural numbers from number theory. We show that every overlap-free infinite binary word, other than the Thue-Morse word t and its complement t bar, has this measure of similarity with t between 1/4 and 3/4. This is a partial generalization of a classical 1927 result of Mahler. Read More

The Danish composer Per Noergaard defined the "infinity series" s = (s(n))_{n>=0} by the rules s(0) = 0, s(2n) = -s(n) for n >= 1, and s(2n + 1) = s(n) + 1 for n >= 0; it figures prominently in many of his compositions. Here we give several new results about this sequence: first, the set of binary representations of the positions of each note forms a context-free language that is not regular; second, a complete characterization of exactly which note-pairs appear; third, that consecutive occurrences of identical phrases are widely separated. We also consider to what extent the infinity series is unique. Read More
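
The defining rules translate directly into a short recursive function. This is only a sketch of the recurrence as stated; the name `infinity_series` is a hypothetical choice.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def infinity_series(n):
    """s(0) = 0, s(2n) = -s(n) for n >= 1, s(2n + 1) = s(n) + 1 for n >= 0."""
    if n == 0:
        return 0
    if n % 2 == 0:
        return -infinity_series(n // 2)
    return infinity_series(n // 2) + 1
```

The first eight terms computed this way are 0, 1, -1, 2, 1, 0, -2, 3.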

As is well-known, the ratio of adjacent Fibonacci numbers tends to phi = (1 + sqrt(5))/2, and the ratio of adjacent Tribonacci numbers (where each term is the sum of the three preceding numbers) tends to the real root eta of X^3 - X^2 - X - 1 = 0. Letting alpha(n) denote the corresponding ratio for the generalized Fibonacci numbers, where each term is the sum of the n preceding, we obtain rapidly converging series for alpha(n), 1/alpha(n), and 1/(2-alpha(n)). Read More

We discuss the notion of privileged word, recently introduced by Peltomaki. A word w is privileged if it is of length <=1, or has a privileged border that occurs exactly twice in w. We prove the following results: (1) if w^k is privileged for some k >=1, then w^j is privileged for all j >= 0; (2) the language of privileged words is neither regular nor context-free; (3) there is a linear-time algorithm to check if a given word is privileged; and (4) there are at least 2^{n-5}/n^2 privileged binary words of length n. Read More
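
The recursive definition of a privileged word can be tested by brute force. The sketch below is a direct transcription of the definition, not the linear-time algorithm mentioned in result (3); the function names are hypothetical, and occurrences are counted with overlaps allowed.

```python
from functools import lru_cache


def count_occurrences(u, w):
    """Number of (possibly overlapping) occurrences of u in w."""
    return sum(1 for i in range(len(w) - len(u) + 1) if w.startswith(u, i))


@lru_cache(maxsize=None)
def is_privileged(w):
    """True if |w| <= 1, or w has a privileged border occurring exactly twice."""
    if len(w) <= 1:
        return True
    for k in range(1, len(w)):
        u = w[:k]  # candidate border: proper nonempty prefix that is also a suffix
        if w.endswith(u) and count_occurrences(u, w) == 2 and is_privileged(u):
            return True
    return False
```

For example, `010` and `000` are privileged under this check, while `01` and `0010` are not.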

A palstar (after Knuth, Morris, and Pratt) is a concatenation of even-length palindromes. We show that, asymptotically, there are $\Theta(\alpha_k^n)$ palstars of length $2n$ over a $k$-letter alphabet, where $\alpha_k$ is a constant such that $2k-1 < \alpha_k < 2k-{1 \over 2}$. In particular, $\alpha_2 \doteq 3. Read More

In this paper we consider the following problems: how many different subsets of Sigma^n can occur as set of all length-n factors of a finite word? If a subset is representable, how long a word do we need to represent it? How many such subsets are represented by words of length t? For the first problem, we give upper and lower bounds of the form alpha^(2^n) in the binary case. For the second problem, we give a weak upper bound and some experimental data. For the third problem, we give a closed-form formula in the case where n <= t < 2n. Read More

We consider the following problem: given that a finite automaton $M$ of $N$ states accepts at least one $k$-power-free (resp., overlap-free) word, what is the length of the shortest such word accepted? We give upper and lower bounds which, unfortunately, are widely separated. Read More

We consider the following novel variation on a classical avoidance problem from combinatorics on words: instead of avoiding repetitions in all factors of a word, we avoid repetitions in all factors where each individual factor is considered as a "circular word", i.e., the end of the word wraps around to the beginning. Read More

We illustrate a general technique for enumerating factors of k-automatic sequences by proving a conjecture on the number f(n) of unbordered factors of the Thue-Morse sequence. We show that f(n) <= n for n >= 4 and that f(n) = n infinitely often. We also give examples of automatic sequences having exactly 2 unbordered factors of every length. Read More
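
A small experiment can illustrate the quantity f(n). The sketch below counts unbordered factors of length n in a finite prefix of the Thue-Morse sequence; it assumes the chosen prefix (4096 symbols by default, an arbitrary choice) is long enough to contain every factor of the lengths tested, so it is a numerical check rather than the automaton-based technique of the paper. All names are hypothetical.

```python
def thue_morse(length):
    """Prefix of the Thue-Morse sequence: t_i = parity of the bit sum of i."""
    return "".join(str(bin(i).count("1") % 2) for i in range(length))


def is_unbordered(w):
    """True if no proper nonempty prefix of w is also a suffix of w."""
    return not any(w[:k] == w[-k:] for k in range(1, len(w)))


def unbordered_factor_count(n, prefix_length=4096):
    """Count unbordered length-n factors seen in a Thue-Morse prefix."""
    t = thue_morse(prefix_length)
    factors = {t[i:i + n] for i in range(prefix_length - n + 1)}
    return sum(1 for w in factors if is_unbordered(w))
```

Comparing `unbordered_factor_count(n)` with n for small n gives a quick sanity check of the bound f(n) <= n.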

Currie and Saari initiated the study of least periods of infinite words, and they showed that every integer n >= 1 is a least period of the Thue-Morse sequence. We generalize this result to show that the characteristic sequence of least periods of a k-automatic sequence is (effectively) k-automatic. Through an implementation of our construction, we confirm the result of Currie and Saari, and we obtain similar results for the period-doubling sequence, the Rudin-Shapiro sequence, and the paperfolding sequence. Read More

We investigate questions related to the presence of primitive words and Lyndon words in automatic and linearly recurrent sequences. We show that the Lyndon factorization of a k-automatic sequence is itself k-automatic. We also show that the function counting the number of primitive factors (resp. Read More

We show that the subword complexity function p_x(n), which counts the number of distinct factors of length n of a sequence x, is k-synchronized in the sense of Carpi if x is k-automatic. As an application, we generalize recent results of Goldstein. We give analogous results for the number of distinct factors of length n that are primitive words or powers. Read More

In this chapter we discuss the problem of enumerating distinct regular expressions by size and the regular languages they represent. We discuss various notions of the size of a regular expression that appear in the literature and their advantages and disadvantages. We consider a formal definition of regular expressions using a context-free grammar. Read More

We resolve an open question by determining matching (asymptotic) upper and lower bounds on the state complexity of the operation that sends a language L to (c(L*))*, where c() denotes complement. Read More

We describe a technique for mechanically proving certain kinds of theorems in combinatorics on words, using automata and a package for manipulating them. We illustrate our technique by solving, purely mechanically, an open problem of Currie and Saari on the lengths of unbordered factors in the Thue-Morse sequence. Read More

A filtration of a formal language L by a sequence s maps L to the set of words formed by taking the letters of words of L indexed only by s. We consider the languages resulting from filtering by all arithmetic progressions. If L is regular, it is easy to see that only finitely many distinct languages result. Read More

The notion of a k-automatic set of integers is well-studied. We develop a new notion - the k-automatic set of rational numbers - and prove basic properties of these sets, including closure properties and decidability. Read More

A celebrated 1922 theorem of Kuratowski states that there are at most 14 distinct sets arising from applying the operations of complementation and closure, any number of times, in any order, to a subset of a topological space. In this paper we consider the case of complementation and two abstract closure operators. In contrast to the case of a single closure operation, we show that infinitely many distinct sets can be generated, even when the closure operators commute. Read More

We show that there exists an infinite word over the alphabet {0, 1, 3, 4} containing no three consecutive blocks of the same size and the same sum. This answers an open problem of Pirillo and Varricchio from 1994. Read More
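
The avoided pattern, three consecutive blocks of the same size and the same sum, can be detected in a finite sequence with prefix sums. This checker is a sketch for experimenting with candidate prefixes; it does not construct the infinite word from the paper, and the name `has_three_equal_blocks` is hypothetical.

```python
def has_three_equal_blocks(seq):
    """True if seq contains three consecutive blocks of equal size and equal sum."""
    prefix = [0]
    for x in seq:
        prefix.append(prefix[-1] + x)  # prefix[i] = sum of seq[:i]
    n = len(seq)
    for size in range(1, n // 3 + 1):
        for i in range(n - 3 * size + 1):
            a = prefix[i + size] - prefix[i]
            b = prefix[i + 2 * size] - prefix[i + size]
            c = prefix[i + 3 * size] - prefix[i + 2 * size]
            if a == b == c:
                return True
    return False
```

For instance, the sequence 1, 3, 4, 0, 3, 1 contains the pattern (blocks 1 3, 4 0, 3 1 each sum to 4), while 0, 1, 3, 4 does not.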

Let w be a binary string and let a_w (n) be the number of occurrences of the word w in the binary expansion of n. As usual we let s(n) denote the Stern sequence; that is, s(0)=0, s(1)=1, and for n >= 1, s(2n)=s(n) and s(2n+1)=s(n)+s(n+1). In this note, we show that s(n) = a_1 (n) + \sum_{w in 1 (0+1)*} s([w bar]) a_{w1} (n) where w bar denotes the complement of w (obtained by sending 0 to 1 and 1 to 0), and [w] denotes the integer specified by the word w interpreted in base 2. Read More
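
The identity can be checked numerically for small n. In the sketch below, each occurrence of a word w1 with w in 1(0+1)* corresponds to a pair of positions i < j that both hold a 1 in the binary expansion of n, with w being the bits from i up to but not including j. The function names are hypothetical, and the binary expansion of 0 is taken to be the empty word (an assumption).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stern(n):
    """Stern sequence: s(0)=0, s(1)=1, s(2n)=s(n), s(2n+1)=s(n)+s(n+1)."""
    if n < 2:
        return n
    if n % 2 == 0:
        return stern(n // 2)
    return stern(n // 2) + stern(n // 2 + 1)


def stern_via_word_counts(n):
    """Right-hand side of the identity: a_1(n) + sum of s([w bar]) * a_{w1}(n)."""
    bits = bin(n)[2:] if n > 0 else ""
    total = bits.count("1")  # a_1(n)
    ones = [i for i, b in enumerate(bits) if b == "1"]
    for i in ones:
        for j in ones:
            if i < j:
                w = bits[i:j]  # w starts with 1; w followed by 1 occurs at i
                complement = "".join("1" if b == "0" else "0" for b in w)
                total += stern(int(complement, 2))
    return total
```

For n = 5 (binary 101) the only contributing word is w = 10, whose complement 01 has value 1, so the sum is 2 + s(1) = 3 = s(5).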

Affiliations: 1Department of Mathematics, University of Liège, 2School of Computer Science, University of Waterloo, 3Department of Algebra and Discrete Mathematics, Ural Federal University

We prove a Fife-like characterization of the infinite binary (7/3)-power-free words, by giving a finite automaton of 15 states that encodes all such words. As a consequence, we characterize all such words that are 2-automatic. Read More

The critical exponent of an infinite word is defined to be the supremum of the exponent of each of its factors. For k-automatic sequences, we show that this critical exponent is always either a rational number or infinite, and its value is computable. Our results also apply to variants of the critical exponent, such as the initial critical exponent of Berthe, Holton, and Zamboni and the Diophantine exponent of Adamczewski and Bugeaud. Read More

The separating words problem asks for the size of the smallest DFA needed to distinguish between two words of length <= n (by accepting one and rejecting the other). In this paper we survey what is known and unknown about the problem, consider some variations, and prove several new results. Read More

Given a formal language L specified in various ways, we consider the problem of determining if L is nonempty. If L is indeed nonempty, we find upper and lower bounds on the length of the shortest string in L. Read More


We give another proof of a theorem of Fife - understood broadly as providing a finite automaton that gives a complete description of all infinite binary overlap-free words. Our proof is significantly simpler than those in the literature. As an application we give a complete characterization of the overlap-free words that are 2-automatic. Read More

We show that various aspects of k-automatic sequences -- such as having an unbordered factor of length n -- are both decidable and effectively enumerable. As a consequence it follows that many related sequences are either k-automatic or k-regular. These include many sequences previously studied in the literature, such as the recurrence function, the appearance function, and the repetitivity index. Read More

In this paper we answer the following question: what is the lexicographically least sequence over the natural numbers that avoids 3/2-powers? Read More

Let (t_n) be the classical Thue-Morse sequence defined by t_n = s_2(n) (mod 2), where s_2 is the sum of the bits in the binary representation of n. It is well known that for any integer k>=1 the frequency of the letter "1" in the subsequence t_0, t_k, t_{2k}, .. Read More

A language L is closed if L = L*. We consider an operation on closed languages, L^{-*}, that is an inverse to Kleene closure. It is known that if L is closed and regular, then L^{-*} is also regular. Read More

Let g_j denote the largest integer that is represented exactly j times as a non-negative integer linear combination of { x_1, ... Read More

We consider some questions about formal languages that arise when inverses of letters, words and languages are defined. The reduced representation of a language over the free monoid is its unique equivalent representation in the free group. We show that the class of regular languages is closed under taking the reduced representation, while the class of context-free languages is not. Read More

In this note, we give a construction that provides a tight lower bound of mn-1 for the length of the shortest word in the intersection of two regular languages with state complexities m and n. Read More

We consider variations on the following problem: given an NFA M and a pattern p, does there exist an x in L(M) such that p matches x? We consider the restricted problem where M only accepts a finite language. We also consider the variation where the pattern p is required only to match a factor of x. We show that both of these problems are NP-complete. Read More