Canonical Trees, Compact Prefix-free Codes and Sums of Unit Fractions: A Probabilistic Analysis

For fixed $t\ge 2$, we consider the class of representations of $1$ as a sum of unit fractions whose denominators are powers of $t$, or equivalently the class of canonical compact $t$-ary Huffman codes, or equivalently rooted $t$-ary plane "canonical" trees. We study the probabilistic behaviour of the height (the limit distribution is shown to be normal), the number of distinct summands (normal distribution), the path length (normal distribution), the width (main term of the expectation and a concentration property) and the number of leaves at maximum distance from the root (discrete distribution).


Introduction.
We consider three combinatorial classes, which all turn out to be equivalent: partitions of 1 into powers of t, canonical compact t-ary Huffman codes, and "canonical" t-ary trees; see the precise discussion below. In this paper, we are interested in the structure of these objects under a uniform random model, and we study the distribution of various structural parameters, for which we obtain rather precise limit theorems. Let us first define all three classes precisely and explain the connections between them. Throughout the paper, t ≥ 2 will be a fixed positive integer. Figure 1 shows examples in the case t = 2.
1. Partitions of 1 into powers of t (representations of 1 as a sum of unit fractions whose denominators are powers of t) are formally defined as follows: C_Partition = {(x_1, . . . , x_τ) | τ ≥ 1, x_1 ≤ · · · ≤ x_τ are nonnegative integers with Σ_{i=1}^{τ} t^{−x_i} = 1}. The external size |(x_1, . . . , x_τ)| of such a representation (x_1, . . . , x_τ) is defined to be the number τ of summands. 2. Second, we consider canonical compact t-ary Huffman codes: C_Code = {C ⊆ {1, . . . , t}^* finite | C is prefix-free, compact, and canonical}. Here, we use the following notions: • {1, . . . , t}^* denotes the set of finite words over the alphabet {1, . . . , t}.
• A code C is said to be prefix-free if no word in C is a proper prefix of any other word in C. • A code C is said to be compact if the following property holds: if w is a proper prefix of a word in C, then for every letter a ∈ {1, . . . , t}, wa is a prefix of a word in C. • A code C is said to be canonical if the lexicographic ordering of its words corresponds to a nondecreasing ordering of the word lengths. This condition corresponds to taking equivalence classes with respect to permutations of the alphabet (at each position in the words). The external size |C| of a code C is defined to be the cardinality of C.
If C ∈ C_Code with C = {w_1, . . . , w_τ} and the property that length(w_i) ≤ length(w_{i+1}) holds for all i, then (length(w_1), . . . , length(w_τ)) ∈ C_Partition. This is a bijection between C_Code and C_Partition preserving the external size. This connection can be explained by the Kraft–McMillan inequality [20, 22], which states that every prefix-free code C = {w_1, . . . , w_τ} must satisfy Σ_{i=1}^{τ} t^{−length(w_i)} ≤ 1; compact codes are precisely those for which equality holds (meaning that they are optimal in an information-theoretic sense). 3. Finally, both partitions and codes are related to so-called canonical rooted t-ary trees: C_Tree = {T rooted t-ary plane tree | T is canonical}.
Here, we use the following notions: • t-ary means that each vertex has either no children or exactly t children.
• Plane tree means that an ordering "from left to right" of the children of each vertex is specified. • Canonical means that the following holds for all k: if the vertices of depth k (i.e., at distance k from the root) are denoted by v_1, . . . , v_K from left to right, then deg(v_i) ≤ deg(v_{i+1}) holds for all i. The external size |T| of a tree is given by the number of its leaves, i.e., the number of vertices of degree 1.
If C ∈ C_Code, then a tree T ∈ C_Tree can be constructed such that the vertices of T are given by the prefixes of the words in C, the root is the vertex corresponding to the empty word, and the children of a proper prefix w of a code word are given from left to right by wa for a = 1, . . . , t. This is a bijection between C_Code and C_Tree preserving the external size. Further formulations, details, and remarks can be found in the recent paper of Elsholtz, Heuberger, and Prodinger [11]. We will simply speak of an element in the class C when the particular interpretation as an element of C_Partition, C_Code, or C_Tree is not relevant. Our proofs will use the tree model; therefore C_Tree is abbreviated as T.
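The equality case of the Kraft–McMillan inequality for compact codes, as well as the three defining properties of the codes above, can be checked mechanically; the following is a minimal sketch for t = 2 (the example code and the helper names are ours, chosen for illustration, not taken from the paper):

```python
from fractions import Fraction

def kraft_sum(code, t):
    # Kraft-McMillan sum over all code words (exact rational arithmetic).
    return sum(Fraction(1, t ** len(w)) for w in code)

def is_prefix_free(code):
    # no word is a proper prefix of another word
    return not any(u != w and w.startswith(u) for u in code for w in code)

def is_compact(code, t):
    # every proper prefix w of a code word must, for each letter a,
    # have wa as a prefix of some code word
    alphabet = "123456789"[:t]
    proper_prefixes = {w[:i] for w in code for i in range(len(w))}
    def occurs_as_prefix(p):
        return any(w.startswith(p) for w in code)
    return all(occurs_as_prefix(p + a) for p in proper_prefixes for a in alphabet)

def is_canonical(code):
    # the lexicographic ordering of the words gives nondecreasing lengths
    lengths = [len(w) for w in sorted(code)]
    return lengths == sorted(lengths)

code = ["1", "21", "22"]   # a canonical compact binary code of external size 3
assert is_prefix_free(code) and is_compact(code, 2) and is_canonical(code)
assert kraft_sum(code, 2) == 1   # equality in Kraft's inequality: compactness
```

The last assertion illustrates the equivalence stated above: for this compact code the Kraft sum is exactly 1, while dropping a word (making the code non-compact) would leave the sum strictly below 1.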
The external size of an element in C is always congruent to 1 modulo t − 1. This can easily be seen in the tree model, where the number of leaves τ and the number of internal vertices n are connected by the identity τ = 1 + n(t − 1). Therefore, we will from now on consider the internal size: for a tree T ∈ C_Tree the internal size of T is the number n(T) of internal vertices, for a code C ∈ C_Code the internal size is the number of proper prefixes of words of C, and for a partition (x_1, . . . , x_τ) ∈ C_Partition the internal size is defined to be (τ − 1)/(t − 1). We will omit the word "internal" and will always use the variable n (or n(T) for a specific element T ∈ C) to denote the size.
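The identity τ = 1 + n(t − 1) follows from a double count of the edges; a short derivation in the tree model (our notation, consistent with the text above):

```latex
% A t-ary tree with n internal vertices and \tau leaves has n + \tau vertices.
% Each internal vertex has exactly t children, so there are tn edges;
% on the other hand, a tree has (number of vertices) - 1 edges. Hence
\begin{align*}
  n + \tau - 1 = tn
  \quad\Longleftrightarrow\quad
  \tau = 1 + n(t-1),
\end{align*}
% so the external size \tau is always congruent to 1 modulo t - 1.
```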
The asymptotics of the number of elements in C of size n has been studied by various authors; see the historical overview in [11]. Special cases and weaker versions (without explicit error terms) of the following result, which is given in [11] (building upon the generating function approach by Flajolet and Prodinger [14]), were obtained earlier and independently by different authors (Boyd [5], Komlós, Moser, and Nemetz [19], Flajolet and Prodinger [14], and Tangora [28]). Theorem 1.1 (see [11]). For t ≥ 2, the number of elements of size n in C is (in Bachmann–Landau notation) given by R ρ^n (1 + O((ρ_2/ρ)^n)), where ρ > ρ_2 and R are positive real constants depending on t that admit asymptotic expansions (as t → ∞). In fact, all O-constants can be made explicit, and more terms of the asymptotic expansions in t of ρ, ρ_2, and R can be given.
In spite of the fact that the counting problem has been studied independently by many different authors, to the best of our knowledge the structure of random elements has not been considered before. Thus the purpose of this contribution is to study the probabilistic behavior of various parameters of a random element in C of size n. We always use the uniform random model: whenever a random tree (equivalently, partition or code) of a given size n is chosen, all elements are considered to be equally likely.
1. The height h(T) of a tree T ∈ C_Tree is defined to be the maximum distance of a leaf from the root. In the interpretation as a code, this is the maximum length of a code word. In a representation of 1 as a sum of unit fractions, this corresponds to the largest denominator used (more precisely, to the largest exponent of the denominator).
The height is discussed in section 3. It is asymptotically normally distributed with mean ∼ μ_h n and variance ∼ σ_h^2 n; cf. Theorem 3.1. Moreover, we prove a local limit theorem.
2. The number of distinct summands of a representation (x_1, . . . , x_τ) of 1 as a sum of unit fractions is denoted by d(x_1, . . . , x_τ). In the tree model, this corresponds to the cardinality d(T) of the set of depths of leaves in a tree T ∈ C_Tree. In the code model, this is the number of distinct lengths of code words.
The number d(T ) is studied in section 4. It is asymptotically normally distributed with mean ∼ μ d n and variance ∼ σ 2 d n, where cf. Theorem 4.1. Moreover, a local limit theorem is proved again. 3. The maximum number of equal summands of a representation (x 1 , . . . , x τ ) of 1 as a sum of unit fractions is denoted by w(x 1 , . . . , x τ ). In the code model, this is the maximum number of code words of equal length. In the tree model, this is the "leaf-width" w(T ), i.e., the maximum number of leaves on the same level.
The number w(T ) is studied in section 5. We prove that E(w(T )) = μ w log n+ O(log log n) with μ w = 1/(t log 2) + O(1/t 2 ) and a concentration property; cf. Theorem 5.1. 4. The (total) path length (T ) of a tree T ∈ C Tree is defined to be the sum of the depths of all vertices of the tree. In our context, it is perhaps most natural to consider the external path length external (T ), though, which is the sum of depths over all leaves of the tree, as this parameter corresponds to the sum of lengths of code words in a code C ∈ C Code . Likewise, the internal path length internal (T ) is the sum of depths over all nonleaves. Clearly, we have for t-ary trees are easily proven. Therefore, all distributional results for any one of those parameters immediately cover all three. The total path length turns out to be asymptotically normally distributed as well (see Theorem 7.1), with mean ∼ μ tpl n 2 and variance ∼ σ 2 tpl n 3 . The coefficients have asymptotic expansions and σ tpl = t 2 12 The path length is studied in section 7. Its analysis is based on a generating function approach for the moments, combined with probabilistic arguments to obtain the central limit theorem. 5. The number of leaves on the last level (i.e., maximum distance from the root) of a tree T ∈ C Tree is denoted by m(T ). This corresponds to the number of code words of maximum length and to the number of smallest summands in a representation of 1 as a sum of unit fractions. This parameter may appear to be the least interesting of the parameters we study. However, it is a natural technical parameter when constructing generating functions for the other parameters. From these generating functions the probabilistic behavior of m(T ) can be read off without too much effort, so we do include these results in section 6.
The limit distribution of m(T ) is a discrete distribution with mean 2t + o (1) and variance 2t 2 + o(1); cf. Theorem 6.1. A noteworthy feature of the results listed above is the fact that the distributions we observe are quite different from those that one obtains for other probabilistic random tree models. Specifically, the parameters differ not only from those of Galton-Watson trees (which include, among others, uniformly random t-ary trees) but also from those of recursive trees and general families of increasing trees. See [7] for a general reference. In particular, the following hold: • The asymptotic order of the height of a random Galton-Watson tree of order n is only √ n, and it is known that the limiting distribution (which is sometimes called a Theta distribution) coincides with the distribution of the maximum of a Brownian excursion [12]. The height of random recursive trees (or other families of increasing trees) is even only of order log n and heavily concentrated around its mean; see [6].
• The path length of random Galton–Watson trees is of order n^{3/2}, and it follows an Airy distribution (like the area under a Brownian excursion) in the limit [26]. For recursive trees, the path length is of order n log n with a rather unusual limiting distribution [21]. • While the height of our canonical trees is greater than that of Galton–Watson trees, precisely the opposite holds for the width (as one would expect): it is of order √n for Galton–Watson trees [8, 27], with the same limiting distribution as the height, as opposed to only log n in our setting. For recursive trees, the width is even of order n/√(log n); see [9]. Indeed, the structure of our canonical t-ary trees is comparable to that of compositions: counting the number of internal vertices on each level from the root, we obtain a restricted composition, in which each summand is at most t times the previous one. In the limit t → ∞ one obtains compositions of n starting with a 1 in this way. The recent series of papers by Bender and Canfield [1, 2, 3] and Bender, Canfield, and Gao [4] is concerned with compositions with various local restrictions. In fact, it would be possible to derive the central limit theorems for the height and the number of distinct summands from Theorem 4 in [2], but in a less explicit fashion (without precise constants, and further work would still be required for a local limit theorem). A parameter related to the "leaf-width" (the largest part of a composition) is also studied in [4], but in addition to the fact that the parameters are not quite identical, it also seems that the technical conditions required for the main result of [4] are not satisfied here.
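The correspondence with restricted compositions also gives a direct way to count canonical trees: a canonical t-ary tree with n internal vertices is determined by the sequence of internal-vertex counts per level, i.e., a composition n = c_1 + · · · + c_k with c_1 = 1 and c_{i+1} ≤ t·c_i. A minimal sketch (the function name is ours):

```python
from functools import lru_cache

def count_canonical_trees(n, t):
    """Number of canonical t-ary trees with n internal vertices, counted
    via compositions n = c_1 + ... + c_k with c_1 = 1 and c_{i+1} <= t*c_i,
    where c_i is the number of internal vertices on level i - 1."""
    @lru_cache(maxsize=None)
    def ways(remaining, last):
        if remaining == 0:
            return 1
        # the next level may contain between 1 and t*last internal vertices
        return sum(ways(remaining - c, c)
                   for c in range(1, min(t * last, remaining) + 1))
    return ways(n - 1, 1) if n >= 1 else 0

# For t = 2 this reproduces the counts 1, 1, 2, 3, 5, 9, ... of canonical
# binary trees (equivalently, partitions of 1 into powers of 1/2).
print([count_canonical_trees(n, 2) for n in range(1, 7)])  # [1, 1, 2, 3, 5, 9]
```

The memoized recursion runs in polynomial time, so counts for the moderate sizes used in the figures of this paper are immediate.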
Finally, we offer a remark on numerics and notation. Throughout the paper, various constants occur in all our major results, and we provide numerical values for small t as well as asymptotic formulae for these constants in terms of t. The error terms that occur in these formulae have an explicit O-constant, which is indicated by error functions ε_j(. . .). These functions have the property that |ε_j(. . .)| ≤ 1 for all values of the indicated parameters. All results were calculated with the free open-source mathematics software system SageMath [24] and are available online.1 The numerical expressions were obtained by using interval arithmetic; therefore they are reliable results. Each numerical value of this paper is given in such a way that its error is at most the magnitude of the last indicated digit. It would be possible to calculate the values with higher accuracy. Determining accurate numerical values and asymptotic formulae is not just interesting in its own right; it is also important for some of our theorems: specifically, for all Gaussian limit laws it is crucial to ensure that the growth constants associated with the variance are nonzero. We will therefore comment repeatedly on how reliable numerical values can be obtained.

The generating function.
In this section, we derive the generating function which will be used throughout the paper.
The analysis of the path length (section 7) also requires results on canonical forests. For r ≥ 1, we consider the set F_r of canonical forests with r roots. These r roots are all on the same level and ordered from left to right. The notion "canonical" introduced for trees is here meant to hold over all connected components of the forest. This means that a forest should be seen not as a collection of canonical trees but rather as the subgraph of a canonical tree induced by its vertices of depth ≥ d for some d. In fact, this is also the interpretation for which we will need results on forests. We will phrase the generating function in terms of forests, but most other results will be formulated for trees only.
The height h(T ), the cardinality d(T ) of the set of different depths of leaves, and the number m(T ) of leaves on the last level of a forest 2 T ∈ F r of size n = n(T ) can be analyzed by studying a multivariate generating function H (q, u, v, w), where q labels the size n(T ), u labels the number m(T ) of leaves on the last level, v labels the cardinality d(T ) of the set of depths of leaves, and w labels the height h(T ).
Theorem 2.1. The generating function H(q, u, v, w) can be expressed as a quotient built from two auxiliary functions a(q, u, v, w) and b(q, u, v, w), which are analytic in (q, u, v, w) in a suitable region. When u = 1, the generating function simplifies to H(q, 1, v, w) = a(q, 1, v, w)/(1 − b(q, 1, v, w)). The proof of Theorem 2.1 depends on solving a functional equation for the generating function. As we will encounter similar functional equations for related generating functions in section 7, we formulate the relevant result in the following lemma.
Lemma 2.2. Let D ⊆ C be the closed unit disc, and let q ∈ C with |q| < 1. Let P, R, S, f be bounded functions on D and s be a constant such that |S(u)| ≤ s < 1 for all u ∈ D. If the functional equation (2.4) holds for all u ∈ D, then (2.5) follows. Proof. We iterate the functional equation (2.4) and obtain (2.7). The assumption |q| < 1 implies that lim_{k→∞} q^k u^{t^k} = 0 for |u| ≤ 1, so the iteration converges for u ∈ D. Setting u = 1 in (2.7) yields (2.5).
Proof of Theorem 2.1. The proof follows ideas of Flajolet and Prodinger [14]; see also [11]. We first consider the generating functions H_h corresponding to forests of height h. A forest T′ of height h + 1 arises from a forest T of height h by replacing j of its m(T) leaves on the last level (for some j with 1 ≤ j ≤ m(T)) by internal vertices, each with t leaves as its children. If j = m(T), then all old leaves become internal vertices, so that d(T′) = d(T). Otherwise, i.e., if j < m(T), at least one of them remains a leaf, meaning that we have a new level that contains one or more leaves, and hence d(T′) = d(T) + 1.
For the generating function H_h, this translates to a recursion in h, with initial value H_0(q, u, v) = u^r v. Summing over all heights leads to the functions a(q, u, v, w) and b(q, u, v, w) of Theorem 2.1. We note that if (q, u, v, w) ∈ D_0, the relevant series converge; therefore, a(q, u, v, w) and b(q, u, v, w) are analytic in D_1.
In the following lemma, we also state a simplified expression and a functional equation for b(q, u, v, w) in the case v = 1, w = 1.
Proof. This is an immediate consequence of (2.2). Next we recall results on the singularities of H(q, 1, 1, 1); see Proposition 10 of [11]. We use functions ε j for modeling explicit O-constants, as was mentioned at the end of the introduction.
Lemma 2.4. The generating function H(q, 1, 1, 1) has exactly one singularity q = q_0 with |q| < 1 − 0.72/t. This singularity q_0 is a simple pole and is positive. For t ≥ 4, an explicit asymptotic formula for q_0 holds; for t ∈ {2, 3}, the values are given in Table 1. Furthermore, let Q be given by an explicit expression for t ≥ 6, and let Q be given by Table 1 for 2 ≤ t ≤ 5. Then q_0 is the only singularity q of H(q, 1, 1, 1) with |q| ≤ q_0/Q. Setting U = 1 − (log 2)/t^2 for t > 2 and U = 1 − (19 log 2)/80 for t = 2, we obtain an estimate on the circle |q| = q_0/Q. These results do not depend on the choice of the number of roots r. Proof. By [11, Proposition 10], the function 1 − b(q, 1, 1, 1) has a unique simple zero q = q_0 with |q| ≤ 1 − 0.72/t and no further zero for |q| ≤ q_0/Q; the asymptotic estimates for q_0 and Q follow from the results given in [11].
At this point, we still have to show that the numerator does not vanish at q_0. We note that q_0 ≤ 3/5. Using [11, Lemma 8], we obtain a bound for the numerator that holds uniformly in r.
For t ≥ 30, the estimate (2.9) follows from the asymptotic expressions. For t ≤ 30, it is verified individually.
Using this result, we will be able to apply singularity analysis to all our generating functions in the coming sections. At this point, we restate Theorem 1.1 on the number of trees taking the notation of Theorem 2.1 into account and extend it to the number of canonical forests with r roots.
Lemma 2.5. For r ≥ 1, define a constant in terms of a(q_0, 1, 1, 1), where a(q_0, 1, 1, 1) is taken in the version with r roots. Then the asymptotic estimate (2.12) holds uniformly in r ≥ 1, and the number of canonical forests with r roots of size n satisfies (2.13), also uniformly in r ≥ 1.
Proof. By singularity analysis [13, 15], Lemma 2.4, and Theorem 2.1, the number of canonical forests with r roots of size n can be extracted from the generating function. The O-constant can be chosen independently of r, as a(q, 1, 1, 1) can be bounded independently of r for |q| = q_0/Q. The estimate (2.10) immediately yields (2.12). Combining this with (2.14) yields (2.13).
When analyzing the asymptotic behavior of the height (section 3), the number of leaves on the last level (section 6), and the path length (section 7), the corresponding formulae contain the infinite sum b(q, u, 1, w) and its derivatives. In order to perform the calculations to get the asymptotic expressions in t as well as certifiable numerical values for particular t, we will work with a truncated sum and bound the error we make. We define the corresponding truncated sums by cutting the infinite sum off after J terms; for the denominator D(q, w) appearing below, this truncation is denoted D_J(q, w). Note that the variable v encoding the distinct depths of leaves is handled separately in Lemmata 2.8 and 2.9.
The following lemmata provide the estimates we need. Lemma 2.6. Let J ∈ N and q, u, w ∈ C with |qu^{t−1}| < 1. Define Q accordingly and suppose that Q < 1 holds. Then the truncation error is bounded by a geometric tail. Note that as |qu^{t−1}| < 1, the error bound stated in the lemma is decreasing in J.
Proof of Lemma 2.6. The claim follows by bounding the tail of the infinite sum by the corresponding geometric series, which is what we wanted to show. We also need to truncate the infinite sums of derivatives of b(q, u, 1, w). This is done by means of the following lemma.
Using Cauchy's integral formula once more yields the desired result after inserting the bound from above. In section 4 we analyze the distinct depths of leaves. Again, we deal with the infinite sums by replacing them with finite sums and bounding the error we make. Similarly to the estimates above, we define truncated sums, and we have the following two lemmata.
Lemma 2.8. Suppose Q < 1 holds. Then, for j ≥ J, the tail terms can be bounded term by term, which leads to the stated bound. The result of the previous lemma can be extended to derivatives; see below. The proof is skipped, as it is very similar to the proof of Lemma 2.7. Lemma 2.9. Let J ∈ N, α ∈ N_0, and γ ∈ N_0. Further, let q ∈ C with |q| ≤ 2/3, and suppose J was chosen such that Q < 1 holds. Then the corresponding truncation bound holds.

The height.
We start our analysis with the height h(T) of a canonical tree T ∈ T. It turns out that the height is asymptotically (for large sizes n = n(T)) normally distributed, and we will even prove a local limit theorem for it. Moreover, we obtain asymptotic expressions for its mean and variance. This will be achieved by means of the generating function H(q, u, v, w) derived in section 2.
So let us have a look at the bivariate generating function for the height. We consider its denominator D(q, w). From Lemma 2.4 we know that D(q, 1) has a simple dominant zero q_0. We can view the expansion of D(q, w) around (q_0, 1) as a perturbation of a meromorphic singularity; cf. the book of Flajolet and Sedgewick [15, section IX.6]. This yields a central limit theorem (normal distribution) for the height without much effort. But we can do better: we can show a local limit theorem for the height. The precise results are stated in the following theorem.

Theorem 3.1. For a randomly chosen tree T ∈ T of size n the height h(T) is asymptotically (for n → ∞) normally distributed, and a local limit theorem holds. Its mean is μ_h n + O(1), and its variance is σ_h^2 n + O(1).
Recall that "randomly chosen" here and everywhere else in this paper means "uniformly chosen at random" and that the error functions ε_j(. . .) are functions with absolute value bounded by 1; see also the last paragraph of the introduction.
We calculated the values of the constants μ_h and σ_h^2 numerically for 2 ≤ t ≤ 30. Those values can be found in Table 2. Figure 2 shows the result of Theorem 3.1: it compares the obtained normality with the distribution of the height calculated for particular values in SageMath. Remark 3.2. For the (central and local) limit theorem to hold, it is essential that σ_h^2 ≠ 0, which is why we need reliable numerical values and estimates for large t. As mentioned earlier, we used interval arithmetic in SageMath [24] in all our numerical calculations to achieve such results. We used a precision of 53 bits (machine precision) for the bounds of the intervals. All values are calculated to such a precision that the error is at most the magnitude of the last digit that occurs. The reason for the varying number of digits after the decimal point (in, for example, Table 2) is numerical artifacts. In these cases, we could have given an additional digit at the cost of a slightly greater error (twice the magnitude of the last digit).
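For small n, the exact distribution of the height can also be computed directly from the level-composition description of canonical trees sketched in the introduction; this is the kind of computation behind such comparison plots. A minimal illustration (the function name is ours):

```python
from collections import defaultdict

def height_distribution(n, t):
    """dist[h] = number of canonical t-ary trees with n internal vertices
    and height h. A tree whose internal vertices occupy levels 0..k-1
    (composition c_1 + ... + c_k = n with c_1 = 1 and c_{i+1} <= t*c_i)
    has height exactly k, since the deepest leaves hang below level k-1."""
    dist = defaultdict(int)
    def rec(remaining, last, levels):
        if remaining == 0:
            dist[levels] += 1
            return
        for c in range(1, min(t * last, remaining) + 1):
            rec(remaining - c, c, levels + 1)
    if n >= 1:
        rec(n - 1, 1, 1)
    return dict(dist)

# Of the 3 canonical binary trees with 4 internal vertices,
# two have height 3 and one has height 4.
print(sorted(height_distribution(4, 2).items()))  # [(3, 2), (4, 1)]
```

Summing the distribution over all heights recovers the total number of canonical trees of the given size, which serves as a consistency check.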
The proof of Theorem 3.1 is split up into several parts. First, we get asymptotic normality (central limit theorem) and the constants for mean and variance by using Theorem IX.9 (meromorphic singularity perturbation) from the book of Flajolet and Sedgewick [15]. For the local limit theorem we need to analyze the absolute value of the dominant zero q_0(w) of the denominator D(q, w) of the generating function H(q, 1, 1, w). Going along the unit circle, i.e., taking w = e^{iϕ}, this value has to have a unique minimum at ϕ = 0.
From the combinatorial background of the problem (nonnegativity of coefficients) it is clear that |q_0(e^{iϕ})| ≥ |q_0(1)|. The task of showing the uniqueness of this minimum at ϕ = 0 is again split up: we show that the function |q_0(e^{iϕ})| is convex in a region around ϕ = 0 (central region); see Lemmata 3.4–3.6. For the outer region, where ϕ is not near 0, we show that zeros of the denominator are larger there. This is done in Lemma 3.3.
The lemmata mentioned above showing that the minimum is unique work for all t ≥ 30. For the remaining t, more precisely for each t with 2 ≤ t ≤ 30, the same ideas are used, but the checking is done algorithmically using interval arithmetic and SageMath [24]. Details are given in Remark 3.8.
So much for the idea of the proof. We start the actual proof by analyzing the denominator D(q, w). For our calculations we will truncate this infinite sum and use the finite sum given by Lemma 2.6; in particular, we write down the special case J = 2 of this lemma, which will be needed a couple of times in this section. Substituting 1/z for q, we obtain a corresponding expression under the assumption that |w| < |z|. As mentioned earlier, the proof of the local limit theorem for the height for general t consists of two parts: one for w in the central region (around w = 1) and one for w in the outer region. The following lemma shows that everything is fine in the outer region. After that, a couple of lemmata are needed to prove our result for the central region.
Proof. Suppose that we have a zero z_0 of the denominator D(1/z, w) for a given w and that this zero fulfills |z_0| ≥ 2 − 1/2^t. Rewriting the defining equation, taking absolute values, and using the bound (3.2) obtained from Lemma 2.7 yields an inequality for |z_0|. We have lower bounds for the occurring quantities, which can be found by using monotonicity and the value at t = 2. Since we have assumed |z_0| ≥ 2 − 1/2^t, we deduce an upper bound. On the other hand, using |ϕ| > (97/96) π 2^{−t/2} and the inequality |sin(ϕ/4)| ≥ |ϕ|/(√2 π) for |ϕ| ≤ π (which follows by concavity of the sine on the interval [0, π/4]), we obtain a contradiction. Now we study the central region more closely. Looking at the assumptions used in Lemma 3.3, this is where |ϕ| ≤ (97/96) π 2^{−t/2}. As mentioned in the sketch of the proof, we show that the function |q_0(e^{iϕ})| is convex.
We know the location of the dominant and second dominant zeros of the denominator D(q, 1). As we need those roots for general w (along the unit circle), we analyze the difference of D(q, w) from D(q, 1). Using Rouché's theorem then yields a bound for the dominant zero, which is stated precisely in the following lemma.
On the other hand, the Möbius transform q ↦ 1 − q/(1 − q) maps the circle |q| = 2/3 to the circle |z − 1/5| = 6/5. Therefore |1 − q/(1 − q)| ≥ 1, and so we obtain the required inequality on the circle |q| = 2/3. This proves the lemma by Rouché's theorem and Lemma 2.4. The previous lemma gives us exactly one value q_0(w) for each w in a region around 1. We continue by showing that this function q_0 is analytic.
We follow along the lines of the proof of the analytic inversion lemma; cf. Flajolet and Sedgewick [15, Chapter IV.7]. Consider the function σ_1(w) given by the standard contour integral over the circle |q| = 2/3. Since D(q, w) ≠ 0 for all q and w allowed by the assumptions, this function is continuous. Moreover, using the theorems of Morera and Fubini as well as Cauchy's integral theorem, the function σ_1 is analytic. By Lemma 3.4 and by using the residue theorem we get that σ_1(w) equals the unique q fulfilling D(q, w) = 0 and |q| < 2/3; i.e., we obtain σ_1(w) = q_0(w).
Since we have analyticity of q 0 in a region around 1 by Lemma 3.5, we can show that small changes in w do not matter much; see the following lemma for details. Later, this is used to estimate the derivative at some point w by the derivative at 1.
Lemma 3.6. Let t ≥ 30 and w = e^{iϕ}, where ϕ ∈ R with |ϕ| ≤ (97/96) π 2^{−t/2}. We have inequalities bounding q_0^{(k)}(w), where q_0^{(k)} denotes the kth derivative of q_0. For its absolute value we obtain a bound via Cauchy's integral formula. Collecting all those results and using d ≤ 1/2 and the bound given for |ϕ| yields the corresponding estimates.
Inserting all bounds gives the estimates stated for k ∈ {0, 1, 2}. Now we are ready to show that the second derivative of |q_0(e^{iϕ})| is positive. To do so, we show that this second derivative is around 1/8 for ϕ = 0 and use the bounds of Lemma 3.6 to conclude positivity for w in some region around 1.
Lemma 3.7. If t ≥ 30 and ϕ ∈ R with |ϕ| ≤ (97/96) π 2^{−t/2}, then the second derivative of |q_0(e^{iϕ})| with respect to ϕ is positive. Proof. We write Δ_q and Δ_w for the partial derivatives of D(q, w) evaluated at q = q_0(w), and analogously Δ_qq, Δ_qw, and Δ_ww for the function D(q, w) differentiated twice and then evaluated at q = q_0(w). We insert the asymptotic expansion of q_0 (see Lemma 2.4) into the resulting expressions. For the calculations themselves, we used the approximation D_3(q, w) of the denominator D(q, w) together with the bound for the tail given in Lemma 2.7. Set w = e^{iϕ}, and define x(ϕ) and y(ϕ) to be the real and imaginary parts of q_0(e^{iϕ}), respectively. Using the bounds of Lemma 3.6, the resulting estimates yield a positive lower bound for the second derivative of |q_0(e^{iϕ})| = √(x(ϕ)^2 + y(ϕ)^2), which is what we wanted to show. Remark 3.8. The ideas in this section presented so far can also be used to show the uniqueness of the minimum of |q_0(e^{iϕ})| at ϕ = 0 for a fixed t. In particular, this works for t < 30, where some of the results above do not apply.
For the calculations SageMath [24] is used. Further, we use interval arithmetic for all operations. The checking for fixed t is done in the following way. We start with the interval [−4, 4] for ϕ. In each step, we check whether the second derivative (using (3.4) and (3.5)) is positive. If not, then we halve each of the bounds of the interval and repeat the step above. When this stops, we end up with a region around 0 that is convex. For its complement, we now use a bisection method to show that |q_0(e^{iϕ})| > |q_0(1)|. Note that we can use an approximation D_J(q, w) instead of the denominator D(q, w), which can be compensated for by taking the bounds obtained in Lemma 2.7 into account.
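The interval-halving step of this procedure can be sketched as follows; this is a schematic illustration only, with a stand-in for the certified interval-arithmetic evaluation (the function `certified_lower_bound` and the toy cosine example are ours, not the paper's actual second derivative):

```python
import math

def shrink_to_convex_region(certified_lower_bound, phi_max=4.0):
    """Halve the symmetric interval [-phi, phi] until the certified lower
    bound for the second derivative is positive on the whole interval."""
    phi = phi_max
    while certified_lower_bound(-phi, phi) <= 0:
        phi /= 2
    return phi

# Toy stand-in: pretend the second derivative is cos(phi); on [-a, a] with
# a <= pi its minimum is cos(a), so the endpoint value is a valid lower bound.
def certified_lower_bound(lo, hi):
    return math.cos(max(abs(lo), abs(hi)))

print(shrink_to_convex_region(certified_lower_bound))  # 1.0
```

In the actual computation the lower bound comes from evaluating (3.4) and (3.5) with interval arithmetic, so a positive result is a rigorous certificate of convexity on the whole interval, not just at sample points.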
For 2 ≤ t ≤ 30, those calculations were completed with a positive result; i.e., the minimum at ϕ = 0 is unique. Now we have all results together to prove the main theorem of this section. Proof of Theorem 3.1. We use Theorem IX.9 of Flajolet and Sedgewick [15] and apply that theorem to the function H(q, 1, 1, w). This gives us the mean and the variance, and asymptotic normality as a central limit theorem. By singularity analysis, we can extract the asymptotics to get the linear behavior of this mean and in particular the constant (3.1).
For the local limit theorem, we need a more refined analysis. Recall the notation D(q, w) for the denominator of H(q, 1, 1, w), and let q_0(w) be given implicitly by D(q_0(w), w) = 0, |q_0(w)| < 2/3, according to Lemmata 2.4 and 3.4. Set q_0 = q_0(1). Then we obtain the asymptotic formula μ_h n + O(1) for the mean. To calculate the coefficients c_αγ we need derivatives of D(q, w). In order to avoid working with infinite sums, we use the approximations D_J(q, w). Lemma 2.7 shows that the error made by using those approximations is small. For the calculations themselves, SageMath [24] was used.
When t < 30, we use an algorithmic approach to check that the minimum at ϕ = 0 is unique. The details can be found in Remark 3.8.

The number of distinct depths of leaves.
In this section we study the number of distinct depths of leaves d(T) of a canonical tree T ∈ T, motivated by the interpretation as the number of distinct code word lengths in Huffman codes. This parameter is also asymptotically normally distributed, and we show a local limit theorem. The approach is essentially the same as for the height. It is based on the generating function H(q, u, v, w) from section 2. To analyze the parameter d(T), we look at the bivariate generating function H(q, 1, v, 1) for the number of distinct depths of leaves. Again, we consider its denominator D(q, v) and proceed as in the previous section. Lemma 2.4 tells us the existence of a simple dominant zero q_0 of D(q, 1). Again, we expand the denominator D(q, v) around (q_0, 1) and use Theorem IX.9 from Flajolet and Sedgewick [15] to get asymptotic normality. The local limit theorem follows from considerations of the dominant zero of D(q, v) with v on the unit circle. This results in the following theorem. Theorem 4.1. For a randomly chosen tree T ∈ T of size n the number of distinct depths of leaves d(T) is asymptotically (for n → ∞) normally distributed, and a local limit theorem holds. Its mean is μ_d n + O(1), and its variance is σ_d^2 n + O(1) for t ≥ 2. Again, as in the previous section, we calculated the values of the constants μ_d and σ_d^2 numerically for 2 ≤ t ≤ 30, and they are given in Table 3. Figure 3 visualizes the result of Theorem 4.1 as in the previous section.
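As with the height, the exact distribution of d(T) for small n can be obtained from the level compositions: with c_i internal vertices on level i − 1, level i carries t·c_i − c_{i+1} leaves, and the last internal level contributes t·c_k leaves one level below. A minimal sketch (the function name is ours):

```python
from collections import defaultdict

def distinct_depths_distribution(n, t):
    """dist[d] = number of canonical t-ary trees with n internal vertices
    whose leaves occur on exactly d distinct depths. The composition
    c_1 + ... + c_k = n (c_1 = 1, c_{i+1} <= t*c_i) lists internal
    vertices per level; a level contributes to d iff it keeps a leaf."""
    dist = defaultdict(int)
    def rec(remaining, last, d):
        if remaining == 0:
            dist[d + 1] += 1   # level below the last internal level: t*last leaves
            return
        for c in range(1, min(t * last, remaining) + 1):
            # the current deepest internal level keeps t*last - c leaves
            rec(remaining - c, c, d + (1 if t * last - c > 0 else 0))
    if n >= 1:
        rec(n - 1, 1, 0)
    return dict(dist)

# Of the 3 canonical binary trees with 4 internal vertices, two have leaves
# on 2 distinct depths and one (the "caterpillar") on 4 distinct depths.
print(sorted(distinct_depths_distribution(4, 2).items()))  # [(2, 2), (4, 1)]
```

In the partition interpretation, d is exactly the number of distinct summands, so the same routine tabulates that parameter for small representations of 1.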
As mentioned above, the proof of Theorem 4.1 works analogously to the proof of Theorem 3.1. It is again spread over several lemmata and based on an approximation of the generating function H(q, 1, v, 1). The error made by this approximation was analyzed at the end of section 2, namely in Lemmata 2.8 and 2.9. For the local limit theorem, we split up into the central region around v = 1 and an outer region. The following lemma covers the latter.
The proof follows along the same lines as the proof of Lemma 3.3, but with a different bound. Next, we turn to the central region. As a first step, we bound the location of the dominant zero.
The resulting bound is valid for t ≥ 4. The proof of this analyticity result is the same as that for Lemma 3.5 and is therefore omitted here.
In the central region around v = 1, small changes in v do not change the location of the dominant zero much, which is made explicit in the lemma below.
Again, the proof works analogously to the proof of the corresponding lemma for the height parameter. In order to prove the local limit theorem, we show that the second derivative of ϕ ↦ |q_0(e^{iϕ})| is positive. This is stated in the following lemma.
Lemma 4.6. If t ≥ 30 and ϕ ∈ R with |ϕ| ≤ 2π·2^{-t/2}, then the stated bound holds. We use the proof of Lemma 3.7 and update the constants. For fixed t, we can use SageMath [24] and perform calculations with interval arithmetic. The details, which are stated for the height in Remark 3.8, remain valid. For integers t with 2 ≤ t ≤ 30, we showed that |q_0(e^{iϕ})| has a unique minimum at ϕ = 0.
The proof of Theorem 4.1 follows by the same arguments as the proof of Theorem 3.1: We use Theorem IX.9 of Flajolet and Sedgewick [15] applied to the function H(q, 1, v, 1) to get mean and variance (and asymptotic normality as a central limit theorem, too). For the local limit theorem, the uniqueness of the minimum of |q_0(e^{iϕ})| is shown by a twofold strategy. The central region |ϕ| ≤ √3·π·2^{-t/2} is covered by Lemma 4.6 (using previous lemmata as prerequisites); Lemma 4.2 discusses the outer region. For t < 30, the algorithmic approach above is used.

The width.
In this section, we consider the width, i.e., the maximum number of leaves on the same level, for which we have the following theorem.
Theorem 5.1. For a randomly chosen tree T ∈ T of size n, we obtain an asymptotic formula for the expectation of the width w(T), where the constant μ_w is given explicitly for t ≥ 10. For 2 ≤ t ≤ 9, the values of μ_w are given in Table 4. Furthermore, we have the concentration property (5.1). In Figure 4, one can find the distribution of the leaf-width for a given parameter set, together with the mean found in Theorem 5.1.
First, we sketch the idea of the proof. We consider trees whose width is bounded by K. The corresponding generating function W_K(q) can be constructed by a suitable transfer matrix, and we quantify the obvious convergence of W_K(q) to H(q, 1, 1, 1). The dominant singularity q_K of W_K(q) is estimated by truncating the infinite positive eigenvector of an infinite transfer matrix corresponding to H(q, 1, 1, 1) and applying methods from Perron-Frobenius theory. Then the probability P(w(T) ≤ K) can be extracted from W_K(q) using singularity analysis. Our key estimate states that the singularity q_K converges exponentially to q_0, from which the main term of the expectation as well as the concentration property are obtained quite easily. A more precise result on the distribution of the width would depend on a better understanding of the behavior of q_K as K → ∞, which seems to be quite complicated. The proof of the theorem depends on the following definitions. Apart from the width w(T), we also need the "inner width" w*(T), which is used for a recursive construction. Here, L_T(k) denotes the number of leaves at level k. By definition, the inner width w*(T) does not take the leaves on the last level into account.
For K > 0, we are interested in the generating functions W_{K,r}(q) for r ≥ 0, enumerating trees of inner width at most K with rt leaves on the last level. Here, the summand 1 corresponds to the tree of order 1. For all other trees, the number m(T) of leaves on the last level is clearly a multiple of t.
Next we set up a recursion for W_{K,r}, 1 ≤ r ≤ N(K), where N(K) := ⌈K/(t − 1)⌉ − 1. Let us define the column vector W_K(q) := (W_{K,1}(q), . . . , W_{K,N(K)}(q))^T and the "transfer matrix" M_K(q), where the Iversonian notation [expr] = 1 if expr is true, 0 if expr is false, popularized by Graham, Knuth, and Patashnik [17], has been used. We now express W_K(q) in terms of M_K(q). Lemma 5.2. For K ≥ t, we have (5.2). Proof. As in the proof of Theorem 2.1, a tree T of height h + 1 ≥ 2, inner width at most K, and m(T) = rt arises from a tree T′ of height h, inner width at most K, and m(T′) = st by replacing r of the st leaves of T′ on the last level by internal vertices with t succeeding leaves each. We obviously have r ≤ st. In order to ensure that w*(T) ≤ K, we have to ensure that st − r ≤ K. We rewrite these two inequalities as (5.3). If r ≤ N(K), we have r < K/(t − 1) and therefore s < K/(t − 1) by (5.3), i.e., s ≤ N(K). This justifies our choice of N(K). The construction above yields r new internal vertices in T. There is only one tree T of height < 2, namely the star of order t + 1, which has one internal vertex (the root); in this case, r = 1. Translating these considerations into the language of generating functions and rewriting in vector form yields (5.2). We will obtain asymptotic expressions for the coefficients of W_K by singularity analysis. To this end, we have to find the singularities of (I − M_K(q))^{−1} as a meromorphic function in q, i.e., we have to consider the zeros of the determinant det(I − M_K(q)). Note that q_K is a zero of det(I − M_K(q)) if and only if 1 is an eigenvalue of M_K(q_K). In the next lemma, we collect a few results connecting M_K(q) with Perron-Frobenius theory.
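A minimal numerical sketch of this transfer-matrix setup, assuming (as we read the recursion above) that M_K(q) has entry q^r in row r, column s exactly when r ≤ ts ≤ r + K, with N(K) = ⌈K/(t − 1)⌉ − 1; placing q^s instead would not change the spectrum or det(I − M_K(q)), since diag(q^r)·C and C·diag(q^s) have the same eigenvalues. The dominant singularity q_K of W_K(q) is the q at which the spectral radius of M_K(q) passes through 1, and the monotonicity asserted in Lemma 5.3 makes bisection applicable.

```python
import numpy as np


def transfer_matrix(t, K, q):
    """M_K(q)[r][s] = q^r when r <= t*s <= r + K, for 1 <= r, s <= N(K).
    (Entry convention is our reading of the recursion, not quoted verbatim.)"""
    N = -(-K // (t - 1)) - 1          # ceil(K/(t-1)) - 1
    M = np.zeros((N, N))
    for r in range(1, N + 1):
        for s in range(1, N + 1):
            if r <= t * s <= r + K:
                M[r - 1, s - 1] = q ** r
    return M


def spectral_radius(M):
    return max(abs(np.linalg.eigvals(M)))


def q_K(t, K, lo=0.4, hi=1.0, iters=50):
    """Bisection for the q with spectral radius 1; Lemma 5.3 asserts that
    q -> lambda_max(M_K(q)) is strictly increasing, so this is well defined."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if spectral_radius(transfer_matrix(t, K, mid)) < 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

As K grows, the restriction on the inner width relaxes, so q_K decreases towards q_0.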
Lemma 5.3. Let K ≥ t and q > 0. Then 1. the matrix M_K(q) is a nonnegative, irreducible, primitive matrix; 2. the function q ↦ λ_max(M_K(q)) mapping q to the spectral radius of M_K(q) is a strictly increasing function from (0, ∞) to (0, ∞); 3. if M_K(q)x ≤ x or M_K(q)x ≥ x holds componentwise for some positive vector x, then λ_max(M_K(q)) ≤ 1 or λ_max(M_K(q)) ≥ 1, respectively. Proof. We prove each statement separately.
1. The matrix M_K(q) is nonnegative by definition. We note that ⌈r/t⌉ ≤ r − 1 holds for all r ≥ 2 and r + 1 ≤ ⌊(r + K)/t⌋ holds for all r < N(K). This implies that all subdiagonal, diagonal, and superdiagonal elements of M_K(q) are positive. Thus M_K(q) is irreducible. As all diagonal elements are positive, it is also primitive. 2. This is an immediate consequence of [16, Theorem 8.8.1(b)]. 3. Assume that M_K(q)x ≤ x for some positive x. Let y^T > 0 be a left eigenvector of M_K(q) to the eigenvalue λ_max(M_K(q)). Then λ_max(M_K(q)) y^T x = y^T M_K(q) x ≤ y^T x, and the result follows upon division by y^T x > 0. The case M_K(q)x ≥ x is analogous. We now consider the infinite matrix M_∞(q) and the infinite determinant det(I − M_∞(q)), defined as the limit of the determinants of the N × N truncations as N tends to ∞; cf. Eaves [10].
For |q| < 1, this infinite determinant converges by Eaves' sufficient condition. We now show that the infinite determinant is indeed the denominator of the generating function H(q, 1, 1, 1).
Proof. When expanding the infinite determinant, we take the 1 on the diagonal in almost all rows and some other entry in rows a_1 < a_2 < · · · < a_k for some k. These other entries have to come from −M_∞(q). Extracting the signs and the powers of q for these rows yields a contribution (−1)^k q^{a_1+···+a_k} det(([a_i ≤ ta_j])_{1≤i,j≤k}). We trivially have a_i ≤ ta_j for j ≥ i, so all entries on the diagonal of ([a_i ≤ ta_j])_{1≤i,j≤k} and above this diagonal are 1. If a_2 ≤ ta_1, the first and second rows of ([a_i ≤ ta_j])_{1≤i,j≤k} are identical, so the determinant vanishes. Therefore, we only have to consider summands with a_2 > ta_1. In this case, we clearly have a_i > ta_1 for all i ≥ 2; i.e., the first column of ([a_i ≤ ta_j])_{1≤i,j≤k} is (1, 0, . . . , 0)^T. Repeating this argument, we see that only summands with a_{j+1} > ta_j for 1 ≤ j < k contribute to the determinant. For those summands, the matrix ([a_i ≤ ta_j])_{1≤i,j≤k} equals ([j ≥ i])_{1≤i,j≤k} and thus has determinant 1.
Therefore, we obtain the representation det(I − M_∞(q)) = 1 + Σ_{k≥1} (−1)^k Σ q^{a_1+···+a_k}, where the inner sum runs over all a_1 ≥ 1 with a_{j+1} > ta_j for 1 ≤ j < k. With the change of variables a_1 =: b_k and a_{j+1} − ta_j =: b_{k−j} for 1 ≤ j < k, we obtain det(I − M_∞(q)) = 1 − b(q, 1, 1, 1).
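This expansion can be checked numerically on finite truncations (an illustration only, not part of the proof): with M the N × N matrix with entry q^r at (r, s) when r ≤ ts, every chain with all entries at most N occurs in both det(I − M) and the chain expansion, so the two agree exactly.

```python
import numpy as np


def det_direct(t, N, q):
    """det(I - M) for the truncation M[r][s] = q^r * [r <= t*s], 1 <= r, s <= N."""
    M = np.array([[q ** r if r <= t * s else 0.0
                   for s in range(1, N + 1)] for r in range(1, N + 1)])
    return np.linalg.det(np.eye(N) - M)


def det_via_chains(t, N, q):
    """Expansion 1 + sum_{k>=1} (-1)^k * sum over chains a_{j+1} > t*a_j
    (all a_i <= N) of q^(a_1 + ... + a_k)."""
    total = 1.0

    def extend(prev, sign, exponent):
        nonlocal total
        lo = 1 if prev == 0 else t * prev + 1
        for a in range(lo, N + 1):
            total += sign * q ** (exponent + a)
            extend(a, -sign, exponent + a)

    extend(0, -1.0, 0)
    return total
```

For example, for t = 2 and N = 2 both sides equal 1 − q − q², since the only chains inside {1, 2} are the singletons.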
If K tends to infinity, W K (q) tends to H(q, 1, 1, 1), as the restriction on the width becomes meaningless. For our purposes, we will need a slightly stronger result: we also need convergence of the numerator and the denominator of W K (q) given by (5.2) and Cramer's rule to the numerator a(q, 1, 1, 1) and the denominator 1 − b(q, 1, 1, 1) of H(q, 1, 1, 1), respectively. We prove this in two steps. The first one is to prove that the numerator and the denominator of W K (q) tend to the corresponding infinite determinants. This is stated in the following lemma.
The summands ±q^k in the expansion of the infinite determinant can be estimated via the set of compositions of k with distinct parts: the set S can be recovered as the set of summands in the composition, and the permutation π can be recovered from the order of the summands.
As there are at most exp(2√k · log k) compositions of k with distinct parts by a result of Richmond and Knopfmacher [23], there are at most that many summands ±q^k in the infinite determinant det(I − M_∞(q)).
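To make this counting concrete, the number of compositions of k with distinct parts can be computed by brute force, enumerating the partitions of k into distinct parts and weighting each by the number of orderings of its parts (a small illustrative sketch, not needed for the proofs):

```python
from math import factorial


def compositions_distinct_parts(k):
    """Number of compositions of k into distinct positive parts:
    the sum of |S|! over all partitions S of k into distinct parts."""
    count = 0

    def extend(remaining, min_part, parts):
        nonlocal count
        if remaining == 0:
            count += factorial(parts)
            return
        for p in range(min_part, remaining + 1):
            extend(remaining - p, p + 1, parts + 1)

    extend(k, 1, 0)
    return count
```

For k = 6, the partitions into distinct parts are {6}, {1, 5}, {2, 4}, and {1, 2, 3}, giving 1 + 2 + 2 + 6 = 11 compositions.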
The difference between det(I − M_∞(q)) and det(I − M_K(q)) consists of those summands which do not choose the 1 on the diagonal in some row > N(K) or which choose an entry in some column s and in some row r with s > (r + K)/t. In the latter case, the 1 on the diagonal cannot be chosen in row s, so the exponent of q in this summand is at least r + s > K/t. So all summands in the difference are of the form ±q^k for some k ≥ K/t. By the triangle inequality and the above estimates, we obtain the bound O(q^{K/(2t)}) for the difference. The argument does not change if the sth column of both matrices is replaced by the column vector (q, 0, . . . , 0)^T. Differentiating the determinant can be done term by term; the error term does not change, as the bound O(q^{K/(2t)}) is weak enough.
The second step in the proof of the convergence of the numerator and the denominator of W K (q) consists of the following simple lemma.
Lemma 5.6. Let |q| ≤ 0.6. Then the denominator det(I − M_K(q)) of W_K(q) converges to the denominator 1 − b(q, 1, 1, 1) of H(q, 1, 1, 1) with error O(q^{K/(2t)}), and the numerator converges to a(q, 1, 1, 1) with the same error. The same is true for the first derivatives with respect to q.
Proof. The first statement is simply the combination of Lemmata 5.5 and 5.4. As a formal power series, W_K(q) converges to H(q, 1, 1, 1), since [q^n] W_K(q) = [q^n] H(q, 1, 1, 1) holds for n ≤ (K − 1)/(t − 1): a canonical tree with n internal vertices has 1 + n(t − 1) leaves and therefore width at most 1 + n(t − 1).
In order to obtain information on the zeros of det(I − M_K(q)), and therefore the singularities of W_K(q), we approximate the Perron-Frobenius eigenvector of M_K(q) by that of the infinite matrix M_∞(q). The following lemma gives this eigenvector explicitly; as we will see in the next section, it has a natural combinatorial interpretation.
In particular, if we set p_r = [u^{rt}] b(q_0, u, 1, 1), then (p_r)_{r≥1} is a right eigenvector of M_∞(q_0) to the eigenvalue 1; this is the eigenvalue equation (5.5). Proof. Multiplying the left-hand side of (5.4) by u^{rt} and summing over r ≥ 1 yields the claim, where the last equality comes from Lemma 2.3. This concludes the proof of (5.4).
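Numerically, the vector (p_r) can be approximated by the Perron-Frobenius eigenvector of a large truncation of M_∞(q), evaluated at the q where the truncated spectral radius equals 1 (a proxy for q_0). The truncation size N = 40 and the entry convention q^r·[r ≤ ts] are our reading of the setup; a hedged sketch for t = 2:

```python
import numpy as np


def M_trunc(t, N, q):
    """N x N truncation of M_inf(q): entry q^r at (r, s) when r <= t*s."""
    return np.array([[q ** r if r <= t * s else 0.0
                      for s in range(1, N + 1)] for r in range(1, N + 1)])


def perron_vector(t, N=40, iters=60):
    """Bisect for the q with spectral radius 1, then return (q, eigenvector),
    the eigenvector normalized to sum 1 (the p_r sum to 1)."""
    lo, hi = 0.4, 0.9
    for _ in range(iters):
        mid = (lo + hi) / 2
        if max(abs(np.linalg.eigvals(M_trunc(t, N, mid)))) < 1:
            lo = mid
        else:
            hi = mid
    q = (lo + hi) / 2
    vals, vecs = np.linalg.eig(M_trunc(t, N, q))
    i = int(np.argmax(vals.real))      # the Perron root is real and dominant
    p = np.real(vecs[:, i])
    p = p / p.sum()                    # fixes the sign: all entries positive
    return q, p
```

The entries come out strictly positive and decay rapidly, in line with Proposition 5.8 below.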
We now use the fact that (p r ) r≥1 is an eigenvector of M ∞ (q) to derive bounds for its entries.
Proposition 5.8. All constants p_r, r ≥ 1, are positive, and we have p_r = Ω(q_*^r / r) and p_r = O(r² q_*^r), where q_* := q_0^{1+1/(t−1)}. Proof. As we will see later in the proof of Theorem 6.1, equation (6.2), the p_r are limits of probabilities and therefore a priori nonnegative; in fact, this is a consequence of Lemma 2.5. Moreover, they sum to 1, as mentioned earlier, and in view of the eigenvalue equation and the fact that M_∞(q) is an irreducible matrix, we even know that they must be strictly positive.
By the eigenvalue equation (5.5), we have p_r ≥ q_0^r p_{⌈r/t⌉} for all r ≥ 1. Iterating this yields, with p_min = min_{s<t} p_s, the lower bound of the proposition. To prove the upper bound, we proceed in two steps. In a first step, we note that the eigenvalue equation (5.5), together with the fact that Σ_{r≥1} p_r = 1, yields the weaker upper bound p_r ≤ q_0^r. In a second step, we use induction on r and assume that p_s ≤ c s² q_*^s for s < r for some constant c depending on t. Then the eigenvalue equation (5.5) yields the desired bound for sufficiently large r. Lemma 5.9. The generating function W_K(q) has a unique singularity q_K with |q_K| ≤ 0.6 for K ≥ c_1, where c_1 is a suitable positive constant depending on t. It is a simple pole and a zero of det(I − M_K(q)). Furthermore, q_K converges to q_0 exponentially fast as K → ∞, with explicit bounds involving suitable positive constants c_2, c_3 depending on t.
Proof. In the following, c 4 , c 5 , . . . denote suitable positive constants depending on t.
As H(q, 1, 1, 1) has a unique pole q_0 with |q_0| ≤ 0.6 by Lemma 2.4, and the numerator and denominator of W_K(q) tend to the numerator and denominator of H(q, 1, 1, 1), respectively, by Lemma 5.6, W_K(q) also has a unique pole with |q| ≤ 0.6 for sufficiently large K.
We set x_K = (p_1, . . . , p_{N(K)})^T. If we find a q > 0 such that M_K(q) x_K ≥ x_K, then Lemma 5.3 implies that λ_max(M_K(q)) ≥ 1 and q_K < q.
We therefore consider the rth row of M_K(q) x_K for some 1 ≤ r ≤ N(K) and compare it with the eigenvalue equation (5.5). By Proposition 5.8, the terms lost by the truncation are exponentially small in K. This means that M_K(q) x_K ≥ x_K holds for a q exceeding q_0 by a term c_7 K² times an exponentially small factor, which proves the upper bound of Lemma 5.9; the proof of the lower bound follows along the same lines. Proof of Theorem 5.1. We choose K large enough so that W_K(q) has a unique singularity q_K with |q_K| ≤ 0.6 and such that q_K/0.6 < 0.99. By singularity analysis and Lemma 5.6, we obtain an asymptotic formula for the relevant probabilities, valid for K ≥ c_8. We now estimate the sum (5.6) expressing the expectation of the width. We use the abbreviation S := 1/q_0^{t−1} > 1. First, we consider the summands of (5.6) with S^K ≤ n/log² n. By Lemma 5.9, each of these summands equals 1 up to an exponentially small error.
We conclude that these summands of (5.6) contribute log_S n + O(log log n). Similar estimates imply the analogous bound for (5.8). Now, we consider the summands of (5.6) with n/log² n < S^K < n log³ n. These are O(log log n) summands, each trivially contributing at most 1, so the total contribution is O(log log n).
Next, we consider the summands of (5.6) with n log³ n ≤ S^K ≤ n^{4t log S}. For these, the summands decay fast enough that their total contribution is O(1); similar estimates imply the analogous bound for (5.8). Then, we consider the summands of (5.6) with n^{4t log S} < S^K ≤ S^{tn}. This time, the summands decay even faster, and the total contribution of these summands is again O(1). Finally, we note that all summands with K > tn vanish: any tree with n internal vertices has width at most tn.
Collecting all terms, we obtain the main term (5.7) of the expectation. The combination of (5.7) and (5.8) immediately yields the concentration property (5.1).
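The expectation estimate above is an instance of the elementary tail-sum identity E[w(T)] = Σ_{K≥0} P(w(T) > K), which is what (5.6) expresses. A quick self-contained check of the identity on a toy distribution (not the width distribution itself):

```python
from fractions import Fraction


def expectation_direct(pmf):
    """E[X] computed from the probability mass function {value: probability}."""
    return sum(x * p for x, p in pmf.items())


def expectation_via_tails(pmf):
    """E[X] = sum_{K >= 0} P(X > K) for X supported on nonnegative integers."""
    m = max(pmf)
    return sum(sum(p for x, p in pmf.items() if x > K) for K in range(m))


# toy distribution on {1, 2, 5}; exact arithmetic via Fraction
pmf = {1: Fraction(1, 2), 2: Fraction(1, 4), 5: Fraction(1, 4)}
```

Both computations give 9/4 here; in the proof above, the tail probabilities P(w(T) > K) are then estimated range by range in K.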

The number of leaves on the last level.
Analyzing the parameter m(T ) counting the number of leaves of maximum depth (labeled by the variable u in the generating function H(q, u, v, w)) is the topic of this section. Here, T is a canonical forest in F r for some number of roots r. We note that for fixed |u| ≤ 1, the dominant simple pole q 0 of H(q, 1, 1, 1) is also the dominant singularity of H(q, u, 1, 1) and is still a simple pole. Therefore, m(T ) tends to a discrete limiting distribution; we refer the reader to section IX.2 of Flajolet and Sedgewick [15]. Note that the number m(T ) is divisible by t unless T has height 0. The result presented in this section is a very useful tool in proving the central limit theorem for the path length in the following section.
Theorem 6.1. Let q_0, Q, and U be as described in Lemma 2.4 and q_* as defined in Proposition 5.8. For m ≥ 1 such that mt ∈ Z, we set p_m = [u^{mt}] b(q_0, u, 1, 1) as in Lemma 5.4. Then, for a randomly chosen forest T ∈ F_r of size n, the number of leaves on the last level converges to the discrete limit distribution (p_m), with explicit error bounds and with asymptotic expressions for the mean μ_m and the variance σ_m² for t ≥ 4. For t ∈ {2, 3}, the values of μ_m and σ_m² are given in Table 5. Note that by Lemma 2.3, p_m = 0 for noninteger m. Again, we visualize the distribution of the leaves on the last level for a given parameter set; see Figure 5. This is compared with the mean of Theorem 6.1.
We use Cauchy's formula, the residue theorem, and the fact that a(q, u) does not contribute to the residue at q = q_0 to obtain an expression involving ν(r), which has been defined in (2.11). By Lemma 2.5, the resulting formula for the probability generating function P_n(u) of m(T) holds uniformly for |u| ≤ 1/U and uniformly in the number of roots r (it suffices to bound the numerator and the denominator of H(q, u) separately in order to get a uniform bound in r). We remark that this proves the nonnegativity of the constants p_m, which we required in the proof of Proposition 5.8. Expectation and variance follow upon differentiating b(q_0, u) with respect to u and inserting the asymptotic expression for q_0. Here, we use the bounds derived in Lemma 2.7.
In order to compute P(m(T) = mt), we consider the corresponding coefficient of P_n(u). Bounding H(q, u) uniformly in r and using Lemma 2.5 and Proposition 5.8 yields (6.1), taking into account a suitable estimate for t ≥ 30 and the fact that U^t/q_* > 1 remains true for all t ≥ 2.

The path length.
This section is devoted to the analysis of the path length. While the external path length is most natural in the setting of Huffman codes, it is more convenient to work with the total and the internal path lengths, respectively. As pointed out in the introduction, all three are essentially equivalent, since they are (deterministically) related by simple linear equations.
Theorem 7.1. For a randomly chosen tree T ∈ T of size n, the total path length (as well as the internal and the external path length) is asymptotically (for n → ∞) normally distributed. Its mean is asymptotically μ_tpl n² + O(n), and its variance is asymptotically σ_tpl² n³ for t ≥ 30. We determined numerical values of these constants as in the previous sections; they are given in Table 6. Figure 6 shows the result of Theorem 7.1 for particular values: it compares the obtained normality with the distribution of the total path length found by a simulation in SageMath. We first use a generating functions approach to determine the asymptotic behavior of the mean and variance. Let us define L_r(q, u, w) as the generating function for the rth moment of the total path length. Note that L_0(q, u, w) = H(q, u, 1, w) = a_0(q, u, w) + b(q, u, w) a_0(q, 1, w)/(1 − b(q, 1, w)) in the notation of Theorem 2.1, but writing a_0 instead of a and leaving out the parameter v.
(In Figure 6, only the main terms of mean and variance are taken into account; in order to account for the fact that the total path length is always even, the limit distribution is rescaled.) We are specifically interested in L_1 and L_2. In analogy to the approach we used to determine a formula for H(q, u, v, w) in the proof of Theorem 2.1, we obtain a functional equation for L_r(q, u, w). Define, for the sake of convenience, the linear operators Φ_u = u ∂/∂u, Φ_w = w ∂/∂w, and Φ_q = q ∂/∂q acting on our generating functions. We get the following result (Lemma 7.2) for the generating function of the first moment. Proof. Replacing j leaves of depth h by internal vertices, thus creating jt new leaves of depth h + 1, increases the total path length by jt(h + 1). Thus we get a recursion with initial value L_{1,0}(q, u) = 0. Then, by multiplying by w^{h+1} and summing over all h, we obtain a functional equation, and Lemma 2.2 yields the desired formula for L_1(q, u, w). Next, we derive a formula for the generating function of the second moment. Lemma 7.3. We have an analogous expression for L_2(q, u, w). Proof. As in Lemma 7.2, we derive a functional equation for L_2(q, u, w). Starting with a tree T of height h and creating jt new leaves of depth h + 1 changes the square of the total path length from ℓ(T)² to (ℓ(T) + jt(h + 1))². This translates to a recursion with L_{2,0}(q, u) = 0. Encoding the height by w^h leads to the functional equation for the generating function, and again, Lemma 2.2 finishes the proof. In order to determine the asymptotic behavior of mean and variance, one only needs to find the expansion around the dominant singularity q_0 and apply singularity analysis. The main term of the mean is easy to guess: assuming that the vertices are essentially uniformly distributed along the entire height, it is natural to conjecture that ℓ(T) is typically around t n(T) h(T)/2 and thus of quadratic order.
© 2015 SIAM. Published by SIAM under the terms of the Creative Commons 4.0 license.
This is indeed true, and the variance turns out to be of cubic order (terms of degree 4 cancel, as one would expect). The following proposition substantiates these claims for the mean.
Proposition 7.4. The mean of the total path length is μ_tpl n² + O(n). Proof. By substituting L_0 into the functional equation of Lemma 7.2, we get an explicit expression for L_1(q, 1, w) as a sum of three terms. The dominant term is the first one, with a triple pole at the dominant singularity q_0. The second and third terms, however, are also relevant in the calculation of the variance, where one further term in the asymptotic expansion is needed in view of the inevitable cancellation in the main term. Singularity analysis immediately yields the asymptotic behavior of the mean: since the pole is of order three, the order of the mean is quadratic, i.e., it is asymptotically equal to μ_tpl n², where the constant μ_tpl is given by (7.1). Plugging in the definition of b as a sum, it is possible to simplify this further by logarithmic differentiation. Plugging the result into (7.1), substituting ℓ = j + k, and interchanging the order of summation, we can replace the sum in the expression for μ_tpl by t · (Φ_q b)(q_0, 1, 1) (which can be seen by another logarithmic differentiation). This finally yields the stated formula for μ_tpl, in which the second fraction is precisely μ_h; cf. (3.1). Our next goal is to obtain the asymptotics of the variance, which will again follow by applying the tools from singularity analysis together with the result for the mean shown above.
Let us use the abbreviation Σ(q, M, Φ), where M is a function in the variable j and Φ an operator, to simplify the expressions in the following lemma.
Proof. In order to calculate the variance, one needs, besides the result of Proposition 7.4, the asymptotic behavior of L_2(q, 1, 1) at the dominant singularity. Only the terms of pole orders 4 and 5 (i.e., second-highest and highest) are needed. More details on the computation can be found in the appendix. By Lemma 7.3, we obtain an expansion in terms of b(q, 1, 1) and its derivatives. Applying singularity analysis to the highest and second-highest order terms of both L_1 and L_2 yields the variance. The terms of order n⁴ cancel (as expected), and one finds that the main term of the variance is asymptotically σ_tpl² n³. In order to obtain expressions (either the asymptotics in t or the values for particular given t) for μ_tpl and σ_tpl², we insert the dominant singularity q_0 (see Lemma 2.4) into the formulae obtained in Proposition 7.4 and Lemma 7.5. We remind the reader again that it is important to establish that σ_tpl² ≠ 0, so numerical values and estimates for large t are needed again. A couple of technical difficulties arise due to the infinite sums; these are discussed in the following remark.
Remark 7.6. We use SageMath [24] for our calculations. In order to get the asymptotic expression and values for σ 2 tpl in Theorem 7.1 (note that we have μ tpl already due to Proposition 7.4 and the results of section 3), we have to evaluate infinite sums and insert the dominant singularity q 0 .
We will explain step by step how this is done.
(a) We start with the expression for σ_tpl² found in Lemma 7.5.
(b) First, let us consider the infinite sums Σ(q_0, M, Φ). For a suitable J_Σ depending on t, we calculate the first J_Σ summands directly and use a bound for the tails. These choices allow us to find a closed form for the tail Σ_{j≥J_Σ} M_j Q^{j−J_Σ}. Proceeding as described above gives an expression consisting of finitely many summands containing the function b, which will be handled in the following step.
(c) Let us deal with the function b(q, u, w) and its derivatives, which are all infinite sums. As above, we calculate the first J_b summands directly for a suitable J_b chosen depending on t. Then we add the bound provided by Lemmata 2.6 and 2.7 to take care of the tails. At this point, we end up with a symbolic expression not containing any (visible or hidden) infinite sums; only the variables t, q_0, U and the interval I occur. Thus, we are almost ready to insert the asymptotic expressions or values for these parameters.
(d) Now, we are ready to insert the dominant singularity q_0. On the one hand, this can be the asymptotic expansion of q_0 as t → ∞ (in our case valid for t ≥ 30); cf. Lemma 2.4. We choose J_Σ = J_b = 3. The result is then again an asymptotic expression for σ_tpl². On the other hand, we can use a particular value of q_0 for given t (which for us means, more precisely, an interval containing q_0). In these cases, the resulting σ_tpl² is computed using interval arithmetic.
In order to prove asymptotic normality of the total path length, a different, more probabilistic approach is needed. Standard theorems from analytic combinatorics no longer apply, since the path length grows faster than, for example, the height, so that mean and variance no longer have linear order.
We number the internal vertices of a random canonical t-ary tree of size n from 1 to n in a natural top-to-bottom, left-to-right way, starting at the root. Let X_{k,n} denote the depth of the kth internal vertex v_k in a random tree T ∈ T of order n. Moreover, set Y_{k,n} = X_{k+1,n} − X_{k,n} ∈ {0, 1}. In words, Y_{k,n} is 1 if the (k + 1)st internal vertex has greater distance from the root than the kth, and 0 otherwise. It is clear that the height can be expressed in terms of the Y_{k,n}, which would indeed give an alternative approach to the central limit theorem for the height. More importantly, though, the internal path length can also be expressed in terms of the random variables Y_{k,n}, namely as Σ_{k=1}^{n} X_{k,n} = Σ_{j=1}^{n−1} (n − j) Y_{j,n}, so that the internal path length divided by n can be seen as a sum of n − 1 bounded random variables Z_{j,n} = ((n − j)/n) Y_{j,n}. An advantage of this decomposition over other possible decompositions (e.g., by counting the number of vertices at different depths) is that the number of variables is not random. Another important point is that the Z_{j,n} are bounded after rescaling, so that they also have bounded moments.
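The identity behind this decomposition, Σ_{k=1}^n X_{k,n} = Σ_{j=1}^{n−1} (n − j) Y_{j,n}, can be checked on any admissible depth sequence (the sequences below are arbitrary small examples, not drawn from the random model): since X_{k,n} = Σ_{j<k} Y_{j,n} with X_{1,n} = 0, summing over k counts each Y_{j,n} exactly n − j times.

```python
def internal_path_length(depths):
    """Sum of the depths X_1, ..., X_n of the internal vertices."""
    return sum(depths)


def via_increments(depths):
    """Same quantity rewritten as sum_{j=1}^{n-1} (n - j) * Y_j,
    with Y_j = X_{j+1} - X_j in {0, 1}."""
    n = len(depths)
    return sum((n - j) * (depths[j] - depths[j - 1]) for j in range(1, n))


# internal vertices numbered top-to-bottom, left-to-right: the depth
# sequence is nondecreasing with increments 0 or 1, starting at 0 (the root)
depths = [0, 1, 1, 2, 2, 3]
```

Dividing the result by n gives the sum of the Z_{j,n} from the decomposition above.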
Unfortunately, the Z j,n are neither identically distributed (which is not a major issue) nor independent, which makes standard versions of the central limit theorem for sums of random variables inapplicable. However, they are almost independent in that they satisfy a so-called strong mixing condition (inequality (7.2) of the following lemma).
Lemma 7.7. Let F_{s_1} be the σ-algebra induced by the random variables Z_{1,n}, Z_{2,n}, . . . , Z_{s_1,n}, and let G_{s_2} be the σ-algebra induced by the random variables Z_{s_2,n}, Z_{s_2+1,n}, . . . , Z_{n−1,n}. There exist constants κ and λ (depending only on t) such that (7.2) holds for all 1 ≤ s_1 < s_2 ≤ n and all events A ∈ F_{s_1} and B ∈ G_{s_2}. The main idea of the proof of the strong mixing condition is simple: events A ∈ F_{s_1} describe the shape of the random tree T up to the s_1th internal vertex v_{s_1}, while events B ∈ G_{s_2} describe the shape of the random tree T from the s_2th internal vertex v_{s_2} on. The probabilities of such events can be calculated by means of Lemma 2.5 and Theorem 6.1, and the exponential error terms that one obtains through this approach yield the estimate (7.2) above.
Table 7: Values of r_j and n_j for the decomposition of a random tree.
Proof of Lemma 7.7. For a canonical tree T, let F_λ(T) and F_ρ(T) be the numbers of internal vertices on the same level as v_{s_1}, left and right of v_{s_1}, respectively. Similarly, let G_λ(T) and G_ρ(T) be the numbers of internal vertices on the same level as v_{s_2}, left and right of v_{s_2}, respectively. For fixed s_1, f_λ, f_ρ, s_2, g_λ, and g_ρ, there is a bijection between the following: • the set of canonical trees T with F_λ(T) = f_λ, F_ρ(T) = f_ρ, G_λ(T) = g_λ, and G_ρ(T) = g_ρ and such that v_{s_1} and v_{s_2} are on different levels, and • the set of tuples (T_1, T_2, T_3) where T_j is a canonical forest with r_j roots, n_j internal vertices, and m_j t leaves at the last level, where the values of r_j and n_j are given in Table 7, T_j has no isolated roots, and m_j t ≥ r_{j+1} holds for j ∈ {1, 2}. An illustration can be found in Figure 7.
Here, T 1 consists of the first levels of T up to and including the level of v s1 , T 2 consists of the levels of T from and including the level of v s1 up to and including the level of v s2 , and T 3 consists of the levels of T from and including the level of v s2 . Note that the internal vertices of T are partitioned into those of T 1 , T 2 , and T 3 , as the last level of a forest does not have any internal vertices by definition.
Note that the definition of a canonical forest does allow an arbitrary number of isolated roots; by definition, those are leaves and not internal vertices. In order to use Lemma 2.5 and Theorem 6.1 for our cases, we use the simple bijection between forests with n internal vertices and r roots, all of which are nonisolated, and forests with n − r internal vertices and rt roots realized by omitting all r roots.
With Q = 1/2 + (log 2)/(2t) + 0.06/t² (Lemma 2.4), q_* = q_0^{1+1/(t−1)} (Proposition 5.8), and U = 1 − (log 2)/t² (Theorem 6.1), we fix 0 < δ < 1/4 such that a suitable inequality between these quantities holds for all j ≥ 1. We first compute the probability of having at least m_1 t ≥ δ(s_2 − s_1) vertices at the level of v_{s_1}. To do so, we use the decomposition as described above with the following modification: we do not use the full decomposition into (T_1, T_2, T_3) but join the latter two to obtain a decomposition (T_1, T_23) in the obvious way. By Lemma 2.5 and Theorem 6.1, we sum over all admissible canonical forests T_23. Note that we used ν(tr_2) = Θ(1) (see Lemma 2.5) and q_0^{r_2} ≤ 1. Therefore, using U < 1, we find the desired probability to be O((s_2 − s_1) U^{δ(s_2−s_1)}). Analogously, the probability that there are at least δ(s_2 − s_1) vertices at the level of v_{s_2} is also O((s_2 − s_1) U^{δ(s_2−s_1)}). In particular, the probability that v_{s_1} and v_{s_2} are on the same level is bounded by O((s_2 − s_1) U^{δ(s_2−s_1)}). From now on, we consider the event W that v_{s_1} and v_{s_2} are on different levels and that there are at most δ(s_2 − s_1) vertices at each of the levels of v_{s_1} and v_{s_2}, respectively. The previous discussion shows that (7.4) P(W) ≥ 1 − O((s_2 − s_1) U^{δ(s_2−s_1)}). Now let two events A ∈ F_{s_1} in the σ-algebra generated by Z_{1,n}, . . . , Z_{s_1,n} and B ∈ G_{s_2} in the σ-algebra generated by Z_{s_2,n}, Z_{s_2+1,n}, . . . , Z_{n−1,n} be given. The event A consists of a collection of possible shapes of the random tree T up to the s_1th vertex v_{s_1}, and likewise B consists of a collection of possible shapes of the random tree T from the s_2th vertex v_{s_2} onwards. For ease of presentation, we assume that the events A and B consist of only one such shape up to s_1 and from s_2 on, respectively; the general case follows upon summation over all shapes in A and B. The shapes A and B uniquely determine F_λ(T) =: f_λ and G_ρ(T) =: g_ρ, respectively. On the other hand, F_ρ(T) and G_λ(T) will be somewhat restricted by the shapes in A and B, respectively.
Using Lemma 2.5, Theorem 6.1, and the bijection into a tree and forests described above yields the following estimates for the probabilities we are interested in. There, the error term O(Q^n) in the denominator will always be absorbed by the error term in the numerator because Q^n ≤ Q^{s_2−s_1}. We obtain the desired estimates using the inequalities Q < 1, r_2 ≤ δ(s_2 − s_1) (since we are in the situation that event W occurs), and n_2 + n_3 ≥ s_2 − s_1. Combining this with (7.4) yields the strong mixing property (7.2). Now we are able to apply the following result of Sunklodas. Lemma 7.8 (see Sunklodas [25]). Let d, s ∈ (2, 3], κ, λ, c_0 be fixed positive constants. Then there exists a constant K such that for all positive integers n and random variables X_1, X_2, . . . , X_n the following holds: If 1. E(X_j) = 0 for all j, 2. max_{1≤j≤n} E(|X_j|^s) ≤ d,
From the results above, an expression for the constant σ_tpl² that occurs in the asymptotic formula for the variance follows. Using Lemma A.5 to rewrite the nested Σ-expressions gives the result that was stated in Lemma 7.5.