Smooth numbers and max-entropy
Given a threshold $y$, a $y$-smooth number (or $y$-friable number) is a natural number whose prime factors are all at most $y$. We use $\Psi(x,y)$ to denote the number of $y$-smooth numbers up to $x$. In studying the asymptotic behavior of $\Psi(x,y)$, it is customary to write $y$ as $x^{1/u}$ (or $u$ as $\frac{\log x}{\log y}$) for some $u > 0$. For small values of $u$, the behavior is straightforward: for instance if $u \leq 1$, then all numbers up to $x$ are automatically $x^{1/u}$-smooth, so
$$\Psi(x, x^{1/u}) = \lfloor x \rfloor.$$
For $1 \leq u \leq 2$, the only numbers up to $x$ that are not $x^{1/u}$-smooth are the multiples of primes $p$ between $x^{1/u}$ and $x$, so
$$\Psi(x, x^{1/u}) = \lfloor x \rfloor - \sum_{x^{1/u} < p \leq x} \lfloor x/p \rfloor = (1 - \log u + o(1))\, x$$
by Mertens' theorem. For $2 \leq u \leq 3$, there is an additional correction coming from multiples of two primes between $x^{1/u}$ and $x$; a straightforward inclusion-exclusion argument (which we omit here) eventually gives
$$\Psi(x, x^{1/u}) = \Big( 1 - \log u + \int_2^u \frac{\log(t-1)}{t}\, dt + o(1) \Big)\, x.$$
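As a quick illustration, here is a short Python sketch (standard library only; the cutoff $x = 10^6$ and the sample values of $u$ are arbitrary choices of mine, not from the discussion above) that counts smooth numbers up to $x$ by brute force and compares the count against the $x(1-\log u)$ approximation in the range $1 \leq u \leq 2$.

```python
# Brute-force sanity check: count y-smooth numbers up to x and compare with (1 - log u) x
# for 1 <= u <= 2.  The choices x = 10^6 and the test values of u are arbitrary.
import math

def greatest_prime_factors(n):
    """Return a list g with g[m] = largest prime factor of m (and g[0] = g[1] = 1)."""
    g = [1] * (n + 1)
    for p in range(2, n + 1):
        if g[p] == 1:                 # no smaller prime divides p, so p is prime
            for m in range(p, n + 1, p):
                g[m] = p              # primes are visited in increasing order,
                                      # so the last write is the largest prime factor
    return g

x = 10 ** 6
g = greatest_prime_factors(x)

for u in [1.2, 1.5, 1.8, 2.0]:
    y = x ** (1.0 / u)
    psi = sum(1 for m in range(1, x + 1) if g[m] <= y)
    print(f"u = {u:.1f}:  Psi(x, x^(1/u)) = {psi},  (1 - log u) x = {(1 - math.log(u)) * x:.0f}")
```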
More generally, for any fixed $u > 0$, de Bruijn showed that
$$\Psi(x, x^{1/u}) = \big( \rho(u) + o(1) \big)\, x \quad (1)$$
where $\rho$ is the Dickman function. This function is a piecewise smooth, decreasing function of $u$, defined by the delay differential equation
$$u \rho'(u) = -\rho(u-1)$$
for $u > 1$ (together with the initial condition $\rho(u) = 1$ for $0 \leq u \leq 1$). The asymptotic behavior of $\rho(u)$ as $u \to \infty$ is rather complicated. Very roughly speaking, it has inverse factorial behavior; there is a general upper bound $\rho(u) \leq \frac{1}{\Gamma(u+1)}$, a crude asymptotic $\rho(u) = u^{-(1+o(1))u}$, and a more precise asymptotic
$$\rho(u) = \frac{1+o(1)}{\sqrt{2\pi u}} \exp\Big( \gamma - u \xi + \int_0^\xi \frac{e^s - 1}{s}\, ds \Big), \quad (2)$$
where $\gamma$ is the Euler-Mascheroni constant and $\xi = \xi(u)$ is defined implicitly by the equation
$$e^\xi - 1 = u \xi. \quad (3)$$
One cannot write $\xi$ in closed form using elementary functions, but one can express it in terms of the Lambert $W$ function as
$$\xi = -\frac{1}{u} - W_{-1}\Big( -\frac{e^{-1/u}}{u} \Big).$$
This is not a particularly enlightening expression, though. A more productive approach is to work with approximations. It is not hard to get the initial approximation $\xi \approx \log u$ for large $u$, which can then be re-inserted back into (3) to obtain the more accurate approximation
$$\xi = \log u + \log\log u + O\Big( \frac{\log\log u}{\log u} \Big)$$
to $\xi$.
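To make these formulas concrete, here is a small illustrative Python sketch (standard library only; the grid sizes and test values of $u$ are my own choices) that solves (3) for $\xi$ by bisection, compares it with the approximation $\log u + \log\log u$, tabulates $\rho(u)$ by integrating the delay differential equation (in the equivalent integrated form $u\rho(u) = \int_{u-1}^u \rho(t)\,dt$), and evaluates the right-hand side of (2); the two values of $\rho(u)$ agree to within the expected $1 + O(1/u)$ relative error.

```python
# Numerics for xi(u), the Dickman function rho(u), and the asymptotic (2).
import math

EULER_GAMMA = 0.5772156649015329

def xi(u):
    """Positive solution of e^xi - 1 = u*xi (for u > 1), found by bisection."""
    lo, hi = 1e-12, 2 * math.log(u) + 10      # the root lies strictly between these
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if math.exp(mid) - 1 - u * mid < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def dickman_table(umax, n=400):
    """Tabulate rho on a grid of spacing 1/n, using u*rho(u) = int_{u-1}^{u} rho(t) dt."""
    h = 1.0 / n
    rho = [1.0] * (int(round(umax * n)) + 1)  # rho = 1 on [0, 1]
    S = float(n - 1)                          # interior trapezoid sum for the first step
    for k in range(n + 1, len(rho)):
        rho[k] = h * (0.5 * rho[k - n] + S) / (k * h - 0.5 * h)
        S += rho[k] - rho[k - n + 1]          # slide the unit-length integration window
    return rho, n

def rho_asymptotic(u):
    """Right-hand side of (2), with the inner integral done by the trapezoid rule."""
    s, m = xi(u), 2000
    g = lambda t: (math.exp(t) - 1) / t if t > 0 else 1.0
    integral = (s / m) * (0.5 * g(0.0) + sum(g(j * s / m) for j in range(1, m)) + 0.5 * g(s))
    return math.exp(EULER_GAMMA - u * s + integral) / math.sqrt(2 * math.pi * u)

rho, n = dickman_table(10.0)
for u in [3, 5, 8, 10]:
    approx_xi = math.log(u) + math.log(math.log(u))
    print(f"u = {u:2d}: xi = {xi(u):.4f}  (log u + log log u = {approx_xi:.4f})")
    print(f"        rho(u) = {rho[u * n]:.3e}  vs  right-hand side of (2) = {rho_asymptotic(u):.3e}")
```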
This asymptotic (2) is quite complicated, and so one does not expect there to be any simple argument that could recover it without extensive computation. However, it turns out that one can use a “maximum entropy” method to get a reasonably good heuristic approximation to (2), one that at least reveals the role of the mysterious function $\xi$. The purpose of this blog post is to give this heuristic.
Viewing $u$ as large and writing $x = y^u$, the task is to try to count the number of $y$-smooth numbers of magnitude about $x$. We will propose a probabilistic model to generate $y$-smooth numbers as follows: for each prime $p \leq y$, select the prime $p$ with an independent probability $c_p$ for some coefficient $0 \leq c_p \leq 1$, and then multiply all the selected primes together. This will clearly generate a random $y$-smooth number $n$, and by the law of large numbers, the (log-)magnitude of this number should be approximately
$$\log n \approx \sum_{p \leq y} c_p \log p \quad (4)$$
(where we will be vague about what “$\approx$” means here), so to obtain a number of magnitude about $x$, we should impose the constraint
$$\sum_{p \leq y} c_p \log p = \log x. \quad (5)$$
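As a sanity check on (4) and (5), here is a toy Monte Carlo simulation (my own illustrative choice of parameters: $y = 1000$, $u = 4$, and coefficients of the power shape $c_p = p^{-\sigma}$, which happens to be the shape the entropy maximization below ends up selecting). The exponent $\sigma$ is tuned by bisection so that the constraint (5) holds; the sample mean of $\log n$ then sits at $\log x$, while the sizeable standard deviation previews the gaussian fluctuations discussed later in the post.

```python
# Toy simulation of the random model: coefficients c_p = p^(-sigma) with sigma tuned so that
# the constraint (5) holds, then sample the random product and inspect log n.
import math
import random

def primes_up_to(y):
    sieve = [True] * (y + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(y ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, y + 1) if sieve[p]]

y, u = 1000, 4.0
log_x = u * math.log(y)                       # log x, with x = y^u
ps = primes_up_to(y)
logs = [math.log(p) for p in ps]

def constraint_gap(sigma):
    """sum_{p <= y} c_p log p - log x, with c_p = p^(-sigma)."""
    return sum(p ** (-sigma) * lp for p, lp in zip(ps, logs)) - log_x

lo, hi = 0.0, 1.0                             # gap(0) > 0 > gap(1) for these parameters
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if constraint_gap(mid) > 0 else (lo, mid)
sigma = 0.5 * (lo + hi)
c = [p ** (-sigma) for p in ps]

random.seed(0)
samples = [sum(lp for lp, cp in zip(logs, c) if random.random() < cp)
           for _ in range(5000)]
mean = sum(samples) / len(samples)
sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(f"log x = {log_x:.2f}, sample mean of log n = {mean:.2f}, std dev = {sd:.2f}")
```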
The indicator $1_{p \mid n}$ of the event that $p$ divides this number is a Bernoulli random variable with mean $c_p$, so the Shannon entropy of this random variable is
$$c_p \log \frac{1}{c_p} + (1 - c_p) \log \frac{1}{1 - c_p}.$$
If $c_p$ is not too large, then Taylor expansion gives the approximation
$$c_p \log \frac{1}{c_p} + (1 - c_p) \log \frac{1}{1 - c_p} \approx c_p \log \frac{e}{c_p}.$$
Summing over the independent primes, the entropy of the whole random process is approximately $\sum_{p \leq y} c_p \log \frac{e}{c_p}$; by the usual heuristic that a random variable of entropy $H$ is essentially spread out over about $e^H$ values, the number of $y$-smooth numbers that are typically generated by this random process should be approximately
$$\exp\Big( \sum_{p \leq y} c_p \log \frac{e}{c_p} \Big), \quad (6)$$
at least if the coefficients $c_p$ are chosen to maximize the right-hand side subject to the constraint (5).

One could solve this constrained optimization problem directly using Lagrange multipliers, but we simplify things a bit by passing to a continuous limit. We take a continuous ansatz $c_p = \frac{1}{p} f\big( \frac{\log p}{\log y} \big)$, where $f \colon (0,1] \to [0,+\infty)$ is a smooth function. Using Mertens' theorem (which asserts that $\sum_{p \leq y} \frac{\log p}{p} = \log y + O(1)$, so that the weights $\frac{\log p}{p}$ behave like the measure $\log y\, dt$ in the variable $t = \frac{\log p}{\log y}$), the constraint (5) then heuristically becomes
$$\int_0^1 f(t)\, dt = u, \quad (7)$$
and the expression (6) simplifies (after splitting $\log \frac{e}{c_p}$ as $\log p + \log \frac{e}{f(\log p / \log y)}$ and using the constraint (5) and Mertens' theorem once more) to
$$x \exp\Big( \int_0^1 \frac{f(t)}{t} \log \frac{e}{f(t)}\, dt \Big).$$
So the entropy maximization problem has now been reduced to the problem of minimizing the functional
$$\int_0^1 \frac{f(t)}{t} \log \frac{f(t)}{e}\, dt \quad (8)$$
subject to the constraint (7). The astute reader may notice that the integral in (8) might diverge at $t = 0$, but we shall ignore this technicality for the sake of the heuristic arguments.
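The passage from (5) to (7) is just the heuristic that the weights $\frac{\log p}{p}$ behave like $\log y\, dt$. Here is a quick numerical illustration of that heuristic (my own; the test functions and the cutoff $y = 10^6$ are arbitrary): the prime sums agree with $\log y \int_0^1 f(t)\, dt$ up to bounded errors of the size allowed by Mertens' theorem, which become negligible relative to $\log y$ as $y \to \infty$.

```python
# Check the Mertens-type heuristic: sum_{p <= y} (log p / p) f(log p / log y) vs log y * int_0^1 f.
import math

def primes_up_to(y):
    sieve = [True] * (y + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(y ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, y + 1) if sieve[p]]

y = 10 ** 6
log_y = math.log(y)
ps = primes_up_to(y)

tests = [
    ("f(t) = 1", lambda t: 1.0, 1.0),
    ("f(t) = t", lambda t: t, 0.5),
    ("f(t) = e^(2t)", lambda t: math.exp(2 * t), (math.exp(2) - 1) / 2),
]
for name, f, integral in tests:
    prime_sum = sum(math.log(p) / p * f(math.log(p) / log_y) for p in ps)
    print(f"{name:14s} prime sum = {prime_sum:8.3f}   log y * integral = {log_y * integral:8.3f}")
```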
This is a standard calculus of variations problem. The Euler-Lagrange equation for this problem can be easily worked out to be $\frac{\log f(t)}{t} = \lambda$ for some constant Lagrange multiplier $\lambda$; in other words, the optimal $f$ should have an exponential form $f(t) = e^{\xi t}$ for some constant $\xi$. The constraint (7) then becomes
$$\int_0^1 e^{\xi t}\, dt = \frac{e^\xi - 1}{\xi} = u,$$
so this constant is precisely the mysterious quantity $\xi = \xi(u)$ appearing in (2)! The formula (8) can now be evaluated (after writing $\frac{e^{\xi t}}{t}$ as $\frac{e^{\xi t}-1}{t} + \frac{1}{t}$ and substituting $s = \xi t$) as
$$\int_0^1 \frac{e^{\xi t}}{t} (\xi t - 1)\, dt = u \xi - \int_0^\xi \frac{e^s - 1}{s}\, ds - C,$$
where $C$ is the divergent constant
$$C = \int_0^1 \frac{dt}{t}.$$
Thus the heuristic prediction for the number of $y$-smooth numbers of magnitude about $x = y^u$ is
$$x \exp\Big( -u\xi + \int_0^\xi \frac{e^s - 1}{s}\, ds + C \Big),$$
which matches (1) and (2) quite well in the exponent, except that the divergent constant $C$ appears in place of the convergent constant $\gamma$, and the $\frac{1}{\sqrt{2\pi u}}$ prefactor is missing. The divergence of $C$ signals that the model of selecting each prime $p \leq y$ independently is unrealistic when trying to confine the product $n$ to a single scale $x$; this comes down ultimately to the subtle differences between the Poisson and Poisson-Dirichlet processes, as discussed in this previous blog post, and is also responsible for the otherwise mysterious $e^\gamma$ factor in Mertens' third theorem; it also morally explains the presence of the same $e^\gamma$ factor in (2). A related issue is that the law of large numbers (4) is not exact, but admits gaussian fluctuations as per the central limit theorem; morally, this is the main cause of the $\frac{1}{\sqrt{2\pi u}}$ prefactor in (2).
Nevertheless, this demonstrates that the maximum entropy method can achieve a reasonably good heuristic understanding of smooth numbers. In fact we also gain some insight into the “anatomy of integers” of such numbers: the above analysis suggests that a typical $y$-smooth number $n$ of size about $x = y^u$ will be divisible by a given prime $p \leq y$ with probability about $\frac{1}{p} e^{\xi \log p / \log y}$. Thus, for $p$ close to $y$, the probability of being divisible by $p$ is elevated by a factor of about $e^\xi \approx u \log u$ over the baseline probability $\frac{1}{p}$ of an arbitrary (non-smooth) number being divisible by $p$; so (by Mertens' theorem) a typical $y$-smooth number is actually largely comprised of something like $u$ prime factors all of size about $y^{1-o(1)}$, with the smaller primes contributing a lower order factor. This is in marked contrast with the anatomy of a typical (non-smooth) number $n \leq x$, which typically has $O(1)$ prime factors in each hyperdyadic scale $[\exp(e^k), \exp(e^{k+1})]$ in $[1, x]$, as per Mertens' theorem.
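Here is a final illustrative computation (my own, with the arbitrary test values $y = 10^5$, $u = 6$, and a cutoff at $\sqrt{y}$) backing up these anatomy claims numerically: under the elevated probabilities $\frac{1}{p} e^{\xi \log p / \log y}$, most of $\log n$ comes from primes larger than $\sqrt{y}$ and there are roughly $u$ such prime factors, whereas the baseline probabilities $\frac{1}{p}$ give about one expected prime factor per hyperdyadic scale.

```python
# Anatomy check: elevated divisibility probabilities vs the baseline 1/p.
import math

def primes_up_to(y):
    sieve = [True] * (y + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(y ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, y + 1) if sieve[p]]

def xi(u):
    """Positive solution of e^xi - 1 = u*xi, by bisection."""
    lo, hi = 1e-12, 2 * math.log(u) + 10
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if math.exp(mid) - 1 - u * mid < 0 else (lo, mid)
    return 0.5 * (lo + hi)

y, u = 10 ** 5, 6.0
log_y = math.log(y)
ps = primes_up_to(y)
s = xi(u)
c = {p: p ** (s / log_y - 1) for p in ps}     # elevated probabilities (1/p) e^(xi log p / log y)

total_log = sum(c[p] * math.log(p) for p in ps)
n_factors_large = sum(c[p] for p in ps if p > y ** 0.5)
log_from_large = sum(c[p] * math.log(p) for p in ps if p > y ** 0.5)
print(f"E[log n] / log y = {total_log / log_y:.2f}   (target: u = {u})")
print(f"expected prime factors above sqrt(y) = {n_factors_large:.2f}   (heuristic: about u)")
print(f"fraction of log n from primes above sqrt(y) = {log_from_large / total_log:.2f}")

# Baseline: an arbitrary integer is divisible by p with probability about 1/p, so it has
# about one expected prime factor in each hyperdyadic scale (exp(e^(k-1)), exp(e^k)].
for k in (1, 2):
    lo_, hi_ = math.exp(math.e ** (k - 1)), math.exp(math.e ** k)
    expected = sum(1.0 / p for p in ps if lo_ < p <= hi_)
    print(f"baseline expected prime factors in ({lo_:.1f}, {hi_:.1f}]: {expected:.2f}")
```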