Random Matrices with Log-Range Correlations, and Log-Sobolev Inequalities

Let $X_N$ be a symmetric $N\times N$ random matrix whose $\sqrt{N}$-scaled centered entries are uniformly square integrable. We prove that if the entries of $X_N$ can be partitioned into independent subsets each of size $o(\log N)$, then the empirical eigenvalue distribution of $X_N$ converges weakly to its mean in probability. This significantly extends the best previously known results on convergence of eigenvalues for matrices with correlated entries (where the partition subsets are blocks of size $O(1)$). We prove this result by developing a new log-Sobolev inequality, generalizing the first author's introduction of mollified log-Sobolev inequalities: we show that if $\mathbf{Y}$ is a bounded random vector and $\mathbf{Z}$ is a standard normal random vector independent from $\mathbf{Y}$, then the law of $\mathbf{Y}+t\mathbf{Z}$ satisfies a log-Sobolev inequality for all $t>0$, and we give bounds on the optimal log-Sobolev constant.


Introduction
Random matrix theory is primarily interested in the convergence of statistics associated to the eigenvalues (or singular values) of $N\times N$ matrices whose entries are random variables with a prescribed joint distribution. The field was initiated by Wigner in [42,43], in which he studied the mean bulk behavior of the eigenvalues of what is now called a Gaussian Orthogonal Ensemble (GOE). This is the Gaussian case of a more general class of random matrices now called Wigner ensembles: symmetric random matrices $X_N$ such that the entries of $\sqrt{N}X_N$ are i.i.d. random variables (modulo the symmetry constraint) with sufficiently many finite moments. There are also corresponding complex Hermitian ensembles, non-symmetric / non-Hermitian ensembles, as well as a parallel world of matrices generalizing the GOE, defined not via the distribution of entries but rather by invariance properties of the joint distribution. In this paper, we take real Wigner ensembles as the starting point.
Given a symmetric matrix $X_N$, enumerate its eigenvalues $\lambda_1 \le \cdots \le \lambda_N$ in nondecreasing order. The empirical spectral distribution (ESD) of $X_N$ is the random point measure
$$\mu_N = \frac{1}{N}\sum_{j=1}^N \delta_{\lambda_j}. \tag{1.1}$$
Integrating against the indicator function $\mathbb{1}_{[a,b]}$ yields the random variable counting the proportion of eigenvalues in $[a,b]$ (building up the histogram of the eigenvalues of $X_N$). In general, the random variables $\int f\,d\mu_N$ for test functions $f\colon\mathbb{R}\to\mathbb{R}$ are called linear statistics of the eigenvalues. Wigner's original papers [42,43] showed that, for the GOE, the ESD converges weakly in expectation to what is now called Wigner's semicircle law:
$$\varsigma(dx) = \frac{1}{2\pi}\sqrt{(4-x^2)_+}\,dx.$$
To be precise, this means that $\mathbb{E}\big(\int f\,d\mu_N\big) \to \int f\,d\varsigma$ for each $f \in C_b(\mathbb{R})$. This convergence was later upgraded to weak a.s. convergence. Many more results are known about the fluctuations of $\mu_N$, the spacing between eigenvalues, and the distribution and fluctuations of the largest eigenvalue. The reader may consult the book [1] and its extensive bibliography for more on these endeavors.
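To make the semicircle law concrete, the following minimal numerical sketch (ours, not part of the original analysis; it assumes only numpy) samples a GOE matrix, computes the atoms of its ESD, and compares the resulting histogram to the density of $\varsigma$:

```python
import numpy as np

def goe_eigenvalues(N, rng):
    # GOE normalization: the entries of sqrt(N) * X are standard normal
    # off the diagonal (variance 2 on the diagonal), symmetric overall.
    A = rng.normal(size=(N, N))
    X = (A + A.T) / np.sqrt(2.0 * N)
    return np.linalg.eigvalsh(X)  # eigenvalues = atoms of the ESD mu_N

rng = np.random.default_rng(0)
evals = goe_eigenvalues(2000, rng)

# Compare the empirical histogram to the semicircle density
# (1/2pi) * sqrt((4 - x^2)_+), supported on [-2, 2].
hist, edges = np.histogram(evals, bins=50, range=(-2.2, 2.2), density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
semicircle = np.sqrt(np.clip(4.0 - mids**2, 0.0, None)) / (2.0 * np.pi)
print("max deviation from semicircle density:", np.abs(hist - semicircle).max())
```

Already at $N = 2000$ the deviation is small, a first glimpse of the concentration phenomena quantified below.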
There is also a vast literature on band matrices: real symmetric matrices with independent entries above the main diagonal, but with more complicated patterns of non-identically distributed entries (see Theorem 1.4 for a precise definition). The reader should consult the expansive paper [2], which uses combinatorial and probabilistic methods to establish that a large class of band matrices have ESD converging a.s. to the semicircle law, with Gaussian fluctuations of a similar form to Wigner matrices. (Our Theorem 1.4 below improves on the main result in [2].) Apart from the work (of distinctly different flavor) on unitary- or permutation-invariant matrix ensembles, there are comparatively few papers dealing with random matrices with correlated entries. In [38], Shlyakhtenko realized that the tools of operator-valued free probability could be used to compute the limit in expectation of certain kinds of block matrices: ensembles partitioned into a $d\times d$ grid of blocks that have a fixed covariance structure (uniform among the blocks), where the $d^2$ blocks are independent up to symmetry. The recent papers [3,10,11] showed how to explicitly compute the limit ESD for a wide class of such block matrices with Gaussian entries, and used these results to give applications to quantum information theory. Additionally, in [37], a class of these block matrices was studied and proved to converge almost surely, with applications given to signal processing. Note that in these block matrices, the limiting ESD is typically not semicircular. The combinatorial methods used to analyze such ensembles do not easily extend beyond the case that $d$ is fixed as $N\to\infty$.
Remark. The actual ensembles studied in [3,10,11,37,38] are presented in a different form, with an overall $d\times d$ block structure whose blocks all have independent entries; this is just an orthonormal basis change from the description above, and so has the same ESD. Note also that in much of this work, particularly Shlyakhtenko's results on Gaussian band matrices [38], complex matrices were studied. In the present work, we have focused on real ensembles, but the theorems and proofs we give here could similarly be adapted to the complex case.
Our main results, Theorems 1.1 and 1.2, give a significant generalization of ESD convergence for block-type matrices, both in terms of allowing the block size to grow with $N$, and in terms of softening the rigid structure of the partition into independent blocks. We use techniques similar to those in the proof of Theorem 1.1 to prove the following stronger result in the case of Gaussian entries: under the appropriate uniform integrability conditions, the convergence of the ESD is almost sure, and guaranteed for blocks of much larger size.
Condition (1) in Theorems 1.1 and 1.2 is analogous to the requirement in Wigner ensembles that the second moments of the entries of $\sqrt{N}X_N$ be normalized. Condition (2) generalizes the independent block structure mentioned above; for example, for the ensembles treated in [3,10,11], but with $d$ allowed to grow with $N$ (with $d = o(N/(\log N)^{1/2})$), one gets convergence of the ESD weakly almost surely. In particular, Theorem 1.2 extends the results of those papers even in the case $d = O(1)$, since only convergence in expectation was known before.

Remark 1.3. Note that the conclusion of Theorems 1.1 and 1.2 is that the ESDs of these ensembles concentrate around their means; it is not true that all these ensembles converge in expectation. Rather, our results show that any of these ensembles that do converge in expectation also converge in probability, or almost surely, as the case may be. In Section 2.3, we discuss some examples where these results can be applied.
While we are most interested in ensembles with correlated entries, one of the main achievements of our method is an improvement on (the first half of) the main result in [2], which deals with independent entries. The ensembles addressed in Theorem 1.4 are the typical formulation of band matrices, although that name only really applies when the function $W$ has the form $W(x,y) = \mathbb{1}_{|x-y|\le c}$ for some $c\in(0,1)$. (In order to satisfy the stochasticity condition that yields the semicircle law in the limit, one must use periodic band matrices, where $W$ is the indicator of the strip $|x-y|\le c$ on all of $\mathbb{R}^2$, projected into $[0,1]^2$ via the equivalence relation identifying two points if they differ by an element of $\mathbb{Z}^2$. See [16,18].) The central theorem in [2] is a proof of (the semicircular case of) Theorem 1.4, assuming that the common law of the entries satisfies a Poincaré inequality (cf. (3.1) below). Our Theorem 1.4 yields the convergence in complete generality, assuming only finite second moments; moreover, a technical condition on the laws of the entries (similar to the assumption of a Poincaré inequality) yields almost sure convergence.

Remark 1.5. It should be noted that this is only half of the main result in [2], where the authors also show that the fluctuations of these ensembles are Gaussian, with an explicit covariance determined by the function $W$. Their methods are largely combinatorial, while ours are analytic/probabilistic.

Theorems 1.1-1.4 are proved below in Section 2. (In fact, in Section 2.3, we prove the more general Theorem 2.10, of which Theorem 1.2 is a special case.) We prove these results using concentration of measure mediated by a powerful coercive inequality: the log-Sobolev inequality. A probability measure $\mu$ on $\mathbb{R}^m$ satisfies a log-Sobolev inequality (LSI) with constant $c$ if, for all sufficiently integrable non-negative functions $g$ with $\int g^2\,d\mu = 1$ (so that $f = g^2$ is a $\mu$-probability density),
$$\int g^2\log g^2\,d\mu \;\le\; 2c\int|\nabla g|^2\,d\mu. \tag{1.4}$$
The inequality (1.4) first appeared in [39] (in a slightly different form, written in terms of $f = g^2$, where the Dirichlet form on the right-hand side becomes the relative Fisher information of $f$), in the context of Gaussian measures. It was later rediscovered by Gross [26], who named it a log-Sobolev inequality and used it to prove an important result in constructive quantum field theory. Over the past four decades, it has played an important role in probability theory, functional analysis, and differential geometry; see, for example, [5,6,13,15,20,21,22,23,29,33,34,35,40,44,45,46].
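For orientation: Gross's theorem [26] states that the standard Gaussian measure on $\mathbb{R}^m$ satisfies (1.4) with constant $c = 1$, independent of the dimension $m$. This dimension-independence (together with the tensorization property recorded as Lemma 2.2 below) is what makes the log-Sobolev inequality such an effective tool for random matrices, where the number of independent entries grows like $N^2$.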
There is a large body of literature devoted to necessary and sufficient conditions for a log-Sobolev inequality to hold; cf. [8,9,14,27,31]. Adding to these efforts (with applications to random matrices in mind), the second author of the present paper developed a new approximation scheme, the mollified log-Sobolev inequality, in [47]: if $Y$ is any bounded random variable and $Z$ is a standard normal random variable independent from $Y$, then the law of $Y + t^{1/2}Z$ satisfies a log-Sobolev inequality for all $t>0$, with a constant $c(t)$ that is bounded in terms of an exponential of $\|Y\|_\infty^2/t$. The following result is a generalization of those one-dimensional ideas to higher dimensions.

Theorem 1.6. Let $\mathbf{Y}$ be a bounded random vector in $\mathbb{R}^n$, and let $\mathbf{Z}$ be a standard centered normal random vector in $\mathbb{R}^n$, independent from $\mathbf{Y}$. Then for each $t>0$, the law of $\mathbf{Y}+t^{1/2}\mathbf{Z}$ satisfies a log-Sobolev inequality; for $0<t\le\|\mathbf{Y}\|_\infty^2$, the optimal constant $c(t)$ satisfies
$$c(t) \;\le\; 289\,\|\mathbf{Y}\|_\infty^2\,\exp\!\left(20n + \frac{5\,\|\mathbf{Y}\|_\infty^2}{t}\right). \tag{1.5}$$

Remark 1.7. Since the log-Sobolev constant is invariant under translation, $\|\mathbf{Y}\|_\infty$ could be replaced by the smaller quantity $\|\mathbf{Y}-\mathbb{E}[\mathbf{Y}]\|_\infty$ in (1.5), but it will be useful in the proof of the second statement of Theorem 1.1 (regarding bounded random matrix ensembles) to use it in the un-centered form.
We briefly expound the history of Theorem 1.6. Following the second author's paper [47], the authors of [41] generalized mollified log-Sobolev inequalities to $\mathbb{R}^n$ (and to a class of measures more general than the compactly supported ones), using a version of the Lyapunov approach, as we do. However, they gave no quantitative bounds on the log-Sobolev constant, which are crucial to our present analysis.
Further complicating this history: an early version of the present paper, posted on the arXiv, proved Theorem 1.6 as its central result. In response, Bardet, Gozlan, Malrieu, and Zitt [12], building on our techniques, sharpened the inequality (1.5) to a stronger bound (1.6), involving universal constants $c_1, c_2$, in which the dependence on the dimension is linear rather than exponential. Our proof of (1.5) relies on an estimate for the best constant in the Poincaré inequality, which we were only able to prove with a dimension-dependent bound. The main contribution to this problem in [12] was a dimension-independent bound on the Poincaré constant; see [12, Theorems 1.2-1.3].
Remark 1.8. We do not know if the optimal constant $c(t)$ in (1.5) grows with dimension.
In [12], some evidence is given to support the conjecture that the optimal constant is independent of dimension. For our present purposes, a dimension-independent bound of this form would not improve our result in Theorem 1.1. It is the exponential dependence of the constant on $\|\mathbf{Y} - \mathbb{E}[\mathbf{Y}]\|_\infty$ that forces the blocks to be of size $o(\log N)$; and this dependence is sharp, as was shown in the second author's paper [47, Theorem 15].
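To see where the $o(\log N)$ threshold comes from, here is a rough order-of-magnitude computation using (1.5) as stated above (a sketch; the choice of $t_N$ below is for illustration only). If the entries of $\sqrt{N}X_N$ are bounded by $\kappa$ and $B$ is an independent block of size $n$, the corresponding vector $\mathbf{Y}$ of entries of $\sqrt{N}X_N$ satisfies $\|\mathbf{Y}\|_\infty \le \kappa\sqrt{n}$, so (1.5) gives, for the mollified block,
$$c(t) \;\le\; 289\,\kappa^2 n\,\exp\!\Big(20n + \frac{5\kappa^2 n}{t}\Big).$$
If $n = o(\log N)$ and we let $t = t_N \to 0$ slowly enough that $n/t_N = o(\log N)$ (e.g., $t_N = \sqrt{n/\log N}$), then $c(t_N) = N^{o(1)}$, which is small enough for the Borel-Cantelli argument of Section 2.2; at the same time $t_N \to 0$, so the mollification error vanishes. If instead $n \asymp \log N$, the factor $e^{20n}$ is already a positive power of $N$, and the concentration bound degrades.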
The remainder of this paper is organized as follows. In Section 2.1, we discuss how the log-Sobolev inequality can be used to yield concentration results for eigenvalues of random matrices. Following this, Section 2.2 gives the proof of Theorem 1.1. Then Section 2.3 proves Theorem 1.2, and a generalization (Theorem 2.10) which allows more general entries than Gaussians, and applies these results to several random matrix models from the literature. Section 2.4 then proves Theorem 1.4 as a corollary to Theorems 1.1 and 2.10, and discusses a generalization of band matrices where these results still apply. Finally, Section 3 is devoted to the proof of Theorem 1.6.

Concentration Results for Ensembles with Correlated Entries
The connection between random matrices and log-Sobolev inequalities that we exploit in this paper was introduced by Guionnet in [28]. Using the Herbst inequality [27], which yields Gaussian concentration of Lipschitz functionals, she offered a fundamentally new proof of Wigner's semicircle law; this proof automatically generalized to non-Gaussian ensembles whose entries satisfy a log-Sobolev inequality. The paper [47] was motivated by generalizing this method, to give a proof technique for Wigner's law that would apply (by approximation) to ensembles whose entries do not satisfy a log-Sobolev inequality. That approach, working entry-by-entry, required the entries to be independent. The main goal of this paper is to weaken that assumption, and so we present here a brief discussion of Guionnet's approach, with an eye towards removing independence assumptions.

Guionnet's Approach to Wigner's Law
Let us fix notation as in the introduction: let $X_N$ be a symmetric random $N\times N$ matrix ensemble with eigenvalues $\lambda_1 \le \cdots \le \lambda_N$, and let $\mu_N$ denote the empirical spectral distribution (ESD) of $X_N$; cf. (1.1). Wigner's law [42,43] states that $\mu_N$ converges weakly a.s. to the semicircle law $\varsigma$, in the case that $X_N$ is a GOE. Wigner's proof proceeded by the method of moments and is fundamentally combinatorial. Analytic approaches (involving fixed point equations, complex PDEs, and orthogonal polynomials) developed over the ensuing decades. An argument based on concentration of measure was provided by Guionnet in [28, p. 70, Theorem 6.6]. The result can be stated thus.

Theorem 2.1 (Guionnet). Let $X_N$ be a symmetric random matrix. If the joint law of the entries of $\sqrt{N}X_N$ satisfies a log-Sobolev inequality with constant $c$, then for all $\delta>0$ and all Lipschitz $f\colon\mathbb{R}\to\mathbb{R}$,
$$\mathbb{P}\left(\left|\int f\,d\mu_N - \mathbb{E}\int f\,d\mu_N\right| > \delta\right) \;\le\; 2\exp\left(-\frac{N^2\delta^2}{4c\,\|f\|_{\mathrm{Lip}}^2}\right).$$

In fact, in the Wigner ensemble setting, the i.i.d. condition means we really need only assume that the law of each entry satisfies a log-Sobolev inequality. This is due to the following result, often called Segal's lemma; for a proof, see [26, p. 1074, Remark 3.3].

Lemma 2.2 (Segal's Lemma). Let $\mu_1, \mu_2$ be probability measures on $\mathbb{R}^{m_1}$ and $\mathbb{R}^{m_2}$, satisfying log-Sobolev inequalities with constants $c_1, c_2$, respectively. Then the product measure $\mu_1\otimes\mu_2$ on $\mathbb{R}^{m_1+m_2}$ satisfies a log-Sobolev inequality with constant $\max\{c_1, c_2\}$.

Theorem 2.1 explicitly gives weak convergence in probability of $\mu_N$ to its limit mean. Moreover, in the Wigner ensemble case, where the constant $c$ is determined by the common law of the entries and so does not depend on $N$, the rate of convergence is fast enough that a standard Borel-Cantelli argument immediately upgrades this to a.s. convergence. In [47], the second author showed that, under certain integrability conditions, the empirical law of eigenvalues minus its mean converges weakly in probability to 0, regardless of whether or not the joint laws of entries satisfy a log-Sobolev inequality. The idea is to use the mollified log-Sobolev inequality (the $n=1$ case of Theorem 1.6) applied to a cutoff of $X_N$ plus an independent GOE noise of variance $t$, and then let $t\downarrow 0$. For our present purposes, where we no longer assume independence or identical distribution of the entries of $X_N$, it will not suffice to assume each entry satisfies a (mollified) log-Sobolev inequality, which is why we state Guionnet's result in terms of the joint distribution in Theorem 2.1. Guionnet proved the theorem from the Herbst concentration inequality [27], which shows that Lipschitz functionals of a random variable whose law satisfies a log-Sobolev inequality have sub-Gaussian tails (with dimension-independent bounds determined by the Lipschitz norm of the functional). Theorem 2.1 is then proved by combining this with functional calculus, together with the following lemma from matrix theory (see [30, p. 37, Theorem 1, and p. 39, Remark 2]).
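The lemma in question is (a form of) the classical Hoffman-Wielandt inequality; we record it here together with the consequence that is actually used. If $A$ and $B$ are real symmetric $N\times N$ matrices with eigenvalues listed in nondecreasing order, then
$$\sum_{j=1}^N\big(\lambda_j(A)-\lambda_j(B)\big)^2 \;\le\; \|A-B\|_{\mathrm{HS}}^2.$$
In particular, by the Cauchy-Schwarz inequality, for each Lipschitz $f\colon\mathbb{R}\to\mathbb{R}$ the map $A\mapsto\int f\,d\mu_A = \frac{1}{N}\sum_{j=1}^N f(\lambda_j(A))$ is Lipschitz with constant $N^{-1/2}\|f\|_{\mathrm{Lip}}$ with respect to the Hilbert-Schmidt norm; this is the Lipschitz functional to which the Herbst inequality is applied.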

The Proof of Theorem 1.1
We now proceed to prove Theorem 1.1, using Theorem 1.6. We first prove the second statement of the theorem: let $X_N$ be a matrix ensemble satisfying conditions (1) and (2), with the entries of $\sqrt{N}X_N$ uniformly bounded. We will now show that, with a judicious choice of $t = t_N$, each of the quantities (2.3)-(2.5) converges to 0 a.s. We do this in the following three lemmas.
By assumption, there is a sequence $t_N \to 0$ so that $c(t_N) = o(N^2/\log N)$. Since $N^2/(c(t_N)\log N) \to \infty$, for all sufficiently large $N$ the probability bound above is $\le \frac{1}{N^2}$. The result now follows from the Borel-Cantelli lemma.

Lemma 2.7. Let $t = t_N > 0$ be a sequence tending to 0, and let $\mu_N^t$ denote the ESD of the mollified ensemble. Then for each $f \in \mathrm{Lip}(\mathbb{R})$,
$$\mathbb{E}\left|\int f\,d\mu_N^{t_N} - \int f\,d\mu_N\right| \;\longrightarrow\; 0.$$

Proof. We simply follow calculations like the ones in the proof of Lemma 2.5, applying Jensen's inequality in the second step. Since $t_N \to 0$, the result follows.
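The Borel-Cantelli step deserves emphasis, since it is the source of the $N^2/\log N$ thresholds throughout this section (a sketch, using the concentration bound of Theorem 2.1 as stated above): if the joint law of the entries of $\sqrt{N}X_N$ satisfies a LSI with constant $c_N$, then
$$\sum_{N\ge 1} 2\exp\!\left(-\frac{N^2\delta^2}{4c_N\|f\|_{\mathrm{Lip}}^2}\right) < \infty \qquad\text{whenever}\qquad c_N = o\!\left(\frac{N^2}{\log N}\right),$$
since then $\frac{N^2\delta^2}{4c_N\|f\|_{\mathrm{Lip}}^2} \ge 2\log N$ for all sufficiently large $N$, making the summands eventually at most $2N^{-2}$.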
We can now prove the theorem under the boundedness assumption: with $t_N$ chosen as above, each of the quantities (2.3)-(2.5) converges to 0 almost surely, and hence so does $\int f\,d\mu_N - \mathbb{E}\int f\,d\mu_N$ for each $f\in\mathrm{Lip}(\mathbb{R})$. This concludes the proof.
Remark 2.8. We could have arranged for $c(t_N)$ to be of larger order, but still $o(N^2/\log N)$; this would only have resulted in the ratio $\|\mathbf{Y}\|_\infty^2/t$ being a constant factor larger, and thus would still require blocks of size $o(\log N)$ in order for it to be possible for $t_N \to 0$. Moreover, even if we had made use of the stronger (1.6) from [12], we could not have avoided a factor of $e^{K\|\mathbf{Y}\|_\infty^2/t}$ (for some constant $K$) in the estimate for the log-Sobolev constant $c(t)$, as the reader can quickly verify. In fact, in [47], the second author showed that the optimal log-Sobolev constant (in one dimension) typically has this $e^{K\|Y\|_\infty^2/t}$ form. As such, the assumption that the blocks have size $o(\log N)$ cannot be weakened, and the result of Theorem 1.1 cannot be improved, using the approach of this paper.
To conclude the proof, it remains only to remove the boundedness assumption on the entries of $\sqrt{N}X_N$ (at the expense of a downgrade from almost sure convergence to convergence in probability). This is where the uniform integrability comes in, via a standard cutoff argument that we briefly outline. Let $\epsilon, \delta > 0$, and let $f \in \mathrm{Lip}(\mathbb{R})$. By uniform integrability, there exists some $\kappa \ge 0$ such that $\mathbb{E}\big([\sqrt{N}X_N]_{jk}^2\,\mathbb{1}_{|[\sqrt{N}X_N]_{jk}|>\kappa}\big) < \epsilon$ for all $N, j, k$. Let $\tilde{X}_N$ be the matrix whose entries are the appropriate cutoffs of the entries of $X_N$, and let $\tilde{\mu}_N$ denote the ESD of $\tilde{X}_N$. The preceding proof shows that $\int f\,d\tilde{\mu}_N$ minus its mean converges to 0 almost surely, and hence in probability. We now compare the linear statistics of $X_N$ and $\tilde{X}_N$; this is similar to the preceding analysis. We make the standard $\epsilon/3$-decomposition:
$$\left|\int f\,d\mu_N - \mathbb{E}\int f\,d\mu_N\right| \le \left|\int f\,d\mu_N - \int f\,d\tilde{\mu}_N\right| + \left|\int f\,d\tilde{\mu}_N - \mathbb{E}\int f\,d\tilde{\mu}_N\right| + \left|\mathbb{E}\int f\,d\tilde{\mu}_N - \mathbb{E}\int f\,d\mu_N\right|. \tag{2.6}$$
The above proof in the uniformly bounded case shows that the second term in (2.6) converges to 0 as $N\to\infty$. The first term on the right-hand side of (2.6) is bounded using the same reasoning as in the proof of Lemma 2.5, and the third term is bounded as in Lemma 2.7. Since $\epsilon > 0$ was arbitrary, we have
$$\mathbb{P}\left(\left|\int f\,d\mu_N - \mathbb{E}\int f\,d\mu_N\right| > \delta\right) \;\longrightarrow\; 0,$$
giving convergence in probability. This concludes the proof.
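The first and third terms of (2.6) are controlled by the Hoffman-Wielandt bound recorded in Section 2.1; for the reader's convenience, the computation runs as follows (a sketch, taking the cutoff to be plain truncation at level $\kappa$, so that the discarded part of each entry of $\sqrt{N}X_N$ is $[\sqrt{N}X_N]_{jk}\,\mathbb{1}_{|[\sqrt{N}X_N]_{jk}|>\kappa}$):
$$\mathbb{E}\left|\int f\,d\mu_N - \int f\,d\tilde{\mu}_N\right| \;\le\; \frac{\|f\|_{\mathrm{Lip}}}{\sqrt{N}}\,\mathbb{E}\,\|X_N-\tilde{X}_N\|_{\mathrm{HS}} \;\le\; \|f\|_{\mathrm{Lip}}\left(\frac{1}{N^2}\sum_{j,k}\mathbb{E}\big[N\,([X_N]_{jk}-[\tilde{X}_N]_{jk})^2\big]\right)^{1/2} \;\le\; \|f\|_{\mathrm{Lip}}\sqrt{\epsilon},$$
using Cauchy-Schwarz in the middle step; the choice of $\kappa$ makes each summand less than $\epsilon$.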

Theorem 1.2, a Generalization, and Applications
We begin with a lemma which appeared in the second author's paper [48, Proposition 6], but was surely folklore far earlier. We reproduce the simple proof here, for completeness.

Lemma 2.9. Let X be a random vector in $\mathbb{R}^m$ whose law satisfies a log-Sobolev inequality with constant $c$, and let $F\colon\mathbb{R}^m\to\mathbb{R}^k$ be Lipschitz. Then the law of $F(\mathrm{X})$ satisfies a log-Sobolev inequality with constant $c\|F\|_{\mathrm{Lip}}^2$.

Proof. Let $\nu$ denote the law of X. Let $g\colon\mathbb{R}^k\to\mathbb{R}$ be a non-negative locally Lipschitz function. Then $g\circ F$ is locally Lipschitz and non-negative. Since $\nu$ satisfies the LSI with constant $c$, it follows that the entropy of $(g\circ F)^2$ with respect to $\nu$ is bounded by $2c\int|\nabla(g\circ F)|^2\,d\nu$. Since $F$ is Lipschitz, we also have the pointwise estimate $|\nabla(g\circ F)| \le \|F\|_{\mathrm{Lip}}\,(|\nabla g|\circ F)$. Rewriting both integrals in terms of the push-forward measure $F_*\nu$ (the law of $F(\mathrm{X})$) yields the log-Sobolev inequality for $F_*\nu$ with constant $c\|F\|_{\mathrm{Lip}}^2$.
The following theorem covers a wide range of examples of correlated random matrix ensembles. We use the notation $\mathbb{M}_N^{\mathrm{sym}}$ to denote the vector space of real $N\times N$ symmetric matrices, equipped with the Hilbert-Schmidt inner product. For the proof: by hypothesis, $X_N$ is the push-forward of a standard Gaussian matrix under a Lipschitz map $F_N$, and so by Lemma 2.9 the joint law of the entries of $\sqrt{N}X_N$ satisfies a log-Sobolev inequality with constant $c_N = N\|F_N\|_{\mathrm{Lip}}^2$. By Theorem 2.1, it therefore follows that, for any $\delta>0$ and $f\in\mathrm{Lip}(\mathbb{R})$, the linear statistic $\int f\,d\mu_N$ concentrates around its mean at rate $2\exp\big(-N^2\delta^2/(4c_N\|f\|_{\mathrm{Lip}}^2)\big)$. By assumption, $c_N = o(N^2/\log N)$. The result now follows exactly as in the proof of Lemma 2.6.
We now prove Theorem 1.2, as a corollary to Theorem 2.10.
Proof of Theorem 1.2. To begin, we clarify what is meant by "jointly Gaussian". We say a random vector $\mathrm{X} \in \mathbb{R}^m$ has jointly Gaussian entries if there is an affine map $T\colon\mathbb{R}^m\to\mathbb{R}^m$ such that $\mathrm{X} = T(\mathrm{G})$, where $\mathrm{G}$ has i.i.d. standard normal entries. In the special case that $T$ is invertible, this is equivalent to $\mathrm{X} - \mathbb{E}(\mathrm{X}) = A\mathrm{G}$ for some invertible matrix $A$. Since $A\mathrm{G}$ then has a density equal to a constant times $\exp\big(-\frac{1}{2}\mathbf{x}\cdot(AA^*)^{-1}\mathbf{x}\big)$, this recovers the more standard definition of a (non-degenerate) "jointly Gaussian" random vector.
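As a concrete illustration of this definition (a minimal numerical sketch, not part of the proof; the mean and covariance below are arbitrary example values), a jointly Gaussian vector with prescribed covariance $\Sigma$ can be realized as an affine image $T(\mathrm{G}) = \mu + A\mathrm{G}$ of an i.i.d. standard normal vector, with $A$ a Cholesky factor of $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Target mean and (positive-definite) covariance: arbitrary example values.
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.2],
                  [0.6, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])

A = np.linalg.cholesky(Sigma)      # Sigma = A A^T, with A lower triangular
G = rng.normal(size=(3, 100_000))  # columns are i.i.d. standard normal vectors

X = mu[:, None] + A @ G            # X = T(G) with T affine

print(np.cov(X))  # sample covariance: should be close to Sigma
```

The Lipschitz norm of $T$ is the operator norm of its linear part $A$; bounding this operator norm (via the Hilbert-Schmidt norm) is exactly the estimate carried out in the proof below.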
Let $\Pi_N = \{B_1, \ldots, B_m\}$ denote the partition of $\{(j,k) : 1 \le j \le k \le N\}$ in the theorem, and for $1 \le i \le m$ let $\mathbf{X}_i$ denote the random vector given by the entries of $X_N$ with indices in $B_i$. By assumption, the random vectors $\mathbf{X}_1, \ldots, \mathbf{X}_m$ are independent; it follows that there are affine maps $T_1, \ldots, T_m$, with $T_i\colon\mathbb{R}^{|B_i|}\to\mathbb{R}^{|B_i|}$, such that $\mathbf{X}_i = T_i(N^{-1/2}\mathbf{G}_i)$, where $\mathbf{G}_i$ is a standard Gaussian random vector in $\mathbb{R}^{|B_i|}$. The entries of all the $\mathbf{G}_i$ are i.i.d. standard normal random variables, which each satisfy log-Sobolev inequalities with constant 1 (cf. [26]). Hence, letting $F_N\colon\mathbb{M}_N^{\mathrm{sym}}\to\mathbb{M}_N^{\mathrm{sym}}$ be the map which takes the entries in partition block $B_i$ to $T_i(N^{-1/2}\,\cdot)$ of those entries, we see that $F_N$ (which maps a standard Gaussian matrix to $X_N$ in law) is a Lipschitz function, with Lipschitz norm equal to $\max\{\|T_i(N^{-1/2}\,\cdot)\|_{\mathrm{Lip}} : 1 \le i \le m\}$. We proceed to estimate this Lipschitz norm; it is just the operator norm of the linear part $N^{-1/2}\mathring{T}_i$, where $\mathring{T}_i = T_i - T_i(0)$. The operator norm is bounded above by the Hilbert-Schmidt norm; thus
$$\|T_i(N^{-1/2}\,\cdot)\|_{\mathrm{Lip}}^2 \;\le\; \|N^{-1/2}\mathring{T}_i\|_{\mathrm{HS}}^2 \;=\; \frac{1}{N}\sum_{p,q}[\mathring{T}_i]_{pq}^2,$$
where we use the indices $p,q$ to enumerate the entries of $\mathbf{X}_i$. Now, note that $\frac{1}{N}\sum_q[\mathring{T}_i]_{pq}^2 = \mathrm{Var}([\mathbf{X}_i]_p) = \frac{1}{N}\,\mathrm{Var}(N^{1/2}[\mathbf{X}_i]_p)$. By assumption, there is a uniform bound $\kappa$ so that $\mathrm{Var}(N^{1/2}[\mathbf{X}_i]_p) \le \kappa^2$ for all $i$ and $p$. Thus
$$\|T_i(N^{-1/2}\,\cdot)\|_{\mathrm{Lip}}^2 \;\le\; \frac{\kappa^2\,|B_i|}{N}.$$
By assumption, $|B_i| \le c_N = o(N^2/\log N)$, and so we have shown that $\|F_N\|_{\mathrm{Lip}} = o\big(\kappa\sqrt{N/\log N}\,\big)$, i.e., $N\|F_N\|_{\mathrm{Lip}}^2 = o(\kappa^2 N^2/\log N)$. The result now follows from Theorem 2.10.

Theorem 1.4, Generalizations, and Examples
In this section, we show how to prove Theorem 1.4 as a straightforward corollary to Theorems 1.1 and 2.10. To begin, we note that the topic of the paper [38] is the convergence in expectation of ensembles of this form (and slightly more general forms). In particular, using the tools of operator-valued free probability, Shlyakhtenko showed that all ensembles of this form have a limiting ESD, which can be computed (in principle) in terms of the spectral measure of the operator on $L^\infty[0,1]$ defined by
$$(T_W f)(x) = \int_0^1 f(y)\,W(x,y)^2\,dy$$
(embedded into a Fock space type model). The limiting ESD can be computed exactly in many cases; in particular, if $\int_0^1 W(x,y)^2\,dy = 1$ for each $x$, then the limit law is semicircular; cf. [38, Remark 3.8]. As such, we concern ourselves here only with the question of upgrading from convergence in expectation to convergence in probability or almost sure convergence, where appropriate.
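For example, consider the periodic band kernel $W_c(x,y) = (2c)^{-1/2}\,\mathbb{1}_{d(x,y)\le c}$ for $c\in(0,\frac{1}{2})$, where $d(x,y) = \min(|x-y|,\,1-|x-y|)$ is distance on the torus $[0,1)$. Then, for every $x$,
$$\int_0^1 W_c(x,y)^2\,dy \;=\; \frac{1}{2c}\,\big|\{y : d(x,y)\le c\}\big| \;=\; \frac{1}{2c}\cdot 2c \;=\; 1,$$
so the stochasticity condition holds and the limiting ESD is semicircular; these are the periodic band matrices of [16,18] mentioned in the introduction.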
Proof of Theorem 1.4. We apply Theorem 1.1 to the ensemble $X_N$. The upper-triangular entries of $X_N$ are all independent, and so condition (2) of Theorem 1.1 (on the size of independent blocks) is automatically satisfied. Hence, to conclude convergence of the centered ESD to 0 in probability, it suffices to show that the family $\{[\sqrt{N}X_N]_{jk}^2\}_{N\in\mathbb{N},\,1\le j,k\le N}$ is uniformly integrable. Note that $[\sqrt{N}X_N]_{jk} = W(j/N,k/N)\,Y_{jk}$, where the $Y_{jk}$ are the underlying i.i.d. entries (presumed to be centered with variance 1). By assumption $W \in C([0,1]^2)$, and so $W$ is bounded. Thus $[\sqrt{N}X_N]_{jk}^2 \le \|W\|_\infty^2\,Y_{jk}^2$ for all $N, j, k$, so this family of nonnegative random variables is dominated in distribution by the single integrable random variable $\|W\|_\infty^2\,Y_{11}^2$, and it follows that the family is uniformly integrable. Thus, by Theorem 1.1, the centered ESD of $X_N$ converges to 0 in probability. To conclude, we consider a large family of examples to which the above results apply, this time with correlated entries.
Example 2.11. In [17], the authors analyzed the random Toeplitz matrix: an $N\times N$ random matrix ensemble with independent diagonals but equal entries along each diagonal. That is,
$$[X_N]_{jk} = \frac{1}{\sqrt{N}}\,a_{|j-k|},$$
where $a_0, a_1, \ldots, a_{N-1}$ are i.i.d. random variables. They showed that, under this rescaling, the empirical spectral distribution of this ensemble converges to a heretofore unknown unbounded probability distribution, now known as the Toeplitz law. Their methods, which were both combinatorial and probabilistic, showed both universality (that the resultant bulk spectral distribution is independent of the common law of the entries $a_0, \ldots, a_{N-1}$, provided they have two finite moments) and that the convergence is almost sure. On the second point: they use the method of moments to prove convergence in expectation, and then leverage the precise structure of the correlated blocks (i.e., diagonals) to estimate the rate of convergence tightly enough to upgrade to a.s. convergence.
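A minimal simulation of this ensemble (our sketch, not from [17]; it assumes numpy and scipy) makes the non-semicircular limit easy to see:

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_eigenvalues(N, rng):
    # Independent diagonals, equal entries along each diagonal:
    # [X_N]_{jk} = a_{|j-k|} / sqrt(N), with a_0, ..., a_{N-1} i.i.d.
    a = rng.normal(size=N)
    X = toeplitz(a) / np.sqrt(N)
    return np.linalg.eigvalsh(X)

rng = np.random.default_rng(2)
evals = toeplitz_eigenvalues(2000, rng)

# The Toeplitz law has unbounded support, unlike the semicircle law:
# a visible fraction of eigenvalues falls outside [-2, 2].
print("fraction outside [-2, 2]:", np.mean(np.abs(evals) > 2.0))
```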
There have been a number of generalizations of these results in recent years: namely [36] (which studied circulant, i.e. periodic, variations on the Toeplitz ensemble), [32] (which considered a banded version of the random Toeplitz ensemble), and [25] (which studied the more general case where the entries along each diagonal are not equal, but each pair has the same correlation coefficient). In [25,32], only convergence in expectation was proved. Since the correlated blocks in these ensembles (the diagonals) are all of size $O(N) = o(N^2/\log N)$, our Theorem 1.2 implies a.s. convergence of the empirical spectral distributions in the case that the entries are Gaussian (or satisfy the more general Lipschitz constraints of Theorem 2.10).
We can consider a more general family of random matrices with independent diagonals, where the entries along each diagonal have a given covariance structure that provides some hope of a large-$N$ limit for the bulk eigenvalue distribution. This is the topic of a forthcoming preprint [24] by the present first author and his student coauthors. All such ensembles have correlated blocks of size $O(N)$, and hence Theorem 1.2 or 2.10 yields an automatic upgrade from convergence in expectation to a.s. convergence of the empirical spectral distribution. One good example of such a model, where convergence in expectation can be established by combinatorial means, is the following kind of mixture of the models in [17] and [32]: let $b_N < N$ be a band width, and consider the ensemble that agrees with the Toeplitz ensemble above on the band $|j-k| \le b_N$ and vanishes outside it, with the entries in the band all having the same distribution. If $\lim_{N\to\infty} b_N/N = \alpha \in (0,1)$, the empirical spectral distribution of this ensemble converges in expectation to a law which is unbounded, and different for different values of $\alpha$. As $\alpha \to 1$, we recover the Toeplitz law, while as $\alpha \to 0$ we recover the semicircle law. See Figure 2.1 for simulations. Again, since the correlated blocks are all of size $O(N)$, by Theorem 1.2 or 2.10, these "$\alpha$-band" ensembles all converge a.s. when the entries are Gaussian.
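The $\alpha$-band mixture just described can be simulated with a small modification of the Toeplitz sampler above (a sketch; the normalization $1/\sqrt{2b_N}$ is our guess at the scaling that keeps the spectrum of order one, and may differ from the convention in [24]):

```python
import numpy as np
from scipy.linalg import toeplitz

def band_toeplitz_eigenvalues(N, alpha, rng):
    # Toeplitz structure inside the band |j - k| <= b_N, zero outside,
    # with band width b_N = floor(alpha * N).
    b = int(alpha * N)
    a = rng.normal(size=N)
    a[b + 1:] = 0.0                # zero out the diagonals beyond the band
    return np.linalg.eigvalsh(toeplitz(a) / np.sqrt(2.0 * b))

rng = np.random.default_rng(3)
for alpha in (0.1, 0.5, 0.9):      # interpolates semicircle -> Toeplitz law
    evals = band_toeplitz_eigenvalues(2000, alpha, rng)
    print(alpha, "fraction outside [-2, 2]:", np.mean(np.abs(evals) > 2.0))
```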

Mollified Log-Sobolev Inequalities on $\mathbb{R}^n$
In this section we will prove Theorem 1.6. For convenience, we restate it below as Theorem 3.1, in measure-theoretic language.

Theorem 3.1. Let $\mu$ be a probability measure on $\mathbb{R}^n$ whose support is contained in a ball of radius $r$, and let $\gamma_t$ be the centered Gaussian of variance $t$ with $0 < t \le r^2$, i.e., $\gamma_t(dx) = (2\pi t)^{-n/2}\exp\big(-\frac{|x|^2}{2t}\big)\,dx$. Then the convolution $\mu * \gamma_t$ satisfies the log-Sobolev inequality with optimal constant $c(t)$ bounded by
$$c(t) \;\le\; 289\,r^2\,\exp\!\left(20n + \frac{5r^2}{t}\right).$$
In particular, let $r_0, b, \lambda > 0$ be such that the Lyapunov (drift) condition above holds, where $B_{r_0}$ denotes the ball centered at 0 of radius $r_0$ (the existence of such $r_0, b, \lambda$ is implied by Assumption (2)), and where $\rho(x) = e^{-V(x)}$ is the density of $\nu$. Let $A$ and $B$ be the explicit constants determined by $r_0$, $\lambda$, and an arbitrarily chosen parameter $\epsilon > 0$, as in [19]. Then $\nu$ satisfies a LSI with constant $A + B(b+2)$.
We remark that the statement of Theorem 3.2 is a combination of results in [19] and preceding papers cited therein (notably [7]). In those papers, the results are proved in the more general context of Riemannian manifolds. Also, the constants given above are derived in the proofs in [19], but are not presented there in the concise form above. The reader is directed to [41, pp. 7-8] for the precise statement above.
With the above, we now prove Theorem 3.1. For $|x| \ge r_0$, the above expression is nonpositive, and for $|x| \le r_0$, the above expression is of the form $-as + 1$ with $a, s \ge 0$, which has a maximum value of 1, as desired. Now we estimate $b_0$ by estimating $\sup_{x\in B_{r_0}} V(x)$ and $\inf_{x\in B_{r_0}} V(x)$. For $x \in B_{r_0}$, we have the needed two-sided bounds from the support assumption on $\mu$; using $\sqrt{a} + \sqrt{b} \le \sqrt{2(a+b)}$ and the assumptions $t \le r^2$ and $n \ge 1$ above, we get the bound on $c(t)$ stated in Theorem 3.1.