Given two unit vectors
in a real inner product space, one can define the correlation between these vectors to be their inner product
, or in more geometric terms, the cosine of the angle
subtended by
and
. By the Cauchy-Schwarz inequality, this is a quantity between
and
, with the extreme positive correlation
occurring when
are identical, the extreme negative correlation
occurring when
are diametrically opposite, and the zero correlation
occurring when
are orthogonal. This notion is closely related to the notion of correlation between two non-constant square-integrable real-valued random variables
, which is the same as the correlation between two unit vectors
lying in the Hilbert space
of square-integrable random variables, with
being the normalisation of
defined by subtracting off the mean
and then dividing by the standard deviation of
, and similarly for
and
.
One can also define correlation for complex (Hermitian) inner product spaces by taking the real part
of the complex inner product to recover a real inner product.
While reading the (highly recommended) recent popular maths book "How not to be wrong", by my friend and co-author Jordan Ellenberg, I came across the (important) point that correlation is not necessarily transitive: if
correlates with
, and
correlates with
, then this does not imply that
correlates with
. A simple geometric example is provided by the three unit vectors

in the Euclidean plane
:
and
have a positive correlation of
, as does
and
, but
and
are not correlated with each other. Or: for a typical undergraduate course, it is generally true that good exam scores are correlated with a deep understanding of the course material, and memorising from flash cards are correlated with good exam scores, but this does not imply that memorising flash cards is correlated with deep understanding of the course material.
However, there are at least two situations in which some partial version of transitivity of correlation can be recovered. The first is in the "99%" regime in which the correlations are very close to
: if
are unit vectors such that
is very highly correlated with
, and
is very highly correlated with
, then this does imply that
is very highly correlated with
. Indeed, from the identity
(and similarly for
and
) and the triangle inequality
we see that
Thus, for instance, if
and
, then
. This is of course closely related to (though slightly weaker than) the triangle inequality for angles:
Remark 1 (Thanks to Andrew Granville for conversations leading to this observation.) The inequality (1) also holds for sub-unit vectors, i.e. vectors
with
. This comes by extending
in directions orthogonal to all three original vectors and to each other in order to make them unit vectors, enlarging the ambient Hilbert space
if necessary. More concretely, one can apply (1) to the unit vectors
in
.
But even in the "
" regime in which correlations are very weak, there is still a version of transitivity of correlation, known as the van der Corput lemma, which basically asserts that if a unit vector
is correlated with many unit vectors
, then many of the pairs
will then be correlated with each other. Indeed, from the Cauchy-Schwarz inequality
we see that
A similar argument (multiplying each
by an appropriate sign
) shows the related van der Corput inequality and this inequality is also true for complex inner product spaces. (Also, the
do not need to be unit vectors for this inequality to hold.)
Geometrically, the picture is this: if
positively correlates with all of the
, then the
are all squashed into a somewhat narrow cone centred at
. The cone is still wide enough to allow a few pairs
to be orthogonal (or even negatively correlated) with each other, but (when
is large enough) it is not wide enough to allow all of the
to be so widely separated. Remarkably, the bound here does not depend on the dimension of the ambient inner product space; while increasing the number of dimensions should in principle add more "room" to the cone, this effect is counteracted by the fact that in high dimensions, almost all pairs of vectors are close to orthogonal, and the exceptional pairs that are even weakly correlated to each other become exponentially rare. (See this previous blog post for some related discussion; in particular, Lemma 2 from that post is closely related to the van der Corput inequality presented here.)
A particularly common special case of the van der Corput inequality arises when
is a unit vector fixed by some unitary operator
, and the
are shifts
of a single unit vector
. In this case, the inner products
are all equal, and we arrive at the useful van der Corput inequality (In fact, one can even remove the absolute values from the right-hand side, by using (2) instead of (4).) Thus, to show that
has negligible correlation with
, it suffices to show that the shifts of
have negligible correlation with each other.
Here is a basic application of the van der Corput inequality:
Proposition 2 (Weyl equidistribution estimate) Let
be a polynomial with at least one non-constant coefficient irrational. Then one has
where
.
Note that this assertion implies the more general assertion
for any non-zero integer
(simply by replacing
by
), which by the Weyl equidistribution criterion is equivalent to the sequence
being asymptotically equidistributed in
.
In order to use the van der Corput inequality as stated above (i.e. in the formalism of inner product spaces) we will need a non-principal ultrafilter
(see e.g this previous blog post for basic theory of ultrafilters); we leave it as an exercise to the reader to figure out how to present the argument below without the use of ultrafilters (or similar devices, such as Banach limits). The ultrafilter
defines an inner product
on bounded complex sequences
by setting
Strictly speaking, this inner product is only positive semi-definite rather than positive definite, but one can quotient out by the null vectors to obtain a positive-definite inner product. To establish the claim, it will suffice to show that
for every non-principal ultrafilter
.
Note that the space of bounded sequences (modulo null vectors) admits a shift
, defined by
This shift becomes unitary once we quotient out by null vectors, and the constant sequence
is clearly a unit vector that is invariant with respect to the shift. So by the van der Corput inequality, we have
for any
. But we may rewrite
. Then observe that if
,
is a polynomial of degree
whose
coefficient is irrational, so by induction hypothesis we have
for
. For
we of course have
, and so
for any
. Letting
, we obtain the claim. 