Why does centering in linear regression reduce multicollinearity?

Centering in linear regression is one of those things that we learn almost as a ritual whenever we are dealing with interactions. We are taught time and time again that centering is done because it decreases multicollinearity, and that multicollinearity is something bad in itself. In my opinion, centering plays an important role in the interpretation of OLS multiple regression results when interactions are present, but I dunno about the multicollinearity issue. The thing is that high intercorrelations among your predictors (your “Xs” so to speak) make it numerically difficult to invert X'X, i.e. to compute (X'X)^{-1}, which is the essential step in getting the regression coefficients. But that was a problem like YEARS ago! Nowadays you can find the inverse of a matrix pretty much anywhere, even online! Still, we keep emphasizing centering as a way to deal with multicollinearity and not so much as an interpretational device (which is how I think it should be taught).
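To make that numerical point concrete, here is a minimal sketch in Python with numpy (the simulated data and variable names are my own, not from any particular textbook) comparing the condition number of X'X, with an interaction column included, before and after centering:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
x = rng.normal(loc=5, scale=1, size=n)  # deliberately non-centered predictors
z = rng.normal(loc=5, scale=1, size=n)

def design(x, z):
    """Design matrix with intercept, main effects, and interaction."""
    return np.column_stack([np.ones_like(x), x, z, x * z])

X_raw = design(x, z)
X_cen = design(x - x.mean(), z - z.mean())

# The condition number of X'X measures how close it is to singular:
# large values mean computing the inverse is numerically unstable.
print(np.linalg.cond(X_raw.T @ X_raw))  # huge: raw x and z correlate with xz
print(np.linalg.cond(X_cen.T @ X_cen))  # much smaller after centering
```

The raw design gives a condition number orders of magnitude larger than the centered one, which is exactly the numerical trouble people worried about back when inverting X'X was expensive.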

Anyhoo, the point here is that I’d like to show what happens to the correlation between a product term and its constituent variables when an interaction term is included in the model. Let’s take the following regression model as an example:

Y = \beta_{0} + \beta_{1}X+\beta_{2}Z+\beta_{3}XZ+\epsilon 

Because the labels X and Z are arbitrary, what we are going to derive works the same whether you’re doing cov(XZ,X) or cov(XZ,Z). I am gonna do cov(XZ,X). In any case, we first need to express cov(XZ,X) in terms of expectations of random variables, variances and whatnot. Remember that the key identity here is cov(X,Z) = E[(X-E[X])(Z-E[Z])]=E[XZ]-E[X]E[Z]. You’ll see how this comes into play when we do the whole thing:

cov(XZ,X)= E[(XZ-E[XZ])(X-E[X])]

=E[X^{2}Z-XE[XZ]-XZE[X]+E[X]E[XZ]]

=E[X^{2}Z]-E[X]E[XZ]-E[XZ]E[X]+E[X]E[XZ]

=E[X^{2}Z]-E[X]E[XZ]

=E[X^{2}Z]-E[X](cov(X,Z)+E[X]E[Z])

=E[X^{2}Z]-\mu_{X}\sigma_{XZ}-\mu_{X}^{2}\mu_{Z}
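If you want a sanity check of that last identity, here is a quick Monte Carlo sketch (Python/numpy; the means and covariance are arbitrary values I picked):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000

# Correlated, non-centered bivariate normal draws (arbitrary parameters)
mu = np.array([2.0, -1.0])
cov = np.array([[1.0, 0.6],
                [0.6, 2.0]])
x, z = rng.multivariate_normal(mu, cov, size=n).T

# Left side: cov(XZ, X) computed directly from the samples
lhs = np.cov(x * z, x)[0, 1]
# Right side: E[X^2 Z] - mu_X * sigma_XZ - mu_X^2 * mu_Z, using sample moments
rhs = np.mean(x**2 * z) - x.mean() * np.cov(x, z)[0, 1] - x.mean()**2 * z.mean()

print(lhs, rhs)  # the two agree up to Monte Carlo error
```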

This last expression is very similar to what appears on page 264 of the Cohen et al. blue regression textbook. It is not exactly the same, though, because they started their derivation from another place. But you can see how I could transform mine into theirs (for instance, there is a \mu^{2}_{X} from which I could get a version with \sigma^{2}_{X}); my point here is not to reproduce the formulas from the textbook. The point here is to show that, under centering, \mu_{X}=\mu_{Z}=0, which leaves cov(XZ,X) = E[X^{2}Z] = E[X\cdot XZ]. The reason why I am making the product X\cdot XZ explicit is to show that whatever correlation is left between the product and its constituent terms depends exclusively on the third moment of the distributions. For any symmetric distribution (like the normal distribution) this moment is zero, and then the whole covariance between the interaction and its main effects is zero as well.
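Here’s a small simulation of that third-moment point (again Python/numpy; the exponential distribution is just my pick of a skewed example, nothing canonical about it):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000_000

def cov_after_centering(x, z):
    """Center x and z, then return cov(xz, x)."""
    xc, zc = x - x.mean(), z - z.mean()
    return np.cov(xc * zc, xc)[0, 1]

# Symmetric case: normal draws -> covariance ~ 0 after centering
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)
print(cov_after_centering(x, z))   # ~ 0

# Skewed case: exponential draws -> the third moment is nonzero,
# so centering does NOT remove the covariance
x = rng.exponential(size=n)
z = 0.5 * x + rng.exponential(size=n)
print(cov_after_centering(x, z))   # clearly nonzero
```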

Let’s take the case of the normal distribution, which is very easy and is also the one assumed throughout Cohen et al. and many other regression textbooks. I’ll show you why, in that case, the whole cov(XZ,X) = 0 thing works. Consider (X,Z) following a bivariate normal distribution such that:

(X,Z)' \sim N\left[ \begin{pmatrix}\mu_{X}\\ \mu_{Z}\end{pmatrix}, \begin{pmatrix}\sigma_{X}^{2}&\sigma_{XZ}\\ \sigma_{XZ}&\sigma_{Z}^{2}\end{pmatrix} \right]

Then, for Z_{X} and Z_{Z} independent standard normal variables, we can define:

X=\sigma_{X}Z_{X}+\mu_{X}

Z=\sigma_{Z}Z_{X}\rho_{XZ}+\sigma_{Z}Z_{Z}\sqrt{1-\rho_{XZ}^{2}}+\mu_{Z}
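If you don’t trust that construction, you can verify by simulation that it produces the right means, variances and correlation (Python/numpy sketch, with parameter values I picked arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000

# Target parameters (arbitrary choices for the check)
mu_x, mu_z = 3.0, -2.0
sigma_x, sigma_z, rho = 1.5, 2.0, 0.4

# The construction from the text: two independent standard normals
z_x = rng.normal(size=n)
z_z = rng.normal(size=n)
x = sigma_x * z_x + mu_x
z = sigma_z * rho * z_x + sigma_z * np.sqrt(1 - rho**2) * z_z + mu_z

print(x.mean(), z.mean())          # ~ mu_x, ~ mu_z
print(x.std(), z.std())            # ~ sigma_x, ~ sigma_z
print(np.corrcoef(x, z)[0, 1])     # ~ rho
```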

Now take:
X^{2}=(\sigma_{X}Z_{X}+\mu_{X})^{2}=\sigma_{X}^{2}Z_{X}^{2}+2\sigma_{X}Z_{X}\mu_{X}+\mu_{X}^{2}

And multiply it by the expression for Z:

X^{2}Z=(\sigma_{X}^{2}Z_{X}^{2}+2\sigma_{X}Z_{X}\mu_{X}+\mu_{X}^{2})(\sigma_{Z}Z_{X}\rho_{XZ}+\sigma_{Z}Z_{Z}\sqrt{1-\rho_{XZ}^{2}}+\mu_{Z})

Now, that looks boring to expand but the good thing is that I’m working with centered variables in this specific case, so \mu_{X}=\mu_{Z}=0 and:

X^{2}Z=(\sigma_{X}^{2}Z_{X}^{2})(\sigma_{Z}Z_{X}\rho_{XZ}+\sigma_{Z}Z_{Z}\sqrt{1-\rho_{XZ}^{2}})

X^{2}Z=\sigma_{X}^{2}\sigma_{Z}\rho_{XZ}Z_{X}^{3}+\sigma_{X}^{2}\sigma_{Z}Z_{X}^{2}Z_{Z}\sqrt{1-\rho_{XZ}^{2}}

E[X^{2}Z]=E[\sigma_{X}^{2}\sigma_{Z}\rho_{XZ}Z_{X}^{3}]+E[\sigma_{X}^{2}\sigma_{Z}Z_{X}^{2}Z_{Z}\sqrt{1-\rho_{XZ}^{2}}]

E[X^{2}Z]=\sigma_{X}^{2}\sigma_{Z}\rho_{XZ}E[Z_{X}^{3}]+\sigma_{X}^{2}\sigma_{Z}\sqrt{1-\rho_{XZ}^{2}}E[Z_{X}^2Z_{Z}]

Notice that, by construction, Z_{X} and Z_{Z} are independent, so the expectation of their product factors: E[Z_{X}^{2}Z_{Z}]=E[Z_{X}^{2}]E[Z_{Z}]=(1)(0)=0.

We also know that E[Z_{X}^{3}]=0, because the third moment (the skewness) of the standard normal distribution is zero. Putting both facts together:

E[X^{2}Z]=\sigma_{X}^{2}\sigma_{Z}\rho_{XZ}E[Z_{X}^{3}]+\sigma_{X}^{2}\sigma_{Z}\sqrt{1-\rho_{XZ}^{2}}E[Z_{X}^{2}Z_{Z}]

E[X^{2}Z]=\sigma_{X}^{2}\sigma_{Z}\rho_{XZ}(0)+\sigma_{X}^{2}\sigma_{Z}\sqrt{1-\rho_{XZ}^{2}}(0)

E[X^{2}Z]=0 + 0 = 0
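And just to close the loop numerically, both expectations really do vanish in simulation (Python/numpy):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000_000

z_x = rng.normal(size=n)
z_z = rng.normal(size=n)

print(np.mean(z_x**3))         # ~ 0: third moment of a standard normal
print(np.mean(z_x**2 * z_z))   # ~ 0: factors as E[z_x^2] * E[z_z] = 1 * 0
```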

So now you know what centering does to the correlation between a product term and its constituent variables, and why under normality (or really under any symmetric distribution) you would expect that correlation to be 0.