The volume of a 3 X 3 correlation matrix

I think it’s important, as you shape your academic career, to keep track of the articles, blogs or even conversations that somehow had an impact on you. Later down the line, when you grow up, you can always look back at them and say “Oh boy, remember when I used to think like that? Remember when I just discovered this?”. Much like photo albums, the ideas of a scholar bear witness to how his or her thoughts about the world changed and evolved (or sometimes, de-volved).

In my case, here is a very, very simple article from which I took some ideas to run my first ever “relevant” simulation. I call it “relevant” because this simulation study was neither a homework assignment nor something I did just for myself, but something that started me off on the path of graduate-level research. Its results helped me draft the first paper I ever presented at a conference. There is nothing really ground-breaking here (I did this in my 1st year as a Master’s student so cut me some slack!), but it gave me the confidence to know that I could read an article, take ideas from it, turn them into R code and, eventually, answer some substantive question of interest. In other words, it helped me prove to myself that I was ‘graduate school’ material, and not someone who just got lucky and got in.

The article I’m talking about is:

Rousseeuw, P. J., & Molenberghs, G. (1994). The shape of correlation matrices. The American Statistician, 48(4), 276-279.

And I highly recommend it. It is only 5 (yes, only 5!) pages long and I think it’s written in such a clear way that anyone will be able to get it. While I was reading it, one claim caught my attention (p.3):

The volume of \mathbf{R} can be computed by elementary calculus, yielding V=\frac{\pi ^{2}}{2} \approx 4.93. This means that if we generate three numbers r_{xy}, r_{xz}, and r_{yz} independently of each other and uniform in [-1, 1], the probability that the resulting \mathbf{C} is a true correlation matrix equals only \frac{V}{8}=\frac{\pi ^{2}}{16} \approx 61.7\%.
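That 61.7% is easy to sanity-check by brute force. Below is a minimal Python sketch (my own check, not anything taken from the article) that draws the three correlations uniformly in [-1, 1] and counts how often the resulting matrix is a valid correlation matrix. The function names, the number of draws and the seed are all arbitrary choices of mine. It relies on the fact that, with a unit diagonal and off-diagonal entries in [-1, 1], the matrix is positive semidefinite exactly when its determinant 1 + 2xyz - x^2 - y^2 - z^2 is non-negative.

```python
import random


def is_correlation_matrix(x, y, z):
    """True if [[1, x, y], [x, 1, z], [y, z, 1]] is positive semidefinite.

    With a unit diagonal and |x|, |y|, |z| <= 1 the 1x1 and 2x2 principal
    minors are already non-negative, so only the determinant is checked.
    """
    return 1.0 + 2.0 * x * y * z - x * x - y * y - z * z >= 0.0


def estimate_probability(n_draws=200_000, seed=1994):
    """Monte Carlo estimate of P(valid correlation matrix) for uniform draws."""
    rng = random.Random(seed)  # seed is an arbitrary choice, for repeatability
    hits = 0
    for _ in range(n_draws):
        x, y, z = (rng.uniform(-1.0, 1.0) for _ in range(3))
        if is_correlation_matrix(x, y, z):
            hits += 1
    return hits / n_draws


if __name__ == "__main__":
    p = estimate_probability()
    print(f"estimated probability: {p:.4f}")
    # The sampling cube [-1, 1]^3 has volume 8, so 8 * p estimates V.
    print(f"implied volume: {8 * p:.4f}")
```

The estimate should land close to \frac{\pi^{2}}{16}, and multiplying it by 8 (the volume of the cube we sample from) should land close to \frac{\pi^{2}}{2}.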

And a visual of the object they’re referring to is here (although you’d probably need to read the article to get the full details):

[Figure: correlation_shape1]

I remember saying something like “Oh wow, that’s interesting. I think when I’m done I will go back and figure out how they got that volume thingy”. The days turned to months, the months to years and just recently, when I was clearing up some stuff, I found my old research notebook with a note on the margin saying “CALCULATE VOLUME NOW!!!”. It made me smile so much, and it also made me realize I never got around to doing it. So now that life is a little bit more slow-paced, I will share with you my insight into how Rousseeuw & Molenberghs (1994) found that volume. Enjoooy!

Write x, y and z for the three correlations r_{xy}, r_{xz} and r_{yz}. The shape is bounded by x^{2} + y^{2} + z^{2} - 2xyz = 1. If we fix z = z_{0}, then it becomes x^{2} + y^{2} - 2xyz_{0} = 1- z^{2}_{0}, which is an ellipse.

We can find this volume by integrating the area enclosed by x^{2} + y^{2} - 2xyz_{0} = 1- z^{2}_{0} from z_{0}=-1 to z_{0}=1, which means we first need to find the area of the ellipse.

An ellipse in standard form is given by \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 and its area equals \pi ab. We’re gonna work with the equation of the ellipse in implicit form. Please check Wikipedia’s article on the area of the ellipse if you’re not familiar with this. Anyway, the implicit form is Ax^{2} + Bxy + Cy^{2} = 1 with an area of \frac{2 \pi}{\sqrt{4AC-B^{2}}}. For our purposes, dividing both sides of the slice equation by 1-z^{2} (and dropping the subscript on z_{0}) gives:

A = \frac{1}{1-z^{2}}, \quad B = \frac{-2z}{1-z^{2}}, \quad C=\frac{1}{1-z^{2}}

So that our area ends up being:

\frac{2\pi}{\sqrt{\frac{4}{(1-z^{2})^{2}}-\frac{4z^{2}}{(1-z^{2})^{2}}}}=\frac{2 \pi (1-z^{2})}{\sqrt{4-4z^{2}}}= \pi \sqrt{1-z^{2}}
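If you don’t trust the algebra, here is a small Python sketch that checks the slice area numerically. It is only an illustration under my own assumptions: `implicit_ellipse_area`, `slice_area` and `slice_area_grid` are names I made up, and the grid size is arbitrary. The first two functions plug A, B and C into the implicit-form area formula; the third ignores the formula and simply counts grid cells of [-1, 1] x [-1, 1] whose midpoints fall inside the slice.

```python
import math


def implicit_ellipse_area(A, B, C):
    """Area enclosed by A x^2 + B xy + C y^2 = 1 (needs 4AC - B^2 > 0)."""
    disc = 4.0 * A * C - B * B
    if disc <= 0.0:
        raise ValueError("not an ellipse: 4AC - B^2 must be positive")
    return 2.0 * math.pi / math.sqrt(disc)


def slice_area(z0):
    """Area of the slice at height z0 (|z0| < 1) via the implicit formula."""
    s = 1.0 - z0 * z0
    return implicit_ellipse_area(1.0 / s, -2.0 * z0 / s, 1.0 / s)


def slice_area_grid(z0, n=400):
    """Brute-force check: count cell midpoints of an n x n grid on
    [-1, 1] x [-1, 1] that satisfy x^2 + y^2 - 2 x y z0 <= 1 - z0^2."""
    h = 2.0 / n
    rhs = 1.0 - z0 * z0
    inside = 0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            if x * x + y * y - 2.0 * x * y * z0 <= rhs:
                inside += 1
    return inside * h * h


if __name__ == "__main__":
    for z0 in (0.0, 0.3, 0.6):
        closed_form = math.pi * math.sqrt(1.0 - z0 * z0)
        print(z0, slice_area(z0), slice_area_grid(z0), closed_form)
```

For each z0 the three printed numbers should agree, the grid count only approximately since it is a crude estimate.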

We can finally set up the integral to calculate the volume by doing:

\int_{-1}^{1} \pi \sqrt{1-z^{2}}dz

Using the change-of-variable method, define z=\sin \theta, so that dz = \cos\theta\ d\theta and the limits z=\pm 1 become \theta=\pm\frac{\pi}{2}. So it follows that:

\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \pi \sqrt{1-\sin^{2}\theta}\ d(\sin\theta)=

\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \pi (\cos\theta) (\cos\theta)\ d\theta = \pi \int_{-\frac{\pi}{2}}^{\frac{\pi}{2}}\cos^{2}\theta\ d\theta=

\pi \int_{-\frac{\pi}{2}}^{\frac{\pi}{2}}\frac{1+\cos 2\theta}{2}\ d\theta = \frac{\pi}{2}\theta \Big|_{-\frac{\pi}{2}}^{\frac{\pi}{2}} + \frac{\pi}{4}\sin 2\theta \Big|_{-\frac{\pi}{2}}^{\frac{\pi}{2}}=

\frac{\pi^{2}}{2}
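As a last check on the integral, here is a short Python sketch that adds up the slice areas \pi \sqrt{1-z^{2}} with a plain midpoint rule. The function name and the number of slices are my own choices, and this is just a numerical cross-check of the calculus, not how the article computes anything.

```python
import math


def volume_midpoint(n_slices=100_000):
    """Midpoint-rule approximation of the integral of pi * sqrt(1 - z^2)
    over z in [-1, 1]."""
    h = 2.0 / n_slices
    total = 0.0
    for k in range(n_slices):
        z = -1.0 + (k + 0.5) * h  # midpoint of the k-th slab, never exactly +-1
        total += math.pi * math.sqrt(1.0 - z * z)
    return total * h


if __name__ == "__main__":
    v = volume_midpoint()
    print(f"numerical volume: {v:.6f}")
    print(f"pi^2 / 2:         {math.pi ** 2 / 2:.6f}")
    print(f"volume / 8:       {v / 8:.6f}")
```

The first two lines should match to several decimals, and the third is the probability from the quote.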

So, after more than 3 years, I finally got around to finding the volume of the shape described by the set of all possible 3 X 3 correlation matrices 🙂