The case of the missing correlations and positive-definiteness.

A couple of weeks ago, a student of mine inquired about how the pos_def_limits() function from the faux R package works. This package is being developed by the super awesome Dr. Lisa DeBruine, who you should, like, totally follow, btw, in case you are not doing it already. What makes this post interesting is that I thought pos_def_limits() was doing one thing when it is actually doing something else. I think this helps highlight how different ways of approaching the same problem can give you insight into different aspects of it. But first, some preliminaries:

What is positive definiteness?

This one is not particularly complicated to define. Let \Sigma be an n \times n real-valued matrix and v an n \times 1 real-valued, non-zero vector. Then \Sigma is a positive-definite matrix if v^{t} \Sigma v >0 and it is positive-semi-definite if v^{t} \Sigma v \geq 0 for all v . I prefer this definition of positive definiteness because it generalizes easily to other types of linear operators (e.g., differentiation), as opposed to consequences of the definition, which are what we usually operate with. If you come from the social sciences (like I do), the “version” you probably know for a (symmetric) matrix, usually a covariance matrix, being positive-definite is that all of its eigenvalues have to be positive. Which is what pos_def_limits() relies on. It implements a grid search over the plausible correlation range of [-1, +1] and, once it finds the minimum and maximum values for which the resulting matrices are all positive definite, it reports them (or it lets you know that no such matrix exists, which is super useful as well). Relying on the documentation example:

pos_def_limits(.8, .2, NA)
>  min     max
> -0.427 0.747

which means that if you have a correlation matrix that looks like:

\textbf{R}= \begin{bmatrix} 1 & 0.8 & 0.2\\ 0.8 & 1 & r\\ 0.2 & r &1 \end{bmatrix} 

then as long as -0.427 \leq r \leq 0.747 , your resulting matrix is positive definite and, hence, a valid correlation matrix. So far so good.
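Before moving on, here is a quick base-R sanity check of that statement via the eigenvalue criterion. make_R is just an ad-hoc helper of mine (not part of faux):

## Build R for a given r and inspect its eigenvalues: all positive means
## positive definite; any negative means not a valid correlation matrix.
make_R <- function(r) matrix(c(1, .8, .2,
                               .8, 1, r,
                               .2, r, 1), 3, 3)
eigen(make_R(0.5), only.values = TRUE)$values  ## r inside the limits: all positive
eigen(make_R(0.9), only.values = TRUE)$values  ## r outside the limits: one negative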

How I tackle this problem

This is what I thought pos_def_limits() was doing under the hood before looking at the source code. So… a condition related to the positive eigenvalues is that the determinant of the matrix has to be non-negative: IF \Sigma is positive-semi-definite, THEN det(\Sigma) \geq 0 . Notice that this DOES NOT work the other way around: just because det(\Sigma) \geq 0 does not mean that \Sigma is positive-semi-definite. Anyway, we can rely on the fact that all correlation/covariance matrices are positive-semi-definite by construction, which means the problem of finding the suitable upper and lower bounds is simply solving for r subject to the constraint that the determinant MUST be greater than or equal to zero. So with the help of a CAS (Computer Algebra System; my favourite one is Maple because I’m very Canadian, LOL) I can see that solving det(\textbf{R}) \geq 0 results in the following:

det(\textbf{R})= -r^{2} + 0.32r + 0.32 \geq 0 

which is… A QUADRATIC EQUATION! We can graph it and see:

[Figure: the parabola det(\textbf{R}) = -r^{2} + 0.32r + 0.32, crossing zero at its two roots]

Any value on the x-axis between the two roots yields a valid inequality and, therefore, a positive-definite matrix \textbf{R} . Do you remember how to solve for the roots of quadratic equations? Using our trusted formula from high school we obtain:

x_1= -\frac{2(3\sqrt{6}-2)}{25} = -0.42788... 

x_2= \frac{2(2+3\sqrt{6})}{25} = 0.74788... 

which match the values approximated by pos_def_limits().
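If you would rather let R check the arithmetic, polyroot() finds the same roots (coefficients go in increasing order of power):

sort(Re(polyroot(c(0.32, 0.32, -1))))
## should print approximately -0.4278775 and 0.7478775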

Extensions to this problem Pt I: Maximizing the determinant

There are two interesting things you can do with this determinantal equation. First and foremost, you can choose *the* value that maximizes the variance encoded in the correlation matrix \textbf{R} . The determinant has a lot of interesting properties, including a very nice geometric interpretation: the absolute value of the determinant is the volume of the parallelepiped described by the column vectors of the matrix, and this generalizes to higher dimensions. Because correlations are bounded in the [-1, +1] range, the maximum determinant that ANY correlation matrix can have is 1 and the minimum is 0. So if I want my matrix \textbf{R} to have the largest possible determinant, I only need to choose the value at the vertex of the parabola:

[Figure: the same parabola with its vertex marked]

which has coordinates (0.16, 0.3456). So if r=0.16 then the determinant of \textbf{R} is at its maximum.
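(For completeness, the vertex comes straight from the high-school formula:

r^{*} = -\frac{b}{2a} = -\frac{0.32}{2(-1)} = 0.16 , \qquad det(\textbf{R})\big|_{r=0.16} = -(0.16)^{2}+0.32(0.16)+0.32 = 0.3456

nothing fancier than that.)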

Extensions to this problem Pt II: What if more than one correlation is missing?

The “classical” version of this problem is to have a 3 x 3 matrix where 2 correlations are known and 1 is missing. But (as my student questioned further) what would happen if we had, say, a 4 x 4 matrix and TWO correlations were missing? Well, no biggie. Let’s come up with a new matrix, call it \textbf{S} , with the following form:

\textbf{S}= \begin{bmatrix} 1 & 0.8 & 0.2 & a\\ 0.8 & 1 & r & 0.5\\ 0.2 & r &1 & 0.3\\ a &0.5&0.3 & 1 \end{bmatrix} 

Yeah, I had to make up a couple of extra correlations (0.3 and 0.5) to make sure only TWO correlations were missing. But there is a point to this that you will notice very quickly. Since we still want \textbf{S} to be a valid correlation matrix, the condition det(\textbf{S}) \geq 0 must still hold, irrespective of the dimensions of \textbf{S} or how many missing elements it has. So, once again, running this new matrix through the CAS yields the condition:

 a^2r^2-a^2-0.68ar+0.92a-r^2+0.62r-0.0004 \geq 0
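Before staring at pictures, it never hurts to spot-check the CAS output numerically. make_S and det_poly below are ad-hoc helpers of mine (not from any package):

## Compare the CAS polynomial against det() at an arbitrary (a, r) pair.
make_S <- function(a, r) matrix(c( 1, .8, .2,  a,
                                  .8,  1,  r, .5,
                                  .2,  r,  1, .3,
                                   a, .5, .3,  1), 4, 4)
det_poly <- function(a, r) a^2*r^2 - a^2 - 0.68*a*r + 0.92*a - r^2 + 0.62*r - 0.0004
det(make_S(0.4, 0.2)) - det_poly(0.4, 0.2)  ## should be 0, up to floating point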

This is now an inequality in two unknowns and, strictly speaking, it is not the only one: for \textbf{S} to be positive definite, the smaller leading principal minors have to be non-negative too (including the familiar -r^{2}+0.32r+0.32 \geq 0 from the 3 x 3 case). So what we really have is a system of quadratic inequalities which must be solved simultaneously to yield valid ranges. Rather than showing you the equations (booooooring! :D) let me show you something prettier. A picture of the solution space:

[Figure: the regions of the (a, r) plane where det(\textbf{S}) \geq 0, shaded in red]

Yup. Any point (a,r) within the red regions yields a valid solution to the inequality above. HOWEVER, there is only ONE region where the solution is both within the valid (i.e., red) regions AND within the valid correlation range. What does your intuition tell you? I think we both know where to look 😉

[Figure: the same plot zoomed in on the [-1, +1] box, highlighting a single blob-shaped feasible region]

That is correct! That little blob-looking thingy is where we want to be. ANY pair of values inside the blob is both within [-1, +1] AND satisfies the determinantal inequality, so any pair of values STRICTLY inside the blob will yield a valid correlation matrix \textbf{S} .
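If you want to re-create the blob yourself, a base-R sketch (reusing the det_poly helper from the spot-check above) is enough; restricted to the [-1, +1] box, the det(\textbf{S}) \geq 0 region is exactly the blob:

## Shade the (a, r) pairs inside the box with a non-negative determinant.
a_grid <- seq(-1, 1, length.out = 400)
r_grid <- seq(-1, 1, length.out = 400)
z <- outer(a_grid, r_grid, det_poly)
image(a_grid, r_grid, (z >= 0) + 0, col = c("white", "red"),
      xlab = "a", ylab = "r", main = "det(S) >= 0")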

“But Oscar,” you may ask, “how do we choose the one pair of values that maximizes the determinant of \textbf{S} ?” Well, that’s a good question! It is not difficult to answer, but it does require a little more mathematics than the case where only one correlation is missing. What we need is to, first, take the partial derivatives with respect to both r and a and set them to 0 (we are looking for critical points):

\frac{\partial}{\partial a}det(\textbf{S})=2ar^2-2a+0.92-0.68r=0

\frac{\partial}{\partial r}det(\textbf{S})=2a^2r-2r+0.62-0.68a=0

So now we have a system of two quadratic equations in two unknowns, which we can solve for the critical points. Again, when you throw them in Maple you end up with multiple candidate pairs of a and r . There was only one pair, though, which was both within the valid range [-1, +1] AND inside the special blob:

a=0.407309 ; r=0.205628 
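(If you do not have a CAS at hand, a numerical cross-check is easy. This reuses det_poly from before and asks optim() to maximize over the box:)

## fnscale = -1 turns optim() into a maximizer; constrain (a, r) to the box.
fit <- optim(c(0, 0), function(p) det_poly(p[1], p[2]),
             method = "L-BFGS-B", lower = -1, upper = 1,
             control = list(fnscale = -1))
fit$par    ## should land near a = 0.407, r = 0.206
fit$value  ## the maximized determinant, roughly 0.244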

The last thing we need to check, however, is whether this point is a minimum, a maximum, or a saddle point, as per the 2nd derivative test.

Which means I need all the 2nd partial derivatives of det(\textbf{S}) :

\frac{\partial^{2}}{\partial a^{2}}det(\textbf{S})=2(r^{2}-1)

\frac{\partial^{2}}{\partial r^{2}}det(\textbf{S})=2(a^{2}-1)

\frac{\partial^{2}}{\partial a \partial r}det(\textbf{S})=4ar-0.68

Calculate the discriminant setting a=0.407309 and r=0.205628 :

2(a^{2}-1) 2(r^{2}-1)-(4ar-0.68)^{2} = 3.076312 

Which is greater than 0. So (a,r) is either a minimum or a maximum. The final check is to see if 2(a^{2}-1) < 0 to make sure it’s a local maximum. And since 2(0.407309^{2}-1)=-1.668199 < 0 , it follows that the point (a,r) indeed maximizes the determinant*.
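The same test in R, using the analytic second partials from above (values rounded):

## Hessian of det(S) at the critical point; both eigenvalues negative
## means the point is a local maximum.
a <- 0.407309; r <- 0.205628
H <- matrix(c(2*(r^2 - 1), 4*a*r - 0.68,
              4*a*r - 0.68, 2*(a^2 - 1)), 2, 2)
det(H)           ## the discriminant, about 3.0763 (> 0)
eigen(H)$values  ## both negative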

 

TWO potential future directions 

While working on this I noticed a couple of peculiarities that I think are sufficiently mathematically tractable for me to handle and turn into an actual article. UNLESS you (my dear reader) already know the answer, that is. I am, after all, a lazy !@#$#%, which means that if someone has already worked out a formal proof, I’d much rather read it than have to come up with it on my own. Let us start with the easy one:

(1) (Easy): The range of plausible correlations shrinks as the dimensions of the correlation matrix increase

This one is, I think, easy to see. Let’s start with the basic 2×2 correlation matrix:

\textbf{A}= \begin{bmatrix} 1 & r\\ r & 1\\ \end{bmatrix} 

Then if r \in (-1, +1) (read \in as “element of”), \textbf{A} is a valid correlation matrix, and it becomes merely positive-semi-definite (i.e., singular) if either r=1 or r=-1 . But you get the gist: essentially the whole valid correlation range is available.

Notice how the range of r shrank for the 3×3 matrix \textbf{R} of our example to:

r \in [-\frac{2(3\sqrt{6}-2)}{25}, \frac{2(2+3\sqrt{6})}{25}] 

And this fact is (almost) independent of which correlation matrix you have: unless the two known correlations are both exactly 0, the resulting range of r is strictly narrower than the full valid interval [-1, +1] .
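In fact, the 3 x 3 case can be worked out in full generality with the same determinant trick (this is a standard result, easy to re-derive with the quadratic formula). If the two known correlations are b and c , then

det(\textbf{R})= -r^{2} + 2bcr + 1 - b^{2} - c^{2} \geq 0 \iff bc - \sqrt{(1-b^{2})(1-c^{2})} \leq r \leq bc + \sqrt{(1-b^{2})(1-c^{2})}

so the interval has width 2\sqrt{(1-b^{2})(1-c^{2})} , which equals the full 2 only when b=c=0 . With b=0.8 and c=0.2 this reproduces both our bounds and the det-maximizing midpoint bc=0.16 .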

So, what happens if we go back to our 4 x 4 example with \textbf{S} ? Just for kicks and giggles, let’s set a=0.3 so that we are only left with 1 unknown. The new matrix looks like this:

\textbf{S}= \begin{bmatrix} 1 & 0.8 & 0.2 & 0.3\\ 0.8 & 1 & r & 0.5\\ 0.2 & r &1 & 0.3\\ 0.3 &0.5&0.3 & 1 \end{bmatrix} 

If we run this new \textbf{S} through pos_def_limits() (it can handle any number of dimensions) we get:

pos_def_limits(.8, .2, .3, NA, .5, .3)
>  min     max
> -0.277 0.734

Yup, the range has now shrunk. But we don’t quite yet know why. Let’s try it now through the determinantal equation:

det(\textbf{S})= -0.91r^{2} + 0.416r + 0.1856 \geq 0 

And solving for the roots of this equation we get:

\frac{0.416-\sqrt{0.84864}}{1.82} = -0.27759...

\frac{0.416+\sqrt{0.84864}}{1.82} = 0.73473...
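And, as before, polyroot() will happily do the arithmetic for you:

sort(Re(polyroot(c(0.1856, 0.416, -0.91))))
## should print approximately -0.2776 and 0.7347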

Yup, same answer. But notice something interesting: the magnitude of the quadratic coefficient has now shrunk from 1 to 0.91. And, if you remember back from high school, the coefficient of the quadratic term dictates how wide or narrow the parabola is. More precisely, the distance between the roots of ax^{2}+bx+c is \sqrt{b^{2}-4ac}/|a| , and here that distance went from about 1.176 in the 3 x 3 case to about 1.012 in the 4 x 4 case: the extra fixed correlations pulled the roots closer together. Which prompts me to make the following claim:

Claim: As the dimensions of the correlation matrix increase arbitrarily, the valid range that makes it positive definite shrinks until it collapses to a *single* point. In other words, for a large enough correlation matrix, only ONE value can make it positive definite.

This one shouldn’t be particularly difficult to prove. All I need to show is that the leading coefficient of the quadratic term shrinks as a function of the dimensions of the correlation matrix until it becomes 0, in which case you’d have a straight line (not a parabola), and that means you’d get only 1 root (and not 2 roots that encompass a range).

(2) (Hard): If I sample values of r uniformly from the valid correlation range, the distribution of the resulting determinants concentrates around the determinant produced by *the* value of r that maximizes it. 

This one is a little bit more difficult to explain, but let me show you an interesting thing I found. Let’s use the classic 3 x 3 case and only focus on \textbf{R} . We know from above that if r=0.16 then det(\textbf{R}) is maximized (at 0.3456). Now, let’s use R (the programming language, not the matrix) to uniformly sample random values of r , calculate the determinants and plot their density:

library(ggplot2)

n <- 10000
r <- runif(n, min = -.427, max = .747)  # sample r uniformly within the valid range
pp <- numeric(n)                        # pre-allocate the vector of determinants

for (i in 1:n) {
  R <- matrix(c(1, .8, .2, .8, 1, r[i], .2, r[i], 1), 3, 3)
  pp[i] <- det(R)
}

dat <- data.frame(pp)
dens <- density(pp)
mmod <- dens$x[dens$y == max(dens$y)]   # mode of the kernel density estimate

p <- ggplot(dat, aes(x = pp)) +
  geom_density(fill = "lightgreen", alpha = .4, size = 1)
p + geom_vline(aes(xintercept = mmod), color = "red", linetype = "dashed", size = 1) +
  theme_bw() + xlab("Determinant") + ylab("")

[Figure: density of the simulated determinants, with the mode marked by a dashed red line]

Compare the mode of the distribution to the theoretical, maximum possible determinant:


> R1 <- matrix(c(1, .8, .2, .8, 1, .16, .2, .16, 1), 3, 3)
> det(R1)
[1] 0.3456
> mmod  ## this is the mode of the distribution above
[1] 0.3344818

Close, within about 0.011 of each other. Which leads me to make the following claim:

Claim: For the “missing correlation” problem, the distribution of the determinants of the correlation matrix concentrates around the maximum determinant, i.e., the one attained at the value of r that maximizes it. 
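Here is a hand-wavy sketch (not a proof!) of why this should happen. Near the vertex the parabola is locally quadratic, so with r sampled uniformly,

det(\textbf{R}) \approx det_{max} - |a|(r - r^{*})^{2} \implies P\left(det(\textbf{R}) > det_{max} - \epsilon\right) \propto \sqrt{\epsilon}

which means the density of the determinant blows up like \epsilon^{-1/2} as you approach det_{max} : the mode has nowhere to be but at (or, after kernel smoothing, just below) the maximum.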

I honestly have no clue how either of these two results would be useful once formalized. I sense that something like this could play a role in error detection or in diagnosing Heywood cases in Factor Analysis. I mean, for the first case (error detection), say you find a correlation matrix in the published literature that is not positive definite. If correlation matrices tend to concentrate around values that maximize their determinants, then you could potentially use this framework to pick and choose ranges of possible sets of correlations to point out where the problem may lie. A similar logic could be used for Heywood cases, ESPECIALLY if the dimensions of the correlation matrix are large. Or maybe all of this is BS and a very elaborate excuse for me to procrastinate. The world will never know  ¯\_(ツ)_/¯

 

 

*Technically speaking, I would *still* need to check the boundary of the blob (where, conveniently, det(\textbf{S})=0 , so the interior critical point wins) and the other red regions (which lie outside the valid correlation range anyway). But we’ll leave this as an “exercise for the reader” 😉