MCMC estimation of the polychoric correlation coefficient

Technically this is not a refereed publication (yet), but I thought it would be nice to have a little space for the function at the centre of my master’s thesis.  It deals with the Bayesian estimation of the polychoric correlation coefficient both under bivariate normal and non-normal data.  It is based on this article written by James H Albert. I was a little bit surprised that such a useful article hadn’t really got a lot of citations, so I wanted to highlight his work by implementing it into R. In my code, I figured out how to deal with both right-skewed and left-skewed latent data (Albert’s article only handles right-skewed data) and it has proven to be very useful for handling difficult datasets, such as when one is dealing with low-count or zero-count cells in contingency tables.

The actual code for the function is here. The main function is called polycorGibbs() because it is based of a Gibbs sampler.

And the way it works is super easy:


library(MASS)

Q <- mvrnorm(100, c(0,0), matrix(c(1,.5,.5,1),2,2))

x <- Q[,1]
y <- Q[,2]

##Make sure you declare these as "integer", not "numeric" or "float"


xb <- as.integer(cut(y, breaks=c(-Inf, mean(y)-sd(y),mean(y)+sd(y), Inf), labels=c(1:3)))
yb <- as.integer(cut(y, breaks=c(-Inf, mean(y)-sd(y),mean(y)+sd(y), Inf), labels=c(1:3)))
plyk <- polycorGibbs(xb, yb, iter = 1e4, trace=FALSE, graph=FALSE)


Summary
mean median SD Numeric SD
rho 0.481 0.487 0.00109 0.0282
c1 -1.070 -1.070 0.00885 0.1530
c2 1.020 1.020 0.01190 0.1500
d1 -1.070 -1.060 0.01090 0.1550
d2 1.010 1.010 0.01090 0.1520


Correlation
rho c1 c2 d1 d2
rho NA 0.00163 -0.0384 -0.0153 -0.0323
c1 NA NA 0.2100 0.5080 0.2260
c2 NA NA NA 0.2330 0.5220
d1 NA NA NA NA 0.2220
d2 NA NA NA NA NA

I offer some extra functionalities like plotting the time series plot for the Markov Chain and other ways to check how well everything converged and whatnot.

Overall, the function is a tad bit finicky. But then again this was my first “major”  project and I pretty much had to teach myself Bayesian statistics and MCMC from scratch to get it to work, which is why I am very proud of it. But it will get better in time.

It would be really cool if you gave me a shout-out before using it 🙂 oscar.olvera100@yahoo.com