Technically this is not a refereed publication (yet), but I thought it would be nice to have a little space for the function at the centre of my master’s thesis. It deals with the Bayesian estimation of the polychoric correlation coefficient both under bivariate normal and non-normal data. It is based on this article written by James H Albert. I was a little bit surprised that such a useful article hadn’t really got a lot of citations, so I wanted to highlight his work by implementing it into R. In my code, I figured out how to deal with both right-skewed and left-skewed latent data (Albert’s article only handles right-skewed data) and it has proven to be very useful for handling difficult datasets, such as when one is dealing with low-count or zero-count cells in contingency tables.
The main function is called polycorGibbs()
because it is based of a Gibbs sampler.
And the way it works is super easy:
library(MASS)
Q <- mvrnorm(100, c(0,0), matrix(c(1,.5,.5,1),2,2))
x <- Q[,1]
y <- Q[,2]
##Make sure you declare these as "integer", not "numeric" or "float"
xb <- as.integer(cut(y, breaks=c(-Inf, mean(y)-sd(y),mean(y)+sd(y), Inf), labels=c(1:3)))
yb <- as.integer(cut(y, breaks=c(-Inf, mean(y)-sd(y),mean(y)+sd(y), Inf), labels=c(1:3)))
plyk <- polycorGibbs(xb, yb, iter = 1e4, trace=FALSE, graph=FALSE)
Summary
mean median SD Numeric SD
rho 0.481 0.487 0.00109 0.0282
c1 -1.070 -1.070 0.00885 0.1530
c2 1.020 1.020 0.01190 0.1500
d1 -1.070 -1.060 0.01090 0.1550
d2 1.010 1.010 0.01090 0.1520
Correlation
rho c1 c2 d1 d2
rho NA 0.00163 -0.0384 -0.0153 -0.0323
c1 NA NA 0.2100 0.5080 0.2260
c2 NA NA NA 0.2330 0.5220
d1 NA NA NA NA 0.2220
d2 NA NA NA NA NA
I offer some extra functionalities like plotting the time series plot for the Markov Chain and other ways to check how well everything converged and whatnot.
Overall, the function is a tad bit finicky. But then again this was my first “major” project and I pretty much had to teach myself Bayesian statistics and MCMC from scratch to get it to work, which is why I am very proud of it. But it will get better in time.