# MCMC estimation of the polychoric correlation coefficient

Technically this is not a refereed publication (yet), but I thought it would be nice to have a little space for the function at the centre of my master’s thesis.  It deals with the Bayesian estimation of the polychoric correlation coefficient under both bivariate normal and non-normal latent data, and it is based on an article by James H. Albert. I was a little surprised that such a useful article had not attracted many citations, so I wanted to highlight his work by implementing it in R. In my code, I worked out how to handle both right-skewed and left-skewed latent data (Albert’s article only covers the right-skewed case), which has proven very useful for difficult datasets, such as contingency tables with low-count or zero-count cells.

The main function is called `polycorGibbs()` because it is based on a Gibbs sampler.

And using it is easy:

```r
library(MASS)

## Simulate bivariate normal latent data with rho = 0.5
Q <- mvrnorm(100, c(0, 0), matrix(c(1, .5, .5, 1), 2, 2))
x <- Q[, 1]
y <- Q[, 2]

## Discretise into three ordinal categories.
## Make sure you declare these as "integer", not "numeric" or "float".
xb <- as.integer(cut(x, breaks = c(-Inf, mean(x) - sd(x), mean(x) + sd(x), Inf),
                     labels = 1:3))
yb <- as.integer(cut(y, breaks = c(-Inf, mean(y) - sd(y), mean(y) + sd(y), Inf),
                     labels = 1:3))

plyk <- polycorGibbs(xb, yb, iter = 1e4, trace = FALSE, graph = FALSE)
```
```
Summary
      mean  median      SD  Numeric SD
rho  0.481   0.487 0.00109      0.0282
c1  -1.070  -1.070 0.00885      0.1530
c2   1.020   1.020 0.01190      0.1500
d1  -1.070  -1.060 0.01090      0.1550
d2   1.010   1.010 0.01090      0.1520

Correlation
     rho      c1      c2      d1      d2
rho   NA 0.00163 -0.0384 -0.0153 -0.0323
c1    NA      NA  0.2100  0.5080  0.2260
c2    NA      NA      NA  0.2330  0.5220
d1    NA      NA      NA      NA  0.2220
d2    NA      NA      NA      NA      NA
```

I offer some extra functionality, such as plotting the time series (trace) plot of the Markov chain, along with other diagnostics for checking how well the chain has converged.
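If you want to go beyond the built-in diagnostics, the draws can also be inspected with the `coda` package. This is only a sketch: it assumes you can get at the raw chain of rho draws as a numeric vector (here called `rho_draws`, a hypothetical name, not an actual field of `polycorGibbs()`), so a simulated stand-in is used below.

```r
library(coda)

## Hypothetical: suppose the raw MCMC draws for rho are available as a
## numeric vector. A simulated stand-in is used here for illustration.
rho_draws <- rnorm(1e4, 0.48, 0.03)

chain <- mcmc(rho_draws)
traceplot(chain)          # time series plot of the chain
effectiveSize(chain)      # effective sample size of the chain
HPDinterval(chain, 0.95)  # 95% highest posterior density interval
```

The `mcmc()`, `traceplot()`, `effectiveSize()`, and `HPDinterval()` functions are standard `coda` tools, so they work on any chain of draws regardless of which sampler produced it.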

Overall, the function is a tad finicky. But then again, this was my first “major” project, and I pretty much had to teach myself Bayesian statistics and MCMC from scratch to get it to work, which is why I am very proud of it. It will get better with time.