Simulate a 2-level Multilevel/HLM/linear mixed model

I think multilevel models are still “a thing”, right? Like they used to be all hot and sexy a few years ago, but maybe by now they’ve just become a little more commonplace. The HLM software has made them very popular, although eventually I will rant about how much I dislike the whole “Level 1” and “Level 2” parlance that Raudenbush & Bryk helped disseminate. But for better or worse it’s here to stay and I need it if I want to communicate with my colleagues.

In any case, the model from which we will be simulating data in this post looks like this:

y = X\mathbf{\beta} + Z\mathbf{u} + \mathbf{\epsilon}

y is an n \times 1 continuous vector of data (the dependent variable).
X is an n \times p model matrix with a p \times 1 vector of coefficients \mathbf{\beta} (so this is the ‘fixed effects’ part of the model).
Z is another n \times q model matrix with a q \times 1 vector \mathbf{u} (the ‘random effects’).
\mathbf{\epsilon} is the n \times 1 vector of residuals.
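
To make Z concrete: with k groups of n observations each, Z is just a block of ones indicating group membership. For instance, for 2 “schools” with 3 “students” each (the toy numbers here are mine; the function below builds Z the same way):

# toy Z: 2 "schools" with 3 "students" each, a 6 x 2 indicator matrix
kronecker(diag(2), rep(1, 3))
#      [,1] [,2]
# [1,]    1    0
# [2,]    1    0
# [3,]    1    0
# [4,]    0    1
# [5,]    0    1
# [6,]    0    1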

I know people reading this are probably more familiar with the Level 1 and Level 2 way of looking at things, and it’s easy to make its relationship to the way I stated the model explicit. The specification above is the Laird-Ware (1982) one, which is, I think, a lot more common than the Raudenbush & Bryk one. Since I will be relying on matrices and matrix operations to simulate data, I thought it was a better idea to keep everything Laird-Ware.
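
Just to make that correspondence explicit for the simplest case (a random intercept and a single predictor; the subscripts here are mine), the Level 1 and Level 2 equations would be

y_{ij} = \beta_{0j} + \beta_{1}x_{ij} + \epsilon_{ij}

\beta_{0j} = \gamma_{00} + u_{0j}

and substituting the second into the first gives y_{ij} = \gamma_{00} + \beta_{1}x_{ij} + u_{0j} + \epsilon_{ij} , which is exactly y = X\mathbf{\beta} + Z\mathbf{u} + \mathbf{\epsilon} once you stack all the observations and let Z indicate which group j each row belongs to.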

In any case, by assuming that the random effects \mathbf{u} are distributed \mathbf{u} \sim N(0, \sigma^{2}D) (which is what we typically do when we fit a multilevel model), we have Var(y) = Var(Z\mathbf{u}) + Var(\mathbf{\epsilon}) . From regular OLS regression the assumption is that Var(\mathbf{\epsilon}) = \sigma^{2}I , to which we simply need to add Var(Z\mathbf{u}) = \sigma^{2}ZDZ^{t} . After doing some basic matrix algebra we end up with Var(y) = \sigma^{2}I + \sigma^{2}ZDZ^{t} = \sigma^{2}(I + ZDZ^{t}) , so that y \sim N(X\mathbf{\beta}, \sigma^{2}(I + ZDZ^{t}))
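
If you want to double-check that last step numerically, here is a quick sketch (the toy dimensions and object names are mine): simulate a bunch of y = Z\mathbf{u} + \mathbf{\epsilon} vectors for a small random-intercept layout and compare their empirical covariance with \sigma^{2}(I + ZDZ^{t}) .

library(MASS)  # for mvrnorm()

set.seed(123)
k <- 3; n <- 4; nk <- k * n
Z      <- kronecker(diag(k), rep(1, n))   # nk x k group-indicator matrix
sigma2 <- 1.5                             # residual variance
D      <- 2 * diag(k)                     # Var(u) = sigma2 * D, independent random intercepts

# simulate many y = Zu + e vectors (the fixed part doesn't affect the covariance)
ysim <- replicate(20000, {
  u <- mvrnorm(1, rep(0, k), sigma2 * D)
  e <- rnorm(nk, sd = sqrt(sigma2))
  as.vector(Z %*% u + e)
})

# empirical covariance vs. the implied sigma2 * (I + Z D Z^t), first few rows shown
round(cov(t(ysim))[1:5, 1:5], 2)
round((sigma2 * (diag(nk) + Z %*% D %*% t(Z)))[1:5, 1:5], 2)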

Because we’ve worked through the math a little bit here, now we know what we need to do in order to simulate data the way we want it: specify X\mathbf{\beta} for the fixed effects, the matrix D for the covariance of the random effects, and \sigma^{2} for the residual variance.

So here we go, ladies and gents! The magic R function to simulate data for a 2-level multilevel model:


library(MASS)   # for mvrnorm()
library(lme4)   # not needed for the simulation itself, but we will use it to fit the models

Simulate <- function(k, n, icc, beta, sigma=NA, sigmaXre=NA){

  nk <- n*k                             # total number of observations
  nx <- length(beta)-1                  # number of predictors (beta includes the intercept)
  Z  <- kronecker(diag(k), rep(1,n))    # nk x k matrix mapping each observation to its group

  # X matrix (predictors drawn from a multivariate normal)
  if (is.na(sigma[1])) sigma <- diag(nx)
  X <- mvrnorm(nk, rep(0,nx), sigma)

  # random effects of X: one slope multiplier per group and predictor, centred at 1
  if (!is.na(sigmaXre[1])){
    Xre <- mvrnorm(k, rep(1,nx), sigmaXre)
    Xre <- cbind(rep(1,nk), X * (Z %*% Xre))
  } else {
    Xre <- cbind(rep(1,nk), X)
  }
  X <- cbind(rep(1,nk), X)

  # create a factor to keep track of which "students" are in which "school"
  group <- as.factor(rep(1:k, each=n))

  # generate taus (random intercepts) and epsilons (residuals)
  tecov <- diag(c(icc, rep(1-icc,n)))
  te <- mvrnorm(k, rep(0,n+1), tecov)
  epsilons <- as.vector(t(te[,-1]))
  taus <- te[,1]

  # generate Y data: fixed part plus random part
  ran <- Z %*% taus + epsilons
  Y <- Xre %*% beta + ran

  data.frame(Y=as.vector(Y), X1=X[,2], X2=X[,3], X3=X[,4], group=group)
}
The fact of the matter is that the lme4 package is not needed for the simulation (you do need the MASS package, though), but we might as well load it because we’re gonna use it to fit the models below. Pay particular attention to how the covariance matrix of the random effects is used to create Xre, and to the section where you read # generate Y data. It becomes immediately apparent that the Laird-Ware notation is doing its magic here for us.

Now, how to use it? I’ll work through a few examples. Let’s pretend I want to explore a simple model where there are only random effects on the intercept. Maybe I would like to simulate data where I know the Intraclass Correlation Coefficient (ICC) in advance. We can do something like this:

# How many "schools"?
k <- 50
# How many "students" per "school"?
n <- 30
# Which ICC?
real.icc <- 0.6
# Which fixed effects?
beta <- c(1,2,3,4)
# Covariance matrices of the X variables (a positive-definite symmetric matrix)
sigma <- matrix(c(1,0,0,0,1,0,0,0,1), length(beta)-1)
# Covariance matrix of the random effects of X (all zeros means no random slopes)
sigmaXre <- matrix(c(rep(0,9)), length(beta)-1) #intercept-only model

So I’m simulating data with 50 Level 2 units (schools), each with 30 Level 1 units (students). My predictors are uncorrelated (so the sigma matrix is diagonal) and the random intercept (the ICC) is the only random-effects component, so I have to set sigmaXre to all zeros.

Then I only need to do this (I am obviously assuming you ran the previous snippet of code before calling the Simulate() function):

data1 <- Simulate(k, n, real.icc, beta, sigma, sigmaXre)

mod1 <- summary(lmer( Y ~ 1 + X1 + X2 + X3 + (1|group), data1))

Linear mixed model fit by REML ['lmerMod']
Formula: Y ~ 1 + X1 + X2 + X3 + (1 | group)
   Data: data1

REML criterion at convergence: 3009.6

Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.5056 -0.6846 -0.0119  0.6455  3.6272

Random effects:
 Groups   Name        Variance Std.Dev.
 group    (Intercept) 0.5825   0.7632
 Residual             0.3784   0.6152
Number of obs: 1500, groups: group, 50

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.85045    0.10910     7.8
X1           2.00578    0.01642   122.2
X2           3.01568    0.01721   175.2
X3           4.01071    0.01654   242.5

Correlation of Fixed Effects:
   (Intr) X1     X2
X1 -0.002
X2  0.004  0.012
X3  0.001  0.008  0.004

It seems like everything worked OK, right? I mean, my fixed effects were 1, 2, 3, 4 (where 1 is the intercept) and they were estimated at 0.85045, 2.00578, 3.01568 and 4.01071. So pretty darn close. My predictors are uncorrelated, so the correlation of fixed effects is pretty much 0 among all of them. But what about the ICC? Well, let’s extract the relevant parts of the output and calculate it. Let’s use the all-too-common formula:

ICC = \frac{\tau_{00}}{\tau_{00}+\sigma^{2}_{e}}

estimated.icc <- as.numeric(attr(mod1$varcor$group,"stddev"))^2 /
  (as.numeric(attr(mod1$varcor$group,"stddev"))^2 + as.numeric(attr(mod1$varcor,"sc"))^2)

Note that lme4 stores the standard deviations, so they need to be squared to get back the variances \tau_{00} and \sigma^{2}_{e} . Plugging in the variance components reported above, that is 0.5825 / (0.5825 + 0.3784) ≈ 0.61. The population ICC is supposed to be 0.6, so I would say the (large sample) estimate is reasonably close.
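
By the way, if you would rather not fish the pieces out with attr(), coercing the VarCorr object to a data frame gets you, as far as I can tell, the same numbers with less typing:

# the same ICC calculation via lme4's as.data.frame method for VarCorr objects
vc <- as.data.frame(mod1$varcor)           # columns: grp, var1, var2, vcov, sdcor
vc$vcov[vc$grp == "group"] / sum(vc$vcov)  # tau00 / (tau00 + sigma^2_e)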

But wait… maybe we would like all the predictors to have a random-effect component (so each one gets a random slope, not just the intercept). The only thing we need to do is alter the sigmaXre matrix to something like, for example:

sigmaXre <- diag(c(rep(1,length(beta)-1)))

so that a random effect is assigned to each predictor, while keeping the random effects uncorrelated (notice that it is a diagonal matrix). We then regenerate the data with this new sigmaXre and fit a model that also includes a random slope for every predictor, as sketched below.
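
Roughly like this (mod2 is just a name I picked; everything else reuses the objects defined earlier):

data1 <- Simulate(k, n, real.icc, beta, sigma, sigmaXre)
mod2 <- summary(lmer(Y ~ 1 + X1 + X2 + X3 + (1 + X1 + X2 + X3 | group), data1))

What we now get is: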

Linear mixed model fit by REML ['lmerMod']
Formula: Y ~ 1 + X1 + X2 + X3 + (1 + X1 + X2 + X3 | group)
   Data: data1

REML criterion at convergence: 3883.5

Scaled residuals:
    Min      1Q  Median      3Q     Max
-2.9220 -0.6330  0.0120  0.6335  3.9198

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 group    (Intercept) 0.6793   0.8242
          X1          1.5114   1.2294   -0.14
          X2          1.9644   1.4016   -0.13  0.01
          X3          1.9936   1.4120   -0.10  0.00 -0.05
 Residual             0.4255   0.6523
Number of obs: 1500, groups: group, 50

Fixed effects:
            Estimate Std. Error t value
(Intercept)   0.9758     0.1179   8.273
X1            1.9439     0.1749  11.116
X2            3.1149     0.1991  15.647
X3            4.2951     0.2005  21.425

Correlation of Fixed Effects:
   (Intr) X1     X2
X1 -0.142
X2 -0.125  0.009
X3 -0.096  0.002 -0.054

So… same fixed effects, but now a variance component for each slope as well as the intercept. In general, playing with sigma and sigmaXre lets you generate all sorts of relationships among the fixed and random effects.
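
For instance, something along these lines (the particular values and the name data3 are just illustrative) would give you correlated predictors plus correlated random slopes for X1 and X2:

# correlated predictors (0.3 correlation among the Xs)
sigma <- matrix(c(1, 0.3, 0.3,
                  0.3, 1, 0.3,
                  0.3, 0.3, 1), length(beta)-1)
# correlated random effects for X1 and X2, uncorrelated random effect for X3
sigmaXre <- matrix(c(1, 0.5, 0,
                     0.5, 1, 0,
                     0, 0, 1), length(beta)-1)
data3 <- Simulate(k, n, real.icc, beta, sigma, sigmaXre)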

I hope this function will be useful in your explorations of multilevel models!