Simulate a 2-level Multilevel/HLM/linear mixed model

I think multilevel models are still “a thing”, right? Like they used to be all hot and sexy a few years ago but maybe they’ve started to become a little more common. The HLM software has made them very popular, although eventually I will rant about how much I dislike the whole “Level 1” and “Level 2” parlance that Raudenbush & Bryk helped disseminate. But for better or worse it’s here to stay and I need it if I want to communicate with my colleagues.

In any case, the model from which we will be simulating data in this post looks like this:

y = X\mathbf{\beta} + Z\mathbf{u} + \mathbf{\epsilon}

y is an n×1 vector of continuous data (the dependent variable).
X is an n×p model matrix with a p×1 vector of coefficients \mathbf{\beta} (so this is the ‘fixed effects’ part of the model).
Z is another n×q model matrix with a q×1 vector \mathbf{u} (the ‘random effects’).
\mathbf{\epsilon} is the n×1 vector of residuals.
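
To make the dimensions concrete, here is a minimal sketch in Python/NumPy of what X and Z could look like for a model with a random intercept and a random slope. The group sizes, the single predictor x and all variable names are my own assumptions for illustration; nothing here is tied to a particular package's way of building these matrices.

```python
import numpy as np

# Hypothetical design: 3 groups (Level 2 units) with 4 observations each
n_groups, n_per_group = 3, 4
n = n_groups * n_per_group
rng = np.random.default_rng(1)

group = np.repeat(np.arange(n_groups), n_per_group)  # group label for every row
x = rng.normal(size=n)                               # one Level 1 predictor

# Fixed-effects model matrix X (n x p): intercept and x, so p = 2
X = np.column_stack([np.ones(n), x])

# Random-effects model matrix Z (n x q): every group gets its own
# intercept column and its own slope column, so q = 2 * n_groups
Z = np.zeros((n, 2 * n_groups))
for j in range(n_groups):
    rows = group == j
    Z[rows, 2 * j] = 1.0          # random intercept for group j
    Z[rows, 2 * j + 1] = x[rows]  # random slope for group j

print(X.shape, Z.shape)
```

With this layout \mathbf{u} stacks the (intercept, slope) pair of each group, which is why Z is block-structured: each row only "sees" the two random effects of its own group.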

I know people reading this are probably more familiar with the Level 1 and Level 2 way of looking at things, and I’m pretty sure it’s easy to make explicit its relationship with the way I stated the model, which is the Laird-Ware (1982) specification (this is, I think, a lot more common than the Raudenbush & Bryk one). Since I will be relying on matrices and matrix operations to simulate data, I thought it was a better idea to keep everything Laird-Ware.

In any case, by assuming that the random effects \mathbf{u} are distributed \mathbf{u} \sim N(0, \sigma^{2}D) (which is what we typically do when we fit a multilevel model) and that they are independent of the residuals, then Var(y) = Var(Z\mathbf{u}) + Var(\mathbf{\epsilon}) . From regular OLS regression the assumption is that Var(\mathbf{\epsilon}) = \sigma^{2}I , to which we simply need to add Var(Z\mathbf{u}) = Z Var(\mathbf{u}) Z^{t} = \sigma^{2}ZDZ^{t} . After doing some basic matrix algebra we now have Var(y) = \sigma^{2}I + \sigma^{2}ZDZ^{t} , so that y \sim N(X\mathbf{\beta}, \sigma^{2}(I + ZDZ^{t})) .
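
If you want to see what that marginal covariance looks like, the following is a small NumPy sketch for the simplest case, a random-intercept-only model. The numbers (2 groups of 3, a residual variance of 4 and a value of 0.5 in D) are made up for illustration, and the variable names are mine.

```python
import numpy as np

# Hypothetical values, chosen only so the result is easy to read
n_groups, n_per_group = 2, 3
n = n_groups * n_per_group
sigma2 = 4.0   # residual variance
d = 0.5        # diagonal entry of D, so the intercept variance is sigma2 * d

# Random-intercept Z: one indicator column per group (n x q, q = n_groups)
Z = np.kron(np.eye(n_groups), np.ones((n_per_group, 1)))
D = d * np.eye(n_groups)

# Marginal covariance of y: sigma^2 (I + Z D Z^t)
V = sigma2 * (np.eye(n) + Z @ D @ Z.T)

print(V)
```

The diagonal of V is the total variance \sigma^{2}(1 + d), observations in the same group share covariance \sigma^{2}d, and observations in different groups are uncorrelated. So in this setup the intraclass correlation works out to d / (1 + d).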

Because we’ve worked through the math a little bit here, now we know what we need to do in order to simulate data the way we want it: specify X\mathbf{\beta} for the fixed effects, the matrix D (together with Z) for the covariance of the random effects, and \sigma^{2} for the residual variance.
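
As a rough illustration of those three ingredients, here is a minimal NumPy sketch. It is not the function from the article mentioned below; the function name `simulate_two_level`, its arguments and the balanced design with a single standard-normal predictor are all assumptions I made to keep the example short. It assumes every group has a random intercept and a random slope with the same 2×2 covariance block `D_group`, so the full D is block-diagonal.

```python
import numpy as np

def simulate_two_level(n_groups, n_per_group, beta, D_group, sigma2, seed=None):
    """Draw y = X beta + Z u + eps, with u ~ N(0, sigma2 * D) and eps ~ N(0, sigma2 * I).

    Hypothetical helper: balanced groups, one predictor, random intercept and slope.
    """
    rng = np.random.default_rng(seed)
    n = n_groups * n_per_group
    group = np.repeat(np.arange(n_groups), n_per_group)

    # Fixed-effects model matrix: intercept plus one standard-normal predictor
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])

    # One (intercept, slope) pair of random effects per group
    U = rng.multivariate_normal(np.zeros(2), sigma2 * np.asarray(D_group),
                                size=n_groups)

    # Z u without building Z: each row of X meets its own group's effects
    Zu = np.sum(X * U[group], axis=1)

    eps = rng.normal(scale=np.sqrt(sigma2), size=n)
    y = X @ np.asarray(beta) + Zu + eps
    return y, X, group

# Example call with made-up parameter values
y, X, group = simulate_two_level(
    n_groups=200, n_per_group=50,
    beta=[1.0, 0.5],
    D_group=[[0.5, 0.0], [0.0, 0.1]],
    sigma2=1.0, seed=42,
)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # quick sanity check on the fixed effects
```

The OLS fit at the end is only a sanity check that the fixed effects land near the values you asked for. To recover D and \sigma^{2} you would fit an actual mixed model to the simulated data.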


Following the advice of quite a few people, I would like to direct you to the improved version of this function that is now in MY PUBLISHED ARTICLE. This way you get a nice function that has been vetted by peer review AND you can cite something if you choose to use it.

The code is all the way at the end of the article. Enjoy!