My code is also now part of the SimDesign R package in case you’d like to use it from there as opposed to downloading the stand alone function.
This is a very brief tutorial on how to use the headrick02 function I programmed to test various data-generation algorithms.
First, it would be a good idea for you to download this. They include the 5th-order polynomial coefficients to use the Headrick (2002) method for the sample cases I studied in my simulations. The function will refer to them as starting points to find solutions and, if you use the same combinations of marginal skewness/kurtosis, it will just refer back to them and you’ll have the solution instantly. In general, if the combination of skewness/kurtosis uses high values it takes longer to find a solution.
Just copy-paste them in a .txt file and save it in your working directory (the headrick02 function will refer to them).
In any case, let’s assume that you already did this and want to simulate data from a 1-factor model, 4 indicators and loadings of 0.7. Using lavaan we can specify things like this:
mod1 <- ' ksi1 =~ 0.7*x1 + 0.7*x2 + 0.7*x3 + 0.7*x4
x1 ~~ 0.51*x1
x2 ~~ 0.51*x2
x3 ~~ 0.51*x3
x4 ~~ 0.51*x4'
datum1 <- simulateData(mod1,1,return.fit=TRUE)
Sigm1 <- solve(solve(fitted(attr(datum1, 'fit'))$cov))
The reason of why I did this is because I need the population covariance matrix (
Sigm1) implied by the factor structure in
mod1. If you know how to specify factor models directly into covariances without needing the structural equations, you can skip this step.
Anyway, now that I have the covariance matrix, all I need to specify are the other moments of the distribution (mean, marginal skewness and marginal kurtosis) that should hold on the population so I can provide them to the function. REMEMBER! Not all combinations of skewness/kurtosis are possible. Just have a look at my article to make sure you’re choosing them within the theoretically-derived boundaries. I don’t want to have you waiting 3 days for a solution to the 5th order polynomials that doesn’t exist and, if it is found, it will probably be wrong anyway.
Now all we need to do is (after you declared the headrick02 function):
setwd("SOME WORKING DIRECTORY") #set the working directory to the one where headrick02.R is located
N <- 200
mean <- c(rep(0,4))
sd <- c(rep(1,4))
skewness <- c(rep(3,4))
kurtosis <- c(rep(21,4))
headrick02(N, mean, sd, Sigm1, skewness, kurtosis)
As you can see, the headrick02 function takes its input in the order of: sample size (200 in this case), vector of population means (0, in this case), vector of standard deviations (1, in this case), covariance matrix (Sigm1), vector of marginal skewnesses (population skewness of 3), vector of marginal kurtosis (population kurtosis of 21). These values correspond to those used by Curran, West & Finch (1996)
The headrick02 function will output:
– A list with the desired population-level moments
– A list with the sample moments, as generated from the data (just running descriptive stats on the sample).
– The population correlation matrix
– The sample correlation matrix
– The intermediate correlation matrix
– The 5th order polynomial coefficients per marginal distribution
– The list with 200 observations generated from the Headrick (2002) method.
I kept things easy by labeling everything and allowing S3 methods (so the ‘$’ operator) to extract them. If all you really care for is the actual data you only have to do something like:
datum <- as.data.frame(headrick02(N, mean, sd, Sigm1, skewness, kurtosis)$obs)
colnames(datum) <- c("y1","y2","y3","y4","y5","y6","y7","y8","y9")
And the object
datum now holds the dataset to be analyzed. Because the headrick02 function outputs things as a matrices and lists and lavaan only takes in dataframes, I’m also changing the nature of the object so that everything in ‘datum’ is ready to be used in lavaan or exported.