
Normality: residuals or dependent variable?

So… something interesting happened the other day. As part of an unrelated set of circumstances, my super awesome BFF Ed and I were discussing one of those interesting, perennial misconceptions within methodology in the social sciences. OK, maybe in other areas as well, but I can only speak about what I know best. The interesting aspect of this conversation is that it reflects the differences in training that he and I have, so that, although we tend to see things from the same perspective, our solutions are sometimes different. You see, Ed is a full-blown mathematician who specializes in harmonic analysis but with a keen interest in urban ornithology as his more “applied” research side. Oh, and some psychometrics as well. I’m a psychometrician who’s mostly interested in technical problems but also flirts with the analysis of developmental data. This is going to play an important role in how we approached the answer to the following question:

In a traditional ANOVA setting (fixed effects, fully-balanced groups, etc.)… Does one test the normality assumption on the residuals or the dependent variable?

Ed’s answer (as well as my friends’): On the residuals. ALWAYS.

My answer: Although the distributional assumptions for these models are on the residuals, for most designs found in education or the social sciences it doesn’t really matter whether you use the residuals or the dependent variable.

Who is right, and who is wrong? The good thing about Mathematics (and Statistics as a branch of Mathematics) is that there’s only one answer. So either he is right or I am. Here are the two takes on the answer, with a rationale for each.

Ed is right.

This is a simplified version of his answer that was also suggested on talkstats. Consider the following independent-groups t-test as shown in this snippet of R code. I’m assuming if you’re reading this you know that a t-test can be run as a linear regression.

dv1 <- rnorm(1000, 10, 1)
dv2 <- rnorm(1000, 0, 1)
dv <- c(dv1, dv2)
g <- as.factor(rep(c(1, 0), each = 1000))

res <- lm(dv ~ g)
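
As a quick sanity check on that claim (a sketch, re-creating the data above with fresh draws), the regression slope is the mean difference between groups, and its t statistic matches the equal-variance two-sample t-test up to sign:

```r
# t-test as a linear regression: simulate the same two groups
dv <- c(rnorm(1000, 10, 1), rnorm(1000, 0, 1))
g  <- as.factor(rep(c(1, 0), each = 1000))

fit <- lm(dv ~ g)
tt  <- t.test(dv ~ g, var.equal = TRUE)

coef(fit)["g1"]                             # mean difference, close to 10
summary(fit)$coefficients["g1", "t value"]  # same |t| as tt$statistic
```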

If you plot the dependent variable, it looks like this:

[Figure: histogram of the dependent variable, clearly bimodal]
And if you plot the residuals, they look like this:

[Figure: histogram of the residuals, bell-shaped]
Clearly, the dependent variable is not normally distributed. It is bimodal, better described as a 50/50 Gaussian mixture if you wish. However, the residuals are very much bell-shaped and… well, for lack of a better word, normally distributed. If we want to look at it more formally, we can conduct a Shapiro-Wilk test and see that it is not statistically significant.


Shapiro-Wilk normality test
data: res$residual
W = 0.99901, p-value = 0.3432
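
To make the contrast explicit, here is a sketch of what testing each one would show in this d = 10 scenario (seed chosen arbitrarily; your exact numbers will differ):

```r
set.seed(123)  # arbitrary seed, numbers will vary
dv  <- c(rnorm(1000, 10, 1), rnorm(1000, 0, 1))
g   <- as.factor(rep(c(1, 0), each = 1000))
res <- lm(dv ~ g)

shapiro.test(dv)             # bimodal DV: p-value essentially zero
shapiro.test(res$residuals)  # residuals: typically non-significant
```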

So… yeah. Testing the dependent variable would’ve led someone to (erroneously) conclude that the assumption of normality was being violated and maybe this person would’ve ended up going down the rabbit hole of non-parametric regression methods… which are not bad per se, but I realize that for people with little training in statistics, these methods can be quite problematic to interpret. So Ed is right and I am wrong.

I am right.

When this example was put forward, I pointed out to Ed (and the other people involved in the discussion) that they should look at the assumption being made about the population effect size. That’s a Cohen’s d of 10! Let’s see what happens when you run what’s considered a “large” effect size within the social sciences. Actually, let’s be very, very, VERY generous and jump straight from a Cohen’s d of 0.8 (a large effect size) to a Cohen’s d of 1 (a super large effect size?).
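
The simulation is the same as before; only the mean of the first group changes (a sketch):

```r
dv1 <- rnorm(1000, 1, 1)   # mean difference of 1 SD, i.e. Cohen's d = 1
dv2 <- rnorm(1000, 0, 1)
dv  <- c(dv1, dv2)
g   <- as.factor(rep(c(1, 0), each = 1000))

res <- lm(dv ~ g)
```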

The plot of the dependent variable now looks like this:

[Figure: histogram of the dependent variable, now unimodal and bell-shaped]
And the residual plot looks like this:

[Figure: histogram of the residuals, bell-shaped]
Uhm… both the dependent variable and the residuals look very normal to me. What if we test them using the Shapiro-Wilk test?

Shapiro-Wilk normality test

data: res$residual
W = 0.99926, p-value = 0.6328

Shapiro-Wilk normality test

data: dv
W = 0.99944, p-value = 0.8515

Yup, both are pretty normal-looking. So, in this case, whether you test the dependent variable or the residuals you end up with the same answer.

Just for kicks and giggles, I noticed that you need a Cohen’s d of 2 before the Shapiro-Wilk test yields a significant result, but you can see that the W statistics are quite similar between the previous case and this one. And we’re talking about sample sizes of 2000. Heck, even the plot of the dependent variable is looking pretty bell-shaped.

Shapiro-Wilk normality test

data: dv
W = 0.99644, p-value = 8.008e-07
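
For completeness, the d = 2 version of the simulation would look something like this (a sketch; the exact W and p-value depend on the random draw):

```r
set.seed(456)  # arbitrary seed; exact figures differ from the post's
dv <- c(rnorm(1000, 2, 1), rnorm(1000, 0, 1))

shapiro.test(dv)  # with d = 2, the DV finally fails the normality test
```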


This is why my response included the addendum that for most designs found in education or the social sciences it doesn’t really matter whether you use the residuals or the dependent variable. A Cohen’s d of 2 is two-and-a-half times what’s considered a large effect size in my field. If I were to see such a large effect size, I’d sooner suspect something funky was going on with the data than believe that such a large difference can actually be found. Ed comes from a natural-science background, and I know that, in the natural sciences, large effect sizes are pretty commonplace (in my opinion it comes down to the problem of measurement that we face in the social sciences).

As you can see now, the degree of agreement between the normality tests of the dependent variable and the residuals is a function of the effect size. The larger the effect size, the larger the difference between the shapes of the distributions of the residuals vs. the dependent variable (within this context, of course; this is not true in general).
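
You can check this directly by sweeping over effect sizes and comparing the Shapiro-Wilk W for the DV against the W for the residuals (a sketch; exact values vary with the seed):

```r
set.seed(789)
ds <- c(0.2, 0.5, 0.8, 1, 2, 5, 10)
sapply(ds, function(d) {
  dv  <- c(rnorm(1000, d, 1), rnorm(1000, 0, 1))
  res <- lm(dv ~ rep(1:0, each = 1000))
  c(d     = d,
    W_dv  = unname(shapiro.test(dv)$statistic),
    W_res = unname(shapiro.test(res$residuals)$statistic))
})
# W_dv drifts away from 1 as d grows; W_res stays essentially at 1
```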

Strictly speaking, Ed and the talkstats team are right in the sense that you can never go wrong with testing the residuals. Which is what I pointed out in the beginning as well. My applied research experience, however, has made me more practical and I realize that, in most cases, it doesn’t really matter. And at a certain sample size, the normality assumption is just so irrelevant that even testing it may be unnecessary. But anyway, some food for thought right there 😉

Lawley’s test for equality of correlations

So… while I kept on cleaning computer files I found something pretty interesting. A “secret” (as in “I-didn’t-remember-where-it-was” kind of secret) folder where I kept all my attempts at a Master’s thesis. Apparently, I went through 11 different projects, which I left at various unfinished stages. But there was one that just drew me back in because it was… well… kind of pointless?

For some reason that I can’t quite remember, I thought it would be a good idea to investigate the properties of Lawley’s test for the equality of correlations in a correlation matrix. The null hypothesis of this test looks like this:

H0: ρij = ρ for all i ≠ j

That is, all the off-diagonal elements of the population correlation matrix are equal to some common value ρ.
I have absolutely no idea why I thought this was a good idea. I’m thinking it may have been back in the days when I was not as proficient in Structural Equation Modelling (SEM), so I’d fall back on my mathematical training to accomplish things that can be trivially set up as SEMs. Going by my poorly-documented code and meager notes, it seems I thought I could tweak the test to evaluate whether the assumption of Parallel Tests or Tau-Equivalent Tests held. You know, to make sure people would know whether their Cronbach’s alpha was a true estimate of reliability or just a lower bound. Little did I know at that point that people just use alpha whether the assumptions behind it hold or not. The truth is nobody really cares.
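
For what it’s worth, the SEM route really is trivial these days. A sketch in lavaan (model syntax, seed, and data are illustrative): fit the unconstrained (congeneric) one-factor model and the tau-equivalent one (equal loadings), then compare them with a likelihood-ratio test:

```r
library(lavaan)

# simulate tau-equivalent data (same setup as the examples below)
set.seed(1)
pop <- 'f1 =~ 0.3*x1 + 0.3*x2 + 0.3*x3 + 0.3*x4 + 0.3*x5'
dat <- simulateData(model = pop, sample.nobs = 500)

congeneric <- 'f1 =~ x1 + x2 + x3 + x4 + x5'
tau_equiv  <- 'f1 =~ l*x1 + l*x2 + l*x3 + l*x4 + l*x5'  # loadings constrained equal

fit_c <- cfa(congeneric, data = dat)
fit_t <- cfa(tau_equiv,  data = dat)

anova(fit_t, fit_c)  # non-significant chi-square difference supports tau-equivalence
```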

The one thing I remember is that I couldn’t find any R package that would run the test. So I ended up coding it myself. I’m not sure if this would be of any use to anyone, but I’m hoping this may save some time to someone out there who may need it for whatever reason.

So the R function you’d have to declare is:

lawley <- function(data) {

R <- cor(data)
p <- ncol(R)
n <- nrow(data)

# vector of average values of the non-diagonal elements of each row
mcp <- numeric(p)
for (i in 1:p) {
mcp[i] <- (sum(R[, i]) - 1) / (p - 1)
}

# overall average of the non-diagonal elements
lR <- R
lR[upper.tri(R, diag = FALSE)] <- 0
mc <- (2 / (p * (p - 1))) * sum(R - lR)

# sum of squared deviations of the correlations from the overall average
z3 <- upper.tri(R, diag = TRUE)
TR <- R
TR[z3] <- 0
A <- (TR - mc)^2
A[z3] <- 0
A <- sum(A)

# sum of squared deviations of the row averages from the overall average
B <- sum((mcp - mc)^2)

C <- ((p - 1)^2 * (1 - (1 - mc)^2)) / (p - (p - 2) * (1 - mc)^2)
X2 <- ((n - 1) / (1 - mc)^2) * (A - C * B)
v <- ((p + 1) * (p - 2)) / 2
P <- pchisq(X2, v, lower.tail = FALSE)

result <- rbind(c(X2, v, P))
dimnames(result) <- list(c("X2"), c("statistic", "df", "p.value"))

result
}


And the way you use it is very simple. Let’s try it with a mock dataset we’ll generate with the R package lavaan. Here we’re generating data from a one-factor model with equal loadings of 0.3 and equal error variances:


library(lavaan)

pop = 'f1 =~ 0.3*x1 + 0.3*x2 + 0.3*x3 + 0.3*x4 + 0.3*x5'
dat = simulateData(model=pop, sample.nobs=500)

lawley(dat)

statistic df p.value
X2 7.840194 9 0.5503277

So… yeah. The non-significant p-value (as per the glorious alpha of .05) means we can’t reject the null hypothesis: the data are consistent with a population correlation matrix that has equal elements all around.

Let’s change one loading and see what happens:


pop = 'f1 =~ 0.4*x1 + 0.3*x2 + 0.3*x3 + 0.3*x4 + 0.3*x5'
dat = simulateData(model=pop, sample.nobs=500)

lawley(dat)

statistic df p.value
X2 18.78748 9 0.02706175

So the null hypothesis is now rejected: because one of the loadings is different, there is one element of the correlation matrix that is not equal to the others.

I literally have no clue why Lawley thought having such a test was a good idea. But then again I was investigating it a few years ago so maybe *I* thought the test was a good idea in the first place.

Anyhoo, I hope this helps someone if they need it.