The skewness-kurtosis parabola

While I was helping a student design a simulation study, I noticed something slightly odd. Most of the time, if you’re running a simulation of the type “the robustness of … (insert your favourite statistical method here)”, you’ll probably want to say something about how the method performs under non-normality. And an easy way to characterize non-normality is through the 3rd standardized moment (skewness) and the 4th standardized moment (kurtosis). That’s all fine and dandy. What I quickly realized is that some people may not be aware that these two moments are not independent: you can’t just willy-nilly choose their values as you please for your simulation conditions. What I’m going to show here is not new. It’s been around for almost a century, but perhaps it’s usually presented in such an impenetrable way that most people either can’t follow it or have simply forgotten about it. Maybe this post can help clarify things a little.

Define skewness as:

\gamma_{1}=E\left[ \left( \frac{X-\mu}{\sigma} \right)^{3} \right]

and kurtosis as:

\gamma_{2}=E\left[ \left( \frac{X-\mu}{\sigma} \right)^{4} \right]

where X is a random variable with population mean \mu and population standard deviation \sigma , so that \gamma_{1} and \gamma_{2} are the standardized 3rd and 4th moments. I’d like to note that these are the Pearson definitions of skewness and kurtosis, because I know there are others out there.
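To make the definitions concrete, here’s a quick sketch of my own (not part of the original argument) that estimates both moments for a simulated sample with scipy. The exponential distribution is a handy check because its population values are known ( \gamma_{1}=2 , \gamma_{2}=9 ):

```python
# Estimate Pearson skewness and kurtosis for a clearly non-normal sample.
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(42)
x = rng.exponential(scale=1.0, size=1_000_000)

g1 = skew(x)                    # standardized 3rd moment
g2 = kurtosis(x, fisher=False)  # standardized 4th moment (Pearson, not excess)

print(f"skewness ~ {g1:.2f} (population value: 2)")
print(f"kurtosis ~ {g2:.2f} (population value: 9)")
```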

Now, assume that X is standardized so that E[X]=0 and \sigma_{X}^{2}=E[X^{2}]=1 . From the definitions above, it’s easy to see that E[X^{3}]=\gamma_{1} and E[X^{4}]=\gamma_{2} . The key step here is using the statistics version of a very well-known result from calculus, the Cauchy–Schwarz inequality, which, for random variables A and B , can be expressed as:

(E[AB])^{2} \leq E[A^{2}]E[B^{2}]
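If you’d like to convince yourself of the inequality numerically before using it, here’s a small sketch (again my own, assuming numpy) that happens to use the very substitution that comes next:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=1_000_000)
x = (x - x.mean()) / x.std()  # standardize: E[X] ~ 0, E[X^2] ~ 1

a = x
b = x**2 - 1  # the substitution used in the derivation below

lhs = np.mean(a * b) ** 2            # E[AB]^2
rhs = np.mean(a**2) * np.mean(b**2)  # E[A^2] E[B^2]
print(lhs, "<=", rhs, lhs <= rhs)    # roughly 4 <= 8 for the exponential
```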

Substitute A = X and B = X^{2}-1 , with X standardized as above, then expand and simplify:

(E[X(X^{2}-1)])^{2} \leq E[X^{2}]E[(X^{2}-1)^{2}]

(E[X^{3}-X])^{2} \leq E[X^{2}]E[X^{4}-2X^{2}+1]

(E[X^{3}]-0)^{2} \leq (1)\left(E[X^{4}]-2E[X^{2}]+1\right)

(E[X^{3}])^{2} \leq E[X^{4}]-2+1

(E[X^{3}])^{2} \leq E[X^{4}]-1

(E[X^{3}])^{2}+1 \leq E[X^{4}]
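Here’s an empirical check of that last line (my own sketch, not from the derivation itself): standardize a few samples and compare the two sides directly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
samples = {
    "uniform":     rng.uniform(size=n),
    "exponential": rng.exponential(size=n),
    "lognormal":   rng.lognormal(sigma=0.5, size=n),
}

for name, x in samples.items():
    z = (x - x.mean()) / x.std()   # enforce E[X]=0, E[X^2]=1
    lhs = np.mean(z**3) ** 2 + 1   # (E[X^3])^2 + 1
    rhs = np.mean(z**4)            # E[X^4]
    print(f"{name:12s} {lhs:6.3f} <= {rhs:6.3f}: {lhs <= rhs}")
```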

This final inequality, (E[X^{3}])^{2}+1 \leq E[X^{4}] , is exactly the one found in the Wikipedia article on kurtosis. In terms of the moments defined earlier, it says \gamma_{2} \geq \gamma_{1}^{2}+1 . For the exponential distribution, for instance, \gamma_{1}=2 and \gamma_{2}=9 , and indeed 2^{2}+1=5 \leq 9 .

If we wanted to express it in terms of excess kurtosis, as it is sometimes presented to make the kurtosis of the normal distribution 0, then we just need to subtract 3 from both sides of the inequality to get:

\gamma_{2} \geq \gamma_{1}^{2}-2
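In practice, this means a design grid of (skewness, excess kurtosis) conditions should be screened against the parabola before any data get simulated. A tiny helper along these lines (hypothetical, purely to illustrate the check) might look like:

```python
def is_feasible(skewness: float, excess_kurtosis: float) -> bool:
    """True if the pair lies on or above the parabola gamma2 >= gamma1^2 - 2."""
    return excess_kurtosis >= skewness**2 - 2

# A made-up grid: the middle condition is impossible for ANY distribution.
conditions = [(0.0, 0.0), (2.0, 1.0), (2.0, 6.0)]
for g1, g2 in conditions:
    status = "ok" if is_feasible(g1, g2) else "infeasible"
    print(f"skew={g1}, excess kurtosis={g2}: {status}")
```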

As I said previously, this is not a new result, but I am starting to realize many people are not aware of it. If a simulation study chooses levels of skewness and (excess) kurtosis outside of this parabola, the conditions describe distributions that cannot exist, and the results do not make sense. So whether you use Monte Carlo simulation studies to inform your data analysis practice or conduct them yourself, it would be a good idea to keep this quadratic relationship in mind to make sure the stuff you’re reading is fully legit 😉