My love-hate relationship with G*Power

I can’t help but have a love-hate relationship with G*Power and power analysis as carried out in the social sciences. I love it because it provides applied researchers who may not have a strong statistical background with a (somewhat) sensible way to plan their sample sizes. I hate it because it reminds me that we still have a long, looong, LOOOOOONG way to go before we can even attempt to claim we are all following “best practices” in data analysis. And the fact of the matter is that we may never get there.

Let me show you why.

Say we have the very simple scenario of calculating power for the easy-cheesy t-test of the Pearson correlation coefficient. We are going to be extra indulgent with ourselves and claim the population effect size is \rho=0.5 (so a LARGE effect size à la Cohen). If you plug the usual specifications into G*Power (Type I error rate of .05, desired power of 0.8, population effect size of \rho=0.5 against the null of \rho=0.0), this is what you get:


So your sample size should be 26. Just for kicks and giggles, I simulated the power curve for this exact scenario and marked with a line where 80% power would be located.
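For anyone who wants to play along at home, here is a minimal sketch of that kind of simulation. It is not the R code behind the plot: it is a Python stand-in that uses only the standard library, and the function names (`t_critical`, `simulated_power` and friends), the seed and the number of replications are placeholders chosen for illustration. It assumes a two-sided test, draws bivariate normal pairs with \rho=0.5, runs the usual t-test on the sample correlation and reports the share of rejections at each sample size.

```python
import math
import random


def t_pdf(t, df):
    # Density of Student's t distribution with df degrees of freedom.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p(t, df, steps=400):
    # Two-sided p-value: 1 - 2 * integral of the density over [0, |t|],
    # with the integral approximated by Simpson's rule.
    t = abs(t)
    if t == 0.0:
        return 1.0
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def t_critical(df, alpha=0.05):
    # Two-sided critical value, found by bisection on the p-value.
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def pearson_r(x, y):
    # Sample Pearson correlation coefficient.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulated_power(n, rho, reps=2000, alpha=0.05, seed=1):
    # Share of bivariate normal samples of size n in which the t-test
    # of H0: rho = 0 rejects at level alpha.
    rng = random.Random(seed)
    df = n - 2
    tc = t_critical(df, alpha)
    # t = r * sqrt(df) / sqrt(1 - r^2), so |t| > tc is the same as |r| > r_crit.
    r_crit = tc / math.sqrt(df + tc * tc)
    s = math.sqrt(1.0 - rho ** 2)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rho * xi + s * rng.gauss(0, 1) for xi in x]
        if abs(pearson_r(x, y)) > r_crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for n in range(10, 61, 5):
        print(n, round(simulated_power(n, 0.5), 3))
```

The sample size where the printed power first crosses 0.8 is the simulated counterpart of the number G*Power reports. With only a couple of thousand replications per sample size, the curve will wobble a little from seed to seed.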


Same answer as with G*Power, somewhere a little over n=25. Pretty straightforward, right? Well… sure… if you’re comfortable with the assumption that your data is bivariate normal. Both in the R simulation I made and in G*Power, the software assumes that your data looks like this:


For even more kicking and giggling, let’s assume your data is NOT normal (which, as we know, is the more common case). In this particular instance, both variables are \chi^{2}-distributed with 1 degree of freedom (quite skewed). Each variable looks like this:


And their joint density (i.e., what you would see in a scatterplot) looks like this:


But here’s the catch… because of how I simulated them (through a Gaussian copula, if you’re wondering), the two variables have the *same* population effect size of \rho=0.5 as in the bivariate normal case. What does the power curve look like now? It looks like this:


So, for the same large population effect size, you need a little over TWICE the sample size to reach the same 80% power.
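The non-normal scenario can be sketched in the same spirit. Again, this is a standard-library Python stand-in rather than the original R code, and every name in it is hypothetical. One wrinkle deserves a loud label: in a Gaussian copula, the correlation of the underlying normal variables is not the Pearson correlation of the \chi^{2} variables that come out the other end. The post does not say how that was handled, so the sketch assumes a simple calibration step that searches, by bisection on one large simulated sample, for the latent correlation that yields a Pearson correlation of roughly 0.5. The calibrated value is itself a noisy estimate.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def t_pdf(t, df):
    # Density of Student's t distribution with df degrees of freedom.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def two_sided_p(t, df, steps=400):
    # 1 - 2 * integral of the density over [0, |t|], via Simpson's rule.
    t = abs(t)
    if t == 0.0:
        return 1.0
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def t_critical(df, alpha=0.05):
    # Two-sided critical value, found by bisection on the p-value.
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def chi2_1_from_normal(z):
    # Quantile transform of a standard normal draw into a chi-square(1) draw.
    # Uses the upper tail (1 - Phi(z)) / 2 = 0.25 * erfc(z / sqrt(2)) for accuracy.
    p = 0.25 * math.erfc(z / math.sqrt(2.0))
    return STD_NORMAL.inv_cdf(p) ** 2

def calibrate_latent_r(target, n_pairs=20000, steps=15, seed=7):
    # ASSUMPTION: bisection on one large sample (common random numbers) to find
    # the latent normal correlation whose chi-square pairs have Pearson r ~ target.
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n_pairs)]
    e = [rng.gauss(0, 1) for _ in range(n_pairs)]
    x1 = [chi2_1_from_normal(z) for z in z1]
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        s = math.sqrt(1.0 - mid * mid)
        x2 = [chi2_1_from_normal(mid * a + s * b) for a, b in zip(z1, e)]
        if pearson_r(x1, x2) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def simulated_power(n, latent_r, reps=1000, alpha=0.05, seed=1):
    # Rejection rate of the t-test of H0: rho = 0 on copula-generated samples.
    rng = random.Random(seed)
    df = n - 2
    tc = t_critical(df, alpha)
    r_crit = tc / math.sqrt(df + tc * tc)
    s = math.sqrt(1.0 - latent_r ** 2)
    rejections = 0
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]
        x = [chi2_1_from_normal(zi) for zi in z]
        y = [chi2_1_from_normal(latent_r * zi + s * rng.gauss(0, 1)) for zi in z]
        if abs(pearson_r(x, y)) > r_crit:
            rejections += 1
    return rejections / reps

if __name__ == "__main__":
    latent = calibrate_latent_r(0.5)
    print("latent correlation:", round(latent, 3))
    for n in (25, 50, 75, 100):
        print(n, round(simulated_power(n, latent), 3))
```

Running this next to a bivariate normal simulation at the same sample sizes is one way to see how far apart the two power curves sit for your own settings.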

You see where I’m going with this? Where’s the my-data-is-not-normal option in G*Power? Or my-data-has-missing-values? Or my-data-has-measurement-error? Or my-data-has-all-of-those-at-once? Sure, I realize that this is a little bit of an extreme case: the sample size is not terribly large, the non-normality is severe and, by the time n=100, the malicious influence of the non-normality has been “washed away” so to speak. The power curves look more and more similar as the sample size grows larger and larger. But it is still a reminder that every time I see people report their power analyses through G*Power my mind immediately goes to… “is this really power or a lower/upper bound to power?” And, moreover… if you go ahead, do your analyses and find your magic p-value under .05, you’re probably going to feel even *more* confident that your results are the real deal, right? I mean, you did your due diligence, you’re aware of the issues and you tried to address them the best way you could. And that’s exactly what kills me. Sometimes your best is just not good enough.

Solutions? Well… I dunno. Unless someone makes computer simulations mandatory in research methods classes, the only other option I have is usually to close my eyes and hope for the best.