I am a big fan of Deirdre McCloskey. One of the things she's always carrying on about is 'How big is big?'. She argues that in much empirical analysis people confuse statistical significance with substantive significance; in a play on words, she describes this as the standard error of empirical analysis. For readers who are not statistically literate, the standard error refers to the precision of the estimate that the analysis has produced. McCloskey argues that it isn't enough for an estimated coefficient to have a small standard error (i.e. be estimated with a high degree of precision); it must also have 'oomph'. I agree. A highly statistically significant relationship might still have a very small effect and so not be of substantive importance, so it isn't enough to look only at the statistical significance of a relationship; we also need to think about its size. McCloskey discusses this in her book with Stephen Ziliak, The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives, and an entire 2004 issue of the Journal of Socio-Economics (subscription required) is dedicated to the question.
When doing an empirical test we should look for both substantive significance and statistical significance, not just one or the other. We can all agree that when an estimate has neither statistical significance nor oomph we should reject whatever hypothesis we are investigating, and that when it has both we should accept it. (This language might annoy some purists: strictly speaking we don't accept hypotheses, we merely fail to reject them, and so on.) That leaves two mixed categories, sketched in code after the Wooldridge quote below. The Reject (1) category, a coefficient that has statistical significance but no oomph, is what annoys McCloskey so much. The Reject (2) category, a coefficient with oomph but no statistical significance, is more controversial, to my mind. McCloskey also argues that our conventional t-values for hypothesis testing are arbitrary; of course she is right, they are arbitrary. She seems to suggest that the appropriate significance threshold varies from case to case. I do have some sympathy for that argument, but I am uncomfortable with the position. My view is closer to Jeffrey Wooldridge's position in that 2004 Journal of Socio-Economics issue:
While I completely agree that statistical significance does not imply economic significance, I think pushing an economically large effect that is statistically insignificant is usually a stretch.
Results in this category shouldn't be ignored; rather, I think they make a case for much more work.
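To make the four categories concrete, here is a purely illustrative sketch of the taxonomy in Python. The labels follow the usage above; the function is mine, not McCloskey's or Wooldridge's.

```python
def classify(statistically_significant: bool, has_oomph: bool) -> str:
    """Illustrative version of the accept/reject taxonomy discussed above."""
    if statistically_significant and has_oomph:
        return "Accept"        # precisely estimated and substantively large
    if statistically_significant:
        return "Reject (1)"    # precisely estimated, but the effect is trivially small
    if has_oomph:
        return "Reject (2)"    # substantively large, but imprecisely estimated
    return "Reject"            # neither precise nor large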
So why am I carrying on about this? In Phil Jones’ BBC interview we see an example of McCloskey’s standard error. Recall questions B and C.
B – Do you agree that from 1995 to the present there has been no statistically-significant global warming?
Yes, but only just. I also calculated the trend for the period 1995 to 2009. This trend (0.12C per decade) is positive, but not significant at the 95% significance level. The positive trend is quite close to the significance level. Achieving statistical significance in scientific terms is much more likely for longer periods, and much less likely for shorter periods.
C – Do you agree that from January 2002 to the present there has been statistically significant global cooling?
No. This period is even shorter than 1995-2009. The trend this time is negative (-0.12C per decade), but this trend is not statistically significant.
In my previous discussion of this interview I made the point that Jones should have told us what his significance levels actually were (a standard error, a t-statistic or a p-value would have been very useful). I have guesstimated his analysis in EViews using data from the CRU website. I estimated the following equation:
Temp = constant + B*Time Trend + AR(1) + error
I included the AR(1) term to take care of any unit-root problems and, by using the Newey-West correction, was able to get results very similar to those Jones describes in his interview.
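For readers who want to try this without EViews, here is a minimal sketch of the same kind of trend regression in Python with statsmodels. The file name hadcrut_monthly.csv, the column names, and the 12-month HAC lag window are all my assumptions, not anything Jones published; and rather than the AR(1)-plus-Newey-West specification above, the sketch leans on the Newey-West (HAC) correction alone to handle serial correlation.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical CSV of monthly global mean temperature anomalies with columns
# 'date' and 'anomaly'; the real series lives on the CRU website.
data = pd.read_csv("hadcrut_monthly.csv", parse_dates=["date"]).set_index("date")

# Question B sample: 1995 through 2009.
sample = data.loc["1995-01":"2009-12", "anomaly"]

# Linear time trend measured in decades (120 months), so the coefficient
# reads directly as degrees C per decade.
trend = pd.Series(range(len(sample)), index=sample.index, name="trend_decades") / 120.0
X = sm.add_constant(trend)

# OLS with Newey-West (HAC) standard errors to allow for serial correlation.
res = sm.OLS(sample, X).fit(cov_type="HAC", cov_kwds={"maxlags": 12})

print(f"trend: {res.params['trend_decades']:+.2f} C/decade, "
      f"p-value: {res.pvalues['trend_decades']:.4f}")
```

None of this is Jones' actual code, of course; the point is only that anyone with the CRU data and a few lines of script can check the significance levels for themselves.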
The coefficient I estimate for question B is slightly smaller than Jones' estimate (0.11C per decade versus his 0.12C per decade) and my coefficient for question C is also slightly smaller (-0.14C per decade versus his -0.12C per decade) [-0.14 is a smaller number than -0.12; don't be confused by the minus signs]. Neither coefficient is significant at the 95 percent significance level, as Jones says. But, as he also says, the question B coefficient is very close to significance: it has a p-value of 0.0512, so if we were to accept a 90 percent significance level it would be statistically significant (1 - 0.0512 = 94.88 percent). So why doesn't Jones say that? I've often seen people argue that a 90 percent significance level is okay.
I think the answer lies in the question C coefficient. There Jones says the trend is not statistically significant. But look at the p-value: 0.0723. It is clearly not significant at the 95 percent level, but it would be significant at the 90 percent level (1 - 0.0723 = 92.77 percent). In other words, Jones cannot claim that the answer to question B is statistically significant without conceding that the negative coefficient answering question C is also statistically significant at the 90 percent level. (The p-value for question C is very sensitive to the Newey-West adjustment; without that adjustment it is statistically significant even at the 95 percent level.) Under the standard-error approach that McCloskey so hates, it would be game over.
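That sensitivity to the Newey-West adjustment is easy to check in the same hedged setup: fit the post-January-2002 trend twice, once with classical OLS standard errors and once with the HAC correction, and compare p-values. This continues the sketch above, with the same assumed data frame and column names.

```python
# Question C sample: January 2002 onwards.
sample_c = data.loc["2002-01":, "anomaly"]
trend_c = pd.Series(range(len(sample_c)), index=sample_c.index, name="trend_decades") / 120.0
X_c = sm.add_constant(trend_c)

plain = sm.OLS(sample_c, X_c).fit()  # classical standard errors
hac = sm.OLS(sample_c, X_c).fit(cov_type="HAC", cov_kwds={"maxlags": 12})  # Newey-West

for label, res_c in (("plain OLS", plain), ("Newey-West", hac)):
    print(f"{label}: trend {res_c.params['trend_decades']:+.2f} C/decade, "
          f"p-value {res_c.pvalues['trend_decades']:.4f}")
```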
So it looks to me as if he is playing silly-buggers with significance levels. In his July 5, 2005 email Jones had indicated:
The scientific community would come down on me in no uncertain terms if I said the world had cooled from 1998. OK it has but it is only seven years of data and it isn’t statistically significant.
Well, maybe now it is; the trend coefficient from 2003 looks to be statistically significantly different from zero at the 90 percent significance level.
Some caveats: I am not an econometrician, I have guesstimated what Jones did, and what I have done is very rough and ready. He may have done something very different, and the significance tests in his analysis might be very different from those I have reported here. He should post his tests and significance levels on the web so that we can all have a look at them.