For those unfamiliar, the t-test was developed by William Sealy Gosset, who was working at the Guinness brewery in Dublin when he published it in 1908. He was forced to use a pen name, “Student,” which is why it is often called Student's t-test. In its simplest form, the test asks how likely it is, purely by chance, for a sample of (Gaussian-distributed) data to have a mean as far from some hypothesized population mean as the one observed, when the population's standard deviation is unknown and has to be estimated from the sample itself. The test is also used to compare two samples of data (one a control, the other altered in some deliberate way) and determine how likely a difference like the observed one would be to arise by chance if the alteration had no effect. It does this by testing whether the means of the two (Gaussian-distributed) populations are equal. Whether you use a lookup table in a textbook or a program such as Matlab, the result is summarized in what's called the p-value. And this is where the confusion arises.
Depending on how strict your study is, you may conclude that a result is significant if the p-value is less than .05 or .01. In other words, you would consider the change from the control to have a statistically significant effect only if a difference that large would arise by chance less than 5% (or 1%) of the time when the change actually does nothing. As this article mentions, however, few studies correctly interpret the results of the t-test. You'll often hear that if the p-value is .05, the author can be 95% certain that the change made to the control caused a significant difference. In the Science News article, Tom Siegfried diagnoses the problem with this:
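The threshold is really a false-positive rate. The simulation below sketches that idea: it repeatedly draws a “control” and an “altered” sample from the same distribution, so the change truly does nothing, and counts how often the p-value still falls below .05 or .01. To keep it short, it swaps in a two-sample z-test with a known standard deviation of 1 in place of the t-test; the sample size, number of trials, and seed are arbitrary choices, and the function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(control, treated, sigma=1.0):
    """Two-sided p-value for a difference in means when sigma is known (z-test)."""
    se = sigma * (1.0 / len(control) + 1.0 / len(treated)) ** 0.5
    z = (fmean(treated) - fmean(control)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def false_positive_rate(alpha, trials=10_000, n=20, seed=1):
    """Fraction of trials with p < alpha when the null hypothesis is true by construction."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both samples come from the same N(0, 1) population: the "change" does nothing.
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(control, treated) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        print(f"alpha = {alpha}: false-positive rate = {false_positive_rate(alpha):.3f}")
```

With the null hypothesis true by construction, roughly 5% of runs still come out “significant” at .05 and roughly 1% at .01, which is all the threshold promises.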
That interpretation commits an egregious logical error (technical term: “transposed conditional”): confusing the odds of getting a result (if a hypothesis is true) with the odds favoring the hypothesis if you observe that result. A well-fed dog may seldom bark, but observing the rare bark does not imply that the dog is hungry. A dog may bark 5 percent of the time even if it is well-fed all of the time.
I think that in life we make these kinds of transposed-conditional errors more often than we realize. To summarize, the correct way to interpret the p-value is as follows: a p-value of .05 means that, if the change from the control has no effect on the result (i.e., if the null hypothesis is true), there is a 5% chance of observing a difference at least as large as the one actually seen. It is not the probability that the null hypothesis is true.
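Bayes' rule shows why the transposed conditional matters. The p-value threshold controls P(significant | no effect), but the question readers usually care about is P(no effect | significant). The sketch below computes the latter; the power (0.8) and the fraction of tested hypotheses that are actually real (10%) are made-up illustrative numbers, not estimates from any study, and the function name is hypothetical.

```python
def prob_null_given_significant(alpha: float, power: float, prior_real: float) -> float:
    """P(no real effect | p < alpha), by Bayes' rule.

    alpha:      P(p < alpha | no effect), the false-positive rate the threshold controls
    power:      P(p < alpha | real effect), assumed; depends on sample and effect size
    prior_real: fraction of tested hypotheses with a real effect, assumed
    """
    sig_and_null = alpha * (1.0 - prior_real)
    sig_and_real = power * prior_real
    return sig_and_null / (sig_and_null + sig_and_real)


if __name__ == "__main__":
    # Hypothetical inputs: 80% power, and 1 in 10 tested hypotheses is real.
    print(f"{prob_null_given_significant(0.05, 0.8, 0.1):.2f}")  # prints 0.36
```

Under those assumed numbers, about 36% of findings that are “significant at .05” would have no real effect behind them, far from the 5% the transposed reading suggests. At the other extreme, if no tested hypothesis is real (`prior_real=0`), every significant result is a fluke, just as the always-well-fed dog's occasional bark never signals hunger.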
If these complications weren't enough, Siegfried also explains some issues with the test even when it's properly interpreted. First, if your p-value is below your threshold (.05 or .01), then either there is a real effect, or the result was an unlikely fluke; sometimes it's the latter, and the p-value alone can't tell you which. Second, if your p-value is above your threshold, then either the studied effect doesn't exist, or your test wasn't powerful enough to detect a small but real effect. Lastly, statistical significance doesn't necessarily mean practical importance: the studied effect may be so small that it affects few situations or people.
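Both the power problem and the practical-importance problem show up in a rough power calculation. The sketch below uses the normal approximation to a two-sided, two-sample test with equal group sizes, where the effect size `d` is the true difference in means divided by the (assumed common) standard deviation. The function name and the example scenarios are hypothetical.

```python
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample test (normal approximation).

    d: true difference in means divided by the common standard deviation
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Under the alternative, the test statistic is roughly N(shift, 1).
    shift = d * (n_per_group / 2.0) ** 0.5
    # Probability of landing in either rejection tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical scenarios.
    print(f"small effect, 20 per group:        {approx_power(0.2, 20):.3f}")
    print(f"small effect, 400 per group:       {approx_power(0.2, 400):.3f}")
    print(f"trivial effect, 100,000 per group: {approx_power(0.02, 100_000):.3f}")
```

With 20 subjects per group, a small effect (d = 0.2) would be detected less than 10% of the time, so a non-significant result there says little about whether the effect exists; with 400 per group the power rises to roughly 80%. Conversely, with 100,000 per group even a trivial effect (d = 0.02) is all but certain to reach p < .05, which is why significance alone doesn't establish practical importance.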
While these points may seem recherché, they're important to keep in mind when interpreting the barrage of studies we come across (in the media and elsewhere) as we try to take “small steps toward a much better world.”