Grad Stat Lab

The P-value Controversy

Recent years have seen statisticians become increasingly vocal about the limitations of traditional hypothesis tests and their associated p-values. These limitations are not new discoveries; many were well known a century ago, when hypothesis testing was popularized by Ronald Fisher, whose goal was to provide tools that research workers could employ in their own work. However, several recent, highly publicized reviews of the scientific literature have demonstrated the weaknesses of p-values, especially given their present overemphasis and their preeminent role in deciding whether research is publishable.

For students of inferential statistics, the issue is complicated by the traditional method of instruction. Statistics is typically taught by introducing the theory of probability, which is used to build up the notion of a sampling distribution, which in turn leads to the construction of hypothesis tests, p-values, and confidence intervals. P-values and hypothesis tests provide a seemingly useful tool, and an impressive-sounding conclusion, in such courses, yet their limitations and the available alternatives are rarely covered in these introductions.
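To make that chain concrete, here is a minimal sketch in Python (assuming NumPy and SciPy are available; the data are simulated, not drawn from any real study) in which a probability model supplies a sampling distribution, and that distribution in turn supplies both a p-value and a confidence interval for a one-sample test of a mean.

```python
# A probability model gives a sampling distribution, which gives a p-value
# and a confidence interval: the chain described above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical sample: 30 observations from a population whose mean
# we wish to compare against a null value of 0.
sample = rng.normal(loc=0.4, scale=1.0, size=30)

# The t statistic's sampling distribution under the null hypothesis
# (population mean = 0) yields the p-value.
t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)

# The same sampling distribution yields a 95% confidence interval
# for the population mean.
mean = sample.mean()
sem = stats.sem(sample)
ci_low, ci_high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
print(f"sample mean = {mean:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```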

In the words of the American Statistical Association,

Good statistical practice, as an essential component of good scientific practice, emphasizes principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean. No single index [e.g. the p-value] should substitute for scientific reasoning. [emphasis added]
ASA Statement on Statistical Significance and P-Values (2016)

The statement has six additional key points:

  • P-values can indicate how incompatible the data are with a specified statistical model.
  • P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
  • Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
  • Proper inference requires full reporting and transparency.
  • A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
  • By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

Since the release of the ASA Statement, one of the ASA's flagship journals, The American Statistician, has published a special issue dedicated to the topic. The lead editorial adds a further directive:

Regardless of whether it was ever useful, a declaration of ‘statistical significance’ has today become meaningless…. don’t say it and don’t use it.
(Wasserstein, Schirm, & Lazar, 2019)

The authors intend this to cover not only the phrase "statistically significant" but every equivalent expression: "significantly different", "p < .05", asterisks used to indicate significance, and "nonsignificant". In effect, essentially any traditional form of reporting a hypothesis test is to be abandoned.

Grad students, of course, should seek guidance from their instructors, faculty advisors, grant agencies, and journal guidelines. Yet, nothing prevents the student from using a richer set of tools in forming their own personal beliefs, even if institutions are not yet ready to abandon hypothesis testing.
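As one illustration of that richer toolset (and only an illustration: it is not a procedure prescribed by the ASA Statement or the editorial cited above, and the group data are invented), the sketch below reports an effect estimate with a bootstrap interval instead of a significance verdict.

```python
# Report an estimated effect and its uncertainty rather than a binary
# "significant / not significant" verdict. Data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(7)
control = rng.normal(loc=10.0, scale=2.0, size=40)    # hypothetical control group
treatment = rng.normal(loc=11.0, scale=2.0, size=40)  # hypothetical treatment group

# Point estimate of the effect: difference in group means.
effect = treatment.mean() - control.mean()

# Simple percentile bootstrap for the uncertainty of that difference.
n_boot = 10_000
boot_diffs = np.empty(n_boot)
for i in range(n_boot):
    boot_control = rng.choice(control, size=control.size, replace=True)
    boot_treatment = rng.choice(treatment, size=treatment.size, replace=True)
    boot_diffs[i] = boot_treatment.mean() - boot_control.mean()
ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])

# Let readers judge practical importance from the estimate and its interval.
print(f"estimated difference in means = {effect:.2f}")
print(f"95% percentile bootstrap interval = ({ci_low:.2f}, {ci_high:.2f})")
```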

  • Wasserstein, R. L., & Lazar, N. A. (2016). The ASA's Statement on p-Values: Context, Process, and Purpose. The American Statistician, 70(2), 129-133. doi:10.1080/00031305.2016.1154108
  • Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a World Beyond "p < 0.05". The American Statistician, 73(sup1), 1-19. doi:10.1080/00031305.2019.1583913