Gelman and his colleague Eric Loken didn’t care for these alternatives. In 2013, they wrote that they “regret the spread of the terms ‘fishing’ and ‘p-hacking’ (and even ‘researcher degrees of freedom’),” because they create the “misleading implication that researchers were consciously trying out many different analyses on a single data set.” The “garden of forking paths,” on the other hand, more aptly describes how researchers can get lost in all the decisions that go into data analysis, and not even realize that they’ve gone astray.
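The statistical hazard Gelman describes can be sketched with a small simulation: take a single data set of pure noise and try many arbitrary ways of splitting it (a stand-in for choosing among subgroups, covariates, or outcomes), and a "significant" result often turns up by chance alone. The setup below is purely illustrative; the 20-split design and the function name are assumptions for the sketch, not anything from the article.

```python
# Illustrative sketch of the "forking paths" effect: with enough arbitrary
# analysis choices, pure noise will often yield a nominally significant result.
import random
import statistics

random.seed(1)

def t_statistic(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

# One data set of pure noise (no real effect anywhere).
data = [random.gauss(0, 1) for _ in range(200)]

# Twenty arbitrary ways to split the same data, each tested at p < 0.05.
hits = 0
for _ in range(20):
    random.shuffle(data)
    t = t_statistic(data[:100], data[100:])
    if abs(t) > 1.96:  # roughly p < 0.05 for samples this large
        hits += 1

print(f"'Significant' splits found in pure noise: {hits} of 20")
```

Each individual test has only a 5 percent false-positive rate, but across 20 tries the odds that at least one comes up "significant" climb toward two in three, which is the trap a researcher can walk into without ever consciously cheating.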
“People say p-hacking and it sounds like someone’s cheating,” Gelman says. “The flip side is that people know they didn’t cheat, so they don’t think they did anything wrong. But even if you don’t cheat, it’s still a moral error to misanalyze data on a problem of consequence.”
Simmons is sympathetic to this criticism. “We probably didn’t think enough about the connotations of the word ‘hacking,’ which implies intentions,” he says. “It sounds worse than we wanted it to.” He and his colleagues have been very explicit that p-hacking isn’t necessarily a nefarious endeavor, but rather a human one, and one that they themselves had been guilty of. At its core, p-hacking is really about confirmation bias—the human tendency to seek and preferentially find evidence that confirms what we’d like to believe, while turning a blind eye to things that might contradict our preferred truths.
The “hacking” part makes it sound like some sort of immoral behavior, and that’s not helpful, Simmons says. “People in power don’t understand the inevitability of p-hacking in the absence of safeguards against it. They think p-hacking is something that evil people do. And since we’re not evil, we don’t have to worry about it.” But Simmons says that p-hacking is a human default: “It’s something that every single person will do, that I continue to do when I don’t preregister my studies.” Without safeguards in place, he notes, it’s almost impossible to avoid.
Still, there’s something indisputably appealing about the term p-hacking. “You can’t say that someone got their data and garden-of-forking-pathed it,” Nelson adds. “We wanted to make it into a single action term.”
The term p-hacking made it easier to talk about this phenomenon across fields by emphasizing that this was a behavior: something researchers were actually doing in their work. Even though it was developed by psychologists, the term p-hacking was soon being used by people talking about medicine, nutrition, biology or genetics, Nelson says. “Each of these fields have their own version, and they were like, great. Now we have a term to describe whatever is our version of semilegitimate statistical practices.”
The fact that p-hacking has now spread out of science and into pop culture could indicate a watershed moment in the public understanding of science, and a growing awareness that studies can’t always be taken at face value. But it’s hard to know exactly how the term is being understood at large.
It’s even possible that the popularization of p-hacking has turned the scientific process into a caricature of itself, reinforcing harmful ideas about the scientific method. “I would hate for the concept of p-hacking to be boiled down to something like ‘you can make statistics say anything you want’ or, worse, that ‘scientists are liars,’” says Nuzzo, the science writer. “Because neither of those things is true.”
In a perfect world, the wider public would understand that p-hacking refers not to some lousy tendency or lazy habit particular to researchers, but to one that’s present everywhere. We all p-hack, to some extent, every time we set out to understand the evidence in the world around us. If there’s a takeaway here, it’s that science is hard—and sometimes our human foibles make it even harder.