Statistical Significance is Costing You Money
Joe Wyer - Head of Economics and Applied Science @ Haus
April 13, 2023
Tired of waiting around for statistical significance?
Many marketers and data analysts misuse the concept of statistical significance - they conclude that a marketing experiment lacking statistical significance means the marketing lift is inconclusive or even zero. This causes leaders to lose faith in the marketing tactic or in the testing practice altogether. In this blog, I am going to share why statistical significance is costing you growth and how to implement a profitable testing strategy by ignoring statistical significance.
It is profitable to ignore statistical significance when making marketing investments.
“I cannot reject the null hypothesis that I will earn zero returns from owning shares in this company, therefore I must hold cash.” - No one, ever
Marketing investments, like financial investments, have uncertain returns. Marketing investment returns are even more difficult to understand because not only do you have to guess what the return of a new investment will be, you also have to rely on faulty, inaccurate data to estimate what the returns of past investments were! Like financial investments, the most important number to think about for marketing investments is the expected return. Once you know the expected return of the marketing investment then the investment decision is a matter of balancing risk and reward. Ideally, you’re placing marketing bets that have different expected returns with different degrees of confidence.
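To make the expected-return framing concrete, here is a minimal sketch. All of the scenario probabilities and ROAS values below are hypothetical, made up purely for illustration:

```python
# Hypothetical illustration: the expected return of a marketing bet is the
# probability-weighted average of its possible outcomes. These scenarios
# and their probabilities are invented for the example.
scenarios = [
    {"roas": 0.0, "prob": 0.2},   # the bet is a dud
    {"roas": 2.0, "prob": 0.5},   # pays back roughly at breakeven
    {"roas": 4.0, "prob": 0.3},   # big winner
]

expected_roas = sum(s["roas"] * s["prob"] for s in scenarios)
print(f"Expected ROAS: ${expected_roas:.2f}")  # 0.0 + 1.0 + 1.2 = $2.20
```

Even though one scenario in this sketch is a total loss, the bet's expected return can still clear a profitability bar - which is exactly the risk-versus-reward balancing described above.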
Imagine a situation where click-based attribution says that your return on ad spend (ROAS) on Meta is large and your team puts together a great incrementality test design and estimates a similar return. You see that the incremental ROAS is in the ballpark of where you need it to be profitable, but the analyst put in the executive summary that they do not find a significant ROAS. You ask the analyst what this means and they say “I can’t reject the null hypothesis that the ROAS is zero, therefore results are inconclusive.” Some analysts might even say that ROAS is zero if they can’t reject the null hypothesis!
Do you need to run the test longer? It depends on what else you could be doing with this budget. If you have high-ROI places to invest, you want to establish some certainty about where these dollars are best spent, so it could be wise to run the test longer. However, if both click-based and incrementality-based signals suggest the ROAS is high, then every day you keep a control group running is a wasted opportunity to profitably reach more customers. Continuing the current test also means spending time not testing other investments. If half of all marketing dollars are wasted, your time is best spent searching for that half, and waiting around for statistical significance prevents you from stopping the bleeding.
Where does this stat sig fixation come from?
Most of us are products of academia. Many academics (not all, but most) will call any estimated impact that does not have a p-value less than .05 a zero impact, regardless of the size of the impact! These academics might also publish papers that spare only a few sentences on whether the size of the impact is worth talking about, as long as the p-value is very small. Authors Ziliak and McCloskey discuss many examples of this in detail in their book The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives.
This fixation on statistical significance may make sense if you are trying to uncover unassailable truths about things like medicine or economic policy, but it is detrimental if you are trying to make a profitable investment decision for a business where the risk of failure is much more acceptable. For example, if you are testing the efficacy of a new pharmaceutical drug, you will definitely want to consider the p-value of those test results because the cost of getting it wrong could be devastating to public health.
So, in marketing, if you don’t wait for full statistical significance, then what do you use to determine if test results are reliable?
Consider the credible interval instead.
Of course, you don’t want to jump to conclusions from extremely slim data that might be shortsighted, so consider the credible interval of experiment results instead. A credible interval is like a confidence interval, but it can be read directly as a probability: it tells you how likely different levels of marketing returns are. How much of the credible bell curve falls in different zones should drive different decisions.
For example, maybe you are testing the incrementality of your PMAX campaigns. Your goal is to keep the PMAX ROAS greater than $2.50 and your incrementality results show an incremental ROAS of $3.00 with an 80% credible interval of $2.60 - $3.20. Well, in most possible scenarios here, you are exceeding your $2.50 ROAS goal, so no matter where exactly on the credible interval you land, your actions would likely be the same - increase or maintain your investment in PMAX. However, if your credible interval were $2.00 - $3.50, you would want to determine the probability that the ROAS is less than $2.50 and then decide: either keep the test running to increase power and tighten the credible interval, or, if that probability is low enough, move forward knowing a $3.00 ROAS is more likely.
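Here is a minimal sketch of that calculation. It assumes the posterior for incremental ROAS is approximately normal and treats the wider $2.00 - $3.50 interval as a symmetric 80% credible interval around the $3.00 estimate - a simplification for illustration, since a real posterior from an incrementality test need not be normal or symmetric:

```python
from statistics import NormalDist

# Hypothetical sketch: approximate the posterior for incremental ROAS as
# normal, then ask how likely the ROAS is to fall below the $2.50 goal.
posterior_mean = 3.00            # point estimate of incremental ROAS
ci_low, ci_high = 2.00, 3.50     # treated as a symmetric 80% credible interval
goal = 2.50

# An 80% interval on a normal spans +/- 1.2816 standard deviations.
z_80 = NormalDist().inv_cdf(0.90)
posterior_sd = (ci_high - ci_low) / (2 * z_80)

prob_below_goal = NormalDist(posterior_mean, posterior_sd).cdf(goal)
print(f"P(ROAS < ${goal:.2f}) is roughly {prob_below_goal:.0%}")
```

Under these assumptions, the chance of missing the goal lands around one in five - low enough that many teams would keep investing, but high enough that a risk-averse team might keep the test running.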
It is even more profitable to blend incrementality estimates with expert judgment.
A model that produces unbiased estimates can still give estimates that are not possible in reality, like negative or wildly positive marketing lift. This does not necessarily mean that the model is wrong in the sense that it is biased or broken; it means that it is imprecise, and the model outputs have to be balanced with expert judgment for best results.
Suppose your latest test turned up a negative lift estimate. A leader balancing their past experience with new test results would say: “Attribution shows very positive lift, but the incrementality test is negative. That means the truth could be somewhere in between, perhaps zero or only modestly positive.” Then the leader has to decide whether the risk of retesting, shutting down, or trimming this spend outweighs the risk of wasting those budget dollars when they could be better spent elsewhere. If no other high-budget channel is waiting to be tested and there is no obviously better place to reallocate the budget, extending the test or retesting will be worth it.
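One simple way to formalize this blending is a precision-weighted average: treat the expert's prior belief and the noisy test result as two normal estimates and combine them, with the noisier source getting less weight. All of the numbers below are hypothetical:

```python
# Hypothetical sketch of blending expert judgment with a noisy test result
# via a normal-normal Bayesian update. All numbers are illustrative.
prior_mean, prior_sd = 1.0, 1.0    # expert belief: lift is modestly positive
test_mean, test_sd = -0.5, 2.0     # noisy incrementality test came back negative

# Precision (inverse variance) weights: the noisier source counts less.
prior_prec = 1 / prior_sd**2
test_prec = 1 / test_sd**2
posterior_mean = (prior_prec * prior_mean + test_prec * test_mean) / (prior_prec + test_prec)
posterior_sd = (prior_prec + test_prec) ** -0.5

print(f"Blended lift estimate: {posterior_mean:.2f} (sd {posterior_sd:.2f})")
```

Notice that the blended estimate lands between the two inputs - the negative test result pulls the expert's belief down without overturning it, which mirrors the "the truth could be somewhere in between" reasoning above.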
How do you translate these ideas into a concrete testing strategy?
First, test whether your biggest bets are paying back the way you expect them to. If the incremental return is in the ballpark of where you need it to be, test the next biggest bet. Once you find a bet that isn’t paying out what you expect, use your expert judgment to decide to retest it or to reallocate those dollars to proven bets. Incrementality measurement is messy so negative (and massively positive) results can happen and need to be blended with what you think is actually possible. This strategy of moving quickly between imperfect tests to find big wins is called test and roll and it is highly profitable when there can be big winners and big losers sprinkled throughout the strategies you are testing.
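The test-and-roll loop described above can be sketched as a simple triage over your biggest bets. The channel names, budgets, and measured returns here are all made up for illustration:

```python
# Hypothetical test-and-roll sketch: walk down the biggest bets in budget
# order, keep the ones whose measured incremental ROAS clears the goal,
# and flag the rest for retesting or reallocation.
goal = 2.50
bets = [  # (channel, budget, measured incremental ROAS) - invented numbers
    ("Meta prospecting", 500_000, 3.10),
    ("Brand paid search", 300_000, 0.40),
    ("PMAX", 200_000, 2.80),
]

keep, review = [], []
for channel, budget, iroas in sorted(bets, key=lambda b: b[1], reverse=True):
    (keep if iroas >= goal else review).append(channel)

print("Keep funding:", keep)
print("Retest or reallocate:", review)
```

Sorting by budget first reflects the advice to start with the biggest bets: a leak in a $500K channel is worth finding before a leak in a $200K one.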
Inexact test results might make you uncomfortable. You might be used to seeing very stable platform attribution or multi-touch attribution (MTA) results. Unfortunately, attribution is often so positively biased that it is dangerous. For example, brand paid search strategies have been proven to provide negligible returns while their attribution is amazing (Blake, Nosko, and Tadelis 2015). Data deprecation from privacy changes has merely pulled the curtain back and revealed the attribution Wizard of Oz. Incrementality, on the other hand, is less precise but much more accurate. If you choose incrementality, over time you will be moving in the right direction and plugging the leaks in your marketing strategy even if some tests need to be rerun or thrown in the trash.
But Joe, we need to give the CFO a number every week! If incrementality measurements aren’t perfect, then what do I report? In my experience, frankly acknowledging that there’s uncertainty in marketing measurement and having a clear test and roll strategy that takes that into account earns a lot of trust. I’ve worked with some marketing leaders who have been able to completely throw out Wizard of Oz attribution dashboards in favor of a strategic testing approach. However, of course we often need to hold ourselves accountable to some hard numbers!
Check out the 5 ways we recommend applying incrementality results for reporting.