How MMM and Lift Tests Can Work Together

Michael Kaminsky - Founder of Recast
September 27, 2023

The most important metric for measuring advertising performance is incrementality. “Incrementality” is a measure of the true causal impact of marketing activity on a business KPI, and it answers questions like: “if we spent an additional $10,000 on this channel, how many additional sales would we get?” and “if we stopped spending in this channel, how many sales would we lose?”
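To make that concrete, here’s a toy calculation with made-up numbers (purely illustrative):

```python
# Toy illustration of incrementality (hypothetical numbers): suppose an
# extra $10,000 of spend on a channel causes 250 extra sales at $50 each.
extra_spend = 10_000
incremental_sales = 250          # sales caused by the extra spend
revenue_per_sale = 50.0

incremental_revenue = incremental_sales * revenue_per_sale
incremental_roi = incremental_revenue / extra_spend
print(f"incremental ROI: {incremental_roi:.2f}x")  # 1.25x
```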

Marketers have a few tools in their toolbox for measuring incrementality:

  • Experiments or “lift tests,” which might take the form of randomized controlled trials or geographic holdout tests
  • Marketing mix models (MMMs) that use historical observational data and a statistical model to identify causal relationships in the data

Both of these methods have tradeoffs, but when used properly they actually work well together and each reinforces the other in a virtuous cycle.

Lift Tests and Experiments

Lift tests are an incredibly important and powerful tool for measuring true incrementality. When run and interpreted correctly, lift tests and experiments give us the best read on the true incrementality of our marketing channels. However, there are a few important limitations that marketers should be aware of:

  1. Lift tests provide an incrementality read only for the snapshot of time when the test was run
  2. Lift tests must be set up and interpreted correctly in order to get valid results
  3. Some channels are very difficult or impossible to test for structural reasons

Let's take these limitations one by one.

Lift tests are snapshots

If you run a lift test for a certain channel between April 1st and April 15th, the results apply most directly to that time period and become less applicable with every passing day as things change. You might change your creative or your targeting strategy, or the platform itself can change: Meta or YouTube could change its targeting algorithm, or more brands could start targeting your same audience. Additionally, macroeconomic impacts or seasonality can change the effectiveness of your marketing activity over time.

Setup and interpretation

In order for the incrementality read of an experiment to be correct, the experiment has to be set up correctly. For example, in the case of a geographic holdout test, where you run a marketing campaign in certain geographies and not in others, the results of the test depend on the geographies in the treatment and control groups actually being comparable. In the United States you probably wouldn’t want to treat Cincinnati as your treatment group and New York and Los Angeles as your control group, since those geographies aren’t generally comparable.
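As a rough sketch of what “comparable” can mean in practice, you might check how closely candidate control markets tracked the treatment market before the test. This is a simplified illustration with fabricated data; real matched-market selection uses more sophisticated methods:

```python
import pandas as pd

# Hypothetical pre-period KPI data: one weekly sales series per geo.
pre = pd.DataFrame({
    "week":       pd.date_range("2023-01-01", periods=12, freq="W"),
    "cincinnati": [110, 108, 115, 112, 118, 120, 117, 121, 119, 125, 123, 128],
    "columbus":   [105, 104, 110, 108, 113, 116, 112, 117, 114, 120, 119, 124],
    "new_york":   [900, 950, 870, 990, 860, 1010, 880, 940, 1030, 890, 960, 920],
}).set_index("week")

treatment = "cincinnati"
candidates = ["columbus", "new_york"]

# Compare candidates on pre-period correlation with the treatment geo.
# High correlation (on levels or growth rates) suggests comparable markets.
for geo in candidates:
    corr = pre[treatment].corr(pre[geo])
    print(f"{treatment} vs {geo}: pre-period correlation = {corr:.2f}")
```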

While tools like Haus make test-result interpretation much easier and less error-prone, if you’re interpreting the results yourself you need to make sure that you understand exactly what the test results are saying and what subset of your customer population they apply to.

Difficult channels

Experimentation frameworks rely on the ability to accurately target certain individuals or geographies with ad campaigns. For some channels this is difficult or impossible. Podcasts, for example, are very difficult to test: outside of certain dynamically-inserted ads, you can’t serve different ads to listeners in New York and Texas. Channels like linear television are also difficult to test, since national TV buys are very different from local media buys, and it’s unclear to what extent a local TV test actually generalizes to a national TV buy.

Marketing Mix Modeling (MMM)

MMM is a very important tool for marketers as well. When set up and run correctly, MMM provides a continuous read on the incrementality of all marketing channels and on how it changes over time. However, there are a few important limitations that marketers should be aware of:

  1. MMMs are nuanced, rely on many assumptions, and are very difficult to validate
  2. MMMs are often less precise than experiments and have more uncertainty
  3. MMMs generally operate at a fairly high level of aggregation

Let’s take these limitations one by one.

Validating an MMM

MMMs are complex statistical models that rely on a combination of structural assumptions and historical observational data in order to estimate incrementality. Different reasonable models may yield conflicting results, and it’s very easy to generate MMM results that “look reasonable” but are actually incorrect and misleading.

While it’s very easy to get results out of an MMM, it’s very difficult to get correct results out of an MMM, and even highly trained statisticians can disagree about which results are correct.
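To see one concrete way this plays out, consider a sketch in which two channels’ spend series move together (simulated data; all numbers are invented). Very different attributions of credit fit the data almost equally well, and the model alone can’t adjudicate between them:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 104  # two years of weekly data

# Simulate two channels whose spend moves together, e.g. both
# scale up for peak season (purely illustrative numbers).
tv = 50 + 10 * np.sin(np.arange(n) * 2 * np.pi / 52) + rng.normal(0, 1, n)
search = 0.8 * tv + rng.normal(0, 0.5, n)

# True world: all of the lift comes from TV.
sales = 100 + 2.0 * tv + rng.normal(0, 5, n)

# Three very different (tv, search) coefficient splits...
X = np.column_stack([np.ones(n), tv, search])
for tv_coef, search_coef in [(2.0, 0.0), (1.0, 1.25), (0.0, 2.5)]:
    resid = sales - X @ np.array([100.0, tv_coef, search_coef])
    print(f"tv={tv_coef}, search={search_coef}, RSS={resid @ resid:,.0f}")
# ...produce similar residual error, so "looks reasonable" is not
# enough to tell which channel actually drives sales.
```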

Precision

Because MMMs use historical observational data to infer causality, the results from an MMM are generally less precise than what you might be able to get from an experiment where you directly control the intervention on a subset of the population of interest.

Practically speaking this means that MMMs generally have wider uncertainty bounds on their incrementality estimates which can make decision-making more difficult.
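As an illustration, with simulated draws standing in for a real model’s posterior (all numbers invented), a wide credible interval that straddles break-even leaves the scale-up-or-down decision ambiguous:

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend these are posterior draws of a channel's ROI from an MMM
# (simulated for illustration only).
roi_draws = rng.normal(loc=1.4, scale=0.35, size=4000)

low, high = np.percentile(roi_draws, [5, 95])
print(f"90% credible interval: {low:.2f}x to {high:.2f}x")

break_even = 1.0
prob_profitable = (roi_draws > break_even).mean()
print(f"P(ROI > {break_even:.1f}x) = {prob_profitable:.0%}")
# An interval that spans break-even leaves it unclear whether
# to scale the channel up or down.
```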

Aggregation

MMMs are generally trained at the aggregate channel-plus-tactic level. For example, you might estimate the impact of “non-branded search” but not the impact of individual search keywords or strategies, or the impact of “Facebook prospecting” but not of the individual campaigns and creatives that comprise it.

There are many questions or hypotheses that marketers might have about sub-channel performance that simply aren’t answerable with an MMM.

Better together: MMM + Experiments

The good news is that MMM and experiments can be used together such that each tool reinforces the other and compensates for the other’s weaknesses. Here’s how the ideal world looks:


MMM can be used for continuous reads on incrementality even when things are changing

One situation you can imagine is a business that has substantial amounts of seasonality. If you run an incrementality test during the peak season, you may suspect that those results don’t apply perfectly to the offseason. So you can use an MMM to try to model how the incrementality results might change over time as you move from peak- to off-season.
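Here’s a minimal sketch of that idea, assuming, purely for illustration, that effectiveness follows a smooth seasonal curve; a real MMM would estimate this variation from data rather than assume the shape:

```python
import numpy as np

weeks = np.arange(52)

# Assume effectiveness follows a seasonal pattern: a base ROI plus a
# sinusoidal swing peaking in late November (illustrative, not fitted).
base_roi = 1.2
seasonal_amplitude = 0.4
peak_week = 47
roi_by_week = base_roi + seasonal_amplitude * np.cos(
    2 * np.pi * (weeks - peak_week) / 52
)

# A lift test run at the peak reads ~1.6x, but the same curve implies
# ~0.8x at the trough, so the peak-season read shouldn't be applied
# to off-season budgets as-is.
print(f"peak-week ROI:   {roi_by_week[peak_week]:.2f}x")
print(f"trough-week ROI: {roi_by_week.min():.2f}x")
```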


Experiments can be used to calibrate the MMM, validating the model and improving precision

MMMs are generally noisy measures. Because MMMs rely on observational data, the incrementality read from the MMM will be uncertain, sometimes very uncertain, depending on how much signal is in the data. Additionally, since MMM models are so flexible, there can be multiple different sets of parameter estimates that all fit the data equally well.

Experiments can be used both to increase the precision of the model estimates (instead of a credible interval between 1x and 2x ROI, we can shrink it to between 1.25x and 1.75x ROI) and to help “rule out” otherwise-plausible parameter sets that don’t match the incrementality tests.
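A minimal sketch of the calibration idea, using a simple normal-normal Bayesian update with invented numbers (production MMM calibration is more involved, but the precision-weighted logic is the same): treat the uncalibrated MMM read as a prior on the channel’s ROI and the lift-test estimate as new evidence.

```python
# Uncalibrated MMM read on a channel's ROI: roughly 1x-2x
# (treated as a normal prior; a 95% interval of 1.0-2.0 implies sd ~0.25).
prior_mean, prior_sd = 1.5, 0.25

# Lift-test read on the same channel: 1.45x +/- 0.15 standard error.
test_mean, test_se = 1.45, 0.15

# Conjugate normal-normal update: a precision-weighted average.
prior_prec = 1 / prior_sd**2
test_prec = 1 / test_se**2
post_prec = prior_prec + test_prec
post_mean = (prior_prec * prior_mean + test_prec * test_mean) / post_prec
post_sd = post_prec**-0.5

lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
print(f"calibrated ROI: {post_mean:.2f}x (95% interval {lo:.2f}x to {hi:.2f}x)")
# The calibrated interval is markedly narrower than the 1x-2x prior.
```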


The MMM results can drive the experimentation roadmap

Using the MMM to identify possible changes in ground truth can help to structure the experimentation roadmap: if the MMM shows that a channel’s performance is trending up or trending down, that might be a good channel to test. You might imagine someone saying “the MMM is indicating that radio performance has improved since we released new creative. Let’s prioritize another geo-holdout test in that channel next month.”

Or, you might notice that one channel’s incrementality estimate from the MMM conflicts with data you’re seeing in your digital-tracking or last-touch model. That is a great channel to go and test!


MMM can help you pull all of your experimental data together into one complete view, which can also be used for forecasting and planning

A good MMM can help you combine information from different experiments over time and then use that information to help with planning and forecasting. Say you ran a test on Facebook in April, a TV test in May, and a test on TikTok in June; over that same time period you released a new product, and consumer sentiment has been shifting. It’s now July and you need to forecast the rest of the year. Combining all of the information from the tests and the business trends can be very challenging, but with a good MMM you can combine these different data points in a principled way and plan for the rest of the year.
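As a toy sketch of the payoff, imagine each channel’s ROI has already been calibrated against its test (all names and numbers hypothetical; a real MMM would also model adstock, saturation, seasonality, and the consumer-sentiment trend):

```python
# Channel ROIs calibrated against this year's tests (hypothetical values),
# applied to a proposed second-half budget.
calibrated_roi = {"facebook": 1.46, "tv": 0.95, "tiktok": 1.80}

h2_budget = {"facebook": 400_000, "tv": 600_000, "tiktok": 250_000}

baseline_revenue = 5_000_000  # organic / non-marketing-driven forecast

marketing_revenue = sum(
    calibrated_roi[ch] * spend for ch, spend in h2_budget.items()
)
print(f"forecast H2 revenue: ${baseline_revenue + marketing_revenue:,.0f}")
```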

Of course, all of this is contingent on being able to run good experiments, and on using an MMM framework that is flexible enough to correctly incorporate the results from the experiments into its estimates and that updates frequently enough to provide ongoing incrementality reads.
