Summary: This case study demonstrates that in many situations, especially in advertising, the traditional regression paradigm (the average coefficients of traditional MMM) does not work, and that new estimation methods, based on individual rather than common effects of factors, should be used.
A large company launched a new product by sending a different number of promotion letters to each of 40 regions and then collected the responses. The product can be bought only through the promotion; it is not available anywhere else. A chart showing the relation between letters and purchases is depicted below. The correlation between the two variables is -0.26. Two regression lines represent ordinary regression, one with an intercept and the other without. Both describe the data very poorly, explaining only a few percent of the variance. Because the offer is exclusive, no other factors obscure the relation, so the regression without the intercept seems more logical: without letters, the model should generate a zero response.
This simple example demonstrates a deep problem with regression analysis, the main instrument of MMM. In fact, the analysis so far tells us the following:
- the more letters, the fewer sales (because the correlation is negative);
- when the model is set up in the correct way (without the intercept), the more letters, the more sales (the regression coefficient is positive);
- letters explain purchases very poorly, which implies that some other factors should explain the rest;
- the first and second conclusions contradict each other, because the correlation remains negative even under the second model; the third conclusion is incorrect, because we know that no other factors exist;
- one may say that in this example, which reflects a real problem, the statistician would know from the beginning that sales do not depend on other factors, etc., and cannot be confused.
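The sign conflict described in the list above can be reproduced on a toy data set. The following is a minimal sketch, assuming six hypothetical regions whose figures are invented for illustration and are not the case-study data; the variable names (`letters`, `purchases`, and so on) are likewise assumptions. It computes the correlation, the slope of a regression with an intercept, and the slope of a regression through the origin.

```python
import numpy as np

# Hypothetical data, NOT the case-study figures: letters sent and purchases
# observed in six regions. Every region has a positive purchase rate.
letters = np.array([100.0, 200.0, 300.0, 400.0, 500.0, 600.0])
purchases = np.array([20.0, 40.0, 15.0, 44.0, 10.0, 24.0])

# Pearson correlation between letters and purchases.
corr = np.corrcoef(letters, purchases)[0, 1]

# Least squares with an intercept: purchases ~ intercept + slope * letters.
# np.polyfit returns the coefficients from highest degree to lowest.
slope_with_intercept, intercept = np.polyfit(letters, purchases, 1)

# Least squares through the origin: purchases ~ slope * letters.
slope_no_intercept = (letters @ purchases) / (letters @ letters)

print(f"correlation:             {corr:.3f}")
print(f"slope with intercept:    {slope_with_intercept:.5f}")
print(f"slope without intercept: {slope_no_intercept:.5f}")
```

On this toy data the correlation and the slope with an intercept are negative, while the slope through the origin is positive, which is the same pattern as in the first two conclusions.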
What really happened is that there is a third, unobserved direct variable, which could be called the “response/purchase rate”, or yield, and which differs from region to region. It shows that different regions react to promotions differently, which is always the case in real life. If one additionally assumes that response rates do not depend on the volume of letters, then it is clear that the real relationship between letters and sales is positive, because letters can produce only a positive or zero number of purchases, but this relationship is masked by that “third variable”. In statistical terms, regression assumes that a single response rate, given by the regression coefficient, governs the process. That assumption is wrong here, and one should instead try to find a separate coefficient for each market. This is exactly what yield analysis does.
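The argument above can be written out as a short derivation. This is a sketch under stated assumptions: the symbols $s_i$ (purchases), $\ell_i$ (letters) and $y_i$ (yield) for region $i$ are notation introduced here, not taken from the source, and the model is taken to be noise-free.

```latex
% Each region i has its own non-negative yield y_i:
s_i = y_i \,\ell_i, \qquad y_i \ge 0, \qquad i = 1, \dots, n .

% Least squares through the origin gives a single pooled coefficient,
% which is a weighted average of the individual yields (weights \ell_i^2):
\hat{b} \;=\; \frac{\sum_i \ell_i s_i}{\sum_i \ell_i^2}
        \;=\; \frac{\sum_i \ell_i^2 \, y_i}{\sum_i \ell_i^2} \;\ge\; 0 .

% The sample covariance, and hence the correlation, has no such constraint:
\widehat{\operatorname{cov}}(\ell, s)
  \;=\; \frac{1}{n} \sum_i (\ell_i - \bar{\ell}) \, y_i \, \ell_i ,
% which is negative whenever regions with above-average \ell_i
% happen to have sufficiently low yields y_i.
```

Under these assumptions the coefficient of the regression through the origin cannot be negative, whatever the sign of the correlation, and it collapses all regional yields into one average that may describe no single region well.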
Applying the algorithm produces individual yields very close to those used to simulate the data, as the following graph shows. When these yields are applied to the data, the coefficient of determination rises from 8% to 95%.
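The jump in the coefficient of determination can be illustrated with a minimal sketch. The estimation algorithm itself is not described in this section, so the `estimated_yields` below are simply the true yields perturbed by up to 10%, a hypothetical stand-in for whatever the algorithm would return. The six-region data set is also invented for illustration, and the baseline is assumed to be ordinary regression with an intercept.

```python
import numpy as np

# Hypothetical data, NOT the case-study figures.
letters = np.array([100.0, 200.0, 300.0, 400.0, 500.0, 600.0])
purchases = np.array([20.0, 40.0, 15.0, 44.0, 10.0, 24.0])

# True per-region yields implied by the toy data (purchases per letter).
true_yields = purchases / letters

# Stand-in for estimated yields: true yields with fixed errors of up to 10%.
# This is an assumption; the real estimation procedure is not shown here.
estimated_yields = true_yields * np.array([1.10, 0.90, 1.05, 0.95, 1.10, 0.90])


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Baseline: one common coefficient (ordinary regression with an intercept).
coeffs = np.polyfit(letters, purchases, 1)
r2_regression = r_squared(purchases, np.polyval(coeffs, letters))

# Alternative: each region predicted with its own (estimated) yield.
r2_yields = r_squared(purchases, estimated_yields * letters)

print(f"R^2, common coefficient: {r2_regression:.3f}")
print(f"R^2, individual yields:  {r2_yields:.3f}")
```

The point of the sketch is only the mechanism: even imperfect individual yields can explain far more of the variation than one common coefficient when the regions truly respond at different rates.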
This means that anyone who believes that advertising indeed works at different rates in different regions or weeks should admit that regression might seriously distort the real relations, while yield analysis might recover them. This is not to say that it always does as well as in this example, but a number of numerical experiments we performed showed that the correlation between actual and predicted yields is always positive, and often highly positive, i.e., real recovery takes place. Applied to advertising and other marketing activities, this approach is extremely effective for the following reasons:
- one can estimate the real (positive) effectiveness of advertising variables even when they are negatively correlated with sales;
- estimation is possible for variables with very sparse values, where correlations could be misleading anyway;
- estimated yields allow tracking of trends in effectiveness over time and/or space, which gives a powerful tool for managing media: instead of increasing spending, it makes sense to check why the sales rate per dollar is going down;
- this approach has demonstrated extremely good results on several accounts, particularly for a large financial institution.