TEN REASONS WHY MODELS MAY FAIL
Kent Leahy and Nethra Sambamoorthi, Ph.D.
DM NEWS – 1997
Addendum (by Nethra Sambamoorthi – 2003):
“Statistical scoring models are used to rank consumers and customers for marketing purposes. They are very commonly used in B2B analysis, B2C analysis, CRM analysis, enterprise analysis [to rank initiatives, partners (affiliates), and consumers], and customer-interaction analysis. In broad terms, analysis powers Customer Knowledge Management, Customer Interaction Management, B2B marketing (B2B Analytics), and B2C marketing (B2C Analytics). Without CRM analytics, we will never be able to optimize our resources, effort, and time. Furthermore, analytics is the key component of “Marketing Analytics”, “Marketing Technology Automation”, and “Enterprise Analytics”. Fast-forward to better CRM processes, increased CRM productivity, and higher ROI using CRM analytics. Some studies have indicated that analytics contributes as much as 50% of ROI by improving business processes, keeping managers accountable and responsive, and improving productivity.
While there may be more than ten reasons why models fail, we have pointed out the ten most important reasons why models hinder, or even create losses rather than deliver their full benefits, in direct marketing, database marketing, and CRM marketing. For reasons that are easy to understand, some companies will fail more often for one particular reason than for another.”
There are many reasons why direct response predictive segmentation models may do less well than expected, or perhaps even fail, sometimes miserably. Some of these reasons are listed here, though the following list is by no means exhaustive. They are, however, felt to be among the most common the authors have confronted in the industry; by bringing them to the readers’ attention, we hope many of these errors may be avoided in the future. Please note that the list is ordered neither by “severity” nor by “frequency of occurrence”.
(1) The person who will actually be building the model is not included in the initial discussions or design of the model.
This problem is one of the most regularly occurring in the industry. Quite often the modelling methodology is decided independently of the statistician’s input, which can be disastrous at the back end. Well-trained statisticians are indispensable in spotting potential model trouble spots before they become actualized. Research methodology and design are inherently statistical issues for which statisticians have been trained. By limiting their input at the earliest stage, one is merely asking for trouble.
(2) The model has been “overfit” to the sample at hand and, consequently, does not generalize well to the actual mailing population, or is otherwise unreliable.
Typically the mailing results are quite disappointing when this happens. Remember, the mark of an effective modeller is knowing when to “stop”: not being obsessed with obtaining impressive “pre-rollout” gains tables at the expense of real back-end results, and refusing to engage in adventurous “data mining” or otherwise “torturing the data until it confesses”.
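The gap between an impressive pre-rollout gains table and poor back-end results can be made concrete with a deliberately extreme toy model. The sketch below (all data and names invented for illustration) "memorizes" its build sample, scoring perfectly in-sample while offering no real predictive power on a holdout:

```python
import random

random.seed(0)

# Invented data: (random key, random 0/1 response) pairs for a build
# sample and a separate holdout sample.
build = [(random.random(), random.randint(0, 1)) for _ in range(1000)]
holdout = [(random.random(), random.randint(0, 1)) for _ in range(1000)]

# The extreme "overfit" model: memorize the response of every
# build-sample record.
memorized = {x: y for x, y in build}

def accuracy(model_score, data):
    correct = sum(1 for x, y in data if model_score(x) == y)
    return correct / len(data)

# In-sample the lookup is perfect; on the holdout it has never seen
# any record, so it falls back to guessing a single class.
in_sample = accuracy(lambda x: memorized[x], build)
out_sample = accuracy(lambda x: memorized.get(x, 0), holdout)
print(in_sample)   # 1.0 -- the "pre-rollout gains" look perfect
print(out_sample)  # close to 0.5 -- no better than chance
```

A real model overfits less flagrantly, but the remedy is the same: judge it on a holdout sample it never saw, not on the file used to build it.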
(3) The circumstances surrounding the actual mailing change or the mailing environment turns out to be substantially different from the one on which the model was built.
Economic changes, seasonal variations not captured in the model, and lessened demand for the product are just a few of the many “extra-sampling” reasons why a model may fail, and fail badly. It is not so much the changes per se that are problematic, although they can obviously have a negative impact on a mailing, but the fact that the effect of such changes may not be constant across differing levels of the model predictors. Under such circumstances, the model could end up selecting people considerably less likely to respond than originally anticipated. This is why models require periodic updating.
(4) The model is used as though it were ‘generic’ or ‘universally applicable’.
For example, the model might be developed using a particular product mix and then used in a promotion where a different mix is offered. Other changes foisted on the model that may have a negative impact on the mailing include using a different “creative”, a different “package”, or even applying the model to a different population. One actual example comes to mind from a while ago: the results of a mailing for a term life insurance policy with face-value choices of $10,000, $15,000, or $25,000 were used to develop a model that was later applied to a mailing with policy offerings considerably more attractive to a more “upscale” audience (i.e., $15,000, $35,000, or $75,000). As might have been expected, the model selected the opposite of the best prospects! The lesson to take away is that a model is not impervious to changes in the conditions under which it was built.
(5) Changes in the mailing environment in conjunction with the use of an ‘overfitted’ model.
Some of the reasons a model may fail are only (or primarily) problematic when they occur in the presence of one or more other problems. For example, changes in the mailing environment of the roll-out could very well be innocuous were it not for the use of an “overfitted” model, which does not allow for even minor deviations from the model-building environment. Thus the deleterious effect of the simultaneous occurrence of the two conditions is greater than the sum of their individual effects. The “overfitting” problem, for example, invariably exacerbates many other problems that might be present, in addition to being a prime reason itself why a mailing might fail.
(6) The model contains “post-event” variable(s), or those that occurred after the event you are trying to predict.
For example, suppose you are trying to predict who is likely to purchase a new car next year based upon this year’s behavior. You build a model and find that “age of car” is an excellent predictor; in fact, it appears to be too good a predictor. Unfortunately, unbeknownst to you, auto records on your database are updated every six months, and individuals who bought cars this year have a car-age value representing the car they just bought. Obviously such data will be an excellent predictor of this year’s auto purchases, but it won’t do very well next year, except for those who buy a car every year.
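One practical safeguard against post-event variables is to compare each candidate predictor's last-update date against the start of the outcome window and exclude anything refreshed after it. The sketch below uses invented field names and dates to illustrate the check:

```python
from datetime import date

# Illustrative outcome window: we are predicting purchases made on or
# after this date, so predictors must predate it.
outcome_window_start = date(2024, 1, 1)

# Hypothetical predictors, each tagged with its last-update date.
predictors = {
    "age_of_car":       date(2024, 6, 30),  # refreshed mid-year: leaks the purchase
    "household_income": date(2023, 9, 15),
    "prior_purchases":  date(2023, 12, 31),
}

# Keep only variables whose values were frozen before the outcome window.
safe = {name for name, updated in predictors.items()
        if updated < outcome_window_start}
leaky = set(predictors) - safe
print(sorted(safe))   # ['household_income', 'prior_purchases']
print(sorted(leaky))  # ['age_of_car']
```

When update dates are not stored per field, the same reasoning still applies: ask, for every predictor, whether its value could have been written after the behavior you are predicting occurred.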
(7) Not ‘test-scoring’ the model, or making an error when implementing the model.
Nothing is more disastrous, nor easier to do, than making an error when scoring the prospective mailing file with a scoring algorithm. One can build the best and most reliable model in the world, and it can still self-destruct if it is not implemented properly. From making an error in re-coding a variable to inserting the wrong model weights, implementing the model can be a veritable minefield. One way to greatly reduce the possibility of such errors is to have the algorithm tested on the file that was used to build the model. If both the model and the algorithm produce the same “gains table” counts, then one can be reasonably assured that the mailing file will be scored correctly. One note of caution, however: sometimes a new or second file, different from the one the model was built on, is used to test-score the algorithm. The danger here is that if the same transcription error is made when formulating both the original model and the scoring algorithm, the model implementation could be in error despite testing out correctly. This is why it is preferable to use the original model-building data file for the test. A general rule for test-scoring is to minimize human transcription of instructions, which can be accomplished by using “cut and paste” operations whenever possible.
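The test-scoring idea above can be sketched in a few lines. In this hypothetical example (variable names and weights are invented), the production scorer is re-run over the build file and its scores are compared record by record against the fitted model's; a transcription error such as swapped weights is caught immediately:

```python
def score_model(record):
    # Weights as fitted in the model (illustrative values).
    return 0.8 * record["recency"] + 1.2 * record["frequency"]

def score_algorithm_bad(record):
    # Transcription error: the two weights were swapped when the
    # production scorer was coded.
    return 1.2 * record["recency"] + 0.8 * record["frequency"]

# A stand-in for the original model-building file.
build_file = [{"recency": r % 7, "frequency": r % 5} for r in range(100)]

def mismatches(scorer):
    """Count records where the scorer disagrees with the model."""
    return sum(1 for rec in build_file
               if abs(score_model(rec) - scorer(rec)) > 1e-9)

print(mismatches(score_model))          # 0: scores reproduce exactly
print(mismatches(score_algorithm_bad))  # nonzero: the error is caught
```

Comparing raw scores is an even stricter check than comparing gains-table counts, since two different scorings can occasionally produce identical decile counts by coincidence.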
(8) Failing to run an audit of the file as the first step in the model-building process.
More often than not the model-builder is not presented with a “clean” file with which to build the model. Such “messy data” is typically riddled with everything from observations with missing values to records whose values exceed the maximum possible values for a variable, and everything in between. The first step in building a “workable” model, therefore, is always to check the file for such problems. Doing so at the start of the model-building process can save a lot of heartache at the end.
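A minimal audit of the two problems named above (missing values and impossible values) might look like the following sketch, where the field names, records, and valid ranges are all invented for illustration:

```python
# Hypothetical records pulled from a "messy" file.
records = [
    {"age": 34,   "income": 52000},
    {"age": None, "income": 48000},
    {"age": 212,  "income": 61000},   # age exceeds any possible value
    {"age": 45,   "income": None},
]

# Assumed plausible ranges for each variable.
valid_ranges = {"age": (0, 120), "income": (0, 10_000_000)}

# Tally missing and out-of-range values per field.
audit = {}
for field, (lo, hi) in valid_ranges.items():
    missing = sum(1 for r in records if r[field] is None)
    out_of_range = sum(1 for r in records
                       if r[field] is not None and not lo <= r[field] <= hi)
    audit[field] = {"missing": missing, "out_of_range": out_of_range}

print(audit)
# {'age': {'missing': 1, 'out_of_range': 1},
#  'income': {'missing': 1, 'out_of_range': 0}}
```

Frequency tables and min/max summaries for every variable serve the same purpose; the point is that the audit runs before any modeling does.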
(9) A consensus on just exactly what the model is expected to predict (and for which audience) is not reached and/or well understood.
This may sound elementary, but many times models are built that end up being “shelved” because they predict outcome measures that were not intended, or predict correct outcome measures for the wrong population. The major reason for this is a lack of communication between interested parties. One way to help prevent this is to develop a “model specification form”, in which all pertinent information, including the audience, outcome measure, and so on is explicitly stated. In this way, the likelihood of inappropriate models being built can be substantially minimized.
(10) The model performs well but the mailing itself is not a financial ‘success’.
This is ordinarily the result of a lack of financial planning, or of insufficient attention being paid to the financial or “economic” aspects of the mailing, including such things as the marginal cost per piece mailed, the marginal revenue needed to reach certain financial objectives, and the depth of file that should be mailed to be optimally profitable. Although these considerations are in reality outside the domain of the model itself, many a model has been said “not to have worked” despite the fact that it did all that could be expected of it. The use of a viable segmentation model is in itself no guarantee of a financially successful mailing; only careful financial planning in a conducive economic setting can provide that.
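The depth-of-file arithmetic can be made concrete with a small sketch. All dollar figures and response rates below are invented for illustration: mail down the model-ranked deciles only as long as the expected marginal revenue of a decile exceeds its marginal cost.

```python
# Assumed economics of the mailing (illustrative values).
cost_per_piece = 0.50
revenue_per_response = 40.0

# Model-ranked deciles, best first, with hypothetical response rates.
decile_response_rates = [0.040, 0.030, 0.022, 0.016, 0.012,
                         0.009, 0.007, 0.005, 0.004, 0.003]

# Mail each decile while its expected revenue per piece beats the cost.
profitable_depth = 0
for rate in decile_response_rates:
    if rate * revenue_per_response > cost_per_piece:
        profitable_depth += 1
    else:
        break

print(profitable_depth)  # 4 deciles clear the marginal test here
```

Under these assumptions the break-even response rate is 0.50 / 40.0 = 1.25%, so only the top four deciles are worth mailing; a perfectly good model would still "fail" financially if all ten were mailed.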