Monthly Archives: August 2012

Analytic Competitiveness Score – What is Your Organization’s Score?


Costliest Lesson in Web Analytics, a $45 Billion Plus Loss

Best Buy lost $9 billion and Sony $33 billion in the last 18 months.

I have been following the biggest debacles of enterprises, and I want to bring out the analytics connections behind these debacles (debacle: a complete, sudden collapse or failure).

Prof. Davenport and Harris, in their book Competing on Analytics, point out the following.

There is so much to think about in the various statements in the above tabulation that it could fill a half-day talk using various case studies.

I take these as the basis of my discussion in the following, enhancing them with my own interpretation and an implementation pathway: a five-step process to take your organization through analytic competitiveness.

  1. Have a unique product/service.
  2. Start right away with the right metrics/KPIs that define your business model, and stick with those metrics in defining success, success driven by consumer immediacy.  Moneyball is a great story on the importance of getting the right metric first, the one that will take your organization to success.
  3. Get senior management committed to value through competitive analytics, prioritizing projects in support of (2).
  4. Continue to use analytics to improve your distinctive capability in line with lifestyle and market dynamics.
  5. Let real-time data intelligence feed customer-oriented product innovation and customer and market relevance across the whole organization.

The interesting and telling point is that analytics can itself be a great differentiator, which means this is a success path even if you do not score positively on (1).

While these are the primary dimensions, there is a host of sub-dimensions under each of them that can help us get a weighted analytic competitiveness score, as sketched below.
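As a minimal sketch of what such a scoring could look like, here is a small Python example; the dimension names, weights, and 0-10 sub-dimension ratings are purely illustrative assumptions mapped onto the five steps above, not an established rubric.

    # Weighted analytic competitiveness score, a minimal sketch.
    # Dimensions follow the five steps above; weights and the 0-10
    # sub-dimension ratings are illustrative assumptions only.
    dimensions = {
        "unique_product_or_service":       (0.15, [7, 5]),
        "right_metrics_from_day_one":      (0.25, [8, 6, 7]),
        "senior_management_commitment":    (0.25, [4, 5]),
        "continuous_analytic_improvement": (0.15, [6]),
        "real_time_data_intelligence":     (0.20, [3, 4, 5]),
    }

    def competitiveness_score(dims):
        """Weighted average of the mean sub-dimension rating per dimension."""
        total_weight = sum(w for w, _ in dims.values())
        raw = sum(w * sum(subs) / len(subs) for w, subs in dims.values())
        return raw / total_weight  # normalize in case weights do not sum to 1

    print(f"Analytic competitiveness score: {competitiveness_score(dimensions):.2f} / 10")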

Do you know what your score is?


Predictive Modeling in a Complex Competitive Market Environment – P&C and the Viability of an Integrated Modeling Platform


For the last four years, predictive sciences in some form or other have figured in the “Top 10 Casualty Actuarial Stories”, and I also see a lot of hiring going on in the P&C insurance space for analysts. In this write-up I summarize some salient features of predictive modeling opportunities for analysts in this space, along with additional professional opportunities.

The first of these is a paper by Jun Yan, Ph.D., Mo Masud, and Cheng-sheng Peter Wu, FCAS, ASA, MAAA.

In their call to the CAS, in their article “Staying Ahead of the Analytical Competitive Curve: Integrating the Broad Range Applications of Predictive Modeling in a Competitive Market Environment”, they gave directions for building predictive models on three fronts:

Pricing Models
Underwriting Models
Marketing Models

They also discussed an integrated platform for all three; the focus of their article is especially on how to integrate all of the above in a P&C operation.

Part of the reason why P&C insurers have difficulty in integrating these three parts is the following.

Results and profiles from ‘exposure level’ pricing models must be rolled up to the ‘policy level’ for the underwriting and marketing models and their results and profiles.  Also, marketing acquisition models may (though not always) involve the acquisition of new risks in terms of types and regions.  Insurers may not have any data experience for these new segments, so integrating the models and results based on historical information with the forward-looking new-risk acquisition models can be a difficult task.
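As a minimal sketch of that roll-up step, assuming hypothetical column names and a simple exposure-weighted aggregation (the paper does not prescribe a specific scheme):

    import pandas as pd

    # Hypothetical exposure-level model output: one row per exposure
    # (e.g., per vehicle or building) with a predicted loss cost.
    exposures = pd.DataFrame({
        "policy_id":      ["P1", "P1", "P2", "P2", "P2"],
        "exposure_id":    ["e1", "e2", "e3", "e4", "e5"],
        "exposure_wt":    [1.0, 0.5, 1.0, 1.0, 0.25],
        "pred_loss_cost": [320.0, 180.0, 450.0, 95.0, 60.0],
    })

    # Roll exposure-level results up to the policy level so the
    # underwriting and marketing models can consume them.
    policy_level = (
        exposures
        .assign(weighted_loss=lambda d: d["pred_loss_cost"] * d["exposure_wt"])
        .groupby("policy_id")
        .agg(total_pred_loss=("weighted_loss", "sum"),
             total_exposure=("exposure_wt", "sum"))
    )
    policy_level["avg_loss_cost"] = (
        policy_level["total_pred_loss"] / policy_level["total_exposure"]
    )
    print(policy_level)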

Also, per the main article, insurers often, especially in commercial lines, “do not have their own GLM-based pricing models.” That is a huge opportunity, because only by differentiating your product from your competition do you become financially viable and earn a reason to exist.

The authors view the ability to do the above as a market-leading differentiation in P&C.

The key introductory document analysts have to read to get exposure to the important P&C terminology on pricing models is the following.

A Practitioner's Guide to Generalized Linear Models is a well-known article published by the CAS, and it satisfies the ‘minimum bias’ operational best practices that actuaries follow.  GLMs are the industry best practice in pricing and underwriting predictive models across PPA (private passenger auto) and other personal lines, and they are also making inroads in P&C marketing predictive models.
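As a minimal sketch of the kind of claim-frequency GLM the guide describes, here is a Poisson regression with a log(exposure) offset in Python's statsmodels; the data and column names are simulated assumptions, not taken from the guide.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Simulated personal-auto policy data (illustrative only).
    rng = np.random.default_rng(0)
    n = 5000
    df = pd.DataFrame({
        "driver_age": rng.integers(18, 80, n),
        "urban":      rng.integers(0, 2, n),
        "exposure":   rng.uniform(0.25, 1.0, n),  # earned car-years
    })
    lam = np.exp(-2.0 + 0.4 * df["urban"] - 0.01 * df["driver_age"]) * df["exposure"]
    df["claim_count"] = rng.poisson(lam)

    # Claim-frequency GLM: Poisson family, log link, log(exposure) offset.
    X = sm.add_constant(df[["driver_age", "urban"]])
    freq_model = sm.GLM(df["claim_count"], X,
                        family=sm.families.Poisson(),
                        offset=np.log(df["exposure"])).fit()
    print(freq_model.summary())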

For a highly mathematical and authoritative establishment of the best in class modeling approach of GLM in P&C, see

A SYSTEMATIC RELATIONSHIP BETWEEN MINIMUM BIAS AND GENERALIZED LINEAR MODELS

On the marketing side, for why policyholder retention matters and how to model it, see http://www.casact.org/education/ratesem/2004/handouts/modlin2.pdf

These are unique opportunities for applying predictive modeling in P&C, and while you are an analyst, if you really like the profession, you can write the CAS exams to become an actuarial analyst.

Best
Nethra

From Data Monster & Insight Monster

The Best Job Opportunities of the Future – Data Intelligence – Actuaries, Statisticians, and Analysts

Do you want to be in a field that assures well-paying jobs in the next decade and beyond, jobs that have less stress and leave more time for yourself?

This is based on the following article, with a twist on those three dimensions.

While this gets our attention, it is not that predictive, for the following reasons.

It completely leaves out the computational sciences and the big data movement, its two biggest shortcomings.

The computer science field now has two forks:

Either become a software engineer or become a data scientist.

So enjoy reading this with the caution that it totally misses the predictive component.

My Newsletter – Big Data, Data Mining, Predictive Modeling, Visualization

I wanted to bring out a newsletter carrying a key collection of articles that are not usually available in one place but can be made available within 24 hours of publication, without any commentary.

This newsletter serves that purpose.

http://www.scoop.it/t/big-data-data-mining-predictive-modeling-visualization

This is a news aggregation on the topic of my blog.  It is essentially meant to give quick access to publications appearing in my field.

I have been accumulating articles for the last 3 months and will continue to do so.  If you have any news or reference for me to include, please do not hesitate to send me a note and I will add it to the aggregator newsletter so that it can be a community collection! – thanks

Nethra 

From Data Monster & Insight Monster

Real Time Market Research – The New Trend

A disruptive technology is getting its application in marketing intelligence and brand understanding.

“Few marketing challenges are tougher than identifying and influencing what drives customers’ attitudes and behavior. Traditionally, executives have relied on a combination of quantitative data from surveys (such as those that track customer satisfaction and brand image) and qualitative insights from focus groups and interviews.”

In the latest issue of Harvard Business Review, the authors bring out the importance of real-time experience tracking (RET), its concepts and the associated engineering, because of:

– the bias and inaccurate reporting due to memory decay/faults in traditional market research (quantitative product preferences and usage) and focus groups (qualitative brand research)
– the inordinate time delay between the research and the use of the intelligence

The beauty of this development is that only four different bits (maybe bytes) of information need to be sent through the participant's smartphone to the implementing organization's data capture from each encounter (a sketch of such a record follows the list):

– the brand involved,
– the type of touchpoint (TV ad, say, or call to the service center),
– how the participant felt about the experience, and
– how persuasive it was. (Did it make the customer more inclined to choose the brand next time?)
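As a minimal sketch of such an encounter record, with hypothetical field names and rating scales (the article does not publish a schema):

    from dataclasses import dataclass, asdict
    import json

    # One RET encounter: the four bits of information listed above.
    # Field names and 1-5 scales are illustrative assumptions.
    @dataclass
    class RetEncounter:
        brand: str           # the brand involved
        touchpoint: str      # e.g., "tv_ad" or "service_center_call"
        feeling: int         # how the participant felt (1 = very negative, 5 = very positive)
        persuasiveness: int  # more inclined to choose the brand next time? (1-5)

    event = RetEncounter(brand="BrandX", touchpoint="tv_ad", feeling=4, persuasiveness=3)
    print(json.dumps(asdict(event)))  # the compact payload a smartphone app might send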

This is getting management attention at many companies, including FOX, Unilever, PepsiCo, Schweppes, HP, Energizer, Microsoft, InterContinental Hotels, …

This article points out many developments, including the top insight bytes for market research.

You may see more details here: 

Better Customer Insight—in Real Time

from which the quoted notes are selected. 

From Data Monster & Insight Monster

How to Judge a New Competing Predictive Model

Fig – Standard Lift Chart

We innovate analytical solutions, oftentimes driven by competitive opportunities.

It is in fact possible that your new model's R-square or KS is better than that of the previous gold-standard model, a model considered the gold standard because of its stability, its wide applicability across all segments, and its legally acceptable variables.

Usually, in the gold-standard model, the variables are also easily and consistently interpretable under various consumer and market dynamics.

So when we build new models or create a wonderful new innovation in modeling, we need to test against the gold standard, make judgments, and support the management with the implementation of the new model.

When you compare the new model with the gold standard, a simple R-square or KS is not good enough, though it is the start of getting attention. The devil is in the details.

This note is all about innovative solutions in marketing and medical testing, where AUROC (area under the ROC curve) and the measures associated with such tools are considered the method of judging predictive models.

The following is a classic industry call.

An alternative to FICO scores?

Or your boss asks how the best new model compares with the gold-standard model that he has been using for 20 years.

Here are three situations.  These illustrations are meant to explain some important concepts.

  1. The Gold standard uniformly beats the New model (Fig 1).  The gold-standard lift chart is drawn in gold and the new model's lift chart in green.

  2. Gold beats the New in the key top segments (Fig 2).  AUC is the same for both ‘Gold’ and ‘New’.  However, the ‘New’ is discriminating the non-targets (also called references) better than the ‘Gold’, while ‘Gold’ is discriminating the targets better than the references.

  3. Both Gold and the New have the same KS and yet totally different interpretations. This is an intricate one; see where the KS attains its maximum in ‘Gold’ vs. ‘New’, and note that the AUC is also the same (Fig 3).

When I have competing solutions, I look for the following as an initial inquiry.

– To do the comparison we need results from hard (gold vs. new) data for comparable metrics.  AUC and KS are good but go only so far (a sketch of computing both follows this list).

– State clearly the population of the model sample and the validation sample.

– Define the sub-populations where the model is more successful and where it is less successful; use comparative confusion matrices to show how the various sub-segments differ.

– Produce lift charts for both the full population and the sub-populations to graphically bring out the range of deciles/demi-deciles.  This in turn will bring out, via profile analysis, the ranges of explanatory variables where one model is superior to the other.

– Stress-test the elasticity of the lift charts for macro-economic factors, combined with interpretation of consumer/marketing moral-hazard situations.

– How do KS and AUC get redefined if profit maximization is the goal?
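As a minimal sketch of the first check, here is one way to compute AUC and KS for two competing scorecards on the same validation sample, using simulated scores (the KS here is the standard max(TPR − FPR) read off the ROC curve):

    import numpy as np
    from sklearn.metrics import roc_auc_score, roc_curve

    # Simulated validation scores for a "gold" and a "new" model;
    # y = 1 marks targets, y = 0 marks references (non-targets).
    rng = np.random.default_rng(42)
    y = rng.integers(0, 2, 10_000)
    gold = np.where(y == 1, rng.normal(1.0, 1.0, y.size), rng.normal(0.0, 1.0, y.size))
    new  = np.where(y == 1, rng.normal(0.9, 0.8, y.size), rng.normal(0.0, 1.2, y.size))

    def ks_statistic(y_true, score):
        """KS = maximum gap between the target and reference score
        distributions, i.e. max(TPR - FPR) along the ROC curve."""
        fpr, tpr, _ = roc_curve(y_true, score)
        return np.max(tpr - fpr)

    for name, score in [("gold", gold), ("new", new)]:
        print(f"{name}: AUC = {roc_auc_score(y, score):.3f}, "
              f"KS = {ks_statistic(y, score):.3f}")

Note that identical AUC or KS numbers can still hide the situations in Fig 2 and Fig 3 above, which is exactly why the remaining checks in the list matter.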

Remember, every model has Type I and Type II errors, which are captured in false positives and false negatives.  It is indeed possible that one model is better than another, especially if you start using highly non-linear methods such as random forests and ensemble methods along with boosting; they might become a new gold standard. However, judging against a gold standard requires careful attention and understandable interpretations using hard data.

Also, I use a bound to define my best model; the bound is defined in the following graph.

Within the defined bound, both KS and AUC will select the same best model.  If there are sharp edges in the lift curve, the lift curve is smoothed before we apply the bounds for selecting ranked risks, ranking on the basis of expected profits, or ranking responders.
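Since the bound itself is defined in a graph, here is only a generic sketch of the smoothing step, assuming a simple centered moving average over demi-decile capture rates; the actual smoother and bound may well differ.

    import numpy as np

    def smooth_lift(cum_capture, window=3):
        """Centered moving average over a cumulative lift curve; the
        first and last points are pinned so the curve still starts and
        ends at the observed capture rates."""
        kernel = np.ones(window) / window
        smoothed = np.convolve(cum_capture, kernel, mode="same")
        smoothed[0], smoothed[-1] = cum_capture[0], cum_capture[-1]
        return smoothed

    # Cumulative share of targets captured per demi-decile (illustrative).
    cum_capture = np.array([0.09, 0.17, 0.24, 0.33, 0.38, 0.46, 0.51, 0.58,
                            0.62, 0.68, 0.72, 0.77, 0.80, 0.85, 0.88, 0.91,
                            0.94, 0.96, 0.98, 1.00])
    print(np.round(smooth_lift(cum_capture), 3))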

In marketing problems, this bound is more understandable and more likely to be usable than in risk modeling, because in marketing the management wants to target only the top groups.  In risk modeling it is possible that every segment has its own offers, and hence the management wants to go beyond the intersecting point.

What is the solution if KS/AUC alone are not to be used?

Because ranking and selection are so fundamental in marketing and in many other areas of human endeavor, the trick of the trade is to convert your metric into a ranking and selection problem; hence it would be surprising if KS/AUC were not the starting point of your goodness measure for ranking and selection.

It would be fun to hear about your opportunity area where a variation of KS/AUC is needed, or where KS/AUC is totally not applicable.

Have fun.

From Data Monster & Insight Monster

The Evil Twin of Predictive Modeling

To get access to my world-class team, who can address your opportunity areas, and to network with such luminaries:

– Join my group (use the ‘Join this site’ link on the right)
– Leave comments and questions that might help everyone else.

Thanks

====================================================================

So we saw the ‘Fundamental Theorem of Predictive Modeling’.  It should give you the answer as to what the evil twin of predictive modeling is.

People build great models; they mention how nicely they got the R-square or a great KS, show all the paraphernalia of the model to convince the managers or clients, and close the model with a scoring function, in the end mostly supported by the explained part, or the high KS and a nicely decreasing trend of the KS chart.
What is not discussed in the delivered results is

– the part called “not explained by the model”: the first term on the right-hand side of the variance decomposition formula. This needs more explaining, but before that, take a look at the picture that artfully shows the explained and unexplained parts.
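For reference, the decomposition in question for ordinary regression, written with the unexplained term first to match the sentence above:

    \sum_i (y_i - \bar{y})^2 \;=\; \underbrace{\sum_i (y_i - \hat{y}_i)^2}_{\text{unexplained}} \;+\; \underbrace{\sum_i (\hat{y}_i - \bar{y})^2}_{\text{explained}},
    \qquad R^2 \;=\; 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}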

I am bringing this one from Dr. Chuanhua Yu (as noted at the bottom of the picture), which I liked for its focus and artistry; in it you can see all the little details for ordinary regression.  See how the unexplained deviation is calculated, both algebraically and in its representation in the graph.

The unexplained part is where the expert's commitment comes in: explaining why the predictive model is likely to fail, how often, and how devastatingly it could turn out in the field when we apply it.

I know it is not easy to say that one's own work does not measure up to the celebrity status it will potentially receive based on a naive application of the model's goodness of fit.  But if it is not done, when the results come back from the campaign conducted in the field, the model will indeed show its ugly side, the evil-twin side of predictive modeling.

I can create a beautiful regression model with an R-square of 0.99, and yet its applicability could be limited to a range of x values that is useless for some specific application of the model. The difficulty happens either because we show only the sunny side and not the dark side, or because readers with self-interest pick up only the sunny side of the story and communicate it downstream.
On the technical side, this happens because the structure of the model might change completely, for any number of practical reasons, outside the range where the model was developed and validated.

Sometimes out-of-time validation helps, and yet even that does not capture the full story.

That brings us to the second part of the explanation: bias in predictive modeling.  This is the most insidious and hidden part of modeling, rarely discussed explicitly.

The bias is not represented in the above graph, nor is it usually discussed in textbooks.
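One standard way to make it explicit (a textbook fact, not from Dr. Yu's picture) is the mean-squared-error decomposition of a prediction, where $y = f(x) + \varepsilon$ and $\hat{f}$ is the fitted model:

    \mathbb{E}\big[(y - \hat{f}(x))^2\big] \;=\; \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} \;+\; \underbrace{\mathrm{Var}\big(\hat{f}(x)\big)}_{\text{variance}} \;+\; \sigma^2_{\varepsilon}

A biased sample shifts $\mathbb{E}[\hat{f}]$ away from the truth without necessarily hurting the in-sample fit, which is exactly why a great R-square can coexist with a badly biased model.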

I evaluate hundreds of models in a year, generally in low seasons, and as an adviser to various organizations I see many, many vendors' work.

As a general practice, I never see analysts even mention, with a few bullet points, what the hidden assumptions in the unexplained part are, nor sign off saying that certain assumptions are not applicable and the results are safeguarded, and hence that we are safe in applying the predictive model.

Though we call this the unexplained part or unexplained variation, for functional purposes it is everything that constitutes the ugliest part of model building, including the following:

– specification error
– correlated errors
– selection bias
– correlated predictors
– heterogeneity
– simultaneity, lagged, and networked variables
– bias in the original sample survey
– the consumer and market dynamics (this I have addressed under ‘Why predictive models fail’).

There are more sub-topics here, the whole science of modeling; I am just trying to give some shades of the evil twin.

Some of the above topics are simpler and not so ugly, and some could mislead you into confidently spending resources and time.

A few words of caution on sample bias:
Bias in the sample happens most often when you are building models from surveys or when you have only partial access to your consumer data, and that is the one I will address today.

One way to address sampling bias is to weight the sample by its population representativeness, which survey researchers provide you.  Sometimes they may not provide it, for any number of reasons, including calling it proprietary material.  There are ways to circumvent that challenge.

Sampling weights are useful only if there are reasonably representative samples, in the survey or in the consumer transaction data, for all the consumer segments you are interested in.  Sometimes there may be no representative sample, or even if there is, it may be so spotty that the weighted calculation will introduce too much variance, or too little, artificially giving you false confidence without your knowing it.
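One concrete diagnostic for that spottiness (standard survey practice, not from this post itself) is Kish's effective sample size, $(\sum w)^2 / \sum w^2$. A sketch with illustrative data:

    import numpy as np

    def kish_effective_n(w):
        """Kish's effective sample size: (sum w)^2 / sum(w^2).
        An effective n far below the respondent count flags a segment
        whose weighted estimates carry artificially high variance."""
        return np.sum(w) ** 2 / np.sum(w ** 2)

    # Illustrative segment: 200 respondents, but five carry extreme
    # weights because they stand in for a large, under-sampled stratum.
    rng = np.random.default_rng(1)
    x = rng.normal(50, 10, 200)   # some survey measure
    w = np.ones(200)
    w[:5] = 40.0

    weighted_mean = np.sum(w * x) / np.sum(w)
    print(f"weighted mean : {weighted_mean:.2f}")
    print(f"respondents   : {len(w)}")
    print(f"effective n   : {kish_effective_n(w):.1f}")  # ~19, far below 200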

So when you receive a predictive model, the following questions can help protect you when you apply it.  A great R-square or KS, or any other variation of goodness of fit, is only half the story.

“How do we know the prediction will work in the field? What built-in fail-safe mechanisms are addressed in the modeling?”

These are not easy questions, but they help you stay wary of the evil twin.

Have fun and have a wonderful weekend.

Next week, I will bring out an amazing social media analysis tool which is absolutely free. How do we get it free? Well, Uncle Sam in all his benevolence funds such companies, as he did for the internet, and one of the promises of such funding is that the result is available to the public.

See also: Bias in Sample Surveys

From Data Monster & Insight Monster