Monthly Archives: November 2008

What is a Lift Chart?

A lift chart is a way to assess how effectively a model ranks consumers or businesses. Take a look at the following lift chart.


This lift chart can be enriched with the following:
– False positive and true positive measurements
– KS statistic
– C measure and AROC (area under the Receiver Operating Characteristic curve, the plot of the false positive rate on the horizontal axis vs. the true positive rate on the vertical axis); C is the area under the ROC curve and lies between 0.5 and 1, since it is the shaded area above the diagonal (the diagonal represents random assignment) plus the 0.5 area below the diagonal line shown in this curve.

– The curve shows a sharp change in slope at the 4th decile; does that mean the model is not applicable?
– Cumulative AROC, derived from the cumulative tabulation.
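The measurements listed above can all be read off one cumulative tabulation. The following is a minimal Python sketch on made-up scores and outcomes; the function names, the 20 records, and the choice of 10 bins are illustrative assumptions, not the data behind the chart.

```python
# Hypothetical data: 20 scored records, already listed from best to worst score.
scores = [(20 - i) / 20 for i in range(20)]
labels = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]  # 1 = responder


def decile_table(scores, labels, bins=10):
    """Rank by score (best first), cut into equal bins, tabulate cumulatively."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    total_pos = sum(labels)
    total_neg = n - total_pos
    rows, cum_pos, cum_n = [], 0, 0
    for b in range(bins):
        chunk = ranked[b * n // bins:(b + 1) * n // bins]
        cum_pos += sum(label for _, label in chunk)
        cum_n += len(chunk)
        tpr = cum_pos / total_pos            # cumulative true positive rate
        fpr = (cum_n - cum_pos) / total_neg  # cumulative false positive rate
        rows.append({
            "bin": b + 1,
            "tpr": tpr,
            "fpr": fpr,
            "lift": tpr / (cum_n / n),       # cumulative lift over random selection
            "ks": tpr - fpr,                 # gap between the two cumulative curves
        })
    return rows


def ks_statistic(rows):
    """KS is the largest gap between cumulative TPR and cumulative FPR."""
    return max(row["ks"] for row in rows)


def auc_from_table(rows):
    """Trapezoid area under the (FPR, TPR) points of the cumulative table."""
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for row in rows:
        area += (row["fpr"] - prev_fpr) * (row["tpr"] + prev_tpr) / 2
        prev_fpr, prev_tpr = row["fpr"], row["tpr"]
    return area


def c_statistic(scores, labels):
    """Share of (responder, non-responder) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


rows = decile_table(scores, labels)
for row in rows:
    print(row)
print("KS:", ks_statistic(rows))
print("AROC from cumulative table:", auc_from_table(rows))
print("C (pairwise):", c_statistic(scores, labels))
```

The table-based area is an approximation because records inside a bin are treated as tied; the pairwise C uses every individual score.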

From Data Monster & Insight Monster

What is Hyper Segmentation?

Hyper segmentation is a two-step segmentation method.

In the first step, we build a large number of segments (anywhere from 50 to 1,000), called micro segments, that capture a wide variety of consumer groups; the idea is to explain as much variation as possible.

In the second step, we bring this large number of micro segments together into a manageable number of hyper segments, usually between 5 and 12.

The first-level segments can be a detailed list of micro segments running into the hundreds or even thousands.

There are standard statistical methods for the second step: any clustering method can be applied to the micro segments to arrive at the hyper segments.
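As one illustration of the second step, here is a minimal Python sketch that merges micro segments into hyper segments with a simple size-weighted centroid-linkage agglomerative clustering. The eight micro-segment profiles, their sizes, and the target of three hyper segments are made up; any other clustering method could be substituted.

```python
# Hypothetical micro segments: each has a profile (two attributes already
# scaled to 0..1) and a size (number of consumers). All values are made up.
profiles = [
    (0.10, 0.20), (0.15, 0.25), (0.12, 0.18),  # micro segments 0-2
    (0.80, 0.85), (0.82, 0.90),                # micro segments 3-4
    (0.50, 0.10), (0.55, 0.12),                # micro segments 5-6
    (0.78, 0.88),                              # micro segment 7
]
sizes = [1200, 800, 500, 900, 400, 1500, 700, 300]


def hyper_segment(profiles, sizes, k):
    """Merge the two closest clusters (by centroid distance) until k remain."""
    # Each cluster is (member indices, size-weighted centroid, total size).
    clusters = [([i], list(p), s) for i, (p, s) in enumerate(zip(profiles, sizes))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = sum((x - y) ** 2 for x, y in zip(clusters[a][1], clusters[b][1]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        members_a, centroid_a, size_a = clusters[a]
        members_b, centroid_b, size_b = clusters[b]
        merged_centroid = [
            (x * size_a + y * size_b) / (size_a + size_b)
            for x, y in zip(centroid_a, centroid_b)
        ]
        merged = (members_a + members_b, merged_centroid, size_a + size_b)
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return [sorted(members) for members, _, _ in clusters]


hyper_segments = hyper_segment(profiles, sizes, k=3)
for number, members in enumerate(sorted(hyper_segments), start=1):
    print(f"Hyper segment {number}: micro segments {members}")
```

Weighting the centroids by segment size keeps a small micro segment from pulling a hyper segment's profile as strongly as a large one.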

An example of Hypersegmentation: (Demo)



Key Analytics for Marketers

Understand your consumers

– Segmentation methods – Hypersegmentation

– Predictive models – all kinds of regression methods

Test, Test, Test – to align the right consumer with the right message – Experimental designs and expected ROI

Find more of the best consumers (right consumer for the right message) in your consumer base

– Consumer typing – segmentation projection to the population

Find more of the best consumers in the national base

– Consumer typing


Models for Count Variables – An Example of Poisson Regression

Consider the problem of characterizing people, based on 20 attributes, as “someone who exhibits the characteristics of being recession affected” vs. “someone who is not recession affected”.

The data come from market research on, say, 5,000 people who answered, among other questions, questions such as:

– what is your view on the economy, will it recover before some time point, will it recover as much as before year XXXX,…

– are you planning to pay off your debt

– are you planning to postpone major purchases

– are you planning to reduce vacations

So if we want to characterize people by the number of times they say “YES” vs. “NO” (or some similar binary classification), our dependent variable is a count variable. Typically there are four ways to model such a variable:

– Use Poisson regression (it assumes mean = variance)

– Use negative binomial regression (good for handling overdispersion, where the variance exceeds the mean)

– Use a hurdle model (good for handling an excess of zero values)

– Use zero-inflated regression (also for an excess of zero values)
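Before choosing among these four, it can help to compare the sample with what a Poisson distribution would imply. The following is a minimal Python sketch on made-up “YES” counts; the data and the function name are illustrative assumptions.

```python
import math
from statistics import mean, variance

# Hypothetical data: number of "YES" answers (out of 20 questions) per respondent.
yes_counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 10, 12, 16]


def count_diagnostics(counts):
    """Compare the sample with what a Poisson with the same mean would imply."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    return {
        "mean": m,
        "variance": v,
        "dispersion": v / m,                            # about 1 under Poisson
        "zero_share": counts.count(0) / len(counts),    # observed share of zeros
        "poisson_zero_share": math.exp(-m),             # P(Y = 0) under Poisson(m)
    }


diag = count_diagnostics(yes_counts)
for name, value in diag.items():
    print(f"{name:>20}: {value:.3f}")
```

A dispersion ratio well above 1 points away from plain Poisson toward the negative binomial, and an observed zero share well above the Poisson value points toward the hurdle or zero-inflated models. This is only a rough screen; it ignores the covariates that a regression would condition on.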

See “Count Data Models in SAS”, by WenSui Liu and Jimmy Cela, SAS SUGI2008.

PROC QUANTREG – Modeling a Quantile not the Average

PROC QUANTREG builds quantile regression models.

Median regression is one such example. Often it is necessary to build a model for a specific percentile, and this is where PROC QUANTREG comes in handy. Look at the picture, a scatter plot of trout density measured at 71 test sites in a stream, where WD Ratio is the ratio of width to depth. The hypothesis is that the lower the ratio (that is, the deeper the stream relative to its width), the more trout habitat we are likely to see (because disturbances are less likely).

I also plot the 90th percentile; we want the relationship between density and WD Ratio at the level below which 90% of trout habitat densities fall. That is, for a given WD Ratio, we want to estimate the trout habitat density at the 90th percentile. Note that an ordinary linear regression, for example, would give the red line. We are instead interested in a quantile regression: a function that best fits the red dots.

If you plotted the median or the average instead, where would the red dots be? That indicates which curve we are modeling and how it differs from the 90% curve.

To fit a 90th-percentile regression to these data, the SAS code is as follows. The key points are taken from the reference mentioned at the bottom of these notes.
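To make concrete what “best fits” means for a quantile, here is a minimal Python sketch of the check (pinball) loss that quantile regression minimizes. It uses a brute-force fit for a single covariate, relying on the fact that an optimal line passes through at least two data points. This is not the algorithm PROC QUANTREG uses, and the twelve data points are made up, not the trout data.

```python
from itertools import combinations

# Hypothetical data (not the trout data): a ratio and a log-density per site.
x = [10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54]
y = [0.4, -0.9, -0.1, -1.6, -0.5, -2.2, -0.9, -1.2, -2.6, -1.5, -1.9, -3.0]


def check_loss(residual, tau):
    """Pinball loss: tau * r when r >= 0, (tau - 1) * r when r < 0."""
    return residual * (tau - (1 if residual < 0 else 0))


def objective(x, y, intercept, slope, tau):
    """Sum of check losses of the line intercept + slope * x."""
    return sum(check_loss(yi - (intercept + slope * xi), tau) for xi, yi in zip(x, y))


def fit_quantile_line(x, y, tau):
    """Try every line through two data points and keep the smallest objective."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line cannot be written as intercept + slope * x
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = objective(x, y, intercept, slope, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best


for tau in (0.5, 0.9):
    loss, intercept, slope = fit_quantile_line(x, y, tau)
    below = sum(1 for xi, yi in zip(x, y) if yi < intercept + slope * xi - 1e-9)
    print(f"tau={tau}: intercept={intercept:.4f}, slope={slope:.4f}, "
          f"objective={loss:.4f}, points strictly below the line={below}")
```

With tau = 0.9, positive residuals cost nine times as much as negative ones, so the fitted line is pushed up until roughly 90% of the points lie on or below it. The brute-force search costs on the order of n³ operations and is only meant for tiny examples.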

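/* 90th-percentile regression of log trout density on the width-to-depth ratio */
proc quantreg data=trout alpha=0.01 ci=resampling;   /* 99% limits by resampling */
   model LnDensity = WDRatio / quantile=0.9          /* fit the 0.9 quantile */
                               CovB CorrB            /* covariance and correlation of the estimates */
                               seed=12345;           /* fix the resampling seed */
   test WDRatio;                                     /* request a test of the WDRatio coefficient */
run;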

The output is:

The QUANTREG Procedure

Model Information
Data Set                          WORK.TROUT
Dependent Variable                LnDensity    LOG(Density)
Number of Independent Variables   1
Number of Observations            71
Optimization Algorithm            Simplex
Method for Confidence Limits      Resampling

Summary Statistics
                                                      Standard
Variable         Q1     Median        Q3       Mean  Deviation       MAD
WDRatio     22.0917    29.4083   35.9382    29.1752     9.9859   10.4970
LnDensity   -2.0511    -1.3813   -0.8669    -1.4973     0.7682    0.8214

The QUANTREG Procedure

Quantile and Objective Function
Quantile                   0.9
Objective Function         7.2303
Predicted Value at Mean    -0.5709

Parameter Estimates
                            Standard     99% Confidence
Parameter   DF   Estimate      Error         Limits          t Value   Pr > |t|
Intercept    1     0.0576     0.2727   -0.6648    0.7801        0.21     0.8333
WDRatio      1    -0.0215     0.0073   -0.0408   -0.0022       -2.96     0.0042

Tests are not part of the default output; they must be requested with a TEST statement, as above. The procedure uses the simplex algorithm to minimize the objective function and the Markov chain marginal bootstrap (MCMB) resampling method for confidence intervals and tests.

Another Example:
/* Cubic polynomial in years, fitted at the 25th, 50th, and 75th percentiles */
proc quantreg data=salary ci=none;
   model salaries = years years*years years*years*years
                    / quantile=.25 .5 .75;
run;

The data plot and the quantile regression fits are shown here.

For many other examples, and a good amount of theory, see the PDF article from which most of the material here is summarized.
The QUANTREG Procedure – PowerPoint Presentation (Experimental)
PROC QUANTREG – Experimental – SAS document – Material contributing to the above PPT.