
How to Think Like a Deep Learner – Two Basic Figures – Part 1

This discussion assumes some understanding of artificial neural networks (ANNs). If ANNs are not familiar to you, this is not for you, for now. Also, to stay crisp, I will not belabor the nuances of terminology, meaning, and reasoning here; there is a place for that. The goal is to let data analysis practitioners understand the why and the what of deep learning.

Also, the discussion is based on the publication mentioned below. It is a classic, and I encourage ML enthusiasts and statisticians to read it and learn the mind of a deep learning analyst.

Learning Deep Architectures for AI (Foundations and Trends(r) in Machine Learning) Paperback – October 28, 2009

by Yoshua Bengio

from which I picked two important figures to put the key points forward. They show two different aspects of how to think in deep learning. This is a start, not a complete presentation. It should lead to some good questions that further illuminate the nuances.

I chose the order of the figures to suit my purpose, though the book presents them in a different order, as part of a mathematical exposition.

Deep learning is deep because you keep training through the depths until the learning algorithm achieves an acceptable level of predictability.

Learning in a polynomial circuit means the following structure.

Take a look at this Figure.


The input layer is at the bottom; the first-level interactions form the next layer; the addition of all interactions forms another layer. The picture does not directly show other interactions, for example x1x4, or higher-order interactions such as x1x2x4. You may think that such interactions fell off in the feature selection process, and/or that they come through at a higher level with a different activation relationship. For example, at the topmost level of the picture, x1 and x4 come together in a particular way, but that may be an artifact of the construct’s starting point. Nonetheless, the picture, though elegant, is a difficult and incomplete way to show how a deep learner looks at (all) possibilities. That is not an issue, as long as we get the point. The important point is that there are countably infinitely many possibilities! We make the problem finite by cleverly restricting the space of possibilities.
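The layered structure just described can be sketched in a few lines: a first layer that forms products (interactions) of the inputs, and a second layer that sums them. This is only a toy illustration of the product-layer-then-sum-layer idea, not Bengio's exact construction; the function names are mine.

```python
# Toy sketch of a two-layer "polynomial circuit":
# layer 1 forms degree-2 products (interactions), layer 2 sums them.
from itertools import combinations
from math import prod

def interaction_layer(x, degree=2):
    # all degree-sized products of the inputs: x1*x2, x1*x3, ...
    return [prod(c) for c in combinations(x, degree)]

def sum_layer(terms):
    return sum(terms)

x = [1.0, 2.0, 3.0, 4.0]
products = interaction_layer(x)   # six pairwise interactions
y = sum_layer(products)           # 1*2 + 1*3 + 1*4 + 2*3 + 2*4 + 3*4 = 35.0
```

Raising `degree` (or stacking more such layers) is how the higher-order interactions like x1x2x4 would enter.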

In essence, since interactions can be represented as splits in a decision tree, you may think of a deep search process over a specific collection of all possible roots of a tree. One beautiful aspect of deep thinking is that the machine is supposed to learn without labeled training, figuring out the “labels” on its own. The mathematics above helps in that sense.

The second picture I want to share with you is the following.


The above figure explains how to define layers.

The important point here is that every representation, however simple it may look, should be considered a layer. In an unsupervised machine learning context, where we want to remove any trace of human judgment, this rule becomes easier to understand. The result is lots of layers to train.
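To make the every-representation-is-a-layer rule concrete, here is a minimal sketch (the transformations and names are my own illustration): each transformation, however trivial, is treated as its own layer, and the network is simply their composition.

```python
# Each transformation is its own layer; the network composes them in order.
layers = [
    lambda v: [2.0 * x for x in v],       # a scaling "representation"
    lambda v: [max(0.0, x) for x in v],   # a ReLU-style activation layer
    lambda v: [sum(v)],                   # an aggregation/pooling layer
]

def forward(v, layers):
    for layer in layers:
        v = layer(v)
    return v

out = forward([1.0, -2.0, 3.0], layers)  # [8.0]
```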

This is especially needed in unsupervised methods, where learning is very difficult and one accordingly has to keep trying different transformations (indeed, all possible transformations within the construct of a machine).

So the first question is, “When do I stop?”

Stop when the algorithm has learned enough. That is OK, because that is the purpose of the gift of ever-cheaper computing power.
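The stop-when-it-has-learned-enough rule is just a tolerance check inside the training loop. A minimal sketch, with a made-up one-parameter loss:

```python
# Gradient descent on a toy loss, stopping once the loss is "good enough".
def train_until_good_enough(w0, lr=0.1, tol=1e-8, max_steps=10_000):
    w = w0
    for step in range(max_steps):
        loss = (w - 3.0) ** 2          # toy loss with minimum at w = 3
        if loss < tol:                 # acceptable predictability reached
            return w, step
        w -= lr * 2.0 * (w - 3.0)      # gradient step
    return w, max_steps

w, steps = train_until_good_enough(0.0)
```

In practice the tolerance is a validation metric, not the training loss, but the stopping logic is the same.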

Does it take a billion samples? So be it. That attitude is the right one to appreciate. We are solving a difficult problem: how to get a machine to teach itself. Boy, this baby does not need a parent. It learns by itself, though it may take a billion exposures, as in the case of computer vision. That is in no way different from our own evolution.

… Part 2 coming, with programming code and a simple application.

Mathematics of Strategic Metrics, KLIs, and KPIs – The Foundations of Predictive Dashboard

Here are the statistical relationships among Strategic Metrics, KLIs (Key Leading Indicators), and KPIs (Key Performance Indicators).

The interesting thing in this discussion is that strategies built around a set of KLIs will have a definitive, huge impact on the organizational strategic metric, and that the Moneyball statistical relationship carries over even into the predictive time periods. Remember that KLIs are highly predictive measures.

  • The KLIs and the organizational strategic metric are correlated, but an organization’s KLIs will be uncorrelated with the KLIs of competing companies, and hence with the strategic metrics of those enterprises. Think about the reasons. After all, one’s strategic metric should be independent of the competition’s strategic metric because of the uniqueness of products and services and the uniqueness of value segments. The depth of correlation defines the types of strategies you will develop.
  • The strategic metric is a function of vision, an organization’s unique products/services, the value segments the organization is serving, and the budget.
    • KLIs of an enterprise are highly collinear among themselves (or at least defined so as to be collinear, for better interpretation; one can always construct a complementary measure that has a negative correlation but whose impact is still favorable to the organization’s strategic metric) and highly correlated with the strategic metric. KLIs are influential metrics that have a favorable impact on the organization-wide strategic metric.
    • KPIs are highly correlated with, and have a favorable impact on, the strategic metric, though they are not influential metrics.
  • Also, KLIs and KPIs are highly correlated.

Example: Oakland A’s

The strategic metric of the Oakland A’s is OBP (on-base percentage).

Their value segment is their fans, who are mostly local (this is not so simple for other organizations whose fans are spread out around the country). There is a good analytical opportunity here to figure out value segments from this context.

We all know what their budget situation was. Had the budget been of a different order of magnitude, there would be no interesting story here. The Oakland A’s had one third of the budget of the best team in the league, the very team the A’s were daring to dream of beating.

All the metrics associated with hiring, training, fielding, and firing were KLIs; one has to see the movie to appreciate the details.

Ticket sales, for example, are not a KLI. On their own, they will neither make winning the division title happen nor influence it. But as the KLIs start pushing the organization’s strategic metric, any number of KPIs one may define will bloom as after-effects of succeeding at the strategic value creation: more and more winning.

Why wouldn’t you buy this function for the Oakland A’s: Winning = f(OBP), and OBP = f(KLIs)?

A key point I cannot stress enough is that the strategic metric has a lagged relationship with KPIs, and those relationships are weak. Managers should be thinking about KLIs, Key Leading Indicators, to help them accomplish favorable impacts on strategic metrics.
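The lead/lag distinction can be seen numerically. In this hypothetical sketch (all numbers invented for illustration), a KLI series drives the strategic metric one period later, so its correlation with the metric is stronger at lag 1 than in the same period:

```python
# A KLI that leads the strategic metric by one period correlates
# more strongly at lag 1 than at lag 0.
def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

kli = [1, 3, 2, 5, 4, 6, 5, 8, 7, 9]       # leading indicator over 10 periods
metric = [2 * k for k in kli[:-1]]         # strategic metric, one period behind

lag0 = pearson(kli[1:], metric)    # same-period correlation (weaker)
lag1 = pearson(kli[:-1], metric)   # KLI leading by one period (perfect here)
```

Real series are noisy, so the lag-1 correlation would be high rather than exactly 1, but the ordering is the point.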

Go KLIs.

From Big Data To Small Data In Real Time – What does this 1-per-million-part scorecard mean to you?

There are many takeaways. 

Seven powerful metrics that bring a specific type of big data opportunity (security) down into small data.

A 1-per-million-part identification system bringing together different data types in real time: a big data opportunity.

Behavioral interpretation and social analytics are still key to making sense of data, to quickly turning big data into interpretable and usable small data.

I am working on a weighting system, which is what makes this a very effective identification system, better than six sigma; this is my intellectual property.

The following is a difficult thing for me to discuss, as it points to an unsettling prospect of

  • how resourceful organizations are going to be watching you and me going forward with big data, 
  • how privacy is going to be a challenge to maintain, 
  • how we are going to lose our moral superiority. In difficult times like this, after the Boston Marathon, it is important that we have a tool like this, though.

This only underscores why discussions about Type I and Type II errors are becoming more and more important and, added to that, how bias (preconceived notions) can undermine real intelligence.

The following, based on my initial estimates, is a 1-per-million-part identification system, better than six sigma, using big data and a weighting system that would help identify that 1-per-million-part measure: a dangerous person lost in our day-to-day hurly-burly life of innocence, dreams, and celebrations of love and accomplishment.


– Sudden changes in behavior or performance or relationships

Their usual metric of performance will be lost. A student, for example, will get grades that are outliers relative to his or her normal performance, or poor performance evaluations, or will have angry exchanges.

Losing commonly known best friends or romantic relationships

– Sudden changes under the watchful eyes of organizations or people: the person travels to uncommon places and acquires new relationships who are, in turn, under the watchful eyes of security organizations. At least the chatter inside the security organization gets elevated, or elevated chatter comes and goes without sticking to a well-defined resolution of not dropping the ball. Resolving clearly does not mean putting people in jail; it means having an officer report on the latest activities and log the details so that they flow to the right people for the right action, making sure the watchful eyes are not sleeping.

– Unprecedented access changes happening around one’s neighborhood, social relationships, and lifestyle interests that provide opportunities for the dark side to show its head

– Firearms, crude bombs, or illegal activities blip on the intelligence radar, or correlated words such as “bomb,” “firearms,” or “mass danger” materials pop up on the radar

– Physical changes (becoming a post-teenager; things get hardened around this time) or emotional changes at home (death or separation) or with closely related people (friends lost)

– New buying/shopping activity of apparently unrelated items; this will almost always be a second or third blip if the items look innocent but are used as aids to complete the intended action

– Sleep patterns, telephone-call patterns, internet access patterns, even visits to one’s own home change erratically, but consistently from the point when one’s change started

I say these are seven metrics of highly dangerous people, people who badly need help. What caught my attention is that our security agencies were so close to the …. and yet the tragedy happened.

Of course, it takes a lot of fine intelligence to be careful about Type I and Type II errors while also being respectful of citizens’ privacy.

My prayers are with the innocent victims of all ages: an innocent child who cared about kindness, a couple starting their dream life, and an accomplished elderly runner who did not give up marathons at age 78.

From Data Monster & Insight Monster

What is Hyper Segmentation?

Hyper segmentation is a two-step segmentation method.

In the first step, we may have 50 to 1,000 (any number of) segments/groups that capture a wide variety of consumer groups, called micro segments; the idea is to explain as much variation as possible.

In the second step, we bring this large number of micro segments together into a manageable number of hyper segments, usually between 5 and 12.

The first-level segments form a detailed list of micro segments that can run into the hundreds or even thousands.

There are standard statistical methods for hyper-segmentation; any clustering method can be applied to the micro segments to arrive at the hyper segments.
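Here is a minimal pure-Python sketch of the two-step idea, with made-up data and function names: step 1 groups customers into many micro segments by a fine-grained key, and step 2 runs a tiny 1-D k-means over the micro-segment averages to form the hyper segments. In practice you would use a proper clustering library on multi-dimensional customer profiles.

```python
# Two-step segmentation: many micro segments, then a few hyper segments.
from collections import defaultdict

def micro_segment(customers, key):
    """Step 1: many fine-grained micro segments keyed by a detailed profile."""
    groups = defaultdict(list)
    for c in customers:
        groups[key(c)].append(c)
    return groups

def hyper_segment(values, k, iters=50):
    """Step 2: 1-D k-means over micro-segment centroids -> k hyper segments."""
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        buckets = [[] for _ in centers]
        for v in values:
            i = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            buckets[i].append(v)
        centers = [sum(b) / len(b) if b else c for b, c in zip(buckets, centers)]
    return sorted(centers)

spend = [5, 7, 12, 95, 98, 102]                      # toy customer spend values
micro = micro_segment(spend, key=lambda s: s // 10)  # micro segments by spend band
centroids = [sum(g) / len(g) for g in micro.values()]
hyper = hyper_segment(centroids, k=2)                # two hyper segments
```

The same pattern scales up: hundreds of micro segments in step 1, 5 to 12 hyper-segment centers in step 2.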

An example of Hypersegmentation: (Demo)


Key Analytics for Marketers

Understand your consumers

– Segmentation methods – Hypersegmentation

– Predictive models – all kinds of regression methods

Test, Test, Test – to align the right consumer with the right message – experimental designs and expected ROI

Find more of the best consumers (right consumer for the right message) in your consumer base

– Consumer typing – segmentation projection to the population

Find more of the best consumers in the national base

– Consumer typing


Models for Count Variables – An Example of Poisson Regression

Consider the problem of characterizing and identifying people, based on 20 attributes, as “someone who exhibits the characteristics of being recession-affected vs. someone who is not recession-affected.”

The data come from a market research study of, say, 5,000 people who answered, among other questions, questions like:

– What is your view on the economy? Will it recover before some time point? Will it recover to its level before year XXXX? …

– Are you planning to pay off your debt?

– Are you planning to postpone major purchases?

– Are you planning to reduce vacations?

So if we want to characterize people by the number of times they say “YES” vs. “NO,” or some such binary classification, then our dependent variable is a count variable. Typically there are four ways to model it:

– Use Poisson regression (it assumes mean = variance)

– Use negative binomial regression (good for handling overdispersion, i.e., variance larger than the mean)

– Use a hurdle model (a separate process decides whether the count is zero at all)

– Use zero-inflated regression (good for handling an excessive occurrence of zeros)
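For intuition about the first option, here is a minimal pure-Python sketch of Poisson regression with a log link, fit by Newton-Raphson on a single predictor. The data are made up for illustration; in practice you would use a statistics package (e.g., SAS PROC GENMOD, or a GLM routine in Python or R):

```python
# Poisson regression, log link: log(E[y]) = b0 + b1*x, fit by Newton-Raphson.
import math

def fit_poisson(x, y, iters=25):
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]   # model mean (= variance)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))  # score for intercept
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        h00 = sum(mu)                               # Fisher information entries
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det           # Newton update
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

x = [0, 1, 2, 3]    # e.g., a respondent attribute
y = [1, 1, 2, 3]    # count of "YES" answers
b0, b1 = fit_poisson(x, y)
```

At convergence the fitted means sum to the observed counts (the intercept score equation); if the sample variance of y greatly exceeds its mean, that is the overdispersion signal pointing you toward negative binomial or zero-inflated models instead.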

See “Count Data Models in SAS” by WenSui Liu and Jimmy Cela, SAS SUGI 2008.