Monthly Archives: December 2014

Solution Implementation Issues… Notes from the Trenches

Today I saw the following.

I ended up putting my notes here because my appreciation of these observations and analysis by Dr. Biernbaum, PhD, on LinkedIn Pulse did not fit into a short comment.  So here it is.


My thoughts here are from the point of view of developing and implementing analytical solutions.  Your approach is usable in many constructs; I think of your notes as part of “Design Thinking”.  Thanks.

I love the statement, “Our goal in all of these activities was to gather criticism of our program, and love it.”  It makes this a systematic process of seeking out all possible data points, taking a dispassionate, hawk-like look at observable processes, specifically at the time of implementation.

Thanks for bringing out an active, involved, owned approach to implementation.  In the absence of this type of approach, IT or management will own this very critical process and act as both knower and doer, unfortunately.

So I see it as a thorough involvement and interfacing of the analyst and the product/solution developer.  Unfortunately, these ideas and steps are not taught in graduate schools, whether in a statistics program (where collection and analysis of data are stressed in a designed way, facilitating a structured and thoughtful approach) or a computer science program (where automation of data collection and analysis is stressed, from an unstructured, natural-flow point of view).  Interpretation of the model follows, and usually the implementation moves to the project’s sponsoring department, a black hole from the developer’s point of view.

Hopefully, hybrid analytics academic programs will give importance to these ideas.

What Helps in Defining the Right Elements in Data Collection? An Analytics Interpretation of the Issue Tree Method

Have you ever wondered where data come from?  I guess you are also wondering where hypotheses come from.  There is a beautiful connection among data elements, hypotheses, design thinking, the issue tree method, and in fact the various steps of data analytics.

In this note, I will bring out a systematic way of convincing your managers or your client on how to collect data, the right data, in a way that sets clarity from the moment an analytics project is initiated.  While statisticians have devised many ways to handle analysis of data however messy the data are, it is better to state your expectations at the beginning, so that the analytical processes are simplified and the client stays in a credible communication sequence.  This is similar to sample-size estimation.

When pressed for resources, it is better to minimize the sample size and still get credible results, though larger samples are always going to be more useful.  In the same way, more data elements are good, but the minimal best data elements should be covered when collecting data.  What are they?  How do we know?  There is a systematic process for this.
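To make the sample-size analogy concrete, here is a minimal sketch in pure Python of the classic two-sample size estimate under the normal approximation; the effect size, significance level, and power used in the example are my own illustrative assumptions, not from the text.

```python
from math import ceil
from statistics import NormalDist

def two_sample_n(delta, sigma, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided two-sample z-test
    (normal approximation to the t-test).
    delta: difference to detect, sigma: common standard deviation."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # ≈ 1.96 for alpha = 0.05
    z_beta = z(power)            # ≈ 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Detecting a half-standard-deviation difference needs ~63 per group.
print(two_sample_n(delta=0.5, sigma=1.0))  # → 63
```

The same economy applies to data elements: just as a smaller delta forces a larger n, vaguer hypotheses force collecting more data elements than the minimal best set.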

The idea is to start with your one measure that you are trying to predict.

The immediate question is “What are the most important hypotheses you should be testing, and hence what are the most important data elements you should be looking for?”

As I have been writing about these ideas, this is where design thinking and the issue tree method come in handy.  This write-up is about the relationships among design thinking, the issue tree method, hypotheses, data elements, and modeling.  You will see that they all come together consistently.

The following five questions are important as we go through the issue tree method.  Issue trees provide a systematic way of collecting not only the right data, but data defined in such a way that they are mutually exclusive and collectively exhaustive in terms of the information content that helps predict the one important measure we selected earlier, called the target measure.
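As a sketch of this idea, an issue tree can be held as a nested mapping from the target measure down to hypotheses and then to the data elements that would test them, with a crude duplicate check on the leaves standing in for the mutual-exclusivity requirement.  The target measure, hypotheses, and data elements below are my own illustrative example, not from the text.

```python
# Hypothetical issue tree for a target measure "monthly churn rate".
issue_tree = {
    "monthly churn rate": {
        "H1: price sensitivity drives churn": ["plan price", "discount usage"],
        "H2: poor service drives churn": ["ticket count", "resolution time"],
        "H3: competitor pull drives churn": ["competitor promo index"],
    }
}

def leaf_data_elements(tree):
    """Flatten the tree down to the data elements at its leaves."""
    elements = []
    for hypotheses in tree.values():
        for data in hypotheses.values():
            elements.extend(data)
    return elements

elements = leaf_data_elements(issue_tree)
# Mutually exclusive branches should not repeat a data element.
assert len(elements) == len(set(elements))
print(elements)
```

The leaves are exactly the minimal data-collection list: each element exists only because some hypothesis about the target measure needs it.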

Check your answer to each of the following multiple-choice questions.  If you need the answers, write to me.


Ref: first picture.

Modifying the Black-Scholes Model for Theoretical Equity Options Pricing with a Macroeconomic Model of Trends in Lifestyle, Lifestage, and Segment Dynamics

The famous equity option pricing model, the Black-Scholes model, is very interesting, intriguing, and commonly used in the equity market.  The equation for call pricing is explained on the left [figure: Black-Scholes Option Call Pricing Model].  The equation is provided in the second reference below.  The put price is derived from the principle of “call-put parity”.

It is well explained on the Wikipedia page,

and also at

The second site also gives an Excel worksheet to calculate the pricing model.

According to the Black-Scholes model, there are six elements that affect the pricing, as stated in the second reference:

  1. Price of the underlying stock or financial instrument;
  2. Option exercise price, or strike price;
  3. The option’s time value;
  4. The option’s implied volatility;
  5. Risk-free interest rate;
  6. Dividends paid over the life of the option.
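The call-price formula discussed above, with the put obtained from call-put parity, can be sketched in pure Python.  This is a minimal sketch that omits dividends for simplicity; the numbers in the check are a standard at-the-money textbook case, not from the text.

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist().cdf  # standard normal CDF

def bs_call(S, K, r, sigma, T):
    """Black-Scholes price of a European call (no dividends).
    S: spot price, K: strike, r: risk-free rate,
    sigma: volatility, T: time to expiry in years."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * N(d1) - K * exp(-r * T) * N(d2)

def bs_put(S, K, r, sigma, T):
    """Put price from call-put parity: P = C - S + K * exp(-rT)."""
    return bs_call(S, K, r, sigma, T) - S + K * exp(-r * T)

# At-the-money, 20% vol, 5% rate, one year: call ≈ 10.45, put ≈ 5.57.
print(round(bs_call(100, 100, 0.05, 0.2, 1), 2),
      round(bs_put(100, 100, 0.05, 0.2, 1), 2))
```

Note how the parity relation lets the put reuse the call function rather than re-deriving its own formula.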

What is not covered is the following.  Every company (stock) has a specific segment dominated by its product.  The trends of the companies and consumers who use those products and their volatility, the company’s own innovation acceleration and its volatility, and the consumers’ lifestyle and lifestage and their volatility all play an important role in why the stock’s movements, and hence its option pricing movements, are interpretable and predictable above and beyond the Black-Scholes equation.

For example, this is the reason why BABA and AAPL are trending upward, and with less volatility.

It is posited here that Black-Scholes is not the end of the story if its prices are not adjusted by the trends and volatility of the lifestyle/lifestage segment of the consumers (and companies) using the firm’s products, and by the company’s own innovation acceleration and its volatility.  Let us call this the L factor.

So how do we bring that component into the Black-Scholes equation?  Assume the six factors are independent of the L factor; this assumption is important if we want to go to the next level and integrate the quickest possible improvement to the Black-Scholes equation.  Then L is a multiplying factor.

Estimate the L factor with a model of its rate of change, using any standard applicable predictive model.  The tough part is how to get the covariates to build the model.  There are proxies and instrumental variables we can pick up from various data sources for the covariates of the L factor.
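A minimal sketch of the multiplicative adjustment proposed above, under the stated independence assumption.  The functional form of L here (a linear function of a lifestyle-trend proxy and an innovation-acceleration proxy) and its coefficients are entirely my own illustration, standing in for whatever predictive model is actually fitted.

```python
def l_factor(trend, innovation, trend_weight=0.6, innov_weight=0.4):
    """Hypothetical L factor: 1.0 means no adjustment; positive
    lifestyle-trend and innovation proxies push it above 1.0.
    The linear form and the weights are illustrative assumptions."""
    return 1.0 + trend_weight * trend + innov_weight * innovation

def adjusted_call(bs_price, trend, innovation):
    """Multiplicative adjustment, assuming L is independent
    of the six Black-Scholes inputs."""
    return l_factor(trend, innovation) * bs_price

# A Black-Scholes call price of 10.45 with mildly positive proxies.
print(round(adjusted_call(10.45, trend=0.05, innovation=0.02), 4))
```

In a real estimate, the proxies would come from the lifestyle, lifestage, and innovation data sources mentioned above, and the weights from the fitted model rather than from fixed constants.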

Which Machine Learning Algorithms You Have to Make Sure Analysts in Your Organization Are Well Trained In…

There are 179 different classifiers, though they can all be grouped into 17 families.

There is no point belaboring many of these classifiers.  How many of them matter?  Which one is most trustworthy?  Which three are trustworthy?

Here is an answer.

* The idea of an analyst effect is still not addressed.

* One more reason why ensembling does not just mean putting together trees, but also putting together algorithms: there are patterns of clustering and distance among these classifiers.  (See the second reference.)
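The point about ensembling across algorithms, not only across trees, can be sketched as a plain majority vote over the predictions of heterogeneous classifiers.  The three predictions below are stand-ins for the outputs of real models, not real model calls.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine one predicted label per classifier into a single label."""
    return Counter(predictions).most_common(1)[0][0]

# Stand-ins for, e.g., a random forest, an SVM, and a neural net.
rf_pred, svm_pred, nn_pred = "churn", "stay", "churn"

print(majority_vote([rf_pred, svm_pred, nn_pred]))  # → churn
```

Because the classifier families cluster differently, a vote across families can disagree with, and sometimes correct, any single family’s answer.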

For more details, goodness, and caveats, see:


We evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest-neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today.  We use 121 data sets, which represent the whole UCI data base (excluding the large-scale problems) and other own real problems, in order to achieve significant conclusions about the classifier behavior, not dependent on the data set collection.  The classifiers most likely to be the bests are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy overcoming 90% in the 84.3% of the data sets.  However, the difference is not statistically significant with the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy.  A few models are clearly better than the remaining ones: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0 and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package).  The random forest is clearly the best family of classifiers (3 out of 5 bests classifiers are RF), followed by SVM (4 classifiers in the top-10), neural networks and boosting ensembles (5 and 3 members in the top-20, respectively).

Keywords: classification, UCI data base, random …

For full details regarding the research, see Reference:

There is more evidence in the geometry of predictive behaviors of classifiers.

What are the most important three?  (RF, boosting), SVM, and avNNet.

If your analysts or vendors do not try these algorithms before they start selling, take a moment and ask some good questions.

How to Think Like a Deep Learner – Two Basic Figures – Part 1

This discussion is based on some understanding of ANNs.  If ANNs are not clear to you, this is not for you, for now.  Also, to be crisp, I will not belabor various nuances of terminology, meaning, and reasoning at this time; there is a place for that.  This is to let data analysis practitioners understand the why and what of deep learning.

Also, the discussion is based on the publication mentioned below.  It is a classic, and I encourage ML enthusiasts and statisticians to read it and learn the mind of a deep learning analyst.

Learning Deep Architectures for AI (Foundations and Trends(r) in Machine Learning) Paperback – October 28, 2009

by Yoshua Bengio

from which I picked up two important figures to put the key points forward.  These are two different aspects of understanding how to think in deep learning.  This is a start, not a complete presentation.  It should lead to some important, good questions for further understanding the nuances.

I selected the order of the figures to explain my purpose, though the book has it in different order as a mathematical exposition.

Deep learning is deep because you keep on training through the depths until the learning algorithm achieves an acceptable level of predictability.

Learning in a polynomial circuit means the following structure.

Take a look at this Figure.


The input layer is at the bottom, first-level interactions form the next layer, and the addition of all interactions forms another layer.  The picture does not directly show other interactions, for example x1x4, or higher-level interactions such as x1x2x4.  You may think that such interactions fell off in the feature-selection process, and/or that they come through at a higher level with a different activation relationship.  For example, at the topmost level in the picture, we have x1 and x4 coming together in a particular way, but that may be an artificial exhibition of the starting point of the construct.  Nonetheless, the picture, though elegant, is a difficult and incomplete way of showing how a deep learner looks at (all) possibilities.  But that is not an issue, as long as we get the point.  The important point is that there are countably infinite possibilities!  We will make them finite by cleverly restricting the space of possibilities.
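The layered structure just described, inputs at the bottom, then a layer of interaction products, then a layer that adds them up, can be sketched directly.  The particular polynomial here, x1x2 + x1x4 + x3x4, and the input values are my own small example, not the one in the book’s figure.

```python
def product_layer(x, pairs):
    """First hidden layer: one unit per interaction (product) term."""
    return [x[i] * x[j] for i, j in pairs]

def sum_layer(units):
    """Next layer: add up the interaction units."""
    return sum(units)

x = {1: 2.0, 2: 3.0, 3: 1.0, 4: 0.5}       # inputs x1..x4
pairs = [(1, 2), (1, 4), (3, 4)]           # chosen interaction terms

print(sum_layer(product_layer(x, pairs)))  # x1x2 + x1x4 + x3x4 = 7.5
```

Adding more layers lets shared subterms be computed once and reused, which is exactly the compactness argument for depth: a shallow circuit would need a separate unit for every term.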

In essence, considering that interactions are represented as splits in a decision tree, you may think of a deep search process over a specific collection of all possible roots of a tree.  One beautiful representation of deep thinking is that it is supposed to learn without training labels, figuring out the “labels” on its own.  The mathematics above helps in that sense.

The second picture I want to share with you is the following.


The above figure explains how to define layers.

The important point here is that every representation, however simple it may look, should be considered a layer.  In an unsupervised machine learning context, this rule becomes easier to understand, since we want to take out any trace of human judgement.  The result is lots of layers to train.

This is especially needed in unsupervised methods, where learning is very difficult, and accordingly one has to keep trying different transformations (and all possible transformations within the construct of a machine).

So the first question is, “When do I stop?”

Stop when the algorithm has learned enough.  That is OK, because that is the purpose of the gift of the falling cost of computing power.
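In practice, “stop when the algorithm has learned enough” is an early-stopping loop: keep training while the loss still improves by more than a tolerance.  A minimal sketch, with a made-up loss sequence standing in for a real training run:

```python
def train_until_plateau(losses, tol=0.01):
    """Return the number of epochs run before the per-epoch
    improvement drops below `tol` (a stand-in for a real
    training loop that would compute each loss on the fly)."""
    for epoch in range(1, len(losses)):
        if losses[epoch - 1] - losses[epoch] < tol:
            return epoch  # stop: learned "enough"
    return len(losses)

# Made-up loss curve: big early gains, then a plateau.
losses = [1.00, 0.60, 0.35, 0.20, 0.195, 0.194]
print(train_until_plateau(losses))  # stops after epoch 4
```

The tolerance is the “acceptable level of predictability” knob: cheaper compute lets you set it tighter and simply train longer.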

Does it take a billion samples?  So be it.  That attitude is the right one to appreciate.  We are solving a difficult problem: how to get a machine to teach itself.  Boy, this baby does not need a parent.  It learns by itself, though it may take a billion exposures, as in the case of computer vision.  That is in no way different from our own evolution.

… Part 2 is coming, with programming code and a simple application.