Monthly Archives: October 2014

The Importance of Strategy In Redefining an Organization – Brilliant, Refreshing, and Strategic

Strategy can also change over time (Case study of PepsiCo)

VIP Speaker Series: Pepsi CEO Indra Nooyi speaks at McCombs

Do you hear the “Strategic Metric”?

Aspen Institute Interview of Pepsi CEO Indra Nooyi

There is some brilliant workstyle advice here.

Five C’s of Leadership – Summary here





Compass (Integrity – a true compass pointing to true north)

Empirical Comparison of Ensembles… Could There be Modeler’s Effect Also?

Popular Ensemble Methods – An Empirical Study – Journal of Artificial Intelligence Research 11 (1999) 169-198

Graphical version of Table 2:

23 Datasets Ensembling Error Rates - ANN first four - Next is DT



From Table 2:  “Test set error rates for the data sets using (1) a single neural network classifier; (2) an ensemble where each individual network is trained using the original training set and thus only differs from the other networks in the ensemble by its random initial weights; (3) an ensemble where the networks are trained using randomly re-sampled training sets (Bagging); an ensemble where the networks are trained using weighted re-sampled training sets (Boosting) where the re-sampling is based on the (4) Arcing method and (5) Ada method; (6) a single decision tree classifier; (7) a Bagging ensemble of decision trees; and (8) Arcing and (9) Ada Boosting ensembles of decision trees.” – as noted by the authors.

Let us look at just the key measures, in two graphs: the first for ensembling using ANNs, the second for ensembling using the decision tree method.  In both cases it looks like you can simply trust boosting (and hence its combination with bagging).  However, the authors interpret it differently: on a 95% confidence-interval basis, there is no difference between boosting and non-boosting.

23 Datasets Ensembling Error Rates - Adaboost wins in ANN

23 Datasets Ensembling Error Rates - Adaboost wins in Decision Tree
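Why an ensemble can beat its members at all is easy to see with a back-of-the-envelope calculation.  This is a sketch under the idealized assumption of independent, equally accurate voters, which real bagged or boosted classifiers only approximate:

```python
import math

def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent classifiers,
    each correct with probability p, gets the right answer (n odd)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

print(majority_vote_accuracy(0.7, 1))   # 0.7 -- a single classifier
print(majority_vote_accuracy(0.7, 5))   # ~0.84 -- five-member ensemble
print(majority_vote_accuracy(0.7, 25))  # ~0.98 -- twenty-five members
```

Real ensemble members are correlated, so actual gains are smaller than this idealized bound, which is consistent with the authors' 95% CI caveat above.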


What is wrong with these graphics?

1. Florida Crime Rate at a Glance:

Florida Crime Rate at a Glance

The site also provides the data. While a single 1993-to-2013 comparison is well and good, when we look at the series of data, the measure suddenly becomes a year-to-year change computed on cumulative data.  This is probably one of the trickiest finds, and convincing someone that the stunning visual impact is deceptive is genuinely difficult, because nobody expects year-to-year changes to be computed on cumulative data.

One way to catch this type of twist in a visualization: the series decreases with suspicious regularity.  The chance of such a smooth, systematic decline in real data is very small, and that alone should prompt a closer look at the details.
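The cumulative trap is easy to demonstrate with made-up numbers (an illustration, not the Florida data): even when the annual count is perfectly flat, the year-over-year percentage change of the cumulative series shrinks every single year.

```python
annual = [100] * 5                      # hypothetical flat annual counts: no real decline
cumulative = []
total = 0
for a in annual:
    total += a
    cumulative.append(total)            # 100, 200, 300, 400, 500

# Year-over-year % change of the CUMULATIVE series, not the annual one
pct_change = [100 * (cumulative[i] - cumulative[i - 1]) / cumulative[i - 1]
              for i in range(1, len(cumulative))]
print(pct_change)                       # [100.0, 50.0, 33.3..., 25.0]
```

The "rate" falls every year even though nothing actually declined, which is exactly the systematic-decrease red flag described above.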

Also, I am not sure whether the definition of booking a crime changed over the period.  Our cognition assumes that nothing changed between 1993 and 2013, but that may not be the case unless it is stated explicitly.  It is entirely possible that these rates actually decreased, but short-changing clarity does not play in the graph provider's favor; it is an unintended public disservice, unfortunately.

The following site documents how Fox News repeatedly misrepresents statistical findings through poor practices of graphical representation of data:

A History of dishonest charts

2. Use different standards of measurement when comparing people or segments.  See the reference for which measures are actually compared between Bush and Obama: a muddle of spending-rate increments vs. incremental spending rates.

Use Different Standards of Measurement in the Comparison
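The distinction matters because the two measures can rank the same pair in opposite orders.  Here are made-up numbers (not actual budget figures) showing how:

```python
# Hypothetical spending totals at start and end of two terms (illustrative only).
a_start, a_end = 2000, 3000   # A: +1000 absolute, +50% relative
b_start, b_end = 3000, 4200   # B: +1200 absolute, +40% relative

print(b_end - b_start > a_end - a_start)   # True: B added more dollars
print(a_end / a_start > b_end / b_start)   # True: A grew faster in percent
```

Quietly switching between the two standards lets a chart "prove" whichever conclusion it wants.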

3. If the graph is not trending the right way, just invert the direction of the Y axis.  Bingo…

Invert the axis to get a better-looking trend in the graph
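The trick takes exactly one line in any plotting library.  A matplotlib sketch with made-up values (not the actual chart's data):

```python
import matplotlib
matplotlib.use("Agg")                    # headless backend, no display needed
import matplotlib.pyplot as plt

counts = [400, 500, 600, 700]            # hypothetical rising counts
fig, ax = plt.subplots()
ax.plot(range(2005, 2009), counts)
ax.invert_yaxis()                        # now the rising line slopes "down"
print(ax.yaxis_inverted())               # True
```

One call to `invert_yaxis()` and an increase reads, at a glance, as a decrease.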

Actually the “stand your ground” law did not work according to the data they show!

Big data definition vs. Big Data Solution?

From the consumer's (end user's) point of view, it does not matter how long we spend defining and discussing big data with the brain power of the world's most powerful data scientists. Scientists are talking about it, and business people want to make money out of it.

What really matters is: how can I use it (big data) today, without much fuss?

So companies are working on the following principles: a

Decision-making tool that has a back-end engine which

Integrates data from disparate sources (considering variety), works

Fast (considering the volume of data), is

Easier to work with, and helps leverage

New opportunities.

A system that could be called DIFEN.  Here is a one-minute video from IBM.