How to Think Like a Deep Learner – Two Basic Figures – Part1

This discussion assumes some understanding of artificial neural networks (ANN).  If ANN is not clear to you, this is not for you, for now.  Also, to be crisp, I will not belabor the nuances of terminology, meaning, and reasoning at this time; there is a place for that.  The aim is to let data analysis practitioners understand the why and what of deep learning.

Also, the discussion is based on the publication mentioned below.  It is a classic, and I encourage ML enthusiasts and statisticians to read it and learn the mind of a deep learning analyst.

Learning Deep Architectures for AI (Foundations and Trends® in Machine Learning) Paperback – October 28, 2009

by Yoshua Bengio

From it I picked two important pictures to put the key points forward.  They show two different aspects of how to think in Deep Learning.  This is a start, not a complete presentation.  It should lead to some good questions that help in understanding the nuances further.

I chose the order of the figures to suit my purpose; the book presents them in a different order, as part of a mathematical exposition.

Deep learning is deep because you keep training through successive depths until the learning algorithm achieves an acceptable level of predictability.

Learning in a polynomial circuit means the following structure.

Take a look at this Figure.


The input layer is at the bottom, the first-level interactions form the next layer, and the sums of those interactions form another layer.  The picture does not directly show other interactions, for example x1x4, or higher-order interactions such as x1x2x4.  You may think that such interactions fell off in the feature selection process, and/or that they come through at a higher level with a different activation relationship.  For example, at the topmost level in the picture, x1 and x4 come together in a particular way, but that may be an artifact of how the construct was started.  Nonetheless, the picture, though elegant, is a difficult and incomplete way of showing how a deep learner looks at (all) possibilities.  That is not an issue, as long as we get the point.  The important point is that there are countably infinite possibilities!  We will make the set finite by cleverly restricting which possibilities are explored.
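To make the counting concrete, here is a minimal sketch. It is not taken from the book: the function names `interaction_terms` and `eval_sum_of_products` and the input values are made up for illustration. It lists the product terms that four inputs can form and evaluates a two-layer circuit, a layer of products followed by one sum.

```python
from itertools import combinations
from math import prod  # available in Python 3.8+


def interaction_terms(names, max_order):
    """Return every product term built from 2..max_order distinct inputs."""
    terms = []
    for order in range(2, max_order + 1):
        terms.extend(combinations(names, order))
    return terms


def eval_sum_of_products(values, terms):
    """Product layer (one node per term) followed by a single sum node."""
    return sum(prod(values[name] for name in term) for term in terms)


names = ["x1", "x2", "x3", "x4"]
pairwise = interaction_terms(names, 2)    # includes ("x1", "x4")
all_orders = interaction_terms(names, 4)  # includes ("x1", "x2", "x4")

# Hypothetical input values, chosen only to exercise the circuit.
values = {"x1": 1.0, "x2": 2.0, "x3": 3.0, "x4": 4.0}
total = eval_sum_of_products(values, pairwise)

print(len(pairwise), len(all_orders), total)
```

With distinct inputs only, n inputs give 2^n - n - 1 interaction terms, which is finite but grows exponentially. Allowing repeated factors (powers such as x1 squared) removes the upper bound, which is one way to read the "countably infinite" remark.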

In essence, if interactions are represented as splits in a decision tree, you may think of a deep search process over a specific collection of all possible roots of a tree.  One beautiful aspect of deep thinking is that it is supposed to learn without training, figuring out the “labels” on its own.  The mathematics above helps in that sense.

The second picture I want to share with you is the following.


The above figure explains how to define layers.

The important point here is that every representation, however simple it may look, should be considered a layer.  This rule becomes easier to understand in an unsupervised machine learning context, where we want to take out any trace of human judgement.  The result is lots of layers to train.

This is especially needed in unsupervised methods, where learning is very difficult and one accordingly has to keep trying different transformations (ideally all possible transformations within the construct of a machine).

So the first question is “When do I stop”?

Stop when the algorithm has learned enough.  That is OK, because that is exactly what the gift of falling cost and rising computing power is for.

Does it take a billion samples?  So be it.  That is the right attitude to appreciate.  We are solving a problem, a difficult problem: how to get a machine to learn by itself.  Boy, this baby does not need a parent.  It learns by itself, though it may take a billion exposures, as in the case of computer vision.  That is in no way different from our own evolution.

… Part 2 is coming, with program code and a simple application.
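One common way to turn "learned enough" into a rule is a target-plus-patience stopping check. The sketch below is an assumption of mine, not something prescribed by the book: `train_until_good_enough`, `target`, and `patience` are hypothetical names, and the toy "model" simply halves its loss on every step.

```python
def train_until_good_enough(step, validation_loss, target,
                            patience=5, max_steps=10_000):
    """Run step() until the loss hits the target, stalls, or the budget ends."""
    best = float("inf")
    loss = float("inf")
    stale = 0
    for i in range(1, max_steps + 1):
        step()                      # one round of training
        loss = validation_loss()    # measure on held-out data
        if loss <= target:
            return i, loss, "target reached"
        if loss < best:
            best, stale = loss, 0   # still improving
        else:
            stale += 1              # no improvement this round
            if stale >= patience:
                return i, loss, "no improvement"
    return max_steps, loss, "budget exhausted"


# Toy stand-in for a learner: the loss halves on every training step.
state = {"loss": 1.0}

def toy_step():
    state["loss"] *= 0.5

def toy_validation_loss():
    return state["loss"]

steps, final_loss, reason = train_until_good_enough(
    toy_step, toy_validation_loss, target=0.01
)
print(steps, final_loss, reason)
```

The `max_steps` budget is where the cost of computing enters: cheaper computing lets you raise the budget and let the algorithm keep going.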

From Big Data To Small Data In Real Time – What does this 1 per million part score card mean to you?

There are many takeaways. 

Seven powerful metrics that bring down a specific type of big data opportunity (security) into small data.

A 1 per million part identification system bringing together different data types in real time, a big data opportunity.

Behavioral interpretation and social analytics are still key to making sense of data, and to quickly bringing big data down to interpretable and usable small data.

I am working on the weighting system that makes this a very effective, better-than-six-sigma identification system; this is my intellectual property.

The following is a difficult thing for me to discuss as this indicates an unsettling prospect of

  • how resourceful organizations are going to be watching you and me going forward with big data, 
  • how privacy is going to be a challenge to maintain, 
  • how we are going to lose our moral superiority.  In difficult times like this, after the Boston Marathon, it is important that we have a tool like this, though. 

This is why discussions about Type I and Type II errors are becoming more and more important and, added to that, why the idea of bias (pre-conceived notions) is going to undermine real intelligence.

The following, based on my initial estimates, is a 1 per million part identification system, better than six sigma.  It uses big data and a weighting system to help identify that 1 per million dangerous person who is lost in our day-to-day hurly-burly life of innocence, dreams, and celebration of love and accomplishment.


– Sudden changes in behavior or performance or relationships

Their usual level of performance will be lost.  A student, for example, will get grades or performance evaluations that are outliers from his or her normal record, or get into angry exchanges.

They will lose commonly known best friends or relationships with the other gender.

– Sudden changes under the watchful eyes of organizations or people; they will travel to not-so-common places and acquire new relationships with people who are in turn under the watchful eyes of security organizations.  At the least, the chatter inside the security organization has just become elevated, or elevated chatter comes and goes without anyone sticking to a well-defined resolution not to drop the ball.  Resolving clearly does not mean putting people in jail; it means having an officer report on the latest activities and log the details so that they flow to the right people for the right action, making sure the watchful eyes are not sleeping. 

– Unprecedented changes in access around one’s neighborhood, social relationships, and lifestyle interests that give the dark side an opportunity to rear its head

– Firearms, crude bombs, or illegal activities blip on the intelligence radar, or correlated words such as “bomb”, “firearms”, or “mass danger” materials pop up on the radar

– Physical changes (becoming a post-teenager; things get hardened around this time) or emotional changes at home (death or separation) or with closely related people (friends lost)

– New buying or shopping activity involving apparently unrelated items; this will almost always be a second or third blip if the items look innocent but are used as an aid to complete the intended action

– Sleep patterns, telephone call patterns, internet access patterns, and even visits to one’s own home change erratically, yet consistently from the time the change started

I say, these are seven metrics of highly dangerous people who need help badly.  What caught my attention is that our security agencies were so close to the …. and yet the tragedy happened.

Of course, it takes a lot of fine intelligence to be careful about Type I and Type II errors in a way that is also respectful of the privacy of citizens.
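A back-of-the-envelope sketch shows why these errors matter so much at 1 per million. Every number below is hypothetical and is not an estimate from the scorecard; `screening_outcomes` is a made-up helper, and I assume a Type I error means flagging a harmless person and a Type II error means missing a dangerous one.

```python
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Expected counts when screening a population for a rare condition."""
    dangerous = population * prevalence
    harmless = population - dangerous
    true_pos = dangerous * sensitivity
    false_neg = dangerous - true_pos           # Type II errors (missed)
    false_pos = harmless * (1 - specificity)   # Type I errors (wrongly flagged)
    flagged = true_pos + false_pos
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        # Share of flagged people who are actually dangerous.
        "precision": true_pos / flagged,
    }


# Hypothetical inputs: 1 dangerous person per million, a screen that catches
# 99% of them and correctly clears 99.9% of everyone else.
result = screening_outcomes(
    population=1_000_000,
    prevalence=1e-6,
    sensitivity=0.99,
    specificity=0.999,
)
for name, value in result.items():
    print(f"{name}: {value:.6f}")
```

Under these assumed rates, roughly a thousand harmless people are flagged for each dangerous person found, which is why the weighting system and the privacy question cannot be separated. For comparison, six sigma is conventionally quoted as 3.4 defects per million opportunities.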

My prayers are with the innocent victims of all ages, an innocent child who cared for kindness, a couple starting their dream life, and an accomplished elderly who did not give up running a marathon at age 78.

From Data Monster & Insight Monster