This discussion assumes some understanding of artificial neural networks (ANN). If ANN is not clear to you, this is not for you, for now. Also, to be crisp, I will not belabor the nuances of terminology, meaning, and reasoning here; there is a place for that. The aim is to let data analysis practitioners understand the why and what of deep learning.

The discussion is also based on the publication mentioned below. It is a classic, and I encourage ML enthusiasts and statisticians to read it and learn the mind of a deep learning analyst.

Yoshua Bengio, Learning Deep Architectures for AI (Foundations and Trends® in Machine Learning), paperback, October 28, 2009

From it I picked two important pictures to put the key points forward. They show two different aspects of how to think in deep learning. This is a start, not a complete presentation; it should lead to good questions that help you understand the nuances further.

I chose the order of the figures to suit my purpose; the book presents them in a different order, as a mathematical exposition.

Deep learning is deep because you keep adding and training layers until the learning algorithm reaches an acceptable level of predictive accuracy.

Learning in a polynomial circuit means the following structure.

Take a look at this figure.

The input layer is at the bottom, the first-level interactions form the next layer, and the sums of those interactions form another layer. The picture does not directly show other interactions, for example x1x4, or higher-order interactions such as x1x2x4. You may think that such interactions fell off in the feature selection process, or that they come through at a higher level with a different activation relationship. For example, at the topmost level in the picture, x1 and x4 come together in a particular way, but that may be an artifact of the starting point of the construct. Nonetheless, the picture, though elegant, is a difficult and incomplete way to show how a deep learner looks at (all) possibilities. That is not an issue, as long as we get the point. The important point is that there are countably infinite possibilities! We make the set finite by cleverly restricting which possibilities are explored.
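
To make the layer-by-layer idea concrete, here is a minimal sketch of a small polynomial circuit in Python. The wiring (which inputs are multiplied and which products are summed) is my own assumption for illustration, not a reproduction of the figure, and the function names are hypothetical. It also counts how many interactions of a given order exist, which shows how quickly the possibilities grow.

```python
from itertools import combinations


def product_layer(values, pairs):
    """One unit per selected pair: multiply the two chosen values."""
    return [values[i] * values[j] for i, j in pairs]


def sum_layer(values, groups):
    """One unit per group: add the chosen values."""
    return [sum(values[i] for i in group) for group in groups]


def polynomial_circuit(x):
    """Three layers on four inputs (assumed wiring, for illustration only)."""
    # Layer 1: adjacent pairwise interactions x1x2, x2x3, x3x4.
    products = product_layer(x, [(0, 1), (1, 2), (2, 3)])
    # Layer 2: sums of neighbouring interactions.
    sums = sum_layer(products, [(0, 1), (1, 2)])
    # Layer 3: a product of the two sums. Expanding it yields the term
    # (x1x2)(x3x4), so x1 and x4 only meet at the top of the circuit.
    return sums[0] * sums[1]


def count_interactions(n_inputs, order):
    """Number of distinct interactions that use exactly `order` inputs."""
    return sum(1 for _ in combinations(range(n_inputs), order))


if __name__ == "__main__":
    print(polynomial_circuit([1, 2, 3, 4]))
    print([count_interactions(4, k) for k in (2, 3, 4)])
```

Note that the circuit never builds x1x4 as a unit of its own. It reuses lower-level products instead of enumerating every interaction, which is one way to keep the number of units finite and small.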

In essence, if you think of each interaction as a split in a decision tree, you may think of this as a deep search through a specific collection of all possible roots of a tree. One beautiful aspect of deep thinking is that it is supposed to learn without labeled training examples, figuring out the “labels” by itself. The mathematics above helps in that sense.

The second picture I want to share with you is the following.

The above figure explains how to define layers.

The important point here is that every representation, however simple it may look, should be considered a layer. In an unsupervised machine learning context, where we want to take out any trace of human judgement, this rule becomes easier to understand. The result is lots of layers to train.

This is especially needed in unsupervised methods, where learning is very difficult and one accordingly has to keep trying different transformations (indeed, all possible transformations within the construct of a machine).
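
As a rough illustration of "every representation is a layer", the sketch below treats each transformation as its own layer, however trivial, and keeps every intermediate representation. The three transformations (`scale`, `shift`, `squash`) are hypothetical placeholders that I chose for the example; they are not taken from the book.

```python
import math


def scale(v):
    """Multiply every component by 2."""
    return [2.0 * a for a in v]


def shift(v):
    """Subtract 1 from every component."""
    return [a - 1.0 for a in v]


def squash(v):
    """Apply tanh to every component."""
    return [math.tanh(a) for a in v]


def forward(x, layers):
    """Apply the layers in order and keep every representation.

    The returned list starts with the raw input, so it has
    len(layers) + 1 entries: one per layer boundary.
    """
    representations = [list(x)]
    for layer in layers:
        representations.append(layer(representations[-1]))
    return representations


if __name__ == "__main__":
    for depth, rep in enumerate(forward([0.5, 1.0], [scale, shift, squash])):
        print(depth, rep)
```

Even the scaling and shifting steps count as layers here. In a real network each of them would carry parameters to train, which is why the number of layers grows so quickly.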

So the first question is “When do I stop?”

Stop when the algorithm has learned enough. That is OK, because that is exactly what the falling cost of ever-increasing computing power is for.
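
One common way to turn "learned enough" into a rule is to watch a validation loss and stop either when it reaches a target or when it stops improving. The sketch below is my own illustration under that assumption, not a method from the book. The function name, the `target` and `patience` parameters, and the list of losses standing in for real training epochs are all hypothetical.

```python
def train_until_good_enough(losses, target, patience=3):
    """Decide when to stop, given one validation loss per epoch.

    `losses` stands in for a real training loop (an assumption made
    for illustration). Returns (epochs_run, reason).
    """
    best = float("inf")
    stale = 0  # epochs since the last improvement
    epochs = 0
    for loss in losses:
        epochs += 1
        if loss <= target:
            return epochs, "target reached"
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epochs, "no further improvement"
    return epochs, "data exhausted"


if __name__ == "__main__":
    print(train_until_good_enough([0.9, 0.5, 0.2, 0.05, 0.01], target=0.1))
    print(train_until_good_enough([0.9, 0.8, 0.85, 0.81, 0.9, 0.7], target=0.1))
```

The first call stops once the loss drops below the target. The second gives up after three epochs without improvement, which is the cue to try a different transformation or add another layer.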

Does it take a billion samples? So be it. That is the right attitude to appreciate. We are solving a problem, a difficult one: how to get a machine to learn by itself. Boy, this baby does not need a parent. It learns by itself, though it may take a billion exposures, as in the case of computer vision. That is in no way different from our own evolution.

… Part 2 is coming, with code and a simple application.