The Classic Francis Galton Data Set on Heights of Fathers and Sons, and Simple Regression

R code to get the data and run the regression

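# Read the data (the file location goes inside the quotes) and drop the first column
father.son <- read.table("", sep = " ")[, -1]
# Scatter plot of son's height against father's height
plot(hson ~ hfather, data = father.son, bty = "l", pch = 20)
abline(a = 0, b = 1, lty = 2, lwd = 2) # dashed line hson = hfather: what is the hypothesis here? Why are we looking at this line?
abline(lm(hson ~ hfather, data = father.son), lty = 1, lwd = 2) # solid fitted line: what is the hypothesis here? Which one to use: this or the previous one?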

The output is:

Call:
lm(formula = hson ~ hfather, data = father.son)

Coefficients:
(Intercept)      hfather
    33.8866       0.5141
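
As a rough illustration of what these coefficients imply, the sketch below plugs them into the regression equation and compares the fitted line with the dashed line of slope 1. It uses only the intercept and slope reported above. The father heights, and the assumption that heights are measured in inches, are mine and are there only for illustration.

```r
# Coefficients as reported in the regression output above
intercept <- 33.8866
slope     <- 0.5141

# Hypothetical father heights (assumed to be in inches, chosen for illustration)
hfather <- c(60, 65, 70, 75)

# Predicted son height from the fitted line
hson_hat <- intercept + slope * hfather

# Height at which the fitted line crosses the line hson = hfather
crossing <- intercept / (1 - slope)

print(data.frame(hfather, hson_hat, diff = hson_hat - hfather))
print(round(crossing, 2))
```

Because the slope is below 1, fathers shorter than the crossing height are predicted to have taller sons, and fathers taller than it are predicted to have shorter sons. That is the regression toward the mean pattern this data set is known for, and it is why the two lines in the plot answer different questions.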

The graphical output is

AI – Will it get wild? Why Will it Not…?


(A small sample of Google images on “AI”)

First premise for deciphering a human-interpreted conscious machine: consciousness (hierarchical levels of evolution) is equivalent to hierarchical layers of latent dimensions (hierarchical average statistical dimensions).

Second premise for deciphering a human-interpreted conscious machine: greed (fear-of-the-future traps), fear (inability traps), gluttonous desire (existence traps), love (network traps), and procreation (perpetuity traps) bring errors in judgment, creating variations and trap holes that keep outcomes at lower levels.

Third premise for deciphering a human-interpreted conscious machine: the environment. The levels of certainty will influence the variations in the manifested sequence of events connected with mechanical consciousness.

Our children are our creations, and so are our children’s children. Mechanical consciousness is our thought child. If it does not carry our thought patterns, constrained by the above premises, then it does not exist.

The interpretation of existence, and hence the implementation of the statistical mechanics of AI, is very much our perception, and we exist because we interpret.

Will it get wild? Why will it not…? It is in our genes, and we consider that as the opportunity of living the time.

One Minute Test for Graphics – Economics of Productivity and Never Grown Real Wages

The following beautiful multi-part graph is about:

1. Extraordinary productivity growth

2. Lopsided personal income gains, going increasingly to the highest earners

3. Families increasingly committing to two earners, largely because they cannot live within the means of a single earner (inferred from the authors’ line of insight)

4. And yet debt is becoming unsustainable

My note: Number 3 does not take into consideration the relevance and importance of women participating in the labor market, on their own or as part of natural growth, for personal fulfillment and societal growth.

This is courtesy of the NY Times. Read the full story here, “The State of Working America”:

(Figure: the age of greatest productivity growth and no growth in real wages for the lowest wage groups)


Figuring out latent dimensions – How to Change Yourselves and What Attitude Will Help

Today, I saw the following, forwarded by my friends on LinkedIn.

Often, people do not know where the original publication happened and by whom, making it difficult to give credit.

Subsequently, I found the original link: ; if anyone has an objection to this finding, please help me correct it.

I took this to the next question: how might I use this to help develop specific talents that make changes to one’s life?

I came up with this.

Note that in real life, if someone is missing only one attribute, as portrayed here, that is the easiest case. Often, we miss multiple attributes.

402 PPT Decks

When my hard disk crashed, I had to put these decks together from various previous versions. The material is complete, but some of it may be repeated in places. I am working on a revision of the material.

W1-Video1 – Introduction of the Course and The New Data Economy

W1-What are analytics and_Examples of Marketing-Socio-Economic-Political Dynamics

W2 -Video3 & Video4 – BI Part1 and BI Part2

W3 and W4-V5 and V6 – Internal and External Processes Analytics, and Strategic Metrics and Dashboard

W5-Video7_Development and Deployment of an Information Strategy

Week7 – Video8_Survey Methodology and Evil Twin of Bias_V2


Data Collection Methods, Sources, and Processes


Equipment based data collection:

The current level of data capture, in variety, volume, and velocity, along with the width, breadth, and richness of the data captured, is represented by the intensity of the color in this list. The ordering will change in the next 5 years, as sensors and wearables on all types of equipment, tools, and inter-connectors come to dominate the IP (Internet Protocol) world, spewing out data in a paradigm of continuous automated capture.

(Figure: Equipment based data collection)

Process based data collection:

Process based identification and categorization of data provides a very rich understanding of how our life is sewn together with data as the thread. It provides the basis for an interactbase, an active data management process, as opposed to a database, which is mostly reactive and/or passive unless it is defined intentionally, and then it becomes static and hard to change for fast-changing market conditions! We provide a rainbow of rich, colorful data about ourselves without realizing the implications of providing such data to others, beyond the immediate relevance, satisfaction, or fulfillment of providing it.

The fact that something (man, machine, environment, or ambiance) exists is reason enough that it will generate data, which will get stored somewhere in some form and analyzed, or will get analyzed once people start seeing the importance of such data.

(Figure: Process based data collection)

Some Useful Starting Point Documents for Statisticians to Familiarize Themselves with the Opportunities in Big Data

1. Discovery with Data: Leveraging Statistics with Computer Science to Transform Science and Society, July 2, 2014, A Working Group of the American Statistical Association

2. A Survey of Statistical Methods and Computing for Big Data, Chun Wang, Ming-Hui Chen, Elizabeth Schifano, Jing Wu, and Jun Yan, March 2, 2015

3. A Statistician’s View on Big Data and Data Science in Pharmaceutical Development (Version 4 as of 10.10.2014), Dr. Diego Kuonen, CStat PStat CSci

Frequentist vs. Bayesian – What is the Problem?

Recently, the NY Times reignited this debate.

I do not agree with the first example, though. Even if the coin is loaded, we will get the answer correctly by following the usual approach. The author is stating a version of how the frequentist solves the problem, not how they actually solve it. It is very simple: toss the coin 100 times, and you will know the answer, close to reality. Your answer is correct even if it is a biased coin.
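
To make the “toss it 100 times” argument concrete, here is a minimal simulation sketch. The bias of 0.7 is a made-up value for illustration; in practice it is exactly the thing we do not know. The code estimates it from the tosses alone, using the sample proportion and the exact binomial confidence interval from R’s `binom.test`.

```r
set.seed(1)               # fixed seed so the run is repeatable

true_p <- 0.7             # hypothetical bias of the loaded coin (unknown in practice)
n      <- 100             # number of tosses

tosses <- rbinom(n, size = 1, prob = true_p)   # 1 = heads, 0 = tails
heads  <- sum(tosses)

p_hat <- heads / n                             # frequentist point estimate
ci    <- binom.test(heads, n)$conf.int         # exact 95% confidence interval

cat("estimated P(heads):", p_hat, "\n")
cat("95% CI:", round(ci[1], 3), "to", round(ci[2], 3), "\n")
```

The estimate never assumes the coin is fair. The data decide, which is the point of the paragraph above.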

This is not to say that the Bayesians are thinking incorrectly. Science is, after all, not that simple. Bayes’ theorem is beautiful and elegant, and it has important value in applications.

I like the discussions here:

The answer by user28 makes the important point that as n increases, the estimates are the same whatever method you use, though for some problems with a small sample size, the choice may matter.
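
The point about increasing n can be checked with a small worked sketch. It compares the frequentist estimate (the sample proportion) with a Bayesian posterior mean under a uniform Beta(1, 1) prior. The prior, the fixed 70% share of heads, and the sample sizes are my assumptions for illustration, not taken from the linked discussion.

```r
# Observed share of heads, held fixed while the sample size grows (illustrative)
share <- 0.7
n     <- c(10, 100, 1000, 10000)
heads <- share * n

# Frequentist estimate: the sample proportion
mle <- heads / n

# Bayesian estimate: posterior mean under a uniform Beta(1, 1) prior,
# which gives a Beta(1 + heads, 1 + tails) posterior
a <- 1; b <- 1
post_mean <- (a + heads) / (a + b + n)

# Gap between the two estimates at each sample size
gap <- abs(mle - post_mean)
print(data.frame(n, mle, post_mean, gap))
```

With 10 tosses the two estimates differ by about 0.03. With 10,000 tosses the gap is negligible, so the prior matters mainly when the sample is small.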

Often, we do not know the process, and it keeps varying; the only way to keep updating the answer is the Bayesian method. Spam filtering is a great example, where you need a Bayesian updating process for recomputing the probability models. The Bayesian approach does have elegant mathematics and an elegant updating process. But then it becomes a non-parametric, computationally oriented updating process.
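
As a minimal sketch of what such an updating process looks like, the code below tracks the probability that a message containing some word is spam, revising it batch by batch with a conjugate Beta-Binomial update. The counts are invented for illustration, and this is not how any particular spam filter is implemented; it only shows the mechanics of updating.

```r
# Hypothetical batches of messages containing a given word:
# how many were spam, out of how many seen, in each batch
spam_count  <- c(2, 5, 9, 14)
batch_total <- c(20, 20, 20, 20)

# Start from a uniform Beta(1, 1) prior on P(spam | word)
a <- 1; b <- 1
history <- numeric(length(spam_count))

for (i in seq_along(spam_count)) {
  # Conjugate update: add spam counts to a, non-spam counts to b
  a <- a + spam_count[i]
  b <- b + batch_total[i] - spam_count[i]
  history[i] <- a / (a + b)      # posterior mean after this batch
}

print(round(history, 3))
```

Each batch shifts the estimate without refitting from scratch, so the estimate can follow a process that changes over time.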

A response by an applied statistician

Enjoy the conversations and learn from others, noting especially that we can get stuck in our newfound intelligence.

The Power of Graphics – Example of A Difficult Graphics from NY Times … Passed the one minute test – A learner’s perspective

There are so many factors that influence tax, and how those factors show up as outcomes is difficult to communicate. I could not have picked a better example than this: “How the tax burden has changed over time”.

The following example shows what helps a graphic pass the one minute test. For illustration, I will pick only one graph from the collection of graphs used to explain seven important conclusions (the titles and sub-titles are from the NY Times):

1. Tax rates have fallen for most Americans, especially high earners.

Share of yearly income paid in federal, state and local taxes, by income bracket.

2. What’s driven the changes? Federal income tax rates have declined …

Share of income paid in federal income taxes.

3. … while payroll taxes have risen for all — but not as much for the affluent.

Share of income paid in federal payroll taxes.

4. State and local taxes have risen, most of all for the lowest income groups.

Share of income paid in property, sales and state income taxes.  

5. And corporate taxes — ultimately paid by people — have declined.

Federal and state corporate tax burden, as a share of income.

6.  Affluent households are earning more — and paying a larger share of taxes.

For each income bracket, its share of nation’s — Income — Population

7. But the distribution of the tax burden has become less progressive.

Ratio of each group’s share of taxes paid to its share of the nation’s income.

Notice that each graph comes with two items: a title that states the conclusion and, below it in sub-title font, the measure used.

Here is the first picture alone. Please see the full article to understand and appreciate the effort and beauty with which the NY Times bridges data, message, and communication.

This is a multi-graph presentation, one graph per conclusion, with multiple measurements across seven graphs, and hence, technically, we have to give it seven minutes.

Pass maadi! – The one minute test for graphics. (Where I got the phrase “Pass maadi!” is a funny story, reserved for a different conversation.)

Where I had difficulty: the mouse-over is not natural to understand, as it kept changing the year shown in a graph even when the pointer moved outside the range of the graphic. Once we understand and control it properly, it is easier to follow.

I would also have liked the income groups to be repeated below each graph, since the graphs are separated by text and the context of each graph becomes cloudy after some reading and interpretation.

Have You Heard of One Minute Test(TM) for Understanding Graphs? Tale of Three Graphs

The tale of three graphs: Why does the “right” visualization matter? If you cannot state the conclusions of a graph within a minute of seeing it, the purpose of visualization is lost.

Take a look at the graph: (Image via the Georgetown University Center on Education and the Workforce analysis of U.S. Census Bureau, American …). This compares median salaries by major and by experience. Use the legend at the bottom of the graph.

Now see the same information from , though I have a screenshot below.

Now, consider the following.

As a third example, consider the third graph, which I am again bringing here for a quick lesson. But I encourage you to go to the above link to see for yourselves another set of rich information, which I bring out at the end.