Monthly Archives: March 2015

Some Useful Starting-Point Documents for Statisticians to Familiarize Themselves with the Opportunities in Big Data

1. Discovery with Data: Leveraging Statistics with Computer Science to Transform Science and Society. A Working Group of the American Statistical Association, July 2, 2014.

2. A Survey of Statistical Methods and Computing for Big Data. Chun Wang, Ming-Hui Chen, Elizabeth Schifano, Jing Wu, and Jun Yan, March 2, 2015.

3. A Statistician’s View on Big Data and Data Science in Pharmaceutical Development (Version 4 as of 10.10.2014). Dr. Diego Kuonen, CStat PStat CSci.

Frequentist vs. Bayesian – What is the Problem?

Recently the NY Times reignited this debate: http://www.nytimes.com/2014/09/30/science/the-odds-continually-updated.html?_r=1

I do not agree with the first example, though. Even if the coin is loaded, the usual frequentist approach still gets the answer right. The author describes how a frequentist supposedly solves the problem, not how one actually solves it. It is very simple: toss the coin 100 times, and the observed proportion of heads will be close to the true probability. The estimate is valid even if the coin is biased.
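To make the coin argument concrete, here is a minimal sketch of the usual frequentist estimate. The bias of 0.7, the seed, and the function names are my own assumptions for illustration, and the interval is the simple normal-approximation (Wald) interval rather than anything taken from the NY Times article.

```python
import math
import random


def toss_coin(n, p_heads, seed=0):
    """Simulate n tosses of a coin whose true P(heads) is p_heads (hypothetical value)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_heads else 0 for _ in range(n)]


def estimate_heads_probability(tosses):
    """Return the sample proportion of heads and a 95% Wald interval."""
    n = len(tosses)
    p_hat = sum(tosses) / n
    half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


if __name__ == "__main__":
    # The estimator never assumes the coin is fair; it only counts heads.
    tosses = toss_coin(100, p_heads=0.7)
    p_hat, low, high = estimate_heads_probability(tosses)
    print(f"estimate = {p_hat:.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

Nothing in the estimator refers to a fair coin, which is the point: the same procedure works whether the true probability is 0.5 or something else.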

This is not to say that the Bayesians are thinking incorrectly. Science, after all, is not that simple. Bayes' theorem is beautiful and elegant, and it has important value in applications.

I like the discussions here: http://stats.stackexchange.com/questions/22/bayesian-and-frequentist-reasoning-in-plain-english

The answer by user28 makes the important point that as n increases, the estimates become the same whichever method you use, though for problems with a small sample size the choice may matter.
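A small sketch of that convergence, assuming a coin-toss setting with a Beta prior (the Beta(5, 5) prior and the fixed 70% share of heads are hypothetical choices of mine, not from the Stack Exchange answer). It compares the frequentist estimate, heads / n, with the Bayesian posterior mean, (a + heads) / (a + b + n):

```python
def mle(heads, n):
    """Frequentist estimate: the observed proportion of heads."""
    return heads / n


def posterior_mean(heads, n, prior_a, prior_b):
    """Posterior mean of P(heads) under a Beta(prior_a, prior_b) prior."""
    return (prior_a + heads) / (prior_a + prior_b + n)


if __name__ == "__main__":
    # Hold the observed share of heads at 70% and let the sample size grow.
    for n in (10, 100, 1000, 10000):
        heads = 7 * n // 10
        freq = mle(heads, n)
        bayes = posterior_mean(heads, n, prior_a=5, prior_b=5)
        print(f"n={n:>5}  frequentist={freq:.4f}  bayesian={bayes:.4f}  gap={abs(freq - bayes):.4f}")
```

With a small n the prior pulls the Bayesian estimate toward 0.5; as n grows the data swamp the prior and the gap shrinks toward zero.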

Often we do not know the underlying process, and it keeps changing, so the only way to keep the answer current is Bayesian updating. Spam filtering is a great example: the probability model has to be recomputed as new messages arrive. The Bayesian approach does have elegant mathematics and a natural updating process. In practice, though, it becomes a non-parametric, computationally oriented updating process.
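As a rough illustration of the spam-filtering case, here is a toy filter that updates its counts one message at a time. I am assuming a naive Bayes model with add-one (Laplace) smoothing, which is one common way to do this; the class name, the smoothing choice, and the example words are all hypothetical.

```python
import math
from collections import Counter


class OnlineSpamFilter:
    """Toy naive Bayes filter whose counts are updated as messages arrive."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.word_counts = {True: Counter(), False: Counter()}
        self.message_counts = {True: 0, False: 0}
        self.vocabulary = set()

    def update(self, words, is_spam):
        """Fold one labeled message into the running counts."""
        self.message_counts[is_spam] += 1
        self.word_counts[is_spam].update(words)
        self.vocabulary.update(words)

    def _log_score(self, words, is_spam):
        """Log of smoothed prior times smoothed word likelihoods for one class."""
        total_messages = sum(self.message_counts.values())
        log_prior = math.log(
            (self.message_counts[is_spam] + self.smoothing)
            / (total_messages + 2 * self.smoothing)
        )
        total_words = sum(self.word_counts[is_spam].values())
        vocab_size = len(self.vocabulary) + 1  # one extra slot for unseen words
        denominator = total_words + self.smoothing * vocab_size
        log_likelihood = sum(
            math.log((self.word_counts[is_spam][w] + self.smoothing) / denominator)
            for w in words
        )
        return log_prior + log_likelihood

    def spam_probability(self, words):
        """Posterior probability that a message with these words is spam."""
        spam, ham = self._log_score(words, True), self._log_score(words, False)
        top = max(spam, ham)  # subtract the max before exponentiating, for stability
        e_spam, e_ham = math.exp(spam - top), math.exp(ham - top)
        return e_spam / (e_spam + e_ham)


if __name__ == "__main__":
    spam_filter = OnlineSpamFilter()
    spam_filter.update(["win", "money"], is_spam=True)
    spam_filter.update(["meeting", "tomorrow"], is_spam=False)
    print(spam_filter.spam_probability(["money"]))
    # New legitimate mail mentioning "money" shifts the estimate downward.
    spam_filter.update(["money", "transfer", "invoice"], is_spam=False)
    print(spam_filter.spam_probability(["money"]))
```

Each call to `update` plays the role of the updating step: yesterday's posterior counts become today's starting point, so the filter can follow a process that drifts over time.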

A response by an applied statistician

http://simplystatistics.org/2014/10/13/as-an-applied-statistician-i-find-the-frequentists-versus-bayesians-debate-completely-inconsequential/

Enjoy the conversations and learn from others, keeping in mind especially that we can get stuck in our newfound intelligence.

The Power of Graphics – An Example of a Difficult Graphic from the NY Times … Passed the One-Minute Test – A Learner’s Perspective

So many factors influence taxes, and how those factors show up in the outcome is difficult to communicate. I could not have picked a better example than this one, “How the tax burden has changed over time”.

The following example shows what helps a graphic pass the one-minute test.

http://www.nytimes.com/interactive/2012/11/30/us/tax-burden.html

For illustration I will show here only one graph from the collection of graphs used to explain seven important conclusions. The conclusions are (titles and subtitles are the NY Times'):

1. Tax rates have fallen for most Americans, especially high earners.

Share of yearly income paid in federal, state and local taxes, by income bracket.

2. What’s driven the changes? Federal income tax rates have declined …

Share of income paid in federal income taxes.

3. … while payroll taxes have risen for all — but not as much for the affluent.

Share of income paid in federal payroll taxes.

4. State and local taxes have risen, most of all for the lowest income groups.

Share of income paid in property, sales and state income taxes.  

5. And corporate taxes — ultimately paid by people — have declined.

Federal and state corporate tax burden, as a share of income.

6.  Affluent households are earning more — and paying a larger share of taxes.

For each income bracket, its share of nation’s — Income — Population

7. But the distribution of the tax burden has become less progressive.

Ratio of each group’s share of taxes paid to its share of the nation’s income.

Notice how each title comes with two items: a statement that carries the conclusion and, below it in subtitle font, the measure used.

Here is the first picture alone. Please see the full article to understand and appreciate the effort and beauty with which the NY Times bridges data, message, and communication.

This is a multi-graph presentation, one graph per conclusion, with different measurements across the seven graphs, so technically we have to give it seven minutes.

Pass maadi! – The one-minute test for graphics. (How I got the phrase “Pass maadi!” is a funny story, reserved for a different conversation.)

Where I had difficulty: the mouse-over is not intuitive, because it kept changing the look of the graph by year even when the pointer moved outside the range of the graphic. Once we understand it and control it properly, the graphs are easier to read.

I would also have liked the income groups to be labeled below each graph, repeated every time, since the graphs are separated by reading text and the context of each graph becomes cloudy after reading some of the interpretation in between.