This post briefly reviews several correlation coefficients before turning to the concepts and references behind what is touted as the correlation coefficient of the 21st century.
The most widely known measure of association between two continuous variables is the Pearson correlation.
See: https://statistics.laerd.com/statistical-guides/pearson-correlation-coefficient-statistical-guide.php for a nice introduction to “r”, the Pearson correlation coefficient.
Pearson's “r” is sensitive to outliers.
When that is a concern, we can use the Spearman rank correlation, which applies the same formula to the ranks of the raw pairs of continuous data elements rather than to the values themselves.
You can change one of the values (for example, the 10th width value from 500 to 10000) to make it an outlier. As long as the change leaves the ranks unchanged, the Spearman correlation stays the same, whereas the Pearson correlation can be seriously affected, depending on how extreme the outlier is.
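To make the outlier experiment concrete, here is a minimal sketch in plain Python. The `length` and `width` values are hypothetical (only the change of the 10th width from 500 to 10000 comes from the text above), and the helper functions `pearson`, `ranks`, and `spearman` are written out for illustration rather than taken from any particular package.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: the Pearson formula applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data, made up for this illustration.
length = [10, 12, 15, 11, 18, 20, 22, 25, 27, 30]
width = [100, 150, 140, 120, 260, 300, 280, 400, 450, 500]

# Turn the 10th width value (500) into an outlier (10000).
width_outlier = width[:-1] + [10000]

print("Pearson  before/after:", pearson(length, width), pearson(length, width_outlier))
print("Spearman before/after:", spearman(length, width), spearman(length, width_outlier))
```

Because 500 was already the largest width, replacing it with 10000 does not change any rank, so the two Spearman values coincide while the Pearson value drops.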
However, to find correlations in ordinal data, we need the concepts of concordance and discordance as a way of understanding how the co-relation (the joint relationship of association) can be defined.
For a nice introduction to concordance and discordance, see: http://stats.stackexchange.com/questions/51604/ordinal-trends-and-finding-concordant-discordant-pairs
Using the ideas of concordance and discordance, one can define the correlation (association) through Kendall’s tau.
Here is an example:
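As a minimal sketch, the counting of concordant and discordant pairs can be written in a few lines of Python. The two rating lists are hypothetical, and the function computes the simplest variant (often called tau-a), in which tied pairs count as neither concordant nor discordant; statistical packages usually report tie-corrected variants such as tau-b.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1   # both variables order the pair the same way
        elif s < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # s == 0 means a tie in x or y; tau-a leaves it out of both counts
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical ordinal ratings of five items by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

print(kendall_tau_a(judge_a, judge_b))  # 8 concordant, 2 discordant -> 0.6
```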
You can also use the following SAS code to calculate various measures of correlation, as explained here; the three measures above, along with Hoeffding’s D, are output in SAS.
The document above covers all four main measures of correlation, the last being Hoeffding’s D, which measures general departures from independence rather than only linear or monotonic association.
PROC LOGISTIC also reports several measures of association that are useful for comparing goodness of fit from a modeling perspective.
For a recent document that compares and summarizes these four measures see
This document takes a different direction: it uses the information shared between two variables as a measure of association, building on the concepts of mutual information and the maximal information coefficient.
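To give a feel for mutual information, here is a minimal sketch of the usual plug-in estimate for two discrete variables, I(X;Y) = sum over (x, y) of p(x,y) * log2( p(x,y) / (p(x) p(y)) ). The function name and the toy data are assumptions made for illustration. This is not the maximal information coefficient itself, which additionally searches over grids on continuous data and normalizes the result.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations."""
    n = len(x)
    joint = Counter(zip(x, y))   # counts of each (x, y) pair
    px = Counter(x)              # marginal counts of x
    py = Counter(y)              # marginal counts of y
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

# Hypothetical toy data.
x = [0, 0, 1, 1]
print(mutual_information(x, [0, 0, 1, 1]))  # y determines x: 1.0 bit
print(mutual_information(x, [0, 1, 0, 1]))  # x and y independent: 0.0 bits
```

Unlike the three correlation coefficients above, this quantity is zero only when the two variables are independent in the sample, and it does not care whether the dependence is linear or monotonic.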
Interestingly, the “maximal information coefficient” (MIC) is also touted as the correlation coefficient of the 21st century: http://www.slideshare.net/daniel_bilar/speed-2011-mic-a-correlation-forthe21stcentury
It is a nice, simple collection of R code presented in an expository style.
Now, why would you agree or disagree with the claim above that this is the correlation coefficient of the 21st century? I would like to hear your points of view.