Locating the right data and defining the right measure for testing verifiable hypotheses do not come equally easily to us.
Searching for data has been part of our cognitive habits far longer than carefully understanding and using the right measure. We are easily swayed by an argument simply because it quotes, or fails to quote, a data source provided by others, often without knowing the veracity of that source, and we believe it willingly. Yet we rarely spend enough time asking: how is the measure defined?
Well-known points of confusion, such as joint probability vs. conditional probability, when to use the mean, median, or mode, and when to use the arithmetic mean vs. the geometric mean vs. the harmonic mean, are enough to leave people clueless, often including people educated beyond college level.
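To make the choice of mean concrete, here is a minimal sketch using Python's standard `statistics` module. The income, growth, and speed figures are made-up illustrations, not data from any real source.

```python
from statistics import mean, median, geometric_mean, harmonic_mean

# Hypothetical incomes (in thousands) with one extreme value.
incomes = [30, 32, 35, 38, 400]
# Hypothetical yearly growth factors (1.10 means +10%, 0.80 means -20%).
growth_factors = [1.10, 0.80, 1.25]
# Hypothetical speeds (km/h) over two legs of equal distance.
speeds = [40, 60]

# The outlier pulls the arithmetic mean far above the typical income;
# the median stays close to what most people earn.
income_mean = mean(incomes)
income_median = median(incomes)

# Growth compounds multiplicatively, so the geometric mean is the constant
# yearly factor that reproduces the same total growth.
average_growth = geometric_mean(growth_factors)

# For equal distances, the harmonic mean of the speeds gives the true
# average speed (total distance divided by total time).
average_speed = harmonic_mean(speeds)

print(f"income mean={income_mean}, median={income_median}")
print(f"average growth factor={average_growth:.4f}")
print(f"average speed={average_speed:.1f} km/h")
```

The same numbers summarized with the wrong mean would overstate typical income, overstate compound growth, or overstate average speed, which is why the definition of the measure deserves as much scrutiny as the data.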
Being analytical is the next higher level of cognition, and it requires systematic thinking and systematic statistical principles and concepts. While everyone would like to think analytically, deciding with the least bias while knowing and minimizing the error in prediction is a deeper cognitive process.
The famous Monty Hall problem, in which two goats and a car are hidden behind three closed doors, the contestant picks one door, and the host then opens another door to reveal a goat and offers the chance to switch, confounded 1000+ PhDs. See the summary article I created at this link: http://blog.crmportals.com/2014/03/25/even-math-professors-fail-in-this-simple-game-why-2/
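For readers who would rather see the answer than argue about it, the game can be simulated. The sketch below assumes the standard rules (the host knows where the car is and always opens a goat door that the contestant did not pick); the function names `play` and `win_rate` are my own for illustration.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning with a fixed strategy."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials

print("stay  :", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

Staying wins about one time in three and switching about two times in three, because the host's action carries information: the measure that matters is the probability of the car's location conditional on which door the host opened.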
Also, we all know how important the measure OBP (on-base percentage) was in the Moneyball assignment; in fact, it was the organization's strategic metric.
In terms of locating the right data, there are four sources.
- Application/Registration/Inquiry data – Prospect data
- Transaction data
- Third party – syndicated data (geo-demographic, lifestyle, attitudinal, behavioral data)
- Survey data (special enterprise-initiated surveys vs. existing panels)
In terms of defining the right measure to use for analysis, it is tightly connected to the hypothesis at hand. If it is a simple classification problem where we hypothesize that high-value customers are highly educated and earn an above-average income, the measure of interest is a conditional probability.
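As a rough illustration of the difference between a joint and a conditional probability in this setting, consider the sketch below. The customer records are invented, and I am assuming the quantity of interest is P(high value | highly educated and above-average income); conditioning in the other direction is a different measure and needs its own definition.

```python
# Hypothetical records: (high_value, highly_educated, above_avg_income)
customers = [
    (True, True, True),
    (True, True, True),
    (True, False, True),
    (False, True, True),
    (False, True, False),
    (False, False, False),
    (False, False, True),
    (False, False, False),
]

def joint_probability(records):
    """P(high value AND highly educated AND above-average income)."""
    hits = sum(1 for hv, edu, inc in records if hv and edu and inc)
    return hits / len(records)

def conditional_probability(records):
    """P(high value | highly educated AND above-average income)."""
    segment = [r for r in records if r[1] and r[2]]
    if not segment:
        return float("nan")  # the condition never occurs in the data
    return sum(1 for r in segment if r[0]) / len(segment)

print("joint      :", joint_probability(customers))
print("conditional:", round(conditional_probability(customers), 3))
```

The joint probability divides by all customers, while the conditional probability divides only by the customers who meet the condition, so the two can differ widely on the same data.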
On the other hand, if the hypothesis is that those with two or more children under 15 years old are more likely to be loyal customers over the next 3 years, then the measure is still a likelihood, but a subtler one than in the previous example: it is a loyalty survival probability.
The way these two probabilities are defined, calculated, interpreted, and implemented, the value they bring to the organization, and how consumers need to be engaged are all very different.
So pay close attention to the hypotheses, and let them help identify and define the right measure.
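One common way to estimate such a survival probability is the Kaplan-Meier estimator; that choice is my assumption here, not something the hypothesis dictates. The sketch below uses invented tenure data for a single customer group, where `churned` being False means the customer was still loyal when last observed (censored).

```python
def kaplan_meier(durations, churned):
    """Return [(time, survival_probability)] at each observed churn time."""
    event_times = sorted({t for t, c in zip(durations, churned) if c})
    survival = 1.0
    curve = []
    for t in event_times:
        # Customers still being observed just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Customers who churned exactly at time t.
        events = sum(1 for d, c in zip(durations, churned) if d == t and c)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical months of observed tenure for customers in one group.
durations = [6, 12, 12, 18, 24, 36, 36, 36]
churned = [True, True, False, True, False, False, False, False]

curve = kaplan_meier(durations, churned)
for month, probability in curve:
    print(f"month {month}: P(still loyal) = {probability:.3f}")
```

To test the hypothesis itself, the same calculation would be run separately for customers with two or more children under 15 and for everyone else, and the two curves compared at the 36-month mark.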