# Real Time Analytics Basics – Bayesian Updating – Part 2

Real-time analytics for credit approval, fraud detection, health care monitoring, epidemic monitoring, homeland security, stock market monitoring, advertisement platform management, weather reporting, and similar applications all depend on powerful analytical methods.

Bayesian updating is one such method, and a key concept for building real-time interaction databases and scoring methods.

It is also one more reason why complex business hypotheses require deeper analytical talent and a decision scientist's opportunity-analysis hat.

Statistical thinking is different, and it is complicated by probability calculations that can be confusing, as in the discussion ‘<a href="http://www.blogger.com/blogger.g?blogID=25615276#editor/target=post;postID=7707666505443290603">Even Mathematics Professors Fail in This Simple Game</a>’. On top of that, it involves confusing-looking probability (P) statements.

In decision making, decision scientists know that a decision can go wrong through two types of mistakes: a false positive (Type I error) and a false negative (Type II error).

The full collection of pointers on this is given in the tabulation below:

The confusing set of statements arises because the conditional probabilities are calculated on the basis of the rows, not the columns.

So what are its implications?

When you calculate Probability[False Positive], also called alpha (the probability of a Type I error), and Probability[False Negative], also called beta (the probability of a Type II error), you see why decision making for complex business hypotheses is not as simple as taking the proportion of misclassifications and calling that the error in judgment.
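
As a minimal sketch of why the row-conditioned error rates differ from the overall misclassification proportion, the following uses made-up counts with the actual state on the rows. The counts and the function name `error_rates` are illustrative assumptions, not values from the original tabulation.

```python
def error_rates(tp, fn, fp, tn):
    """Return (alpha, beta, misclassification) from confusion-matrix counts.

    Rows are the actual state, columns the test result:
        actual event     -> tp (test +), fn (test -)
        actual non-event -> fp (test +), tn (test -)
    """
    alpha = fp / (fp + tn)   # P[test + | actual non-event], Type I error rate
    beta = fn / (tp + fn)    # P[test - | actual event], Type II error rate
    misclassification = (fp + fn) / (tp + fn + fp + tn)
    return alpha, beta, misclassification


# Hypothetical counts, chosen only for illustration.
alpha, beta, miss = error_rates(tp=80, fn=20, fp=90, tn=810)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, misclassified = {miss:.3f}")
```

With these counts the three numbers all differ, because each one divides by a different total: alpha by the actual non-events, beta by the actual events, and the misclassification proportion by everything.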

Sometimes people put the actuals (also called the ‘gold standard’) on the columns and the test or ‘predicted’ outcome on the rows. That is how it is shown on Wikipedia. The key tabulation from Wikipedia for our purposes is:

For a detailed discussion of sensitivity and specificity, see: http://en.wikipedia.org/wiki/Sensitivity_and_specificity

Why does this matter?

There are many ways to answer this question. I will approach it from the point of view of real-time updating of decision rules.

For complex hypotheses, the real-time updates will be Bayesian updates, which have to incorporate both of these types of errors. (Note that complex hypotheses are related to power data.)

Ok, I will have to introduce the Bayesian updating method first…
Using the formula:

Weighted reasoning (this is the Bayesian update without tears; see how simple it is):

Total weighted average probability: (0.8*0.9 + 0.2*0.1) = 0.74
Weighted value for the event: 0.8*0.9 = 0.72
So the weighted relative value is 0.72/0.74 = 0.972973 (the updated probability of the event).

Similarly, the weighted relative value for the non-event is 0.02/0.74 = 0.027027, which is also 1 - updated Prob(Event).
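
The arithmetic above can be sketched in a few lines of Python. The table that assigns roles to the four numbers did not survive, so this sketch assumes 0.8 and 0.2 are the prior probabilities of the event and the non-event, and 0.9 and 0.1 are the corresponding likelihoods of the observed evidence. The result is the same if the roles are swapped, since only the products matter. The function name `bayes_update` is illustrative.

```python
def bayes_update(prior_event, lik_event, lik_non_event):
    """Update P[event] after one piece of evidence.

    prior_event   : prior probability of the event (assumed role)
    lik_event     : P[evidence | event] (assumed role)
    lik_non_event : P[evidence | non-event] (assumed role)
    """
    weighted_event = prior_event * lik_event
    weighted_non_event = (1.0 - prior_event) * lik_non_event
    total = weighted_event + weighted_non_event  # total weighted probability
    return weighted_event / total, weighted_non_event / total


post_event, post_non_event = bayes_update(0.8, 0.9, 0.1)
print(round(post_event, 6), round(post_non_event, 6))  # 0.972973 0.027027
```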

Of course, practical problems are more complicated than this, and we will address that as we go along.
<div>Alternatively, with systematic symbols:</div>
<div>A: the event, with its prior distribution</div>
<div>+: the components related to the modifying distribution, that is, given that the event happens</div>
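
In these symbols, the weighted reasoning is the standard form of Bayes' rule. The notation A^c for the non-event (the complement of A) is an assumption added here for illustration:

```latex
P[A \mid +] = \frac{P[+ \mid A]\,P[A]}{P[+ \mid A]\,P[A] + P[+ \mid A^{c}]\,P[A^{c}]}
```

The numerator is the weighted value for the event, and the denominator is the total weighted average probability.
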
<div>[Figure: Oh! these symbols]</div>
In the above example, P[False Positive] was artificially set up as if it were the prior Prob[Event]. I borrowed the example from the reference to test the accuracy of the calculation, even though I use my own terminology in interpreting the calculations. So let us change the modifying (likelihood) distribution as follows.

Here is an important exercise, and a more intriguing example. What is the probability that someone is actually not an illegal drug user when the test comes back as non-user? Note the changes made in the yellow-colored cells: I modified the prior distribution and the false positive and false negative probabilities so that it no longer looks as if the false positive probability is 1-P[Ec], which was the case in the previous example.

It is easier to solve this with a tree diagram, for all four of the following questions:

If you observe that the test comes out positive, what is the probability that the tested person is actually an illegal drug user?

If you observe that the test comes out positive, what is the probability that the tested person is actually not an illegal drug user?

If you observe that the test comes out negative, what is the probability that the tested person is actually an illegal drug user?

If you observe that the test comes out negative, what is the probability that the tested person is actually not an illegal drug user?
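
The tree-diagram approach to these four questions can be sketched as follows. The modified table did not survive, so the prior and the test error probabilities below are hypothetical values chosen only for illustration, and the function name `posterior_table` is likewise an assumption.

```python
def posterior_table(prior_user, p_pos_given_user, p_pos_given_non_user):
    """Walk the probability tree and return the four posterior probabilities."""
    prior_non_user = 1.0 - prior_user

    # Branch products of the tree: P[state] * P[test result | state].
    joint = {
        ("user", "+"): prior_user * p_pos_given_user,
        ("user", "-"): prior_user * (1.0 - p_pos_given_user),
        ("non-user", "+"): prior_non_user * p_pos_given_non_user,
        ("non-user", "-"): prior_non_user * (1.0 - p_pos_given_non_user),
    }

    # Total probability of each test result (sum over both states).
    p_result = {r: joint[("user", r)] + joint[("non-user", r)] for r in "+-"}

    # Posterior: P[state | test result].
    return {(s, r): p / p_result[r] for (s, r), p in joint.items()}


# Hypothetical inputs, not the values from the original table.
posterior = posterior_table(prior_user=0.05,
                            p_pos_given_user=0.95,
                            p_pos_given_non_user=0.10)
for (state, result), p in posterior.items():
    print(f"P[{state} | test {result}] = {p:.4f}")
```

Each printed line answers one of the four questions. With a low prior, even a fairly accurate test leaves a positive result far from conclusive.
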
<div>[Figure: Update without tears]</div>

An important and interesting observation is that in the posterior, P[True Positives] + P[False Positives] = 1 and P[False Negatives] + P[True Negatives] = 1, because the posterior conditions on the test result rather than on the actual state.
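
A short derivation shows why this holds. Writing E for the event and E^c for the non-event (notation assumed here, matching the P[Ec] used earlier), the two posteriors for a positive test share the same denominator P[+]:

```latex
P[E \mid +] + P[E^{c} \mid +]
  = \frac{P[+ \mid E]\,P[E] + P[+ \mid E^{c}]\,P[E^{c}]}{P[+]}
  = \frac{P[+]}{P[+]} = 1
```

The same argument with a negative test result in place of + gives the second identity.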

For symbolic representation at various stages, you may see: <a href="http://home2.fvcc.edu/~dhicketh/Math117/notes/Chapter7/M117sec7_5problems.pdf">http://home2.fvcc.edu/~dhicketh/Math117/notes/Chapter7/M117sec7_5problems.pdf</a>

For decision scientists addressing the predictive power of models in complex business or organizational hypotheses, these intriguing terms and methods will come in handy.

These are also fundamental concepts that help automate scoring with the latest information about a consumer as the company interacts with them.
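
As a rough sketch of what such automated scoring could look like, the update can be applied once per interaction, with each posterior becoming the prior for the next one. This assumes the interaction signals are conditionally independent given the event, which is a simplification. The starting score, the likelihood pairs, and the function name `update_score` are all hypothetical.

```python
def update_score(score, lik_event, lik_non_event):
    """One Bayesian update of a propensity score from a new interaction."""
    weighted_event = score * lik_event
    weighted_non_event = (1.0 - score) * lik_non_event
    return weighted_event / (weighted_event + weighted_non_event)


# Hypothetical stream: each pair is
# (P[signal | event], P[signal | non-event]) for one observed interaction.
interactions = [(0.6, 0.3), (0.2, 0.5), (0.7, 0.1)]

score = 0.10  # hypothetical starting propensity (the prior)
for lik_event, lik_non_event in interactions:
    score = update_score(score, lik_event, lik_non_event)
    print(f"updated score: {score:.4f}")
```

A signal that is more likely under the event pushes the score up, and one that is more likely under the non-event pulls it down.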

The terminology may be confusing, with complementary-looking structures such as Prob[False Positive] not being equal to (1-Prob[True Positive]); nonetheless, these ideas are worth spending time on, because they are the foundations of real-time scoring of target offers and target selections.

For companies doing marketing as a product seller, as a publisher of other companies' products, as an agency representing a vertical, or as a network servicing a collection of publishers, products, or agencies, this real-time update can be very powerful.

Fraud detection and health care triaging for treatment are also in the realm of real-time applications. We will call on these as we develop the architecture for these applications.

At the same time, it is important that an analyst is watching over the convergence or divergence of segment level propensities so that the trends that are developing are meaningfully interpreted and incorporated in the business processes.

Next we will see how to extend this to a sequence of new evidence: <a href="http://predictive-models.blogspot.com/2012/08/real-time-analytics-bayesian-part-2.html">http://predictive-models.blogspot.com/2012/08/real-time-analytics-bayesian-part-2.html</a>