Monthly Archives: November 2012

How to Balance between False Positives and False Negatives – 1.3 Million Women Were Mis-Diagnosed (false positives) With Breast Cancer in The Last 30 Years

1.3 million misdiagnoses of breast cancer in the last 30 years – The politics of communication in the push and pull of type I and type II errors


In the end, I provide a way of handling similar problems in corporate data intelligence.

According to the study, nearly 31% of cases are misdiagnosed in breast cancer detection.  Initial screening shows a 100% increase in cases identified as malignant tumors, but at the late stage we find a net decrease of only 8% over the study period from 1976 to 2008.  From a statistical point of view, this should not be surprising.  If you are willing to accept more false positives, you will flag more cases, but the share of flagged cases that are true positives falls, and not linearly (remember, the true positive rate is not 100 minus the false positive rate!).
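To make the parenthetical point above concrete, here is a minimal sketch with made-up counts (they are illustrative assumptions, not figures from the study). It compares a strict and a loose screening rule on the same hypothetical population of 1,000 people, 100 of whom have the disease.

```python
def rates(tp, fp, fn, tn):
    """Return (true positive rate, false positive rate, precision)."""
    tpr = tp / (tp + fn)        # share of real cases that are flagged
    fpr = fp / (fp + tn)        # share of healthy people that are flagged
    precision = tp / (tp + fp)  # share of flagged cases that are real
    return tpr, fpr, precision

# Hypothetical counts for the same 1,000 people (100 sick, 900 healthy).
strict = rates(tp=60, fp=10, fn=40, tn=890)
loose = rates(tp=90, fp=200, fn=10, tn=700)

for name, (tpr, fpr, precision) in (("strict", strict), ("loose", loose)):
    print(f"{name}: TPR={tpr:.2f} FPR={fpr:.2f} precision={precision:.2f}")
```

The two rates have different denominators, so one cannot be read off the other. In this made-up example the loose rule finds more real cases, yet most of what it flags is false.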

The article in the New England Journal of Medicine reports a fairly detailed retrospective study of a sample of 30,000 women (the largest such study) that used “Surveillance, Epidemiology, and End Results data to examine trends from 1976 through 2008 in the incidence of early-stage breast cancer (ductal carcinoma in situ and localized disease) and late-stage breast cancer (regional and distant disease) among women 40 years of age or older”.

“To reduce mortality, screening must detect life-threatening disease at an earlier, more curable stage. Effective cancer-screening programs therefore both increase the incidence of cancer detected at an early stage and decrease the incidence of cancer presenting at a late stage.”

The key takeaway I want to bring to the discussion is that certain decisions are popular in the corporate world too. If an organization does not make full use of its information strategy, it will make emotional decisions with long-term effects, because its decision making is dominated either by too many false positives or by too few true positives.

Why do this emotional roller coaster and this waste of resources happen?  On one side, the reasoning is “every individual matters”.  On the other side, the argument is that the incremental value for society as a whole comes at the cost of the emotional suffering of being told you have the disease when in fact it is not there (fear of a ghost), plus the pain, suffering, and waste of additional tests and treatments, and the family difficulties caused by these ‘imaginary ghosts’.

So how do we reduce such a high incidence of false positives? And what is the right decision?

Look at the measures from many different angles.  A simple majority vote should be good enough: if most of the angles point in the same direction, the message is consistent.
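A minimal sketch of the majority-vote idea above, assuming each measure has already been reduced to a simple direction label. The labels and the function name `majority_direction` are hypothetical, chosen only for illustration.

```python
from collections import Counter

def majority_direction(signals):
    """Return the label backed by a strict majority of signals, else None."""
    if not signals:
        return None
    label, count = Counter(signals).most_common(1)[0]
    return label if count * 2 > len(signals) else None

# Five hypothetical measures looked at from different angles.
print(majority_direction(["treat", "treat", "wait", "treat", "wait"]))
# No strict majority, so the message is not consistent.
print(majority_direction(["treat", "wait"]))
```

Returning `None` when there is no strict majority keeps a split vote from being read as a clear message.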

Recommend mammography after weighing additional factors to triage the cases better and, more importantly, involve patients and families by explaining the risks of type I and type II errors.  The system somehow operates on the assumption that these confusing errors and their associated risks are too difficult to explain.

Furthermore, weigh the full emotional and material cost when drawing the line on how many false positives and how many false negatives to accept, so that everybody is focused on a clear, unemotional decision.

End users should be aware of the misdiagnosis rate (or the misclassification rate, for consumers) and the emotional and material cost attached to it.  In the case of breast cancer, the end users are women and their households.  In the corporate world, they are the department heads who are compensated for the success or failure of programs.

What is the takeaway for corporate analytics?

In the business world, this is a much easier decision-making path, as everything has to come down to simple dollars and cents (sense)! The unit cost and revenue differential for every false positive and every false negative can easily be translated into a final measure that can be used to compare decisions and pick the optimal one.
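As a rough sketch of that dollars-and-cents comparison, the code below prices each false positive and false negative and picks the score cutoff with the lowest total cost. The scores, labels, cutoffs, and dollar amounts are all made-up assumptions, and the names `Costs`, `total_cost`, and `best_threshold` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Costs:
    false_positive: float  # assumed dollar cost of acting on a case that is not real
    false_negative: float  # assumed dollar cost of missing a real case

def total_cost(scores, labels, threshold, costs):
    """Total cost when every score >= threshold is treated as a positive."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp * costs.false_positive + fn * costs.false_negative

def best_threshold(scores, labels, costs, candidates):
    """Pick the candidate cutoff with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(scores, labels, t, costs))

# Made-up model scores and true outcomes (1 = real case).
scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
candidates = [0.2, 0.5, 0.7]

misses_are_expensive = Costs(false_positive=10, false_negative=100)
alarms_are_expensive = Costs(false_positive=100, false_negative=10)

print(best_threshold(scores, labels, misses_are_expensive, candidates))
print(best_threshold(scores, labels, alarms_are_expensive, candidates))
```

The same data lead to a different cutoff once the unit costs change, which is the point: the line between false positives and false negatives is drawn by the cost assumptions, so those assumptions should be explicit.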

From Data Monster & Insight Monster

Where Do BIG data Come In Here? The Beginning of A Quest…

While reading about something else, I found this wonderful article in the New Yorker magazine (Annals of Medicine: The Bell Curve), published six years ago. It discusses the importance of comparative measurements, a patient-centered approach, extensive observing of and listening to the patient, and cross-validating information across sciences, hospitals, and physician practices, taking into consideration in a systematic way all that supports the understanding that,

“…subtleties of medical decision-making can be identified and learned. The lessons are hidden. But if we open the book on physicians’ results, “and being patient centered (my words)”, the lessons will be exposed…”, and significant improvement in healthcare can be achieved.

Annals of Medicine – The Bell Curve

What happens when patients find out how good their doctors really are? by Atul Gawande 

I have pulled together the importance of how to create a BIG data process map and how to reduce it to a functional analytics project for large-scale implementation as a science.

Abstracting the article in order to scale it as a data science, a few things come out very clearly.

– The science has to be patient centered

– Every little observation becomes part of BIG data

– Comparative measures of physicians’ dialog, measurement, and performance are the yellow brick road to significant improvement in healthcare delivery

– Extraordinary partnership among patients, physicians, hospitals, and data scientists

All these bring down the BIG data into a few best practices; the ultimate goal of BIG data is to get to this small list of best practices.

The point is that every big problem (read big opportunity) has a process map that can help identify the analytical and data opportunities and in the end, the goal of BIG data from the analytics point of view is to come down to the small  set of smart decisions and best practices that can be standardized.

Based on the New Yorker article, it is clear that two of the greatest and quickest ways to achieve significant improvement in patient care are the two analytics opportunities identified in the green boxes: identifying the shortest list of tests to be done, with their priorities, and the comparative analysis and reporting of physician and hospital success stories.

It is very interesting: when you focus on the patients, everybody wins, including the physicians and the hospitals.  However, there is a poor understanding of what is meant by ‘patient centered care’.

By the way, one of the best practices the physicians did not consider is Yoga, considering the method of “percuss” in 14 places as one of the necessary conditions of bringing out the mucus. Do not get distracted on this line regarding suggestion of Yoga, which was not the purpose of this article. I will be writing separately about the shortest path to the BIG data approach in a separate article. Perhaps, the following clinical trials will illuminate the yoga hypothesis.

You see how the data becomes BIG and how it will fall back to small data… Sometimes, sheer ignorance adds to data.  That is another topic: how our inability to process the existing data, or to rank the importance of data and paradigms, becomes the reason for BIG data.  I will discuss this in a separate article.

Important Success Factor for a Data Scientist and Your List of Secret Source of Trend Data


To get access to my world class team who can address your opportunity areas and network with such luminaries,

– Join my group (use the ‘Join this site’ on the right)
– Leave comments, and questions that might help every one else.



As part of my teaching, I try to impress upon my students every term the following.

Peace – I will explain. Give me a minute or two…Smarty
Dr. Spock –

It is a lot easier to build a logistic regression, cluster analysis, or random forest (which are tools), or neural net or genetic algorithm based business rules, than to be a subject matter expert who also happens to know the tools and methods for analyzing data from the right sources, internal and external.

Getting the right data, interpreting the importance of its various components, and using them correctly to solve business problems is what will get you into the board room of your company.

And, you see, for prediction, you have to be ahead of the game.  I have written various articles on the most important factors in different contexts, such as:

Most Important and Powerful Factor that Predicts Your Market Opportunities

The Important functions of a Chief Analytics Officer

The Five Most Important Characteristics of a Good Segmentation Scheme

These understandings can help sculpt your pathway of putting together your team, team leader, and some methods too, and start collecting data that would enhance your prediction.

However, we all need a waterhole, or a secret garden, or, more directly, a list of trade publications where you can, through the concepts of BIG data, capture the business cycle, social trends, seasonality, and lifestyle trends of your key segments to enhance the power of prediction, in addition to third party data and your own sample surveys or some form of syndicated sources.

Believe me, without life-cycle, business-cycle, and trends data, you do not have a predictive model; you have only some kind of data intelligence model, and that is ok as long as your analyst or vendor does not claim and promise that the model is highly predictive.

So where can I get that Dr. Spock?

Ok, I will bring out today one aspect of such data sources.

I wish we had vertical-specific business cycle, seasonality, trend, and lifestyle trend data captured in some systematic way and made available in a simplified tabular form so that we could all use it easily.  It is not available as far as I know.

One of my personal interests is P&C, and I am providing a condensed list of publications where you can get trend, cycle, seasonality, and life cycle change point data.  I want to caution that these sources can become overwhelming, and extracting any trends from them can be too diffuse an effort.  I suggest them only in the context of BIG data; there are probably other journals and industry conferences that can also be part of this data aggregation, which only increases the number of places where you can look for data.

A Caution:
Be watchful.  There may be third parties who sell quality data aggregated from such sources, and that could be better than dissipating your energy in trying to tease out data from these.  I already see that these references are misunderstood in professional circles.  In fact, this is the problem of BIG data.  As you bring the sources together, you are likely to be discouraged because you are not sure how to tease out the data.  Catching the small flicker of signal that tells you which data points (series or streams) will eventually increase predictability or competitiveness, and can become a standard source, is a difficult and frustrating task that many are not equipped for.

Also, some key BIG 8 sites are: 

A little bit of research on the net can get you the list for your vertical, and it is just the beginning. Go ahead and fill in the above list for your needs.  What we need ultimately is a nicely put together, explicit tabular form of these trend data, organized in a meaningful way.

Of course, do not forget Google Insights and Trends data, which is the easiest source for trends in surfing data.

I will stop here for your own pathway to hunt for these.

Have a great weekend.

The Importance of Analytics in Fiscal Cliff Opportunity – A 16 Trillion Dollar Restructuring and Redistribution

The fight is over who is going to shoulder how much, and how to sequester the whole thing creatively in case it ends up in a long, debilitating fight.

I am wondering whether there could be a business opportunity where everyone, including participants in the whole world economy, can be part of the sequestration process through a set of competitive activities.

Life is very uninteresting if you just tax one group and pay for the benefits of another group, and in fact hatred, cursing, and possibly violence can follow if push comes to shove under such an approach.

But President Obama is smart and lucky to have a few things in his arsenal
– the expiring President Bush tax cuts
– a Senate majority
– willingness and a change of tone in Washington, DC and on Wall Street
– Europe is ready to play with the US
– China in all likelihood will be willing to play with the US

There are some challenges too

Identity Crisis of Analysts and Analytical Managers

Sometimes, professionals need to search for the meaning of their work; they need to know the destination and the target.  How do we make the journey as much fun as the wait for the celebration of reaching the destination?

When I see overwhelming discussion of

– whether we need a Ph.D. to become an analyst, which becomes a politically charged discussion
– should analysts be

It is the Information Strategy, Smarty! Is GOP Method Stone Age One?

I will cross between business and organizational ideas and politics to borrow the best ideas, identify the best practices, and show how they contributed to the election results.  The conclusion is clear – It is the Information Strategies, S…..

It is the morning after Monday Night Football, and it may be argued that the learning comes after the fact.  But for me this confirms again the power of data-based information strategies, which continue to prove their validity and value.

See today’s Morning Joe program.  Chuck Todd mentions that the GOP’s information strategies looked like a stone age approach compared to the Democratic Party’s.

Based on various readings and my own analysis, the following is the score card of the two parties in terms of their ability to use information strategies.  I would say I have graded Mr. Romney’s data intelligence team graciously, and even then it does not measure up!

Our analysis is based on hard cold data – David Axelrod.

Managers’ Score Card For Modeling Vendor Evaluation

With the pervasive use of data-driven decisions taking center stage, many vendors of different breeds,
– boutique modeling agencies
– part of big 8 who provide integrated evaluation and analytical solutions as additional services
– technology companies serving also as data usage companies
– data capture companies acting as data intelligence companies
– data aggregators acting as data intelligence companies
– data sellers providing data intelligence services
and anything else that touches on multiple attributes mentioned above, are providing data intelligence services.

Vendors are competing for opportunities to help companies create data intelligence; however, the quality and the definition of solutions vary a lot among these companies, and often the same definitions are interpreted differently.

For a logical and consistent way to evaluate vendors and understand best practices, it would be good to have some kind of score card for evaluating predictive analytics or BI analytics solutions, with or without the context of BIG data.

To that end, I am providing a template of a score card that may be useful to you.  If you use it, the following specifics need to be well understood.

A good understanding of the vendor, and of how it aligns with your specific business, is important for articulating your needs to the vendor and for negotiating and optimizing your budget for the best execution of analytics projects.

Behind each one of these dimensions are multiple sub-dimensions that contribute to the primary dimensions.  These sub-dimensions are weighted according to their importance to the client’s needs. Thus, evaluating the desirability of a vendor is a two-step process.
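The two-step weighting described above can be sketched as follows. This is only an illustration: the dimension names, sub-dimension names, scores, and weights are made-up assumptions, not entries from the actual score card, and `weighted_average` and `vendor_score` are hypothetical helper names.

```python
def weighted_average(scores, weights):
    """Weighted average of scores, using the weight stored under each key."""
    total_weight = sum(weights[key] for key in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(scores[key] * weights[key] for key in scores) / total_weight

def vendor_score(sub_scores, sub_weights, dimension_weights):
    """Step 1: roll sub-dimensions up into dimensions. Step 2: roll dimensions up."""
    dimension_scores = {
        dim: weighted_average(sub_scores[dim], sub_weights[dim])
        for dim in sub_scores
    }
    return weighted_average(dimension_scores, dimension_weights), dimension_scores

# Hypothetical 1-5 ratings for one vendor.
sub_scores = {
    "methodology": {"validation": 4, "documentation": 2},
    "domain_fit": {"industry_experience": 5},
}
# Hypothetical client-specific weights.
sub_weights = {
    "methodology": {"validation": 3, "documentation": 1},
    "domain_fit": {"industry_experience": 1},
}
dimension_weights = {"methodology": 2, "domain_fit": 1}

overall, by_dimension = vendor_score(sub_scores, sub_weights, dimension_weights)
print(by_dimension)
print(overall)
```

Returning the per-dimension scores alongside the overall score makes it visible which factors are being traded off when the final number is not the only basis for the decision.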

Note also that not all of these dimensions contribute equally to the weighted score, and the single final weighted score is what matters.  Even if a vendor is not selected purely on the basis of the weighted score, you will still know which factors you are willing to sacrifice, by how much, and why certain weights do not represent your requirements well, and hence you can make a valuable and justified decision. The following material is a trademark (TM) in progress.  You can use it with permission.
