Monthly Archives: May 2012

Survey Methodology – A short Intensive Training For Effective Business Intelligence

Know how to develop a powerful survey design, avoid bias in sample surveys, execute, and create sophisticated consumer intelligence.


Copyright notice: To reference this video, please send a permission request to my email address: nethra.sambamoorthi at northwestern.edu

Ref:

Survey Methodology (2004), Robert M. Groves, Floyd J. Fowler, Jr., Mick P. Couper, James M. Lepkowski, Eleanor Singer, and Roger Tourangeau

Avoiding Bias in Research Interview (http://www.childrenshospital.org/cfapps/research/data_admin/Site2846/Documents/Avoiding Bias in the Research Interview.pdf)

From Data Monster & Insight Monster

Why GM is Quitting Facebook Advertisement Platform? – What is the Elephant in The Room?

Social Media

I laid out my predictions here:

Does Facebook Segmentation – Social Ads Work? How do we know? What Will Stop Billions More Lost? published on March 31, 2012 

The proof in the pudding came with today’s news.

Yet, I want to caution that this is not about gloom and doom.

Facebook is one of the most powerful marketing platforms: a platform where all kinds of personal data for 700 million individuals, along with the network information on who can be influenced by whom, are available to marketers. (Close to 200 million of them are in the US. I wonder whether there is some confusion between unique individuals and unique pages; usually there are at least 1.5 pages per account even in the early stages of a site. This is all wild guessing until these numbers are well understood.) These data are available for ethical marketing applications but have come under the stress I was predicting less than two months back. A marketer's dream finally coming true in this new social media world is indeed the challenge, not only for Facebook but also for the marketers.

Yet today’s news should not be shocking…

What is clear is that marketers will be better off understanding the behavioral aspects of consumers, market dynamics, personal data, and consumer power, and that social media giants like Facebook have everything to offer.

However, we are in an unusual situation. Facebook has everything a marketer could dream of, and yet the business model is not clear on how to use that innocent-looking resource in a way that supports the aura of ‘be good with data, especially personal data’.

Facebook requests ‘ … spend less money on developing applications and more money on advertising and promotion’, the article reads, and yet the clients do not know what ‘marketing through Facebook’ means and do not know how the platform contributes to the ROI.

The elephant in the room is the idea that alternative marketing concepts are open and are possibly not being used in a way that would support a favorable ROI vis-a-vis the Facebook advertisement platform. “…G.M. is spending $40 million year and $30 million of that is going toward managing its page, that’s a lot of money,” says eMarketer.

This is to support 378,421 likes, a number that will surely grow, though not all of the growth may be on Facebook. At $30 million for managing the page, the unit cost of each like is about $79. This is just to maintain the relationship, and only 20% of them will be meaningfully active, raising the cost of each active communication relationship to close to $400.

If $400 is the net acquisition cost for selling a new car, then the ROI is in all likelihood justified: it is a mere fraction of the average revenue of a vehicle sold, say $20K.
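The arithmetic behind these figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the spend, the like count, the 20% active share, and the $20K vehicle revenue are the numbers quoted above, and the variable names are made up for illustration.

```python
# Figures quoted in the post; the 20% active share and the $20K revenue
# are the post's own working assumptions, not measured values.
page_spend = 30_000_000      # dollars per year spent managing the page
likes = 378_421              # likes the page supports
active_share = 0.20          # share of likes assumed meaningfully active
vehicle_revenue = 20_000     # "say $20K" average revenue per vehicle sold

cost_per_like = page_spend / likes
cost_per_active_like = cost_per_like / active_share
share_of_revenue = cost_per_active_like / vehicle_revenue

print(f"cost per like: ${cost_per_like:,.0f}")
print(f"cost per active like: ${cost_per_active_like:,.0f}")
print(f"share of vehicle revenue: {share_of_revenue:.1%}")
```

The cost per like comes out at roughly $79 and the cost per active like at just under $400, which is about 2% of a $20K vehicle.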

A messy understanding of consumers and their behavior results in reports like “…Rumors that Facebook would announce its own ad network, allowing advertisers to buy inventory on the site in real time, did not materialize. Instead, advertisers were given more places to put their ads, including on the log-out page and in a user’s news feed.”

Neither Facebook nor the clients could understand the ROI, the sources indicate: Facebook is not able to share its data (when one owns so much personal information about consumers, even the simplest leak in the data pipes is likely to be traced back as a privacy violation, …), and the clients are not able to do analytics of their own that support the ROI.

What are companies like GM to do in this instance? That is the news in today’s article:

http://www.nytimes.com/2012/05/16/business/media/gm-to-quit-facebook-ad-campaign-worth-10-million-a-year.html?ref=technology

What is Facebook trying to do:

Facebook privacy changes hint at ad ambitions

Will this solve the problem?
 

Is the elephant entering the room?

What I can leave you with is this: consumer marketing and its symbiotic relationship with the marketing platform call for a careful strategy. Marketers have to think about it from the point of view of consumers, the consumers who will be making the decision to purchase, and not be driven purely by a sense of correlated factors. With all this caution and sensibility, Facebook has a path to a more acceptable and rewarding way to harness its assets, but how? That is an opportunity analyst’s work, and there is a way. It is a different topic for discussion. Some details on what is going on in the market chatter:

http://dealbook.nytimes.com/2012/05/16/ahead-of-facebook-i-p-o-a-skeptical-madison-ave/?src=busln

Yet, I want to caution that this is not about gloom and doom. Facebook has a resource that is very powerful and has a place in everyday life. The key point is how it can also be a marketing tool in a way that carries the message ‘do no harm using the conduit of personal info’.

The good thing to note is “There is nothing that cannot be solved in an acceptable and creative way!”.


Data Scientist vs. Analyst – or Is it The Janus?

Janus, in ancient Roman religion, is a two-faced god; he has two faces because he is looking at both the past and the future. In common parlance, showing two faces means deception, nowhere near where it started.

We, analytics experts, are always supposed to be looking into the past to learn and to predict the future. If we are not going to use a decision rule for the next event or time period, we do not need it. So analytics is inherently connected to future events. We look at both past and future and provide business people or policy makers with the best rules for implementing strategic judgement. That is the advanced ability of an analytics expert (AE).


In one of my earlier notes, I shared how you can be a great BI analyst and still contribute a lot, without having to be concerned about the true application of the predictive part.

I will be using the phrase AE (analytics expert or analytics engineer) because somehow the word analyst is getting dissociated from the currently trendy activities of data capture, management, and associated analytics, which, along with the uncanny ability to be an AE, belong to one personality.

It is understandable that not everyone can do all those tasks, but the dissociation is hard to understand.

People like to be called Data Scientists nowadays because of the phenomenon of BIG data. But the true data scientists, who coined the term and use it around the big data companies, are actually automation-smart computer scientists whose main job is capturing, managing, and automating the parsing of gazillions and gazillions of bits of information every day. With the democratization of analytics tools and machine learning methods, these smart people are also becoming well versed in the sciences of decision making in order to manage the ‘data sciences’ processes, so the definition does not stop at the data side alone but also includes the ability to become more and more of an AE.

A split Personality Janus?

On the other hand, analysts (smart data intelligence creators) have been just AEs, and they have lost touch with the speed at which they now have to access the data, process the data, and provide the intelligence, almost in real time. Because of that they are, or at least the term ‘analyst’ is, slowly but steadily losing importance in the data intelligence world, it seems.

I may be sounding a caution or, for some, unnecessarily creating a divide here. I do feel that people who were previously happy to be called analysts (a highly respected term) are dropping the title like a hot potato and want to be tagged as ‘data scientists’ to stay relevant for the times.

By the practical definition of how the term is used, a Data Scientist can access, manage, and analyze large amounts of data in real time, with the ability to apply machine learning methods. The processes and applications followed mostly do not have to worry about Type I and Type II errors or any other best practices of statistics, econometrics, and the behavioral sciences. Everything seems to work with the simplest, most powerful techniques of ranking, grouping, and association rules for almost every problem one faces. And AdaBoost, along with random forests and decision trees, opened up the next big thing, which is supervised learning. That completes the most sophisticated level that can be achieved as of now, bringing the data scientist closer to the analytics expert.

Analysts are intelligent, interpretive people but cannot access, manage, and analyze large amounts of data in real time. They are not updating their skills for the latest trends. With intelligent data management and real-time decision reports, computer scientists are becoming data scientists, while the intelligent AEs are not becoming computer scientists (note, you do not need everything in computer science).

With web and mobile becoming hot, definitely there are more opportunities to become a data scientist.

So it looks like a Janus has been created by the current trends, and there are going to be heated emails about this. The good thing is that it does not have to be like that.

When Does an Analyst Become a Data Scientist and More:
It looks to me that the core of data intelligence has not changed. The only thing is that traditional analytics experts have to become skilled in key programming languages such as Python, and in some of the Hadoop technologies and associated programming languages, so that they will always be a Janus: looking into the past while following the principles of predicting the future, and so giving the right advice to business people and policy makers with powerful abilities of automation.

If Python can hold its own as an alternative to Hadoop, then you just need to be an expert user of Python, besides SAS, R, and SPSS.

Some discussions of such developments are here:

Great professing regarding the benefits of Python and why it is ready to swallow the Hadoop

Complementary Skills Representation

So I like the complementary Janus figure, which is on the right.
The picture says it well. Don’t ask me which side is the data scientist and which side is the traditional data intelligence person. I started with that dichotomy to reflect what is going on in the marketplace. However, this Janus is the symbol of the complementary skills needed in the new world, so that the Janus can also do its dual duty well, that is, look into the past and predict the future.

With yesterday’s discussion on who is a BI analyst vs. who is a predictive analyst, it now looks like the functions of a super AE are the following.

– Be a master of data access, management, and decision rule creation in real time – BIG data or small data
– Do powerful analytics (advanced or not does not matter)
– See the past (Powerful BI analytics)
– Apply to the future (True Predictive Analyst)

So when in discussions, your triaging should ask for such details.

Keep up good cheer…will talk to you again.

PS:

Fun question for aspiring students: which program provides the training needed to become a balanced Janus, AKA a modern, much sought-after predictive analyst? Note that this can be an unfair expectation, because there is a lot to learn on the job. However, it is good to think about this if you aspire to become a student of predictive analytics.


Intriguing Visualization Tools – Yet to Become Popular From Research Labs – The Last One is Not a Secret, Just Not Caught Up with Analysts

A technical overview is here

ftp://ftp.cc.gatech.edu/pub/people/stasko/movies/sunburst-infovis00.mpg 

A wonderful review of how ‘Jigsaw’ application is used for evaluating associations among documents

http://www.cc.gatech.edu/gvu/ii/jigsaw/Jigsaw.mov  

Association among tabular data even if there is no

http://www.cc.gatech.edu/gvu/ii/ploceus/Ploceus.mov 

Google visualization tool 

Google visualization examples

Five things a Modern Predictive Analyst Needs to Know for a Successful Career – Some Say Four is Enough and Be a Great BI Analyst

e-education.psu.edu

The fifth one is not an easy one. However, for a specific time of application, everything is in steady state, and hence the argument goes that the fifth one need not be a concern. The resulting conclusion is that there is enough opportunity to benefit from simple analytical methods, even without the predictive component. A fair statement in many, many situations.

A PSU educationist has attempted to capture the four things in a picture, provided on the side. My take is given below.

After 20-plus years of working in the field, it seems the top four things a modern predictive analyst has to be good at to be a great analyst are the following. (Somehow the word Analyst is not as attractive as ‘Data Scientist’; when an Analyst becomes a Data Scientist is a different topic of discussion.)

Methods (How to Solve): Know statistical and machine learning methods. Know Type I and Type II errors and how they permeate the whole of statistical science. While certain sections of the data intelligence community will get away with not talking about both errors because the cost of one of the errors may not be that high, as a decision scientist it is very useful to know both errors.

Connect to Practical World: Know behavioral science to some extent; you do not need to go through a degree program, but you need to know how to explain the statistical and computational results consistently. Just observing behaviors and interpreting them in a meaningful way in each and every project is all that is needed.

Tools (Solve With What?): Know programming languages such as R/SAS/Python. You can add your favorite ones based on your needs. This one solves any and every problem, for now (!)

Work on What (The oxygen and water)? Know the various data sources and data acquisition platforms on the web, mobile, and third-party platforms and their integration with other data sources, traditional – structured or unstructured, small or big.

When Does All The Above Become Predictive? Future-proof the predictive model. Usually when a model is released, it is out of the analyst’s hands; one gets to hear about it only when it fails, and when it succeeds, it is often everyone’s creation. But when the dynamical factors are more active and moving with higher velocity, one will be hard pressed to see the results promised by the model. So any model released should carry caveat statements that can be broadly identified as “future proofing the application”; these are indeed the ‘safe harbor statements’ of predictive analytics.

However, if you do not take into account the reasons why prediction may fail, because things are always in flux and dynamic, then it is not really predictive analytics.

More details are here in the following link:

Why Prediction Fails in Marketing ? Consider macro dynamics factors as potential sources

Well, some might say that to be successful you don’t need to be a predictive analyst, just a good, intelligent analyst. So maybe you will be happy just being a BI analyst. There is a lot of money and joy in doing just that.

But that is exactly why we could not predict hard enough and over the full life cycle of events; even when economists and statisticians point things out, some people feel better off temporarily ignoring such dire warnings. The end result is that we ended up with a housing market collapse worth trillions and trillions of dollars.

Happy predicting.

Ref: All the pictures except the first one are from MS clipart.


Engaging Dashboards: When to Use What Graphics Tools and Methods – Session 1 – Line, Point, Bar, Pie Graphs

I will be using R here:

The basic point to communicate in those situations is whether

1.1- a measure is high or low among items of comparison (use a bar graph)
1.2- a measure is increasing or decreasing among items of comparison (use a line graph or a bar graph, with something that indicates the change agent and has an order property, usually time or an activity that has an order property)
1.3- a measure is more or less among items of comparison (use a pie graph)
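The three rules above can be written down as a small lookup. The sketch below is in Python rather than R, purely for illustration; the goal labels and the function name `choose_chart` are hypothetical and only restate rules 1.1 to 1.3.

```python
# Hypothetical labels for the three communication goals in rules 1.1 to 1.3.
CHART_RULES = {
    "high_or_low": "bar graph",                                    # rule 1.1
    "increasing_or_decreasing": "line graph or bar graph over an ordered axis",  # rule 1.2
    "more_or_less": "pie graph",                                   # rule 1.3
}


def choose_chart(goal: str) -> str:
    """Return the suggested chart type for a communication goal."""
    if goal not in CHART_RULES:
        raise ValueError(f"unknown goal {goal!r}; expected one of {sorted(CHART_RULES)}")
    return CHART_RULES[goal]


for goal in CHART_RULES:
    print(f"{goal}: {choose_chart(goal)}")
```

Keeping the rules in one table makes it easy to extend the cheat sheet as more graph types are covered in later sessions.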

Visualization Tools and Top Visualization Sites – Dashboards 3.0 – this is going to grow!

Top Sites for Learning and Getting Excited:

This is a growing section of my site: with some patience, daily attention, and posting, grow it will, every day.

There are thousands of sites doing this, whether for students, professionals, or publishers. Why do I have to create one more? The simple reason is that it is like a playground for starters, and specifically a place where I will have the opportunity to interact with my students.

You will see over time why I call this Dashboard 3.0.

So one thing I decided to do is the following. I always wanted a cheat sheet for myself that gives me simple directives on what graph to use and when to use it.

The nice thing is, since it is just starting, you can grow this garden with me, just by showing interest and being with me. I have a deep interest in and fascination with pictures, graphs, and visuals, as they succinctly communicate the key points in a data deluge. This will be a center for creating visuals, and initially I will be concentrating more on R, though I will occasionally post on Excel and SAS based on questions I may get.

This will be arranged in the following parts:

– R to create your visuals, for those who are adventuresome
– Excel to create your visuals, for those who are not yet confident of learning a new language such as R
– SAS to create visuals
– Top sites to get help or see how they do it

R Graphics:

You can see the ever-growing, sophisticated gallery of graphs and visuals that can be created with R at

http://addictedtor.free.fr/graphiques/allgraph.php?page=1 

You can install your free copy of R by following the OS-specific installation instructions available at http://cran.r-project.org/doc/manuals/R-admin.html#Top

Top TED visualization presentation (material and intro sourced from TED)

Of course, Hans Rosling’s Gapminder is probably the most successful, because he sold his approach and software solution to Google.

Hans Rosling’s 200 Countries, 200 Years, 4 Minutes – The Joy of Stats – BBC Four

There is other really cool stuff.

See:

David McCandless turns complex data sets (like worldwide military spending, media buzz, Facebook status updates) into beautiful, simple diagrams that tease out unseen patterns and connections. Good design, he suggests, is the best way to navigate information glut — and it may just change the way we see the world.  Data is new oil or new soil? – David McCandless, the presenter here.

Aaron Koblin, the talented interaction designer known from groundbreaking works like Flight Patterns, Bicycle Built for 2,000, eCloud, New York Talk Exchange, Hand-Drawn Sheep Market and Radiohead’s 3D Laser Scanner Video Clip has recently given a TED talk, summarizing and demonstrating many of his works.

Instead of clicking on all of these internal links above, you can watch all this, and much more, in less than 20 minutes below.
Aaron currently leads the “Data Arts Team” in the “Google Creative Lab”, a team who initiated the Google Data Viz challenge and just launched their WebGL globe.
He claims that “an interface can be a powerful narrative device. As we collect more and more personally and socially relevant data, we have an opportunity and maybe even an obligation to … tell some amazing stories”.

I will soon be synthesizing these and other visualization sites and tools.

————————————————————————————
 Top Sites:

I entered here and I ended up writing and sharing this excitement.

http://blog.visual.ly/20-great-visualizations-of-2011/

http://www.thevisualeverything.com/ 

– I especially liked

http://www.thevisualeverything.com/2011/12/top-10-visualizations-2011/


http://flowingdata.com/

– I especially got sucked in here (in a nice way… what can I say):

http://flowingdata.com/2010/12/14/10-best-data-visualization-projects-of-the-year-%E2%80%93-2010/


More to come…


The Challenge of The Second Secret of Data Sciences – BIG data or Small, The Illusion of Moving Brings Back To This Same Place

What I am going to bring out today is not the top secret in analytics. We know what the top secret is: it is about what we want to measure. The second top secret, which people make such a big noise about that they have to talk mouth to ear so the other person can hear, is how we create intelligence in that measure.

Do not think of differences in algorithms, volume, velocity, variety, professional groups, or the type of science where it is applied as anything new in defining the characteristics of analytical intelligence.

Do not confuse algorithms with data intelligence; algorithms are basically how to get to the end results in any of the different types of solutions. It is like confusing the ship with the destination.

Do not confuse it with mastering the technique either, which is important but may not solve every problem you will face, as the whole science of data sciences is promising to do. It is like taking the biggest ship, or even the plane, for the destination.

The subtlest of these is perhaps captured in the following little story of a martial arts student.

Frustrated by too many fighting styles and by how to become a super smart know-all master, he asked his master, midway through his learning and training, how many more styles he had to learn. How long would it take?

To know the right answer you need to know the right question. When you are sure of the question, discuss we will, said the master.

Whether data comes BIG or small, these are the only analytics you will ever be using, whether you use them with Type I and Type II errors or only with Type I errors. Why I bring in these additional qualifiers about types of error is a separate discussion.
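For readers who want the two error types made concrete, here is a small simulation sketch. Everything in it is an assumption chosen for illustration: a one-sided z-test on a sample mean with known standard deviation 1, a sample size of 25, a 5% significance level, and a true effect of 0.5 under the alternative. The function name `rejection_rate` is made up.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mean, n=25, alpha=0.05, trials=5000, seed=0):
    """Share of simulated samples where a one-sided z-test rejects H0: mean = 0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # critical value for the chosen alpha
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # z statistic; standard deviation assumed known and equal to 1
        if z > z_crit:
            rejections += 1
    return rejections / trials


# Type I error: rejecting H0 when it is actually true (true mean is 0).
type_1_rate = rejection_rate(true_mean=0.0)
# Type II error: failing to reject H0 when it is false (true mean is 0.5).
type_2_rate = 1 - rejection_rate(true_mean=0.5)

print(f"Type I error rate: {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

The Type I rate should land near the chosen 5%, while the Type II rate depends on the effect size and sample size, which is why a method that controls only Type I errors says nothing about how often a real effect is missed.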

Coming back to the list, here they are

-Association and Correlation
-Ranking (this is the reason why RFM is going to be there forever as a cute, simple, down-to-earth method), differencing, and sequencing
-Probability distribution and its characteristics: centrality, variance, tail, skew
-Regression – simple and generalized
-Classification
-Clusters and overlap analysis
-Dimension reduction (and hence variable reduction)
-Time relationships – lag, auto, single, multiple, and components
-Univariate and multivariate relationship (left side, right side, and as a network)
-Bayes and learning methods – Bayes’ contribution is so fundamental that this will be the only one in my list that carries the name of a person. I am sure there will be some heated emails about this. I am willing to listen.

It is bad; it looks like I intentionally made it 10. Perhaps I missed something, or put in some fluff that makes it 10. Help me, and let us get out of the mode of counting to 10.
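To show how down to earth the ranking item in the list is, here is a minimal RFM (recency, frequency, monetary) scoring sketch. The four customers and their numbers are invented for illustration, the helper `rank_scores` is hypothetical, and ties are not handled; real RFM work usually bins customers into quantiles instead of using raw ranks.

```python
# Made-up customers: days since last purchase, number of purchases, total spend.
customers = {
    "A": {"recency_days": 5, "frequency": 12, "monetary": 900.0},
    "B": {"recency_days": 40, "frequency": 3, "monetary": 150.0},
    "C": {"recency_days": 15, "frequency": 7, "monetary": 1200.0},
    "D": {"recency_days": 90, "frequency": 1, "monetary": 60.0},
}


def rank_scores(values, higher_is_better=True):
    """Return {customer: score}; the best customer on this measure gets the highest score."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    return {key: position + 1 for position, key in enumerate(ordered)}


# Fewer days since the last purchase is better, so recency is ranked in reverse.
recency = rank_scores({k: v["recency_days"] for k, v in customers.items()}, higher_is_better=False)
frequency = rank_scores({k: v["frequency"] for k, v in customers.items()})
monetary = rank_scores({k: v["monetary"] for k, v in customers.items()})

# Total RFM score is the plain sum of the three rank scores.
rfm = {k: recency[k] + frequency[k] + monetary[k] for k in customers}
ranking = sorted(rfm, key=rfm.get, reverse=True)

print(rfm)
print(ranking)
```

Nothing here goes beyond sorting and adding, which is the point: ranking alone already separates the most engaged customers from the least engaged.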

If you want to know a top 10 list of algorithms that the world wide net figured out, here it is.
http://blog.crmportals.com/2011/09/28/top-10-data-mining-algorithms-ieee-knowledge-and-information-systems-2008/

In fact, one might argue that the foundation of analytics has only four characters, G, R, P, W, like A, T, G, C, the foundations of the genome.

– Grouping, Correlation, Probability, and Weighting.  Still, do not confuse algorithms with the destination measure.  This is the second most important question in analytics – what to measure?

A final word of caution: hopefully you have read this far, and the question of when this becomes the science of consumer engagement, prediction, and the joy of money making is still pending. We will attend to some of this in the next discussion.

Until we meet next, have a wonderful time internalizing the second secret, and throw me a challenge on what I missed or what fluff I added here.