Data Quality Issues – Address them at the start of a project – Don't catch the tiger by the tail!



“Quality is designed in the process; it is not checked or verified in the end (after the process is streamlined)” – W. Edwards Deming, quality guru.

The whole team has to sign off on a few simple measurements that keep everyone on the same page regarding data quality, such as:

– Total number of records (possibly also at some finer levels, such as per group)
– Range of the values of each field
– The metadata describing the record layout and the whole data set
– Simple expected relationships among some of the key variables
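As a minimal sketch of how these agreed measurements could be checked automatically, the Python below compares a list of records with a hypothetical sign-off spec: a total record count, counts at a finer level (per region), the record layout (field names and types), value ranges or allowed codes, and one expected relationship. The spec, the field names (`customer_id`, `region`, `order_amount`, `discount`), and the function `check_against_spec` are illustrative assumptions, not part of any particular project or tool.

```python
from collections import Counter

# Hypothetical sign-off spec: field names, types, ranges and the relationship
# rule are illustrative assumptions, not taken from any real project.
SPEC = {
    "expected_records": 4,
    "fields": {
        "customer_id": {"type": int, "min": 1, "max": 99999},
        "region": {"type": str, "allowed": {"EAST", "WEST"}},
        "order_amount": {"type": float, "min": 0.0, "max": 5000.0},
        "discount": {"type": float, "min": 0.0, "max": 500.0},
    },
    "relationships": [
        ("discount <= order_amount", lambda r: r["discount"] <= r["order_amount"]),
    ],
}


def check_against_spec(records, spec):
    """Compare records with the agreed spec; return (counts per region, issues)."""
    issues = []
    # Total number of records
    if len(records) != spec["expected_records"]:
        issues.append(f"record count {len(records)} != agreed {spec['expected_records']}")
    layout = set(spec["fields"])
    for i, rec in enumerate(records):
        # Record layout: exactly the agreed field names, no more, no fewer
        if set(rec) != layout:
            issues.append(f"row {i}: layout mismatch {sorted(rec)}")
            continue
        types_ok = True
        for name, rule in spec["fields"].items():
            value = rec[name]
            # Type of each field
            if not isinstance(value, rule["type"]):
                issues.append(f"row {i}: {name}={value!r} is not {rule['type'].__name__}")
                types_ok = False
            # Allowed codes for categorical fields
            elif "allowed" in rule and value not in rule["allowed"]:
                issues.append(f"row {i}: {name}={value!r} not in allowed set")
            # Agreed range for numeric fields (missing bounds default to the value itself)
            elif not rule.get("min", value) <= value <= rule.get("max", value):
                issues.append(f"row {i}: {name}={value!r} outside agreed range")
        # Simple expected relationships, checked only when the types are valid
        if types_ok:
            for label, holds in spec["relationships"]:
                if not holds(rec):
                    issues.append(f"row {i}: violates {label}")
    # Record counts at a finer level (here: per region)
    counts = Counter(rec.get("region") for rec in records)
    return counts, issues


rows = [
    {"customer_id": 1, "region": "EAST", "order_amount": 120.0, "discount": 10.0},
    {"customer_id": 2, "region": "WEST", "order_amount": 80.0, "discount": 90.0},
    {"customer_id": 3, "region": "NORTH", "order_amount": 50.0, "discount": 0.0},
    {"customer_id": 4, "region": "EAST", "order_amount": -5.0, "discount": 0.0},
]
counts, issues = check_against_spec(rows, SPEC)
print(dict(counts))
for issue in issues:
    print(issue)
```

With these sample rows, the check reports the unexpected region code NORTH, the negative order amount, and two rows where the discount exceeds the order amount. Keeping the spec as plain data means the same structure can be reviewed and signed off by the client before any checking code runs.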

An important point here is that looking at a few sample records is not enough. You have to run PROC UNIVARIATE, PROC MEANS, and PROC FREQ type analyses and, as part of your commitment to the work, submit a report built from their output that points out, variable by variable, either the anomalies or the acceptability of each variable, so that the client can give you feedback immediately.
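The post names SAS procedures. As a rough, hedged analogue in plain Python (not a reproduction of SAS output), the sketch below profiles each variable roughly the way PROC MEANS/UNIVARIATE (numeric summaries) and PROC FREQ (frequency counts) would, then attaches a one-line verdict per variable for the client report. The 5% missing-value threshold, the sample columns, and the helper names (`profile_numeric`, `profile_categorical`, `flags_for`, `variable_report`) are hypothetical.

```python
import statistics
from collections import Counter

MAX_MISSING_SHARE = 0.05  # hypothetical threshold; agree the real one with the client


def profile_numeric(values):
    """n, nmiss, mean, std, min, median, max: a rough analogue of PROC MEANS/UNIVARIATE."""
    present = [v for v in values if v is not None]
    stats = {"n": len(present), "nmiss": len(values) - len(present)}
    if present:
        stats.update(
            mean=round(statistics.fmean(present), 2),
            std=round(statistics.stdev(present), 2) if len(present) > 1 else 0.0,
            min=min(present),
            median=statistics.median(present),
            max=max(present),
        )
    return stats


def profile_categorical(values):
    """One-way frequency counts with missing reported separately, roughly like PROC FREQ."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "nmiss": len(values) - len(present),
        "freq": Counter(present).most_common(),
    }


def flags_for(stats):
    """Simple anomaly notes for the client to confirm or reject."""
    notes = []
    total = stats["n"] + stats["nmiss"]
    if total and stats["nmiss"] / total > MAX_MISSING_SHARE:
        share = stats["nmiss"] / total
        notes.append(f"missing share {share:.0%} exceeds {MAX_MISSING_SHARE:.0%}")
    if stats.get("std") == 0.0 or ("freq" in stats and len(stats["freq"]) == 1):
        notes.append("variable is constant")
    return notes


def variable_report(columns):
    """One line per variable: its profile, then 'acceptable' or the anomaly notes."""
    lines = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        # Treat a column as numeric when every non-missing value is a number
        if all(isinstance(v, (int, float)) for v in present):
            stats = profile_numeric(values)
        else:
            stats = profile_categorical(values)
        verdict = "; ".join(flags_for(stats)) or "acceptable"
        lines.append(f"{name}: {stats} -> {verdict}")
    return lines


columns = {
    "age": [34, 41, None, 29, 230, 38],
    "income": [52000.0, 61000.0, 58000.0, 49500.0, 60500.0, 55000.0],
    "state": ["NY", "NJ", "NY", "NY", "NY", "CT"],
    "currency": ["USD"] * 6,
}
for line in variable_report(columns):
    print(line)
```

In the sample, `age` is flagged for missing values and `currency` as constant. The implausible maximum age of 230 is not caught by any rule but is plain to see in the profile, which is one reason the client should review the full variable-by-variable report rather than only the flags.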

This report needs a 24-hour or 48-hour turnaround to earn the client's appreciation. Line up your resources beforehand so that the “continuous rowing of the boat” keeps happening, so to speak.

If these measurements are not agreed upon during the project's first week, you have really started handling the tiger by the tail, or the elephant by its tusk, unless you happen to be playing with an innocent baby tiger or baby elephant.

Then everybody worries about saving themselves from the tiger, the focus on time management as part of overall project management is seriously challenged, and somebody on the team will get hurt!

Watch out: there will be a lot of frustration, and the team will be dragged down from below in its boat race to keep up with the project's time commitments. Nothing can save somebody on the team from getting blamed. It is the responsibility of the team leader to enforce this discipline.

From Data Monster & Insight Monster
