A Statistician’s Ten steps for data quality management.

Every time you receive data, identify and agree with the provider on which metadata is implemented in the system and which metadata supports the business logic. Always ask for a data dictionary managed by the IT department. Also, ask for the first and last 10 records of the data being delivered.
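Pulling the first and last 10 records does not require loading the whole delivery. The following is a minimal Python sketch, assuming the data arrives as a delimited text file with a header row. The function name `sample_head_tail` and the file name `delivery.csv` are hypothetical. The function reads the rows once and keeps only a rolling window of the last `n` rows. It also returns the total record count, which you can compare with the count the provider reports.

```python
import csv
from collections import deque


def sample_head_tail(lines, n=10, delimiter=","):
    """Return (header, first n records, last n records, total record count).

    lines: any iterable of text lines, e.g. an open file object.
    Assumes the first row is a header; reads the data in a single pass.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, None)
    if header is None:  # empty delivery
        return None, [], [], 0
    head = []
    tail = deque(maxlen=n)  # rolling window holding only the most recent n rows
    count = 0
    for row in reader:
        if count < n:
            head.append(row)
        tail.append(row)
        count += 1
    return header, head, list(tail), count
```

Usage, with a hypothetical file name:

```python
with open("delivery.csv", newline="", encoding="utf-8") as f:
    header, head, tail, count = sample_head_tail(f)
```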

  1. Ask for the data to be delivered in a format you are very familiar with handling (CSV, TXT with a special delimiter character, Excel, or other database and statistical-package formats such as SAS, SPSS, DB2, …). Over many years of experience, I have found it easiest when the data is delivered as fixed-format text. Even better is an automated process that creates what is called a ‘Data Audit Report’, so analysts can take a quick look at the delivered data and communicate with the data delivery team about its quality.
  2. Make sure you can read the data and output the top 10 and bottom 10 records. Visually inspect the sample values of each variable and confirm that they match the data the IT department promised to deliver.
  3. Check whether the total number of observations sent by the provider equals the total number of observations received.
  4. Check how the numeric elements are coded: are they stored as numeric or as character fields?
  5. If a field is numeric, find out (1) whether it is an integer, (2) its minimum, (3) its maximum, and (4) its number of missing values. For alpha (character) variables, check the full list of distinct values against the expected values in the data dictionary, along with the number of missing values.
  6. Check all consistency rules that exist among variables. For example, if the data contains a total revenue and also revenue by product group, make sure the product-group revenues sum to the total revenue, after confirming with business/IT managers whether such a consistency rule actually exists. This is the tricky part, because there are many ways to define consistency checks. Identify the quick, major ones and check them.
  7. The Data Audit Report should also show the distribution of each variable. For a numeric variable, use quintiles or deciles; for a character variable, use the frequency of each distinct value.
  8. Make sure weights are provided if the data come from a sample survey or from a sample drawn from a population. If weights are not provided, create a weighting scheme using an auxiliary variable that is available for the full population.
  9. If the data is provided for a predictive model, make sure you select the right reference population for the target population you are modeling. Whether the application is B2B or B2C, that reference population is not the entire US population list.
  10. Missing-value distributions (whether each value is missing or not) should also be covered in any communication with the IT department, so that processes can be re-oriented to capture the data better.
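The column-level checks in steps 4, 5, 7 and 10 can be automated as a first cut of a Data Audit Report. The following is a minimal Python sketch, assuming every value arrives as text, as it does from a CSV file. The set of missing-value tokens and the names `audit_column` and `audit_table` are hypothetical, and you should adjust them to the provider's data dictionary. The sketch infers a column as numeric only when every non-missing value parses as a number. It then reports whether the values are integers, along with the minimum, maximum, missing count and quintile cut points. Otherwise it reports frequency counts of each distinct value.

```python
import statistics
from collections import Counter

# Assumption: tokens treated as missing; adjust to the provider's data dictionary.
MISSING_TOKENS = {"", "NA", "N/A", "NAN", "NULL", "."}


def _to_number(value):
    """Return value as a float, or None if it does not parse as a number."""
    try:
        return float(value)
    except ValueError:
        return None


def audit_column(values):
    """Profile one column of raw text values (a list of strings)."""
    present = [v.strip() for v in values if v.strip().upper() not in MISSING_TOKENS]
    report = {"n": len(values), "missing": len(values) - len(present)}
    numbers = [_to_number(v) for v in present]

    if present and all(x is not None for x in numbers):
        report["type"] = "numeric"
        report["integer"] = all(x.is_integer() for x in numbers)
        report["min"] = min(numbers)
        report["max"] = max(numbers)
        # Quintile cut points (4 values); use n=10 for deciles.
        # statistics.quantiles needs at least 2 points before Python 3.13.
        if len(numbers) >= 2:
            report["quintiles"] = statistics.quantiles(numbers, n=5)
    else:
        report["type"] = "character"
        report["frequencies"] = Counter(present).most_common()
    return report


def audit_table(header, rows):
    """Build a per-column audit report from a header and a list of rows."""
    columns = {name: [] for name in header}
    for row in rows:
        # Pad short rows so absent trailing fields are counted as missing.
        padded = list(row) + [""] * (len(header) - len(row))
        for name, value in zip(header, padded):
            columns[name].append(value)
    return {name: audit_column(vals) for name, vals in columns.items()}
```

Inferring the type from the values, rather than trusting the declared type, is what surfaces numeric elements that were delivered as character fields.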
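The revenue example in step 6 can be expressed as a simple row-level rule. The following is a minimal Python sketch, assuming records are already parsed into dictionaries of numbers. The field names `total_revenue`, `rev_a` and `rev_b`, and the function name `check_sum_consistency`, are hypothetical. The absolute tolerance of 0.01 assumes cent-level rounding in the source system. The sketch uses a tolerance rather than exact equality because summed floating-point amounts rarely match to the last bit.

```python
import math


def check_sum_consistency(records, total_field, part_fields, tolerance=0.01):
    """Return (row index, total, sum of parts) for rows where the parts
    do not add up to the total within an absolute tolerance."""
    failures = []
    for i, rec in enumerate(records):
        parts = math.fsum(rec[f] for f in part_fields)  # accurate float summation
        if not math.isclose(parts, rec[total_field], rel_tol=0.0, abs_tol=tolerance):
            failures.append((i, rec[total_field], parts))
    return failures


records = [
    {"total_revenue": 100.0, "rev_a": 60.0, "rev_b": 40.0},
    {"total_revenue": 90.0, "rev_a": 50.0, "rev_b": 30.0},  # inconsistent row
]
print(check_sum_consistency(records, "total_revenue", ["rev_a", "rev_b"]))
```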
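For step 8, post-stratification is one common way to build weights from an auxiliary variable known for the full population. It is one option among several (raking and calibration are others), not necessarily the author's method. Each sample record in category h receives weight w_h = N_h / n_h. Here N_h is the population count for that category and n_h is the sample count. The following is a minimal Python sketch, and the function name `poststratification_weights` is hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_counts):
    """Return one weight per sample record: w = N_h / n_h for its category h.

    sample_cells: auxiliary-variable category of each sample record.
    population_counts: dict mapping each category to its population count N_h.
    """
    sample_counts = Counter(sample_cells)
    unknown = set(sample_counts) - set(population_counts)
    if unknown:
        raise ValueError(f"categories without population counts: {sorted(unknown)}")
    return [population_counts[c] / sample_counts[c] for c in sample_cells]


weights = poststratification_weights(
    ["East", "East", "West", "West", "West"], {"East": 1000, "West": 300}
)
print(weights)
```

Within each sampled category the weights add up to that category's population count. Population categories with no sample records cannot be represented, so check for empty cells before relying on the weights.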
From Data Monster & Insight Monster



08/22/2016 1:35am

Yeah, these ten steps are very important. Thank you for this great and informative post.

11/01/2016 4:13am

These ten steps might be very useful for me. I will use your experience in the future.

04/11/2017 9:01am

I'm using Excel and csv for collecting and managing the data. This is a good system with a lot of functions.
