Monthly Archives: July 2009

10 Questions for Prediction – I am sure this could grow into a 100-question list

(1) A good lift chart plots deciles on the x-axis against a sound measure of lift, such as a lift index, on the y-axis
YES/NO
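As a rough illustration of the chart described in question (1), the sketch below computes one common lift measure per decile: the response rate inside the decile divided by the overall response rate. The function name `decile_lift`, the choice of this particular lift measure, and the toy data are assumptions made for the example; the question itself does not prescribe them.

```python
def decile_lift(scores, responses, n_bins=10):
    """Return the lift for each score bin, best-scored bin first.

    Lift here means (response rate in the bin) / (overall response rate).
    This is one common definition, chosen for illustration.
    """
    if len(scores) != len(responses):
        raise ValueError("scores and responses must have the same length")
    if len(scores) < n_bins:
        raise ValueError("need at least one record per bin")

    # Rank records from the highest score to the lowest.
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    overall_rate = sum(resp for _, resp in ranked) / n
    if overall_rate == 0:
        raise ValueError("no responders, so lift is undefined")

    lifts = []
    for b in range(n_bins):
        # Integer arithmetic keeps the bins as equal in size as possible.
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        bin_rate = sum(resp for _, resp in chunk) / len(chunk)
        lifts.append(bin_rate / overall_rate)
    return lifts


# Toy data (hypothetical): 100 records, and the 20 highest scores respond.
scores = list(range(100))
responses = [1 if s >= 80 else 0 for s in scores]
lifts = decile_lift(scores, responses)
```

Plotting `lifts` against the decile number (1 to 10) gives the chart the question describes. A model with no ranking power would sit near a lift of 1 in every decile.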
(2) Collinearity should be apparent from looking at the correlation coefficients
YES/NO
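One way to probe statement (2) is a small constructed example. The sketch below builds four mutually uncorrelated variables and a fifth that is exactly their sum, then compares the pairwise correlations with the exact linear dependence. The data set and all names (`pearson`, `x5`, `pairwise`) are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import sqrt


def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Every combination of -1/+1 for four variables: 16 rows, columns uncorrelated.
rows = list(product([-1, 1], repeat=4))
columns = [[row[j] for row in rows] for j in range(4)]

# A fifth variable that is an exact linear combination of the other four.
x5 = [sum(row) for row in rows]

# Correlation of x5 with each of the four original variables.
pairwise = [pearson(col, x5) for col in columns]
max_pairwise = max(abs(r) for r in pairwise)

# Largest deviation of x5 from the sum of the other four (zero by construction).
residual = max(abs(v - sum(row)) for v, row in zip(x5, rows))
```

In this constructed case every pairwise correlation with `x5` is only moderate, yet `x5` is fully determined by the other four variables. Diagnostics that regress each variable on all the others, such as variance inflation factors, are designed to catch this kind of dependence.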
(3) Using the weights (a single number for responders and a single number for non-responders) is important for ranking the responsiveness of consumers; in other words, if the weights are not used, we mess up the ranks
YES/NO
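Question (3) can be explored with a small sketch. It assumes, and this is only one possible reading of the question, that the two weights are applied after scoring to rescale each predicted probability, as is often done to undo oversampling of responders. The function names and the weight values are hypothetical.

```python
def reweight(p, w_resp, w_nonresp):
    """Rescale a predicted response probability using one weight per class.

    Equivalent to multiplying the odds p / (1 - p) by w_resp / w_nonresp.
    """
    numerator = w_resp * p
    return numerator / (numerator + w_nonresp * (1.0 - p))


def rank_order(values):
    """Indices of the records, from the highest value to the lowest."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


# Hypothetical model scores for five consumers.
scores = [0.91, 0.15, 0.62, 0.40, 0.77]

# Hypothetical weights: responders count once, non-responders 19 times.
adjusted = [reweight(p, w_resp=1.0, w_nonresp=19.0) for p in scores]

same_ranks = rank_order(scores) == rank_order(adjusted)
```

Under this assumption the adjustment changes the level of each probability but not the order of the consumers, because multiplying the odds by a positive constant is a monotone transformation. Weights that enter the model fitting itself are a different situation, which this sketch does not cover.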
(4) Assuming there is no problem with the quality of the independent variables, if the analyst cannot explain how a variable fits into the business process, then that is reason enough to drop the variable, even if it is among the top 10 variables. Just get creative; it is part of the job
YES/NO
(5) Neural nets and genetic algorithms (which are black boxes in terms of interpretation) are not as useful as the interpretable models that a statistician can build
YES/NO
(6) The validation data set is never looked at until the analyst has built the model with the remaining sample; only then, and only once, does the analyst use the validation data set to see how the model will perform in the real world. This is a common practice
YES/NO
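The practice described in question (6) can be sketched as a single split made up front. The function name `split_holdout`, the 30% holdout fraction, and the fixed seed are assumptions for the example rather than recommended values.

```python
import random


def split_holdout(records, holdout_fraction=0.3, seed=42):
    """Shuffle the records once and set a fraction aside for validation."""
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be strictly between 0 and 1")

    shuffled = list(records)
    # A dedicated, seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)

    n_holdout = int(round(len(shuffled) * holdout_fraction))
    validation = shuffled[:n_holdout]
    modeling = shuffled[n_holdout:]
    return modeling, validation


# Hypothetical record identifiers 0..99.
modeling, validation = split_holdout(range(100))
```

All variable selection, fitting, and tuning would then use `modeling` only, with `validation` scored a single time at the end.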
(7) Decision trees that split the records into only a few branches at each node (just 2 in the extreme case) are not good choices
YES/NO
(8) Factor analysis is not a good choice because its factors are not as interpretable as the results of other attributable/describable/interpretable methods
YES/NO
(9) Essentially, all the black-box methods (neural nets, genetic algorithms) rest on ideas similar to those in statistics, just with different terminology
YES/NO
(10) It is a good idea to build committee models and to choose among the models using a committee (whose members could be electronic agents or humans)
YES/NO
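For question (10), the sketch below shows two simple ways a committee of models might be combined: averaging their scores and taking a majority vote. It covers only an electronic committee, and the function names, the 0.5 threshold, and the three sets of model scores are hypothetical.

```python
def committee_average(model_scores):
    """Average the scores that several models give to the same records."""
    if not model_scores:
        raise ValueError("need at least one model")
    n_records = len(model_scores[0])
    if any(len(scores) != n_records for scores in model_scores):
        raise ValueError("every model must score the same records")
    # zip(*...) regroups the scores record by record.
    return [sum(per_record) / len(model_scores) for per_record in zip(*model_scores)]


def committee_vote(model_scores, threshold=0.5):
    """Flag a record when more than half of the models score it at or above the threshold."""
    votes_needed = len(model_scores) / 2
    return [
        sum(score >= threshold for score in per_record) > votes_needed
        for per_record in zip(*model_scores)
    ]


# Hypothetical scores from three models for the same three consumers.
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.4]
model_c = [0.8, 0.1, 0.7]

averaged = committee_average([model_a, model_b, model_c])
votes = committee_vote([model_a, model_b, model_c])
```

Averaging keeps a continuous score that can still be ranked into deciles, while voting produces a yes/no decision per record. Which one suits a project depends on how the output will be used.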