Today’s greatest news is about “luck” being the independent variable in a regression model!
There are many reasons why this kind of language happens: scientists and media people use different language constructs and terminology; the media need to reach a mass audience and communicate conclusions in a simplified way; and scientists do not give enough importance to the fact that the “unknown is too much”, so no proper caveats are stated for the “unknown”.
This one graph explains the main conclusion. The more cell divisions there are, the more likely you are to get cancer; but cell division happens in all people equally, as human beings in general. So who gets cancer and who does not is just “luck”.
In the picture above, the authors note that observational studies show a lower likelihood of cancer in the “small intestine” than in “colorectal” tissue.
So, combining all the information above with some additional computations on environmental and genetic factors, cell division explains the occurrence of cancer a lot better than environmental and genetic factors do, and even after controlling for environmental and genetic factors, the correlation is still significant.
More details are in http://news.sciencemag.org/biology/2015/01/simple-math-explains-why-you-may-or-may-not-get-cancer. Please see the original article for complete coverage.
From a teaching point of view, this is a great example for a discussion on “left out variables”. What are the left out variables? Why is a left out variable always a mystery, and can many of the problems perhaps be traced to a left out variable? What do diagnostic plots look like with and without the left out variable? And so on.
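For a classroom, the diagnostic question can be made concrete with simulated data. The sketch below is only an illustration under assumptions of my own: the variables x1 and x2, their coefficients, and the sample size are all hypothetical and have nothing to do with the cancer data. It fits a model with and without x2 and checks whether the residuals still line up with x2, which is the numerical counterpart of a residual-versus-variable diagnostic plot.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    u, v = center(u), center(v)
    return dot(u, v) / (dot(u, u) * dot(v, v)) ** 0.5

def residuals_one(x, y):
    """Residuals of a least squares fit of y on x (with intercept)."""
    x, y = center(x), center(y)
    b = dot(x, y) / dot(x, x)
    return [yi - b * xi for xi, yi in zip(x, y)]

def residuals_two(x1, x2, y):
    """Residuals of a least squares fit of y on x1 and x2 (with intercept)."""
    x1, x2, y = center(x1), center(x2), center(y)
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det  # Cramer's rule on the normal equations
    b2 = (s11 * s2y - s12 * s1y) / det
    return [yi - b1 * a - b2 * b for a, b, yi in zip(x1, x2, y)]

# Hypothetical data: x2 is correlated with x1 and also drives y.
rng = random.Random(1)
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]
y = [1.0 * a + 2.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

corr_short = corr(residuals_one(x1, y), x2)     # x2 left out of the model
corr_full = corr(residuals_two(x1, x2, y), x2)  # x2 included in the model

print("residuals vs x2, x2 left out:", round(corr_short, 3))
print("residuals vs x2, x2 included:", round(corr_full, 3))
```

When x2 is left out, the residuals carry a clear pattern against x2; once x2 is in the model, that pattern disappears by construction. In real work the difficulty is that nobody hands you x2 to plot against.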
Left out variables are well studied in econometrics. Chapter 6 of the book at http://www.aw-bc.com/info/studenmund/book.html explains them beautifully, with many examples and many types of specification errors.
The important conclusion about left out variables is: “If a variable that is correlated with one of the variables in the model is left out, then we will get a biased relationship” (provided the left out variable also affects the outcome). That is my hypothesis for the above conclusion, all else being acceptable.
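The size of that bias can be checked with a small simulation. This is a minimal sketch under assumed numbers: beta1, beta2, and the 0.5 link between x1 and x2 are hypothetical values chosen for illustration, not estimates from the study. It uses the standard textbook result that when x2 is left out, the slope on x1 tends to beta1 + beta2 * delta, where delta is the slope from regressing x2 on x1.

```python
import random

def slope(x, y):
    """Least squares slope of y on a single regressor x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
n = 20000
beta1, beta2 = 1.0, 2.0  # hypothetical true effects of x1 and x2 on y

x1 = [rng.gauss(0, 1) for _ in range(n)]
# x2 is correlated with x1 by construction.
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]
y = [beta1 * a + beta2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

short_slope = slope(x1, y)         # model that leaves x2 out
delta = slope(x1, x2)              # how strongly x2 moves with x1
predicted = beta1 + beta2 * delta  # left-out-variable bias formula

print("true effect of x1:       ", beta1)
print("estimated with x2 left out:", round(short_slope, 3))
print("predicted by the formula: ", round(predicted, 3))
```

The estimated slope lands near the formula's prediction rather than near the true effect, so x1 gets credit for work that x2 is doing. If either beta2 or delta were zero, the bias would vanish.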
The reason I am commenting on this publication by successful scientists is the following. The conclusion of this article is circulating around the world, and it is an important conclusion. So it is an important point for scientists to understand and explain.
What if something else is going on during cell division, such as the cells’ absorption of extraneous material (organic and non-organic) in a processed form, through which carcinogenic items or processes are picked up? Then the more cell division happens, the more likely one is to get cancer. Is this eliminated in the study? Well, that is not the purpose of the study, because this is a macro level analysis of aggregate data.
That brings up the next point, which is a common problem in the use of macro level aggregate data analysis.
People take an aggregate analysis and interpret it as if it were a case-control study; this is not intended, but at least that is how people are likely to take it.
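Here is a small sketch of why that reading can mislead. Everything in it is hypothetical: the 30 groups, the 200 individuals per group, and the way the outcome is generated are assumptions made up for illustration, not the study's data. The outcome is built to depend only on the group level of the exposure, never on an individual's own value.

```python
import random

def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

rng = random.Random(42)
n_groups, per_group = 30, 200  # hypothetical groups and individuals per group

group_x_means, group_y_means, within_corrs = [], [], []
for _ in range(n_groups):
    level = rng.uniform(0, 10)  # group-level exposure
    xs = [level + rng.gauss(0, 1) for _ in range(per_group)]
    # Outcome depends on the group level only, not on the individual's own x.
    ys = [2.0 * level + rng.gauss(0, 3) for _ in range(per_group)]
    group_x_means.append(sum(xs) / per_group)
    group_y_means.append(sum(ys) / per_group)
    within_corrs.append(corr(xs, ys))

aggregate_corr = corr(group_x_means, group_y_means)
mean_within_corr = sum(within_corrs) / n_groups

print("correlation across group averages:", round(aggregate_corr, 3))
print("average correlation inside groups:", round(mean_within_corr, 3))
```

Across group averages the relationship looks very strong, while inside any one group an individual's exposure says almost nothing about that individual's outcome. A strong aggregate correlation therefore does not, by itself, tell us what drives the outcome for any one person.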
Some historical cases of caveats in lab/data analysis, published in the Economist, are here: http://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble
Recalling or undoing papers published in prestigious journals does happen. Here is a quote from the Economist.
“Similar problems undid a 2010 study published in Science, a prestigious American journal (and reported in this newspaper). The paper seemed to uncover genetic variants strongly associated with longevity. Other geneticists immediately noticed that the samples taken from centenarians on which the results rested had been treated in different ways from those from a younger control group. The paper was retracted a year later, after its authors admitted to “technical errors” and “an inadequate quality-control protocol”.
The number of retractions has grown tenfold over the past decade. But they still make up no more than 0.2% of the 1.4m papers published annually in scholarly journals. Papers with fundamental flaws often live on. Some may develop a bad reputation among those in the know, who will warn colleagues. But to outsiders they will appear part of the scientific canon”
Note that the authors are not saying that environment and genetics do not play a role. Their point is that the role of chance is overwhelming.
This is not uncommon. We know that only around 5% of the variation in breast cancer is predictable from genetics. My doctor says that in some therapeutic classes, the efficacy of medications is hardly 5% to 10%. We use such medications because they still cure that percentage of cases. The fact that 90% is not explainable by genetics or environment is not interpreted to mean that “luck” is the reason.
Somewhere along the way, the media want to say it in simpler terms so that common people may understand the interpretation, but they lose the scientific language, which is precise. It is not being geeky to be scientific! It can be life and death for all of us.
Here is the biggest question: will big data change the above conclusion? My favored hypothesis is “YES”. Only the right data and the right analysis will give us an assertive answer either way.