I can’t tell you how happy I am to be back on this blog, talking about data. I’ve actually spent much of the last month writing about data issues, but for the final class of my Master’s degree in library and information studies rather than for this blog. On that front, I’m happy to report that I graduated this past weekend!
My last assignment for my degree involved writing about data sharing. While my thoughts on the topic are too numerous for a single blog post, one particular thread of the assignment is worth elaborating on here: the recent Reinhart and Rogoff news.
If you missed it, Reinhart and Rogoff are two Harvard economics professors who published a study (pdf) examining economic growth in countries with high debt-to-GDP ratios. Their findings have been used as evidence for austerity measures in both America and Europe. Unfortunately, their conclusions are wrong because their analysis is flawed.
The errors were discovered by Thomas Herndon, a UMass Amherst graduate student who read the paper and tried to reproduce the analysis. When he couldn’t, he contacted the authors and was given access to the spreadsheet containing their data and analysis. Upon examining the spreadsheet, Herndon found data points that had been erroneously excluded, along with spreadsheet coding errors. Once the errors were corrected, the data no longer supported the original paper’s conclusions.
This story is important for a few reasons. First, the article has had a significant, and most likely negative, impact on the American and European economies. Second, it was only through the sharing of the original data and analysis that the errors were conclusively discovered and proven. Third, had the original authors not chosen to share their data (which is still not a common practice), the errors and the resulting economic policies could have persisted for years.
I find this story one of the best examples of the power of data sharing, given both the paper’s significant impact and the fact that a careful reading of the article alone was not enough to conclusively prove its mistakes. Stanford statistics professor David Donoho once likened (pdf) journal articles to the advertising of scholarship, while the data and analysis are the actual scholarship. That comparison perfectly captures the issue here.
Science values reproducible work, but reproducibility often can’t be verified from articles alone. Thankfully, checking for reproducibility becomes easier when data sharing is part of the standard research process: scientists can go directly to the data and analysis if they have questions about the work.
The ultimate goal is an accurate scientific record, one that keeps more studies like Reinhart and Rogoff’s from causing harm. And as the Reinhart and Rogoff story shows, data sharing can play an important role in reaching that goal.