The thud you might have heard yesterday was NIH dropping a new Data Management and Sharing Policy. It won’t go into effect until 2023-01-25 but the policy has so many ramifications that I don’t plan to waste time in preparing.
I’m going to do a short overview of initial thoughts here. I expect that I’ll be working through all of the nuances more in the weeks to come.
Here are the highlights for how this policy effects researchers:
- All NIH grants will be required to have a 2-page maximum data management plan (DMP). NIH expects researchers to: be clear in the DMP about where they plan to share (“to be determined” is no longer acceptable), notify them if plans change, and actually follow the plan.
- You will be sharing more data, as NIH not only wants the data that underlies publications but all data that verifies results.
- You will be sharing data sooner. NIH prefers if you share as soon as possible, but at the latest sharing should occur with publication or at the end of the grant period, which ever comes first. That last part is a huge change.
- You will share your data in a repository. Criteria for data repositories are provided in a supplement and I expect to see more in this area between now and 2023.
- You can ask for money to support data management and sharing activities, including pre-paying for long-term hosting of open data.
- If you conduct data on people, sharing expectations are changing. NIH really wants researchers to fine-tune the balance between sharing and privacy. Two mechanisms explicitly called out are outlining sharing practices during informed consent and controlled data sharing, even for de-identified data. This is another area where I want to see more development.
- If you are doing research on indigenous populations, you must respect Tribal sovereignty. This is a great addition to the policy.
I think this is a good policy, though it’s definitely overdue. I don’t love the lack of clarity around retention times and I’m not sure how I feel about review of DMPs shifting from peer reviewers to program officers. But these are minor quibbles in what I think is a pretty solid policy.
The biggest takeaways is that this policy represents a shift in expectations for data sharing. It has stronger requirements than the NSF data policy and will really move things forward. Some people are going to hate it and it’s going to be a big adjustment, but it’s a win for reproducibility and open data.
I appreciate this TL;DR!
I’m curious whether any actual data repos (or AWS, GCP, etc.) allow prepaying for storage. This may be something that needs to change on the vendor side before we can comply. Given storage costs always go down, I’ve often wondered if someone might someday offer an up-front charge to store your data ‘forever’.