With so many new policies from funding agencies and journals requiring data sharing, it’s increasingly likely that you will encounter a data sharing mandate at some point. However, it can be difficult to know how to comply if you are new to such requirements. While the act of sharing data is not complicated, data sharing comes with new systems and best practices that are unfamiliar to many researchers. So let’s walk through the process of sharing your data so you know what to do when faced with a data sharing requirement.
Policy sources
The two most common places you will encounter data sharing requirements are your funder and the journal in which you publish. A list of US funders with data management and sharing requirements is available from the DMPTool. A list of journals requiring data sharing is available from Dryad. Always refer to the specifics of the policies that apply to you, as they can vary from the general description of data sharing requirements I’m outlining here.
What to share
To satisfy most data sharing requirements, you should share any data that underlie a publication. This means making available any and all data necessary to prove or reproduce your findings. Since data are so heterogeneous, you do have some leeway in the exact form of the data you share. Use your best judgment as to whether your peers will prefer raw data, analyzed data, data in a particular file format, etc. Do be sure to perform quality control on your data and add documentation prior to sharing.
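As one way to approach the quality control and documentation step, here is a minimal Python sketch, assuming your data live in a CSV file. The function names, the sample data, and the column description are all hypothetical, made up for illustration. It only counts missing values per column and drafts a plain-text data dictionary, so treat it as a starting point rather than a complete quality check.

```python
import csv
import io


def summarize_columns(csv_text):
    """Return per-column row and missing-value counts for CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    summary = {name: {"rows": 0, "missing": 0} for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row.get(name)
            summary[name]["rows"] += 1
            # Treat absent, empty, or whitespace-only cells as missing.
            if value is None or value.strip() == "":
                summary[name]["missing"] += 1
    return summary


def data_dictionary(summary, descriptions):
    """Build plain-text documentation, one line per column."""
    lines = []
    for name, counts in summary.items():
        # Flag undocumented columns so they are easy to spot before sharing.
        description = descriptions.get(name, "NO DESCRIPTION YET")
        lines.append(
            f"{name}: {description} "
            f"({counts['missing']} of {counts['rows']} values missing)"
        )
    return "\n".join(lines)


# Hypothetical example data, not from any real study.
sample = "site,temp_c\nA,21.5\nB,\nC,19.0\n"
summary = summarize_columns(sample)
print(data_dictionary(summary, {"site": "Sampling site code"}))
```

Columns that still show "NO DESCRIPTION YET" or an unexpected number of missing values are the ones to look at before you share.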
When to share
Data sharing should occur at or slightly after the time you publish the article to which the data belong. Note that a few journals want to see your data during peer review (see below). With a few exceptions, you are not required to share your data before you publish your findings.
How to share
The best way to share your data is to place them in a data repository. Repositories are preferable to sharing-by-request because the repository does all of the work to ensure data persistence and discoverability. Once you deposit the data, a repository is a very hands-off way to share. Repositories also make data more findable and citable, meaning you’re more likely to get recognition for your work. To find a repository, look for suggestions from your journal, your local librarian, or on the repository lists at DataBib and re3data.
Peer review
While peer review is not the norm for shared data, there are ways to have your datasets peer reviewed. The first is that a few journals look at data as part of the peer review process. More common is publishing your data as a “data paper”. Whereas a normal article describes the analysis done on a dataset, a data paper describes the dataset itself and undergoes peer review in tandem with the data. Some researchers prefer sharing data via data papers because, besides providing thorough documentation and peer review, data papers receive citations just like articles. To see the journals that accept data papers, refer to this list from the University of Michigan library.
Final thoughts on data sharing
Data sharing is not complicated, but it does require work to clean up your data, add documentation, and deposit your data into a repository (though it becomes hands-off at that point). One scientist estimated that he spent almost 10 hours preparing a dataset for public sharing, though he expected that preparation time for the next shared dataset would be shorter. I think that this demonstrates one of the biggest barriers to data sharing: we’re not used to doing it. The systems take time to learn and we have to think about preparing our data for sharing while we’re actively working on them in the middle of a project.
Eventually, everyone will get used to thinking about data as important research products and the systems for sharing data will become more established. In the meantime, I hope this post provides some clarity on complying with new data sharing requirements.