Deciding where to share your data can be difficult. The expectation that shared data belongs in a data repository is swiftly becoming universal. The challenge for researchers is that the data repository landscape is still evolving. Given how many data repositories are available, and how many of them specialize in a particular type of data, it helps to understand how to navigate that landscape.
As a librarian who helps people find data repositories, I have a basic strategy for picking a repository when I share data. It goes as follows:
1. Identify all of the data that needs to be shared.
2. Is there a known disciplinary data repository, such as one used by everyone in your research field for a specific type of data? If so, deposit the relevant data in that repository. If there is more data to share, continue; otherwise, go to step 7.
3. Is there a logical disciplinary data repository on this list of recommended repositories? If so, deposit the relevant data in that repository. If there is more data to share, continue; otherwise, go to step 7.
4. Does your institution have a data repository? If so, deposit the remainder of your data there and go to step 7.
5. Do you have a preferred generalist data repository? If so, deposit the remainder of your data there and go to step 7.
6. Pick a generalist data repository, deposit the remainder of your data, and continue to the next step.
7. Record the permanent identifier, ideally a DOI, for each data deposit. If you didn’t receive a permanent identifier, go back and select a different repository for that data.
One important thing to know about this strategy is that it doesn’t assume you will deposit all of your data in the same data repository. Splitting data across repositories often happens when you are required to deposit a specific type of data in one repository (e.g., genetic information in GenBank) but that repository doesn’t accept everything else you need to share. To account for this, work your way through repositories from most specific to most general until all of the data has been deposited once.
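If it helps to see the logic spelled out, the strategy is essentially a fallback search: for each kind of data, try repositories from the most specific tier to the most general, and accept a repository only if the deposit gets a permanent identifier. The Python sketch below is purely illustrative. Apart from GenBank, the repository names are hypothetical, as are the data types and which repositories issue identifiers; it is a way to reason about the steps, not a tool for depositing data.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Repository:
    name: str
    # None means the repository accepts any kind of data (typical of generalist repositories).
    accepted_types: Optional[set[str]]
    # Whether a deposit receives a permanent identifier such as a DOI or accession number.
    issues_permanent_id: bool

    def accepts(self, data_type: str) -> bool:
        return self.accepted_types is None or data_type in self.accepted_types


def plan_deposits(data_types: list[str], tiers: list[list[Repository]]) -> dict[str, Optional[str]]:
    """Map each data type to the most specific repository that accepts it
    and issues a permanent identifier (None if nothing suitable is found)."""
    plan: dict[str, Optional[str]] = {}
    for data_type in data_types:
        plan[data_type] = None
        for tier in tiers:  # tiers are ordered from most specific to most general
            match = next(
                (r for r in tier if r.accepts(data_type) and r.issues_permanent_id),
                None,
            )
            if match is not None:
                plan[data_type] = match.name
                break
    return plan


# Hypothetical example tiers, mirroring steps 2 through 6 of the strategy.
tiers = [
    [Repository("GenBank", {"DNA sequences"}, True)],  # step 2: known disciplinary repository
    [],  # step 3: nothing suitable on the recommended list
    # step 4: a hypothetical institutional repository that issues no permanent ID, so step 7 rejects it
    [Repository("Hypothetical Institutional Repository", None, False)],
    [Repository("Hypothetical Generalist Repository", None, True)],  # steps 5-6: generalist fallback
]

plan = plan_deposits(["DNA sequences", "survey responses"], tiers)
print(plan)
```

In this made-up scenario the DNA sequences land in GenBank, while the survey responses skip the institutional repository (no permanent identifier) and end up in the generalist one, which is exactly the "go back and select a different repository" rule from step 7.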
The other notable thing about this strategy is tacked onto the end: you need a permanent identifier for each data deposit. Having a permanent identifier, such as a DOI, for shared data is a newish requirement, but one that will soon be universal across funding agencies. For example, the recent NIH data management and sharing policy requires a permanent identifier for shared data; there isn’t a compliance mechanism for this yet, but expect to report these DOIs back to the NIH within a few years. For now, always make sure you get a permanent identifier, such as a DOI or accession number, when you deposit your data.
I can’t guarantee that this strategy is perfect, but it’s a good place to start when you’re trying to figure out what to do with your data. Hopefully, the data repository landscape will become less confusing; in the meantime, you have a way to navigate it!