I’m currently developing a workshop on creating a data management plan (DMP) and, as part of this development process, I want to identify the absolute most important things to know in order to create a DMP. Part of the reason is that I have a finite amount of time to address DMPs in this session, but I also don’t want to waste people’s time covering less important information.
To start my development process, I’ve come up with a list of some things a researcher might want to know when creating a data management plan:
- What is a DMP?
- Why create a DMP?
- What are the benefits of a DMP (other than getting funding)?
- What are key parts of a DMP?
- What information do I need to know for each part of a DMP?
- What are the specific DMP requirements for my grant program?
- Where can I find an example DMP from my field?
- Where can I get help on my DMP?
- Do I really have to share my data?
- How will my DMP be assessed?
- I don’t have NSF funding, so why should I care about a DMP?
- Are there any tools/resources I can use to create my DMP?
From this list, it’s clear that some of these points may be better addressed on a webpage of resources than during an in-person session (i.e., finding DMP requirements, finding example plans, and a list of DMP tools/resources). Other points are simply not a priority to cover.
This leaves me with what I think are the most important things to know for creating a data management plan:
- Why are researchers being asked to create a DMP (why create a DMP/benefits of a DMP)?
- What are the key parts of a DMP?
- How do I apply each of these key parts to my research?
These points also translate nicely into working through an outline DMP during my planned session, meaning researchers will leave the session with something usable and concrete.
With these three points identified, let’s dig into each one a bit more. I’ll cover the first two points in this post and the third in another post in a couple of weeks.
Why Are Researchers Being Asked to Create a Data Management Plan?
Researchers with NSF and NEH Digital Humanities Directorate funding are currently required to create a data management plan as part of their grant applications. In the next few years, other federal funders will add similar requirements for DMPs in response to the recent White House OSTP Public Access memo. So everyone is getting on the DMP wagon, but the question is why?
From the funder perspective, data represent significant scholarly products that are not being used to their full potential (this is especially troubling to funders in the current financial environment). For this reason, we are seeing funder mandates for data sharing; the eventual goal is widespread data sharing akin to the distribution of scholarly articles. The barrier to reaching this goal is that most research data are not well managed and often aren’t maintained past the publication of the associated article. So data management plans are really the first step toward a new way of conducting research, because well-managed data are more easily shared data.
From the researcher perspective, DMPs are a requirement but also an aid to the research process. I’ve talked about it on this blog before, but deliberate management of data makes it easier to conduct research. Good data management means that researchers are less likely to lose data, more likely to find data when they need it, and able to use the data more easily thanks to better organization and documentation. I’ve even heard it said that one minute of data planning at the start of a project will save ten minutes of headache later in the project.
The bottom line is: yes, you’re being asked to jump through another hoop to get funding, but if you’re already creating a plan, why wouldn’t you use it to make your research easier?
What Are the Key Parts of a Data Management Plan?
An NSF data management plan must include the following information:
- The types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project
- The standards to be used for data and metadata format and content
- Policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
- Policies and provisions for re-use, re-distribution, and the production of derivatives
- Plans for archiving data, samples, and other research products, and for preservation of access to them
The actual DMP requirements will vary from agency to agency and even between directorates within a single agency, so you’ll want to look up the requirements for your particular grant before you write your DMP. Still, we can distill NSF’s requirements into some common themes for the composition of a DMP. Basically, your plan should answer the following questions:
- What types of data will I create?
- What standards will I use to document the data?
- How will I archive and preserve the data?
- How will I protect private/secure/confidential data?
- How will I provide access to and allow reuse of the data?
These are the key questions you need to ask yourself when creating any data management plan. They represent the many aspects of managing data, from creation and documentation through preservation and reuse. By answering these questions, you will come up with a way to manage your data throughout the project.
I’ll go into these five questions in more detail in my next post and discuss how to apply each one to your individual research project.