Clarification and Correction on My Uniform Guidance Post

After talking more yesterday with my university’s compliance person about the new Uniform Guidance, I realize that I misinterpreted the “new” part of the guidance relating to data, A-81 section 200.430, in my last post. Having now read through the guidance several more times (don’t you just love long, dry government documents?), I want to correct my comments on this section.

For clarity, here is the section in question:

(i) Allowable activities. Charges to Federal awards may include reasonable amounts for activities contributing and directly related to work under an agreement, such as delivering special lectures about specific aspects of the ongoing activity, writing reports and articles, developing and maintaining protocols (human, animals, etc.), managing substances/chemicals, managing and securing project-specific data, coordinating research subjects, participating in appropriate seminars, consulting with colleagues and graduate students, and attending meetings and conferences. [emphasis mine]

While I originally interpreted this as meaning that all data management expenses can be charged to a federal grant (if you’re at an institution of higher education), only people’s time spent managing data is actually allowable. This is part of a larger expansion of allowable personnel charges, such as for administrative staff, under the new Uniform Guidance. My fault for not reading carefully enough to see that this section applies only to people’s time.

Do note that this does not supersede any individual funder’s stipulations that allow a wider variety of data management expenses (e.g., storage infrastructure, preservation in a repository, etc.) to be charged to a grant.

While I’m obviously disappointed that my original interpretation is not correct, it is still nice to see federal grants explicitly allowed to pay for data management, because data management certainly takes people’s time to perform. That said, it also usually requires infrastructure, and I’d like to see funders do more to cover the total cost of taking care of research data.

Posted in dataManagement, fundingAgencies, government

New Federal Grants Guidance and How It Affects Data

If I made a list of the things I cite most often in my job as a data management specialist, at the top would be ISO 8601, the recent Vines et al. study on data loss over time, and OMB Circular A-110. I’ve already written about the first two on my blog, and I want to finally consider Circular A-110 in this post.

Circular A-110 comes from the White House Office of Management and Budget (OMB) and is the document that defines research data and retention requirements for all research supported by US federal funding. It also no longer applies to federally sponsored research in the US.

Replacing A-110 and several other Circulars is the new Uniform Guidance, also known as OMB Circular A-81. This document was designed to standardize guidance for everyone receiving federal funding in the US (hence the name “Uniform Guidance”). For this reason, it echoes many of the requirements that were in place before but with a few exceptions. Most of these exceptions concern grants administration and are not relevant to this blog, but I am interested in what the new guidance says about data.

On the whole, the new Uniform Guidance looks a lot like the old A-110. For instance, it includes a verbatim copy of the definition of “research data” from A-110 (see A-81 section 200.315):

(3) Research data means the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. This “recorded” material excludes physical objects (e.g., laboratory samples). Research data also do not include:

(i) Trade secrets, commercial information, materials necessary to be held confidential by a researcher until they are published, or similar information which is protected under law; and

(ii) Personnel and medical information and similar information the disclosure of which would constitute a clearly unwarranted invasion of personal privacy, such as information that could be used to identify a particular person in a research study.

Section 200.315, like section 36 of A-110, also states that the Federal government has a right to access and reproduce data produced under a federal award, and it delineates how to respond to a Freedom of Information Act request for data.

(d) The Federal government has the right to:

(1) Obtain, reproduce, publish, or otherwise use the data produced under a Federal award; and

(2) Authorize others to receive, reproduce, publish, or otherwise use such data for Federal purposes.

(e) Freedom of Information Act (FOIA).

(1) In addition, in response to a Freedom of Information Act (FOIA) request for research data relating to published research findings produced under a Federal award that were used by the Federal government in developing an agency action that has the force and effect of law, the Federal awarding agency must request, and the non-Federal entity must provide, within a reasonable time, the research data so that they can be made available to the public through the procedures established under the FOIA. If the Federal awarding agency obtains the research data solely in response to a FOIA request, the Federal awarding agency may charge the requester a reasonable fee equaling the full incremental cost of obtaining the research data. This fee should reflect costs incurred by the Federal agency and the non-Federal entity. This fee is in addition to any fees the Federal awarding agency may assess under the FOIA (5 U.S.C. 552(a)(4)(A)).

A-81 also still requires a 3-year retention period for all research records (see A-81 section 200.333), though the exceptions differ slightly from those in A-110:

Financial records, supporting documents, statistical records, and all other non-Federal entity records pertinent to a Federal award must be retained for a period of three years from the date of submission of the final expenditure report or, for Federal awards that are renewed quarterly or annually, from the date of the submission of the quarterly or annual financial report, respectively, as reported to the Federal awarding agency or pass-through entity in the case of a subrecipient. Federal awarding agencies and pass-through entities must not impose any other record retention requirements upon non-Federal entities. The only exceptions are the following:

(a) If any litigation, claim, or audit is started before the expiration of the 3-year period, the records must be retained until all litigation, claims, or audit findings involving the records have been resolved and final action taken.

(b) When the non-Federal entity is notified in writing by the Federal awarding agency, cognizant agency for audit, oversight agency for audit, cognizant agency for indirect costs, or pass-through entity to extend the retention period…
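The baseline retention rule in section 200.333 boils down to simple date arithmetic. As a minimal sketch (the function name `retention_end` is hypothetical, and it covers only the default three-year rule, not the litigation, audit, or written-extension exceptions above), here is one way to compute when the retention period ends in Python. How to handle a report submitted on February 29 is my own assumption, so check with your compliance office before relying on it.

```python
from datetime import date


def retention_end(report_submitted: date, years: int = 3) -> date:
    """Return the date the baseline record retention period ends.

    Covers only the default rule: records are kept for `years` years from
    the date the final (or quarterly/annual) financial report is submitted.
    Exceptions such as litigation, claims, audits, or a written notice to
    extend retention are NOT handled here.
    """
    target_year = report_submitted.year + years
    try:
        # Same month and day, `years` years later.
        return report_submitted.replace(year=target_year)
    except ValueError:
        # Only reachable for a February 29 submission when the target year
        # is not a leap year. Rolling forward to March 1 (an assumption)
        # keeps records for at least the full period rather than a day less.
        return date(target_year, 3, 1)


if __name__ == "__main__":
    # Print the end date in ISO 8601 format.
    print(retention_end(date(2015, 3, 31)).isoformat())
```

For example, a final expenditure report submitted on 2015-03-31 gives a retention end date of 2018-03-31, assuming none of the exceptions apply.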

In short, these requirements are the same as (and often verbatim copies of) the requirements in OMB A-110.

There is, however, one section of the new Uniform Guidance concerning data that does not appear in Circular A-110. This is A-81 section 200.430, which states that grants to institutions of higher education may include the following items in their budgets:

(i) Allowable activities. Charges to Federal awards may include reasonable amounts for activities contributing and directly related to work under an agreement, such as delivering special lectures about specific aspects of the ongoing activity, writing reports and articles, developing and maintaining protocols (human, animals, etc.), managing substances/chemicals, managing and securing project-specific data, coordinating research subjects, participating in appropriate seminars, consulting with colleagues and graduate students, and attending meetings and conferences. [emphasis mine]

This means that you are allowed to charge people’s time spent managing data [ADDED 2015-02-18, replacing “data management expenses”; see follow-up post on this] to your grant. Currently, many US funding agencies that require data management plans already allow data management-related expenses in the grant budget, but this appears to be an entirely new stipulation at the federal level. Personally, I’m very happy to see this allowance in the new Uniform Guidance because researchers often need funds to manage data properly.

Overall, there’s very little change to the research data landscape under the new Uniform Guidance, with the exception that all university researchers can now charge the time people spend managing data to their grants. This is definitely something I plan to promote more to the researchers on my campus!

Posted in fundingAgencies, government

“Data Is” or “Data Are”?

Want to start a disagreement amongst data managers? Ask them if “data” is a singular or plural noun. Does one say “data are” or is it better to say “data is”? Data people often have opinions about which is correct (and will let you know about it).

Personally, I’ve been on the “data are” side of this war for some time. This is partly because my PhD advisor drilled into my head that one must never say “spectrums”; it’s either one spectrum or many spectra. Likewise, “data” is the plural of “datum”. However, I recently had an opportunity to re-evaluate my viewpoint and am starting to lean more toward “data is”.

Much of the reason for my change of opinion came from feedback on my writing. As much as “data are” seems like it should be correct, many readers stumble over it in a sentence. The meaning of the sentence gets lost as the brain tries to process the grammar, and as a writer, that is the last thing I want. Therefore, I started considering using “data” as a singular noun.

The other thing that moved me toward “data is” was an essay called “Data is a singular noun”, which a fellow data manager, Amanda, sent to me. The author makes a good case, based on history and grammar, that it should always be “data is” instead of “data are”.

Part of the author’s reasoning is that a word’s usage and evolution in English matter more than how the word’s originating language says it should be used. So even though Latin suggests “data” should be plural, what matters most is how people actually use the word “data” in English. A second reason for choosing the singular is that we hardly ever use the word “datum” anymore, which suggests that “data” has become the de facto singular form. Either way, there’s a lot of history behind using “data” as a singular noun.

I’m sure that we’ll eventually reach a point where there is a conclusive answer to this question. Until then, I’m going to try to be more conscious about using “data is”. At least in my writing.

So, “data is” or “data are”? What do you think?

Posted in Uncategorized

2015 Data Resolutions

With only a little time left in 2014, I’m sure I’m not the only one making New Year’s resolutions. While many people put diet and exercise at the top of their lists, this year I’m making a few data management resolutions. I hope you will consider adding a similar goal to your 2015 list.

For all that I blog about data management and do pretty well at managing my own digital content, there are a few things that I need to do better. Data management requires paying attention to your content, and it’s easy to let things slide. For me, it’s my personal files that are getting out of hand. My work data are fairly organized and well backed up, but my personal files are a mess. Thankfully, most of the data management tricks that work for scientific data work for digital content in general.

Overall, I’m facing two problems. The first is data spread. I don’t have a consistent organization system for all of my personal content and have things randomly saved across multiple devices (laptop and external hard drive) and cloud storage platforms (Dropbox, Google Drive, and SpiderOak). Worse, there is no rhyme or reason to why a file gets saved, for example, to Dropbox instead of in my laptop’s documents folder. I spent a good 20 minutes last week failing to find a particular sewing pattern only to have it show up a day later in the most unlikely place (my SpiderOak Hive folder). Clearly, I need to be more conscious about where I’m saving which files.

Related to the data spread is the fact that I don’t have a good backup system in place for my personal files (though I do have a good backup system at work). With files in so many places, it’s practically impossible to make sure everything’s backed up properly. Additionally, my laptop is getting older and I need to be sure that all of my files are safe in case my hard drive dies in the near future.

So this year, I resolve to spend a little time getting my personal files organized, streamlined onto one central platform, and properly backed up. This will take some effort but will pay off hugely if/when my laptop finally dies.

I hope that admitting my own flaws in data management shows you that nobody’s a perfect data manager. Instead, what matters most is that you make an effort. Any little bit I do in 2015 to take care of my personal files makes them better protected from loss and easier to find when I need them. It’s not hard; it just requires a little work.

As the new year begins, I hope that you, too, take some time to care for your digital content. It’s the perfect time to review what needs attention and to resolve to improve your practices. It doesn’t have to be big; it just has to be something, and every little bit helps. So I challenge you to make 2015 the year you start improving your data management habits.

Posted in dataManagement

How to Share Your Research Data

With so many new policies from funding agencies and journals requiring data sharing, it’s increasingly likely that you will encounter a data sharing mandate at some point. However, it can be difficult to know how to comply if you are new to such requirements. This is because, while the act of sharing data is not complicated, data sharing comes with new systems and best practices that are unfamiliar to many researchers. So let’s walk through the process of sharing your data so you know what to do when faced with a data sharing requirement.

Policy sources

The two most common places you will encounter data sharing requirements are your funder and the journal in which you publish. A list of US funders with data management and sharing requirements is available from the DMPTool. A list of journals requiring data sharing is available from Dryad. Always refer to the specifics of the policies that apply to you, as they can vary from the general description of data sharing requirements I’m outlining here.

What to share

To satisfy most data sharing requirements, you should share any data that underlie a publication. This means making available any and all data necessary to prove or reproduce your findings. Since data are so heterogeneous, you do have some leeway in the exact form of the data you share. Use your best judgment as to whether your peers will prefer raw data, analyzed data, data in a particular file format, etc. Do be sure to perform quality control on your data and add documentation prior to sharing.

When to share

Data sharing should occur at or slightly after the time you publish the article to which the data belong. Note that a few journals want to see your data during peer review (see below). With a few exceptions, you are not required to share your data before you publish your findings.

How to share

The best way to share your data is to place it in a data repository. Repositories are preferable to sharing-by-request because the repository does all of the work to ensure data persistence and discoverability. Once you deposit the data, a repository is a very hands-off way to share. Repositories also make data more findable and citable, meaning you’re more likely to get recognition for your work. To find a repository, look for suggestions from your journal or your local librarian, or check the repository lists at DataBib and re3data.

Peer review

While peer review is not the norm for shared data, there are methods available for you to have your datasets peer reviewed. The first is that a few journals look at data as part of the peer review process. More common is publishing your data as a “data paper”. Whereas a normal article describes the analysis done on a dataset, a data paper describes the dataset itself and undergoes peer review in tandem with the data. The reason some researchers prefer sharing data via data papers, besides providing thorough documentation and being peer reviewed, is that data papers receive citations just like articles. To see the journals that accept data papers, refer to this list from the University of Michigan library.

Final thoughts on data sharing

Data sharing is not complicated, but it does require work to clean up your data, add documentation, and deposit your data into a repository (though it becomes hands-off at that point). One scientist estimated that he spent almost 10 hours preparing a dataset for public sharing, though he expected that preparation time for the next shared dataset would be shorter. I think that this demonstrates one of the biggest barriers to data sharing: we’re not used to doing it. The systems take time to learn, and we have to think about preparing our data for sharing while we’re actively working on them in the middle of a project.

Eventually, everyone will get used to thinking about data as important research products and the systems for sharing data will become more established. In the meantime, I hope this post provides some clarity on complying with new data sharing requirements.

Posted in openData