My Data Christmas List

In the spirit of the holidays, I want to end the year with a fun blog post – my data Christmas list. Here is what I’m wishing for this year:

  • I wish that all lost datasets may be found. No dataset should be away from home, especially at Christmas.
  • For the data elves to visit in the night and work miracles on documentation.
  • A good backup system under every researcher’s tree.
  • For data to be shared as freely as Christmas joy.
  • Folders all lined up along the mantel in a logical organization structure, with a good naming system and filled with brilliant data.
  • Data analysis that is easy to prepare and won’t make a mess of the kitchen; researchers never enjoy an undercooked (or overcooked) analysis.
  • World peace. Or at least making peace with one’s data.

I hope you all have a lovely holiday season. May your data be merry and bright!

Posted in Uncategorized

Rethinking Research Data

How do you persuade the average person to care about open research data?

This was the challenge that I faced in my recent TEDxUWMilwaukee talk on “Rethinking Research Data”. The theme of the talk was that whenever we publish the results of research, we also need to publish the corresponding data. There are so many examples – from economic austerity to Joachim Boldt – that make this relevant to everyday people.

Photo: Untitled, by TEDxUWMilwaukee Team, via Flickr (https://www.flickr.com/photos/136904454@N02/21529103724/), licensed CC BY-NC-ND 2.0 (https://creativecommons.org/licenses/by-nc-nd/2.0/)

So if you’ve never thought about how open data impacts you or are wary of this new data trend, do check out the video to see why open data is important!

Posted in openData

Getting Credit for Your Research Data

Happy Open Access Week! As usual, I’m celebrating the Open Data portion of Open Access, but my special focus this year is on getting credit for your research data. My library held an in-person workshop on the topic earlier this week and I want to share the idea more widely on the blog because I think it’s important.

The crux of the issue is that more and more researchers are sharing their data (either because their funder/publisher requires it or because the researcher believes in open access to research materials), but not all data sharing venues are created equal. Consider the following sharing pathways[1]:

  1. Sharing data on a personal or lab website
  2. Sharing data by request via email or Dropbox
  3. Publishing data as supplemental material for a journal article
  4. Depositing data into a disciplinary data repository
  5. Depositing data into a general research repository or institutional repository

All of these methods work to distribute your data and many comply with requirements to share. However, not all of these sharing venues will maximize the credit you receive for your data. For example, the last three sharing options will provide researchers with a stable location and a citation for the data, increasing the data’s citability. Option 4, in particular, will probably maximize credit because your research peers are likely to look for your data in a disciplinary repository.

Getting credit for your work is supremely important in research and this doesn’t get any less true when it comes to data sharing. The good news is that sharing data actually increases citation counts on the corresponding article by roughly 10%, with a higher citation boost for older papers.[2] However, this finding likely holds true only if others can actually find your data. Therefore, I encourage you to think about data sharing venues with respect to maximizing your credit.

As a follow-up to getting credit for your data, I want to touch on how to actually give credit for using another researcher’s data: data citation. Data citation is very similar to article citation in that you cite the data you used in the works cited section of your article. Where data citation differs is in the citation format and the fact that you cite the data separately from the article. Let’s look at how this works.

At its most basic, a data citation should include the following information:

  • Creator
  • Publication Year
  • Title
  • Publisher
  • Identifier

The format of your data citation can vary across citation styles (APA, Chicago, etc.) but at a minimum should contain these five components. If you don’t have a recommended data citation format, you can use the following:

Creator (PublicationYear): Title. Publisher. Identifier
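
As a rough illustration, the format above can be assembled from its five components in a few lines of Python. This is only a sketch: the class name, method name, and every value in the example record are hypothetical placeholders rather than a real dataset or an official tool.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Holds the five minimum components of a data citation."""
    creator: str
    publication_year: int
    title: str
    publisher: str
    identifier: str

    def to_text(self) -> str:
        # Follows the pattern: Creator (PublicationYear): Title. Publisher. Identifier
        return (
            f"{self.creator} ({self.publication_year}): "
            f"{self.title}. {self.publisher}. {self.identifier}"
        )


# Hypothetical placeholder values, not a real dataset or DOI.
example = DataCitation(
    creator="Doe, J.",
    publication_year=2015,
    title="Example Survey Responses",
    publisher="Example Data Repository",
    identifier="https://doi.org/10.0000/example",
)
print(example.to_text())
```

If your citation style specifies its own data citation format, use that instead; the point here is only that all five components appear.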

It’s often considered good practice to cite the corresponding article whenever you cite the dataset, but it’s not strictly necessary. Use your best judgement and always give credit for the content you do use.

I want to wrap up by saying that as data sharing becomes more common, I hope you start thinking about this topic with respect to maximizing credit. This means placing your data in a location where you’ll get the most credit as well as giving proper credit to others, via data citation, when using their data. Framing data sharing through the lens of credit means that we’ll do right by our data going forward and properly recognize it as an important scholarly product.

 

[1] Many thanks to Lisa Johnston (U Minnesota) for inspiration from her pro/con data sharing exercise
[2] Piwowar HA, Vision TJ (2013) Data reuse and the open data citation advantage. PeerJ 1: e175. http://dx.doi.org/10.7717/peerj.175

Posted in openData

Do You Have an Institutional Data Policy? aka. Who Owns Your Data?

September seems to be publication month and I’m so excited to share my other item that was just published: a research article on institutional data policy.

This paper came out of a question that my collaborators and I had: who owns the data produced by university researchers? We had a sense that some universities made it clear that they owned the data while the answer was ambiguous at other institutions. Complicating matters further was the question of ownership when researchers from different universities collaborate. This was particularly applicable in our case as my colleagues and I all work at different institutions.

So we set out on the path of trying to find some clarity around research data ownership, only to realize that this is a complex question. Data ownership has many facets, including laws, copyright, funding, and policy. To simplify things, we decided to start by looking at what universities say about data ownership. This meant studying university data policy.

For this article, we looked at 206 Carnegie “High” and “Very High” research universities in the United States and pulled any policies on research data that we could find. We found that just under half (44%) of the institutions studied had some policy covering research data. Two-thirds of these policies (29% overall) were stand-alone data policies and one-third (15% overall) were IP policies that cover data.

The good news is that the majority of discovered policies (67%) defined the owner of the data; most often this was the university. The bad news is that this means over two-thirds of all institutions studied (71%) offered no guidance on data ownership at all. With so many new requirements around research data in the United States, data ownership is definitely an area where institutions need to step up and offer more clarity.
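
For readers who want to see how the 71% figure relates to the other numbers, here is a quick back-of-the-envelope check in Python. It is only a sketch using the rounded percentages quoted in this post (the underlying institution counts are in the paper, not here), and the variable names are my own.

```python
# Rounded percentages quoted in the post; the raw counts are not given here.
with_policy = 0.44    # share of institutions with some policy covering research data
defines_owner = 0.67  # share of those policies that define the data owner

# Share of all institutions whose policy names a data owner.
owner_defined_overall = with_policy * defines_owner

# Everyone else: no policy at all, or a policy that is silent on ownership.
no_ownership_guidance = 1 - owner_defined_overall

print(f"Ownership defined: {owner_defined_overall:.0%}")
print(f"No ownership guidance: {no_ownership_guidance:.0%}")
```

In other words, the institutions without ownership guidance are the 56% with no data policy plus the policies that never name an owner.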

We’re still looking for more answers about data ownership, but in the meantime our research article has a lot more to say about institutional data policies and library data services at US research institutions. I encourage you to check out the paper if these topics interest you and to peruse the rest of the special data issue of the Journal of Librarianship and Scholarly Communication (bonus: it’s Open Access!). The whole issue has definitely jumped to the top of my reading list!

Citation: Briney, K., Goben, A., & Zilinski, L. (2015). Do You Have an Institutional Data Policy? A Review of the Current Landscape of Library Data Services and Institutional Data Policies. Journal of Librarianship and Scholarly Communication, 3(2), 1–25. DOI: http://doi.org/10.7710/2162-3309.1232

Posted in Uncategorized

“Data Management for Researchers” is Here!

I couldn’t be more excited to tell you that my book, Data Management for Researchers, is now officially published!

The book is described as:

A comprehensive guide to everything scientists need to know about data management, this book is essential for researchers who need to learn how to organize, document and take care of their own data.

Researchers in all disciplines are faced with the challenge of managing the growing amounts of digital data that are the foundation of their research. Kristin Briney offers practical advice and clearly explains policies and principles, in an accessible and in-depth text that will allow researchers to understand and achieve the goal of better research data management.

This book is a direct descendant of this blog and also describes the practical things researchers can do to take care of their data better. While some of the content may be familiar to long-time blog readers, there is plenty of new information in the book that has never been mentioned here. Being able to expand on the full range of data management topics is definitely a major advantage of the book format!

If you’d like a copy for yourself (or want to recommend your library buy a copy), it’s available through the publisher. You can also get a copy on Amazon (the ebook is already available and the paperback/hardback version will be in stock soon).

Finally, thank you all so much for reading the blog. Without this forum and your interest in the topic, I wouldn’t be the proud author I am today. So thanks.

Posted in dataManagement

Data Management Threshold Concepts

We’ve been going through the new ACRL “Framework for Information Literacy for Higher Education” recently at work. This document discusses ways to teach students how to search and understand information resources, framing critical skills as “threshold concepts”. While the Framework itself is interesting, I’m really intrigued by the idea of a threshold concept and wonder if there are any threshold concepts for data management.

For those unfamiliar with the term, a “threshold concept” is an idea that, once understood, completely reframes the way you view a topic. It’s like seeing a hidden image in that it’s very difficult to un-see the image afterward. Threshold concepts are so fundamental to understanding that it’s actually necessary to understand the concept in order to progress in the field.

Let’s look at the ACRL Framework to better understand how such concepts work. The six concepts are:

  • Authority is Constructed and Contextual
  • Information Creation as a Process
  • Information Has Value
  • Research as Inquiry
  • Scholarship as Conversation
  • Searching as Strategic Exploration

If you understand these concepts, you’ll easily see, for example, why a scholarly article may be an appropriate source for one research project while a blog post would be better for a different project, depending on the topic. You’ll also see why searching doesn’t always turn up the content you are looking for on the first try.

This blog post is not about the new Framework, but rather how the Framework challenged me to think about what the threshold concepts are for data management. Taking a stab at it (directly cribbing from the Framework), I have three ideas:

  • Data is Contextual
  • Data Management is a Process
  • Data Has Value

Let’s look at these individually to get into what I mean in each case.

First, data is contextual. That means that data never exists independently of the information about how it was acquired and processed. Just as a chemist records notes about her data in a lab notebook, so should any dataset come with enough documentation to be understood by someone who is not the dataset creator. Without this extra information, the data is practically useless.

Second, data management is a process. It’s not something that you do once and are done with forever. It’s a process by which you take better care of your data continually over time. That doesn’t mean that it’s incredibly difficult. Rather, it’s like doing regular preventative maintenance to avoid disaster.

Third, data has value. This is something that many researchers are currently grappling with due to new data sharing requirements. If you can understand that your data has value, you can see how published data adds richness to an article, why data should be preserved after the end of a project, and why other researchers might want to use your data (hint: it’s valuable!).

These three ideas are by no means the final say on threshold concepts in data management, only my initial ideas. I’m still mulling them over (for example, I’m wondering if “data is contextual” and “data has value” are truly independent concepts) and trying to figure out if there are more concepts in this field.

I would love to hear other people’s ideas about threshold concepts in data management. Has anyone had an “aha!” moment about something that really affected the way they think about data management? Let me know!

Posted in dataManagement