Do You Have an Institutional Data Policy? aka. Who Owns Your Data?

September seems to be publication month and I’m so excited to share my other item that was just published: a research article on institutional data policy.

This paper came out of a question that my collaborators and I had: who owns the data produced by university researchers? We had a sense that some universities made it clear that they owned the data while the answer was ambiguous at other institutions. Complicating matters further was the question of ownership when researchers from different universities collaborate. This was particularly applicable in our case as my colleagues and I all work at different institutions.

So we set down the path of trying to find some clarity around research data ownership, only to realize that this is a complex question. Data ownership has many facets, including laws, copyright, funding, and policy. To simplify things, we decided to start by looking at what universities say about data ownership. This meant studying university data policy.

For this article, we looked at 206 Carnegie “High” and “Very High” research universities in the United States and pulled any policies on research data that we could find. We found that just under half (44%) of the institutions studied had some policy covering research data. Two-thirds of these policies (29% overall) were stand-alone data policies and one-third (15% overall) were IP policies that cover data.

The good news is that we found that the majority of discovered policies (67%) defined the owner of the data; most often this was the university. The bad news is that 67% of the 44% of institutions with a policy works out to only about 29% of all institutions, which means that over two-thirds of all institutions studied (71%) offered no guidance on data ownership at all. With so many new requirements around research data in the United States, data ownership is definitely an area where institutions need to step up and offer more clarity.

We’re still on the path for more answers about data ownership, but in the meantime our research article has a lot more to say about institutional data policies and library data services at US research institutions. I encourage you to check out the paper if these topics interest you and to peruse all of the special data issue of the Journal of Librarianship and Scholarly Communication (bonus: it’s Open Access!). The whole issue has definitely jumped to the top of my reading list!

Citation: Briney, K., Goben, A., & Zilinski, L. (2015). Do You Have an Institutional Data Policy? A Review of the Current Landscape of Library Data Services and Institutional Data Policies. Journal of Librarianship and Scholarly Communication, 3(2), 1–25. DOI:


“Data Management for Researchers” is Here!

I couldn’t be more excited to tell you that my book, Data Management for Researchers, is now officially published!


The book is described as:

A comprehensive guide to everything scientists need to know about data management, this book is essential for researchers who need to learn how to organize, document and take care of their own data.

Researchers in all disciplines are faced with the challenge of managing the growing amounts of digital data that are the foundation of their research. Kristin Briney offers practical advice and clearly explains policies and principles, in an accessible and in-depth text that will allow researchers to understand and achieve the goal of better research data management.

This book is a direct descendant of this blog and also describes the practical things researchers can do to take care of their data better. While some of the content may be familiar to long-time blog readers, there is plenty of new information in the book that has never been mentioned here. Being able to expand on the full range of data management topics is definitely a major advantage of the book format!

If you’d like a copy for yourself (or want to recommend your library buy a copy), it’s available through the publisher. You can also get a copy on Amazon (the ebook is already available and the paperback/hardback version will be in stock soon).

Finally, thank you all so much for reading the blog. Without this forum and your interest in the topic, I wouldn’t be the proud author I am today. So thanks.


Data Management Threshold Concepts

We’ve been going through the new ACRL “Framework for Information Literacy for Higher Education” recently at work. This document discusses ways to teach students how to search and understand information resources, framing critical skills as “threshold concepts”. While the Framework itself is interesting, I’m really intrigued by the idea of a threshold concept and wonder if there are any threshold concepts for data management.

For those unfamiliar with the term, a “threshold concept” is an idea that, once understood, completely reframes the way you view a topic. It’s like seeing a hidden image in that it’s very difficult to un-see the image afterward. Threshold concepts are so fundamental to understanding that it’s actually necessary to understand the concept in order to progress in the field.

Let’s look at the ACRL Framework to better understand how such concepts work. The six concepts are:

  • Authority is Constructed and Contextual
  • Information Creation as a Process
  • Information Has Value
  • Research as Inquiry
  • Scholarship as Conversation
  • Searching as Strategic Exploration

If you understand these concepts, you’ll easily see, for example, why a scholarly article may be an appropriate source for one research project while a blog post would be better for a different project, depending on the topic. Or why searching doesn’t always turn up the content you are looking for on the first try. Etc.

This blog post is not about the new Framework, but rather how the Framework challenged me to think about what the threshold concepts are for data management. Taking a stab at it (directly cribbing from the Framework), I have three ideas:

  • Data is Contextual
  • Data Management is a Process
  • Data Has Value

Let’s look at these individually to get into what I mean in each case.

First, data is contextual. That means that data never exists independently of the information about how it was acquired and processed. Just like a chemist records notes about her data in a lab notebook, so should any dataset come with enough documentation to be understood by someone who is not the dataset creator. Without this extra information, the data is practically useless.

Second, data management is a process. It’s not something that you do once and are done with forever. It’s a process by which you take better care of your data continually over time. That doesn’t mean that it’s incredibly difficult. Rather, it’s like doing regular preventative maintenance to avoid disaster.

Third, data has value. This is something that many researchers are currently grappling with due to new data sharing requirements. If you can understand that your data has value, you can see how published data adds richness to an article, why data should be preserved after the end of a project, and why other researchers might want to use your data (hint: it’s valuable!).

These three ideas are by no means the final say on threshold concepts in data management, only my initial ideas. I’m still mulling them over (for example, I’m wondering if “data is contextual” and “data has value” are truly independent concepts) and trying to figure out if there are more concepts in this field.

I would love to hear other people’s ideas about threshold concepts in data management. Has anyone had an “aha!” moment about something that really affected the way they think about data management? Let me know!


Data Dispute Prompts Lawsuit

I talk a lot on this blog about one of my big personal interests, data management, but I’m always excited to have an excuse to discuss another interest of mine, university data policy. Today’s excuse to delve into policy comes from one of my data-policy-research collaborators, who sent me a data story so thorny that I just had to discuss it here on the blog.

The case involves a prominent Alzheimer’s researcher, Paul Aisen, who ran the Alzheimer’s Disease Cooperative Study at UC San Diego and just took a job at a new Alzheimer’s center run by USC. Aisen is taking 8 staff members with him to the new center, plus his National Institute on Aging grant and its corresponding data. Unfortunately, UC San Diego says that Aisen does not have permission to transfer these grant resources to USC. The data is a particularly sticky issue here, as UC San Diego is alleging that the researcher transferred the data to an Amazon server and won’t share the password with UC San Diego administrators. The result is that UC San Diego – or more specifically, the UC System Regents – is now suing both Aisen and USC over the money and the data.

There are a few issues going on in this case that are worth discussing. First, can the researcher take the grant to another institution? Second, who owns the data? Third, can the researcher take the data to another institution?

The first issue involves grant administration. The news article about this lawsuit states that “university declined to let [Aisen] keep the associated government funding.” UC San Diego likely has some authority to do this as grants are usually given to universities to administer on behalf of the researcher and not directly to the researchers themselves. So while most institutions allow researchers to transfer grants when they move jobs, it’s not necessarily a given – especially where funding covers a whole center rather than a single research group.

The second issue is actually the clearest of the three. University of California System policy states that all data generated by research on campus is owned by the university, or more specifically the University Regents (who are the official suing party in the lawsuit). So Aisen does not own this data; the University Regents do.

However, just because the university owns the data doesn’t mean a researcher doesn’t have rights to the data when he/she leaves the university. PIs at UC schools are allowed to take a copy of the data with them but can’t take the original without written permission from their Vice Chancellor for Research (this presumes that the data is not “tangible research material”, which the researcher cannot remove at all without written permission). So at the very least, university system policy states that Aisen cannot prevent UC San Diego from accessing and maintaining the master copy of the data. On the flip side, Aisen should be able to take some data with him to USC but it would only be a copy of the data for which he was listed as PI on the grant and not the whole study dataset, which dates back to 1991.

So without knowing the specifics of the case, I would say that UC San Diego seems to have a good claim to the data. This directly results from having a clear data policy.

My own research has found that such university data policies are becoming more common but are far from ubiquitous. While these policies do provide important clarity, anecdotal evidence – like this story – suggests that universities are mainly leveraging these policies when significant amounts of money or prestige are involved. I think that’s a shame because such policies can be very helpful for data decision making.

The other key issue here is the fact that the university owns the research data. This is something that many researchers are uncomfortable with but is often a routine part of doing research at a university; it’s akin to the university claiming patent rights. That said, individual researchers usually get to make almost all decisions about the data (in their capacity as data stewards) and should expect something in return for this deal. Namely, universities should take their ownership claim seriously and devote enough university resources to the care and maintenance of “their” data.

I’m looking forward to hearing more details about the case and going beyond my personal speculation to see how things are resolved. In the meantime, it makes for another good story to share on the importance of clear data policy.

August 2015 addendum: It looks like my initial assumptions that UCSD owned the data were correct. This LA Times article details how a judge ordered the data returned to UCSD.


Data Management Videos

I’ve been so busy talking about documentation on the blog recently that I’ve forgotten to share an awesome project that I’ve been working on: the data management video series!

Over the course of the last semester, I worked with an intern to create a series of 10 data management videos. The videos cover a range of topics and are all available on YouTube, so not only can you watch them whenever but you are also free to embed them on other webpages. I’m all for sharing content and, while these videos were predominantly made for researchers at my university, the more researchers who learn this stuff the better.

The full series list is as follows:

(As a geeky aside, I also want to point out that I’m wearing some of my favorite handmade items in a few of those videos. Keep an eye out for the epic sweater of awesome, the bad passwords dress, and the marvelous woman-in-science dress as you watch!)

These 10 videos are a solid start to work in this medium and I’m hoping that we can add more to this series over time!


Taking Better Notes

I’ve been talking a lot about documentation on this blog over the last few months but there is definitely one more issue I need to address before we move on to other topics: taking better notes. Taking better notes is really at the heart of improving your documentation because this is the main way that researchers document their work.

To review, having sufficient documentation is central to making your data usable and reusable. If you don’t write things down, you’re likely to forget important details over time and not be able to interpret a dataset. This is most apparent for data that needs to be used a year or more after collection, but can also impact the usability of data you acquired last week. In short, you need to know the context of your research data – such as sample information, protocol used, collection method, etc. – in order to use it properly.
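For anyone who keeps notes electronically, one way to picture this context is as a small structured record saved next to the data file. The Python sketch below is illustrative only: the `DatasetContext` class, its field names, the `missing_fields` check, and the example values are all hypothetical assumptions, not a standard or a required format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetContext:
    """Hypothetical record of the context needed to interpret one dataset."""
    data_file: str            # file the notes describe
    collected_on: str         # collection date, written as YYYY-MM-DD
    sample: str               # sample information
    protocol: str             # protocol used
    collection_method: str    # how the data was collected
    units: dict               # measured variable -> unit
    notes: str = ""           # free-text observations


def missing_fields(ctx):
    """Return the names of context fields that were left blank."""
    required = ["data_file", "collected_on", "sample",
                "protocol", "collection_method"]
    missing = [name for name in required if not getattr(ctx, name).strip()]
    if not ctx.units:
        missing.append("units")
    return missing


def to_json(ctx):
    """Serialize the record so it can be stored alongside the data file."""
    return json.dumps(asdict(ctx), indent=2)


# Made-up example values, for illustration only.
example = DatasetContext(
    data_file="run_042.csv",
    collected_on="2015-06-01",
    sample="Sample A, batch 3",
    protocol="Absorbance protocol v2",
    collection_method="UV-Vis spectrometer, single scan",
    units={"absorbance": "AU", "wavelength": "nm"},
)
print(missing_fields(example))
print(to_json(example))
```

A check like `missing_fields` is just a reminder to fill in the blanks before the details are forgotten; a paper notebook with the same headings does the same job.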

All of this context starts with the information you record while collecting data. And for most researchers, this means taking better notes.

Most scientists learn to take good notes in school, but it’s always worth having a refresher on this important skill. Good research notes are the following:

  • Clear and concise
  • Legible
  • Well organized
  • Easy to follow
  • Reproducible by someone “skilled in the art”
  • Transparent

Basically, someone should be able to pick up your notes and tell what you did without asking you for more information.

The problem a lot of people run into is not recording enough information. If you read laboratory notebook guidelines (which were established to help prove patents), they actually say that you should record any and all information relating to your research in your notebook. That includes research ideas, data, when and where you spoke about your research, references to the literature, etc. The more you record in your notebook, the easier it is to follow your train of thought.

I would also recommend employing headers, tables, and any other tool that helps you avoid having a solid block of text. These methods can not only help you better organize your information, but make it easier for you to scan through everything later. And don’t forget to record the units on any measurements!

Overall, there is no silver bullet to make your notes better. Rather, you should focus on taking thorough notes and practice good note-taking skills. It also helps to have another person look over your notes and give you feedback for clarity. Use whatever methods work best for you so long as you are taking complete notes.

Research notebooks have been used for hundreds of years. We can still refer to Michael Faraday’s meticulous notes or read Charles Darwin’s observations that led to the theory of evolution. These documents show that handwritten research notes have been and will continue to be useful. But to get the most out of your research notes, you need to start by taking better notes.

I challenge you this month to think about your research notes and work to take clearer, more consistent, and more thorough notes. Your ultimate goal is to make sure you have all of the documentation you need for whenever you use your data.
