Gaining Competency in Data Visualization

I couldn’t be more excited that my latest journal article, “Gaining Competency: Learning to Teach Data Visualization,” was just published in the Journal of eScience Librarianship.

The idea behind the paper is: how do we as librarians teach data skills in an area, specifically data visualization, in which we often have little expertise? Data librarians teach many data competencies, but the “data visualization” competency has always been an awkward one. We see a lot of desire for help in this area but don’t always have the expertise to meet this need. Data visualization has not historically been associated with the library and it isn’t covered in our usual data-management-based curricula. This paper seeks to close this gap.

I admit that it’s a quirky little paper. So many papers in librarianship are in the “we did this awesome thing” mold, and there’s a healthy dose of that in this paper in that I discuss the data visualization workshop I offer at my institution. However, I decided to go a step further by describing the lead-up process in which I prepared to teach the workshop. I thought it might benefit other librarians to have a framework for developing our own skills to the point where we can help others with data visualization.

So if you’ve ever thought about supporting data visualization but don’t feel like you have the requisite skills, I encourage you to check out my new paper. It’s part of a larger data visualization special issue and I’m certainly looking forward to digging into the whole issue!

Exit Stage Left

Have you ever fallen down a research rabbit hole? It does not happen to me very often (thankfully) but when it does, I fall deep. All of this is to say that I’m back from my 5-month blogging hiatus. I promise that there will be future posts about this particular rabbit hole, but in the meantime we’re going to talk about exit strategies!

I was involved in a Twitter conversation the other day about the importance of exit strategies.

This all came out of a conversation about the end of Storify and what people should do with all of their content on that platform. @Libskrat discussed the importance of an exit strategy for your online content and I want to spend this post talking about how important that is for research data as well.

The simple fact of the matter is that you should never put your data into something where you can’t get it out again. This applies to many situations: storage platforms, file formats, e-lab notebooks, specialized analysis software, etc. Things happen and if you don’t have Plan B for your data then you sometimes have no data at all.

A couple big areas where researchers get into trouble around exit strategies are using proprietary file types and adopting specialized software/platforms (note: these overlap somewhat). The first area is something that I’ve had experience with. My PhD lab used a specific analysis software, which I lost access to when I left the lab. Unfortunately all of my data was locked up in a proprietary file format that I can no longer open. Good thing I don’t need access to all of that chemistry data as a Data Librarian because it would be a huge problem to get it back. If only I had saved a copy of my data to Excel or as a .csv, I wouldn’t be dealing with this mess.
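To make the “save a copy as a .csv” idea concrete, here is a minimal Python sketch. It assumes your data can already be loaded into memory as a list of rows; the function names `export_to_csv` and `read_back` and the example column names are made up for illustration and are not tied to any particular analysis software.

```python
import csv


def export_to_csv(records, path):
    """Write a list of dicts to a plain-text CSV file and return the row count."""
    records = list(records)
    if not records:
        raise ValueError("nothing to export")
    # Use the keys of the first record as the column headers (an assumption:
    # every record has the same keys).
    fieldnames = list(records[0].keys())
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    return len(records)


def read_back(path):
    """Re-open the exported file to confirm the open copy is actually readable."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling `export_to_csv(rows, "backup.csv")` and then `read_back("backup.csv")` is the whole exit strategy in miniature: write the open copy, then check that you can get the data back out. Note that CSV stores everything as text, so numbers come back as strings.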

E-lab notebooks (ELN) are a good example of the dangers of adopting software without an exit strategy. You should assume that you won’t be using the same ELN in 10 years. So you need to ask about getting your data out before you put one piece of data in an ELN. Even if your exit strategy is literally printing out every page of your ELN, it’s better than nothing (heck, it’s about equivalent to your paper lab notebook).

So next time you’re adopting a tool for your research, ask the question “do I have an exit strategy out of using this tool?” If the answer is no, run far and fast. That tool is not worth the short-term benefits it might offer.

Data Privacy/Security Is Not An Afterthought

I’m currently doing a lot of reading for an upcoming presentation at the American Library Association annual meeting on learning analytics, library patron privacy, and data management. This panel presentation is being given in response to the trend in academic libraries to mine existing data, especially patron-level data, to justify the library’s value on campus. I’m particularly interested in this topic because I recognize many practices from data management that can inform the design of such projects. Because of this, I often have a hard time reading literature in this area as I’m finding data-handling practices that I would advise against as a data management expert.

One report that has especially struck me as problematic is the ACRL’s “The Value of Academic Libraries” report from 2010. This report gives lots of details on how libraries can benefit from an expansion of their assessment programs by collecting new types of data for analysis. The problem that I have with this report is that it gives only a token nod to privacy as something to be accounted for (note: privacy is a core value in librarianship).

The report states that privacy considerations need to be worked out but fails to contextualize its major recommended assessment strategies within the scope of actual privacy and security practices. It says “account for privacy” without telling one how to do so and fails to acknowledge privacy in all of the parts of the research process where it belongs (such as in planning, data collection, training, etc.). In other words, the report does not do what all human subject research should do: bake privacy and security considerations into all aspects of a project from the very beginning.

Here’s where I come back around to general research best practices, as this is a data management blog and not a library blog. All research projects involving human subjects or personally identifiable information need to account for – from the very beginning of a project – participant privacy and how to keep personal information safe. This should happen before a researcher even collects her first data point and continue through the end of the project.

Baking privacy and security considerations into a project from the very beginning affects things in several ways.

The first way is administrative. Participant privacy is a significant part of the Institutional Review Board (IRB) process for getting approval for human subject research. If a researcher cannot describe his security practices – from secure storage to anonymization strategies – he does not get approval for that research. In other words, it’s a requirement for human subject research conducted at a university.

Second is design. Focusing on privacy affects the design decisions of a study in terms of what information a researcher should/shouldn’t collect and how that information is stored. It’s possible that in taking this privacy lens, a researcher will have to limit an avenue of inquiry, but such avenues would pose a risk to research participants. An important part of doing such research is in the balancing of risk and reward – this cannot be an afterthought.

Third, a focus on participant privacy is a focus on ethical research. In the words of Zook, et al. (from a big data paper I rather like), researchers should “acknowledge that data are people and can do harm”. Only by incorporating privacy and security considerations upfront and throughout a project can we truly make such an acknowledgement.

Data librarians tend to talk about data management plans a lot, in that they should be created at the start of every research project. This recommendation goes from “should” to “must” when personal information is involved. Researchers conducting projects involving people need to make data handling decisions upfront and keep making privacy-based decisions throughout a project. Privacy is a feature, not a bug. We do poorly by our study participants by acting in any other way.
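As one illustration of making data-handling decisions upfront, here is a minimal Python sketch of two common choices: keeping only the fields a project actually needs, and replacing a direct identifier with a keyed hash. The function names, the field names, and the 16-character truncation are all assumptions made for this example. Keyed hashing is pseudonymization, not full anonymization, and an IRB may well require more than this.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Return a stable pseudonym for an identifier using HMAC-SHA256.

    secret_key must be bytes and must be stored separately from the data;
    anyone holding the key can re-link pseudonyms to known identifiers.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated for readability (an illustrative choice, not a standard).
    return digest.hexdigest()[:16]


def minimize_record(record, keep_fields, id_field, secret_key):
    """Keep only the listed fields and swap the identifier for a pseudonym."""
    cleaned = {field: record[field] for field in keep_fields if field in record}
    cleaned["participant"] = pseudonymize(record[id_field], secret_key)
    return cleaned
```

For example, `minimize_record({"email": "pat@example.edu", "visits": 3, "major": "chem"}, ["visits"], "email", key)` returns a record containing only `visits` and a `participant` pseudonym. The point is that the list of fields to keep gets decided before collection starts, not after.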

Making Decisions About Your Data

I ran across this lovely video by UMN copyright librarian Nancy Sims on making decisions about your research data and found it so helpful that I had to share it here.

She gives an excellent overview of how copyright (see my earlier post on data and copyright) is only a small part of deciding what you can do with your data and goes on to explain other considerations for “ownership” and decision making. I hope it provides some clarity in who gets a say in the data decision process!

Privacy Tools

Let’s talk for a moment about data privacy. With so much going on in the US news, you might have missed recent efforts to roll back the FCC rules on internet privacy. Basically, removing these rules would allow internet service providers (ISPs) to sell your browsing histories, alter your webpages, and track what you do across the internet.

There are obviously huge concerns here about personal privacy. However, a lot of research data lives on the internet and is stored in the cloud and, while it’s not clear to me how these new rules would affect that content, I’m always an advocate for making sure that private data stays private.

Independent of the fate of this bill, it’s useful to know how to navigate the web in a secure manner.

Before I get into the details of tools (and because I’m not a security expert), here are two useful articles on what you can do to better secure your digital life:

I’ve used these two to roadmap where I need to go with my own privacy practices.

Now let’s talk tools! Here are three tools that are already part of my repertoire and that I love:

  • LastPass (password manager) – Count me in as a password manager convert! LastPass makes it easy to manage hundreds of different passwords and make sure that those passwords are strong. I give high marks to their “Security Challenge”, which identifies old, weak, potentially breached, and duplicate passwords and can even auto-update passwords for me.
  • Privacy Badger (tracking blocker) – This browser plug-in keeps cookies and tracking out of my life. It disables trackers without breaking a website; in the rare event that breakage happens, it’s easy to selectively turn trackers back on or Privacy Badger off for that site. Privacy Badger runs in the background, so you don’t have to worry about tracking once it’s installed.
  • DuckDuckGo (private internet search) – I’ve converted from Google to DuckDuckGo for my internet searches, as the latter doesn’t track you and still gives good search results. Hint: change your browser’s default search engine to DuckDuckGo in the settings!

Two tools that I just added:

  • HTTPS Everywhere (webpage security) – This is a browser plug-in that secures your web browsing by defaulting all pages to https.
  • uBlock (malicious ad blocker) – I’m still getting a feel for this tool, which is another browser plug-in that runs in the background. It potentially overlaps with Privacy Badger but it’s not a 100% overlap and I’d rather have 2 tools that cover everything.

Finally, here are the things I’m working my way up to:

  • VPN – I really need to go all in on a VPN, especially for when I’m working on public networks. I don’t have a VPN in mind at the moment, but research is definitely on my to-do list.
  • Tor – This would take my internet security to the next level, though I’m not sure it’s a level I need to be at. Still, I’m leaving this here as future ambitions.

So that’s my tool list for internet privacy. A lot of these things are browser changes and plug-ins that, once set, don’t disrupt browsing. The one tool that was a big change was LastPass, but once I got everything set up it has actually made dealing with passwords 100x easier (and my passwords are 1000x more secure).

I hope this tool list shows you that internet security doesn’t have to be difficult but you do have to take a little time to set things up. I think the rewards are definitely worth it.

