Data Privacy/Security Is Not An Afterthought

I’m currently doing a lot of reading for an upcoming presentation at the American Library Association annual meeting on learning analytics, library patron privacy, and data management. This panel presentation is being given in response to the trend in academic libraries to mine existing data, especially patron-level data, to justify the library’s value on campus. I’m particularly interested in this topic because I recognize many practices from data management that can inform the design of such projects. Because of this, I often have a hard time reading literature in this area as I’m finding data-handling practices that I would advise against as a data management expert.

One report that has especially struck me as problematic is the ACRL’s “The Value of Academic Libraries” report from 2010. This report gives lots of details on how libraries can benefit from an expansion of their assessment programs by collecting new types of data for analysis. The problem that I have with this report is that it gives only a token nod to privacy as something to be accounted for (note: privacy is a core value in librarianship).

The report states that privacy considerations need to be worked out but fails to contextualize its major recommended assessment strategies within the scope of actual privacy and security practices. It says “account for privacy” without telling one how to do so and fails to acknowledge privacy in all of the parts of the research process where it belongs (such as in planning, data collection, training, etc.). In other words, the report does not do what all human subject research should do: bake privacy and security considerations into all aspects of a project from the very beginning.

Here’s where I come back around to general research best practices, as this is a data management blog and not a library blog. All research projects involving human subjects or personally identifiable information need to account for – from the very beginning of a project – participant privacy and how to keep personal information safe. This should happen before a researcher collects her first data point and continue through the end of the project.

Baking privacy and security considerations into a project from the very beginning affects things in several ways.

The first way is administrative. Participant privacy is a significant part of the Institutional Review Board (IRB) process for getting approval for human subject research. If a researcher cannot describe his security practices – from secure storage to anonymization strategies – he does not get approval for that research. In other words, it’s a requirement for human subject research conducted at a university.

Second is design. Focusing on privacy affects the design decisions of a study in terms of what information a researcher should/shouldn’t collect and how that information is stored. It’s possible that in taking this privacy lens, a researcher will have to limit an avenue of inquiry, but such avenues would pose a risk to research participants. An important part of doing such research is in the balancing of risk and reward – this cannot be an afterthought.

Third, a focus on participant privacy is a focus on ethical research. In the words of Zook, et al. (from a big data paper I rather like), researchers should “acknowledge that data are people and can do harm”. Only by incorporating privacy and security considerations upfront and throughout a project can we truly make such an acknowledgement.

Data librarians tend to talk about data management plans a lot, in that they should be created at the start of every research project. This recommendation goes from “should” to “must” when personal information is involved. Researchers conducting projects involving people need to make data handling decisions upfront and keep making privacy-based decisions throughout a project. Privacy is a feature, not a bug. We do poorly by our study participants by acting in any other way.
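
To make “data handling decisions upfront” a little more concrete, here is a minimal sketch in Python of two common choices: collecting only the fields a study needs, and replacing a direct identifier with a keyed hash. Everything in it is a hypothetical illustration (the field names, the `FIELDS_TO_KEEP` allow-list, and the `patron_id` key are my assumptions, not taken from any real system), and pseudonymizing an ID this way is not the same as full anonymization, since the remaining fields can sometimes still identify a person.

```python
import hashlib
import hmac

# Hypothetical allow-list: the only fields this imaginary study needs.
FIELDS_TO_KEEP = {"visit_date", "service_used"}


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def minimize_record(record: dict, secret_key: bytes) -> dict:
    """Keep only allow-listed fields and swap the patron ID for a pseudonym."""
    cleaned = {k: v for k, v in record.items() if k in FIELDS_TO_KEEP}
    cleaned["participant"] = pseudonymize(record["patron_id"], secret_key)
    return cleaned


if __name__ == "__main__":
    # Assumption: in practice the key is generated randomly and stored
    # separately from the data, with access limited to the research team.
    key = b"replace-with-a-random-key-stored-separately"
    raw = {
        "patron_id": "A12345",
        "name": "Jane Doe",
        "visit_date": "2017-03-01",
        "service_used": "reference desk",
    }
    print(minimize_record(raw, key))
```

The point of the allow-list is that the name never enters the research dataset at all, which is the kind of decision that is much easier to make before collection starts than after.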

If you are interested in library learning analytics, privacy, and data management and will be in Chicago on June 25th, I encourage you to attend my ALA panel. It’s sure to be an interesting discussion!

Posted in security

Making Decisions About Your Data

I ran across this lovely video by UMN copyright librarian Nancy Sims on making decisions about your research data and found it so helpful that I had to share it here.

She gives an excellent overview of how copyright (see my earlier post on data and copyright) is only a small part of deciding what you can do with your data and goes on to explain other considerations for “ownership” and decision making. I hope it provides some clarity in who gets a say in the data decision process!

Posted in copyright, openData, ownership

Privacy Tools

Let’s talk for a moment about data privacy. With so much going on in the US news, you might have missed recent efforts to roll back the FCC rules on internet privacy. Basically, removing these rules would allow internet service providers (ISPs) to sell your browsing histories, alter your webpages, and track what you do across the internet.

There are obviously huge concerns here about personal privacy. However, a lot of research data lives on the internet and is stored in the cloud and, while it’s not clear to me how these new rules would affect that content, I’m always an advocate for making sure that private data stays private.

Independent of the fate of this bill, it’s useful to know how to navigate the web in a secure manner.

Before I get into the details of tools (and because I’m not a security expert), here are two useful articles on what you can do to better secure your digital life:

I’ve used these two to roadmap where I need to go with my own privacy practices.

Now let’s talk tools! Here are three tools that are already part of my repertoire and that I love:

  • LastPass (password manager) – Count me in as a password manager convert! LastPass makes it easy to manage hundreds of different passwords and make sure that those passwords are strong. I give high marks to their “Security Challenge”, which identifies old, weak, potentially breached, and duplicate passwords and can even auto-update passwords for me.
  • Privacy Badger (tracking blocker) – This browser plug-in keeps cookies and tracking out of my life. It disables trackers without breaking a website; in the rare event that breakage happens, it’s easy to selectively turn trackers back on or Privacy Badger off for that site. Privacy Badger runs in the background, so you don’t have to worry about tracking once it’s installed.
  • DuckDuckGo (private internet search) – I’ve converted from Google to DuckDuckGo for my internet searches, as the latter doesn’t track you and still gives good search results. Hint: change your browser’s default search engine to DuckDuckGo in the settings!
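
If you are wondering what a “strong” password looks like in practice, the short sketch below generates one using Python’s standard `secrets` module. This is only an illustration of the idea (long, random, and unique per site); it is not how LastPass itself works, and the `generate_password` name and the default length of 20 are my own assumptions.

```python
import secrets
import string


def generate_password(length: int = 20) -> str:
    """Build a random password from letters, digits, and punctuation."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    # secrets.choice draws from a cryptographically secure random source,
    # unlike the general-purpose random module.
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(generate_password())
```

A password manager takes care of the generating and the remembering for you, which is what makes it practical to use a different one of these for every account.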

Two tools that I just added:

  • HTTPS Everywhere (webpage security) – This is a browser plug-in that secures your web browsing by defaulting all pages to https.
  • uBlock (malicious ad blocker) – I’m still getting a feel for this tool, which is another browser plug-in that runs in the background. It potentially overlaps with Privacy Badger but it’s not a 100% overlap and I’d rather have 2 tools that cover everything.

Finally, here are the things I’m working my way up to:

  • VPN – I really need to go all in on a VPN, especially for when I’m working on public networks. I don’t have a VPN in mind at the moment, but research is definitely on my to-do list.
  • Tor – This would take my internet security to the next level, though I’m not sure it’s a level I need to be at. Still, I’m leaving this here as future ambitions.

So that’s my tool list for internet privacy. A lot of these things are browser changes and plug-ins that, once set, don’t disrupt browsing. The one tool that was a big change was LastPass, but once I got everything set up it has actually made dealing with passwords 100x easier (and my passwords are 1000x more secure).

I hope this tool list shows you that internet security doesn’t have to be difficult but you do have to take a little time to set things up. I think the rewards are definitely worth it.

Posted in security

Love Your Data 2017: Finding the Right Data

It’s that time of the year again – Love Your Data Week. This annual celebration focuses on getting more from your data with helpful data management tips and skills.

Each day of Love Your Data Week has its own theme, but I want to focus on today’s theme (Thursday) because it’s something that I have not discussed on the blog before: finding the right data.

You might be used to finding books and journals for your research, but finding data is often more difficult. That is because data systems are rarely connected and there is no guarantee that the data you want even exists. It can be frustrating to even know where to begin looking for data, which makes the process of finding data feel time consuming and full of rabbit holes. Thankfully, I’m here to share some strategies!

The best strategy for finding data is to think about who may be creating data and search their websites/publications

  • Is it government data? Departments like the Centers for Disease Control (CDC), Department of Education, the Census, etc. often make data available.
  • Consider non-governmental organizations who might have data, like the United Nations, World Health Organization, International Monetary Fund, etc.
  • Private businesses often make data available for purchase. These resources are sometimes available through your local library.
  • Individual researchers are increasingly sharing their study data for reproducibility purposes. Check out their publications or send the corresponding author an email.
  • Might the data live in a special data repository? re3data lists a huge variety of repositories, many of which are subject specific.

Be aware that your local library probably has a few data resources

  • Libraries sometimes subscribe to databases that contain datasets instead of articles. An added benefit here is that you can ask a librarian for help with these resources!

To get started, try scanning publications for data

  • Published articles (newspaper and journal articles) may contain data tables and references to data sources. This is a good place to start if you are looking for background information on a topic.
  • Journal articles are increasingly linking to the data and code used for analysis. See if the publication mentions accompanying data, check out supplemental information, or email the author.

Add the word ‘statistics’ or ‘data’ to your searches

  • Using Google, the library catalog, or another search tool.
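
If you run a lot of searches, the tip above is easy to script. Here is a tiny Python sketch that builds the query variants for a topic; the function name and the example topic are hypothetical, and you would paste the resulting strings into whichever search tool you are using.

```python
def data_search_queries(topic: str, extra_terms=("statistics", "data")) -> list[str]:
    """Return one search query per extra term appended to the topic."""
    return [f"{topic} {term}" for term in extra_terms]


if __name__ == "__main__":
    for query in data_search_queries("library gate counts"):
        print(query)
```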

When in doubt, ask for help

  • Librarians are really good at finding information (and that includes data!).

Remember, finding data often involves brainstorming and rethinking your search strategy when you hit a dead end. Try taking a step back when you get stuck. If you can’t find the specific data you are looking for, is there a more general dataset that you can still use to build your case/provide background? Finally, don’t forget to cite your sources!

I hope these tips help you find data for your research and that you can learn other helpful strategies from Love Your Data Week and its Twitter stream #LYD17!


Extra resources:

Posted in dataSources

Book Review: Effective Data Visualization

I know a little bit about a lot of data things, but one area I’m weak in is data visualization. Sure, I can make a graph in Excel but that doesn’t mean that the graph is necessarily good. Thankfully, Sal Gore blogged a recommendation for the book Effective Data Visualization and, after a quick read, I’m feeling like a data viz wiz.

What I like about this book is that it doesn’t assume you have data visualization knowledge apart from basic familiarity with Excel. That’s actually a plus for this book, as the author Stephanie Evergreen shows you how to make most of these charts IN EXCEL. I know I’ve previously ragged about Excel on this blog but it really is the first place most people start with data viz. So if we’re all going to start there, at least this book shows you how to make your Excel charts not suck. Even better, Evergreen tells you how difficult a chart will be to create in Excel by including a helpful Excel ninja rating.

The other thing that’s great about this book is that charts are organized by the type of data you want to present. Categories include: a single number, comparisons, beating a benchmark, survey results, parts of a whole, correlations, qualitative data, and data over time. Evergreen bases her selection of charts on research showing which chart types are more effective for information retention. It’s a different way to think about charts, but one that I’m finding really useful.

The range of covered charts includes the usual suspects, from bar charts to scatter plots, but Evergreen also details visuals that I haven’t used before. The ones I plan to immediately add to my graphing repertoire are: icon arrays, slopegraphs, dot plots, back-to-back bar charts, and small multiples graphs.

Beyond choosing the right chart and knowing how to make it in Excel (which, of themselves, are incredibly useful skills), this book gave me a framework for creating charts that are easy to read and convey a clear message. For example, I now understand how to write an effective chart title, select good colors, reduce data overload, and eliminate chart junk. It’s reached the point where I can’t even look at my old graphs without wanting to tweak them.

There is one downside of this book and it’s that it was done with two-color printing. All of the charts are limited to shades of blue and grey. While this makes for a visually cohesive (and cheaper) book, the printed figures occasionally do not fully convey the author’s point – most often when showing a bad chart. This is annoying but it’s not enough to detract from the many good things about this book.

Overall, Effective Data Visualization is the perfect book for people who want to level up their data visualization skills beyond the defaults in Excel. I’ve learned so much from this book and it has fundamentally changed the way I think about visualizing data. I hope that you will find it just as useful.

Posted in bookReview, dataVisualization


There have been several discussions among my data librarian colleagues about the future of open data and science in 2017, spurred on by articles such as this one on the future of data sharing and these articles on the continued existence of government-held climate data.

These concerns are realistic. We’ve seen from our neighbors in Canada that politics can have a profound impact on the sharing of science. In turn, librarians have a role to play to advocate for continued access to information (shout out to the amazing John Dupuis for that last link).

Relevant to my work here, the two things I’m most concerned about are:

  • Continued existence of requirements for funding agency data sharing.
  • Muzzling of researchers, particularly climate scientists.

I’m going to try to keep up with what is going on in these two areas and will occasionally share my thoughts back here. In the meantime, I’ve started a #TrumpSci bookmarks list that you can follow along with here: list and RSS feed.

Please send me relevant stories as you find them!


Edited to add (2016-12-15): The wonderful John Dupuis preempted me with a Trump list. I’m still going to work on my list and talk about this topic on the blog but in the meantime you should definitely check out his more thorough round up.

Posted in government, openData, Uncategorized