Breaking the Blogging Silence

Wow, what a year it’s been. I know it has been really quiet on the blog since this time last year, and that’s because life has been anything but quiet!

I stopped posting here a year ago when we added a new member to the family. And just when things looked to be calming down, we decided to move cross-country from Wisconsin to California. It’s been a big change and I’m still getting used to the weather, the culture, and the commute.

The good news is that I’ve started a new job as a Biology Librarian and will continue to do some data work in this role. So there will be future posts on data management tips and tricks! I’m looking forward to being back.

Data Management in Library Learning Analytics

My latest paper was published this week and I am so very excited to share it with you all. It is Data Management Practices in Academic Library Learning Analytics: A Critical Review.

Every article has a story behind it and this one, as happens with the best articles, started with me getting very annoyed. I had just been introduced to the concept of library learning analytics and was reading a pair of studies for a different project. I couldn’t focus on the purpose of the studies because I kept running into concerns with how the researchers were handling the data. What annoyed me most was that one of the studies kept insisting that its data was anonymous when it clearly wasn’t, which has huge implications for data privacy. A little poking around made me realize that such data problems appear with terrible frequency in library learning analytics.

There’s quite a history of ethical debates around library learning analytics but almost no research on the data handling practices which impact patron privacy in very practical ways. After a little digging through the literature and a lot of shouting at my computer, I knew I had to write this paper.

So what did I find? Libraries: we need to do better. For all that we talk about patron privacy, there is plenty of evidence that we’re not backing up that intent with proper data protections. The best way to protect data is to collect limited amounts, de-identify it where possible, secure it properly, and keep it for a short time before deleting it. We’re not doing that. I’m also concerned about how we handle consent and opting in or out, something I didn’t originally intend to study but couldn’t ignore once I started reading. There’s a lot more in the paper, including some explanations of why these are best practices, so I encourage you to go there for more details. And afterward, go figure out how to protect your data better.
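
To make that concrete, here is a minimal sketch in R of what those practices might look like for a hypothetical circulation log (the file name, column names, and retention cutoff are all made up for illustration): keep only the fields you need, swap patron IDs for arbitrary study codes, and drop records past a short retention window.

    # Hypothetical example: de-identify a circulation log before analysis
    circ <- read.csv("circulation_log.csv", stringsAsFactors = FALSE)

    # Collect less: keep only the fields the analysis actually needs
    circ <- circ[, c("patron_id", "checkout_date", "item_type")]

    # De-identify: replace patron IDs with arbitrary study codes
    circ$patron_id <- as.integer(factor(circ$patron_id))

    # Retain briefly: drop records older than the (made-up) retention cutoff
    circ$checkout_date <- as.Date(circ$checkout_date)
    circ <- circ[circ$checkout_date >= as.Date("2018-09-01"), ]

    write.csv(circ, "circulation_log_deidentified.csv", row.names = FALSE)

Real de-identification takes more care than this, since quasi-identifiers like timestamps and demographics can still re-identify people, but even a small routine like this is better than analyzing raw, fully identified records.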

Finally, I need to again thank Abigail Goben and Dorothea Salo for acting as my sounding boards through this entire process. They listened to me rant, helped me work out a path for this research, and edited drafts for me. I am deeply grateful for their assistance and I know this paper would not be half as good without their help.

Taking a Break: Some Stories of Documentation

I’ve been thinking a lot about documentation this month as I prepare to take 12 weeks of leave from my job. The upside is that I’ve had 9 months to plan for this, and following good data management and reproducibility practices has greatly eased the effort of temporarily shifting my duties to others.

In today’s post, I want to provide some snapshots of how I’m documenting tasks so that others can perform them in my absence. I’m hoping these vignettes will offer inspiration to anyone aiming to build enough documentation into their work, whether they are taking leave or not.

Story #1

My smoothest project to shift involved some R code I wrote over the summer to run automated reports for my library’s public services statistics. I knew going into the project that I would not be able to run the reports myself during the key period at the end of the semester, so I documented everything at a level a novice could pick up. In practice, this meant including a README.txt that walks someone through everything from installing the software to adjusting key variables to running the code. I also tried to make clear within the code, via comments, which parts need to be updated to customize the reports. Building code with the intention that others will use it is really the best practice, and I can already see benefits of this approach that will last well beyond my 12 weeks away.
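
As a hedged illustration of that commenting style (this is a toy script, not my actual reporting code, and the file names, column names, and variables are invented), the idea is to keep everything someone else might need to change in one clearly labeled block at the top:

    # --- Settings to update each semester (see README.txt) ---------------
    stats_file  <- "public_services_fall.csv"   # raw statistics export
    report_file <- "desk_report_fall.pdf"       # where the report is saved
    semester    <- "Fall"                       # appears in the report title
    # --- Nothing below this line needs editing ---------------------------

    stats <- read.csv(stats_file, stringsAsFactors = FALSE)

    # Summarize transactions by service desk
    desk_totals <- aggregate(transactions ~ desk, data = stats, FUN = sum)

    # Write the report as a simple PDF chart
    pdf(report_file)
    barplot(desk_totals$transactions, names.arg = desk_totals$desk,
            main = paste("Public Services Statistics,", semester),
            ylab = "Transactions")
    dev.off()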

Story #2

Another task I’m temporarily shifting is acting as Secretary for my professional association. Again, I’ve helped myself a lot here by having a good README.txt file laying out the structure and permissions for all of the files I manage as Secretary. So it was simply a matter of adding notes on my duties so that they could be adequately covered.
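
For anyone curious what such a file looks like, here is a stripped-down, hypothetical sketch (the folder names and duties are invented for illustration); the real file also records who has which permissions and why:

    README.txt (Secretary files)

    Folders:
      /Agendas     - drafts shared with the President one week before each meeting
      /Minutes     - one file per meeting, named YYYY-MM-DD_minutes.docx
      /Membership  - current roster; edit access limited to the Secretary

    Permissions: board members can view everything; the Secretary and the
    President can edit.

    While I am on leave: the acting Secretary takes and posts minutes within
    one week of each meeting and circulates agendas as above.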

Story #3

A more involved project to shift is a research project on which I have an assistant. Key documentation here included a timeline of onboarding tasks for the assistant and a lot of communication with my collaborators. The timeline turned out to be a good idea generally, as it makes expectations clear for everyone; it’s a method I’ll likely use again in the future. Otherwise, I’m taking a more-communication-is-better approach, which requires extra work but will benefit everyone while I’m away.

These three vignettes show [what I hope are] successful efforts to document tasks for others. I think what makes them good is that I’ve built a lot of the documentation into the projects to begin with, making it easier for me to pass stuff on now. I admit that not every project I run is documented to the levels described here but my future self is usually more grateful than not for taking 10 minutes early on to write a README or send a status email.

I hope these stories provide you with some ideas for your own projects that may need to be passed along or picked up again by your future self. Even a little documentation created early in a project is helpful and usually doesn’t take a huge amount of time to create. The benefits to your collaborators and your future self usually make it all worthwhile.

SPEC Kit on Privacy and Library Learning Analytics

I mentioned in a post a few months ago that I started doing research in the area of learning analytics, libraries, privacy, and data management. I’m happy to report that my first publication to come out of this work went live today: an ARL SPEC Kit on Learning Analytics.

The SPEC Kit provides an overview of where large U.S. academic libraries currently stand with respect to learning analytics and privacy practices. If you’re at all interested in what libraries are doing with student data, I encourage you to check it out.

Also, many thanks to my co-authors Michael Perry, Abigail Goben, Andrew Asher, Kyle M. L. Jones, M. Brooke Robertshaw, and Dorothea Salo for their work on this publication. Expect to see more publications from this team in the future!

A Framework for Choosing the Right Visualization Type for Your Data

I’m starting to do more work in the area of data visualization, as evidenced by my most recent publication and my somewhat periodic Twitter rants against pie charts. While I’ve already discussed one of my favorite visualization resources on this blog, I have not discussed how I approach data visualization. In particular, I want to share the framework I teach in my visualization workshops.

The framework involves three steps:

  1. Determine your message
  2. Determine what type of data this corresponds to
  3. Choose the best visualization format to match the message and data type

Let’s break each step down in more detail.

The first step in any visualization is determining what you want to say. While you can just visualize all of your data and let the reader decide, this makes a lot more work for the reader; they’ll either give up or take time away from understanding some other key point in your work in order to understand the data in this visualization. To illustrate why message is important, consider Nathan Yau’s exercise in visualizing the same data 25 different ways. Each visualization highlights a different nuance of the data and causes the viewer to interact with the data differently. So the message is key to determining which is the “best” of the 25. In your figures, you can both be transparent, by visualizing everything, and frame things in a way that makes a point. Basically, having a message can really help you add clarity to a figure.

Once you determine your message, you’ll next want to consider what type of data you’re working with. For example, while you might have survey results, depending on your message you could really be trying to visualize a single number, a comparison, a change over time, etc. Each of these data types merits a slightly different visualization strategy. So spend some time thinking about your message and how it directs the specific data features you want to highlight.

The final step in this framework is to actually choose the type of visualization or chart to use. By making this decision last, you can reflect on how well the chart type lines up with what you’re trying to say. As you pick your chart type, continually ask whether the visual fits your message. If it doesn’t, pick a different chart type.
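
To make the framework concrete, here is a toy example in R with made-up gate-count numbers. The same small dataset gets charted two different ways depending on the message: “traffic drops at the end of the semester” is a change over time, so a line chart fits; “the Main branch sees far more traffic than the Science branch” is a comparison, so a bar chart fits.

    # Made-up data: monthly gate counts for two hypothetical branches
    gate <- data.frame(
      month  = rep(1:6, times = 2),
      branch = rep(c("Main", "Science"), each = 6),
      count  = c(4200, 3900, 4500, 4800, 5100, 3000,
                 1500, 1600, 1400, 1700, 1800, 1100)
    )

    # Message 1: change over time -> line chart
    main_branch <- gate[gate$branch == "Main", ]
    plot(main_branch$month, main_branch$count, type = "l",
         xlab = "Month", ylab = "Gate count",
         main = "Main Library traffic over the semester")

    # Message 2: comparison between branches -> bar chart
    totals <- tapply(gate$count, gate$branch, sum)
    barplot(totals, ylab = "Total gate count",
            main = "Semester traffic by branch")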

I fully admit to building this framework off the work of Stephanie Evergreen. If you want some structure for determining the best chart types for different data types, I recommend checking out her book “Effective Data Visualization”. For qualitative data, she also has a qualitative chart chooser on her website. These resources will help you make the jump from data types to chart types.

I should also note that this framework is for visualizing final results. If you’re looking to do visualization for exploratory analysis, that’s a totally different topic and one with other resources (e.g. Tukey’s “Exploratory Data Analysis” or R’s summarytools package).
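
As a quick illustration of the difference, an exploratory check of a dataset can be as simple as the following, using the iris data that ships with R; the output is a browsable summary for you, not a figure you would ever publish.

    # Quick exploratory overview, not a presentation graphic
    # install.packages("summarytools")   # if not already installed
    library(summarytools)
    view(dfSummary(iris))   # summary table with small inline distribution plots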

So that’s one way to start thinking about making good visualizations. If you’re going to go to the effort to visualize your data, I challenge you to make your visuals their most effective by determining your message and the key data features before you decide on the chart type.
