Taking a Break: Some Stories of Documentation

I’ve been thinking a lot about documentation this month as I prepare to take 12 weeks of leave from my job. The upside is that I’ve had 9 months to plan for this; I will also say that following good data management and reproducibility practices has greatly eased the work of temporarily shifting my duties to others.

In today’s post, I want to provide some snapshots of how I’m documenting tasks so that others can perform them in my absence. I’m hoping that these vignettes will provide inspiration for others aiming to provide enough documentation for their work, whether they are taking leave or not.

Story #1

My smoothest project to shift involved some R code I wrote over the summer to run automated reports for my library’s public services statistics. I knew going into the project that I would not be able to run the reports myself during the key period at the end of the semester, so I made sure to document everything at a level a novice could pick up. In practice, this meant including a README.txt to walk someone through everything from installing the software to adjusting key variables to running the code. I also tried to make clear within the code, via comments, which parts needed to be updated to customize the reports. Writing code with the intention that others will use it is really the best practice, and I can see this approach paying off well beyond my 12 weeks away.

Story #2

Another task I’m temporarily shifting is acting as Secretary for my professional association. Again, I’ve helped myself a lot here by having a good README.txt file laying out the structure and permissions for all of the files I manage as Secretary. So it was simply a matter of adding notes on my duties so that they could be adequately covered.

Story #3

A more involved project to shift is a research project I’m on for which I have an assistant. Key documentation here included a timeline for the assistant’s onboarding tasks and a lot of communication with my collaborators. The timeline turned out to be a good idea overall, as it makes expectations clear for everyone; this is likely a method I’ll use in the future. Otherwise, I’m trying to go for a more-communication-is-better approach, which requires extra work but will benefit everyone when I’m away.

These three vignettes show [what I hope are] successful efforts to document tasks for others. I think what makes them work is that I built a lot of the documentation into the projects from the start, making it easier for me to pass things on now. I admit that not every project I run is documented to the levels described here, but my future self is usually grateful when I take 10 minutes early on to write a README or send a status email.

I hope these stories provide you with some ideas for your own projects that may need to be passed along or picked up again by your future self. Even a little documentation created early in a project is helpful and usually doesn’t take a huge amount of time to create. The benefits to your collaborators and your future self usually make it all worthwhile.

Posted in documentation

SPEC Kit on Privacy and Library Learning Analytics

I mentioned in a post a few months ago that I started doing research in the area of learning analytics, libraries, privacy, and data management. I’m happy to report that my first publication to come out of this work went live today: an ARL SPEC Kit on Learning Analytics.

The SPEC Kit provides an overview of where large U.S. academic libraries currently stand with respect to learning analytics and privacy practices. If you’re at all interested in what libraries are doing with student data, I encourage you to check it out.

Also, many thanks to my co-authors Michael Perry, Abigail Goben, Andrew Asher, Kyle M. L. Jones, M. Brooke Robertshaw, and Dorothea Salo for their work on this publication. Expect to see more publications from this team in the future!

Posted in admin

A Framework for Choosing the Right Visualization Type for Your Data

I’m starting to do more work in the area of data visualization, as evidenced by my most recent publication and my somewhat periodic Twitter rants against pie charts. While I’ve already discussed one of my favorite visualization resources on the blog, I have not discussed how I approach data visualization. In particular, I want to share the framework I teach in my visualization workshops.

The framework involves three steps:

  1. Determine your message
  2. Determine what type of data this corresponds to
  3. Choose the best visualization format to match the message and data type

Let’s break each step down in more detail.

The first step in any visualization is determining what you want to say. While you can just visualize all of your data and let the reader decide, this makes a lot more work for the reader; they’ll either give up or take time away from understanding some other key point in your work in order to understand the data in this visualization. To illustrate why the message is important, consider Nathan Yau’s exercise in visualizing the same data 25 different ways. Each visualization highlights a different nuance of the data and causes the viewer to interact with the data differently. So the message is key to determining which is the “best” in the set of 25. In your figures, you can both be transparent by visualizing everything and frame things in a way that makes a point. Basically, having a message can really help you add clarity to a figure.

Once you determine your message, you’ll next want to consider what type of data you’re working with. For example, while you might have survey results, depending on your message you could really be trying to visualize a single number, a comparison, a change over time, etc. Each of these data types merits a slightly different visualization strategy. So spend some time thinking about your message and how it directs the specific data features you want to highlight.

The final step in this framework is to actually choose the type of visualization or chart to use. By making this decision last, you can reflect on how well the chart type lines up with what you’re trying to say. As you pick your chart type, continually ask whether the visual fits your message. If it doesn’t, pick a different chart type.
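To make the ordering concrete, here is a minimal Python sketch of the three steps as a small planning helper. The `CHART_OPTIONS` mapping, the `plan_visualization` function, and the chart types listed are hypothetical placeholders chosen for illustration; they are not Stephanie Evergreen’s chart chooser or any other published guide. The only point the sketch tries to capture is that the chart type gets picked last and has to fit the data type that the message depends on.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, illustrative mapping from data type (step 2) to a few
# commonly used chart types (step 3). This is NOT a published chart chooser;
# swap in whatever guide you actually rely on.
CHART_OPTIONS = {
    "single number": ["big number callout", "icon array"],
    "comparison": ["bar chart", "dot plot"],
    "change over time": ["line chart", "slope chart"],
}


@dataclass
class VisualizationPlan:
    message: str     # Step 1: the one point the figure should make
    data_type: str   # Step 2: the data feature that message depends on
    chart_type: str  # Step 3: the format chosen to match both


def plan_visualization(message: str, data_type: str,
                       preferred: Optional[str] = None) -> VisualizationPlan:
    """Walk the three steps in order, deciding the chart type last."""
    # Step 1: refuse to go further without a message.
    if not message.strip():
        raise ValueError("Step 1: decide what the figure should say first.")
    # Step 2: the message must map onto a known data type.
    if data_type not in CHART_OPTIONS:
        raise ValueError(f"Step 2: unknown data type {data_type!r}; "
                         f"expected one of {sorted(CHART_OPTIONS)}")
    options = CHART_OPTIONS[data_type]
    # Step 3: keep the preferred chart only if it fits the data type;
    # otherwise fall back to the first candidate that does.
    chart = preferred if preferred in options else options[0]
    return VisualizationPlan(message, data_type, chart)


if __name__ == "__main__":
    # Hypothetical message used purely for illustration.
    plan = plan_visualization("Gate counts rose each semester",
                              "change over time", preferred="pie chart")
    print(plan.chart_type)  # line chart
```

Asking for a pie chart to show a change over time falls back to a line chart here, which mirrors the advice above: if the visual doesn’t fit the message, pick a different chart type.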

I fully admit to building this framework off the work of Stephanie Evergreen. If you want some structure for determining the best chart types for different data types, I recommend checking out her book “Effective Data Visualization”. For qualitative data, she also has a qualitative chart chooser on her website. These resources will help you make the jump from data types to chart types.

I should also note that this framework is for visualizing final results. If you’re looking to do visualization for exploratory analysis, that’s a totally different topic and one with other resources (e.g. Tukey’s “Exploratory Data Analysis” or R’s summarytools package).

So that’s one way to start thinking about making good visualizations. If you’re going to go to the effort to visualize your data, I challenge you to make your visuals their most effective by determining your message and the key data features before you decide on the chart type.

Posted in dataVisualization

Privacy v Confidentiality

I’ve been thinking a lot recently about the difference between privacy and confidentiality. This issue surfaces in libraries around handling patron data (relevant to my current line of research) but also more generally in how researchers handle human subjects data. I think it’s important to recognize the difference between privacy and confidentiality and how this might play out in dealing with research data.

Privacy is a concept centered on the person from whom the information originates. It entails that individual’s personal information and their discretion in disclosing it to others. I’m not going to attempt to define “privacy” (there are plenty of privacy scholars who have already done this) but instead want to highlight that here I’m applying the term to an individual’s choices about their own information.

Confidentiality instead centers on the entity that holds someone else’s personal information and its duty not to expose that information further. Some professionals, such as doctors and lawyers, routinely deal with confidentiality, but it’s an important concept to apply to personal information more broadly.

So, since this is a blog on data management, how does this apply to research data? First, it’s important to think about privacy in how we collect data. When do we need specific sensitive information, and when can we go without it? What is the impact on the person disclosing sensitive information? What is considered private information, and how is that clearly defined and communicated?

When we do need to collect personal information for research, the second step is to think about confidentiality: ensuring that information is not further disclosed. How will the data be secured? Who is allowed access to the data (for example, members of a research team), and how is this communicated to the research participant? What will happen to the data at the end of the project?

As you can see, privacy and confidentiality play out during different phases of the research process yet are both important when doing research involving personal information. We, as researchers, keep data confidential to help maintain other people’s privacy. So while they are different concepts and actions, confidentiality does have its roots in privacy.

There are many nuances in the types of data that may be considered private, but it’s worth recognizing that, as researchers, we have a role both in navigating the disclosure of personal information and in then securing that information to maintain confidentiality. I hear a lot of discussions about privacy, but I think it’s just as important to discuss the role of confidentiality when we do have to collect sensitive information.

Posted in dataManagement, privacy

Come Hear My Thoughts on Data Management and Learning Analytics

I mentioned way back in December that I fell down a research rabbit hole at the end of 2017, but I never said what that rabbit hole was: it’s about data management and library learning analytics. I’m not quite ready to reveal the full results of that research project (the paper is currently working its way through the publication process), but I will be discussing my results at the ALA Annual Conference in New Orleans next week!

The session I’m speaking at is called “Libraries and Learning Analytics: Identifying the Issues” and is on Saturday, June 23 at 2:30-3:30pm in Morial Convention Center, Rm 395-396.

My talk is actually a follow-up to my ALA 2017 talk, but with actual research results to share. I’ll be specifically discussing anonymization, consent, and security, because hooboy do I have some things to say in these areas.

If you can’t make it to ALA, I’ll be posting the slides afterwards and hopefully you’ll see the paper in the near future too!

[Added 2018-06-26] Slides now available here: Briney ALA presentation file

Posted in admin