Leaving the University

Don’t worry, I’m not actually leaving my job. Rather, I’ve been thinking about what happens to data when a researcher leaves a university.

For how common separation is, it’s often a time during which data gets lost because there usually aren’t rules or structure for how to account for data during this transition. Without proper discussion, it can also be unclear what rights the departing researcher has over the data and if they are allowed to republish with it.

To help with these problems, I got together with my friend and research collaborator Abigail to come up with the Data Departure Checklist. The goal of the checklist is to account for all of the data, decide who gets to keep what data, and set both the individual and the lab they are departing from up for success going forward.

Goben, A., & Briney, K. A. (2023). Data Departure Checklist. https://doi.org/10.7907/h314-4x51

To use the checklist, the departing researcher and a project administrator should sit down and work through all of the items on the list. There are sections for hardware, personal devices, storage systems, data permissions, administrative documentation, manuscripts in progress, administration, and email. The researcher and administrator should both check off that the relevant item has been taken care of appropriately. The goal is to make the departure process more transparent and streamlined.

This is our first draft of the checklist so let us know if you use it and/or think of ways to improve it! We hope this is a useful tool for researchers.

Posted in dataManagement

The Last of the Data Doubles Publications

It’s been pretty quiet on the blog, though it definitely has not been quiet in real life. I have lots of things that I’m looking forward to sharing with you!

First off, I had a pair of papers published this summer from the Data Doubles project. The first paper is the result of focus groups around student expectations of privacy from three different educational data scenarios:

Jones, K. M. L., Goben, A., Perry, M. R., Regalado, M., Salo, D., Asher, A. D., Smale, M. A., & Briney, K. A. (2023). Transparency and Consent: Student Perspectives on Educational Data Analytics Scenarios. Portal: Libraries and the Academy, 23(3), 485–515. https://doi.org/10.1353/pla.2023.a901565

No paywall version here: https://resolver.caltech.edu/CaltechAUTHORS:20230711-164456715

The second article covers the development of the large, reproducible survey (published here) we did in the second phase of the Data Doubles research:

Asher, A., Briney, K., & Goben, A. (2023). Valid questions: the development and evaluation of a new library learning analytics survey. Performance Measurement and Metrics, 24(2), 101–119. https://doi.org/10.1108/PMM-04-2023-0009

No paywall version here: https://resolver.caltech.edu/CaltechAUTHORS:20230720-155500670

These two papers wrap up the work I’ve done on that huge project. I miss working with everyone but we did get some great findings out of the work! I hope you enjoy them.

Posted in privacy

Visualizing COVID in Fiber, Part 2

It’s been two years since I posted about my 2020 US COVID daily fatalities handwoven visualization. I have had many interesting conversations about that piece (for examples, see here) and it’s one of the handmade projects of which I am most proud.

The thing is, though, the pandemic is not over. And the ongoing US COVID fatalities have been as numerous and terrible and just as worthy of highlighting as the 2020 deaths. So I made the 2021 version of the visualization.

Person hidden behind a narrow vertical piece of fabric made up of small red and dark red hexagons. The top quarter of the fabric is dark red, with red hexagons through the middle, and mostly dark red in bottom third. In the background is a green yard with plants and bushes.

I finished the visualization in March of last year and finally hung it up in my office, directly below the 2020 version. The color representation is the same between the two visualizations, though the 2021 version only uses red hexagons (100-999 deaths) and dark red hexagons (1000+ deaths). If anything, the 2021 version is more depressing because of the advances in science (e.g., vaccines) that were countered by disregard for that science (e.g., anti-masking sentiments), leading to thousands of deaths.

White wall of an office with two long fabric strips hanging toward the top of the wall, one strip on top of the other. The top strip is thin and long, starting white on the left and transitioning to red, with three large dark red patches. The lower strip hangs directly below the first and is dark red on the ends and red in the middle. Other office furniture, art, and books are visible.

Since it’s the start of 2023, I’m reflecting on whether I want to make the 2022 version. It’s a lot of labor but our ongoing COVID losses are a story that still needs to be told. That said, the CDC moved from reporting daily deaths to reporting weekly deaths, so I don’t have an equivalent dataset for this past year. (Notably, you can see the move toward less frequent reporting at the end of 2021, where reported deaths at the end of the year are lower on the weekends than during the week.)

White woman with glasses and braided brown hair wearing a colorwork-yoked sweater stands in front of the picture as a selfie. In the background, two long narrow strips of fabric hang on the white wall behind her. The fabric is made up of small hexagons in colors ranging from white and pink to red and dark red.

I don’t yet have an answer for how to handle the 2022 data. It’s a bigger question of how to adjust when a dataset changes. Do I average weekly data to fit everything into the existing format (e.g. rows of identically colored hexagons)? Do I visualize weekly data (e.g. one long thin strip) and, if so, how do I make this harmonize with my existing visualizations?
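To make the first option concrete, here is a rough sketch of what averaging weekly data into the existing daily format could look like. It is only an illustration: the function names are hypothetical, the weekly totals are made-up placeholder numbers rather than CDC data, and the only color bins it uses are the two described above (red for 100-999 deaths, dark red for 1000+). Anything below 100 is lumped into a generic "lighter" label because the lower bins aren't spelled out in this post.

```python
def hexagon_color(daily_deaths):
    """Map a daily death count to a color bin.

    Only the red and dark red thresholds come from the post; the
    "lighter" label is a placeholder for the lower, unspecified bins.
    """
    if daily_deaths >= 1000:
        return "dark red"
    if daily_deaths >= 100:
        return "red"
    return "lighter"


def weekly_to_daily_colors(weekly_totals):
    """Spread each weekly total evenly over 7 days and color each day.

    Returns one row of 7 identical colors per week.
    """
    rows = []
    for total in weekly_totals:
        daily_average = total / 7
        rows.append([hexagon_color(daily_average)] * 7)
    return rows


if __name__ == "__main__":
    # Made-up weekly totals, for illustration only (not real data).
    example_weeks = [7000, 700, 70]
    for week in weekly_to_daily_colors(example_weeks):
        print(week)
```

The tradeoff is visible right away: every week becomes a row of seven identical hexagons, so the format stays the same but any day-to-day variation is lost.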

There are no “best” answers here. But I do know that we need to remember those that we’ve lost in this ongoing pandemic, however that might look.

Posted in dataVisualization

A Summary of Research Data Documentation Methods

A phrase that often comes up when data librarians speak about documentation is that “documentation is a love letter to your future self.” Basically, research data rarely speak for themselves and researchers forget details over time, so documentation is necessary in order to interpret data in the future.

Different disciplines gravitate to different forms of documentation (e.g. chemists use laboratory notebooks but social scientists running surveys more often leverage codebooks) but this often means that researchers are not aware of the full scope of documentation methods beyond what is preferred within their discipline. While I’ve written a lot about documentation on this blog already, I wanted a quick way to show researchers the full range of documentation possibilities. So I developed a 2-page handout that summarizes different forms of documentation and when to use them.

The handout summarizes seven different documentation forms to expose researchers to methods that might improve their data management and work better in certain parts of their research workflow. These seven methods include:

  • Laboratory Notebook, Field Notebook, or Research Notebook
  • e-Lab Notebook (ELN)
  • README.txt
  • Templates: Data Sheet, Collection Sheet, or Field Sheet
  • Data Dictionary
  • Codebook
  • Metadata Schema, Standard, or Taxonomy

I’ve blogged about most of these documentation types in more detail, but I hope it’s helpful to be able to review them all in one place.

Please do check out the new documentation handout and feel free to reuse it – it’s licensed under a Creative Commons Attribution (CC BY) license, meaning you are welcome to use and remix it so long as you credit me. And thank you to Tom Morrell and Megan O’Donnell, who reviewed an earlier version of this handout and suggested improvements.

I hope this new resource is helpful to you all!

Posted in documentation

“Managing Data for Patron Privacy” is Here!

I’m thrilled to announce that my second book is officially out! I’m pleased to share the book “Managing Data for Patron Privacy: Comprehensive Strategies for Libraries”, which was co-written with the amazing Becky Yoose and published by ALA Editions.

Cover of the book "Managing Data for Patron Privacy" surrounded by lock, folder, and laptop icons on a blue background.

Summary: Libraries are not exempt from the financial costs of data breaches or leaks, no matter the size. Whether from a library worker unwittingly sharing a patron’s address with a perpetrator of domestic violence to leaving sensitive patron data unprotected, patrons can also pay a hefty price when libraries fail to manage patron data securely and ethically. In Kristin Briney and Becky Yoose’s new guide “Managing Data for Patron Privacy: Comprehensive Strategies for Libraries,” published by ALA Editions, readers will learn concrete action steps for putting the ethical management of data into practice, following two common public and academic library cumulative case studies. The authors explore such key topics as:

  • succinct summaries of major U.S. laws and other regulations and standards governing patron data management;
  • information security practices to protect patrons and libraries from common threats;
  • how to navigate barriers in organizational culture when implementing data privacy measures;
  • sources for publicly available, customizable privacy training material for library workers;
  • the data life cycle from planning and collecting to disposal;
  • how to conduct a data inventory;
  • understanding the associated privacy risks of different types of library data;
  • why the current popular model of library assessment can become a huge privacy invasion;
  • addressing key topics while keeping your privacy policy clear and understandable to patrons; and
  • data privacy and security provisions to look for in vendor contracts.

On a more personal note, this will probably always be “the COVID book” in my mind. I got the request to submit a proposal for what would become this book the week after the USA shut down in March 2020, caught COVID halfway through writing the book, wrote half of my chapters while dealing with long COVID, and caught COVID again the week the book was published. The last two years have been absolute roller coasters but I could not have asked for a better partner to bring this project to light than Becky. I’m so incredibly proud of what we did together. I hope that you all find value in the book and buy a copy so that us two ex-Wisconsinites can afford to get The Good Cheese shipped to the west coast.

Posted in admin, libraries, privacy

Should Researchers Use a Standard Folder Structure?

I love to teach people about data management and file organization, but I tend to talk mostly about file naming conventions and ISO 8601. These two strategies are incredibly helpful in keeping files organized and easy to find, but file organization also has a role.
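As a quick illustration of why those two strategies pair so well, here is a small sketch that builds file names starting with an ISO 8601 date (YYYY-MM-DD). The function name and the exact pattern (date, underscore, hyphenated description) are just one possible convention I'm assuming for the example, not a standard.

```python
from datetime import date


def dated_file_name(day, description, extension):
    """Build a file name like 2023-01-05_survey-responses.csv.

    The date-first pattern is one assumed convention; adapt it to
    whatever naming scheme your project already uses.
    """
    slug = "-".join(description.lower().split())
    return f"{day.isoformat()}_{slug}.{extension}"


if __name__ == "__main__":
    names = [
        dated_file_name(date(2023, 10, 2), "Survey Responses", "csv"),
        dated_file_name(date(2023, 1, 5), "Survey Responses", "csv"),
        dated_file_name(date(2022, 12, 30), "Survey Responses", "csv"),
    ]
    # Plain alphabetical sorting puts ISO 8601 dated names in date order.
    for name in sorted(names):
        print(name)
```

Because the year comes first, then the month, then the day, an ordinary alphabetical sort in any file browser lists these files chronologically.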

My usual thoughts about file organization are to have a logical way to organize your files and to put files in the correct folders. Coupled with strong file naming conventions, having some established folder system usually works well enough and is flexible enough to account for the wide variety of data types.

That said, I’ve been reading a couple data management resources recently – “Managing your Research Data and Documentation” by Kathy Berenson and Towards a Standardized Research Folder Structure on the Gen R Blog – that recommend a specific folder organization structure for research files and data. For example, they advocate for having a folder for each project with separate defined folders for primary data, data analysis, and data dissemination, in addition to having other folders for content like grant administrata, etc. These two resources outline folder templates, though the proposed structures aren’t identical.
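For anyone who wants to try a template like this, the sketch below shows one way to stamp out the same set of folders for each new project. The folder names are my own hypothetical stand-ins based on the categories mentioned above; they are not the exact structures proposed by either resource, so swap in whatever names fit your work.

```python
import tempfile
from pathlib import Path

# Hypothetical folder names, loosely based on the categories named above.
PROJECT_FOLDERS = [
    "primary_data",
    "data_analysis",
    "data_dissemination",
    "grant_administration",
]


def create_project_folders(root, project_name, folders=PROJECT_FOLDERS):
    """Create a project folder containing a fixed set of subfolders.

    Existing folders are left untouched, so this is safe to re-run.
    """
    project_dir = Path(root) / project_name
    for name in folders:
        (project_dir / name).mkdir(parents=True, exist_ok=True)
    return project_dir


if __name__ == "__main__":
    # Demonstrate in a throwaway temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        project = create_project_folders(tmp, "example_project")
        for folder in sorted(project.iterdir()):
            print(folder.name)
```

Keeping the folder list in one place means every project starts with the same layout, while the list itself stays easy to edit when a project needs something different.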

The recommendation to use a specific folder structure for research data and files has me thinking about the value of such templates for research. On one hand, it’s incredibly useful to have a defined and well-organized hierarchy to manage and find content. On the other hand, research is very heterogeneous by nature and no one folder structure template is guaranteed to work for all types of research.

I keep coming back to the fact that data management skills are a toolkit and you use the tools you need to make the work easier and leave the rest. I don’t think there is one right answer when it comes to folder organization, but having an established structure may be beneficial to many researchers. As I teach file organization in the future, I plan to use the recommendations from Berenson and the Gen R blog as examples for people to follow if they choose.

I’d love to hear from others if you would find a standard file structure useful in your research?

Posted in dataManagement