A phrase that often comes up when data librarians speak about documentation is that “documentation is a love letter to your future self.” Basically, research data rarely speak for themselves and researchers forget details over time, so documentation is necessary in order to interpret data in the future.
Different disciplines gravitate to different forms of documentation (e.g. chemists use laboratory notebooks but social scientists running surveys more often leverage codebooks) but this often means that researchers are not aware of the full scope of documentation methods beyond what is preferred within their discipline. While I’ve written a lot about documentation on this blog already, I wanted a quick way to show researchers the full range of documentation possibilities. So I developed a 2-page handout that summarizes different forms of documentation and when to use them.
The handout summarizes seven different documentation forms to expose researchers to methods that might improve their data management and work better in certain parts of their research workflow. These seven methods include:
Laboratory Notebook, Field Notebook, or Research Notebook
e-Lab Notebook (ELN)
README.txt
Templates: Data Sheet, Collection Sheet, or Field Sheet
Data Dictionary
Codebook
Metadata Schema, Standard, or Taxonomy
I’ve blogged about most of these documentation types in more detail, but I hope it’s helpful to be able to review them all in one place.
Please do check out the new documentation handout and feel free to reuse it – it’s licensed under a Creative Commons Attribution (CC BY) license, meaning you are welcome to use and remix it so long as you credit me. And thank you to Tom Morrell and Megan O’Donnell, who reviewed an earlier version of this handout and suggested improvements.
Summary: Libraries are not exempt from the financial costs of data breaches or leaks, no matter the size. Whether from a library worker unwittingly sharing a patron’s address with a perpetrator of domestic violence to leaving sensitive patron data unprotected, patrons can also pay a hefty price when libraries fail to manage patron data securely and ethically. In Kristin Briney and Becky Yoose’s new guide “Managing Data for Patron Privacy: Comprehensive Strategies for Libraries,” published by ALA Editions, readers will learn concrete action steps for putting the ethical management of data into practice, following two common public and academic library cumulative case studies. The authors explore such key topics as:
succinct summaries of major U.S. laws and other regulations and standards governing patron data management;
information security practices to protect patrons and libraries from common threats;
how to navigate barriers in organizational culture when implementing data privacy measures;
sources for publicly available, customizable privacy training material for library workers;
the data life cycle from planning and collecting to disposal;
how to conduct a data inventory;
understanding the associated privacy risks of different types of library data;
why the current popular model of library assessment can become a huge privacy invasion;
addressing key topics while keeping your privacy policy clear and understandable to patrons; and
data privacy and security provisions to look for in vendor contracts.
On a more personal note, this will probably always be “the COVID book” in my mind: I got the request to submit a proposal for what would become this book the week after the USA shut down in March 2020, caught COVID halfway through writing the book, wrote half of my chapters while dealing with long COVID, and caught COVID again the week the book was published. The last two years have been an absolute roller coaster, but I could not have asked for a better partner to bring this project to light than Becky. I’m so incredibly proud of what we did together. I hope that you all find value in the book and buy a copy so that we two ex-Wisconsinites can afford to get The Good Cheese shipped to the west coast.
I love to teach people about data management and file organization, but I tend to talk mostly about file naming conventions and ISO 8601. These two strategies are incredibly helpful in keeping files organized and easy to find, but file organization also has a role.
My usual advice about file organization is to have a logical way to organize your files and to put files in the correct folders. Coupled with strong file naming conventions, having some established folder system usually works well enough and is flexible enough to account for the wide variety of data types.
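To make the file naming side concrete, here is a minimal sketch of one possible convention that leads with an ISO 8601 date. The function name, the underscore and hyphen pattern, and the example project are all my own assumptions for illustration; adapt the pieces to whatever convention suits your work.

```python
import re
from datetime import date


def make_file_name(day, project, description, extension):
    """Build a name such as 2022-03-15_project-x_survey-results.csv.

    The date_project_description pattern is only one hypothetical
    convention, not a standard.
    """
    def clean(text):
        # Lowercase the text and replace anything that is not a letter
        # or digit with a hyphen, so names stay easy to read and sort.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # date.isoformat() returns the ISO 8601 form YYYY-MM-DD.
    return f"{day.isoformat()}_{clean(project)}_{clean(description)}.{extension.lstrip('.')}"


names = [
    make_file_name(date(2022, 3, 15), "Project X", "Survey results", "csv"),
    make_file_name(date(2021, 11, 2), "Project X", "Interview notes", ".txt"),
]

# Because the ISO 8601 date comes first, sorting the names as plain text
# also sorts the files chronologically.
for name in sorted(names):
    print(name)
```

Putting the date first is a design choice: it means any file browser that sorts alphabetically will also list the files in date order.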
That said, I’ve been reading a couple data management resources recently – “Managing your Research Data and Documentation” by Kathy Berenson and “Towards a Standardized Research Folder Structure” on the Gen R Blog – that recommend a specific folder organization structure for research files and data. For example, they advocate for having a folder for each project with separate defined folders for primary data, data analysis, and data dissemination, in addition to having other folders for content such as grant administrata. Both resources outline folder templates, though the proposed structures aren’t identical.
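For anyone who wants to try a per-project template like the ones described above, a short script can create the same set of folders for every new project. This is only a sketch: the folder names below are my own illustrative choices based on the categories just mentioned, not the exact structures from Berenson or the Gen R blog, and the function name is hypothetical.

```python
import tempfile
from pathlib import Path

# Illustrative folder names only; edit this list to match your own template.
PROJECT_FOLDERS = [
    "01_primary-data",
    "02_data-analysis",
    "03_data-dissemination",
    "04_grant-administration",
]


def create_project_folders(root, project_name):
    """Create the template folders for one project and return its path."""
    project_dir = Path(root) / project_name
    for folder in PROJECT_FOLDERS:
        # exist_ok=True makes the script safe to rerun on an existing project.
        (project_dir / folder).mkdir(parents=True, exist_ok=True)
    return project_dir


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    project = create_project_folders(tmp, "2022_example-project")
    for path in sorted(project.iterdir()):
        print(path.name)
```

The numeric prefixes are optional; they simply keep the folders in workflow order when sorted alphabetically.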
The recommendation to use a specific folder structure for research data and files has me thinking about the value of such templates for research. On one hand, it’s incredibly useful to have a defined and well organized hierarchy to manage and find content. On the other hand, research is very heterogeneous by nature and no one folder structure template is guaranteed to work for all types of research.
I keep coming back to the fact that data management skills are a toolkit and you use the tools you need to make the work easier and leave the rest. I don’t think there is one right answer when it comes to folder organization, but having an established structure may be beneficial to many researchers. As I teach file organization in the future, I plan to use the recommendations from Berenson and the Gen R blog as examples for people to follow if they choose.
I’d love to hear from others: would you find a standard file structure useful in your research?
I’ve had two giant projects finish up in the last month and am already feeling their loss. The first project is the Data Doubles project, which I’ve been working on in one form or another since 2017. This team has been an amazing group to work with and I will sorely miss our fortnightly group meetings.
Part of wrapping up the Data Doubles project involved creating a pile of outputs to share our research results with the world. I will summarize this content here and I hope you check some of it out.
If you are interested in what students think about the privacy of their data held by the university and the university library, I encourage you to check out:
The Data Doubles white paper which summarizes the results from the three phases of our research.
If you would like to reproduce our research at your own institution, we created a Toolkit of our research protocols that is shared in our OSF repository. These files are available under a CC BY-NC license, with the exception of our survey, which is available under a CC BY-NC-ND license. The best place to get started with the Toolkit is with the Toolkit README file.
We also recently published the results of our survey (project phase 2) in Library Quarterly:
Asher, A., Briney, K. A., Jones, K. M. L., Regalado, M., Perry, M. R., Goben, A., Smale, M., & Salo, D. (2022). Questions of trust: A survey of student expectations and perspectives on library learning analytics. Library Quarterly, 92(2), 151-171. https://doi.org/10.1086/718605
Finally, there will be more Data Doubles publications in the future, including an article on our data management planning (we had four DMPs) that is currently under review.
Besides wrapping up the Data Doubles project, I recently finished writing my second book, Managing Data for Patron Privacy, written with Becky Yoose. The book is currently at the printer and will come out in a couple months. I will definitely write up a post about it once it’s available!
With the Data Doubles project and the book done, I’m looking forward to having a little bit of quiet before I start on any new big adventures.
I’m reviewing the two books together because they provide parallel showcases of two visualization patterns: trees and circles. In structure and design, the books are obviously related with only the content being different between the two. One book is a collection of hierarchical visualizations and the other a volume full of round visualizations.
The two books are laid out identically. There is an introductory chapter describing the importance of the tree/circle iconography throughout history, followed by sections containing a wealth of images that are grouped by the author’s tree/circle taxonomies (more on that in a moment). There is no narrative in the latter sections; rather these sections are made up of a huge array of full-color examples from hundreds of years ago through modern day, each with a citation and short description.
The author classifies trees and circles into different structural types, called taxonomies, which both divide the book into discrete sections and help the reader interpret the visualizations. In the tree book, for example, there are sections on figurative trees, horizontal trees, radial trees, and rectangular treemaps, among others, each with its own taxonomic description and wealth of examples. This taxonomic structure provides the reader with a deeper way to engage with the overall visualization pattern and reflect on when one taxonomic structure would be preferable to another.
The timescales spanned by the visualizations in these books are a big part of their appeal. Seeing a diagram from a hand-scribed manuscript next to an AI-generated image reinforces trees and circles as archetypes for structuring information, while also demonstrating the range of styles that can be present within these archetypes. The images themselves visualize all types of information and the only similarity is in the structure of the display.
Examples from The Book of Circles of wheel and pie diagrams: a wheel of moral struggle from the 13th century (left), book artwork from 2007 (top), gold-ion collision data from Brookhaven National Lab (bottom middle), and a visualization of pi from 2012 (bottom right).
There are a couple differences between the two books. The circles book is both larger in size and about 50 pages longer. The organization of images also differs between the books; the tree examples are arranged from oldest to newest within taxonomic groups, while the circle examples are grouped by substructure within a taxonomic group with little regard for age. The circles book also veers into art, architecture, and maps, while the tree examples are more traditional data visualizations (though both contain dated attempts to rationalize the world through philosophy). I think I prefer the tree book for two reasons: 1) I’m more likely to visualize hierarchical information, meaning these images are more applicable to my work; and 2) I sometimes find circular visualizations difficult to interpret even though the images are still inspiring.
I’m really happy to have both books in my library alongside my other visualization books. At list prices of $30 (trees) and $40 (circles), they’re nice to have but not critical additions to a visualization collection. If you’re not a visualization or art history nerd, I recommend seeing if your local library has copies if you are looking for visualization inspiration or just some interesting imagery.
Overall, these books balance art, history, and data visualization in beautiful packages. They will not teach you how to visualize nor provide you with examples of the “best” visualizations. Rather, they provide deep views into two visualization families – trees and circles – and inspire you to think deeply about their history and use.
Year 3 of this pandemic is quickly approaching and one might think we’d be getting used to being in these “unprecedented times.” And yet the last several months have been extra challenging for me, particularly as a parent of small children (one of whom cannot be vaccinated yet). So this blog has been silent as my focus has been simply to get through the weeks with everyone being healthy and safe.
The good news is that I have a bunch of new stuff to talk about in 2022, including my second book which will be published this summer! I’ll write about everything in future posts, but for now I want to circle back to my handwoven COVID visualization from last year.
In January 2022 I wrote up a post for the Data Visualization Society’s blog, the Nightingale, that goes beyond the mechanics of the visualization to discuss how central my emotions and my anxiety were to creating my 2020 COVID visualization. With a little distance between finishing the visualization and now, it became clear to me that having an outlet for my pandemic-induced feelings was a critical, if as yet untold, part of the visualization. I’m glad to finally be able to put into words what were originally only subconscious thoughts.
As a result of my post with the Nightingale, I was invited to participate in the COVID Calls podcast, which I’m sharing here:
In addition to discussing the visualization, I also share some of my thoughts as a science librarian and show off a couple hexagons from the 2021 edition of the visualization. Expect the 2021 visualization to appear on the blog later this year once I finally finish it.
That’s what I have for now: I’m still here and will be back with more exciting content soon.