Data Ab Initio

Foundational Practices of Research Data Management

by Kristin Briney Posted on 2020-08-11

If you’re a regular reader of my blog, you’ll know that one of my goals is for all researchers to adopt the basic data management practices that make conducting research easier. I’ve written a whole book on data management, done videos, created checklists, written numerous blog posts, etc., but it will never be enough until researchers are regularly taught these skills. Until that point, I’ll keep sending the gospel of data out in the world in different formats, hoping to reach new audiences.

My latest iteration of educating about the principles of data management is in the form of a research article in RIO. I really like the article format because it’s just enough space to provide a broad overview of the basic data management practices. And if readers want to learn more, we’ve provided a handy list of citations!

The new article covers 10 practices of data management that my coauthors and I consider to be foundational:

Practice 1: Keep sufficient documentation
Practice 2: Organize files and name them consistently
Practice 3: Version the files
Practice 4: Create a security plan, when applicable
Practice 5: Define roles and responsibilities
Practice 6: Back up the data
Practice 7: Identify tool constraints
Practice 8: Close out the project
Practice 9: Put the data in a repository
Practice 10: Write these conventions down [in a data management plan]

This is by no means the complete scope of data management but rather a good introduction. Honestly, if you implement all ten practices into your research, you’re going to be doing very well with your data.

So if you or a peer are looking for a general introduction to research data management, check out my new article “Foundational Practices of Research Data Management.”

Citation: Briney KA, Coates H, Goben A (2020) Foundational Practices of Research Data Management. Research Ideas and Outcomes 6: e56508. https://doi.org/10.3897/rio.6.e56508

Posted in dataManagement

Book Review: How Charts Lie

by Kristin Briney Posted on 2020-07-07

Continuing in my pandemic reading of data books, next up is “How Charts Lie: Getting Smarter about Visual Information” by Alberto Cairo. (I didn’t plan to be a predominately book review blog, but I need a way to channel the pandemic anxiety, so here we are.)

This book is a little different than other visualization books I’ve been reading because it focuses on visual literacy (which Cairo calls “graphicacy”) instead of chart design. Because charts appear by their nature more authoritative (they show “facts” and make such information easy to understand), we need to train ourselves to critically assess the information displayed. This book provides the framework for an individual to engage with and dissect the charts we regularly see in the news and on social media and decide what’s accurate.

Cairo uses his experience as a chart designer and chart consumer to break down the major ways that charts lie. Each type of lie gets covered in its own chapter in the book:

Poor design
Displaying dubious data
Displaying insufficient data
Concealing or confusing uncertainty
Suggesting misleading patterns

You’ll notice that these mistakes aren’t all about chart design; many chart issues concern the data that’s being visualized, including everything from displaying percentages instead of absolute numbers on a map to vetting data sources. Cairo provides ways to think through the many mistakes that are made in data selection, because even the prettiest and easiest-to-read chart can lie to us by getting the data wrong.

What’s nice about the book is that it doesn’t assume that charts are intentionally lying to us. Sometimes designers make honest mistakes and sometimes trade-offs have to be made. Cairo walks the reader through exemplar visualizations and shows us how different choices affect the accuracy and design of the chart. By discussing the data selection and visualization decision process as well as showing how these choices affect the final design, Cairo provides the reader with the mental scaffolding to critically assess charts.

As with any data book, Cairo uses plenty of examples throughout this book. What I found interesting is how many of these examples were drawn from recent politics; the book actually starts by dissecting a graphic that Donald Trump shared in April 2017. While I appreciate the American cultural touchstones (and it’s nice to rage at some of the bad charts we’ve seen in recent years), I do worry that this book will lose some of its relevance over time.

Overall, this is a good book for any information consumer to read and will also help visualization designers learn to avoid pitfalls and assess design trade-offs. I would also recommend it to my fellow librarians who do information literacy instruction; the visual literacy discussed in this book is a perfect complement to the work we’re already doing with students around assessing text-based resources.

Posted in bookReview, dataVisualization

File Naming Convention Worksheet

by Kristin Briney Posted on 2020-06-09

I’ve been working on a lot of data management resources at work recently. At my last position, I was really focused on 3-5 minute videos but I’m currently taken with the concept of 1-2 page data management handouts. I described the first new resource — the project close-out checklist for research data — in a recent post and I’ll also link here to my new DMP resources: an updated DMP checklist, some DMP standard language for my university, and an example DMP.

This post, however, is dedicated to a new worksheet that I’m incredibly excited to share: the file naming convention worksheet. File naming conventions are, hands down, my favorite data management strategy to teach. They are just so simple and so useful and save you so much time later when you try to find files. They’re also vital to success on team projects with shared data.

While I’ve already written about file naming conventions, and blogged about them, and done a video about them, I really like the worksheet format for walking you through all of the steps required to create your own convention. There’s something about the enumerated steps and having a physical takeaway that I hope will be really helpful to people.
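To make the idea concrete, here is a minimal Python sketch of what applying a convention can look like once you have written one down. The pattern itself (date, project, description, two-digit version) and the example names such as "SoilStudy" are assumptions made for illustration, not the convention the worksheet prescribes; your own convention will depend on your project.

```python
import re
from datetime import date

# Hypothetical convention, chosen only for this example:
#   YYYYMMDD_Project_Description_vNN.ext
PATTERN = re.compile(
    r"^(?P<date>\d{8})_"
    r"(?P<project>[A-Za-z0-9]+)_"
    r"(?P<description>[A-Za-z0-9-]+)_"
    r"v(?P<version>\d{2})\."
    r"(?P<ext>[a-z0-9]+)$"
)


def build_name(day, project, description, version, ext):
    """Assemble a file name that follows the example convention."""
    return f"{day:%Y%m%d}_{project}_{description}_v{version:02d}.{ext}"


def follows_convention(name):
    """Return True if the file name matches the example convention."""
    return PATTERN.match(name) is not None


name = build_name(date(2020, 6, 9), "SoilStudy", "pH-readings", 1, "csv")
print(name)                                      # 20200609_SoilStudy_pH-readings_v01.csv
print(follows_convention(name))                  # True
print(follows_convention("final data (2).csv"))  # False
```

A check like this is most useful on team projects, where it can flag files that drift from the agreed pattern before they pile up.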

As with the close-out checklist, I’ve made the file naming convention worksheet available as a branded pdf and as a generic, editable Microsoft Word file under a CC BY license. Please do share any feedback you have on this document as I’d love to improve it over time to make it really usable for people.

Posted in dataManagement, metadata

Project Close Out Checklist for Research Data

by Kristin Briney Posted on 2020-05-26

Researchers tend to think about data management at key times during a project, such as when writing a data management plan for grant funding and when preparing for data collection. But there’s one other critical time for data management in the project lifecycle: when a project ends and/or a researcher leaves the project.

I’ve actually blogged about project close out twice before (here and here) because it’s an area where I’ve had my own successes and failures. I’ve lost data in projects where I didn’t do data close out and have saved myself several large headaches on projects where I did close out. But here’s the important thing: project close out isn’t actually that difficult; it’s just that there is hardly any guidance on how to do it.

Enter the “Project Close Out Checklist for Research Data”! Born out of a discussion with Jonathan Petters and Abigail Goben at the RDAP Summit in 2020, this checklist describes a range of activities for helping ensure that research data are properly managed at the end of a project. Activities include: making stewardship decisions, preparing files for archiving, sharing data, and setting aside important files in a “FINAL” folder.
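For the last of those activities, setting aside important files in a “FINAL” folder, a small script can do the copying. The following is a minimal Python sketch under stated assumptions: the function name `set_aside_final`, the `MANIFEST.txt` file, and the SHA-256 checksums are all additions made for illustration and are not steps taken from the checklist.

```python
import hashlib
import shutil
from pathlib import Path


def set_aside_final(project_dir, important_files, folder_name="FINAL"):
    """Copy selected files into a FINAL folder and record their checksums.

    `important_files` holds paths relative to `project_dir`. The checksum
    manifest is an assumption of this sketch, not part of the checklist.
    """
    project_dir = Path(project_dir)
    final_dir = project_dir / folder_name
    final_dir.mkdir(exist_ok=True)

    manifest_lines = []
    for relative in important_files:
        source = project_dir / relative
        target = final_dir / source.name
        shutil.copy2(source, target)  # copy2 also keeps file timestamps
        digest = hashlib.sha256(target.read_bytes()).hexdigest()
        manifest_lines.append(f"{digest}  {target.name}")

    # One "checksum  filename" line per copied file.
    (final_dir / "MANIFEST.txt").write_text("\n".join(manifest_lines) + "\n")
    return final_dir
```

Calling something like `set_aside_final("my_project", ["raw/data.csv", "analysis/results.xlsx"])` (hypothetical paths) would leave copies of both files plus the manifest in `my_project/FINAL`. Note that this sketch flattens the folder structure, so two files with the same name would overwrite each other.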

Two versions of the checklist are available: a Caltech Library branded version and a generic editable version. I’m sharing the checklist under a CC BY license, so please reuse and remix with attribution.

My hope is that this checklist will help researchers be able to use their data well into the future!

Posted in dataManagement

Recent Publications

by Kristin Briney Posted on 2020-05-20

It’s always nice to have new publications to put up on the blog, especially when they’re all things I’ve been working on for at least a year. If you’re interested in privacy and data and libraries, I hope you check them out!

Briney, Kristin, Becky Yoose, John Mark Ockerbloom, and Shea Swauger. “A Practical Guide to Performing a Library User Data Risk Assessment in Library-Built Systems.” Digital Library Federation, May 2020. http://doi.org/10.17605/OSF.IO/V2C3M.

Libraries collect data about the people they serve every day. While some data collection is necessary to provide services, responsible data management is essential to protect the privacy of our users and uphold our professional values. One of the ways to ensure responsible data management is to perform a Data Risk Assessment. A Data Risk Assessment is a process of identifying data the library collects about users, understanding how it manages that data, identifying the risks associated with that data, and then selecting an appropriate risk mitigation strategy.

Jones, K. M. L., Asher, A., Goben, A., Perry, M. R., Salo, D., Briney, K. A., & Robertshaw, M. B. (forthcoming). “We’re being tracked at all times”: Student perspectives of their privacy in relation to learning analytics in higher education. Journal of the Association for Information Science and Technology. https://doi.org/10.1002/asi.24358

Higher education institutions are continuing to develop their capacity for learning analytics (LA), which is a sociotechnical data‐mining and analytic practice. Institutions rarely inform their students about LA practices, and there exist significant privacy concerns. Without a clear student voice in the design of LA, institutions put themselves in an ethical gray area. To help fill this gap in practice and add to the growing literature on students’ privacy perspectives, this study reports findings from over 100 interviews with undergraduate students at eight U.S. higher education institutions. Findings demonstrate that students lacked awareness of educational data‐mining and analytic practices, as well as the data on which they rely. Students see potential in LA, but they presented nuanced arguments about when and with whom data should be shared; they also expressed why informed consent was valuable and necessary. The study uncovered perspectives on institutional trust that were heretofore unknown, as well as what actions might violate that trust. Institutions must balance their desire to implement LA with their obligation to educate students about their analytic practices and treat them as partners in the design of analytic strategies reliant on student data in order to protect their intellectual privacy.

Jones, K. M. L., Briney, K. A., Goben, A., Salo, D., Asher, A., & Perry, M. R. (2020). A comprehensive primer to library learning analytics practices, initiatives, and privacy issues. College & Research Libraries, 81(3), 570–591. https://doi.org/10.5860/crl.81.3.570

Universities are pursuing learning analytics practices to improve returns from their investments, develop behavioral and academic interventions to improve student success, and address political and financial pressures. Academic libraries are additionally undertaking learning analytics to demonstrate value to stakeholders, assess learning gains from instruction, and analyze student-library usage, et cetera. The adoption of these techniques leads to many professional ethics issues and practical concerns related to privacy. In this narrative literature review, we provide a foundational background in the field of learning analytics, library adoption of these practices, and identify ethical and practical privacy issues.

Posted in admin, libraries, privacy

Book Review: Invisible Women

by Kristin Briney Posted on 2020-05-12

Invisible Women: Data Bias in a World Designed for Men is one of those books that I’m going to shove toward my unsuspecting friends and say “read this!” because it’s so good. It’s validating for women, eye-opening for men, and a vital read for everyone, particularly creators and consumers of data.

Invisible Women takes a critical look at the world where male (often white male) is the default and the harm this does to the female half of the population. It’s a data book because its entire thesis centers on data (here gender-disaggregated data), but it’s not a data book where you’ll find data actually analyzed. Rather, author Caroline Criado Perez argues that everything from economic policy to the design of cell phones is biased against women because data is either: not collected on women, not disaggregated by sex, or ignored when female-specific data actually exists. She collectively labels these problems as “the gender data gap.”

This book is incredibly well researched across a huge range of topics. Criado Perez covers everything from transit systems to unpaid labor to car safety design and cites experts, studies, and data (when it actually exists). The chapter on the gender data gap in medicine is particularly staggering. Taken together, her detailed research paints a stark picture of how broadly women are excluded from decision making at all levels.

What I particularly appreciate about Invisible Women is that Criado Perez moves beyond the litany of depressing facts and shows how better data collection and analysis can actually improve women’s lives. Then she cites real world examples of this occurring. By modeling how the experience should be instead of solely focusing on how depressing the situation currently is, Criado Perez demonstrates that designing with women’s needs in mind is both feasible and broadly beneficial to society.

The one deficiency of the book is that I consistently found myself wanting a broader acknowledgement that the gender data gap is compounded by race, disability, etc. Criado Perez cites a couple of examples (U.S. maternal mortality in black women and the exclusion of U.S. black women in Hurricane Katrina recovery efforts) but overall falls short in this area. I understand that the book’s focus was on women and that this data gap is likely easiest to identify, since women represent a full half of the human population. That said, the strength of the data-gap argument is missing something essential when we fail to acknowledge that other data gaps exist and intersect with the gender data gap.

Overall, this book is fantastic and a necessary read for those who do any work in the data sphere. For those who aren’t data nerds, Criado Perez’s endless stream of facts is lightened by data success stories and a witty writing style, making this book accessible and enjoyable. I personally enjoyed the audiobook, which is read by the author herself. No matter the format, Invisible Women delivers critical facts on an important topic and is a highly recommended read.

Posted in bookReview
