Data Ab Initio

A Data Management Philosophy

by Kristin Briney Posted on 2024-03-19

I’ve been reviewing data management books recently and picked up a copy of “Ecological Data: Design, Management and Processing,” edited by William K. Michener and James W. Brunt. Despite this book being published in 2000, meaning it references outdated technology, this little book is a gem.

One section in the second chapter of the book, written by Blunt, really stuck with me: his data management philosophy. The philosophy relies on an adherence to two principles:

1) Start small, keep it simple, and be flexible
2) Involve scientists in the data management process

The first principle is one I use extensively in teaching data management, even though I’ve never named it explicitly. For example, at the end of most teaching sessions, I remind my audience to start slow by incorporating one data practice at a time. After that practice is routine, researchers can add a second practice until that is routine, etc.

As for flexibility, this is what makes it a joy and a pain to teach data management. The answer to so many questions I get from researchers about how to manage data is “it depends”. It depends on their workflows and what works best for them. I can teach how to create a file naming convention but the best file naming convention depends on the files and how one searches for them. And even then, sometimes people have to bulk rename files to make them easier to organize and find. Data management must be flexible because research is so heterogenous.

The second of Blunt’s principles really reinforces the need for data management to be context dependent. A small group of researchers is going to know best how to organize and manage their files. Similarly, scientific subfields have developed norms for metadata and data sharing, building systems that work best for those researchers.

Some of my favorite data management outcomes have come from consultations where I provide the structure, the researcher provides the context, and we jointly come up with something that works really well. Scientists don’t have to know all about data management, but data management really shines when scientists are involved in the data decisions.

I think Blunt’s data management philosophy lays a pretty good foundation. It aligns with the way I teach and consult on data management and will be a useful framework for going forward.

If I had to come up with my own data management philosophy, I might borrow an adage from camping: leave the campsite better than you found it. For camping, this means to not only pack out whatever you packed in, but to also find ways to improve your surroundings so that the effect of humans being present is less notable. For data management, I interpret the adage to mean that you should keep making improvements, no matter how small. So, while it can be nice to have large structures to protect us from the elements, the small things (like keeping everything tidy and clean) really do have a big impact. It’s not a perfect metaphor but it still encapsulates the way I think about a lot of incremental data management strategies.

I know that these are not the only data management philosophies out there, but they do provide an interesting insight into some ways to engage with data management. Do you have a data management philosophy?

Posted in dataManagement | Leave a comment

DataWorks! Prize

by Kristin Briney Posted on 2024-02-01

I’m extremely honored that Caltech Library won a 2023 DataWorks! prize from FASEB/NIH. These awards recognize efforts that promote data sharing and reuse in biological and biomedical research.

The Library won our award for my file naming convention worksheet, which first appeared on this blog in 2020. I’m amazed by the large scale projects that the other winners have put together but I’m also glad to see a small thing, like consistent and descriptive file naming, being recognized as important for data reuse.

File naming conventions are one of my favorite things to teach to new researchers because they are free to use and can have a big impact on the ability to find and reuse data. A good file naming system lets you easily find a specific file, disambiguate related files, and tell what’s in a file at a glance. Consistent and descriptive naming is especially helpful for collaborative research.

I can wax poetic about file naming for days, but I would rather you check out the worksheet and create a useful file naming convention for yourself. Trust me, your future self will appreciate it.

Posted in admin, metadata | Leave a comment

Just Published Commentary on Data Management and Research Misconduct

by Kristin Briney Posted on 2024-01-03

I really appreciate the blog Retraction Watch. I used this source heavily in writing my first book, Data Management for Researchers, and regularly cite stories from the blog in my teaching. It’s a fact of science that errors occur, and Retraction Watch makes those errors – both accidental and intentional – transparent.

The transparency brought about by Retraction Watch is part of a larger movement (see efforts such as the Center for Open Science and PubPeer) to stop scientific errors and research misconduct from occurring. It can be difficult to expose and fix such problems, but this is all part of the self-correction process that is fundamental to scientific research.

And here’s where data management comes in: good data management also prevents scientific errors and can curtail misconduct investigations. This is because managing data well results in a clear accounting of what was done to the data, in addition to well-organized and available data files. So when someone has a question about your research, it’s easy to put your hands on the relevant data and documentation to prove exactly what was done.

I had the honor of co-teaching a workshop about the relationship between data management and research misconduct at last year’s RDAP Summit with Heather Coates and Abigail Goben. And the ideas behind that workshop were recently published in the special RDAP issue of the Journal of eScience Librarianship as the commentary, “What if It Didn’t Happen: Data Management and Avoiding Research Misconduct“.

I’m not going to repeat the arguments of the commentary here in this blog post, but I will say that there are a lot of useful case studies in this area and there’s definitely potential for more work to be done on this topic. So I encourage you to jump over and read the commentary, and start thinking about the ways that data management can prevent research misconduct.

Citation: Coates, Heather, Abigail Goben, and Kristin Briney. 2023. “What if It Didn’t Happen: Data Management and Avoiding Research Misconduct.” Journal of eScience Librarianship 12(3): e746. https://doi.org/10.7191/jeslib.746.

Edited to add: the commentary was featured in the weekly round up on Retraction Watch on 2024-01-06!

Posted in dataManagement, researchMisconduct | Leave a comment

Living Data Management Plans

by Kristin Briney Posted on 2023-12-05

It’s well past time we discuss living data management plans (living DMPs). Somehow, I’ve been running this blog for over 10 years and I don’t have a post specifically discussing this important document type. I obviously need to fix this right now.

You’re probably wondering what a living DMP is and how it differs from a more “traditional”, grant-related data management plan. Honestly, the two-page document you turn in for a grant application is important but it’s often treated a box you have to check to make sure your grant submission is complete. A living DMP, on the other hand, is an evolving document that actually helps you manage your data during a project.

A living DMP describes how you will organize, name, store, and handle your data during a research project. While this is helpful for single-researcher projects, it’s invaluable for research done by a group. The living DMP makes sure that everyone is in agreement about how and where data will be stored and used. When someone needs to know where a find a specific dataset collected by someone else on the project, the living DMP should be the map for finding the file.

What makes this DMP “living” is that it should be updated whenever data handling practices change. A living DMP should accurately reflect the current data practices in the research project and should be added to when new procedures are developed.

The idea of a “living DMP” has been around for a while (I’m not sure who first came up with the term but I would love to give them credit for it) and it’s a document type that I’ve used several times. I made a living DMP for when generating files for my first book. More notably, I created three living DMPs for the Data Doubles project, one for each of the research phases of the project; we actually wrote up an article about the process of creating these DMPs and made the DMPs themselves publicly available.

So how do you create a living DMP and what should you put into it? To get started, see the Write a Living Data Management Plan (DMP) exercise in The Research Data Management Workbook. After that, add any data handling information you think is beneficial to record for later.

DMPs don’t have to just be boring documents for grant compliance. They can be helpful maps for decoding data practices when used as living documents. I hope I’ve convinced you to give this type of DMP a try in your next research project.

Posted in dataManagementPlans | Leave a comment

Five Ways to Nudge Labmates Toward Better Data Management

by Kristin Briney Posted on 2023-11-15

This post is aimed at the graduate students, post docs, and research scientists who have established good data management practices in their own work and now want to make a positive impact on their peer’s data management.

It can be challenging to talk to others about data management when you don’t have the authority to direct everyone in your research group to manage their data in a specific way. The person with the most authority to make a group implement good data practices is the head of the lab, but lab members also have the ability to impact their peers, though they have to be more considerate about it.

Here I’m suggesting a few gentle ways to introduce good data habits to labmates or start a conversation about data management without requiring someone else to implement a specific data practice:

1. Pick a data management paper for group meeting discussion.

Many research labs have group meetings where they discuss relevant publications in their field. While these articles often center around the group’s research field, paper discussions present an opportunity to introduce other research-related topics, like data management, to a larger group. Consider picking an article, such as my “Foundational Practices of Research Data Management” paper, at a future group meeting to introduce your peers to the overall topic.

2. Use a file naming convention when you share data with others.

File naming conventions are one of my favorite data management strategies to teach about and use. They also represent a great teach-by-example moment for when you have to share files with others. Sharing well-named files – plus the guide for interpreting these file names – provides a natural opportunity to demonstrate the benefits of good file naming and start a discussion about this data management practice.

3. Initiate a discussion about how to organize data on the shared lab server.

If a lab has a shared server, there’s a good chance that it’s a chaotic jumble of files (unless there’s a lab manager keeping everything organized). That’s not to say that you need to jump in to organize everyone’s files. Rather, initiate a conversation with lab members to develop a shared set of rules on how files should be organized and stored on the shared server. To be extra helpful, you can write these rules down in a README file and save it in the shared drive as documentation for future labmates.

4. Talk to the head of your lab about data permissions.

Ask the head of your lab if you are allowed to take a copy of data with you when you leave the laboratory and if you can publish with that data and under what conditions; for a full list of questions to ask, see the Determine Data Stewardship exercise from my Research Data Management Workbook. While this action doesn’t directly impact your peers, having a conversation with the head of the lab about data permissions makes the lab head aware that this is an issue for all lab members that should be addressed.

5. Ask a peer to review your research notes for clarity.

After undergrad, it’s not very common to have someone review our research notes. Yet getting peer feedback on notes can help ensure that we’re documenting our research at a level sufficient for reproducibility by someone with similar training. By taking this step, not only are you starting a conversation about good notetaking with a peer, you can also benefit from their feedback on your notes!

There are more ways to introduce peers to the topic of data management beyond these five ideas, but hopefully I’ve given you a starting point to consider the power that each of us has to help those around us with data management, even when we lack the authority to require good data practices.

Posted in dataManagement | Leave a comment

The Research Data Management Workbook

by Kristin Briney Posted on 2023-10-02

I am beyond thrilled to share with you my third book, The Research Data Management Workbook. This book is free and openly licensed (CC BY-NC), available both online and as a PDF or EPUB download. Here’s the citation:

Briney, K. (2023). The Research Data Management Workbook. Caltech Library. https://doi.org/10.7907/z6czh-7zx60

If you’ve been following the blog, you know that I’ve written extensively about practical data management. One of the challenging parts of data management education, however, is helping researchers bridge the gap between data management principles and implementing routine, customized data management strategies in their research practices. Part of this challenge has to do with the fact that data management, when done well, gets adapted to local context; for example, there is not one correct way to name a data file, rather there is a best way for you to name your data files.

The Research Data Management Workbook was designed to help researchers with implementing data management practices by providing them with structured, reproducible exercises for discrete data management tasks. For example, the Workbook exercise to improve notetaking centers on evaluating a previous laboratory notebook entry from 6-12 months ago. In the exercise, you: read through the old notebook entry; summarize what the entry was about; identify what information might be missing; evaluate if you could reproduce that entry’s research; highlight good and bad things about your notetaking; and then determining what improvements you should make to your notetaking going forward. It’s nice to know notetaking best practices to do the exercise, but the real focus is on key questions about your research practices and guiding you toward improvements that you specifically need to make.

The Research Data Management Workbook contains 15 exercises from across the data lifecycle, over half of which are completely new to the Workbook (most of the exercises from previously existing materials have been heavily edited and improved, though my favorite exercise on file naming conventions is in the Workbook and will look familiar). The majority of exercises are worksheets, designed for you to fill in answers to targeted questions, with a few checklist exercises and a pair of procedures.

The Workbook was published by Caltech Library and is free to download, use, and adapt, so long as you cite the original source and are not selling the material (the Workbook is under a Creative Commons Attribution Non Commercial 4.0 International license). I anticipate updating the Workbook over time and adding new exercises, so I would love to hear feedback – good or bad – to make the next editions even better.

I hope that you enjoy The Research Data Management Workbook and find it useful!

Posted in admin, dataManagement, dataManagementPlans, dataStorage, documentation, labNotebooks | Leave a comment

Managing research data right, from the start

A Data Management Philosophy

DataWorks! Prize

Just Published Commentary on Data Management and Research Misconduct

Living Data Management Plans

Five Ways to Nudge Labmates Toward Better Data Management

The Research Data Management Workbook

Search

Recent Posts

Archives

Categories

Meta