I’ve been reviewing data management books recently and picked up a copy of “Ecological Data: Design, Management and Processing,” edited by William K. Michener and James W. Brunt. Although the book was published in 2000 and so references some outdated technology, it is a little gem.
One section in the second chapter of the book, written by Brunt, really stuck with me: his data management philosophy. The philosophy rests on two principles:
1) Start small, keep it simple, and be flexible
2) Involve scientists in the data management process
I use the first principle extensively in teaching data management, even though I’ve never named it explicitly. For example, at the end of most teaching sessions, I remind my audience to start slow by adopting one data practice at a time. Once that practice is routine, researchers can add a second, and so on.
As for flexibility, this is what makes teaching data management both a joy and a pain. The answer to so many of the questions researchers ask me about managing their data is “it depends.” It depends on their workflows and on what works best for them. I can teach how to create a file naming convention, but the best file naming convention depends on the files and on how one searches for them. Even then, people sometimes have to bulk rename files to make them easier to organize and find. Data management must be flexible because research is so heterogeneous.
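As one hypothetical illustration of that bulk renaming, here is a minimal Python sketch. It assumes a convention invented just for this example (ISO date, a short project name, then a lowercase hyphenated description, such as 2000-01-15_fieldsite_plot-3-counts.csv), and the function names are likewise made up. It builds a rename plan first, so you can review the proposed names before anything on disk changes.

```python
import re
from datetime import date
from pathlib import Path


def convention_name(original: str, project: str, when: date) -> str:
    """Return a name following a HYPOTHETICAL convention:
    YYYY-MM-DD_project_description.ext (lowercase, hyphens between words)."""
    p = Path(original)
    # Collapse spaces, underscores, and other punctuation into single hyphens
    description = re.sub(r"[^a-z0-9]+", "-", p.stem.lower()).strip("-")
    return f"{when.isoformat()}_{project}_{description}{p.suffix.lower()}"


def plan_renames(folder: Path, project: str, when: date) -> dict[Path, Path]:
    """Map each file in `folder` to its proposed new path; nothing is renamed yet."""
    return {
        path: path.with_name(convention_name(path.name, project, when))
        for path in sorted(folder.iterdir())
        if path.is_file()
    }


def apply_renames(plan: dict[Path, Path]) -> None:
    """Rename files from a reviewed plan, refusing to overwrite anything."""
    if len(set(plan.values())) != len(plan):
        raise ValueError("two files would receive the same new name")
    for old, new in plan.items():
        if new != old and new.exists():
            raise FileExistsError(f"{new} already exists")
        old.rename(new)


if __name__ == "__main__":
    # Dry run on the current folder: print the plan for review before applying it.
    # "fieldsite" and today's date are placeholder values for this sketch.
    for old, new in plan_renames(Path("."), "fieldsite", date.today()).items():
        print(f"{old.name} -> {new.name}")
```

For example, `convention_name("Plot 3 Counts.CSV", "fieldsite", date(2000, 1, 15))` returns `"2000-01-15_fieldsite_plot-3-counts.csv"`. Separating the plan from the renaming keeps the change small and reversible in spirit: try the convention on one folder, check the output, and only then call `apply_renames`.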
The second of Brunt’s principles reinforces the need for data management to be context dependent. A small group of researchers is going to know best how to organize and manage their own files. Similarly, scientific subfields have developed norms for metadata and data sharing, building systems that work best for their researchers.
Some of my favorite data management outcomes have come from consultations where I provide the structure, the researcher provides the context, and together we come up with something that works really well. Scientists don’t have to know everything about data management, but data management really shines when scientists are involved in the data decisions.
I think Brunt’s data management philosophy lays a pretty good foundation. It aligns with the way I teach and consult on data management and will be a useful framework going forward.
If I had to come up with my own data management philosophy, I might borrow an adage from camping: leave the campsite better than you found it. For camping, this means not only packing out whatever you packed in but also finding ways to improve your surroundings so that signs of human presence are less noticeable. For data management, I interpret the adage to mean that you should keep making improvements, no matter how small. So, while it can be nice to have large structures to protect us from the elements, the small things (like keeping everything tidy and clean) really do have a big impact. It’s not a perfect metaphor, but it captures the way I think about many incremental data management strategies.
I know that these are not the only data management philosophies out there, but they offer some interesting insight into ways to engage with data management. Do you have a data management philosophy?