Every once in a while, I run across a topic and think, “surely I’ve covered this one on my blog?” The most recent of these topics, which came up when I wrote December’s Exit Strategy post, concerns file formats.
File formats and exit strategies go hand-in-hand, as you don’t want to be stuck with inaccessible data in an unreadable file format. Proprietary, out-of-date, little-used formats lower the chance that you’ll be able to access your data when you need it. Ask anyone with 20-year-old files if they can use those files and you’ll likely see why file formats matter.
When it comes to choosing file formats that last, file types actually exist on a spectrum. The best formats are open, well-documented, and in wide use. Clear examples are .TXT instead of .DOCX, or .CSV instead of .XLSX or .SAS. In the middle of the spectrum we find something like .PDF, which is an Adobe file format but in such wide use that it will be usable for many years. Also note that the spectrum of preferred file formats shifts over time (hello Lotus Notes and WordPerfect!).
Since there is no one right answer for any data type, the key thing for picking a good file format is to ask yourself if your content is currently in a file format that is uncommon or can only be opened by a specific software program. If the answer is yes, now is the time to make a backup copy of that data in a more open format. Even if you lose some formatting in the process, it’s better to have some data in an open format than having no data because it’s locked in an unreadable format. By making a copy, you also don’t have to lose the performance of the original file format while gaining the sustainability of the new; you can, of course, wholly switch to a better format if that is feasible. It’s also worth reviewing old data for formats that are no longer popular or supported.
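To make the backup-copy idea concrete, here is a minimal sketch of what “a copy in an open format” can look like in practice: saving tabular data as plain .CSV using nothing but Python’s standard library. The data and file name here are hypothetical placeholders, not from any particular project.

```python
import csv

# Hypothetical sample data standing in for whatever is locked in a
# proprietary spreadsheet or stats-package file.
rows = [
    ["sample_id", "temperature_c", "reading"],
    ["A1", "21.5", "0.134"],
    ["A2", "21.7", "0.141"],
]

# Write a backup copy in .CSV, an open, well-documented, widely used format.
with open("backup_copy.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Reading it back needs nothing beyond the standard library, so the data
# stays accessible even if the original software disappears.
with open("backup_copy.csv", newline="") as f:
    recovered = list(csv.reader(f))
```

You lose the formulas and formatting of the original spreadsheet, but the values themselves survive in a form that almost anything can open.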
Good data management is the sum of a number of small practices, and picking good file formats is a piece of this puzzle. The more you are aware of how closed your current file formats are, the better you can plan for making that data usable into the future.
One phrase that’s bound to come up at every data management conference is “carrot versus stick” vis-a-vis incentivizing researchers to manage their data better. Carrots are rewards for good practices and sticks are requirements and their consequences relating to data management. There is inevitably discussion over which method is more effective for implementing data management.
Another phrase that I often hear in similar settings is “eating our own dog food” or “drinking our own champagne”. This is another way of saying “practice what you preach”, in that data experts should apply their advice to their own files.
These phrases are used so often that I’ve decided that they need to be combined as “eating our own carrot sticks”. It’s at least more appetizing than some of the other “eating our own…” options and a bit of snarkiness provides relief from predictability.
But to say something serious in this blog post, all of these phrases emphasize the importance of *doing* data management. It’s not enough to have the knowledge or to be given the incentive. It is only in the act of actually managing the data that we get value.
So I challenge you, whether you are a data management novice or an expert, to find one new data management practice to implement this month. Because a carrot stick a day keeps the data disaster away*.
* Okay, now I’m taking it too far, I know. I can’t help myself.
There was a long discussion on Twitter yesterday (okay, I went on a rant) about the vast number of data management books that have been published for librarians in the past few years. While not exclusively data management books for librarians, here is the long list of data management books that I’m aware of:
- Big data, little data, no data : scholarship in the networked world by Christine L Borgman
- Curating research data, volume one: practical strategies for your digital repository by Lisa Johnston [Open Access]
- Curating research data, volume two: a handbook of current practice by Lisa Johnston [Open Access]
- Data information literacy : librarians, data, and the education of a new generation of researchers by Jake Carlson & Lisa Johnston
- Data management : a practical guide for librarians by Margaret Henderson
- Data management for libraries : a LITA guide by Laura Krier & Carly A Strasser
- Data management for researchers : organize, maintain and share your data for research success by Kristin Briney
- Databrarianship : the academic data librarian in theory and practice by Lynda M Kellam & Kristi Thompson
- Delivering research data management services : fundamentals of good practice by Graham Pryor, Sarah Jones, & Angus Whyte
- Digital curation by Ross Harvey
- Exploring research data management by Andrew Cox
- Managing and sharing research data : a guide to good practice by Louise Corti, Veerle Van den Eynden, Libby Bishop, & Matthew Woollard
- Managing research data by Graham Pryor
- Research data management : practical strategies for information professionals by Joyce M Ray
- Scholarship in the digital age : information, infrastructure, and the Internet by Christine L Borgman
- The data librarian’s handbook by Robin Rice & John Southall
- The Medical Library Association guide to data management for librarians by Lisa Federer
We do not need any more “here’s how to build data services at a large research institution in a western country” books, thank you. I would happily buy books about data services at smaller institutions, in non-western countries, for data service support beyond PhDs and faculty, for building on data information literacy principles, and about how to manage data when you’re a researcher (I’m happy to have competition for my own book!).
Please feel free to point people to this post when someone suggests writing/publishing another “building data services for librarians” book.
It is the 5-year anniversary of me starting this blog! I can’t believe that it’s already been 5 years. How did that happen?! I put up my very first post on 2013-02-20:
The Blog I Wish I Had
A lot has changed between then and now – I finished my MLIS, started my current job, published a book, and pursued some pretty interesting data-related research projects – but this blog has continued to be a wonderful project for me.
To celebrate, I’m giving away a softback copy of my book, Data Management for Researchers. I’ll even sign it for you!
Details: Leave a comment on this post by 2018-02-28 describing your worst data disaster; the worse the data disaster, the more likely I’ll feel you need my book. A winner will be chosen on 2018-03-01. United States only.
Twice in the month of January, I had to find files from an old project. With resignation, I delved into old folders only to find that, wow, there’s a “FinalDocuments” subfolder with everything I need all laid out for me in a well documented way. Both of these times I was so, so thankful that my past self had the forethought to organize things for the future.
Looking back through the blog archives, I realize that I wrote a “Wrapping Up a Project” post 4 years ago! (Related: how the heck is this blog 5 years old?!) I still stand behind the advice I gave in that post that researchers should: back up their notes, convert file formats, use README.txt files, and keep everything together. These are generally useful strategies that make it likely you’ll still have everything in 5-10 years. However, most often when you go back to your old files you are looking for something specific.
This is why, in this post, I’m recommending that you add a step during project wrap up to select key information to copy to a “FINAL” folder (or some obvious variant of that name). Example documents include: a copy of the final publication, the raw dataset and the analyzed dataset, finalized scripts, key protocols, and JPEG files for figures. Basically you should identify the information that you will mostly likely need to refer to later and place all the final versions together in one folder. And then write a README.txt file to describe the contents of that folder.
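A minimal sketch of that wrap-up step, for those who like to script it: copy the final versions of key files into a “FINAL” folder and generate a README.txt describing them. The project path and file names below are placeholders for illustration; substitute your own key documents.

```python
import shutil
from pathlib import Path

# Hypothetical project layout; replace with your actual project folder.
project = Path("my_project")
final = project / "FINAL"
final.mkdir(parents=True, exist_ok=True)

# Placeholder names for the key documents you'd want to find again later.
key_files = ["raw_dataset.csv", "analyzed_dataset.csv", "analysis_script.py"]

# Copy each final version (if it exists) into the FINAL folder.
for name in key_files:
    src = project / name
    if src.exists():
        shutil.copy2(src, final / name)

# Write a short README so your future self knows what each file is.
readme = "\n".join(
    f"{name}: final version, archived at project wrap-up" for name in key_files
)
(final / "README.txt").write_text(readme + "\n")
```

Even if you do this by hand rather than with a script, the point is the same: one obvious folder, final versions only, plus a README that explains what’s in it.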
Without this added step, you will still have your files and can open them, but you’ll likely waste a lot of time looking for exactly what you need. And even with this step, there will be times when you’ll have to dig through all of the project files to find something specific. But 90% of the time, you will save time by placing these key documents in an obvious place.
Trust me, your future self will thank you for taking 20 minutes, while you still understand the files and their organization, to set aside the important stuff for later.
I couldn’t be more excited that my latest journal article, “Gaining Competency: Learning to Teach Data Visualization,” was just published in the Journal of eScience Librarianship.
The idea behind the paper is: how do we as librarians teach data skills in an area, specifically data visualization, in which we often have little expertise? Data librarians teach many data competencies, but the “data visualization” competency has always been an awkward one. We see a lot of desire for help in this area but don’t always have the expertise to meet this need. Data visualization has not historically been associated with the library and it isn’t covered in our usual data-management-based curricula. This paper seeks to close this gap.
I admit that it’s a quirky little paper. So many papers in librarianship are in the “we did this awesome thing” mold, and there’s a healthy dose of that in this paper in that I discuss the data visualization workshop I offer at my institution. However, I decided to go a step further by describing the lead-up process in which I prepared to teach the workshop. I thought it might benefit other librarians to have a framework for developing our own skills to the point where we can help others with data visualization.
So if you’ve ever thought about supporting data visualization but don’t feel like you have the requisite skills, I encourage you to check out my new paper. It’s part of a larger data visualization special issue and I’m certainly looking forward to digging into the whole issue!