Last week, I discussed some of the policy requirements for retaining research data, and I’d like to follow up by discussing how one goes about retaining research data for 10+ years. It’s a sad fact that many of us have digital files from 10 years ago that we are no longer able to open or read. However common unreadable old files may be, we should not have to accept this fate for our research data.
The problem is that digital information is not like a book, which can be put on a shelf for 10 years and forgotten yet still be readable when you come back to it. Digital information requires upkeep so you can actually open and use your files 10 years into the future. Digital preservation also requires a little planning up front. The rule of thumb about data management really holds true here: 1 minute of planning now will save you 10 minutes of headache later.
There is a whole field dedicated to digital preservation, but I’d like to discuss a couple of easy practices that will make it much easier for you to use your data 10 years from now. Because, as this recent study shows, you never know when you’ll be out drinking with your research buddies and realize that the data you took 10 years ago could be repurposed for an awesome study (true story).
Do Immediately: Convert File Formats
One of the easiest things you can do to save your files for the future is to convert them to open file formats. The best formats are open, standardized, well-documented, and in wide use; examples include: .csv, .tiff, .txt, .dbf, and .pdf. Avoid proprietary file types whenever you can.
By choosing an open file format, you’re doing a lot to ensure that your files are readable down the road. For example, a lot of people have invested their information in .pdf files, meaning that there will be a need to read .pdf files well into the future. Your .pdf data will be safer because of this. Likewise, saving spreadsheet data as .csv instead of .xlsx means that your files aren’t tied to the fate of one particular software package.
I will also note that even when you convert your files, it’s a good idea to keep copies in both the old and new formats, just in case. You can sometimes lose functionality and formatting through the conversion, so it’s preferable to have the original files on hand if you can still read them.
Sometimes, it’s just not possible to convert to an open file format or retain the desired functionality. In this case, you’ll need to preserve any software and hardware necessary to open and interpret your digital files. This is more work than simple conversion, but can definitely save you a headache down the road.
The most important thing about converting file formats is to do it now instead of later. For example, even though I finished my PhD only a few years ago, a lot of my dissertation data is inaccessible to me because I no longer have access to the software program Igor Pro. If I had only converted my files to .csv’s before I left the lab, I would still be able to use my data if I needed to.
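A good first step is finding out which of your files still need converting. Here is a minimal sketch in Python, assuming your data lives in a single folder and that the example extensions listed above are the ones you count as open. The function name `files_needing_conversion` is my own invention for illustration, and the extension list should be adjusted to your field.

```python
from pathlib import Path

# Example open formats named in this post; this set is an assumption,
# so extend or trim it to match the formats your field relies on.
OPEN_EXTENSIONS = {".csv", ".tiff", ".txt", ".dbf", ".pdf"}


def files_needing_conversion(data_dir):
    """Return files under data_dir whose extension is not in OPEN_EXTENSIONS."""
    root = Path(data_dir)
    return sorted(
        path
        for path in root.rglob("*")  # walk the folder and all subfolders
        if path.is_file() and path.suffix.lower() not in OPEN_EXTENSIONS
    )
```

Calling `files_needing_conversion("research_data")` (a hypothetical folder name) returns a sorted list of paths to review. The check looks only at file extensions, so treat the result as a to-do list for a human, not as proof that everything else is safely readable.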
Do Periodically: Update Your Media
Beyond converting files to open formats, it’s also important to periodically examine the media that those files live on. It’s no use having a bunch of converted .txt files if they’re living on a floppy disk. I could probably track down a floppy disk drive if I had to, but it would have been a lot easier to use those .txt files if I had moved them to a CD a few years ago.
Updating your media is not something you’ll need to do frequently, but you should pay attention to the general ebb and flow of storage types. Being aware of such things will remind you to transfer your video interview data, for example, from VHS to DVD before you lose all access to a VCR.
When in doubt, there are places you can send your old media for recovery, but be aware that it will cost you. It’s much easier, and cheaper, just to update your media periodically as you go along.
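When you do move files to newer media, it is worth confirming that the copies actually match the originals before you retire the old disk. The sketch below is one way to do that in Python, using only the standard library. Comparing SHA-256 checksums is my own suggestion here, not a requirement, and the function names `sha256_of` and `copy_and_verify` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source_dir, target_dir):
    """Copy every file under source_dir into target_dir.

    Returns the relative paths of any files whose copy does not match
    the original; an empty list means every copy verified.
    """
    source_root = Path(source_dir)
    target_root = Path(target_dir)
    mismatches = []
    for source in sorted(source_root.rglob("*")):
        if not source.is_file():
            continue
        target = target_root / source.relative_to(source_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also tries to keep timestamps
        if sha256_of(source) != sha256_of(target):
            mismatches.append(source.relative_to(source_root))
    return mismatches
```

An empty list back from `copy_and_verify("old_drive", "new_drive")` (both hypothetical paths) means every file arrived intact. Anything in the list should be copied again before the old media goes in a drawer.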
Always a Good Idea: Documentation
Finally, documenting your data goes a long way toward ensuring that it is usable in the future. This is because scanning through a file is usually not enough to understand what you’re looking at or how the data were acquired. Documentation provides the context of a dataset and allows for the data to truly be usable.
Documentation becomes more important when preserving data for the long term because you’re likely to forget the context of a dataset in 10 years. You’re much more likely to have the information you need if you document your dataset while you are acquiring it. So while it’s not directly related to the logistics of keeping files readable, documentation is a critical part of preserving data for future use.
Even though I already have a whole post about documentation, I’m sure I will keep talking about it on this blog because it’s so important to good data management.
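One low-effort way to document a dataset is to keep a plain-text README file in the same folder as the data, written while you are still acquiring it. Here is a minimal sketch in Python. The field names and the `write_readme` function are hypothetical, chosen only for illustration, so record whatever context you would need to make sense of the data in 10 years.

```python
from datetime import date
from pathlib import Path


def write_readme(data_dir, title, collected_by, description, file_notes):
    """Write a plain-text README.txt into data_dir and return its path.

    file_notes maps each file name to a short note on what it contains.
    """
    lines = [
        f"Title: {title}",
        f"Collected by: {collected_by}",
        f"README written: {date.today().isoformat()}",
        "",
        "Description:",
        description,
        "",
        "Files:",
    ]
    for name, note in sorted(file_notes.items()):
        lines.append(f"  {name}: {note}")
    readme = Path(data_dir) / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

I chose plain .txt on purpose: the documentation should be at least as easy to open in the future as the data it describes.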
A Final Thought
I will end this post with this final thought: you may be required to keep your data on hand for 7 years post-publication or post-grant, but what is the point of keeping them if you have no way of reading them? By taking the simple steps of converting your files to open file formats, periodically moving everything to more modern media, and maintaining good documentation habits, you’ll be doing yourself a huge favor for when you need your data 10 years after you acquired them.