The thud you might have heard yesterday was NIH dropping a new Data Management and Sharing Policy. It won’t go into effect until 2023-01-25, but the policy has so many ramifications that I don’t plan to waste any time in preparing.
I’m going to do a short overview of initial thoughts here. I expect that I’ll be working through all of the nuances more in the weeks to come.
Here are the highlights for how this policy affects researchers:
- All NIH grants will be required to have a 2-page maximum data management plan (DMP). NIH expects researchers to: be clear in the DMP about where they plan to share (“to be determined” is no longer acceptable), notify them if plans change, and actually follow the plan.
- You will be sharing more data, as NIH not only wants the data that underlies publications but all data that verifies results.
- You will be sharing data sooner. NIH prefers that you share as soon as possible, but at the latest sharing should occur with publication or at the end of the grant period, whichever comes first. That last part is a huge change.
- You will share your data in a repository. Criteria for data repositories are provided in a supplement and I expect to see more in this area between now and 2023.
- You can ask for money to support data management and sharing activities, including pre-paying for long-term hosting of open data.
- If you collect data on people, sharing expectations are changing. NIH really wants researchers to fine-tune the balance between sharing and privacy. Two mechanisms explicitly called out are outlining sharing practices during informed consent and controlled data sharing, even for de-identified data. This is another area where I want to see more development.
- If you are doing research on indigenous populations, you must respect Tribal sovereignty. This is a great addition to the policy.
I think this is a good policy, though it’s definitely overdue. I don’t love the lack of clarity around retention times and I’m not sure how I feel about review of DMPs shifting from peer reviewers to program officers. But these are minor quibbles in what I think is a pretty solid policy.
The biggest takeaway is that this policy represents a shift in expectations for data sharing. It has stronger requirements than the NSF data policy and will really move things forward. Some people are going to hate it and it’s going to be a big adjustment, but it’s a win for reproducibility and open data.
Today, we’re going to discuss what happens when you don’t end up liking your file naming conventions. (Every time I think I’ve covered file naming conventions enough, I find something new on this topic to talk about. They are my favorite data management trick, after all. Sorry not sorry.)
So anyway, what do you do when your file naming convention isn’t working well for you? You use a file renaming tool to apply a new naming convention! I use Bulk Rename Utility for Windows, but there are other good tools available.
A file renamer lets you add information to your file name, remove information, and move pieces of your file name around, among other things. Don’t like the date at the end of the file name? A renamer can move it to the beginning of the file name, no problem.
The biggest benefit of a file renamer is that you can easily rename a whole set of files at the same time instead of renaming files one-by-one. A file renamer will save you so much time and can mean the difference between being able to rename your files or not.
The one thing to note about a file renamer is that it works best when you start with consistent file names to convert. If your file names are an inconsistent mess, a file renamer is not going to help you at all. But even a little consistency can help you break your files into manageable chunks. A file renamer also demonstrates the benefit of separating information (metadata) in file names with dashes or underscores, as they help you process particular sections of your file name independently of the others.
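If you would rather script it than use a GUI tool, here is a minimal Python sketch of the same idea. It is not how Bulk Rename Utility works internally; it is only an illustration that assumes a hypothetical convention where sections are separated by underscores and the name ends in a YYYY-MM-DD date (for example, "sampleA_xrd_2020-10-30.csv"). The function names and the "my_data_folder" path are made up for this example. Files that don't match the convention are skipped, which is exactly the "consistency matters" point above.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Assumed (hypothetical) convention: <anything>_<YYYY-MM-DD>.<extension>
DATE_AT_END = re.compile(r"^(?P<rest>.+)_(?P<date>\d{4}-\d{2}-\d{2})$")


def move_date_to_front(filename: str) -> Optional[str]:
    """Return the name with the trailing date moved to the front.

    Returns None when the name does not follow the assumed convention.
    """
    path = Path(filename)
    match = DATE_AT_END.match(path.stem)
    if match is None:
        return None
    return f"{match['date']}_{match['rest']}{path.suffix}"


def plan_renames(folder: Path) -> List[Tuple[Path, Path]]:
    """Build a list of (old, new) paths without touching any files."""
    plan = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        new_name = move_date_to_front(path.name)
        if new_name is not None:
            plan.append((path, path.with_name(new_name)))
    return plan


def apply_renames(plan: List[Tuple[Path, Path]]) -> None:
    """Rename the files in the plan, never overwriting an existing file."""
    for old, new in plan:
        if new.exists():
            print(f"Skipping {old.name}: {new.name} already exists")
            continue
        old.rename(new)


if __name__ == "__main__":
    # "my_data_folder" is a placeholder; point this at a COPY of your files.
    folder = Path("my_data_folder")
    if folder.is_dir():
        plan = plan_renames(folder)
        for old, new in plan:
            print(f"{old.name} -> {new.name}")  # preview before renaming
        # Uncomment once the preview looks right:
        # apply_renames(plan)
```

With this sketch, "sampleA_xrd_2020-10-30.csv" would become "2020-10-30_sampleA_xrd.csv". Printing the plan before renaming anything is a deliberate choice: you get to check the result on the whole batch, and trying it on a copy of your files first is a good habit.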
A file renamer is the type of tool that I don’t need often, but it saves me so much time when I do. Now that you know these tools exist, I hope they will help you too!
If you’re a regular reader of my blog, you’ll know that one of my goals is for all researchers to adopt the basic data management practices that make conducting research easier. I’ve written a whole book on data management, done videos, created checklists, written numerous blog posts, etc., but it will never be enough until researchers are regularly taught these skills. Until that point, I’ll keep sending the gospel of data out in the world in different formats, hoping to reach new audiences.
My latest iteration of educating about the principles of data management is in the form of a research article in RIO. I really like the article format because it’s just enough space to provide a broad overview of the basic data management practices. And if readers want to learn more, we’ve provided a handy list of citations!
The new article covers 10 practices of data management that my coauthors and I consider to be foundational:
- Practice 1: Keep sufficient documentation
- Practice 2: Organize files and name them consistently
- Practice 3: Version the Files
- Practice 4: Create a security plan, when applicable
- Practice 5: Define roles and responsibilities
- Practice 6: Back up the data
- Practice 7: Identify tool constraints
- Practice 8: Close out the project
- Practice 9: Put the data in a repository
- Practice 10: Write these conventions down [in a data management plan]
This is by no means the complete scope of data management but rather a good introduction. Honestly, if you implement all ten practices into your research, you’re going to be doing very well with your data.
So if you or a peer are looking for a general introduction to research data management, check out my new article “Foundational Practices of Research Data Management.”
Citation: Briney KA, Coates H, Goben A (2020) Foundational Practices of Research Data Management. Research Ideas and Outcomes 6: e56508. https://doi.org/10.3897/rio.6.e56508
Continuing in my pandemic reading of data books, next up is “How Charts Lie: Getting Smarter about Visual Information” by Alberto Cairo. (I didn’t plan to be a predominately book review blog, but I need a way to channel the pandemic anxiety, so here we are.)
This book is a little different than other visualization books I’ve been reading because it focuses on visual literacy (which Cairo calls “graphicacy”) instead of chart design. Because charts appear by their nature more authoritative (they show “facts” and make such information easy to understand), we need to train ourselves to critically assess the information displayed. This book provides the framework for an individual to engage with and dissect the charts we regularly see in the news and on social media and decide what’s accurate.
Cairo uses his experience as a chart designer and chart consumer to break down the major ways that charts lie. Each type of lie gets covered in its own chapter in the book:
- Poor design
- Displaying dubious data
- Displaying insufficient data
- Concealing or confusing uncertainty
- Suggesting misleading patterns
You’ll notice that these mistakes aren’t all about chart design; many chart issues concern the data that’s being visualized, including everything from displaying percentages instead of absolute numbers on a map to vetting data sources. Cairo provides ways to think through the many mistakes that are made in data selection, because even the prettiest and easiest-to-read chart can lie to us by getting the data wrong.
What’s nice about the book is that it doesn’t assume that charts are intentionally lying to us. Sometimes designers make honest mistakes and sometimes trade-offs have to be made. Cairo walks the reader through exemplar visualizations and shows us how different choices affect the accuracy and design of the chart. By discussing the data selection and visualization decision process as well as showing how these choices affect the final design, Cairo provides the reader with the mental scaffolding to critically assess charts.
As with any data book, Cairo uses plenty of examples throughout this book. What I found interesting is how many of these examples were drawn from recent politics; the book actually starts by dissecting a graphic that Donald Trump shared in April 2017. While I appreciate the American cultural touchstones (and it’s nice to rage at some of the bad charts we’ve seen in recent years), I do worry that this book will lose some of its relevance over time.
Overall, this is a good book for any information consumer to read and will also help visualization designers learn to avoid pitfalls and assess design trade-offs. I would also recommend it to my fellow librarians who do information literacy instruction; the visual literacy discussed in this book is a perfect complement to the work we’re already doing with students around assessing text-based resources.
I’ve been working on a lot of data management resources at work recently. At my last position, I was really focused on 3-5 minute videos but I’m currently taken with the concept of 1-2 page data management handouts. I described the first new resource — the project close-out checklist for research data — in a recent post and I’ll also link here to my new DMP resources: an updated DMP checklist, some DMP standard language for my university, and an example DMP.
This post, however, is dedicated to a new worksheet that I’m incredibly excited to share: the file naming convention worksheet. File naming conventions are, hands down, my favorite data management strategy to teach. They are just so simple and so useful and save you so much time later when you try to find files. They’re also vital to success on team projects with shared data.
While I’ve already written about file naming conventions, and blogged about them, and done a video about them, I really like the worksheet format for walking you through all of the steps required to create your own convention. There’s something about the enumerated steps and having a physical takeaway that I hope will be really helpful to people.
As with the close-out checklist, I’ve made the file naming convention worksheet available as a branded pdf and as a generic, editable Microsoft Word file under a CC BY license. Please do share any feedback you have on this document as I’d love to improve it over time to make it really usable for people.
Researchers tend to think about data management at key times during a project, such as when writing a data management plan for grant funding and when preparing for data collection. But there’s one other critical time for data management in the project lifecycle: when a project ends and/or a researcher leaves the project.
I’ve actually blogged about project close out twice before (here and here) because it’s an area where I’ve had my own successes and failures. I’ve lost data in projects where I didn’t do data close out and have saved myself several large headaches on projects where I did close out. But here’s the important thing: project close out isn’t actually that difficult; it’s just that there is hardly any guidance on how to do it.
Enter the “Project Close Out Checklist for Research Data”! Born out of a discussion with Jonathan Petters and Abigail Goben at the RDAP Summit in 2020, this checklist describes a range of activities for helping ensure that research data are properly managed at the end of a project. Activities include: making stewardship decisions, preparing files for archiving, sharing data, and setting aside important files in a “FINAL” folder.
Two versions of the checklist are available: a Caltech Library branded version and a generic editable version. I’m sharing the checklist under a CC BY license, so please reuse and remix with attribution.
My hope is that this checklist will help researchers be able to use their data well into the future!