I unfortunately had a medical event right before November’s scheduled figshare webinar on “Making repositories and data digitally accessible.” I ended up having surgery in December and now that I’m feeling much better, we have rescheduled the webinar for February.
The webinar is now on Tuesday, February 11, 2025, at 8am PT / 11am ET / 4pm GMT. You can register here for the webinar.
My colleague Megan O’Donnell and I will be speaking at an upcoming free webinar on “Making repositories and data digitally accessible”. The webinar is on Monday, November 18, 2024, at 9am PT / 12pm ET / 5pm GMT.
I’m still travelling down the deep rabbit hole that is the topic of accessible data but it’s a larger discussion that the whole data community needs to start having. I’m looking forward to sharing what I know about how to make shared data files more accessible.
I’ve been participating in RDAP since I started working as a librarian and have been a member since 2019, when RDAP officially became a professional association. I remember when RDAP was moving from a conference to a professional association and one of the pipe dreams was to have association awards. It took a few years but we finally got the awards up and running this year. I couldn’t be more humbled to win the very first RDAP Work of the Year award.
I also want to say congratulations to my peers, Ashley Thomas, Shannon Sheridan, and Megan O’Donnell, who shared the 2023 Volunteer of the Year award from RDAP.
I don’t have capacity in this blog post to do justice to the full topic of accessibility for scientific images. If you are interested in the basics of creating accessible data visualizations, I recommend you check out Chartability by Frank Elavsky and the Do No Harm Guide: Centering Accessibility in Data Visualization report by Jonathan Schwabish, Susan J. Popkin, and Alice Feng. There are many more resources available on the internet in this area, but those two links are a good place to start.
In this post, I want to discuss a more foundational requirement for accessibility and visualizations: alt text for scientific figures. Alt text, short for “alternative text”, is a textual description of a digital image. This text is critical for blind and low vision people whose screen reader software cannot otherwise interpret images; without alt text, blind people miss all information or context provided by a figure. Alt text also helps with search engine optimization and is presented instead of an image when the image file cannot load (due to low bandwidth, etc.).
All images on the internet should have accompanying alt text. But alt text isn’t something that comes up regularly when discussing the sharing of research results. So how do you write alt text for a scientific figure?
alt text = *Chart type* of *type of data* where *reason for including chart*. *Link to source data.*
Let’s look at an example using this figure from my most recent publication:
The chart type is “column chart” and the type of data, or y-axis data, is best summarized as “research data availability”. The point of the figure is that “research data on the internet disappears at a rate of 2.6% per year”. Using Cesal’s formula, the alt text for this figure would be:
Column chart of research data availability where research data on the internet disappears at a rate of 2.6% per year. For underlying data, see “Figure2_UnavailableByYear.csv” file at https://doi.org/10.22002/h5e81-spf62.
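If you write alt text for a lot of figures, the formula is simple enough to fill in with a few lines of code. Here is a minimal Python sketch of that idea. The function name `build_alt_text` and its parameter names are my own for illustration and are not part of Cesal’s post; the example values are the ones from the figure described above.

```python
def build_alt_text(chart_type: str, data_type: str, takeaway: str, source: str = "") -> str:
    """Fill in the formula: chart type of type of data where takeaway. Source."""
    chart = chart_type.strip()
    # Capitalize only the first letter so the rest of the chart type is left as written.
    sentence = (
        f"{chart[:1].upper()}{chart[1:]} of {data_type.strip()} "
        f"where {takeaway.strip().rstrip('.')}."
    )
    # The link to source data is optional, so it is only appended when provided.
    if source.strip():
        sentence += f" {source.strip()}"
    return sentence


alt_text = build_alt_text(
    chart_type="column chart",
    data_type="research data availability",
    takeaway="research data on the internet disappears at a rate of 2.6% per year",
    source="For underlying data, see “Figure2_UnavailableByYear.csv” file at "
           "https://doi.org/10.22002/h5e81-spf62.",
)
print(alt_text)
```

The code only assembles the sentence; deciding what the takeaway of the figure is remains the part that needs a human.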
There are further examples in Cesal’s post to give you a better sense of how the formula works. She also advises repeating the formula for each separate panel in a multi-part scientific figure.
If you want to go beyond Cesal’s basic alt text for data visualizations, chapter four of the Do No Harm Guide offers a really nice look into several models for writing full alt text descriptions for visualizations. The image description guidelines from the DIAGRAM Center also provide more detailed recommendations for specific types of images, such as: chemical elements, Venn diagrams, line graphs, etc. Cesal’s shorthand formula is useful but fuller descriptions are preferable when you are able to write them.
I know this post links to a lot of information about alt text but the important thing is to have something written as alt text because so many images on the internet have no alt text at all. If you want to use Cesal’s quick formula instead of a fuller description, that’s 100 times better than having no image description at all.
Hopefully these resources have prompted you to think more about accessibility of your scientific visualizations so that, at the very least, you’ll include good alt text for the scientific images you share on your lab website, social media, or anywhere!
July is Disability Pride Month. I’ve been learning more about disability during the past year and this has naturally led me to thinking about accessibility of research data. Last year, my friend Abigail co-wrote a really nice call-to-action about making research data more accessible because this is an area where we have to catch up and do better.
The hard part about shared data, though, is that such data is meant to be maximally reusable. For spreadsheets, this means thinking about the accessibility of the most reusable tabular file format: the CSV file. Karl Broman and Kara Woo published my favorite article on making spreadsheets reusable in 2017 but it doesn’t really address accessibility issues. Adding to this problem is the fact that strategies for reusability and accessibility can sometimes be in conflict.
After a lot of research, I’ve taken a try at a checklist for making accessible and reusable CSV files for data sharing. It balances recommendations from the Broman and Woo article with a range of accessibility guidance. For example, I based the recommendations for variable naming on guidance for screen-reader-friendly hashtags.
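To give a sense of what the hashtag guidance looks like in practice: the usual advice is to capitalize the first letter of each word so that a screen reader can tell the words apart. Below is a rough Python sketch that applies that idea to column headers. It is an assumption on my part that this style matches a given dataset’s needs, the helper name `to_camel_case` is hypothetical, and the example headers are made up.

```python
import re


def to_camel_case(name: str) -> str:
    """Join the words in a header, capitalizing the first letter of each word."""
    # Split on anything that is not a letter or digit (spaces, underscores, hyphens).
    words = [word for word in re.split(r"[^A-Za-z0-9]+", name) if word]
    # Only the first letter is changed; the rest of each word is left as written.
    return "".join(word[:1].upper() + word[1:] for word in words)


# Made-up headers for illustration only.
headers = ["sample id", "collection_date", "water-temp-c"]
renamed = [to_camel_case(header) for header in headers]
print(renamed)
```

Whichever style you pick, the key is to apply it consistently to every column in the file.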
Please note: I am not disabled. I am a neurodivergent person with a couple chronic conditions and am not an expert on disability. I am sharing this because I’m requesting feedback so I can make updates to the checklist.
The goal is to create something that a scientist with zero background in accessibility can use to make their data more accessible and reusable. I don’t know if the checklist will ever be perfect but it should still help make data better.
To determine where data is shared and what data is no longer available, this study analyzed data shared by researchers at a single university. 2166 supplemental data links were harvested from the university’s institutional repository and web scraped using R. All links that failed to scrape or could not be tested algorithmically were tested for availability by hand. Trends in data availability by link type, age of publication, and data source were examined for patterns. Results show that researchers shared data in hundreds of places. About two-thirds of links to shared data were in the form of URLs and one-third were DOIs, with several FTP links and links directly to files. A surprising 13.4% of shared URL links pointed to a website homepage rather than a specific record on a website. After testing, 5.4% of the 2166 supplemental data links were found to be no longer available. DOIs were the type of shared link that was least likely to disappear with a 1.7% loss, with URL loss at 5.9% averaged over time. Links from older publications were more likely to be unavailable, with a data disappearance rate estimated at 2.6% per year, as well as links to data hosted on journal websites. The results support best practice guidance to share data in a data repository using a permanent identifier.
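If you want to try a similar first pass on your own list of links, sorting them by type and flagging homepage-only URLs can be done without touching the network. The following is a small Python sketch of that step only. It is not the code from the study, which was done in R; the function names, the category labels, and the example links (other than the DOI from my own dataset) are assumptions for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def classify_link(link: str) -> str:
    """Sort a link into a rough type: 'ftp', 'doi', 'url', or 'other'."""
    cleaned = link.strip()
    parsed = urlparse(cleaned)
    if parsed.scheme == "ftp":
        return "ftp"
    # Treat links through the DOI resolver, or written as doi:..., as DOIs.
    if parsed.netloc.lower() in {"doi.org", "dx.doi.org"} or cleaned.lower().startswith("doi:"):
        return "doi"
    if parsed.scheme in {"http", "https"}:
        return "url"
    return "other"


def is_homepage(link: str) -> bool:
    """True when a web link has no path, query, or fragment beyond the site root."""
    parsed = urlparse(link.strip())
    return (
        parsed.scheme in {"http", "https"}
        and parsed.path in {"", "/"}
        and not parsed.query
        and not parsed.fragment
    )


# Hypothetical links, apart from the DOI for the dataset mentioned in this blog.
links = [
    "https://doi.org/10.22002/h5e81-spf62",
    "https://example.org/",
    "https://example.org/records/42",
    "ftp://ftp.example.org/data/run1.tar.gz",
]
counts = Counter(classify_link(link) for link in links)
homepages = [link for link in links if is_homepage(link)]
print(counts, homepages)
```

Testing whether each link still resolves is a separate step, and as the abstract notes, some links will always need to be checked by hand.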
I’m really proud of this paper and was incredibly lucky that my institutional repository keeps such good records of links to shared data. It was fascinating to look at where researchers at my institution share data and to see how data is lost over time. It feels good to have added more evidence that you should be following funder guidance to put shared data in a data repository and to share a DOI link to that data.
I hope this analysis is useful for you and you enjoy the article!