I’ve been doing a lot of research into accessibility recently, specifically into how to make research data files more accessible. There is plenty of existing content about the accessibility of common business file types (e.g., Word, PowerPoint, and Excel), but very little that is specific to research data. Part of the issue is that much of our guidance on research data focuses on reusability and computability, and that guidance sometimes conflicts with accessibility principles.
All of this has me thinking about the humble TXT file. Data management and sharing experts commonly recommend writing README.txt files to accompany shared data files (I myself have given this guidance many, many times). The TXT format is recommended because it is simple and can be opened by many software programs, including from the command line, so users don’t need special proprietary software to read these files. TXT files come up a lot in open documentation and often for the data files themselves, especially in text analysis.
The problem with TXT files, however, is that they are not very accessible. A TXT file contains zero formatting, which means zero formatting for accessibility. Features that make text documents more accessible include headings, hyperlinks, and bullet points; TXT files support none of these, let alone content like images and tables. Unless the TXT file is very short, it’s going to be challenging to make it fully accessible for a disabled user to navigate and read. (That’s not to say someone using a screen reader can’t read a TXT file; rather, navigating it will be inefficient.)
The TXT file’s role in documentation is even more concerning in light of recent research on data reuse by Koesten et al. This group found that highly reused datasets on GitHub had more words, more headers, and more links in their README files than less reused GitHub datasets. This is a correlation, not causation, but it makes sense that richer, more detailed documentation makes data easier to reuse. My concern is that these helpful extras, like headers, links, and tables, are not supported by TXT files.
So where does that leave us? Microsoft Word has a ton of accessibility features, to the point where it’s the recommended file format in the U.S. government’s text document accessibility tutorial. But Word is a proprietary format owned by Microsoft. Opening and editing Word files without Microsoft software is now a bit easier thanks to Google Docs, but using a proprietary file type for important data and documentation still raises reusability and computability concerns for me.
Other text-based alternatives are PDF and LaTeX (which can be compiled into PDF). However, PDFs are notoriously difficult to make accessible: you need to know how accessible PDFs work, and you have to use the paid version of Adobe Acrobat to edit the accessibility settings. LaTeX has some support for accessibility, but LaTeX accessibility is still an area of active development and, again, requires a lot of specialized knowledge.
I’m personally very interested in Markdown (.md files, or .Rmd for R Markdown) for filling this accessibility/reusability gap in documentation. In fact, the Koesten et al. study cited earlier looked at datasets on GitHub, which uses Markdown as the default format for README files. Markdown is an open, plain-text format that supports formatting like headings, hyperlinks, and bullet points. It does this by using special characters to mark where formatting should be applied (for example, a line beginning with a pound sign, #, becomes a heading). This does have a learning curve, but it’s not as challenging to learn as LaTeX. Markdown also requires tools to convert the marked-up text into HTML, PDF, Word, and other formats, which means that how well Markdown is integrated into other systems may limit its adoption by the general population.
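To make that mechanism a bit more concrete, here is a toy sketch in Python. It is my own illustration, not a real Markdown tool, and it handles only a tiny, hypothetical subset of the syntax: lines starting with #, bullet lines starting with "- ", and [text](url) links. The point it shows is that the special characters turn into real structure, such as h1/h2 headings and ul lists in HTML, which screen readers can use to jump between sections. A TXT file never gets that structure. In practice you would use an established converter such as Pandoc rather than anything like this.

```python
import html
import re

# Toy illustration only: a hypothetical converter for a tiny subset of
# Markdown (headings, "- " bullets, [text](url) links). Real tools such as
# Pandoc implement the full syntax.

LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
BULLET = re.compile(r"^-\s+(.*)$")


def inline(text: str) -> str:
    """Escape HTML special characters, then turn [text](url) into links."""
    return LINK.sub(r'<a href="\2">\1</a>', html.escape(text))


def tiny_markdown_to_html(text: str) -> str:
    out = []
    in_list = False
    for line in text.splitlines():
        heading = HEADING.match(line)
        bullet = BULLET.match(line)
        # A non-bullet line ends any open bullet list.
        if in_list and not bullet:
            out.append("</ul>")
            in_list = False
        if heading:
            # The number of leading # characters sets the heading level.
            level = len(heading.group(1))
            out.append(f"<h{level}>{inline(heading.group(2))}</h{level}>")
        elif bullet:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{inline(bullet.group(1))}</li>")
        elif line.strip():
            # Any other non-blank line becomes its own paragraph.
            out.append(f"<p>{inline(line)}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


readme = """# Dataset README
## Files
- data.csv: survey responses
- codebook: see [the codebook](codebook.md)
"""
print(tiny_markdown_to_html(readme))
```

Running it prints an h1 and an h2 heading followed by a two-item list whose second item contains a real, clickable link: exactly the kind of navigable structure a README.txt can’t express.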
I’m not sure I have a clear answer to the challenge posed in this post. The bigger issue is that we must start considering accessibility in our default guidance for data management and sharing. Doing so will change that default guidance, hopefully for the better. As for text files and documentation, I think Markdown can fill an important gap for accessible and reusable text, but I also recognize that many researchers don’t currently have the knowledge and infrastructure to make this switch.
What are your thoughts about the humble TXT file?