These are a Few of My Favorite Tools

If you follow me on twitter, you may have noticed a small rant the other day about how much I dislike how Excel handles dates (it’s a proven fact that Excel is terrible with dates). My beef with Excel got me thinking more about the programs I actually like to use for data processing and clean up. These are tools that I’d only stop using if you paid me serious money. And maybe even not then.

Since I love these tools so much and since I’m mistress of my own blog, I’m going to spend this post proselytizing about them to you because I want you to love them too. So here are my top 4 you-might-not-know-about-them-but-really-should tools for hacking at your data. Some of them only do a small job, but they do it incredibly well and are just what you need in a specific instance.

 

Regular Expressions

Regular expressions (or regex) are a somewhat obscure little tool for search and replace but I’ve found nothing better for cleaning up text. Regular expressions work by pattern matching text and have a lot of flexibility. Need to find every phone number in a document? Want to clean up dates? Have to reformat a document while keeping the text the same? Regex is the tool for you. Regex isn’t a standalone program but rather plugs into other tools (including all of the tools below, as well as some programming languages). I recommend this tutorial for getting started.

Notepad++

While not as dazzling as the other players on this list, Notepad++ is my go-to text editor and the main platform I use for leveraging regular expressions. It’s always good to have an open source text editor around and this one is my particular workhorse.

OpenRefine

OpenRefine (formerly Google Refine) in an open source tool for cleaning up spreadsheet data. This tool allows you to dig into your data by faceting it across any number of variables. I find it particularly handy for generating counts; for example, it’s incredibly easy to find out how many times {variable1=X AND variable2=Y} versus {variable1=X AND variable2=Z}. Faceting also allows for editing of select subsets of data. You can also do stronger data manipulations, such as streamlining inconsistent spelling and formatting or breaking multi-component cells apart/collapsing columns together. I recommend this tutorial for getting started.

Bulk Rename Utility

Need to rename a number of files? Bulk Rename Utility is the tool for you! This software allows you to rename files in very specific ways such as: adding characters to the ending, removing characters from the name beginning, changing something at a specific position in the name, and much more. You can also add numbering and dates to file names or do a custom search and replace with regular expressions. I don’t use Bulk Rename Utility a lot, but it saves me a ton of time and energy when I do.

This entry was posted in dataAnalysis. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *