Level Up Your Regex


Regular expressions for fun and profit

Regular expressions have a bad reputation. They’re hard, they’re opaque, they break things.

They even [broke Stack Overflow](https://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016).

Relying on built-in modules for string matching is fine for some tasks, but you’ll miss out on the power and flexibility that come from being able to write your own regex. (If you’re really dedicated, you can even make them do arithmetic.)

This post will show you some use cases for regex you might not have tried before and give you resources to make learning them — dare I say — fun.

Make your f-strings more versatile

Python’s f-strings are fantastic — they’re readable, concise, and less error-prone than the older %-formatting method.

You can get even more out of them with regex. Prefix your expression with *r* for “raw”, which tells Python to treat backslashes as literal characters instead of processing them as escape sequences.

But beware, even a raw string can’t consist **entirely** of a single backslash.
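Here is a minimal sketch of the difference the r prefix makes. The sample text and variable names are my own illustration, not from the original post.

```python
import re

text = "a cat sat on the catalog"

# Without the r prefix, Python turns "\b" into a backspace character
# before the regex engine ever sees it.
plain = "\bcat\b"

# With the r prefix, the backslashes survive and \b means "word boundary".
raw = r"\bcat\b"

plain_matches = re.findall(plain, text)
raw_matches = re.findall(raw, text)

print(plain_matches)  # []
print(raw_matches)    # ['cat']

# A raw string cannot end in a lone backslash: r"\" is a SyntaxError.
# An ordinary escaped string works instead.
single_backslash = "\\"
print(len(single_backslash))  # 1
```

The same restriction applies to any raw string that ends in an odd number of backslashes, so a Windows-style path with a trailing backslash needs the escaped form too.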

Combine r with the *f* prefix when you need to swap parts of your regex. You can use this to write shorter loops or to hold a place for a value you don’t have yet or need to compile. (Here’s an example of the latter case.)

Use re.compile to pre-compile a regular expression object.
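As a rough sketch of how the rf prefix and re.compile can work together, the snippet below builds one compiled pattern per search word. The helper name `build_word_pattern`, the word list, and the sentence are hypothetical, chosen only for illustration.

```python
import re


def build_word_pattern(word):
    """Compile a pattern matching `word`, optionally pluralized, as a whole word."""
    # re.escape protects against regex metacharacters in the swapped-in value.
    return re.compile(rf"\b{re.escape(word)}s?\b", re.IGNORECASE)


# One compiled pattern per word, built in a single short loop.
patterns = {word: build_word_pattern(word) for word in ("alien", "robot")}

sentence = "Aliens met a robot, but the alienated crowd left."
counts = {word: len(p.findall(sentence)) for word, p in patterns.items()}

print(counts)  # {'alien': 1, 'robot': 1}
```

One thing to watch for: inside an f-string, literal braces must be doubled, so a quantifier such as `\d{3}` becomes `\d{{3}}` in an rf-string.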

Find and replace text with precision

Regex give you a parsimonious way to alter the content of strings. In one line, you can target the items to replace and alter them using capturing groups.

Here, I’ve used this technique to scan sentences from news articles and make hashtags from the word “alien”.

Use a numbered reference to the capturing group enclosed in parentheses.

The second expression replaces the targeted substring in the call to re.sub.
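The original code is not reproduced here, so the following is only a sketch of the technique under my own assumptions: the sentences are invented, and the pattern is one of several that would work.

```python
import re

# Made-up sentences standing in for the news article text.
sentences = [
    "Scientists say an alien signal was detected.",
    "Are aliens real? The Alien Research Lab thinks so.",
]

# (alien) is capturing group 1. In the replacement string, \1 refers back
# to whatever that group matched, so the original capitalization is kept.
pattern = re.compile(r"\b(alien)", re.IGNORECASE)

hashtagged = [pattern.sub(r"#\1", sentence) for sentence in sentences]

for line in hashtagged:
    print(line)
# Scientists say an #alien signal was detected.
# Are #aliens real? The #Alien Research Lab thinks so.
```

Because the replacement reuses the captured text rather than a fixed string, “Alien” and “aliens” both keep their original form after the hashtag is added.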

Get valuable information from noisy data

I use regex in my everyday life to simplify other tasks (yes, really). For example, I wanted a list of packages from a requirements.txt file, but didn’t want their specific versions.

Not pleasant.

Regex prevented the tedium of having to extract the package names manually. You can see how I did it at Regex101. I like using BBEdit (formerly TextWrangler) for this, but you can also use the “export matches” feature in Regex101. The website gives you the added benefit of debugging your expression in real time.
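If you would rather stay in Python than use an editor or Regex101, something like the sketch below does a similar job. The file contents are invented sample data, and the pattern is my own simplified guess at what counts as a package name, not the expression from the Regex101 link.

```python
import re

# Invented sample of a requirements.txt file.
requirements = """\
numpy==1.17.2
pandas>=0.25.1
scikit-learn==0.21.3
# a comment line
requests
"""

# Capture the package name at the start of each line: a letter or digit
# followed by letters, digits, dots, underscores, or hyphens.
# re.MULTILINE makes ^ match at the start of every line.
name_pattern = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)", re.MULTILINE)

packages = name_pattern.findall(requirements)

print(packages)  # ['numpy', 'pandas', 'scikit-learn', 'requests']
```

Comment lines are skipped because they start with `#`, which the pattern does not accept as the first character of a name. Real requirements files can also contain extras, URLs, and option flags, which this simple pattern does not try to handle.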

The time spent learning regex pays for itself several times over by saving you from tedious searching. I’ve used regex to extract regular expressions from other Python scripts and to grep for files on the command line.

Train your brain and enjoy the challenge

In applying regex, you’ll improve your computational thinking skills by decomposing a search problem, abstracting patterns, and applying them algorithmically.

I’m fun at parties.

But the best reason to use regex might be that they’re just fun. If you’re the kind of person who enjoys puzzles, you’ll get hooked on finding different ways to solve the same problem and resolving edge cases.

While it’s true that regular expressions can be hard and sometimes dangerous, most of the best things in life are. Do a few crosswords, play a little regex golf, and see if you don’t agree.

Published 9 Sep 2019

A data science blog for everyone.
Lora Johns on Twitter