Why I don’t like using Regex in Production code

A regex that was used to check validity of an email address

Recently I declined two suggested changes to a library I’m maintaining. Both PRs had one thing in common: they introduced a change that required running a regex match in order to do something.

While I love using regexes for small/non-prod scripts, I think it is generally a bad idea to use them in production. In over 15 years as a backend developer, I can hardly remember a single instance where I thought the use of regex in production code was justified.

After having to explain my reasoning to two engineers, I decided that the next person who asks me about it will receive a link to a blog post (the one you’re reading right now). So here we go:

Reasons against using regex in production code

  1. It quickly gets difficult to maintain
  2. It’s error-prone
  3. It’s difficult to test (and most of the time impossible to get full coverage)
  4. It can be a performance hog (both memory and CPU)
  5. It’s usually not the right tool: we can get the same result by either:
  • Using one of the string functions: substring / indexOf / slice / endsWith / startsWith / …
  • Writing a small parser
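
As a rough illustration of point 5, here is a sketch in TypeScript. The function names and the `key=value` input format are made up for this example and are not taken from the library mentioned above. The first function stands in for a check like `/\.json$/i.test(fileName)`, and the second is a small hand-written parser that stands in for something like `/^(\w+)=(.*)$/`.

```typescript
// Hypothetical helper: replaces a check like /\.json$/i.test(fileName).
function hasJsonExtension(fileName: string): boolean {
  return fileName.toLowerCase().endsWith(".json");
}

// Hypothetical result type for the small parser below.
interface KeyValue {
  key: string;
  value: string;
}

// Hypothetical small parser for lines such as "timeout=30".
// Returns null instead of throwing when the line is malformed.
function parseKeyValue(line: string): KeyValue | null {
  const separator = line.indexOf("=");
  if (separator <= 0) {
    return null; // no "=" at all, or nothing before it
  }
  const key = line.slice(0, separator).trim();
  const value = line.slice(separator + 1).trim();
  if (key.length === 0) {
    return null; // the key was only whitespace
  }
  return { key, value };
}
```

Every branch here is a plain `if` that a unit test can hit directly, which is the maintainability and testability argument from points 1 and 3.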

So what is regex good for?

  • Getting something done quickly (and dirty)
  • Code that is not for prod, meaning it’s temporary or throwaway code

One additional point

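Most regex engines today are backtracking NFA engines. When a match fails, such an engine may try every possible way of applying the pattern before giving up, which in the worst case takes exponential time and is what makes them resource hogs. They’re popular because backtracking makes features such as backreferences possible.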
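
To make the exponential worst case concrete, here is a minimal sketch in TypeScript. The pattern `^(a+)+$` and the helper name `timeNestedQuantifier` are chosen purely for illustration, and the sketch assumes a backtracking engine, which is what JavaScript runtimes typically use. An input of `n` letters `a` followed by a `b` can never match, and a backtracking engine tries roughly 2^n ways of splitting the `a`s between the inner and outer `+` before it gives up.

```typescript
// Illustrative pattern with nested quantifiers; not taken from any real code base.
const nestedQuantifier = /^(a+)+$/;

// Hypothetical helper: builds "aaa...ab" with n a's and times one failed match.
function timeNestedQuantifier(n: number): { matched: boolean; millis: number } {
  const input = "a".repeat(n) + "b";
  const start = Date.now();
  const matched = nestedQuantifier.test(input);
  return { matched, millis: Date.now() - start };
}

// Keep n small: on a backtracking engine each extra "a" roughly doubles the work.
for (const n of [10, 15, 20]) {
  const { matched, millis } = timeNestedQuantifier(n);
  console.log(`n=${n} matched=${matched} took ~${millis}ms`);
}
```

The exact timings depend on the runtime and the machine, so none are quoted here. What matters is how quickly the time grows as `n` goes up.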

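Some languages and libraries (for example RE2, Go’s regexp package, and Rust’s regex crate) have DFA-based engines that do not backtrack, so matching time grows linearly with the length of the input. Their advantage is predictable performance, which eliminates reason #4 in the list above.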
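
To show why a DFA-style matcher runs linearly, here is a hand-rolled sketch in TypeScript of a tiny DFA that accepts simple decimal numbers such as `42` or `3.14` (roughly what `^[0-9]+(\.[0-9]+)?$` accepts). The state names and the function `isDecimal` are hypothetical, and a real DFA engine builds its transition table from the pattern automatically. The point is only that the loop looks at each character once and never goes back.

```typescript
// States of the hand-written DFA (names are illustrative).
type State = "start" | "intDigits" | "dot" | "fracDigits" | "reject";

// One transition per input character; there is no backtracking.
function nextState(state: State, ch: string): State {
  const isDigit = ch >= "0" && ch <= "9";
  switch (state) {
    case "start":
      return isDigit ? "intDigits" : "reject";
    case "intDigits":
      if (isDigit) return "intDigits";
      return ch === "." ? "dot" : "reject";
    case "dot":
      return isDigit ? "fracDigits" : "reject";
    case "fracDigits":
      return isDigit ? "fracDigits" : "reject";
    default:
      return "reject";
  }
}

// Hypothetical helper: accepts "42" and "3.14", rejects "", "3." and ".5".
function isDecimal(input: string): boolean {
  let state: State = "start";
  for (let i = 0; i < input.length; i++) {
    state = nextState(state, input.charAt(i));
    if (state === "reject") return false; // dead state: stop early
  }
  // Only these two states mean a complete number was read.
  return state === "intDigits" || state === "fracDigits";
}
```

The work is bounded by the input length no matter what the input looks like, which is the property that removes reason #4.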

More about regex engines can be found at: https://docs.microsoft.com/en-us/previous-versions/0yzc2yb0(v=vs.100)?redirectedfrom=MSDN

“Java is to JavaScript what Car is to Carpet.” - Chris Heilmann
