Recently I declined two suggested changes to a library I’m maintaining. Both PRs had one thing in common: they introduced a change that required running a regex match in order to do something.
While I love using regexes for small/non-prod scripts, I think it is, in general, a bad idea to use them in production. In over 15 years as a backend developer, I can hardly remember one instance where I thought the use of regex in production code was justified.
After having to explain my reasoning to two engineers, I decided that the next person who asks me about it will receive a link to a blog post (the one you’re reading right now). So here we go:
Reasons against using regex in production code
- It quickly gets difficult to maintain
- It’s error-prone
- It’s difficult to test (and most of the time impossible to get full coverage)
- It can be a performance hog (both memory and CPU)
- It’s usually not the right tool: we can get the same result by either:
  - Using one of the String functions: substring / indexOf / slice / endsWith / startsWith / …
  - Writing a small parser
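To make the last point concrete, here is a minimal sketch of a tiny hand-written parser built from plain String functions. The use case (parsing a version tag such as "v1.22.3") and the names `Version`, `isDigits` and `parseVersion` are made up for illustration; they are not taken from the library mentioned above. The regex it stands in for would be something like `/^v(\d+)\.(\d+)\.(\d+)$/`.

```typescript
// Hypothetical example: parse a version tag such as "v1.22.3" without a regex.
interface Version {
  major: number;
  minor: number;
  patch: number;
}

// Returns true only for non-empty strings made of ASCII digits.
function isDigits(s: string): boolean {
  if (s.length === 0) return false;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 48 || c > 57) return false; // outside '0'..'9'
  }
  return true;
}

// Returns the parsed version, or null when the tag is malformed.
function parseVersion(tag: string): Version | null {
  if (!tag.startsWith("v")) return null;

  // Drop the leading "v" and split on the dots.
  const parts = tag.slice(1).split(".");
  if (parts.length !== 3 || !parts.every(isDigits)) return null;

  const [major, minor, patch] = parts.map(Number);
  return { major, minor, patch };
}
```

It is longer than the one-line regex, but every rejection rule is an explicit branch that can be read, stepped through in a debugger, and covered by its own test.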
So what is regex good for?
- Getting something done quick and dirty
- Code that is not meant for prod, that is, temporary or throwaway code
One additional point
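Most regex engines today are backtracking engines (often described as NFA-based). When a match fails partway, they go back and try the remaining alternatives, and for some patterns and inputs the number of alternatives grows exponentially, which is what makes them memory and CPU hogs. They’re popular because backtracking makes features such as backreferences possible.
Some languages / libraries have DFA-based engines, which run in time linear in the length of the input. They do not backtrack, so they give up the features that depend on backtracking, but their performance is predictable, which eliminates reason #4 in the list above.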
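The exponential case is easy to reproduce. The sketch below uses the textbook pattern `/^(a+)+$/`; it is an illustration I picked, not one of the regexes from the declined PRs, and the function names are hypothetical. On an input that almost matches (a run of "a" followed by a "b"), a backtracking engine tries every way of splitting the run between the inner and outer `+` before it gives up.

```typescript
// Nested quantifiers: a backtracking engine may try exponentially many
// ways to split a run of "a"s before concluding there is no match.
const nestedQuantifier = /^(a+)+$/;

function isAllAsRegex(input: string): boolean {
  return nestedQuantifier.test(input);
}

// Hand-written equivalent: a single pass over the input, no backtracking.
function isAllAsScan(input: string): boolean {
  if (input.length === 0) return false;
  for (let i = 0; i < input.length; i++) {
    if (input[i] !== "a") return false;
  }
  return true;
}

// Hypothetical helper: builds an input that almost matches,
// i.e. n times "a" followed by a single "b".
function almostMatching(n: number): string {
  return "a".repeat(n) + "b";
}
```

Both functions agree on every input, but in a backtracking engine each extra "a" in `almostMatching(n)` roughly doubles the work of `isAllAsRegex`, while `isAllAsScan` stays linear. Try timing both with `n` in the high twenties and increase it carefully.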
More about regex engines can be found at: https://docs.microsoft.com/en-us/previous-versions/0yzc2yb0(v=vs.100)?redirectedfrom=MSDN