Which special characters must be escaped in regular expressions?
8 Jan 2022
In most regular expression engines (PCRE, JavaScript, Python, Go, and Java), these special characters must be escaped outside of character classes:
[ * + ? { . ( ) ^ $ | \
To match one of these metacharacters literally, put a backslash \ before it. For example, to find the text $100, use \$100. To match the backslash itself, double it: \\.
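As a quick check of the rule above, here is a minimal Python sketch. The helper name escape_metacharacters is hypothetical and only illustrates the "add a backslash" rule for the characters listed; it is not a replacement for a library escaping function.

```python
import re

# The metacharacters listed above (outside character classes).
METACHARACTERS = "[*+?{.()^$|\\"

def escape_metacharacters(text):
    """Hypothetical helper: put a backslash before each listed metacharacter."""
    return "".join("\\" + ch if ch in METACHARACTERS else ch for ch in text)

# Escaped dollar sign: matches the literal text $100.
price = re.search(r"\$100", "Total: $100")

# Doubled backslash: matches one literal backslash.
backslash = re.search(r"\\", r"C:\temp")

# Unescaped dollar sign: $ is an end-of-string anchor, so nothing matches.
unescaped = re.search(r"$100", "Total: $100")

print(price.group())
print(backslash.group())
print(unescaped)
print(escape_metacharacters("$100 (a+b)"))
```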
Inside character classes [square brackets], you must escape the following characters:
\ ] -
For example, to find an opening or a closing bracket, use [[\]].
If you need to include a dash in a character class, you can make it the first or the last character instead of escaping it. Use [a-z-] or [a-z\-] to find a lowercase Latin letter or a dash.
If you need to include the caret ^ in a character class, it cannot be the first character; otherwise, it negates the class, which then matches any character except the specified ones. For example, [^aeiouy] means "any character except vowels", while [a^eiouy] means "any vowel or a caret". Alternatively, you can escape the caret: [\^aeiouy].
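The character class rules above can be tried out with a short Python sketch. The sample strings are made up for illustration. Both brackets are escaped here, which is equivalent to [[\]] and avoids the "possible nested set" warning that recent Python versions may emit for a class starting with [[.

```python
import re

sample = "hey^"

# Caret first: negated class, matches anything that is NOT listed.
negated = re.findall(r"[^aeiouy]", sample)

# Caret elsewhere, or escaped: an ordinary literal caret.
literal = re.findall(r"[a^eiouy]", sample)
escaped = re.findall(r"[\^aeiouy]", sample)

# Dash as the last character, or escaped: a literal dash.
dash_last = re.findall(r"[a-z-]", "a-B")
dash_escaped = re.findall(r"[a-z\-]", "a-B")

# Brackets inside a class.
brackets = re.findall(r"[\[\]]", "x[0]")

print(negated, literal, escaped)
print(dash_last, dash_escaped)
print(brackets)
```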
JavaScript
In JavaScript, you also need to escape the slash / in regular expression literals:
/AC\/DC/.test('AC/DC')
Lone closing brackets ] and } are allowed by default, but if you use the 'u' flag, then you must escape them:
/]}/.test(']}') // true
/]}/u.test(']}') // throws an exception
This restriction is specific to JavaScript; lone closing brackets are allowed in the other languages discussed here.
If you create a regular expression on the fly from a user-supplied string, you can use the following function to properly escape the special characters:
function escapeRe(str) {
    return str.replace(/[[\]*+?{}.()^$|\\-]/g, '\\$&');
}
var re = new RegExp(escapeRe(start) + '.*?' + escapeRe(end));
PHP
In PHP, you have the preg_quote function to insert a user-supplied string into a regular expression pattern. In addition to the characters listed above, it also escapes # (in 7.3.0 and higher), the NUL character, and the following characters: = ! < > : -, which do not have a special meaning on their own in PCRE regular expressions but are sometimes used as delimiters. Closing brackets ] and } are escaped, too, which is unnecessary:
preg_match('/]}/', ']}'); // returns 1
Just like in JavaScript, you also need to escape the delimiter, which is usually /, but you can use another special character such as # or = if the slash appears inside your pattern:
if (preg_match('/\/posts\/([0-9]+)/', $path, $matches)) {
}
// Can be simplified to:
if (preg_match('#/posts/([0-9]+)#', $path, $matches)) {
}
Note that preg_quote does not escape the tilde ~ and the slash / by default, so you should not use them as delimiters if you construct regexes from strings, unless you pass the delimiter as the second argument to preg_quote.
In double-quoted strings, \1 and $ are interpreted differently than in regular expressions, so the best practice is:
- to use single quotes with preg_match, preg_replace, etc.;
- to repeat the backslash 4 times if you need to match a literal backslash. This is because you need to escape the backslash in the regular expression, and then escape each of those two backslashes again in the single-quoted string. So it's escaped twice:
$text = 'C:\\Program files\\';
echo $text;
if (preg_match('/C:\\\\Program files\\\\/', $text, $matches)) {
    print_r($matches);
}
Python
Python has a raw string syntax (r''), which conveniently avoids the backslash escaping idiosyncrasies of PHP:
import re
re.match(r'C:\\Program files/Tools', 'C:\\Program files/Tools')
You only need to escape the quote in raw strings:
re.match(r'\'', "'")
re.match(r"'", "'")      # or just use double quotes if you have a regex with a single quote
re.match(r"\"", '"')
re.match(r'"', '"')      # or use single quotes if you have a regex with a double quote
re.match(r'"\'', '"\'')  # multiple quote types; cannot avoid escaping them
A raw string literal cannot end with a single backslash, but this is not a problem for a valid regular expression.
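To make the difference concrete, the following sketch compares a raw pattern with the same pattern written as an ordinary string, where every backslash has to be doubled once more. The path is an illustrative value. Note that the raw literal ends with two backslashes, which is allowed; only an odd number at the end is rejected.

```python
import re

# The text C:\Program files\ written as an ordinary string literal.
path = "C:\\Program files\\"

# The same regular expression written two ways.
raw_pattern = r"C:\\Program files\\"          # raw string: two backslashes per literal one
plain_pattern = "C:\\\\Program files\\\\"     # ordinary string: four backslashes

same_pattern = raw_pattern == plain_pattern
matched = re.fullmatch(raw_pattern, path) is not None

print(same_pattern)
print(matched)
```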
To match a literal ] inside a character class, you can make it the first character: [][] matches a closing or an opening bracket. Aba Search & Replace supports this syntax, and so do Python and PCRE, but not every engine does: in JavaScript, [] is an empty character class. You can also escape the ] character with a backslash, which works in all of these languages: [\][] or [[\]].
For inserting a string into a regular expression, Python offers the re.escape function. Unlike JavaScript with the u flag, Python tolerates escaping non-special punctuation characters, so this function also escapes -, #, &, and ~:
print(re.escape(r'-#&~'))  # prints \-\#\&\~
re.match(r'\@\~', '@~')    # matches
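A typical use of re.escape is building a pattern from user-supplied delimiters. The sketch below assumes a hypothetical helper named between and made-up sample text; it is one possible way to combine the pieces, not the only one.

```python
import re

def between(start, end):
    """Hypothetical helper: match start, then any text (lazily), then end."""
    return re.compile(re.escape(start) + ".*?" + re.escape(end))

# The delimiters contain metacharacters, so they must be escaped first.
pattern = between("(*", "*)")
match = pattern.search("x (* comment *) y (* other *)")

print(pattern.pattern)
print(match.group())

# Without escaping, the same delimiters do not even compile.
try:
    re.compile("(*" + ".*?" + "*)")
    compiled_unescaped = True
except re.error:
    compiled_unescaped = False

print(compiled_unescaped)
```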
Java
Java allows escaping non-special punctuation characters, too:
Assert.assertTrue(Pattern.matches("\\@\\}\\] }]", "@}] }]"));
Similarly to PHP, you need to repeat the backslash character 4 times, but in Java, you also must double the backslash character when escaping other characters:
Assert.assertTrue(Pattern.matches("C:\\\\Program files \\(x86\\)\\\\", "C:\\Program files (x86)\\"));
This is because the backslash must be escaped in a Java string literal, so if you want to pass \\ \[ to the regular expression engine, you need to double each backslash: "\\\\ \\[". There are no raw string literals in Java, so regular expressions are just ordinary strings.
There is the Pattern.quote method for inserting a string into a regular expression. It surrounds the string with \Q and \E, a syntax borrowed from Perl that makes the regex engine treat everything in between as literal text. If the string itself contains \E, the quoted section is closed there, the \E is written as \\E (an escaped backslash followed by E), and quoting resumes:
Assert.assertEquals("\\Q()\\E", Pattern.quote("()"));
Assert.assertEquals("\\Q\\E\\\\E\\Q\\E", Pattern.quote("\\E"));
Assert.assertEquals("\\Q(\\E\\\\E\\Q)\\E", Pattern.quote("(\\E)"));
The \Q...\E syntax is another way to escape multiple special characters at once. Besides Java, it's supported in PHP/PCRE and Go regular expressions, but not in Python or JavaScript.
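In a language without \Q...\E, a rough substitute is to escape the literal part yourself. The Python sketch below uses a hypothetical helper, expand_quoted, that rewrites \Q...\E spans with re.escape. It is a simplification: it does not handle a pattern where \Q is itself preceded by an escaped backslash.

```python
import re

def expand_quoted(pattern):
    """Hypothetical helper: replace each quoted span with its escaped literal text."""
    parts = pattern.split(r"\Q")
    result = [parts[0]]
    for part in parts[1:]:
        # Everything up to \E is literal; an unterminated \Q quotes to the end.
        literal, _, rest = part.partition(r"\E")
        result.append(re.escape(literal))
        result.append(rest)
    return "".join(result)

expanded = expand_quoted(r"a\Q(.)\Eb")
pipes = expand_quoted(r"\Q||\E")

print(expanded)
print(re.fullmatch(pipes, "||") is not None)
```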
Go
Go raw string literals are characters between back quotes: `\(`. It's preferable to use them for regular expressions because you don't need to double-escape the backslash:
r := regexp.MustCompile(`\(text\)`)
fmt.Println(r.FindString("(text)"))
A back quote cannot be used in a raw string literal, so you have to resort to the usual "`" string syntax for it. But this is a rare character.
The \Q...\E syntax is supported, too:
r := regexp.MustCompile(`\Q||\E`)
fmt.Println(r.FindString("||"))
There is a regexp.QuoteMeta function for inserting strings into a regular expression. In addition to the characters listed above, it also escapes closing brackets ] and }.