Empty character class in JavaScript regexes
10 Apr 2023
I contributed to PCRE and wrote two smaller regular expression engines, but I still regularly learn something new about this topic. This time, it's about a regex that never matches.
When using character classes, you can specify the allowed characters in brackets, such as [a-z] or [aeiouy]. But what happens if the character class is empty?
Popular regex engines treat the empty brackets [] differently. In JavaScript, they never match. This is valid JavaScript code, and it always prints false regardless of the value of str:
const str = 'a';
console.log(/[]/.test(str));
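To make "never matches" concrete, here is a minimal sketch that tries the empty class against a few sample strings. The variable names are illustrative only. It also shows that a pattern containing [] can still succeed when the empty class is optional or sits in one branch of an alternation.

```javascript
// An empty class [] matches no character, so a pattern that must consume it fails.
const inputs = ['', 'a', ']', '[]', '\n'];
const anyMatch = inputs.some((s) => /[]/.test(s));
console.log(anyMatch); // false

// A quantifier that allows zero repetitions lets the rest of the pattern succeed.
const optionalEmpty = /^a[]*b$/.test('ab');
console.log(optionalEmpty); // true

// An alternation can still match through its other branch.
const otherBranch = /[]|b/.test('b');
console.log(otherBranch); // true
```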
However, Java, PHP (PCRE), Go, and Python reject the same regex as a syntax error. Depending on the language, you get an exception, a warning, or an error value:
// Java
@Test
void testRegex1() {
    PatternSyntaxException e = assertThrows(PatternSyntaxException.class,
            () -> Pattern.compile("[]"));
    assertEquals("Unclosed character class", e.getDescription());
}
<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);
// Emits a warning: preg_match(): Compilation failed: missing terminating ] for character class
echo preg_match('/[]/', ']') ? 'Match ' : 'No match';
# Python
import re
re.compile('[]')  # raises re.error: "unterminated character set"
In these languages, a closing bracket placed right after the opening bracket is treated as a literal character, so you don't need to escape it:
// Java
@Test
void testRegex2() {
    Pattern p = Pattern.compile("[]]");
    Matcher m = p.matcher("]");
    assertTrue(m.matches());
}
<?php
echo preg_match('/[]]/', ']', $m) ? 'Match ' : 'No match'; // Outputs 'Match'
print_r($m);
# Python
import re
print(re.match('[]]', ']')) # outputs the Match object
// Go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	matched, err := regexp.MatchString(`[]]`, "]")
	fmt.Println(matched, err)
}
This won't work in JavaScript, where the first ] ends the character class. The same regular expression there means an empty character class, which never matches, followed by a literal closing bracket. As a result, the regular expression never matches anything, not even a closing bracket:
// JavaScript
console.log(/[]]/.test(']')); // outputs false
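The difference between the unescaped and the escaped form can be sketched side by side. The sample strings and variable names below are made up for illustration.

```javascript
// /[]]/ is parsed as the empty class [] followed by a literal ].
// The empty class can never match, so the whole pattern fails on every input.
const unescaped = /[]]/;
// /[\]]/ is a single class that contains a literal ].
const escaped = /[\]]/;

const samples = [']', 'a]', '[]]', ''];
const unescapedResults = samples.map((s) => unescaped.test(s));
const escapedResults = samples.map((s) => escaped.test(s));

console.log(unescapedResults); // [ false, false, false, false ]
console.log(escapedResults);   // [ true, true, true, false ]
```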
If you negate the empty character class with ^ in JavaScript, it will match any character including newlines:
console.log(/[^]/.test('')); // outputs false
console.log(/[^]/.test('a')); // outputs true
console.log(/[^]/.test('\n')); // outputs true
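One practical consequence is that [^] can stand in for a dot that also crosses line breaks. The sketch below compares it with a plain dot and with a dot under the s (dotAll) flag, using a made-up two-line string.

```javascript
// Without the s (dotAll) flag, the dot does not match line terminators; [^] does.
const text = 'first line\nsecond line';

const dotMatch = text.match(/first.*second/);        // . stops at \n
const emptyNegMatch = text.match(/first[^]*second/); // [^] crosses the newline
const dotAllMatch = text.match(/first.*second/s);    // s flag lets . match \n too

console.log(dotMatch);                          // null
console.log(JSON.stringify(emptyNegMatch[0])); // "first line\nsecond"
console.log(JSON.stringify(dotAllMatch[0]));   // "first line\nsecond"
```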
Again, this is an invalid regex in other languages. PCRE2 can emulate the JavaScript behavior if you pass the PCRE2_ALLOW_EMPTY_CLASS option to pcre2_compile. PHP never passes this flag.
If you want to match an opening or a closing bracket, this somewhat cryptic regular expression will help you in Java, PHP, Python, or Go: [][]. The first opening bracket starts the character class, which includes the literal closing bracket and the literal opening bracket, and finally, the last closing bracket ends the class.
In JavaScript, you need to escape the closing bracket like this: [\][]
console.log(/[\][]/.test('[')); // outputs true
console.log(/[\][]/.test(']')); // outputs true
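As a usage sketch, the same class with the g flag collects every square bracket in a string. The input string is made up for illustration.

```javascript
// Collect every square bracket in a string using the escaped class [\][].
const brackets = 'a[0][i + 1]'.match(/[\][]/g);
console.log(brackets); // [ '[', ']', '[', ']' ]

// Once ] is escaped, the order of the two brackets inside the class does not matter.
const reordered = 'a[0][i + 1]'.match(/[[\]]/g);
console.log(reordered.join('') === brackets.join('')); // true
```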
In Aba Search and Replace, I chose to support the syntax used in Java/PHP/Python/Go. If you need a regular expression that always fails, there are many other ways to construct one, so it makes more sense to reserve this syntax for a literal closing bracket.
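A few always-failing alternatives, as a minimal sketch in JavaScript. The patterns are common idioms rather than anything specific to Aba Search and Replace, and the test strings are arbitrary.

```javascript
// Patterns that can never match, without relying on an empty character class.
const neverMatches = [
  /(?!)/,    // negative lookahead of the empty pattern, which always matches
  /[^\s\S]/, // a character that is neither whitespace nor non-whitespace
  /\b\B/,    // a position that is both a word boundary and not a word boundary
];

const testStrings = ['', 'a', ']', ' ', 'a\nb'];
const allFail = neverMatches.every((re) => testStrings.every((s) => !re.test(s)));
console.log(allFail); // true
```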
