Character lists [a-z] and classes \s
Character lists and classes serve the same purpose: to match one character of the specified kind. With a list, you explicitly specify the characters you want to match. With a class, you specify only the type (letter, digit, etc.).
Character lists
A character list matches one character from the specified set. You can list all allowable characters (e.g., [abcdef]) or use a range (e.g., [a-f]):
| [aeiouy] | Any English vowel |
| [a-f] or [abcdef] | Any letter from a to f |
| [0-9a-f] | A hexadecimal digit (from 0 to 9, or from A to F) |
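The lists in the table above can be tried in any engine that shares the bracket syntax. The following is a minimal sketch that uses Python's `re` module as a stand-in, not Aba's own engine. The `re.IGNORECASE` flag is an assumption made here so that `[0-9a-f]` also covers A to F, as the table describes.

```python
import re

vowel = re.compile(r"[aeiouy]")
a_to_f = re.compile(r"[a-f]")
# Assumption: case-insensitive search, so a-f also covers A-F.
hex_digit = re.compile(r"[0-9a-f]", re.IGNORECASE)

vowels_found = vowel.findall("regular expression")
hex_found = "".join(hex_digit.findall("#1A2b3C"))

print(vowels_found)            # ['e', 'u', 'a', 'e', 'e', 'i', 'o']
print(hex_found)               # 1A2b3C
print(a_to_f.fullmatch("g"))   # None: g lies outside the range a-f
```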
The letters in a range are ordered by their Unicode code points, even when you search in ANSI files. You can look up the order of letters in the Unicode charts or in the Windows Character Map (charmap) utility. For example, to search for an extended ANSI character, use [¡-ÿ], because U+00A1 ¡ is the first printable character in the Latin-1 block and U+00FF ÿ is the last one.
| [µÞ-öø-ÿ] | A letter from Latin-1 charset |
| [Þ-ÿ] | A Latin-1 letter (if you don't care about µ and ÷) |
| [a-zÞ-ÿ] | A Latin-1 letter, including basic Latin alphabet |
| [À-ž] | A letter from “Latin-1” or “Latin Extended A” blocks (for Western European languages) |
| [α-ω] | A Greek letter (from alpha to omega) |
| [а-яё] | A Russian letter |
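Because ranges follow code point order, it can help to check the code points of the endpoints before writing a range. The sketch below uses Python's `re` module, which also orders ranges by code point, as a stand-in for Aba's engine. It shows why ё has to be listed separately in the Russian range.

```python
import re

# Range endpoints are compared by code point.
first, last = ord("\u00a1"), ord("\u00ff")   # ¡ and ÿ
print(hex(first), hex(last))                 # 0xa1 0xff

latin1_letter = re.compile("[a-z\u00de-\u00ff]")   # same as [a-zÞ-ÿ]
russian = re.compile("[\u0430-\u044f\u0451]")      # same as [а-яё]

e_acute_ok = bool(latin1_letter.fullmatch("\u00e9"))        # é
division_ok = bool(latin1_letter.fullmatch("\u00f7"))       # ÷ sits inside Þ-ÿ
yo_in_plain_range = bool(re.fullmatch("[\u0430-\u044f]", "\u0451"))  # ё against [а-я]
yo_in_full_range = bool(russian.fullmatch("\u0451"))

print(e_acute_ok, division_ok)               # True True
print(yo_in_plain_range, yo_in_full_range)   # False True
```

The letter ё is U+0451, which lies after я (U+044F), so the plain range а-я does not cover it.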
This behavior differs from grep and other Unix regexp tools. For example, when you use grep with a French locale, the range [a-z] includes letters with diacritics (àéè). In Aba, [a-z] matches only letters of the basic Latin alphabet. To include letters with diacritical marks in the search, use [a-zÞ-ÿ] or the \L character class.
A range cannot span more than 256 characters. For example, [Z-Ж] is invalid (here, Z is a Latin letter and Ж is a Cyrillic letter).
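To see whether a range stays within the limit, you can count the code points between its endpoints. The sketch below is illustrative only: `range_width` is a hypothetical helper, and treating the limit as "at most 256 code points, endpoints included" is an assumption about how the limit is counted.

```python
MAX_RANGE = 256  # limit stated in the text; the exact counting rule is an assumption

def range_width(first: str, last: str) -> int:
    """Number of code points covered by the range first-last (hypothetical helper)."""
    return ord(last) - ord(first) + 1

width_latin = range_width("a", "z")
width_extended = range_width("\u00c0", "\u017e")   # À to ž
width_mixed = range_width("Z", "\u0416")           # Latin Z to Cyrillic Ж

print(width_latin)      # 26
print(width_extended)   # 191
print(width_mixed)      # 957, far beyond the limit
```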
If you need to include the characters ] or - in the list, put them right after the opening bracket:
| [][] | Closing bracket ] or opening bracket [ |
| [-a-z] | Latin letters and dash - |
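Python's `re` module happens to follow the same placement rule for ] and -, so it can serve as a stand-in to illustrate the two lists above. This is a sketch, not a test of Aba itself.

```python
import re

bracket = re.compile(r"[][]")          # ] first, so it is taken literally
letters_or_dash = re.compile(r"[-a-z]")  # - first, so it is not a range operator

brackets_found = bracket.findall("a[0] = b[1]")
dashed_word = "".join(letters_or_dash.findall("well-known 42"))

print(brackets_found)   # ['[', ']', '[', ']']
print(dashed_word)      # well-known
```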
By placing the ^ metacharacter right after the opening bracket, you can find any character except the specified ones:
| [^aeiouy] | Any character except vowels |
| [^a-z] | Any character except Latin letters |
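A negated list is handy for stripping unwanted characters. The following sketch again uses Python's `re` module as a stand-in engine with the same ^ syntax.

```python
import re

non_vowels = "".join(re.findall(r"[^aeiouy]", "regex"))
letters_only = re.sub(r"[^a-z]", "", "a1-b2 c3")

print(non_vowels)     # rgx
print(letters_only)   # abc
```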
Character classes
A character class matches one character of the specified type (letter, digit, etc.). Here is the full list of supported classes:
Common classes
| \d | A digit |
| \D | Anything but a digit |
| \w | A word character (a letter, a digit, or an underscore _) |
| \W | Any character except word characters |
| \s | Space, tab, newline character, or other separator |
| \a | A letter |
Note that the classes include international characters. For example, \w matches not only English letters, but also German umlauts and French letters with diacritics (as well as Greek and Russian letters, Chinese ideographs, etc.). This differs from Perl regular expressions, which include only English letters in the \w class and require the special notation \p{IsAlpha} for other languages.
If you need to find only Latin letters (say, you are looking for programming language identifiers), use [a-z].
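The difference between a Unicode-aware \w and an explicit Latin list can be sketched as follows. Python's `re` module is used as a stand-in because its \w is also Unicode-aware for text strings. The identifier pattern is an illustration chosen here, not a pattern from Aba's documentation.

```python
import re

text = "Stra\u00dfe \u0447\u0430\u0439 x_1"   # "Straße чай x_1"
words = re.findall(r"\w+", text)
print(len(words))   # 3: the German and Russian words count as word characters

# Illustrative identifier pattern built from explicit Latin lists.
ident = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
ascii_ok = ident.fullmatch("total_1") is not None
umlaut_ok = ident.fullmatch("gr\u00f6\u00dfe") is not None          # "größe"
umlaut_word_chars = re.fullmatch(r"\w+", "gr\u00f6\u00dfe") is not None

print(ascii_ok, umlaut_ok, umlaut_word_chars)   # True False True
```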
Also, note that \s matches newlines. If you want to match only spaces and tabs, use a character list: put space and tab between the brackets [ ]. Use Ctrl+Alt+Tab to insert the tab character.
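The effect of \s matching newlines is easy to see when splitting text. A minimal sketch, using Python's `re` module as a stand-in (its \s also matches newlines):

```python
import re

text = "a b\tc\nd"

split_on_any_space = re.split(r"\s", text)
split_on_space_or_tab = re.split(r"[ \t]", text)

print(split_on_any_space)      # ['a', 'b', 'c', 'd']
print(split_on_space_or_tab)   # ['a', 'b', 'c\nd']
```

With the list [ \t], the newline between c and d is left untouched.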
Character categories
| \d | A digit or a numeric character (e.g., 4 or ½) |
| \u | An uppercase letter (e.g., W, Ψ, or Ł) |
| \l | A lowercase letter (š or я) |
| \p | A punctuation mark (, or ?) |
| \s | A separator (e.g., space or newline) |
| \y | A symbol (≈, +, or ♣) |
| \c | A control character (newline or escape) |
| \f | A format character (soft hyphen or zero width joiner) |
| \o | A private use character |
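The categories above look like the Unicode general categories (letters, numbers, punctuation, separators, symbols, control, format, private use). Assuming that correspondence, which the text does not state explicitly, you can inspect the category of a character with Python's standard `unicodedata` module:

```python
import unicodedata

# Sample characters taken from the table above (written as escapes where needed).
samples = {
    "digit 4": "4",
    "fraction one half": "\u00bd",
    "capital psi": "\u03a8",
    "small s with caron": "\u0161",
    "comma": ",",
    "almost equal sign": "\u2248",
    "newline": "\n",
    "soft hyphen": "\u00ad",
    "private use U+E000": "\ue000",
    "space": " ",
}

categories = {label: unicodedata.category(ch) for label, ch in samples.items()}

for label, cat in categories.items():
    print(f"{label}: {cat}")
```

The two-letter codes are Unicode general categories: N* for numbers, Lu and Ll for uppercase and lowercase letters, P* for punctuation, S* for symbols, Z* for separators, Cc for control, Cf for format, and Co for private use characters.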
Scripts
| \L | Latin letter (e.g., X or ÿ) |
| \G | Greek letter (e.g., Σ or ω) |
| \C | Cyrillic letter (e.g., Д or ў) |
| \H | Han ideograph (女 or 茶) |
| \C | Common character, which is used in several scripts (7 or .) |
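Python's standard library has no script property, so the sketch below falls back on a rough heuristic: the first word of the Unicode character name. `script_hint` is a hypothetical helper for illustration only. It is not how Aba determines scripts, and it does not work for every character.

```python
import unicodedata

def script_hint(ch: str) -> str:
    """First word of the Unicode character name (rough heuristic, not a script lookup)."""
    return unicodedata.name(ch).split()[0]

# X, capital sigma, Cyrillic De, Cyrillic short u, Han "woman", 7, full stop
sample_chars = ["X", "\u03a3", "\u0414", "\u045e", "\u5973", "7", "."]
hints = [script_hint(ch) for ch in sample_chars]

print(hints)
# ['LATIN', 'GREEK', 'CYRILLIC', 'CYRILLIC', 'CJK', 'DIGIT', 'FULL']
```

Characters such as 7 and the full stop carry no script name, which is consistent with their being shared between several scripts.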
Any character
Use a dot . to match any character. Note that the dot matches newlines, too. (In Perl terms, Aba always has the /s modifier on.) If you want to match any character except a newline, use \N (borrowed from Perl 6).
| . | Any character |
| \N | Any character except CR and LF |
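To reproduce both behaviors in another engine, you can switch on its "dot matches newline" mode and spell out \N as a negated list. The sketch below does this with Python's `re` module: `re.DOTALL` plays the role of the /s modifier, and `[^\r\n]` stands in for \N as described in the table.

```python
import re

text = "one\r\ntwo"

# Dot with DOTALL: matches every character, including CR and LF.
whole_text = re.findall(r".+", text, re.DOTALL)

# Equivalent of \N as described above: anything except CR and LF.
per_line = re.findall(r"[^\r\n]+", text)

print(whole_text)   # ['one\r\ntwo']
print(per_line)     # ['one', 'two']
```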
