Character lists [a-z] and classes \s

The purpose of character lists and classes is the same: to match one character of the specified kind. With a list, you explicitly specify the characters you want to match. With a class, you only specify the type (letter, digit, etc.).

Character lists

A character list matches one character from the specified set. You can list all allowed characters (e.g., [abcdef]) or use a range (e.g., [a-f]):

`[aeiouy]`	Any English vowel
`[a-f]` or `[abcdef]`	Any letter from a to f
`[0-9a-f]`	A hexadecimal digit (0 to 9, or a to f)
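To try these lists outside Aba, you can use Python's `re` module. Its engine is not Aba's, but simple lists like these behave the same way in both. A minimal sketch with a made-up sample string:

```python
import re

text = "deadbeef cafe 42"

# [aeiouy]: one vowel per match
vowels = re.findall(r"[aeiouy]", text)

# [a-f] and [abcdef] describe the same set of characters
assert re.findall(r"[a-f]", text) == re.findall(r"[abcdef]", text)

# [0-9a-f]+: runs of hexadecimal digits
hex_runs = re.findall(r"[0-9a-f]+", text)

print(vowels)    # ['e', 'a', 'e', 'e', 'a', 'e']
print(hex_runs)  # ['deadbeef', 'cafe', '42']
```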

Ranges are ordered by Unicode code point, even if you search in ANSI files. You can look up the order of characters in the Unicode charts or in the Windows Character Map (charmap) utility. For example, to search for an extended ANSI character, use [¡-ÿ], because U+00A1 ¡ is the first printable character in the Latin-1 block and U+00FF ÿ is the last one.

`[µÞ-öø-ÿ]`	A letter from the Latin-1 character set
`[Þ-ÿ]`	A Latin-1 letter, roughly (it misses µ and also matches ÷)
`[a-zÞ-ÿ]`	A Latin-1 letter, including the basic Latin alphabet
`[À-ž]`	A letter from the “Latin-1” or “Latin Extended-A” blocks (for Western and Central European languages)
`[a-ząćęłńóśźż]`	A Polish letter
`[^ -~\r\n\t]`	A character outside printable ASCII: an extended (non-ASCII) character or an unusual control character
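Since ranges are ordered by code point, the entries above can be checked by looking at the code points of their endpoints. The Python sketch below (an illustration, not Aba's implementation) prints those code points and confirms why `[Þ-ÿ]` matches ÷ but misses µ, while `[µÞ-öø-ÿ]` does the opposite.

```python
import re

# Code points of the range endpoints and of the two special characters
for ch in "¡ÿÞöøµ÷":
    print(ch, f"U+{ord(ch):04X}")

rough = re.compile(r"[Þ-ÿ]")
precise = re.compile(r"[µÞ-öø-ÿ]")

print(bool(rough.fullmatch("÷")))    # True: U+00F7 lies between Þ and ÿ
print(bool(rough.fullmatch("µ")))    # False: U+00B5 lies before Þ
print(bool(precise.fullmatch("÷")))  # False: the gap between ö and ø skips it
print(bool(precise.fullmatch("µ")))  # True: µ is listed explicitly
```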

This behavior differs from grep and other Unix regex tools. For example, when you use grep with a French locale, the range [a-z] can include letters with diacritics (àéè). In Aba, [a-z] matches only a letter from the basic Latin alphabet. To include letters with diacritics in the search, use [a-zÞ-ÿ].
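Python's `re` module also compares range endpoints by code point, without locale collation, so it can preview the difference between `[a-z]` and `[a-zÞ-ÿ]`. A short sketch with a made-up French sample:

```python
import re

text = "café crème"

# [a-z] covers only the basic Latin alphabet, so accented letters split the words
basic = re.findall(r"[a-z]+", text)

# Adding the Þ-ÿ range brings in the Latin-1 letters with diacritics
latin1 = re.findall(r"[a-zÞ-ÿ]+", text)

print(basic)   # ['caf', 'cr', 'me']
print(latin1)  # ['café', 'crème']
```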

A range cannot span more than 256 characters; for example, [Z-Ж] is invalid (Z is a Latin letter and Ж is a Cyrillic letter).
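The size of a range is the code point of its last character minus the code point of its first, plus one. The helpers `range_size` and `range_is_allowed` below are hypothetical (they are not part of Aba); they only apply the 256-character limit stated above, and treating a reversed range as invalid is an assumption.

```python
MAX_RANGE = 256  # limit stated in this help page


def range_size(first: str, last: str) -> int:
    """Number of characters covered by the range [first-last]."""
    return ord(last) - ord(first) + 1


def range_is_allowed(first: str, last: str) -> bool:
    """True if the range is not reversed (assumption) and fits the limit."""
    return 0 < range_size(first, last) <= MAX_RANGE


print(range_size("a", "z"), range_is_allowed("a", "z"))  # 26 True
print(range_size("À", "ž"), range_is_allowed("À", "ž"))  # 191 True
print(range_size("Z", "Ж"), range_is_allowed("Z", "Ж"))  # 957 False
```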

If you need to include the characters ] or - in the list, put them right after the opening bracket:

`[][]`	Closing bracket ] or opening bracket [
`[-a-z]`	Latin letters and dash -

Another option is to escape them with a backslash:

`[[\]]`	Opening bracket [ or closing bracket ]
`[a-z\-]`	Latin letters and dash -
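Python's `re` module accepts the same placements, so the sketch below can be used to check them. One difference: Python emits a FutureWarning for `[[` at the start of a list (it reserves that for possible nested sets), so the escaped form here is written `[\[\]]` rather than `[[\]]`.

```python
import re

text = "a[1]-b"

# ] placed first is literal; [ needs no special treatment inside a list
brackets = re.findall(r"[][]", text)

# - placed first, or escaped, is a literal dash rather than a range operator
dash_first = re.findall(r"[-a-z]", text)
dash_escaped = re.findall(r"[a-z\-]", text)

# Escaping both brackets works too
escaped = re.findall(r"[\[\]]", text)

print(brackets)    # ['[', ']']
print(dash_first)  # ['a', '-', 'b']
print(dash_first == dash_escaped, brackets == escaped)  # True True
```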

Put the ^ metacharacter right after the opening bracket to find any character except the specified ones:

`[^aeiouy]`	Any character except vowels
`[^a-z]`	Any character except Latin letters
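Collecting every match shows what a negated list covers. In this Python sketch (sample text made up for illustration), the negated lists also match spaces, digits, and punctuation, because those characters are not in the list either.

```python
import re

text = "regex 101!"

# [^aeiouy]: every character that is not one of the listed vowels
non_vowels = "".join(re.findall(r"[^aeiouy]", text))

# [^a-z]: every character that is not a basic Latin lowercase letter
non_letters = re.findall(r"[^a-z]", text)

print(non_vowels)   # rgx 101!
print(non_letters)  # [' ', '1', '0', '1', '!']
```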

You can use character classes and escape sequences inside the brackets:

`[\t ]`	A tab or a space
`[\d\s]`	A digit, a space, a tab, a newline, or another separator
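The same combinations work inside brackets in Python's `re` module, which is handy for checking what a mixed list matches. A sketch with made-up input containing a tab and a newline:

```python
import re

text = "id\t42 x\n7"

# [\t ]: a tab or a space
blanks = re.findall(r"[\t ]", text)

# [\d\s]: a digit or any whitespace character (space, tab, newline, ...)
digits_or_space = re.findall(r"[\d\s]", text)

print(blanks)           # ['\t', ' ']
print(digits_or_space)  # ['\t', '4', '2', ' ', '\n', '7']
```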

Character classes

A character class matches one character of the specified type (letter, digit, etc.). Here is the full list of supported classes:

`\d`	A digit or a numeric character (e.g., `4` or `½`)
`\D`	Anything but a digit
`\w`	A word character (a letter, a digit, or an underscore _)
`\W`	Any character except word characters
`\s`	Space, tab, newline character, or other separator
`\S`	Anything but a separator
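Python 3's `re` module also applies Unicode rules to these classes in string patterns, so it can approximate the table above. The definitions are not guaranteed to match Aba's exactly: for example, Python's `\d` matches only decimal digits such as `4`, not `½`. The sample text is made up for illustration.

```python
import re

text = "Grüße, Привет! 中文_2"

digits = re.findall(r"\d", text)        # decimal digits only
words = re.findall(r"\w+", text)        # letters of any script, digits, underscore
non_words = re.findall(r"\W+", text)    # punctuation and spaces
non_spaces = re.findall(r"\S+", text)   # everything between separators

print(digits)      # ['2']
print(words)       # ['Grüße', 'Привет', '中文_2']
print(non_words)   # [', ', '! ']
print(non_spaces)  # ['Grüße,', 'Привет!', '中文_2']
```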

Note that the classes include international characters. For example, \w matches not only English letters, but also German umlauts and French letters with diacritics (and also Greek and Russian letters, Chinese ideographs, etc.). This differs from traditional Perl regular expressions, where the \w class covers only English letters, digits, and the underscore (unless Unicode rules are in effect), and other alphabets require special notation such as \p{IsAlpha}.
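Python shows both behaviors side by side: by default `\w` is Unicode-aware, while the `re.ASCII` flag restricts it to ASCII letters, digits, and the underscore, much like the ASCII-only `\w` described above. A minimal sketch:

```python
import re

word = "Grüße"

# Default for str patterns: \w is Unicode-aware, so ü and ß count as letters
unicode_words = re.findall(r"\w+", word)

# re.ASCII limits \w to [a-zA-Z0-9_], so ü and ß split the word
ascii_words = re.findall(r"\w+", word, re.ASCII)

print(unicode_words)  # ['Grüße']
print(ascii_words)    # ['Gr', 'e']
```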

If you need to find only Latin letters (say, you are looking for programming language identifiers), use [a-z].

Also, note that \s matches newlines. If you want to match only spaces and tabs, please use [ \t].
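A quick Python check (illustrative only, with a made-up sample) makes the difference between `\s` and `[ \t]` visible:

```python
import re

text = "a b\tc\nd"

any_space = re.findall(r"\s", text)       # space, tab, and newline
blanks_only = re.findall(r"[ \t]", text)  # space and tab; the newline is skipped

print(any_space)    # [' ', '\t', '\n']
print(blanks_only)  # [' ', '\t']
```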

Escape sequences

You can use C-like escape sequences \t, \r, \n to match tab, carriage return, and line feed. Only these three C-like escape sequences are supported.

`\t`	Tab
`\r`	Carriage return
`\n`	Line feed (newline)

However, it's recommended to press Enter instead of typing \r\n, because a line break entered with Enter matches both Windows (CR+LF) and Unix-style (LF) line terminators.
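If you do type the line break as escape sequences, one common way to cover both conventions is `\r?\n` (an optional CR followed by LF). The Python sketch below illustrates that pattern; it is not a description of how Aba handles a typed Enter.

```python
import re

text = "one\r\ntwo\nthree"

crlf_only = re.findall(r"\r\n", text)  # only the Windows-style terminator
lines = re.split(r"\r?\n", text)       # optional CR + LF: both conventions

print(crlf_only)  # ['\r\n']
print(lines)      # ['one', 'two', 'three']
```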

Any character

Use a dot . to match any character. Note that a dot matches newlines, too. (In Perl terms, Aba always has the /s modifier on.) If you want to match any character except a newline, use \N (borrowed from Perl 6).

`.`	Any character
`\N`	Any character except CR and LF
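Python's `re` module has different defaults, which makes the comparison useful: its `.` skips LF unless the `re.DOTALL` flag (the equivalent of Perl's /s) is set, and it has no bare `\N`. The sketch below uses `[^\r\n]` as a stand-in for Aba's `\N`.

```python
import re

text = "ab\r\ncd"

default_dot = re.findall(r".", text)        # Python default: . skips LF (but not CR)
dotall = re.findall(r".", text, re.DOTALL)  # like Aba's dot: every character
not_newline = re.findall(r"[^\r\n]+", text) # stand-in for Aba's \N

print(default_dot)  # ['a', 'b', '\r', 'c', 'd']
print(dotall)       # ['a', 'b', '\r', '\n', 'c', 'd']
print(not_newline)  # ['ab', 'cd']
```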

This is a page from the Aba Search and Replace help file.
