
The elements of regular expressions

Character lists [a-z] and classes \s

A character list matches one character in the specified range:

[a-z] Any English letter (from a to z)
[aeiouy] Any English vowel
[^aeiouy ] Any character that is neither a vowel nor a space
[0-9a-f] A hexadecimal digit (from 0 to 9, or from a to f)

A character class matches one character of the specified type (letter, digit, etc.):

\d A digit
\s Space, tab or line separator
\w A word character (a letter, a digit, or an underscore)

Dot (.) matches any character.
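
The lists and classes above can be tried out in a short sketch. It uses Python's re module as a stand-in engine, which is an assumption: Aba Search and Replace may differ in details such as case sensitivity. The sample strings are made up for illustration.

```python
import re

# Character lists: each match is exactly one character from the list.
vowels = re.findall(r"[aeiouy]", "regular")           # ['e', 'u', 'a']
consonants = re.findall(r"[^aeiouy ]", "a big cat")   # ['b', 'g', 'c', 't']
hex_digits = re.findall(r"[0-9a-f]", "0x1f")          # ['0', '1', 'f']

# Character classes: each match is one character of the given type.
digits = re.findall(r"\d", "bay 42")                  # ['4', '2']
spaces = re.findall(r"\s", "a b\tc")                  # [' ', '\t']
word_chars = re.findall(r"\w", "a_1-")                # ['a', '_', '1']

print(vowels, consonants, hex_digits, digits, spaces, word_chars)
```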

Repetitions * + ?

The star * matches the previous element zero or more times, the plus + matches it one or more times, and the question mark ? matches it zero or one time. Use *? and +? for non-greedy matching. Braces {min,max} match the previous element from min to max times.

0* Zero or more zeros
\w+ A word (one or more word characters)
(http://)? Optional http:// string
<b>(.*?)</b> The contents of a <b> tag (non-greedy)
\d{4} Exactly four digits
(abc){1,4} “abc” repeated from one to four times
[a-z]{3,}? At least three letters (non-greedy)
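
The difference between greedy and non-greedy repetition is easiest to see side by side. The following is a minimal sketch in Python's re module (an assumed stand-in, not the Aba engine itself), with made-up sample text.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy .* runs to the LAST </b>; non-greedy .*? stops at the first one.
greedy = re.findall(r"<b>(.*)</b>", html)    # ['one</b> and <b>two']
lazy = re.findall(r"<b>(.*?)</b>", html)     # ['one', 'two']

# {4} means exactly four repetitions of the previous element.
years = re.findall(r"\d{4}", "from 1999 to 2024")   # ['1999', '2024']

# {3,} is greedy by default; {3,}? takes as few characters as allowed.
at_least_three = re.search(r"[a-z]{3,}", "abcdef").group(0)   # 'abcdef'
lazy_three = re.search(r"[a-z]{3,}?", "abcdef").group(0)      # 'abc'

# {1,4} accepts one to four repetitions, so five copies do not fully match.
two_copies = re.fullmatch(r"(abc){1,4}", "abc" * 2) is not None    # True
five_copies = re.fullmatch(r"(abc){1,4}", "abc" * 5) is not None   # False

print(greedy, lazy, years, at_least_three, lazy_three, two_copies, five_copies)
```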

Anchors ^ and $

The caret ^ matches at the beginning of a line, and the dollar sign $ matches at the end of a line:

^[0-9] Digit at the beginning of a line
\\$ Backslash at the end of a line
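
A small sketch of both anchors, again assuming Python's re module as the engine. One difference to be aware of: in Python, ^ and $ match at every line boundary only when the re.MULTILINE flag is set; without it they match only at the start and end of the whole text. The sample text is invented.

```python
import re

# Three lines; the last one ends with a single backslash.
text = "1 apple\nbanana 2\n3 cherries\\"

# Digits that begin a line (the 2 in "banana 2" is not at a line start).
leading_digits = re.findall(r"^[0-9]", text, flags=re.MULTILINE)   # ['1', '3']

# A backslash at the end of a line; \\ in the pattern means one literal backslash.
trailing_backslashes = re.findall(r"\\$", text, flags=re.MULTILINE)

print(leading_digits, len(trailing_backslashes))
```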

Alternative |

(http|ftp|https):// Any of these protocols: http:// or ftp:// or https://
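
The alternative above can be checked with a short sketch in Python's re module (an assumed stand-in engine; the URLs are made up). In a backtracking engine such as Python's, alternatives are tried from left to right, and https:// still matches here because the engine falls back to the third alternative when "http" is not followed by "://".

```python
import re

urls = "http://a.example ftp://b.example https://c.example mailto:d"

# findall returns the text captured by the parenthesized group for each match.
protocols = re.findall(r"(http|ftp|https)://", urls)

print(protocols)   # ['http', 'ftp', 'https']
```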

Extension syntax

Some advanced features are available through extension syntax: an opening parenthesis, a question mark, and then one or two characters that determine the meaning of the element:

(?=abc) Check that the next characters are “abc” (lookahead)
(?:abc)+ “abc” repeated one or more times (without capturing a subexpression)
(?>.*?,){20}slithy A comma-separated list of 21 values, the last of which is “slithy” (using atomic group)
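
The lookahead and the non-capturing group can be illustrated with a minimal sketch in Python's re module, which is an assumed stand-in for the engine described here. The atomic group (?>...) is left out on purpose, because Python's re supports it only from version 3.11 onward.

```python
import re

# Lookahead: "x" matches only if "abc" follows, but "abc" is not consumed.
m = re.search(r"x(?=abc)", "xabc")
lookahead_text = m.group(0)   # 'x'
lookahead_end = m.end()       # 1, so the match stops before "abc"

# With a capturing group, findall reports the group (its last repetition).
capturing = re.findall(r"(abc)+", "abcabc")         # ['abc']

# With a non-capturing group there is no subexpression, so the whole match is reported.
non_capturing = re.findall(r"(?:abc)+", "abcabc")   # ['abcabc']

print(lookahead_text, lookahead_end, capturing, non_capturing)
```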

Backreferences \1

Parentheses mark subexpressions, which are numbered from left to right. You can refer to previously defined subexpressions with \1, \2, etc. syntax:

(\d+) \1 Find two equal numbers with space between them (e.g., 123 123)
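
As a sketch of how the backreference behaves, the following uses Python's re module (an assumption; the syntax \1 is the same, but the engine is not Aba's). The sample strings are invented.

```python
import re

# \1 must repeat exactly the text that group 1 captured.
found = re.search(r"(\d+) \1", "ids: 123 123 and 45 46")
whole_match = found.group(0)    # '123 123'
first_group = found.group(1)    # '123'

# "45 46" contains two numbers, but they are not equal, so nothing matches.
not_found = re.search(r"(\d+) \1", "45 46")   # None

print(whole_match, first_group, not_found)
```

Note that the pattern is not anchored to word boundaries, so a string such as "12 123" would also match (as "12 12").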

Text and metacharacters

If you want to search for a word or a number, you can enter it as is. For example, to search for cocoa, just type this word.

The metacharacters are:

[ ] * + ? { } . ( ) ^ $ | \

If you want to find one of the metacharacters, type \ before it. For example, to find the text $100, use \$100. If you want to find the backslash itself, repeat it: \\.
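
The effect of escaping can be seen in a short sketch using Python's re module (an assumed stand-in engine). The re.escape helper shown at the end is specific to Python and is not part of the regular expression syntax itself; the sample strings are made up.

```python
import re

# Unescaped, $ is an end-of-line anchor, so "$100" cannot match the literal text.
unescaped = re.search(r"$100", "total: $100")        # None

# Escaped, \$ stands for a literal dollar sign.
escaped = re.search(r"\$100", "total: $100").group(0)   # '$100'

# \\ in the pattern matches one literal backslash.
backslash_count = len(re.findall(r"\\", r"C:\temp\file"))   # 2

# Python can add the backslashes for you.
auto_escaped = re.escape("$100")   # the four-character pattern text \$100

print(unescaped, escaped, backslash_count, auto_escaped)
```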

This is a page from the Aba Search and Replace help file.