The elements of regular expressions

Character lists [a-z] and classes \s

A character list matches one character from the specified set or range:

[a-z] Any lowercase English letter (from a to z)
[aeiouy] Any English vowel
[^aeiouy ] Neither a vowel nor a space
[0-9a-f] A hexadecimal digit (from 0 to 9, or from a to f)

A character class matches one character of the specified type (letter, digit, etc.):

\d A digit
\s Space, tab or line separator
\w A word character (a letter, a digit, or an underscore)

Dot (.) matches any single character. In most regex flavors it does not match a line break by default.
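
As a rough illustration of the lists and classes above, here is a minimal sketch using Python's re module. The choice of Python and the sample strings are assumptions made for this example; other regex engines may treat \w, \s, and the dot slightly differently.

```python
import re

# [0-9a-f] matches exactly one lowercase hexadecimal digit at a time.
hex_digits = re.findall(r"[0-9a-f]", "c0ffee!")
# ['c', '0', 'f', 'f', 'e', 'e']

# [^aeiouy ] matches one character that is neither a vowel nor a space.
consonants = re.findall(r"[^aeiouy ]", "hello you")
# ['h', 'l', 'l']

sample = "Room 42, floor 3b"  # hypothetical sample text

# \d matches one digit, \s matches one whitespace character.
digits = re.findall(r"\d", sample)   # ['4', '2', '3']
spaces = re.findall(r"\s", sample)   # three single spaces

# In Python, \w matches letters, digits, and the underscore.
word_chars = re.findall(r"\w", "a_1 -")  # ['a', '_', '1']

# The dot matches any character except a line break (Python's default).
dot_matches = re.findall(r".", "a\nb")   # ['a', 'b']
```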

Repetitions * + ?

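The star * matches the previous element zero or more times, the plus + matches it one or more times, and the question mark ? matches it zero or one time. By default these repetitions are greedy: they match as many characters as possible. Use *? and +? for non-greedy matching, which matches as few characters as possible. Braces {min,max} match the previous element from min to max times.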

0* Zero or more zeros
\w+ A word (one or more word characters)
(http://)? Optional http:// string
<b>(.*?)</b> The contents of a <b> tag (non-greedy)
\d{4} Exactly four digits
(abc){1,4} “abc” repeated from one to four times
[a-z]{3,}? At least three letters (non-greedy)
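
The difference between greedy and non-greedy repetition is easiest to see side by side. The following is a small sketch in Python's re module; the sample strings are made up for the example.

```python
import re

html = "<b>one</b> and <b>two</b>"  # hypothetical sample text

# Greedy: .* runs to the last </b> it can find.
greedy = re.findall(r"<b>(.*)</b>", html)
# ['one</b> and <b>two']

# Non-greedy: .*? stops at the first </b>.
lazy = re.findall(r"<b>(.*?)</b>", html)
# ['one', 'two']

# \d{4} matches exactly four digits.
years = re.findall(r"\d{4}", "Built in 1998, rebuilt in 2015")
# ['1998', '2015']

# (abc){1,4} accepts one to four repetitions; fullmatch requires the
# whole string to match, so five repetitions are rejected.
two_ok = re.fullmatch(r"(abc){1,4}", "abc" * 2) is not None   # True
five_ok = re.fullmatch(r"(abc){1,4}", "abc" * 5) is not None  # False
```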

Anchors ^ and $

The caret ^ matches at the beginning of a line, and the dollar sign $ matches at the end of a line:

^[0-9] Digit at the beginning of a line
\\$ Backslash at the end of a line
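
A short sketch of the two anchors in Python follows. Note one assumption specific to this engine: Python's re module anchors ^ and $ to the whole string unless the re.MULTILINE flag is passed, so the flag is used here to get the per-line behavior described above.

```python
import re

# Three lines; the last one ends with a single backslash.
text = "1 apple\nbanana 2\n3 cherries\\"

# ^[0-9] finds a digit only when it is the first character of a line.
leading_digits = re.findall(r"^[0-9]", text, flags=re.MULTILINE)
# ['1', '3']

# \\$ finds a backslash only when it is the last character of a line.
trailing_backslashes = re.findall(r"\\$", text, flags=re.MULTILINE)
# one match: a single backslash
```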

Alternative |

(http|ftp|https):// Any of these protocols: http:// or ftp:// or https://
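
To show how the alternatives are tried, here is a minimal Python sketch. The helper name scheme_of and the example URLs are hypothetical, introduced only for this illustration.

```python
import re

PROTOCOL = re.compile(r"(http|ftp|https)://")

def scheme_of(url):
    """Return the protocol name if the URL starts with one of the
    listed alternatives, otherwise None. (Hypothetical helper.)"""
    match = PROTOCOL.match(url)
    return match.group(1) if match else None

urls = [
    "http://a.example",
    "https://b.example",
    "ftp://c.example",
    "mailto:d@example.com",
]
schemes = [scheme_of(u) for u in urls]
# ['http', 'https', 'ftp', None]
```

For https://b.example the engine first tries http, fails to find :// after it, and then backtracks until the https alternative succeeds.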

Extension syntax

Some advanced features are available through extension syntax: an opening parenthesis, a question mark, and then one or two characters that determine the meaning of the element:

(?=abc) Check that the next characters are “abc” (lookahead)
(?:abc)+ “abc” repeated one or more times (without capturing a subexpression)
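
The two extensions listed above can be sketched in Python as follows. The sample strings are invented for the example, and the behavior of findall with capturing groups is specific to Python's re module.

```python
import re

# Lookahead: (?=abc) checks that "abc" follows but does not consume it,
# so only the digit itself is returned.
before_abc = re.findall(r"\d(?=abc)", "1abc 2abd 3abc")
# ['1', '3']

# Non-capturing group: (?:abc)+ repeats "abc" without creating a
# numbered subexpression, so findall returns the whole match.
non_capturing = re.findall(r"(?:abc)+", "abcabc x abc")
# ['abcabc', 'abc']

# With a capturing group, Python's findall returns the group instead,
# which holds only the last repetition.
capturing = re.findall(r"(abc)+", "abcabc x abc")
# ['abc', 'abc']
```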

Backreferences \1

Parentheses mark subexpressions, which are numbered from left to right. You can refer to previously defined subexpressions with \1, \2, etc. syntax:

(\d+) \1 Find two equal numbers separated by a space (e.g., 123 123)
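
A minimal Python sketch of the backreference above; the input strings are assumptions chosen for the illustration.

```python
import re

REPEATED_NUMBER = re.compile(r"(\d+) \1")

# \1 must match exactly the text captured by the first group.
hit = REPEATED_NUMBER.search("codes: 123 123 and 45 46")
repeated = hit.group() if hit else None   # '123 123'
captured = hit.group(1) if hit else None  # '123'

# "12 13" has two numbers, but they are not equal, so nothing matches.
miss = REPEATED_NUMBER.search("12 13")    # None
```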

Text and metacharacters

To search for a word or a number, enter it as is. For example, to search for cocoa, just type cocoa.

The metacharacters are:

[ ] * + ? { } . ( ) ^ $ | \

To find one of the metacharacters as literal text, type \ before it. For example, to find the text $100, use \$100. To find the backslash itself, repeat it: \\.
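
The escaping rule can be checked with a small Python sketch. The sample strings are made up, and re.escape is a convenience of Python's re module, not part of the regex syntax itself.

```python
import re

text = "Total: $100"  # hypothetical sample text

# Unescaped, $ is an anchor (end of line), so "$100" cannot match.
unescaped = re.search(r"$100", text)          # None

# Escaped, \$ matches a literal dollar sign.
escaped = re.search(r"\$100", text).group()   # '$100'

# A literal backslash is written as \\ in the pattern.
backslash = re.search(r"\\", r"C:\temp").group()  # a single backslash

# re.escape adds the backslashes for you when the search text
# comes from user input.
auto_escaped = re.escape("$100")              # '\\$100', i.e. \$100
```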