Replace only the Nth match
10 May 2025
Here is another practical task. We need to replace only the second and subsequent matches in each file.
For example, we want to insert <a name="..."> tags before the second and any subsequent headers in each HTML file. There are multiple headers in each file:

And we want to insert a name to be able to refer to each header. The final result should look like this (the added tag is bold):
<a name="international_search_test"><h2>International search test</h2>
We can solve this task with a lookbehind:

(?<=<h2>.*?)<h2>
This regular expression matches <h2>, but only if there is another <h2> before it.
With Aba Search and Replace, there is a more flexible way to do this. You can match all <h2> tags but change only the second, third, and later ones, leaving the first tag intact. The pattern is simple:
<h2>(.*?)</h2>
But in the replacement, we check if the match number is equal to one:
\( if Aba.matchNoInFile() == 1 { \0 } else { '<a name="' \1.replace(' ', '_').toLower() '">' \0 } )
If yes, we return the whole match \0 without any change. If not, we add <a name="..."> before it. We also use the toLower function to convert the name to lowercase and the replace function to replace spaces with underscores.

You can easily modify this one-liner to replace the first three tags in each file only, or replace every second tag, which is more complicated with a lookbehind.
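If you need the same logic outside Aba, here is a minimal Python sketch of the idea. It is an illustration only, not how Aba works internally: the add_anchors function and the sample page are made up for this example, and the local counter plays the role of Aba.matchNoInFile().

```python
import re


def add_anchors(html: str) -> str:
    """Insert an <a name="..."> tag before every <h2> header except the first."""
    count = 0  # stands in for Aba.matchNoInFile() (assumption for this sketch)

    def replace(match: re.Match) -> str:
        nonlocal count
        count += 1
        if count == 1:
            # Leave the first header intact.
            return match.group(0)
        # Build the anchor name: spaces become underscores, then lowercase.
        name = match.group(1).replace(" ", "_").lower()
        return f'<a name="{name}">{match.group(0)}'

    return re.sub(r"<h2>(.*?)</h2>", replace, html)


page = "<h2>Intro</h2>\n<h2>International search test</h2>"
print(add_anchors(page))
```

Changing the condition to count <= 3 or count % 2 == 0 gives the "first three tags" and "every second tag" variations.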
Aba 2.8 released
9 Mar 2025
The new version goes beyond just search and replace; it allows you to convert text and images to and from Base64, encode and decode HTML entities like &lt;, encode and decode percent-encoding (also known as URL encoding), decode JSON Web Tokens, and convert Unix/JavaScript timestamps to dates and vice versa.


Other new features and fixes include:
- Fixed: a long file name was cut in the combobox (many thanks to Duane who reported the bug).
- An option to disable the dot matches newline mode (the /s modifier; thanks to Helmet for the idea).
- Line numbers in the text viewer (thanks to Yonatan).
- An option to preserve the files' date/time (thanks to Duane).
- Syntax highlighting for Scala and Go.
- Ctrl+1, Ctrl+2, etc. to switch between the tabs.
- Updated translations.
- Fixed: when reading from disk failed, Aba crashed (mostly applicable to old HDDs and network drives).
- Fixed: incorrect cursor position for Chinese characters.
- Fixed: excessive completion sound when the search pattern was changed.
- Fixed: in the Undo > In files field, the vertical scrollbar was displayed for long directory names.
- Fixed: in some cases, a previous file name remained in the status bar when the search did not find anything.
Just as always, the upgrade is free for the registered users.
Anonymizing a dataset by replacing names with counters
11 Jan 2025
Sometimes, you need to remove personal data from a dataset, such as when preparing examples or unit tests. With Aba Search and Replace, you can mask names, addresses, and other personally identifiable information by replacing them with counters.
Let's use the following CSV file with information about Alice in Wonderland characters as an example:
Name,Address,Favorite Color
Alice,Near the Rabbit Hole,Blue
Mad Hatter,Tea Party Garden,Orange
White Rabbit,Rabbit Hole,White
Queen of Hearts,Hearts Castle,Red
Cheshire Cat,Forest Tree Hollow,Purple
Caterpillar,Mushroom Grove,Green
Tweedledee,Looking Glass Land,Yellow
Tweedledum,Looking Glass Land,Yellow
March Hare,Mad Tea Party Estate,Brown
Dormouse,Tea Party Garden,Gray
You want to remove real names and addresses from this file. A common approach would be to write a script that opens the file, reads each line, replaces the first two fields with counters, and then prints the result. However, it's easier to do the same task with Aba Search and Replace. You don't have to write boilerplate code for file reading, and you can immediately preview the replacement results.
We'll use the following regular expression to match the first two columns in the CSV file while skipping the headers:
(?<=\n)(\N+?),(\N+?),
Here's how it works: first, we check that a newline \n is found before the match using a lookbehind assertion, which allows us to skip the headers (the first line). Next, we match two fields separated with commas.
We would like to replace the names (Alice, Mad Hatter, White Rabbit, etc.) with a counter like person1, person2, person3, etc. Aba provides functions for inserting counters; Aba.matchNo works well for this case:

For the address field, we don't want to use the same sequence (1, 2, 3), so let's do some math with the counter in order to start from 77 and decrement each street number by 3. The replacement expression becomes:
person\{ Aba.matchNo() },\{ 80 - Aba.matchNo() * 3 } Wonderland Drive,
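For comparison, the script-based approach mentioned earlier could look roughly like the Python sketch below. The anonymize function and the shortened sample are assumptions made for this illustration. Python's re module has no \N class, so [^\n] stands in for it, and a local counter stands in for Aba.matchNo().

```python
import re


def anonymize(csv_text: str) -> str:
    """Replace the first two CSV fields of every data row with counters."""
    counter = 0  # stands in for Aba.matchNo() (assumption for this sketch)

    def replace(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        # Street numbers start at 77 and decrease by 3 for each row.
        return f"person{counter},{80 - counter * 3} Wonderland Drive,"

    # (?<=\n) requires a newline before the match, which skips the header line.
    return re.sub(r"(?<=\n)([^\n]+?),([^\n]+?),", replace, csv_text)


sample = (
    "Name,Address,Favorite Color\n"
    "Alice,Near the Rabbit Hole,Blue\n"
    "Mad Hatter,Tea Party Garden,Orange\n"
)
print(anonymize(sample))
```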
Note that proper anonymization is more complex than this. In our example, it's still possible to identify some characters after the replacement. For example, White Rabbit predictably likes white, Queen of Hearts likes red ❤️, and the twins (Tweedledee and Tweedledum) share the same favorite color, yellow. So the anonymization process won't meet GDPR requirements and you need further manual edits to remove or randomize such cases, but the replacement is a good first step for removing sensitive information.
Automatically add width and height to img tags
14 Jul 2024
If you set the width and height attributes for your img tags, the browser can allocate the correct amount of space for the image before loading it. This prevents content below the image from shifting around as the page loads. The layout becomes stable, which means that:
- your users won’t accidentally click a wrong button because of layout shift;
- the performance is better because the browser doesn’t have to recalculate the layout as the images load;
- page load feels smoother and faster.
That’s why Google recommends setting the width and height attributes in your HTML code.
If you have a lot of images, it may take some time to specify their dimensions. With Aba Search and Replace, you can do it automatically.
The typical case

Please use this search pattern to capture the image file name in the first subexpression:
<img src="([^"]+)"
The [^"]+ regex matches everything except for the closing quotation mark, and the parentheses mark the first subexpression.
If you have absolute paths like <img src="/images/someImage.png"> in your HTML code, use the following replacement:
\0 \{ File(Aba.searchPath() \1).meta('ImgTag') }
Here, we insert the whole match \0, which is the img tag and its src attribute. Then, we insert width and height via the meta function. The Aba.searchPath() function returns the directory that you selected for the search, then the image filename \1 is added to it.
Relative paths

If your paths are relative to the HTML files (e.g., <img src="someImage.png"> or <img src="../banner.png">), then use a simpler replacement:
\0 \{ File(\1).meta('ImgTag') }
Replacing existing width and height attributes
If you have existing width and height attributes and you want to replace them, the regex becomes more complex. For example, if the width and height always follow the src attribute:
<img src="([^"]+)" width="\d+" height="\d+"
And the replacement should be:
<img src="\1" \{File(Aba.searchPath() \1).meta('ImgTag')}

Matching tags without existing width and height attributes
More often, you need to skip the tags that already have the width or the height attribute. Our previous regular expression also has these disadvantages:
- If another attribute such as alt precedes src, the regex won't match.
- If there are line breaks or multiple spaces between <img and src, the regex won't match.
- You need to choose between absolute paths and relative paths manually.
- If the file name contains encoded spaces, for example, Fancy%20logo.png, Aba won't be able to open the image file.
The following regular expression fixes these problems:
<img\s+(?:alt="[^"]*?"\s+)?src="([^"]+)"(?!\s+width|\s+height)
And the replacement should be:
\0 \{ File( if \1[0] == '/' { Aba.searchPath() } else {''} \1.decodeUrl()).meta('ImgTag') }
We use alt="[^"]*?" to match an optional alt attribute. If you use other attributes, you can add them here. Instead of spaces, we use \s+ to match one or more spaces or line breaks. The regular expression includes a negative lookahead (?!\s+width|\s+height), so it skips the tags that already have width or height attributes.
The replacement checks if the first character of the file name is a slash /; if yes, it uses the absolute path. Finally, the decodeUrl function replaces %20 with spaces.
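To see which tags the pattern picks up, you can try it in Python, which accepts the same regex syntax here. The sketch below only mirrors the path logic of the replacement; it does not read image dimensions. The image_path helper, the sample HTML, and the C:/site directory are all hypothetical.

```python
import re
from urllib.parse import unquote

# Same pattern as above: optional alt attribute, flexible whitespace,
# and a negative lookahead that rejects tags followed by width or height.
IMG_WITHOUT_SIZE = re.compile(
    r'<img\s+(?:alt="[^"]*?"\s+)?src="([^"]+)"(?!\s+width|\s+height)'
)


def image_path(src: str, search_path: str) -> str:
    """Prepend the search directory to absolute paths and decode %20 etc."""
    prefix = search_path if src.startswith("/") else ""
    return prefix + unquote(src)


html = (
    '<img src="/images/Fancy%20logo.png">\n'
    '<img alt="Banner"\n     src="../banner.png">\n'
    '<img src="done.png" width="10" height="20">\n'
)

found = [image_path(m.group(1), "C:/site") for m in IMG_WITHOUT_SIZE.finditer(html)]
print(found)  # ['C:/site/images/Fancy logo.png', '../banner.png']
```

The third tag is skipped because it already has a width attribute.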
This regular expression works in most cases, and it's included in the favorites by default. Note that regex matching is textual, so the program does not really understand HTML. You may need to modify the regular expression to match your specific case.
Conclusion
You can preview the replacements and check that the img tags are matched correctly. If Aba cannot find an image file, it will display an error message with the src attribute and the HTML filename. Then, just press the Replace button and test the result in your browser. If anything goes wrong, you can always undo the replacement.
Aba can help you to ensure that all of your pages use width and height attributes, which improves performance, prevents layout shifts, and makes your website more visually appealing for the users.
Using zero-width assertions in regular expressions
30 Jun 2024
Anchors ^ $ \b \A \Z
Anchors in regular expressions allow you to specify context in a string where your pattern should be matched. There are several types of anchors:
- ^ matches the start of a line (in multiline mode) or the start of the string (by default).
- $ matches the end of a line (in multiline mode) or the end of the string (by default).
- \A matches the start of the string.
- \Z or \z matches the end of the string.
- \b matches a word boundary (before the first letter of a word or after the last letter of a word).
- \B matches a position that is not a word boundary (between two letters or between two non-letter characters).
These anchors are supported in Java, PHP, Python, Ruby, C#, and Go. In JavaScript, \A and \Z are not supported, but you can use ^ and $ instead of them; just remember to keep the multiline mode disabled. Aba Search and Replace always runs in multiline mode, so you can use \A and \Z to match the beginning or the end of a file.
For example, the regular expression ^abc matches the letters abc at the start of a string. In multiline mode, the same regex matches these letters at the beginning of any line. You can use anchors in combination with other regular expression elements to create more complex matches. For example, ^From: (.*) matches a line starting with From:
The difference between \Z and \z is that \Z matches at the end of the string and also just before a newline character that ends the string. In contrast, \z is more strict and matches only at the very end of the string.
If you have read the previous part of this article, you may wonder if the anchors add any additional capabilities that are not supported by the three primitives (alternation, parentheses, and the star for repetition). The answer is: they do not, but they change what is captured by the regular expression. You can match a line starting with abc by explicitly adding the newline character: \nabc, but in this case, you will also match the newline character itself. When you use ^abc, the newline character is not consumed.
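Here is a small Python sketch of this difference. The sample text is made up; note that Python, unlike Aba, needs the re.MULTILINE flag to turn multiline mode on.

```python
import re

text = "xyz\nabc\nabcdef"

# By default, ^ matches only at the very start of the string.
default_mode = re.findall(r"^abc", text)

# In multiline mode, ^ also matches right after each newline.
multiline_mode = re.findall(r"^abc", text, re.MULTILINE)

# An explicit \n finds the same places, but the newline is part of the match.
explicit_newline = re.findall(r"\nabc", text)

print(default_mode)      # []
print(multiline_mode)    # ['abc', 'abc']
print(explicit_newline)  # ['\nabc', '\nabc']
```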
In a similar way, ing\b matches all words ending with ing. You can replace the anchor with a character class containing non-letter characters (such as spaces or punctuation): ing\W, but in this case, the regular expression will also consume the space or punctuation character.
If the regular expression starts with ^ so that it only matches at the start of the string, it's called anchored. In some programming languages, you can do an anchored match instead of the non-anchored search without using ^. For example, in PHP (PCRE), you can use the A modifier.
So the anchors don't add any new capabilities to the regular expressions, but they allow you to control which characters are included in the match or to match only at the beginning or end of the string. The matched language is still regular.
Zero-width assertions (?= ) (?! ) (?<= ) (?<! )
Zero-width assertions (also called lookahead and lookbehind assertions) allow you to check that a pattern occurs in the subject string without consuming any of the characters. This can be useful when you want to check for a pattern without moving the match pointer forward. For example, you can test that the next characters are abc without consuming them: (?=abc).
Zero-width assertions are generalized anchors. Just like anchors, they don't consume any character from the input string. Unlike anchors, they allow you to check anything, not only line boundaries or word boundaries. So you can replace an anchor with a zero-width assertion, but not vice versa. For example, ing\b could be rewritten as ing(?=\W|$).
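The three ways of matching a word ending can be compared side by side. This is a quick Python sketch with a made-up sample sentence:

```python
import re

text = "singing, ringing and sing"

# ing\W consumes the following character and misses the word at the end of the string.
consuming = [m.group(0) for m in re.finditer(r"ing\W", text)]

# The anchor and the lookahead match the same positions without consuming anything.
boundary = [m.group(0) for m in re.finditer(r"ing\b", text)]
lookahead = [m.group(0) for m in re.finditer(r"ing(?=\W|$)", text)]

print(consuming)  # ['ing,', 'ing ']
print(boundary)   # ['ing', 'ing', 'ing']
print(lookahead)  # ['ing', 'ing', 'ing']
```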
Aba documentation includes a detailed article on zero-width assertions (lookaround) and their typical usage, so we won't repeat it here. Zero-width lookahead and lookbehind are supported in PHP, JavaScript, Python, Java, and Ruby. Unfortunately, they are not supported in Go.
Just like anchors, zero-width assertions still match a regular language, so from a theoretical point of view, they don't add anything new to the capabilities of regular expressions. They just make it possible to skip certain things from the captured string, so you only check for their presence but don't consume them.
Aba 2.7 released
12 May 2024
In the new version, Aba got a UI facelift and dark mode. Several critical bugs were fixed in this release, so it's recommended for everyone to install. The changes are:
- Dark mode.
- A larger, more modern UI font (Segoe UI).
- Syntax highlighting for Java, C#, SQL, and Pascal.
- Drag and drop into the main window.
- Autocomplete in the path combobox.
- Allow using a file name in double quotes.
- Fixed 13 bugs including 6 critical ones.

Just as always, the upgrade is free for the registered users.
Regular Expressions 101
28 Jan 2024
With regular expressions, you can describe the patterns that are similar to each other. For example, you have multiple <img> tags, and you want to move all these images to the images folder:
<img src="9.png"> → <img src="images/9.png">
<img src="10.png"> → <img src="images/10.png">
and so on
You can easily write a regular expression that matches all file names that are numbers, then replace all such tags at once.
Basic syntax
If you need to match one of the alternatives, use an alternation (vertical bar). For example:
Regex | Meaning |
a|img|h1|h2 | either a , or img , or h1 , or h2 |
When using alternation, you often need to group characters together; you can do this with parentheses. For example, if you want to match an HTML tag, this approach won't work:
Regex | Meaning |
<h1|h2|b|i> | <h1 or h2 (without the angle brackets) or b or i> |
because < applies to the first alternative only and > applies to the last one only. To apply the angle brackets to all alternatives, you need to group the alternatives together:
<(h1|h2|b|i)>
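A quick way to convince yourself is to test both patterns against whole strings. The sketch below uses Python's re.fullmatch for that; the variable names are just for this example.

```python
import re

ungrouped = re.compile(r"<h1|h2|b|i>")
grouped = re.compile(r"<(h1|h2|b|i)>")

# Without grouping, h2 alone is one of the alternatives, so it matches
# without any angle brackets, and a complete tag does not match as a whole.
print(bool(ungrouped.fullmatch("h2")))    # True
print(bool(ungrouped.fullmatch("<h2>")))  # False

# With grouping, the angle brackets are required around every alternative.
print(bool(grouped.fullmatch("h2")))      # False
print(bool(grouped.fullmatch("<h2>")))    # True
```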
The last primitive (star) allows you to repeat anything zero or more times. You can apply it to one character, for example:
Regex | Meaning |
a* | an empty string, a , aa , aaa , aaaa , etc. |
You also can apply it to multiple characters in parentheses:
Regex | Meaning |
(ab)* | an empty string, ab , abab , ababab , abababab , etc. |
Note that if you remove the parentheses, the star will apply to the last character only:
Regex | Meaning |
ab* | a , ab , abb , abbb , abbbb , etc. |

Author: Konrad Jacobs. Source: Archives of the Mathematisches Forschungsinstitut Oberwolfach.
The star is named the Kleene star after the American mathematician Stephen Kleene, who invented regular expressions in the 1950s. It can match an empty string as well as any number of repetitions.
These three primitives (alternation, parentheses, and the star for repetition) are enough to write any regular expression, but the syntax may be verbose. For example, you now can write a regex for matching the file names that are numbers in an <img> tag:
Regex | Meaning |
(0|1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)* | one or more digits |
(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)* | a positive integer number (don't allow zero as the first character) |
The parentheses may be nested without a limit, for example:
Regex | Meaning |
(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*(,(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*)* | one or more positive integer numbers, separated with commas |
Convenient shortcuts for character classes
You can write any regex with the three primitives, but it quickly becomes hard to read, so a few shortcuts were invented. When you need to match any of the listed characters, please put them into square brackets:
Regex | Shorter regex | Meaning |
a|e|i|o|u|y | [aeiouy] | a vowel |
0|1|2|3|4|5|6|7|8|9 | [0123456789] | a digit |
0|1|2|3|4|5|6|7|8|9 | [0-9] | a digit |
a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z | [a-z] | a letter |
As you can see, it's possible to specify only the first and the last allowed character if you put a dash between them. There may be several such ranges inside square brackets:
Regex | Meaning |
[a-z0-9] | a letter or a digit |
[a-z0-9_] | a letter, a digit, or the underscore character |
[a-f0-9] | a hexadecimal digit |
There are some predefined character classes that are even shorter to write:
Regex | Meaning |
\s | a space character: the space, the tab character, the new line, or the carriage return |
\d | a digit |
\w | a word character (a letter, a digit, or the underscore character) |
. | any character |
In Aba Search and Replace, these character classes include Unicode characters such as accented letters or Unicode line breaks. In other regex dialects, they usually include ASCII characters only, so \d is typically the same as [0-9] and \w is the same as [a-zA-Z0-9_].
The character classes don't add any new capabilities to the regular expressions; you can just list all allowed characters with an alternation, but a character class is much shorter to write. We now can write a shorter version of the regex mentioned before:
Regex | Meaning |
[1-9][0-9]*(,[1-9][0-9]*)* | one or more positive integer numbers, separated with commas |
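If you want to check that the character-class form accepts the same strings as the version built from the three primitives, a few lines of Python are enough. This is only a spot check on hand-picked samples, not a proof of equivalence.

```python
import re

verbose = re.compile(
    r"(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*"
    r"(,(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*)*"
)
short = re.compile(r"[1-9][0-9]*(,[1-9][0-9]*)*")

samples = ["7", "10,200,3000", "012", "5,", "1,,2", ""]

for s in samples:
    # Both patterns must give the same verdict for every sample.
    same = bool(verbose.fullmatch(s)) == bool(short.fullmatch(s))
    print(repr(s), bool(short.fullmatch(s)), "agree" if same else "DIFFER")
```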
Repetitions
A Kleene star means "repeating zero or more times", but you often need another number of repetitions. As shown before, you can just copy-and-paste a regex to repeat it twice or three times, but there is a shorter notation for that:
Regex | Shorter regex | Meaning |
\d\d* | \d+ | one or more digits |
(0|1)(0|1)* | [01]+ | any binary number (consisting of zeros and ones) |
(\s|) | \s? | either a space character or nothing |
http(s|) | https? | either http or https |
(-|\+|) | [-+]? | the minus sign, the plus sign, or nothing |
[a-z][a-z] | [a-z]{2} | two small letters |
[a-z][a-z]((([a-z]|)[a-z]|)[a-z]|) | [a-z]{2,5} | from two to five small letters |
[a-z][a-z][a-z]* | [a-z]{2,} | two or more small letters |
So there are the following repetition operators:
- a Kleene star * means repeating zero or more times, so it can match zero times, once, twice, three times, etc.;
- a plus sign + means repeating one or more times, so it must match at least once;
- an optional part ? means zero times or once;
- curly brackets {m,n} mean repeating from m to n times.
Note that you can express any repetition with the curly brackets, so these operators partially duplicate each other. For example:
Regex | Shorter regex | Meaning |
\d{0,} | \d* | nothing or some digits |
\d{1,} | \d+ | one or more digits |
\s{0,1} | \s? | either a space character or nothing |
Just like the Kleene star, the other repetition operators can apply to parentheses, so you can nest them indefinitely.
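The shortcuts from the tables above can be spot-checked against their longer forms in Python. The agree helper and the sample strings are made up for this sketch, and it compares the patterns only on those samples.

```python
import re

# (longer regex, shorter regex) pairs taken from the tables above
pairs = [
    (r"\d\d*", r"\d+"),
    (r"http(s|)", r"https?"),
    (r"[a-z][a-z]((([a-z]|)[a-z]|)[a-z]|)", r"[a-z]{2,5}"),
    (r"\d{0,}", r"\d*"),
]

samples = ["", "7", "2024", "http", "https", "httpss", "a", "ab", "abcde", "abcdef"]


def agree(longer: str, shorter: str) -> bool:
    """Return True if both patterns accept exactly the same samples."""
    return all(
        bool(re.fullmatch(longer, s)) == bool(re.fullmatch(shorter, s))
        for s in samples
    )


for longer, shorter in pairs:
    print(shorter, agree(longer, shorter))
```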
Escaping
If you need to match any of the special characters like parentheses, vertical bar, plus, or star, you must escape them by adding a backslash \ before them. For example, to find a number in parentheses, use \(\d+\).
A common mistake is to forget a backslash before a dot. Note that a dot means any character, so if you write example.com in a regular expression, it will match examplexcom or something similar, which may even cause a security issue in your program. Now we can write a regex to match the <img> tags:
<img src="\d+\.png">
This matches any filename consisting of digits and we correctly escaped the dot.
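Both points can be tried out in Python. In this sketch, the sample HTML is made up, and the parentheses around the file name are added so that the replacement can reuse it.

```python
import re

# An unescaped dot matches any character; an escaped dot matches only a dot.
print(bool(re.fullmatch(r"example.com", "examplexcom")))   # True
print(bool(re.fullmatch(r"example\.com", "examplexcom")))  # False

# re.escape adds the backslashes for you.
print(re.escape("example.com"))  # example\.com

# Move the numbered images to the images folder.
html = '<img src="9.png"> <img src="10.png"> <img src="logo.png">'
moved = re.sub(r'<img src="(\d+\.png)">', r'<img src="images/\1">', html)
print(moved)
```

Only the tags whose file names consist of digits are changed; logo.png stays where it was.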
Other features
Modern regex engines add more features such as backreferences or conditional subpatterns. Mathematically speaking, these features don't belong to the regular expressions; they describe a non-regular language, so you cannot replace them with the three primitives.
Next time, we will discuss anchors and zero-width assertions.
2023 in review
14 Jan 2024
In 2023, I continued to support Ukraine and donated more than 50% of the revenue from Aba Search and Replace to the charities helping Ukrainians in need. I will keep donating this year.
Released in December, Aba 2.6 is the first version that requires Windows Vista. The previous versions were tested on Windows XP, which remained popular for a long time after its release. Unfortunately, it became increasingly hard to maintain the Windows XP compatibility code and it limited the further development, so I had to say goodbye to Windows 2000/XP. Please contact me if it creates any problem for you; I always listen to your feedback and can send you the previous version.
In January 2023, Microsoft certified Aba Search and Replace for publication to the Microsoft Store. The new version 2.6 was also approved a few days ago, so you can download it from the Microsoft Store as well as from this website.
Thanks to Richard, Aba is also available in French. If you are a native speaker of Spanish, German, or Italian and you can translate the 17 messages that were added in the recent version, please contact me. Feel free to use Google Translate or ChatGPT, then review and edit the automatic translation. Thank you so much.
The blog post about escaping in regular expressions is still the most popular on this blog. In April, I wrote a followup about empty character classes, which was also well-received.
The new Aba version remains lean and fast. No huge runtime libraries, no cluttered UIs or bloatware. Stay tuned for the next versions!
Regular expression for numbers
30 Dec 2023
It's easy to find a positive integer number with regular expressions:
[0-9]+
This regex means digits from 0 to 9, repeated one or more times. However, numbers starting with zero are treated as octal in many programming languages, so you may wish to avoid matching them:
[1-9][0-9]*
This regular expression matches any positive integer number starting with a non-zero digit. If you also need to match zero, you can include it as another branch:
[1-9][0-9]*|0
To also accommodate negative integer numbers, you can allow a minus sign before the digits:
-?[1-9][0-9]*|0
Sometimes it's necessary to allow a plus sign as well:
[-+]?[1-9][0-9]*|0
The previous regexes searched the input string for a number. If the whole string must be a number and nothing else, you can add the ^ anchor to match the beginning of the string and the $ anchor to match the end:
^(-?[1-9][0-9]*|0)$
Parentheses are necessary here; without them, the ^ anchor would apply only to the first branch. Another variation of the same regex avoids finding numbers that are part of words, such as 600px or x64:
\b(-?[1-9][0-9]*|0)\b
Things get more complicated if you need to match a fractional number:
\b-?(?:[1-9][0-9]*(?:\.[0-9]+)?|\.[0-9]+|0)\b
Let's break down this regular expression:
- The first branch [1-9][0-9]*(?:\.[0-9]+)? matches an integer number starting with a non-zero digit, then an optional fractional part.
- The second branch \.[0-9]+ matches fractional numbers starting with a dot; for example, .5 is another way to write 0.5.
- The third branch matches zero. Note that both positive and negative zeros are possible in floating-point numbers.
For floating-point numbers with an exponent, such as 5.2777e+231, please use:
\b-?(?:[1-9][0-9]*(?:\.[0-9]+)?|\.[0-9]+|0)(?:[eE][+-]?[0-9]+)?\b
Many programming languages support hexadecimal numbers starting with 0x. Here is a regular expression to match them:
0x[0-9a-fA-F]+
Finally, here is a comprehensive regular expression to match floating-point, integer decimal, or hexadecimal numbers:
\b-?(?:[1-9][0-9]*(?:\.[0-9]+)?|\.[0-9]+|0(?:x[0-9a-fA-F]+)?)(?:[eE][+-]?[0-9]+)?\b
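Here is a short Python sketch that runs this regular expression over a sample string. The string is made up for illustration and contains only unsigned numbers.

```python
import re

NUMBER = re.compile(
    r"\b-?(?:[1-9][0-9]*(?:\.[0-9]+)?|\.[0-9]+|0(?:x[0-9a-fA-F]+)?)"
    r"(?:[eE][+-]?[0-9]+)?\b"
)

text = "width 600px, offset 42, ratio 3.14, mask 0xFF, big 5.2777e+231"

# 600px is skipped because the digits are followed by letters (no word boundary).
numbers = NUMBER.findall(text)
print(numbers)  # ['42', '3.14', '0xFF', '5.2777e+231']
```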
Aba 2.6 released
25 Dec 2023
This version adds the following features:
- complex replacements including converting the matching text to lowercase, inserting the file name, or adding width/height attributes to <img> tags (now you can use a simple scripting language in the replacements);
- a 64-bit version (if needed, you still can choose a 32-bit version during installation);
- a new hotkey: the left/right arrow key to quickly jump to the previous/next file (when the results pane is focused);
- the taskbar button now flashes when a long operation is complete;
- basic support for emojis (ZWJ sequences and skin tones are displayed as separate characters).
Just as always, the upgrade is free for the registered users; your settings and search history will be preserved when you run the installer.
If you have any suggestions for new features, please contact me. I will be happy to implement your ideas.
This is a blog about Aba Search and Replace, a tool for replacing text in multiple files.