Anonymizing a dataset by replacing names with counters
11 Jan 2025
Sometimes, you need to remove personal data from a dataset, such as when preparing examples or unit tests. With Aba Search and Replace, you can mask names, addresses, and other personally identifiable information by replacing them with counters.
Let's use the following CSV file with information about Alice in Wonderland characters as an example:
Name,Address,Favorite Color
Alice,Near the Rabbit Hole,Blue
Mad Hatter,Tea Party Garden,Orange
White Rabbit,Rabbit Hole,White
Queen of Hearts,Hearts Castle,Red
Cheshire Cat,Forest Tree Hollow,Purple
Caterpillar,Mushroom Grove,Green
Tweedledee,Looking Glass Land,Yellow
Tweedledum,Looking Glass Land,Yellow
March Hare,Mad Tea Party Estate,Brown
Dormouse,Tea Party Garden,Gray
You want to remove real names and addresses from this file. A common approach would be to write a script that opens the file, reads each line, replaces the first two fields with counters, and then prints the result. However, it's easier to do the same task with Aba Search and Replace. You don't have to write boilerplate code for file reading, and you can immediately preview the replacement results.
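For comparison, the script-based approach described above might look like the following minimal Python sketch. The function name `anonymize_rows` and the `person`/`address` labels are assumptions chosen for illustration; the sketch reads from any file-like object, copies the header row, and replaces the first two fields of every other row with a counter.

```python
import csv
import io


def anonymize_rows(source, output):
    """Copy CSV rows from source to output, masking the first two fields."""
    reader = csv.reader(source)
    writer = csv.writer(output, lineterminator="\n")

    # The first row holds the column headers, so it is copied unchanged.
    writer.writerow(next(reader))

    # Hypothetical labels: person1/address1, person2/address2, ...
    for number, row in enumerate(reader, start=1):
        row[0] = f"person{number}"
        row[1] = f"address{number}"
        writer.writerow(row)


# A shortened copy of the sample file, kept in memory for the demonstration.
sample = io.StringIO(
    "Name,Address,Favorite Color\n"
    "Alice,Near the Rabbit Hole,Blue\n"
    "Mad Hatter,Tea Party Garden,Orange\n"
)
result = io.StringIO()
anonymize_rows(sample, result)
print(result.getvalue())
```

Even this short version needs file handling and a loop, which is the boilerplate a search-and-replace tool lets you skip.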
We'll use the following regular expression to match the first two columns in the CSV file while skipping the headers:
(?<=\n)(\N+?),(\N+?),
Here's how it works: first, we check that a newline \n is found before the match using a lookbehind assertion, which allows us to skip the headers (the first line). Next, we match two fields separated with commas.
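To see what the pattern captures, here is a small Python sketch run on the first rows of the sample file. It assumes that \N in the expression above means "any character except a newline"; Python's re module has no such escape, so the sketch substitutes `[^\n]`. The variable names are illustrative only.

```python
import re

# First rows of the sample file, including the header line.
CSV_TEXT = (
    "Name,Address,Favorite Color\n"
    "Alice,Near the Rabbit Hole,Blue\n"
    "Mad Hatter,Tea Party Garden,Orange\n"
    "White Rabbit,Rabbit Hole,White\n"
)

# (?<=\n) requires a newline right before the match, so the header
# line (which starts the text) can never match.
# [^\n]+? lazily matches one field; it stands in for \N+? here.
PATTERN = re.compile(r"(?<=\n)([^\n]+?),([^\n]+?),")

matches = [m.groups() for m in PATTERN.finditer(CSV_TEXT)]
for name, address in matches:
    print(f"name={name!r} address={address!r}")
```

The header row is absent from the results because no newline precedes it, while every data row contributes one name and one address.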
We would like to replace the names (Alice, Mad Hatter, White Rabbit, etc.) with a counter like person1, person2, person3, etc. Aba provides functions for inserting counters; Aba.matchNo works well for this case.

For the address field, we don't want to reuse the same sequence (1, 2, 3), so let's do some arithmetic with the counter: the street numbers start at 77 and decrease by 3 with each match (77, 74, 71, and so on). The replacement expression becomes:
person\{ Aba.matchNo() },\{ 80 - Aba.matchNo() * 3 } Wonderland Drive,
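The same replacement can be approximated outside Aba. The Python sketch below is not Aba's implementation: it emulates Aba.matchNo() with a counter that is assumed to start at 1 (which is what makes the first street number 77), and it uses `[^\n]` in place of \N because Python's re module lacks that escape. The function name `anonymize` is hypothetical.

```python
import itertools
import re

# First rows of the sample file, including the header line.
CSV_TEXT = (
    "Name,Address,Favorite Color\n"
    "Alice,Near the Rabbit Hole,Blue\n"
    "Mad Hatter,Tea Party Garden,Orange\n"
    "White Rabbit,Rabbit Hole,White\n"
)

PATTERN = re.compile(r"(?<=\n)([^\n]+?),([^\n]+?),")


def anonymize(text):
    """Replace the first two fields of each data row with counter-based values."""
    # Assumption: match numbers start at 1, mirroring Aba.matchNo().
    counter = itertools.count(1)

    def replacement(match):
        n = next(counter)
        # 80 - n * 3 yields 77, 74, 71, ... for n = 1, 2, 3, ...
        return f"person{n},{80 - n * 3} Wonderland Drive,"

    return PATTERN.sub(replacement, text)


anonymized = anonymize(CSV_TEXT)
print(anonymized)
```

Because the third column is outside the match, the favorite colors pass through unchanged.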
Note that proper anonymization is more complex than this. In our example, it's still possible to identify some characters after the replacement. For example, White Rabbit predictably likes white, Queen of Hearts likes red ❤️, and the twins (Tweedledee and Tweedledum) share the same favorite color, yellow. So this replacement alone won't meet GDPR requirements, and you will need further manual edits to remove or randomize such cases. Still, it is a good first step for removing sensitive information.