Aba Search and Replace bloghttps://www.abareplace.com/blog/Search tips and tricks, regular expression tutorials, announcements about new versions of Aba Search and Replace.1440Aba 2.7 released<p>In the new version, Aba got a UI facelift and dark mode. Several critical bugs were fixed in this release, so everyone is advised to install it. The changes are:</p> <ul> <li>Dark mode.</li> <li>A larger, more modern UI font (Segoe UI).</li> <li>Syntax highlighting for Java, C#, SQL, and Pascal.</li> <li>Drag and drop into the main window.</li> <li>Autocomplete in the path combobox.</li> <li>File names in double quotes are now accepted.</li> <li>Fixed 13 bugs including 6 critical ones.</li> </ul> <img src="/aba27_2.png" alt="Dark mode" width="625" height="483"> <p>Just as always, <b>the upgrade is free</b> for the registered users.</p>Sun, 12 May 2024 17:54:00 +0200https://www.abareplace.com/blog/aba27/Regular Expressions 101<p>With regular expressions, you can describe the patterns that are similar to each other. For example, you have multiple <code>&lt;img&gt;</code> tags, and you want to move all these images to the <code>images</code> folder:</p> <pre>
&lt;img src="9.png"&gt; &#x2192; &lt;img src="images/9.png"&gt;
&lt;img src="10.png"&gt; &#x2192; &lt;img src="images/10.png"&gt;
and so on
</pre> <p>You can easily write a regular expression that matches all file names that are numbers, then replace all such tags at once.</p> <h3>Basic syntax</h3> <p>If you need to match <b>one of the alternatives,</b> use an alternation (vertical bar). For example:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>a|img|h1|h2</code></td><td class="c">either <code>a</code>, or <code>img</code>, or <code>h1</code>, or <code>h2</code></td></tr> </table> <p>When using alternation, you often need to <b>group</b> characters together; you can do this with parentheses. 
For example, if you want to match an HTML tag, this approach won't work:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>&lt;h1|h2|b|i&gt;</code></td><td class="c"><code>&lt;h1</code> or <code>h2</code> (without the angle brackets) or <code>b</code> or <code>i&gt;</code></td></tr> </table> <p>because <code>&lt;</code> applies to the first alternative only and <code>&gt;</code> applies to the last one only. To apply the angle brackets to all alternatives, you need to group the alternatives together:</p> <pre> &lt;(h1|h2|b|i)&gt; </pre> <p>The last primitive (star) allows you to <b>repeat</b> anything zero or more times. You can apply it to one character, for example:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>a*</code></td><td class="c">an empty string, <code>a</code>, <code>aa</code>, <code>aaa</code>, <code>aaaa</code>, etc.</td></tr> </table> <p>You can also apply it to multiple characters in parentheses:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>(ab)*</code></td><td class="c">an empty string, <code>ab</code>, <code>abab</code>, <code>ababab</code>, <code>abababab</code>, etc.</td></tr> </table> <p>Note that if you remove the parentheses, the star will apply to the last character only, so the <code>a</code> is always required:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>ab*</code></td><td class="c"><code>a</code>, <code>ab</code>, <code>abb</code>, <code>abbb</code>, <code>abbbb</code>, etc.</td></tr> </table> <figure> <img src="/Stephen_Kleene.jpg" width="281" height="400" alt="A portrait of Stephen Cole Kleene, the regular expression inventor" title=""> <figcaption>Stephen Kleene (1909-1994), the regular expression inventor.<br/>Author: Konrad Jacobs. 
Source: Archives of the Mathematisches Forschungsinstitut Oberwolfach.</figcaption> </figure> <p>The star is named <b>Kleene star</b> after an American mathematician <a href="https://en.wikipedia.org/wiki/Stephen_Cole_Kleene">Stephen Kleene</a> who invented regular expressions in the 1950s. It can match an empty string as well as any number of repetitions.</p> <p>These <b>three primitives</b> (alternation, parentheses, and the star for repetition) are enough to write any regular expression, but the syntax may be verbose. For example, you now can write a regex for matching the file names that are numbers in an <code>&lt;img&gt;</code> tag:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>(0|1|2|3|4&#x200b;|5|6|7|8|9)(0|1|2|3|4&#x200b;|5|6|7|8|9)*</code></td><td class="c">one or more digits</td></tr> <tr><td class="r"><code>(1|2|3|4|5&#x200b;|6|7|8|9)(0|1|2|3|4&#x200b;|5|6|7|8|9)*</code></td><td class="c">a positive integer number (don't allow zero as the first character)</td></tr> </table> <p>The parentheses may be nested without a limit, for example:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>(1|2|3|4&#x200b;|5|6|7|8|9)(0|1|2|3|4&#x200b;|5|6|7|8|9)*(,(1|2|3|4&#x200b;|5|6|7|8|9)(0|1|2|3|4&#x200b;|5|6|7|8|9)*)*</code></td><td class="c">one or more positive integer numbers, separated with commas</td></tr> </table> <h3>Convenient shortcuts for character classes</h3> <p>You can write any regex with the three primitives, but it quickly becomes hard to read, so a few shortcuts were invented. 
When you need to match <b>any of the listed characters,</b> put them into square brackets:</p> <table class="example"> <thead><tr><td>Regex</td><td>Shorter regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>a|e|i|o|u|y</code></td><td class="r"><code>[aeiouy]</code></td><td class="c">a vowel</td></tr> <tr><td class="r"><code>0|1|2|3|4&#x200b;|5|6|7|8|9</code></td><td class="r"><code>[0123456789]</code></td><td class="c">a digit</td></tr> <tr><td class="r"><code>0|1|2|3|4&#x200b;|5|6|7|8|9</code></td><td class="r"><code>[0-9]</code></td><td class="c">a digit</td></tr> <tr><td class="r"><code>a|b|c|d|e&#x200b;|f|g|h|i|j&#x200b;|k|l|m|n&#x200b;|o|p|q|r&#x200b;|s|t|u|v&#x200b;|w|x|y|z</code></td><td class="r"><code>[a-z]</code></td><td class="c">a letter</td></tr> </table> <p>As you can see, it's possible to specify only the first and the last allowed character if you put <b>a dash</b> between them. There may be several such ranges inside square brackets:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>[a-z0-9]</code></td><td class="c">a letter or a digit</td></tr> <tr><td class="r"><code>[a-z0-9_]</code></td><td class="c">a letter, a digit, or the underscore character</td></tr> <tr><td class="r"><code>[a-f0-9]</code></td><td class="c">a hexadecimal digit</td></tr> </table> <p>There are some <b>predefined character classes</b> that are even shorter to write:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>\s</code></td><td class="c">a space character: the space, the tab character, the new line, or the carriage return</td></tr> <tr><td class="r"><code>\d</code></td><td class="c">a digit</td></tr> <tr><td class="r"><code>\w</code></td><td class="c">a word character (a letter, a digit, or the underscore character)</td></tr> <tr><td class="r"><code>.</code></td><td class="c">any character</td></tr> </table> <p>In Aba Search and Replace, 
these <a href="/docs/charListClass.php">character classes</a> include Unicode characters such as accented letters or Unicode line breaks. In other regex dialects, they usually include ASCII characters only, so <code>\d</code> is typically the same as <code>[0-9]</code> and <code>\w</code> is the same as <code>[a-zA-Z0-9_]</code>.</p> <p>The character classes don't add any new capabilities to the regular expressions; you can just list all allowed characters with an alternation, but a character class is much shorter to write. We can now write a shorter version of the regex mentioned before:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>[1-9][0-9]*(,[1-9][0-9]*)*</code></td><td class="c">one or more positive integer numbers, separated with commas</td></tr> </table> <h3>Repetitions</h3> <p>A Kleene star means "repeating zero or more times", but you often need another number of repetitions. As shown before, you can just copy-and-paste a regex to repeat it twice or three times, but there is a shorter notation for that:</p> <table class="example"> <thead><tr><td>Regex</td><td>Shorter regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>\d\d*</code></td><td class="r"><code>\d+</code></td><td class="c">one or more digits</td></tr> <tr><td class="r"><code>(0|1)(0|1)*</code></td><td class="r"><code>[01]+</code></td><td class="c">any binary number (consisting of zeros and ones)</td></tr> <tr><td class="r"><code>(\s|)</code></td><td class="r"><code>\s?</code></td><td class="c">either a space character or nothing</td></tr> <tr><td class="r"><code>http(s|)</code></td><td class="r"><code>https?</code></td><td class="c">either <code>http</code> or <code>https</code></td></tr> <tr><td class="r"><code>(-|\+|)</code></td><td class="r"><code>[-+]?</code></td><td class="c">the minus sign, the plus sign, or nothing</td></tr> <tr><td class="r"><code>[a-z][a-z]</code></td><td class="r"><code>[a-z]{2}</code></td><td 
class="c">two lowercase letters</td></tr> <tr><td class="r"><code>[a-z][a-z]((([a-z]|)[a-z]|)[a-z]|)</code></td><td class="r"><code>[a-z]{2,5}</code></td><td class="c">from two to five lowercase letters</td></tr> <tr><td class="r"><code>[a-z][a-z][a-z]*</code></td><td class="r"><code>[a-z]{2,}</code></td><td class="c">two or more lowercase letters</td></tr> </table> <p>So there are the following <a href="/docs/repetitions.php">repetition operators:</a></p> <ul> <li>a Kleene star <code>*</code> means repeating <b>zero or more times,</b> so the repeated part may be absent, or it may occur once, twice, three times, etc.;</li> <li>a plus sign <code>+</code> means repeating <b>one or more times,</b> so it must match at least once;</li> <li>an optional part <code>?</code> means <b>zero times or once</b>;</li> <li>curly brackets <code>{m,n}</code> mean repeating <b>from m to n times</b>.</li> </ul> <p>Note that you can express any repetition with the curly brackets, so these operators partially duplicate each other. For example:</p> <table class="example"> <thead><tr><td>Regex</td><td>Shorter regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>\d{0,}</code></td><td class="r"><code>\d*</code></td><td class="c">nothing or some digits</td></tr> <tr><td class="r"><code>\d{1,}</code></td><td class="r"><code>\d+</code></td><td class="c">one or more digits</td></tr> <tr><td class="r"><code>\s{0,1}</code></td><td class="r"><code>\s?</code></td><td class="c">either a space character or nothing</td></tr> </table> <p>Just like the Kleene star, the other repetition operators can apply to parentheses, so you can nest them indefinitely.</p> <h3>Escaping</h3> <p>If you need to match any of <b>the special characters</b> like parentheses, vertical bar, plus, or star, you must <a href="/blog/escape-regexp/">escape them</a> by adding a backslash <code>\</code> before them. For example, to find a number in parentheses, use <code>\(\d+\)</code>.</p> <p>A common mistake is to forget a backslash before a dot. 
Note that a dot means any character, so if you write <code>example.com</code> in a regular expression, it will match <code>examplexcom</code> or something similar, which may even cause a security issue in your program. Now we can write a regex to match the <code>&lt;img&gt;</code> tags:</p> <pre> &lt;img src="\d+\.png"&gt; </pre> <p>This matches any filename consisting of digits and we correctly escaped the dot.</p> <h3>Other features</h3> <p>Modern regex engines add more features such as <b>backreferences</b> or conditional subpatterns. Mathematically speaking, these features don't belong to the regular expressions; they describe a non-regular language, so you cannot replace them with the three primitives.</p> <p>Next time, we will discuss anchors and zero-width assertions.</p> Sun, 28 Jan 2024 15:10:17 +0100https://www.abareplace.com/blog/regex101/2023 in review<p>In 2023, I <a href="/blog/ukraine/">continued</a> to <b>support Ukraine</b> and donated more than 50% of the revenue from Aba Search and Replace to the charities helping Ukrainians in need. I will keep donating this year.</p> <p>Released in December, Aba 2.6 is the first version that requires <b>Windows Vista.</b> The previous versions were tested on Windows XP, which remained popular for a long time after its release. Unfortunately, it became increasingly hard to maintain the Windows XP compatibility code and it limited the further development, so I had to say goodbye to Windows 2000/XP. 
Please <a href="/support/">contact me</a> if it creates any problem for you; I always listen to your feedback and can send you the previous version.</p> <p>In January 2023, Microsoft <a href="https://apps.microsoft.com/store/detail/XP89FXJL539HRG">certified</a> Aba Search and Replace for publication to the <b>Microsoft Store.</b> The new version 2.6 was also approved a few days ago, so you can <a href="/download/">download it</a> from the Microsoft Store as well as from this website.</p> <p>Thanks to Richard, Aba is also available in French <svg width="1em" viewBox="0 0 3 2"><path fill="#0055A4" d="M0 0h1v2H0z"/><path fill="#eee" d="M1 0h1v2H1z"/><path fill="#EF4135" d="M2 0h1v2H2z"/></svg>. If you are a native speaker of Spanish <svg viewBox="0 0 6 4" width="1em"><path fill="#AD1519" d="M0 0h6v4H0z"/><path fill="#FABD00" d="M0 1h6v2H0z"/></svg>, German <svg width="1em" viewBox="0 0 9 6"><path d="M0 0h9v2H0z"/><path fill="#f00" d="M0 2h9v2H0z"/><path fill="#fc0" d="M0 4h9v2H0z"/></svg>, or Italian <svg width="1em" viewBox="0 0 3 2"><path fill="#008C45" d="M0 0h1v2H0z"/><path fill="#eee" d="M1 0h1v2H1z"/><path fill="#CD212A" d="M2 0h1v2H2z"/></svg> and you can translate the 17 messages that were added in the recent version, please <a href="/support/">contact me.</a> Feel free to use Google Translate or ChatGPT, then review and edit the automatic translation. Thank you so much.</p> <p>The <b>blog post</b> about <a href="/blog/escape-regexp/">escaping in regular expressions</a> is still the most popular on this blog. In April, I wrote a followup about <a href="/blog/emptybrackets/">empty character classes</a>, which was also well-received.</p> <p>The new Aba version remains lean and fast. No huge runtime libraries, no cluttered UIs or bloatware. 
Stay tuned for the next versions!</p> Sun, 14 Jan 2024 12:27:50 +0100https://www.abareplace.com/blog/2023review/Regular expression for numbers<p>It's easy to find a positive integer number with regular expressions:</p> <pre>[0-9]+</pre> <p>This regex means digits from 0 to 9, repeated one or more times. However, <b>numbers starting with zero</b> are treated as octal in many programming languages, so you may wish to avoid matching them:</p> <pre>[1-9][0-9]*</pre> <p>This regular expression matches any positive integer number starting with a non-zero digit. If you also need to match zero, you can include it as another branch:</p> <pre>[1-9][0-9]*|0</pre> <p>To also accommodate <b>negative integer numbers,</b> you can allow a minus sign before the digits:</p> <pre>-?[1-9][0-9]*|0</pre> <p>Sometimes it's necessary to allow a plus sign as well:</p> <pre>[-+]?[1-9][0-9]*|0</pre> <p>The previous regexes searched the input string for a number. If you need to match <b>a number only,</b> rejecting strings that contain anything else, you can add the <code>^</code> anchor to match the beginning of the string and the <code>$</code> anchor to match the end:</p> <pre>^(-?[1-9][0-9]*|0)$</pre> <p>Parentheses are necessary here; without them, the <code>^</code> anchor would apply only to the first branch. 
Another variation of the same regex avoids finding numbers that are part of words, such as <code>600px</code> or <code>x64</code>:</p> <pre>\b(-?[1-9][0-9]*|0)\b</pre> <p>Things get more complicated if you need to match <b>a fractional number</b>:</p> <pre>\b-?(?:[1-9][0-9]*(?:\.[0-9]+)?|\.[0-9]+|0)\b</pre> <p>Let's break down this regular expression:</p> <ul> <li>The first branch <code>[1-9][0-9]*(?:\.[0-9]+)?</code> matches an integer number starting with a non-zero digit, then an optional fractional part.</li> <li>The second branch <code>\.[0-9]+</code> matches fractional numbers starting with a dot, for example, <code>.5</code> is another way to write <code>0.5</code>.</li> <li>The third branch matches zero. Note that both positive and negative zeros are possible in floating-point numbers.</li> </ul> <p>For floating-point numbers with an exponent, such as <code>5.2777e+231</code>, please use:</p> <pre>\b-?(?:[1-9][0-9]*(?:\.[0-9]+)?|\.[0-9]+|0)(?:[eE][+-]?[0-9]+)?\b</pre> <p>Many programming languages support <b>hexadecimal numbers</b> starting with <code>0x</code>. 
Here is a regular expression to match them:</p> <pre>0x[0-9a-fA-F]+</pre> <p>Finally, here is a comprehensive regular expression to match floating-point, integer decimal, or hexadecimal numbers:</p> <pre>\b-?(?:[1-9][0-9]*(?:\.[0-9]+)?|\.[0-9]+|0(?:x[0-9a-fA-F]+)?)(?:[eE][+-]?[0-9]+)?\b</pre> Sat, 30 Dec 2023 18:13:28 +0100https://www.abareplace.com/blog/regex_numbers/Aba 2.6 released<p>This version adds the following features:</p> <ul> <li><a href="https://www.abareplace.com/docs/baoOverview.php">complex replacements</a> including converting the matching text to lowercase, inserting the file name, or adding width/height attributes to &lt;img&gt; tags (now you can use a simple scripting language in the replacements); </li> <li>a 64-bit version (if needed, you still can choose a 32-bit version during installation);</li> <li>a new <a href="https://www.abareplace.com/docs/hotkeys.php">hotkey:</a> the left/right arrow key to quickly jump to the next/previous file (when <a href="https://www.abareplace.com/docs/searchResults.php">the results pane</a> is focused);</li> <li>the taskbar button now flashes when a long operation is complete;</li> <li>basic support for emojis (ZWJ sequences and skin tones are displayed as separate characters).</li> </ul> <p>Just as always, <b>the upgrade is free</b> for the registered users; your settings and search history will be preserved when you run the installer.</p> <p>If you have any suggestions for new features, please <a href="/support/">contact me.</a> I will be happy to implement your ideas.</p>Mon, 25 Dec 2023 03:06:00 +0100https://www.abareplace.com/blog/aba26/Search from the Windows command prompt<p>When you need to search within text files from Windows batch files, you can use either the find or findstr command. Findstr supports a limited version of regular expressions. 
You can also automate certain tasks based on the search results.</p> <h3>The find command</h3> <p>To search for text in multiple files from the Windows command prompt or batch files, you can use the <b>FIND</b> command, which has been present since the days of MS DOS and is still available in Windows 11. It's similar to the Unix <code>grep</code> command, but does not support regular expressions. If you want to search for the word <code>borogoves</code> in the current directory, please follow this syntax:</p> <pre> find "borogoves" * </pre> <p>Note that the double quotes around the pattern are mandatory. If you are using PowerShell, you will need to include single quotes as well:</p> <pre> find '"borogoves"' * </pre> <p>Instead of the asterisk (<code>*</code>), you can specify a file mask such as <code>*.htm?</code>. The <code>find</code> command displays the names of the files it scans, even if it doesn't find any matches within these files:</p> <img src="/FindStr1.png" alt="The FIND command in Windows 11" title="" width="652" height="262"> <p>The search is <b>case-sensitive</b> by default, so you typically need to add the <code>/I</code> switch to treat uppercase and lowercase letters as equivalent:</p> <pre> find /I "&lt;a href=" *.htm </pre> <p>If you don't specify the file to search in, <code>find</code> will wait for the text input <b>from stdin,</b> so that you can pipe output from another command. For example, you can list all copy commands supported in Windows:</p> <pre> help | find /i "copy" </pre> <p>Another switch, <code>/V</code>, allows you to find all lines not containing the pattern, similar to the <code>grep -v</code> command.</p> <p>In <b>batch files,</b> you can use the fact that the <code>find</code> command sets the exit code (<b>errorlevel</b>) to 1 if the pattern is not found. 
For instance, you can check if the machine is running a 64-bit or 32-bit version of Windows:</p> <pre>
@echo off
rem Based on KB556009 with some corrections
reg Query "HKLM\Hardware\Description\System\CentralProcessor\0" /v "Identifier" | find /i "x86 Family" &gt; nul
if errorlevel 1 goto win64
echo 32-bit Windows
goto :eof
:win64
rem Could be AMD64 or ARM64
echo 64-bit Windows
</pre> <h3>The findstr command: regular expression search</h3> <p>If you need to find <b>a regular expression,</b> try the <code>FINDSTR</code> command, which was introduced in Windows XP. <a href="https://devblogs.microsoft.com/oldnewthing/20151209-00/?p=92361">For historical reasons,</a> <code>findstr</code> supports a limited subset of regular expressions, so you can only use these <a href="https://www.abareplace.com/docs/regExprElements.php">regex features:</a></p> <ul> <li>The dot <code>.</code> matches any character except for newline and extended ASCII characters.</li> <li>Character lists <code>[abc]</code> match any of the specified characters (<code>a</code>, <code>b</code>, or <code>c</code>).</li> <li>Character list ranges <code>[a-z]</code> match any letter from <code>a</code> to <code>z</code>.</li> <li>The asterisk (<code>*</code>) indicates that the previous character can be repeated zero or more times.</li> <li>The <code>\&lt;</code> and <code>\&gt;</code> symbols mark the beginning and the end of a word.</li> <li>The caret (<code>^</code>) and the dollar sign (<code>$</code>) denote the beginning and the end of a line.</li> <li>The backslash (<code>\</code>) escapes any metacharacter, allowing you to find literal characters. 
For example, <code>\$</code> finds the dollar sign itself.</li> </ul> <p><b>Findstr</b> does not support character classes (<code>\d</code>), alternation (<code>|</code>), or other repetitions (<code>+</code> or <code>{5}</code>).</p> <p>The basic syntax is the same as for the <code>FIND</code> command:</p> <pre> findstr "\&lt;20[0-9][0-9]\&gt;" *.htm </pre> <p>This command finds all years from 2000 to 2099 in the <code>.htm</code> files of the current directory. Just like with <code>find</code>, use the <code>/I</code> switch for <b>a case-insensitive</b> search:</p> <img src="/FindStr2.png" alt="The FINDSTR command in Windows 11" title="" width="652" height="115"> <h3>Findstr limitations and quirks</h3> <p>Character lists <code>[a-z]</code> are always case-insensitive, so <code>echo ABC | findstr "[a-z]"</code> matches.</p> <p><b>The space character</b> works as the alternation metacharacter in <code>findstr</code>, so a search query like <code>findstr "new shoes" *</code> will find all lines containing either <code>new</code> or <code>shoes</code>. Unfortunately, there is no way to escape the space and use it as a literal character in a regular expression. For example, you cannot find lines starting with a space.</p> <p><b>Syntax errors</b> in regular expressions are ignored. For instance, <code>findstr "[" *</code> will match all lines that contain the <code>[</code> character.</p> <p>If the file contains <b>Unix line breaks</b> (LF), the <code>$</code> metacharacter does not work correctly. If <b>the last line of a file</b> lacks a line terminator, <code>findstr</code> will be unable to find it. For example, <code>findstr "&lt;/html&gt;$" *</code> won't work if there is no CR+LF after &lt;/html&gt;.</p> <p>Early Windows versions had <b>limitations on line length</b> for <code>find</code> and <code>findstr</code>, as well as other commands. The recent versions lifted these limits, so you don't have to worry about them anymore. 
See <a href="https://stackoverflow.com/questions/8844868/what-are-the-undocumented-features-and-limitations-of-the-windows-findstr-comman/20159191#20159191">this StackOverflow question</a> for <code>findstr</code> limitations and bugs, especially in early Windows versions.</p> <p>The findstr command operates in <b>the OEM (MS-DOS) code page;</b> the dot metacharacter does not match any of the extended ASCII characters. As a result, the command is not very useful for non-English text. Besides that, you cannot search for Unicode characters (UTF-8 or UTF-16).</p> <h3>Conclusion</h3> <p>You can learn about other switches by typing <code>findstr /?</code> or <code>find /?</code>. For example, the additional switches allow you to search in subdirectories or print line numbers. You can also refer to <a href="https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/findstr">the official documentation.</a></p> <p>In general, the <code>find</code> and <code>findstr</code> commands are outdated and come with various quirks and limitations. Shameless plug: <b>Aba Search and Replace</b> supports <a href="/docs/cmdLine.php">command-line options as well,</a> allowing you to search from the command prompt and replace text from Windows batch files.</p> Sun, 21 May 2023 14:07:58 +0200https://www.abareplace.com/blog/findstr/Empty character class in JavaScript regexes<p>I <a href="https://github.com/PCRE2Project/pcre2/blob/master/maint/GenerateUcd.py">contributed to PCRE</a> and wrote two smaller regular expression engines, but I still regularly learn something new about this topic. This time, it's about <b>a regex that never matches.</b></p> <p>When using <a href="https://www.abareplace.com/docs/charListClass.php">character classes,</a> you can specify the allowed characters in brackets, such as <code>[a-z]</code> or <code>[aeiouy]</code>. 
But what happens if the character class is empty?</p> <p>Popular <b>regex engines</b> treat the empty brackets <code>[]</code> differently. In JavaScript, they never match. This is valid JavaScript code, and it always prints false regardless of the value of <code>str</code>:</p> <pre>
const str = 'a';
console.log(/[]/.test(str));
</pre> <p>However, in Java, PHP (PCRE), Go, and Python, the same regex fails to compile:</p> <pre>
// Java
@Test
void testRegex1() {
    PatternSyntaxException e = assertThrows(PatternSyntaxException.class,
        () -> Pattern.compile("[]"));
    assertEquals("Unclosed character class", e.getDescription());
}
</pre> <pre>
&lt;?php
ini_set('display_errors', 1);
error_reporting(E_ALL);
// Emits a warning: preg_match(): Compilation failed: missing terminating ] for character class
echo preg_match('/[]/', ']') ? 'Match ' : 'No match';
</pre> <pre>
# Python
import re
re.compile('[]') # throws "unterminated character set"
</pre> <p>In these languages, you can <b>put the closing bracket right after the opening bracket</b> to avoid <a href="https://www.abareplace.com/blog/escape-regexp/">escaping the former</a>:</p> <pre>
// Java
@Test
void testRegex2() {
    Pattern p = Pattern.compile("[]]");
    Matcher m = p.matcher("]");
    assertTrue(m.matches());
}
</pre> <pre>
&lt;?php
echo preg_match('/[]]/', ']', $m) ? 'Match ' : 'No match'; // Outputs 'Match'
print_r($m);
</pre> <pre>
# Python
import re
print(re.match('[]]', ']')) # outputs the Match object
</pre> <pre>
// Go
package main

import (
    "fmt"
    "regexp"
)

func main() {
    matched, err := regexp.MatchString(`[]]`, "]")
    fmt.Println(matched, err)
}
</pre> <p>This won't work in JavaScript because the first <code>]</code> is interpreted as the end of the character class there, so the same regular expression in JavaScript means <a href="https://262.ecma-international.org/13.0/#sec-compiletocharset">an empty character class</a> that never matches, followed by a closing bracket. 
As a result, the regular expression never finds the closing bracket:</p> <pre>
// JavaScript
console.log(/[]]/.test(']')); // outputs false
</pre> <p>If you <b>negate the empty character class</b> with <code>^</code> in JavaScript, it will match any character including newlines:</p> <pre>
console.log(/[^]/.test('')); // outputs false
console.log(/[^]/.test('a')); // outputs true
console.log(/[^]/.test('\n')); // outputs true
</pre> <p>Again, this is an invalid regex in other languages. PCRE can emulate the JavaScript behavior if you pass the PCRE2_ALLOW_EMPTY_CLASS option to <a href="https://pcre.org/current/doc/html/pcre2api.html#SEC20">pcre2_compile.</a> PHP never passes this flag.</p> <p>If you want to match <b>an opening or a closing bracket,</b> this somewhat cryptic regular expression will help you in Java, PHP, Python, or Go: <code><b>[</b>][<b>]</b></code>. The first opening bracket starts the character class, which includes the literal closing bracket and the literal opening bracket, and finally, the last closing bracket ends the class.</p> <p>In JavaScript, you need to escape the closing bracket like this: <code><b>[</b>\][<b>]</b></code></p> <pre>
console.log(/[\][]/.test('[')); // outputs true
console.log(/[\][]/.test(']')); // outputs true
</pre> <p>In Aba Search and Replace, I chose to support the syntax used in Java/PHP/Python/Go. There are <a href="https://stackoverflow.com/questions/1723182/a-regex-that-will-never-be-matched-by-anything">many other ways</a> to construct a regular expression that always fails, in case you need it. 
So it makes sense to use this syntax for a literal closing bracket.</p> Mon, 10 Apr 2023 17:44:12 +0200https://www.abareplace.com/blog/emptybrackets/Privacy Policy Update - December 2022<p>Updated <a href="/order/#privacy">our privacy policy:</a></p> <ul> <li>clarified your rights under GDPR (you can object to processing of your personal data or restrict the processing, etc.);</li> <li>added that we don't do any profiling for marketing purposes, but PayPro Global may do risk scoring in order to prevent a potential credit card fraud;</li> <li>added that we can notify you by email about new software versions (you can leave this checkbox empty or unsubscribe at any time);</li> <li>listed what happens if you don't provide your personal data (e.g., if you don't provide your email address, we cannot reply to you);</li> <li>changed the refund policy from 30 to 14 days, added a reference to the relevant Czech law;</li> <li>stated that we do full-disk encryption and encrypt all backups, so your personal data are safe with us.</li> </ul> <p>Note that we are required by law to notify you of any changes in the privacy policy. Thank you and have a nice holiday season!</p> Sun, 25 Dec 2022 21:17:32 +0100https://www.abareplace.com/blog/privacy2022-12/Aba 2.5 released<p>The new features in this version include:</p> <ul> <li>Search and replace <a href="/docs/cmdLine.php">from the command line</a></li> <li><a href="/docs/searchParams.php#browseForFiles">Skip subdirectories</a> when searching (click the <i>Browse</i> button and uncheck <i>Include subdirectories</i>)</li> <li><a href="/docs/searchResults.php#sorting">Sorting</a> the search results by path, filename, extension, modification date, or file size.</li> <li>Escape sequences and character classes inside the character lists, e.g. 
<code>[\d\s]</code> to find a digit, a space, or a newline.</li> <li>Fixed multiple bugs including encoding detection in very short files and searching for the replacement character U+FFFD (many thanks to Joe). Also fixed incorrect search in files slightly larger than 4 GB.</li> <li>Now relative paths are displayed instead of absolute ones in the search results.</li> </ul> <p>The upgrade is free for the registered users. Just <a href="/download/">download</a> the installer and run it; your settings and search history will be preserved.</p> <img src="/blog_aba25.png" width="688" height="436" alt="Aba 2.5 window" title=""> Sun, 11 Dec 2022 20:03:51 +0100https://www.abareplace.com/blog/aba25/Our response to the war in Ukraine<p>In response to the Russian invasion of Ukraine, I blocked all orders from Russia starting from March 2022. I fully support Ukraine in this terrible war and donate money to help Ukrainian refugees in the Czech Republic.</p> <p>Many of you are in a tough situation now due to the high inflation and the rising energy prices. So I am introducing <b>a 10% discount</b> for all new Aba Search and Replace users, but especially for freelancers and small businesses who pay for the software from their own pocket.</p> <p>Please use this coupon code at <a href="/buy/">checkout:</a></p> <p><code><b>GloryToUkraine</b></code> &nbsp; <button onclick="navigator.clipboard.writeText('GloryToUkraine'); return false;">📋 Copy to clipboard</button></p> <p>The coupon code is valid until the end of 2022. I plan to release a new version within several weeks; the upgrade will be free for all registered users. Please stay tuned.</p> <p>Thank you for your continuous support. Wishing you peace and good fortune.</p> <p><i>Peter Kankowski,</i><br><i>Aba Search and Replace developer</i></p>Sat, 01 Oct 2022 12:37:28 +0200https://www.abareplace.com/blog/ukraine/