Aba Search and Replace bloghttps://www.abareplace.com/blog/Search tips and tricks, regular expression tutorials, announcements about new versions of Aba Search and Replace.1440Automatically add width and height to img tags <p>If you set the width and height attributes for your <code>img</code> tags, the browser can allocate the correct amount of space for the image before loading it. This prevents content below the image from shifting around as the page loads. The layout becomes stable, which means that:</p> <ul> <li>your users <a href="https://web.dev/articles/cls">won’t accidentally click a wrong button</a> because of layout shift;</li> <li>the performance is better because the browser doesn’t have to recalculate the layout as the images load;</li> <li>page load feels smoother and faster.</li> </ul> <p>That’s why Google <a href="https://web.dev/articles/serve-images-with-correct-dimensions#avoid_layout_shifts_by_specifying_dimensions">recommends</a> setting the <a href="https://web.dev/patterns/web-vitals-patterns/images/img-tag">width and height attributes</a> in your HTML code.</p> <p>If you have a lot of images, it may take some time to specify their dimensions. With Aba Search and Replace, you can do it automatically.</p> <h3>The typical case</h3> <img src="/blog_ImgWidthHeight1.png" alt="Adding width and height to HTML images" width="675" height="501"> <p>Please use this search pattern to capture the image file name in the first subexpression:</p> <pre> &lt;img src="([^"]+)" </pre> <p>The <code>[^"]+</code> regex matches everything except for the closing quotation mark and parentheses mark the first subexpression.</p> <p>If you have absolute paths like <code>&lt;img src="/images/someImage.png"&gt;</code> in your HTML code, use the following replacement:</p> <pre> \0 \{ File(Aba.searchPath() \1).meta('ImgTag') } </pre> <p>Here, we insert the whole match <code>\0</code>, which is the <code>img</code> tag and its <code>src</code> attribute. 
Then, we insert width and height via <a href="https://www.abareplace.com/docs/baoFiles.php#baoMeta">the meta function</a>. The <a href="https://www.abareplace.com/docs/baoFiles.php#baoFileName">Aba.searchPath()</a> function returns the directory that you selected for the search, then the image filename <code>\1</code> is added to it.</p> <h3>Relative paths</h3> <img src="/blog_ImgWidthHeight2.png" alt="Adding width and height to HTML images with relative paths" width="675" height="501"> <p>If your paths are relative to the html files (e.g., <code>&lt;img src="someImage.png"&gt;</code> or <code>&lt;img src="../banner.png"&gt;</code>), then use a simpler replacement:</p> <pre> \0 \{ File(\1).meta('ImgTag') } </pre> <h3>Replacing existing width and height attributes</h3> <p>If you have existing width and height attributes and you want to replace them, the regex becomes more complex. For example, if the width and height always follow the src attribute:</p> <pre> &lt;img src="([^"]+)" width="\d+" height="\d+" </pre> <p>And the replacement should be:</p> <pre> &lt;img src="\1" \{File(Aba.searchPath() \1).meta('ImgTag')} </pre> <img src="/blog_ImgWidthHeight3.png" alt="Matching the existing width and height attributes" width="675" height="501"> <h3>Conclusion</h3> <p>You can preview the replacements and check that the <code>img</code> tags are matched correctly. If Aba cannot find an image file, it will display an error message with the src attribute and the HTML filename. Then, just press the <i>Replace</i> button and test the result in your browser. 
If anything goes wrong, you can always undo the replacement.</p> <p>Aba can help you to ensure that all of your pages use width and height attributes, which improves performance, prevents layout shifts, and makes your website more visually appealing for the users.</p> Sun, 14 Jul 2024 15:30:03 +0200https://www.abareplace.com/blog/img-width-height/Using zero-width assertions in regular expressions<h3>Anchors ^ $ \b \A \Z</h3> <p>Anchors in regular expressions allow you to specify context in a string where your pattern should be matched. There are several types of anchors:</p> <ul> <li><code>^</code> matches the start of a line (in multiline mode) or the start of the string (by default).</li> <li><code>$</code> matches the end of a line (in multiline mode) or the end of the string (by default).</li> <li><code>\A</code> matches the start of the string.</li> <li><code>\Z</code> or <code>\z</code> matches the end of the string.</li> <li><code>\b</code> matches a word boundary (before the first letter of a word or after the last letter of a word).</li> <li><code>\B</code> matches a position that is not a word boundary (between two letters or between two non-letter characters).</li> </ul> <p>These anchors are supported in <a href="https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#bounds">Java</a>, <a href="https://www.php.net/manual/en/regexp.reference.anchors.php">PHP</a>, Python, <a href="https://docs.ruby-lang.org/en/master/Regexp.html">Ruby</a>, <a href="https://learn.microsoft.com/en-us/dotnet/standard/base-types/anchors-in-regular-expressions">C#</a>, and Go. In JavaScript, <code>\A</code> and <code>\Z</code> are not supported, but you can use <code>^</code> and <code>$</code> instead of them; just remember to keep the multiline mode disabled. 
Aba Search and Replace <a href="https://www.abareplace.com/docs/anchors.php">always runs in multiline mode,</a> so you can use <code>\A</code> and <code>\Z</code> to match the beginning or the end of a file.</p> <p>For example, the regular expression <code>^abc</code> matches the letters "abc" only at the start of the string. In multiline mode, the same regex will match these letters at the beginning of a line. You can use anchors in combination with other regular expression elements to create more complex matches. For example, <code>^From: (.*)</code> matches a line starting with <code>From:</code>.</p> <p>The difference between <code>\Z</code> and <code>\z</code> is that <code>\Z</code> matches at the end of the string and also just before a trailing newline character at the end. In contrast, <code>\z</code> is stricter and matches only at the end of the string.</p> <p>If you have read <a href="/blog/regex101/">the previous part of this article,</a> you may wonder if the anchors add any additional capabilities that are not supported by the three primitives (alternation, parentheses, and the star for repetition). The answer is: they do not, but they <b>change what is captured</b> by the regular expression. You can match a line starting with <code>abc</code> by explicitly adding the newline character: <code>\nabc</code>, but in this case, you will also match the newline character itself. When you use <code>^abc</code>, the newline character is not consumed.</p> <p>In a similar way, <code>ing\b</code> matches all words ending with <i>ing</i>. You can replace the anchor with a character class containing non-letter characters (such as spaces or punctuation): <code>ing\W</code>, but in this case, the regular expression will also consume the space or punctuation character.</p> <p>If the regular expression starts with <code>^</code> so that it only matches at the start of the string, it's called <b>anchored</b>. 
In some programming languages, you can do an anchored <b>match</b> instead of the non-anchored <b>search</b> without using <code>^</code>. For example, in PHP (PCRE), you can use the <code>A</code> modifier.</p> <p>So the anchors <b>don't add any new capabilities</b> to the regular expressions, but they allow you to manage which characters will be included into the match or to match only at the beginning or end of the string. The matched language is still <a href="https://en.wikipedia.org/wiki/Regular_language">regular.</a></p> <h3>Zero-width assertions (?= ) (?! ) (?&lt;= ) (?&lt;! )</h3> <p>Zero-width assertions (also called lookahead and lookbehind assertions) allow you to check that a pattern occurs in the subject string without capturing any of the characters. This can be useful when you want to check for a pattern without moving the match pointer forward. For example, you can test that the next characters are <code>abc</code> without consuming them: <code>(?=abc)</code>.</p> <p>Zero-width assertions are <b>generalized anchors.</b> Just like anchors, they don't consume any character from the input string. Unlike anchors, they allow you to check anything, not only line boundaries or word boundaries. So you can replace an anchor with a zero-width assertion, but not vice versa. For example, <code>ing\b</code> could be rewritten as <code>ing(?=\W|$)</code>.</p> <p>Aba documentation includes a detailed article on <a href="https://www.abareplace.com/docs/lookaround.php">zero-width assertions (lookaround)</a> and their typical usage, so we won't repeat it here. Zero-width lookahead and lookbehind are supported in <a href="https://www.php.net/manual/en/regexp.reference.assertions.php">PHP</a>, JavaScript, Python, Java, and Ruby. Unfortunately, they are not supported in Go.</p> <p>Just like anchors, zero-width assertions still match a regular language, so from a theoretical point of view, they don't add anything new to the capabilities of regular expressions. 
They just make it possible to <b>skip certain things from the captured string,</b> so you only check for their presence but don't consume them.</p> Sun, 30 Jun 2024 22:27:35 +0200https://www.abareplace.com/blog/zero-width/Aba 2.7 released<p>In the new version, Aba got a UI facelift and dark mode. Several critical bugs were fixed in this release, so it's recommended for everyone to install. The changes are:</p> <ul> <li>Dark mode.</li> <li>A larger, more modern UI font (Segoe UI).</li> <li>Syntax highlight for Java, C#, SQL, and Pascal.</li> <li>Drag and drop into the main window.</li> <li>Autocomplete in the path combobox.</li> <li>Allow to use a file name in double quotes.</li> <li>Fixed 13 bugs including 6 critical ones.</li> </ul> <img src="/aba27_2.png" alt="Dark mode" width="625" height="483"> <p>Just as always, <b>the upgrade is free</b> for the registered users.</p>Sun, 12 May 2024 17:54:00 +0200https://www.abareplace.com/blog/aba27/Regular Expressions 101<p>With regular expressions, you can describe the patterns that are similar to each other. For example, you have multiple <code>&lt;img&gt;</code> tags, and you want to move all these images to the <code>images</code> folder:</p> <pre> &lt;img src="9.png"&gt; &#x2192; &lt;img src="images/9.png"&gt; &lt;img src="10.png"&gt; &#x2192; &lt;img src="images/10.png"&gt; and so on </pre> <p>You can easily write a regular expression that matches all file names that are numbers, then replace all such tags at once.</p> <h3>Basic syntax</h3> <p>If you need to match <b>one of the alternatives,</b> use an alternation (vertical bar). For example:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>a|img|h1|h2</code></td><td class="c">either <code>a</code>, or <code>img</code>, or <code>h1</code>, or <code>h2</code></td></tr> </table> <p>When using alternation, you often need to <b>group</b> characters together; you can do this with parentheses. 
For example, if you want to match an HTML tag, this approach won't work:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>&lt;h1|h2|b|i&gt;</code></td><td class="c"><code>&lt;h1</code> or <code>h2</code> (without the angle brackets) or <code>b</code> or <code>i&gt;</code></td></tr> </table> <p>because <code>&lt;</code> applies to the first alternative only and <code>&gt;</code> applies to the last one only. To apply the angle brackets to all alternatives, you need to group the alternatives together:</p> <pre> &lt;(h1|h2|b|i)&gt; </pre> <p>The last primitive (star) allows you to <b>repeat</b> anything zero or more times. You can apply it to one character, for example:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>a*</code></td><td class="c">an empty string, <code>a</code>, <code>aa</code>, <code>aaa</code>, <code>aaaa</code>, etc.</td></tr> </table> <p>You can also apply it to multiple characters in parentheses:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>(ab)*</code></td><td class="c">an empty string, <code>ab</code>, <code>abab</code>, <code>ababab</code>, <code>abababab</code>, etc.</td></tr> </table> <p>Note that if you remove the parentheses, the star will apply to the last character only:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>ab*</code></td><td class="c"><code>a</code>, <code>ab</code>, <code>abb</code>, <code>abbb</code>, <code>abbbb</code>, etc.</td></tr> </table> <figure> <img src="/Stephen_Kleene.jpg" width="281" height="400" alt="A portrait of Stephen Cole Kleene, the regular expression inventor" title=""> <figcaption>Stephen Kleene (1909-1994), the regular expression inventor.<br/>Author: Konrad Jacobs. 
Source: Archives of the Mathematisches Forschungsinstitut Oberwolfach.</figcaption> </figure> <p>The star is named <b>Kleene star</b> after an American mathematician <a href="https://en.wikipedia.org/wiki/Stephen_Cole_Kleene">Stephen Kleene</a> who invented regular expressions in the 1950s. It can match an empty string as well as any number of repetitions.</p> <p>These <b>three primitives</b> (alternation, parentheses, and the star for repetition) are enough to write any regular expression, but the syntax may be verbose. For example, you now can write a regex for matching the file names that are numbers in an <code>&lt;img&gt;</code> tag:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>(0|1|2|3|4&#x200b;|5|6|7|8|9)(0|1|2|3|4&#x200b;|5|6|7|8|9)*</code></td><td class="c">one or more digits</td></tr> <tr><td class="r"><code>(1|2|3|4|5&#x200b;|6|7|8|9)(0|1|2|3|4&#x200b;|5|6|7|8|9)*</code></td><td class="c">a positive integer number (don't allow zero as the first character)</td></tr> </table> <p>The parentheses may be nested without a limit, for example:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>(1|2|3|4&#x200b;|5|6|7|8|9)(0|1|2|3|4&#x200b;|5|6|7|8|9)*(,(1|2|3|4&#x200b;|5|6|7|8|9)(0|1|2|3|4&#x200b;|5|6|7|8|9)*)*</code></td><td class="c">one or more positive integer numbers, separated with commas</td></tr> </table> <h3>Convenient shortcuts for character classes</h3> <p>You can write any regex with the three primitives, but it quickly becomes hard to read, so a few shortcuts were invented. 
When you need to match <b>any of the listed characters,</b> please put them into square brackets:</p> <table class="example"> <thead><tr><td>Regex</td><td>Shorter regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>a|e|i|o|u|y</code></td><td class="r"><code>[aeiouy]</code></td><td class="c">a vowel</td></tr> <tr><td class="r"><code>0|1|2|3|4&#x200b;|5|6|7|8|9</code></td><td class="r"><code>[0123456789]</code></td><td class="c">a digit</td></tr> <tr><td class="r"><code>0|1|2|3|4&#x200b;|5|6|7|8|9</code></td><td class="r"><code>[0-9]</code></td><td class="c">a digit</td></tr> <tr><td class="r"><code>a|b|c|d|e&#x200b;|f|g|h|i|j&#x200b;|k|l|m|n&#x200b;|o|p|q|r&#x200b;|s|t|u|v&#x200b;|w|x|y|z</code></td><td class="r"><code>[a-z]</code></td><td class="c">a letter</td></tr> </table> <p>As you can see, it's possible to specify only the first and the last allowed character if you put <b>a dash</b> between them. There may be several such ranges inside square brackets:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>[a-z0-9]</code></td><td class="c">a letter or a digit</td></tr> <tr><td class="r"><code>[a-z0-9_]</code></td><td class="c">a letter, a digit, or the underscore character</td></tr> <tr><td class="r"><code>[a-f0-9]</code></td><td class="c">a hexadecimal digit</td></tr> </table> <p>There are some <b>predefined character classes</b> that are even shorter to write:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>\s</code></td><td class="c">a space character: the space, the tab character, the new line, or the carriage return</td></tr> <tr><td class="r"><code>\d</code></td><td class="c">a digit</td></tr> <tr><td class="r"><code>\w</code></td><td class="c">a word character (a letter, a digit, or the underscore character)</td></tr> <tr><td class="r"><code>.</code></td><td class="c">any character</td></tr> </table> <p>In Aba Search and Replace, 
these <a href="/docs/charListClass.php">character classes</a> include Unicode characters such as accented letters or Unicode line breaks. In other regex dialects, they usually include ASCII characters only, so <code>\d</code> is typically the same as <code>[0-9]</code> and <code>\w</code> is the same as <code>[a-zA-Z0-9_]</code>.</p> <p>The character classes don't add any new capabilities to the regular expressions; you can just list all allowed characters with an alternation, but a character class is much shorter to write. We can now write a shorter version of the regex mentioned before:</p> <table class="example"> <thead><tr><td>Regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>[1-9][0-9]*(,[1-9][0-9]*)*</code></td><td class="c">one or more positive integer numbers, separated with commas</td></tr> </table> <h3>Repetitions</h3> <p>A Kleene star means "repeating zero or more times", but you often need another number of repetitions. As shown before, you can just copy-and-paste a regex to repeat it twice or three times, but there is a shorter notation for that:</p> <table class="example"> <thead><tr><td>Regex</td><td>Shorter regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>\d\d*</code></td><td class="r"><code>\d+</code></td><td class="c">one or more digits</td></tr> <tr><td class="r"><code>(0|1)(0|1)*</code></td><td class="r"><code>[01]+</code></td><td class="c">any binary number (consisting of zeros and ones)</td></tr> <tr><td class="r"><code>(\s|)</code></td><td class="r"><code>\s?</code></td><td class="c">either a space character or nothing</td></tr> <tr><td class="r"><code>http(s|)</code></td><td class="r"><code>https?</code></td><td class="c">either <code>http</code> or <code>https</code></td></tr> <tr><td class="r"><code>(-|\+|)</code></td><td class="r"><code>[-+]?</code></td><td class="c">the minus sign, the plus sign, or nothing</td></tr> <tr><td class="r"><code>[a-z][a-z]</code></td><td class="r"><code>[a-z]{2}</code></td><td 
class="c">two small letters</td></tr> <tr><td class="r"><code>[a-z][a-z]((([a-z]|)[a-z]|)[a-z]|)</code></td><td class="r"><code>[a-z]{2,5}</code></td><td class="c">from two to five small letters</td></tr> <tr><td class="r"><code>[a-z][a-z][a-z]*</code></td><td class="r"><code>[a-z]{2,}</code></td><td class="c">two or more small letters</td></tr> </table> <p>So there are the following <a href="/docs/repetitions.php">repetition operators:</a></p> <ul> <li>a Kleene star <code>*</code> means repeating <b>zero or more times,</b> so the repeated part can be absent, or it can match once, twice, three times, etc.;</li> <li>a plus sign <code>+</code> means repeating <b>one or more times,</b> so it must match at least once;</li> <li>an optional part <code>?</code> means <b>zero times or once</b>;</li> <li>curly brackets <code>{m,n}</code> mean repeating <b>from m to n times</b>.</li> </ul> <p>Note that you can express any repetition with the curly brackets, so these operators partially duplicate each other. For example:</p> <table class="example"> <thead><tr><td>Regex</td><td>Shorter regex</td><td>Meaning</td></tr></thead> <tr><td class="r"><code>\d{0,}</code></td><td class="r"><code>\d*</code></td><td class="c">nothing or some digits</td></tr> <tr><td class="r"><code>\d{1,}</code></td><td class="r"><code>\d+</code></td><td class="c">one or more digits</td></tr> <tr><td class="r"><code>\s{0,1}</code></td><td class="r"><code>\s?</code></td><td class="c">either a space character or nothing</td></tr> </table> <p>Just like the Kleene star, the other repetition operators can apply to parentheses, so you can nest them indefinitely.</p> <h3>Escaping</h3> <p>If you need to match any of <b>the special characters</b> like parentheses, vertical bar, plus, or star, you must <a href="/blog/escape-regexp/">escape them</a> by adding a backslash <code>\</code> before them. For example, to find a number in parentheses, use <code>\(\d+\)</code>.</p> <p>A common mistake is to forget a backslash before a dot. 
Note that a dot means any character, so if you write <code>example.com</code> in a regular expression, it will match <code>examplexcom</code> or something similar, which may even cause a security issue in your program. Now we can write a regex to match the <code>&lt;img&gt;</code> tags:</p> <pre> &lt;img src="\d+\.png"&gt; </pre> <p>This matches any filename consisting of digits and we correctly escaped the dot.</p> <h3>Other features</h3> <p>Modern regex engines add more features such as <b>backreferences</b> or conditional subpatterns. Mathematically speaking, these features don't belong to the regular expressions; they describe a non-regular language, so you cannot replace them with the three primitives.</p> <p>Next time, we will discuss anchors and zero-width assertions.</p> Sun, 28 Jan 2024 15:10:17 +0100https://www.abareplace.com/blog/regex101/2023 in review<p>In 2023, I <a href="/blog/ukraine/">continued</a> to <b>support Ukraine</b> and donated more than 50% of the revenue from Aba Search and Replace to the charities helping Ukrainians in need. I will keep donating this year.</p> <p>Released in December, Aba 2.6 is the first version that requires <b>Windows Vista.</b> The previous versions were tested on Windows XP, which remained popular for a long time after its release. Unfortunately, it became increasingly hard to maintain the Windows XP compatibility code and it limited the further development, so I had to say goodbye to Windows 2000/XP. 
Please <a href="/support/">contact me</a> if it creates any problem for you; I always listen to your feedback and can send you the previous version.</p> <p>In January 2023, Microsoft <a href="https://apps.microsoft.com/store/detail/XP89FXJL539HRG">certified</a> Aba Search and Replace for publication to the <b>Microsoft Store.</b> The new version 2.6 was also approved a few days ago, so you can <a href="/download/">download it</a> from the Microsoft Store as well as from this website.</p> <p>Thanks to Richard, Aba is also available in French <svg width="1em" viewBox="0 0 3 2"><path fill="#0055A4" d="M0 0h1v2H0z"/><path fill="#eee" d="M1 0h1v2H1z"/><path fill="#EF4135" d="M2 0h1v2H2z"/></svg>. If you are a native speaker of Spanish <svg viewBox="0 0 6 4" width="1em"><path fill="#AD1519" d="M0 0h6v4H0z"/><path fill="#FABD00" d="M0 1h6v2H0z"/></svg>, German <svg width="1em" viewBox="0 0 9 6"><path d="M0 0h9v2H0z"/><path fill="#f00" d="M0 2h9v2H0z"/><path fill="#fc0" d="M0 4h9v2H0z"/></svg>, or Italian <svg width="1em" viewBox="0 0 3 2"><path fill="#008C45" d="M0 0h1v2H0z"/><path fill="#eee" d="M1 0h1v2H1z"/><path fill="#CD212A" d="M2 0h1v2H2z"/></svg> and you can translate the 17 messages that were added in the recent version, please <a href="/support/">contact me.</a> Feel free to use Google Translate or ChatGPT, then review and edit the automatic translation. Thank you so much.</p> <p>The <b>blog post</b> about <a href="/blog/escape-regexp/">escaping in regular expressions</a> is still the most popular on this blog. In April, I wrote a followup about <a href="/blog/emptybrackets/">empty character classes</a>, which was also well-received.</p> <p>The new Aba version remains lean and fast. No huge runtime libraries, no cluttered UIs or bloatware. 
Stay tuned for the next versions!</p> Sun, 14 Jan 2024 12:27:50 +0100https://www.abareplace.com/blog/2023review/Regular expression for numbers<p>It's easy to find a positive integer number with regular expressions:</p> <pre>[0-9]+</pre> <p>This regex means digits from 0 to 9, repeated one or more times. However, <b>numbers starting with zero</b> are treated as octal in many programming languages, so you may wish to avoid matching them:</p> <pre>[1-9][0-9]*</pre> <p>This regular expression matches any positive integer number starting with a non-zero digit. If you also need to match zero, you can include it as another branch:</p> <pre>[1-9][0-9]*|0</pre> <p>To also accommodate <b>negative integer numbers,</b> you can allow a minus sign before the digits:</p> <pre>-?[1-9][0-9]*|0</pre> <p>Sometimes it's necessary to allow a plus sign as well:</p> <pre>[-+]?[1-9][0-9]*|0</pre> <p>The previous regexes searched the input string for a number. If you need to match <b>a number only</b> discarding anything else, you can add the <code>^</code> anchor to match the beginning of the string and the <code>$</code> anchor to match the end:</p> <pre>^(-?[1-9][0-9]*|0)$</pre> <p>Parentheses are necessary here; without them, the <code>^</code> anchor would apply only to the first branch. 
Another variation of the same regex avoids finding numbers that are part of words, such as <code>600px</code> or <code>x64</code>:</p> <pre>\b(-?[1-9][0-9]*|0)\b</pre> <p>Things get more complicated if you need to match <b>a fractional number</b>:</p> <pre>\b-?(?:[1-9][0-9]*(?:\.[0-9]+)?|\.[0-9]+|0)\b</pre> <p>Let's break down this regular expression:</p> <ul> <li>The first branch <code>[1-9][0-9]*(?:\.[0-9]+)?</code> matches an integer number starting with a non-zero digit, then an optional fractional part.</li> <li>The second branch <code>\.[0-9]+</code> matches fractional numbers starting with a dot, for example, <code>.5</code> is another way to write <code>0.5</code>.</li> <li>The third branch matches zero. Note that both positive and negative zeros are possible in floating-point numbers.</li> </ul> <p>For floating-point numbers with an exponent, such as <code>5.2777e+231</code>, please use:</p> <pre>\b-?(?:[1-9][0-9]*(?:\.[0-9]+)?|\.[0-9]+|0)(?:[eE][+-]?[0-9]+)?\b</pre> <p>Many programming languages support <b>hexadecimal numbers</b> starting with <code>0x</code>. 
Here is a regular expression to match them:</p> <pre>0x[0-9a-fA-F]+</pre> <p>Finally, here is a comprehensive regular expression to match floating-point, integer decimal, or hexadecimal numbers:</p> <pre>\b-?(?:[1-9][0-9]*(?:\.[0-9]+)?|\.[0-9]+|0(?:x[0-9a-fA-F]+)?)(?:[eE][+-]?[0-9]+)?\b</pre> Sat, 30 Dec 2023 18:13:28 +0100https://www.abareplace.com/blog/regex_numbers/Aba 2.6 released<p>This version adds the following features:</p> <ul> <li><a href="https://www.abareplace.com/docs/baoOverview.php">complex replacements</a> including converting the matching text to lowercase, inserting the file name, or adding width/height attributes to &lt;img&gt; tags (now you can use a simple scripting language in the replacements); </li> <li>a 64-bit version (if needed, you still can choose a 32-bit version during installation);</li> <li>a new <a href="https://www.abareplace.com/docs/hotkeys.php">hotkey:</a> the left/right arrow key to quickly jump to the next/previous file (when <a href="https://www.abareplace.com/docs/searchResults.php">the results pane</a> is focused);</li> <li>the taskbar button now flashes when a long operation is complete;</li> <li>basic support for emojis (ZWJ sequences and skin tones are displayed as separate characters).</li> </ul> <p>Just as always, <b>the upgrade is free</b> for the registered users; your settings and search history will be preserved when you run the installer.</p> <p>If you have any suggestions for new features, please <a href="/support/">contact me.</a> I will be happy to implement your ideas.</p>Mon, 25 Dec 2023 03:06:00 +0100https://www.abareplace.com/blog/aba26/Search from the Windows command prompt<p>When you need to search within text files from Windows batch files, you can use either the find or findstr command. Findstr supports a limited version of regular expressions. 
You can also automate certain tasks based on the search results.</p> <h3>The find command</h3> <p>To search for text in multiple files from the Windows command prompt or batch files, you can use the <b>FIND</b> command, which has been present since the days of MS DOS and is still available in Windows 11. It's similar to the Unix <code>grep</code> command, but does not support regular expressions. If you want to search for the word <code>borogoves</code> in the current directory, please follow this syntax:</p> <pre> find "borogoves" * </pre> <p>Note that the double quotes around the pattern are mandatory. If you are using PowerShell, you will need to include single quotes as well:</p> <pre> find '"borogoves"' * </pre> <p>Instead of the asterisk (<code>*</code>), you can specify a file mask such as <code>*.htm?</code>. The <code>find</code> command displays the names of the files it scans, even if it doesn't find any matches within these files:</p> <img src="/FindStr1.png" alt="The FIND command in Windows 11" title="" width="652" height="262"> <p>The search is <b>case-sensitive</b> by default, so you typically need to add the <code>/I</code> switch to treat uppercase and lowercase letters as equivalent:</p> <pre> find /I "&lt;a href=" *.htm </pre> <p>If you don't specify the file to search in, <code>find</code> will wait for the text input <b>from stdin,</b> so that you can pipe output from another command. For example, you can list all copy commands supported in Windows:</p> <pre> help | find /i "copy" </pre> <p>Another switch, <code>/V</code>, allows you to find all lines not containing the pattern, similar to the <code>grep -v</code> command.</p> <p>In <b>batch files,</b> you can use the fact that the <code>find</code> command sets the exit code (<b>errorlevel</b>) to 1 if the pattern is not found. 
For instance, you can check if the machine is running a 64-bit or 32-bit version of Windows:</p> <pre>
@echo off
rem Based on KB556009 with some corrections
reg Query "HKLM\Hardware\Description\System\CentralProcessor\0" /v "Identifier" | find /i "x86 Family" &gt; nul
if errorlevel 1 goto win64
echo 32-bit Windows
goto :eof
:win64
rem Could be AMD64 or ARM64
echo 64-bit Windows
</pre> <h3>The findstr command: regular expression search</h3> <p>If you need to find <b>a regular expression,</b> try the <code>FINDSTR</code> command, which was introduced in Windows XP. <a href="https://devblogs.microsoft.com/oldnewthing/20151209-00/?p=92361">For historical reasons,</a> <code>findstr</code> supports a limited subset of regular expressions, so you can only use these <a href="https://www.abareplace.com/docs/regExprElements.php">regex features:</a></p> <ul> <li>The dot <code>.</code> matches any character except for newline and extended ASCII characters.</li> <li>Character lists <code>[abc]</code> match any of the specified characters (<code>a</code>, <code>b</code>, or <code>c</code>).</li> <li>Character list ranges <code>[a-z]</code> match any letter from <code>a</code> to <code>z</code>.</li> <li>The asterisk (<code>*</code>) indicates that the previous character can be repeated zero or more times.</li> <li>The <code>\&lt;</code> and <code>\&gt;</code> symbols mark the beginning and the end of a word.</li> <li>The caret (<code>^</code>) and the dollar sign (<code>$</code>) denote the beginning and the end of a line.</li> <li>The backslash (<code>\</code>) escapes any metacharacter, allowing you to find literal characters. 
For example, <code>\$</code> finds the dollar sign itself.</li> </ul> <p><b>Findstr</b> does not support character classes (<code>\d</code>), alternation (<code>|</code>), or other repetitions (<code>+</code> or <code>{5}</code>).</p> <p>The basic syntax is the same as for the <code>FIND</code> command:</p> <pre> findstr "\&lt;20[0-9][0-9]\&gt;" *.htm </pre> <p>This command finds all years from 2000 to 2099 in the <code>.htm</code> files of the current directory. Just like with <code>find</code>, use the <code>/I</code> switch for <b>a case-insensitive</b> search:</p> <img src="/FindStr2.png" alt="The FINDSTR command in Windows 11" title="" width="652" height="115"> <h3>Findstr limitations and quirks</h3> <p>Character lists <code>[a-z]</code> are always case-insensitive, so <code>echo ABC | findstr "[a-z]"</code> matches.</p> <p><b>The space character</b> works as the alternation metacharacter in <code>findstr</code>, so a search query like <code>findstr "new shoes" *</code> will find all lines containing either <code>new</code> or <code>shoes</code>. Unfortunately, there is no way to escape the space and use it as a literal character in a regular expression. For example, you cannot find lines starting with a space.</p> <p><b>Syntax errors</b> in regular expressions are ignored. For instance, <code>findstr "[" *</code> will match all lines that contain the <code>[</code> character.</p> <p>If the file contains <b>Unix line breaks</b> (LF), the <code>$</code> metacharacter does not work correctly. If <b>the last line of a file</b> lacks a line terminator, <code>findstr</code> will be unable to find it. For example, <code>findstr "&lt;/html&gt;$" *</code> won't work if there is no CR+LF after &lt;/html&gt;.</p> <p>Early Windows versions had <b>limitations on line length</b> for <code>find</code> and <code>findstr</code>, as well as other commands. The recent versions lifted these limits, so you don't have to worry about them anymore. 
See <a href="https://stackoverflow.com/questions/8844868/what-are-the-undocumented-features-and-limitations-of-the-windows-findstr-comman/20159191#20159191">this Stack Overflow question</a> for <code>findstr</code> limitations and bugs, especially in early Windows versions.</p> <p>The findstr command operates in <b>the OEM (MS-DOS) code page;</b> the dot metacharacter does not match any of the extended ASCII characters. As a result, the command is not very useful for non-English text. In addition, you cannot search for Unicode characters (UTF-8 or UTF-16).</p> <h3>Conclusion</h3> <p>You can learn about other switches by typing <code>findstr /?</code> or <code>find /?</code>. For example, the additional switches allow you to search in subdirectories or print line numbers. You can also refer to <a href="https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/findstr">the official documentation.</a></p> <p>In general, the <code>find</code> and <code>findstr</code> commands are outdated and come with various quirks and limitations. Shameless plug: <b>Aba Search and Replace</b> supports <a href="/docs/cmdLine.php">command-line options as well,</a> allowing you to search from the command prompt and replace text from Windows batch files.</p> Sun, 21 May 2023 14:07:58 +0200https://www.abareplace.com/blog/findstr/Empty character class in JavaScript regexes<p>I <a href="https://github.com/PCRE2Project/pcre2/blob/master/maint/GenerateUcd.py">contributed to PCRE</a> and wrote two smaller regular expression engines, but I still regularly learn something new about this topic. This time, it's about <b>a regex that never matches.</b></p> <p>When using <a href="https://www.abareplace.com/docs/charListClass.php">character classes,</a> you can specify the allowed characters in brackets, such as <code>[a-z]</code> or <code>[aeiouy]</code>.
But what happens if the character class is empty?</p> <p>Popular <b>regex engines</b> treat the empty brackets <code>[]</code> differently. In JavaScript, they never match. This is valid JavaScript code, and it always prints <code>false</code> regardless of the value of <code>str</code>:</p> <pre>
const str = 'a';
console.log(/[]/.test(str));
</pre> <p>However, in Java, PHP (PCRE), Go, and Python, the same regex fails to compile:</p> <pre>
// Java
@Test
void testRegex1() {
    PatternSyntaxException e = assertThrows(PatternSyntaxException.class,
        () -> Pattern.compile("[]"));
    assertEquals("Unclosed character class", e.getDescription());
}
</pre> <pre>
&lt;?php
ini_set('display_errors', 1);
error_reporting(E_ALL);
// Emits a warning: preg_match(): Compilation failed: missing terminating ] for character class
echo preg_match('/[]/', ']') ? 'Match ' : 'No match';
</pre> <pre>
# Python
import re
re.compile('[]') # throws "unterminated character set"
</pre> <p>In these languages, you can <b>put the closing bracket right after the opening bracket</b> to avoid <a href="https://www.abareplace.com/blog/escape-regexp/">escaping the former</a>:</p> <pre>
// Java
@Test
void testRegex2() {
    Pattern p = Pattern.compile("[]]");
    Matcher m = p.matcher("]");
    assertTrue(m.matches());
}
</pre> <pre>
&lt;?php
echo preg_match('/[]]/', ']', $m) ? 'Match ' : 'No match'; // Outputs 'Match'
print_r($m);
</pre> <pre>
# Python
import re
print(re.match('[]]', ']')) # outputs the Match object
</pre> <pre>
// Go
package main

import (
    "fmt"
    "regexp"
)

func main() {
    matched, err := regexp.MatchString(`[]]`, "]")
    fmt.Println(matched, err)
}
</pre> <p>This won't work in JavaScript because the first <code>]</code> is interpreted as the end of the character class there, so the same regular expression in JavaScript means <a href="https://262.ecma-international.org/13.0/#sec-compiletocharset">an empty character class</a> that never matches, followed by a closing bracket.
As a result, the regular expression never finds the closing bracket:</p> <pre>
// JavaScript
console.log(/[]]/.test(']')); // outputs false
</pre> <p>If you <b>negate the empty character class</b> with <code>^</code> in JavaScript, it will match any character including newlines:</p> <pre>
console.log(/[^]/.test('')); // outputs false
console.log(/[^]/.test('a')); // outputs true
console.log(/[^]/.test('\n')); // outputs true
</pre> <p>Again, this is an invalid regex in other languages. PCRE can emulate the JavaScript behavior if you pass the PCRE2_ALLOW_EMPTY_CLASS option to <a href="https://pcre.org/current/doc/html/pcre2api.html#SEC20">pcre2_compile.</a> PHP never passes this flag.</p> <p>If you want to match <b>an opening or a closing bracket,</b> this somewhat cryptic regular expression will help you in Java, PHP, Python, or Go: <code><b>[</b>][<b>]</b></code>. The first opening bracket starts the character class, which includes the literal closing bracket and the literal opening bracket, and finally, the last closing bracket ends the class.</p> <p>In JavaScript, you need to escape the closing bracket like this: <code><b>[</b>\][<b>]</b></code></p> <pre>
console.log(/[\][]/.test('[')); // outputs true
console.log(/[\][]/.test(']')); // outputs true
</pre> <p>In Aba Search and Replace, I chose to support the syntax used in Java/PHP/Python/Go. There are <a href="https://stackoverflow.com/questions/1723182/a-regex-that-will-never-be-matched-by-anything">many other ways</a> to construct a regular expression that always fails, in case you need it.
So it makes sense to use this syntax for a literal closing bracket.</p> Mon, 10 Apr 2023 17:44:12 +0200https://www.abareplace.com/blog/emptybrackets/Privacy Policy Update - December 2022<p>Updated <a href="/order/#privacy">our privacy policy:</a></p> <ul> <li>clarified your rights under GDPR (you can object to the processing of your personal data or restrict the processing, etc.);</li> <li>added that we don't do any profiling for marketing purposes, but PayPro Global may do risk scoring in order to prevent potential credit card fraud;</li> <li>added that we can notify you by email about new software versions (you can leave this checkbox empty or unsubscribe at any time);</li> <li>listed what happens if you don't provide your personal data (e.g., if you don't provide your email address, we cannot reply to you);</li> <li>changed the refund policy from 30 to 14 days, added a reference to the relevant Czech law;</li> <li>stated that we do full-disk encryption and encrypt all backups, so your personal data are safe with us.</li> </ul> <p>Note that we are required by law to notify you of any changes in the privacy policy. Thank you and have a nice holiday season!</p> Sun, 25 Dec 2022 21:17:32 +0100https://www.abareplace.com/blog/privacy2022-12/