Why doesn't this FINDSTR example with multiple literal search strings find a match?

Sometimes FINDSTR with multiple literal search strings fails to find all matches. For example, the following FINDSTR example fails to find a match.

echo ffffaaa|findstr /l "ffffaaa faffaffddd"

Why?

Pasteur answered 19/1, 2012 at 5:1 Comment(5)
wanna know something funny? put a space after ffffaaa and it works =D - Arbor
@Mechaflash - It doesn't have to be a space, it could be any character. But then extend the 2nd search string by one character and it fails again. There seems to be a minimum size difference necessary for the bug to appear. But the minimum difference is not a constant. I've seen a size difference of 2 fail. - Pasteur
...just found another interesting "behaviour" of findstr: with the /X switch given, a line must match exactly to be output; when the last line in a text file to search is not terminated with a new-line, findstr will not return it (no matter whether /L or /R is given, or the search string is preceded with /C:)... - Egwin
@Egwin - I've already documented that issue at "What are the undocumented features and limitations of the Windows FINDSTR command?". It actually fails if the line does not contain a carriage return (0x0D), even if a newline (0x0A) is present. - Pasteur
@Egwin - The info is under the headings "Regex Line Position anchors ^ and $" and "Positional Options /B /E /X" - Pasteur

Apparently this is a long-standing FINDSTR bug. I think it can be a crippling bug, depending on the circumstances.

I have confirmed the command fails on two different Vista machines, a Windows 7 machine, and an XP machine. I also found a report titled "findstr - broken ???" stating that a similar search fails on Windows Server 2003 but succeeds on Windows 2000.

I've done a number of experiments, and it seems all of the following conditions must be met for a failure to be possible:

  • The search is using multiple literal search strings
  • The search strings are of different lengths
  • A short search string has some amount of overlap with a longer search string
  • The search is case sensitive (no /I option)

In every failure I have seen, it is always one of the shorter search strings that fails.

It does not matter how the search strings are specified. The same faulty result is achieved using multiple /C:"search" options and also with the /G:file option.
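
As a sketch, the failing search from the question can be written in the two other forms mentioned above. The file name search_strings.txt is only an assumption for illustration. Per the description above, both commands are expected to show the same faulty behaviour as the original on affected Windows versions.

```batch
rem Same two literal search strings, given as separate /C: options
echo ffffaaa|findstr /l /c:"ffffaaa" /c:"faffaffddd"

rem Same two literal search strings, read from a file (hypothetical name)
>search_strings.txt echo ffffaaa
>>search_strings.txt echo faffaffddd
echo ffffaaa|findstr /l /g:search_strings.txt
```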

The only 3 workarounds I have been able to come up with are:

  • Use the /I option if you don't care about case. Obviously this might not meet your needs.

  • Use the /R regular expression option. But if you do then you have to make sure you escape any meta-characters in the search so that it matches the result expected of a literal search. This can be problematic as well.

  • If you are using the /V option, then use multiple piped FINDSTR commands with one search string each instead of one FINDSTR with multiple searches. This also can be a problem if you have a lot of search strings for which you want to use the /G:file option.
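
The first and third workarounds could look like this for the example from the question. This is a minimal sketch: input.txt is a hypothetical file name, not something from the original problem.

```batch
rem Workaround 1: case-insensitive literal search
echo ffffaaa|findstr /l /i "ffffaaa faffaffddd"

rem Workaround 3: one search string per FINDSTR when excluding lines with /V
findstr /v /l /c:"ffffaaa" "input.txt" | findstr /v /l /c:"faffaffddd"
```

The pipe chain only stands in for a single multi-string search because /V is in effect: a line survives the chain only if it contains none of the strings. Without /V, chaining would require every string to be present in a line rather than any one of them.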

I hate this bug!!!!

Note - See What are the undocumented features and limitations of the Windows FINDSTR command? for a comprehensive list of FINDSTR idiosyncrasies.

Pasteur answered 19/1, 2012 at 5:5 Comment(1)
Careful, dbenham, you're likely to become the findstr guru in much the same way Skeet is the C# guru :-) - Renteria

I cannot tell why findstr may fail with multiple literal strings. However, I can provide a method to work around that annoying bug.

Given that the literal search strings are listed in a text file called search_strings.txt...:

ffffaaa
faffaffddd

..., you can convert it to regular expressions by inserting a backslash in front of every single character:

@echo off
setlocal EnableExtensions DisableDelayedExpansion
> "regular_expressions.txt" (
    for /F usebackq^ delims^=^ eol^= %%S in ("search_strings.txt") do (
        set "REGEX=" & set "STRING=%%S"
        for /F delims^=^ eol^= %%T in ('
            cmd /U /V /C echo(!STRING!^| find /V ""
        ') do (
            set "ESCCHR=\%%T"
            if "%%T"=="<" (set "ESCCHR=%%T") else if "%%T"==">" (set "ESCCHR=%%T")
            setlocal EnableDelayedExpansion
            for /F "delims=" %%U in ("REGEX=!REGEX!!ESCCHR!") do (
                endlocal & set "%%U"
            )
        )
        setlocal EnableDelayedExpansion
        echo(!REGEX!
        endlocal
    )
)
endlocal

Then use the converted file regular_expressions.txt...:

\f\f\f\f\a\a\a
\f\a\f\f\a\f\f\d\d\d

...to do a regular expression search, which seems to work fine also with multiple search strings:

echo ffffaaa| findstr /R /G:"regular_expressions.txt"

The preceding backslashes simply escape every character including those that have a particular meaning in regular expression searches.

The characters < and > are excluded from escaping in order to avoid conflicts with word boundaries, which are expressed by \< and \> at the beginning and at the end of a search string, respectively.

Since regular expressions are limited to 254 characters in findstr versions after Windows XP (as opposed to literal strings, which are limited to 511 characters), the original search strings may be at most 127 characters long, because escaping turns every character into two.
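
To spot search strings that would exceed this limit before converting them, a check along the following lines might help. It is only a sketch: it reuses the search_strings.txt name from above, the message text is made up, and it simply tests whether a 128th character exists.

```batch
@echo off
setlocal EnableExtensions DisableDelayedExpansion
for /F usebackq^ delims^=^ eol^= %%S in ("search_strings.txt") do (
    set "STRING=%%S"
    setlocal EnableDelayedExpansion
    rem Offset 127 is the 128th character; it is empty for strings up to 127 characters
    if not "!STRING:~127,1!"=="" echo Longer than 127 characters: !STRING!
    endlocal
)
endlocal
```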


Here is an alternative approach that only escapes the meta-characters ., *, ^, $, [, ], \, ":

@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "_META=.*^$[]\"^" & rem (including `"`)
> "regular_expressions.txt" (
    for /F usebackq^ delims^=^ eol^= %%S in ("search_strings.txt") do (
        set "REGEX=" & set "STRING=%%S"
        for /F delims^=^ eol^= %%T in ('
            cmd /U /V /C echo(!STRING!^| find /V ""
        ') do (
            set "CHR=%%T"
            setlocal EnableDelayedExpansion
            if not "!_META!"=="!_META:*%%T=!" set "CHR=\!CHR!"
            for /F "delims=" %%U in ("REGEX=!REGEX!!CHR!") do (
                endlocal & set "%%U"
            )
        )
        setlocal EnableDelayedExpansion
        echo(!REGEX!
        endlocal
    )
)
endlocal

The advantage of this method is that the search strings are no longer limited to 127 characters but to 254 characters, minus 1 for every occurrence of one of the aforementioned meta-characters (again for findstr versions after Windows XP).


Here is another workaround: run a case-insensitive search with findstr first, then post-filter the result with case-sensitive comparisons:

echo ffffaaa|findstr /L /I "ffffaaa faffaffddd"|cmd /V /C set /P STR=""^&if @^^!STR^^!==@^^!STR:ffffaaa=ffffaaa^^! (echo(^^!STR^^!) else if @^^!STR^^!==@^^!STR:faffaffddd=faffaffddd^^! (echo(^^!STR^^!)

The double-escaped exclamation marks ensure the variable STR is expanded in the explicitly invoked cmd instance even in case delayed expansion is enabled in the hosting cmd instance.
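
The same idea may be easier to follow as a batch file. The following is a rough sketch rather than a drop-in replacement: it reads from a hypothetical file input.txt instead of a pipe, and it relies on the fact that the search part of a !VAR:search=replace! substitution is matched case-insensitively, while the == comparison of if is case-sensitive.

```batch
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem "input.txt" is a hypothetical file holding the text to search
for /F delims^=^ eol^= %%L in ('findstr /L /I "ffffaaa faffaffddd" "input.txt"') do (
    set "STR=%%L"
    setlocal EnableDelayedExpansion
    rem The substitution matches case-insensitively, the == comparison does not
    if "!STR!"=="!STR:ffffaaa=ffffaaa!" (
        echo(!STR!
    ) else if "!STR!"=="!STR:faffaffddd=faffaffddd!" (
        echo(!STR!
    )
    endlocal
)
endlocal
```

If a line comes back unchanged from the substitution, every occurrence of that search string already had the exact case. Note that a line also passes a test when the corresponding string does not occur in it at all, so this should be treated as an illustration of the idea, not as a strict filter.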


By the way, due to what I call a design flaw, literal searches with findstr do not work reliably as soon as the search strings contain backslashes, because a backslash may still be consumed to escape a following meta-character, even though that is unnecessary in a literal search. For example, the search string \. actually matches .; to truly match \. literally, you must specify the search string \\.. I do not understand why meta-characters are still recognised in literal searches; that is not what I call literal.
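
The two cases described above can be tried out as follows. This is only a sketch with made-up test strings (a.b and a\.b), and the comments restate the behaviour claimed above, not an independently verified result.

```batch
rem Per the description above, this literal search is expected to match,
rem because the backslash is consumed as an escape for the dot
echo a.b|findstr /L /C:"\."

rem To look for a backslash followed by a dot, the backslash is doubled
echo a\.b|findstr /L /C:"\\."
```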

Egwin answered 4/6, 2017 at 22:2 Comment(1)
Yes, the "literal" search is ridiculous. FINDSTR is probably one of the worst ever piles of ***** ever released to production. It was originally a MS employee's personal tool, and it became a standard part of the Windows release without proper design and debugging. And yes, you can convert each literal string to a regular expression, but your strategy of escaping every character severely limits the search length to 127 chars max. Literal strings max out at 511. Regex is limited to 254, but your escapes leave only 127. It is even worse on XP. - Pasteur
