Use subpatterns in FINDSTR

I must check the validity of a string stored in a variable. I cannot use external CLI utilities (grep, awk, etc.), so I chose FINDSTR. The string has this format (as a regexp):

([1-9][0-9]*:".*"(|".*")*)

I do not know how to check the subpattern (|".*")*. Currently my code is:

((ECHO.) | (SET /P "=(11:"a"|"b"|"c")") | (FINDSTR /R /C:"^([1-9][0-9]*:".*")$"))

Regards.

Vernitavernoleninsk answered 23/9, 2012 at 18:42 Comment(5)
If at all possible, you're probably better off using vbscript or powershell. Manipulating strings containing special characters is absurdly difficult in Windows batch files.Male
@Harry Johnston Unfortunately I can not use anything other than standard internal or external commands to cmd.exe.Vernitavernoleninsk
VBScript and JScript are standard native utilities available to CMD.EXE, with good regex support. PowerShell is native from Vista onward, and also has good regex support.Riesman
dbenham: The Windows Scripting Host has nothing to do with cmd, cscript is just a normal console executable (which can [and sometimes is in corporate environments] be disabled via group policies). PowerShell can be installed on XP and Vista, but comes preinstalled only on Windows 7 and 8 (and the respective Server variants).Lorileelorilyn
@Joey: the point, I think, is that cscript (like any other console executable) is available for use via cmd.exe. It's true that it can be disabled by group policy, but so can any other external command (including findstr) and so can cmd.exe itself.Male

Mat M is correct about the limitation of FINDSTR. The FINDSTR regex support is very primitive and non-standard. Type HELP FINDSTR or FINDSTR /? from the command line to get a brief synopsis of what is supported. For an in-depth explanation, refer to "What are the undocumented features and limitations of the Windows FINDSTR command?"

I like Harry Johnston's comment - It would be quite easy to create a solution using VBScript or JavaScript. I think that would be a much better choice.
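
To illustrate that point, here is a minimal sketch of the same check in JavaScript. It is not part of the original solution. The function name validate and its return strings are made up for illustration. It assumes that every | separates subpatterns (so a | inside quotes is rejected) and that the leading number must equal the number of subpatterns, as the OP stated in a comment. It sticks to old-style syntax so that it should also be usable as JScript under cscript, where console.log would have to be replaced with WScript.Echo.

```javascript
// Sketch only: validate strings such as (3:"a"|"b"|"c").
// Assumption: "|" always separates subpatterns, even inside quotes.
function validate(s) {
  // Outer shape: "(", a count with no leading zero, ":", a body, ")".
  var m = /^\(([1-9][0-9]*):(.*)\)$/.exec(s);
  if (!m) return "format fail";

  // Split the body on the delimiter and check each piece.
  var parts = m[2].split("|");
  for (var i = 0; i < parts.length; i++) {
    // Each subpattern must start and end with a double quote.
    if (!/^".*"$/.test(parts[i])) return "sub-pattern fail";
  }

  // The announced count must match the number of subpatterns found.
  if (parts.length !== Number(m[1])) return "count fail";
  return "valid";
}

console.log(validate('(3:"a"|"c"|"c")'));  // valid
console.log(validate('(4:"a"|"b"|"c")'));  // count fail
console.log(validate('(3:"a"|"b"|"c"'));   // format fail
console.log(validate('(3:"a"|"b"|c)'));    // sub-pattern fail
```

Because a real regex engine supports grouping and the split is a single call, none of the quoting and escaping concerns of the batch version arise here.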

But, here is a native batch solution. I've incorporated the extra rule about the number of subpatterns that the OP stated in the comment to Mat M's answer.

The solution is surprisingly tricky. Special characters can cause problems when piping the ECHO output to FINDSTR because of the way pipes work: each side of the pipe is executed in its own CMD session. The special characters must be quoted, escaped twice, or exposed only via delayed expansion. I chose to use delayed expansion, but the ! characters must be escaped twice to make sure the delayed expansion occurs at the correct time.

The easiest way to parse a variable number of subpatterns is to replace the delimiter with a newline and use FOR /F to iterate each subpattern.

The top half of my code is a brittle coding harness to conveniently iterate and test a set of strings. It will not work properly with any of <space> ; , = <tab> * or ? in the string. Also, the quotes must be balanced in each string.

But the more important validate routine can handle any string in the var variable.

@echo off
setlocal
set LF=^


::Above 2 blank lines are critical for creating a linefeed variable. Do not remove

set test=a

for %%S in (
  "(3:"a"|"c"|"c")"
  "(11:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")"
  "(4:"a"|"b"|"c")"
  "(10:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")"
  "(3:"a"|"b"|"c""
  "(3:"a"|"b^|c")"
  "(3:"a"|"b"|c)"
  "(3:"a"|"b"||"c")"
  "(3:"a"|"b"|;|"c")"
) do (
  set "var=%%~S"
  call :validate
)
exit /b

:validate
setlocal enableDelayedExpansion
cmd /v:on /c echo ^^^!var^^^!|findstr /r /c:"^([1-9][0-9]*:.*)$" >nul || (call :invalid  FINDSTR fail& exit /b)
if "!var:||=!" neq "!var!" (call :invalid double pipe fail& exit /b)
for /f "delims=(:" %%N in ("!var!") do set "expectedCount=%%N"
set "str=!var:*:=!"
set "str=!str:~0,-1!"
set foundCount=0
for %%A in ("!LF!") do for /f eol^=^%LF%%LF%^ delims^=  %%B in ("!str:|=%%~A!") do (
  if %%B neq "%%~B" (call :invalid sub-pattern fail& exit /b)
  set /a foundCount+=1
)
if %foundCount% neq %expectedCount% (call :invalid count fail& exit /b)
echo Valid: !var!
exit /b
:invalid
echo Invalid - %*: !var!
exit /b

Here are the results after running the batch script:

Valid: (3:"a"|"c"|"c")
Valid: (11:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")
Invalid - count fail: (4:"a"|"b"|"c")
Invalid - count fail: (10:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")
Invalid - FINDSTR fail: (3:"a"|"b"|"c"
Invalid - sub-pattern fail: (3:"a"|"b|c")
Invalid - sub-pattern fail: (3:"a"|"b"|c)
Invalid - double pipe fail: (3:"a"|"b"||"c")
Invalid - sub-pattern fail: (3:"a"|"b"|;|"c")


Update

The :validate routine can be simplified a bit by postponing the enablement of delayed expansion until after the CMD /V:ON pipe. This means I no longer have to worry about double escaping the ! on the left side of the pipe.

:validate
cmd /v:on /c echo !var!|findstr /r /c:"^([1-9][0-9]*:.*)$" >nul || (call :invalid  FINDSTR fail& exit /b)
setlocal enableDelayedExpansion
... remainder unchanged
Riesman answered 24/9, 2012 at 18:16 Comment(3)
Nice, although we don't know if the 6th test case is really false.Enravish
@MatM - good point. If it should be valid then the solution will be significantly more complicated.Riesman
@dbenham: the solution is correct, thank you! I would like to tell you about a project I did in batch to have your opinion, could be of common interest!Vernitavernoleninsk

As far as I know, findstr is not able to group regexps, so (|".*")* is a no-no. If you know how many blocks you have, you can repeat the block in the pattern, like this:

FINDSTR /R /C:"^([1-9][0-9]*:\"..*\"|\"..*\"|\"..*\")$"

This only works if you are sure the number of blocks is constant (padding with empty ones "" if required).

The double quotes inside the expression are ignored unless you prefix them with \.
The ..* construct stands in for .+ (one or more characters), which FINDSTR does not support.

Enravish answered 23/9, 2012 at 23:11 Comment(2)
The string does not contain a constant number of subpatterns; the number is given by the first number, followed by a colon. Perhaps a solution would be to first validate the string with FINDSTR and then tokenize the subpatterns with FOR /F, checking that they are correct in content and number. What do you think?Vernitavernoleninsk
@user1125183 - That should work, but it is tricky. See my answerRiesman
