How does the Windows Command Interpreter (CMD.EXE) parse scripts?
I ran into ss64.com which provides good help regarding how to write batch scripts that the Windows Command Interpreter will run.

However, I have been unable to find a good explanation of the grammar of batch scripts, how things expand or do not expand, and how to escape things.

Here are sample questions that I have not been able to solve:

  • How is the quote system managed? I made a TinyPerl script
    ( foreach $i (@ARGV) { print '*' . $i ; } ), compiled it and called it this way:
    • my_script.exe "a ""b"" c" → output is *a "b*c
    • my_script.exe """a b c""" → output is *"a*b*c"
  • How does the internal echo command work? What is expanded inside that command?
  • Why do I have to use for [...] %%I in file scripts, but for [...] %I in interactive sessions?
  • What are the escape characters, and in what context? How to escape a percent sign? For example, how can I echo %PROCESSOR_ARCHITECTURE% literally? I found that echo.exe %""PROCESSOR_ARCHITECTURE% works, is there a better solution?
  • How do pairs of % match? Example:
    • set b=a , echo %a %b% c%%a a c%
    • set a =b, echo %a %b% c%bb% c%
  • How do I ensure a variable passes to a command as a single argument if ever this variable contains double quotes?
  • How are variables stored when using the set command? For example, if I do set a=a" b and then echo.%a% I obtain a" b. However, if I use echo.exe from the UnxUtils, I get a b. How come %a% expands in a different way?
Doublehung answered 4/11, 2010 at 7:39 Comment(1)
Rob van der Woude has an awesome Batch scripting and Windows Command prompt reference on his site.Ewens

We performed experiments to investigate the grammar of batch scripts. We also investigated differences between batch and command line mode.

Batch Line Parser:

Here is a brief overview of phases in the batch file line parser:

Phase 0) Read Line:

Phase 1) Percent Expansion:

Phase 2) Process special characters, tokenize, and build a cached command block: This is a complex process that is affected by things such as quotes, special characters, token delimiters, and caret escapes.

Phase 3) Echo the parsed command(s): Only if the command block did not begin with @, and ECHO was ON at the start of the preceding step.

Phase 4) FOR %X variable expansion: Only if a FOR command is active and the commands after DO are being processed.

Phase 5) Delayed Expansion: Only if delayed expansion is enabled

Phase 5.3) Pipe processing: Only if commands are on either side of a pipe

Phase 5.5) Execute Redirection:

Phase 6) CALL processing/Caret doubling: Only if the command token is CALL

Phase 7) Execute: The command is executed


Here are details for each phase:

Note that the phases described below are only a model of how the batch parser works. The actual cmd.exe internals may not reflect these phases. But this model is effective at predicting behavior of batch scripts.

Phase 0) Read Line: Read line of input through first <LF>.

  • When reading a line to be parsed as a command, <Ctrl-Z> (0x1A) is read as <LF> (LineFeed 0x0A)
  • When GOTO or CALL reads lines while scanning for a :label, <Ctrl-Z>, is treated as itself - it is not converted to <LF>

Phase 1) Percent Expansion:

  • A double %% is replaced by a single %
  • Expansion of arguments (%*, %1, %2, etc.)
  • Expansion of %var%, if var does not exist replace it with nothing
  • Line is truncated at first <LF> not within %var% expansion
  • For a complete explanation, read the first half of dbenham's answer in this same thread: Percent Phase
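
The batch-mode rules above can be observed with a short script. This is only a minimal sketch: the variable names defined_var and undefined_var are made up for the demo, and the script is assumed to be saved under a hypothetical name such as percent_demo.bat and started with two arguments.

```batch
@echo off
rem Hypothetical demo script, assumed to be run as:
rem   percent_demo.bat one two
set "defined_var=hello"
set "undefined_var="

rem A doubled percent collapses to one, so the name is printed literally.
echo %%PROCESSOR_ARCHITECTURE%%

rem A defined variable is replaced by its value.
echo [%defined_var%]

rem An undefined variable expands to nothing in batch mode.
echo [%undefined_var%]

rem Script arguments are expanded in the same phase.
echo first=[%1] all=[%*]
```

With the arguments shown, the four ECHO lines print %PROCESSOR_ARCHITECTURE%, [hello], [] and first=[one] all=[one two]. The first line is also the usual way to print a variable name literally from a batch file.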

Phase 2) Process special characters, tokenize, and build a cached command block: This is a complex process that is affected by things such as quotes, special characters, token delimiters, and caret escapes. What follows is an approximation of this process.

There are concepts that are important throughout this phase.

  • A token is simply a string of characters that is treated as a unit.
  • Tokens are separated by token delimiters. The standard token delimiters are <space> <tab> ; , = <0x0B> <0x0C> and <0xFF>
    Consecutive token delimiters are treated as one - there are no empty tokens between token delimiters
  • There are no token delimiters within a quoted string. The entire quoted string is always treated as part of a single token. A single token may consist of a combination of quoted strings and unquoted characters.
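
One convenient way to watch the standard delimiters at work is a plain FOR loop, whose IN list is split on the same characters. The snippet below is a small sketch for a batch file; the item names are arbitrary.

```batch
@echo off
rem Comma, semicolon, equal sign and runs of spaces all separate the items.
for %%A in (one,two;three=four  five) do echo [%%A]

rem A quoted string stays one item even though it contains delimiters.
for %%A in ("six, seven;eight" nine) do echo [%%A]
```

The first loop prints five bracketed items, one per line. The second prints ["six, seven;eight"] and [nine], because delimiters inside quotes do not split the token.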

The following characters may have special meaning in this phase, depending on context: <CR> ^ ( @ & | < > <LF> <space> <tab> ; , = <0x0B> <0x0C> <0xFF>

Look at each character from left to right:

  • If <CR> then remove it, as if it were never there (except for weird redirection behavior)
  • If a caret (^), the next character is escaped, and the escaping caret is removed. Escaped characters lose all special meaning (except for <LF>).
  • If a quote ("), toggle the quote flag. If the quote flag is active, then only " and <LF> are special. All other characters lose their special meaning until the next quote toggles the quote flag off. It is not possible to escape the closing quote. All quoted characters are always within the same token.
  • <LF> always turns off the quote flag. Other behaviors vary depending on context, but quotes never alter the behavior of <LF>.
    • Escaped <LF>
      • <LF> is stripped
      • The next character is escaped. If at the end of line buffer, then the next line is read and processed by phases 1 and 1.5 and appended to the current one before escaping the next character. If the next character is <LF>, then it is treated as a literal, meaning this process is not recursive.
    • Unescaped <LF> not within parentheses
      • <LF> is stripped and parsing of the current line is terminated.
      • Any remaining characters in the line buffer are simply ignored.
    • Unescaped <LF> within a FOR IN parenthesized block
      • <LF> is converted into a <space>
      • If at the end of the line buffer, then the next line is read and appended to the current one.
    • Unescaped <LF> within a parenthesized command block
      • <LF> is converted into <LF><space>, and the <space> is treated as part of the next line of the command block.
      • If at the end of line buffer, then the next line is read and appended to the space.
  • If one of the special characters & | < or >, split the line at this point in order to handle pipes, command concatenation, and redirection.
    • In the case of a pipe (|), each side is a separate command (or command block) that gets special handling in phase 5.3
    • In the case of &, &&, or || command concatenation, each side of the concatenation is treated as a separate command.
    • In the case of <, <<, >, or >> redirection, the redirection clause is parsed, temporarily removed, and then appended to the end of the current command. A redirection clause consists of an optional file handle digit, the redirection operator, and the redirection destination token.
      • If the token that precedes the redirection operator is a single unescaped digit, then the digit specifies the file handle to be redirected. If the handle token is not found, then output redirection defaults to 1 (stdout), and input redirection defaults to 0 (stdin).
  • If the very first token for this command (prior to moving redirection to the end) begins with @, then the @ has special meaning. (@ is not special in any other context)
    • The special @ is removed.
    • If ECHO is ON, then this command, along with any following concatenated commands on this line, are excluded from the phase 3 echo. If the @ is before an opening (, then the entire parenthesized block is excluded from the phase 3 echo.
  • Process parentheses (provides for compound statements across multiple lines):
    • If the parser is not looking for a command token, then ( is not special.
    • If the parser is looking for a command token and finds (, then start a new compound statement and increment the parenthesis counter
    • If the parenthesis counter is > 0 then ) terminates the compound statement and decrements the parenthesis counter.
    • If the line end is reached and the parenthesis counter is > 0 then the next line will be appended to the compound statement (starts again with phase 0)
    • If the parenthesis counter is 0 and the parser is looking for a command, then ) functions similar to a REM statement as long as it is immediately followed by a token delimiter, special character, newline, or end-of-file
      • All special characters lose their meaning except ^ (line concatenation is possible)
      • Once the end of the logical line is reached, the entire "command" is discarded.
  • Each command is parsed into a series of tokens. The first token is always treated as a command token (after special @ have been stripped and redirection moved to the end).
    • Leading token delimiters prior to the command token are stripped
    • When parsing the command token, ( functions as a command token delimiter, in addition to the standard token delimiters
    • The handling of subsequent tokens depends on the command.
  • Most commands simply concatenate all arguments after the command token into a single argument token. All argument token delimiters are preserved. Argument options are typically not parsed until phase 7.
  • Three commands get special handling - IF, FOR, and REM
    • IF is split into two or three distinct parts that are processed independently. A syntax error in the IF construction will result in a fatal syntax error.
      • The comparison operation is the actual command that flows all the way through to phase 7
        • All IF options are fully parsed in phase 2.
        • Consecutive token delimiters collapse into a single space.
        • Depending on the comparison operator, there will be one or two value tokens that are identified.
      • The True command block is the set of commands after the condition, and is parsed like any other command block. If ELSE is to be used, then the True block must be parenthesized.
      • The optional False command block is the set of commands after ELSE. Again, this command block is parsed normally.
      • The True and False command blocks do not automatically flow into the subsequent phases. Their subsequent processing is controlled by phase 7.
    • FOR is split in two after the DO. A syntax error in the FOR construction will result in a fatal syntax error.
      • The portion through DO is the actual FOR iteration command that flows all the way through phase 7
        • All FOR options are fully parsed in phase 2.
        • The IN parenthesized clause treats <LF> as <space>. After the IN clause is parsed, all tokens are concatenated together to form a single token.
        • Consecutive unescaped/unquoted token delimiters collapse into a single space throughout the FOR command through DO.
      • The portion after DO is a command block that is parsed normally. Subsequent processing of the DO command block is controlled by the iteration in phase 7.
    • REM detected in phase 2 is treated dramatically differently than all other commands.
      • Only one argument token is parsed - the parser ignores characters after the first argument token.
      • The REM command may appear in phase 3 output, but the command is never executed, and the original argument text is echoed - escaping carets are not removed, except...
        • If there is only one argument token that ends with an unescaped ^ that ends the line, then the argument token is thrown away, and the subsequent line is parsed and appended to the REM. This repeats until there is more than one token, or the last character is not ^.
  • If the command token begins with :, and this is the first round of phase 2 (not a restart due to CALL in phase 6) then
    • The token is normally treated as an Unexecuted Label.
      • The remainder of the line is parsed, however ), <, >, & and | no longer have special meaning. The entire remainder of the line is considered to be part of the label "command".
      • The ^ continues to be special, meaning that line continuation can be used to append the subsequent line to the label.
      • An Unexecuted Label within a parenthesized block will result in a fatal syntax error unless it is immediately followed by a command or Executed Label on the next line.
        • ( no longer has special meaning for the first command that follows the Unexecuted Label.
      • The command is aborted after label parsing is complete. Subsequent phases do not take place for the label
    • There are three exceptions that can cause a label found in phase 2 to be treated as an Executed Label that continues parsing through phase 7.
      • There is redirection that precedes the label token, and there is a | pipe or &, &&, or || command concatenation on the line.
      • There is redirection that precedes the label token, and the command is within a parenthesized block.
      • The label token is the very first command on a line within a parenthesized block, and the line above ended with an Unexecuted Label.
    • The following occurs when an Executed Label is discovered in phase 2
      • The label, its arguments, and its redirection are all excluded from any echo output in phase 3
      • Any subsequent concatenated commands on the line are fully parsed and executed.
    • For more information about Executed Labels vs. Unexecuted Labels, see https://www.dostips.com/forum/viewtopic.php?f=3&t=3803&p=55405#p55405
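
A few of the phase 2 rules (caret escapes, the quote flag, command concatenation, and redirection being moved to the end of the command) can be seen in a small batch file. This is only a sketch; demo_out.txt is a hypothetical scratch file in the current directory that the script creates and deletes.

```batch
@echo off
rem A caret escapes the next character, so the ampersand is printed literally.
echo one ^& two

rem Inside quotes the ampersand is not special either, and the quotes are kept.
echo "one & two"

rem Unescaped and unquoted, the ampersand splits the line into two commands.
echo one & echo two

rem A redirection clause may come before the command token. It is parsed,
rem removed, and applied to the command that follows it.
>demo_out.txt echo first
>>demo_out.txt echo second
type demo_out.txt
del demo_out.txt
```

The first two ECHO lines print one & two and "one & two". The third prints one and two on separate lines, and TYPE then shows the two lines that were written through the leading redirection clauses.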

Phase 3) Echo the parsed command(s): Only if the command block did not begin with @, and ECHO was ON at the start of the preceding step.

Phase 4) FOR %X variable expansion: Only if a FOR command is active and the commands after DO are being processed.

  • At this point, phase 1 of batch processing will have already converted a FOR variable like %%X into %X. The command line has different percent expansion rules for phase 1. This is the reason that command lines use %X but batch files use %%X for FOR variables.
  • FOR variable names are case sensitive, but ~modifiers are not case sensitive.
  • ~modifiers take precedence over variable names. If a character following ~ is both a modifier and a valid FOR variable name, and there exists a subsequent character that is an active FOR variable name, then the character is interpreted as a modifier.
  • FOR variable names are global, but only within the context of a DO clause. If a routine is CALLed from within a FOR DO clause, then the FOR variables are not expanded within the CALLed routine. But if the routine has its own FOR command, then all currently defined FOR variables are accessible to the inner DO commands.
  • FOR variable names can be reused within nested FORs. The inner FOR value takes precedence, but once the INNER FOR closes, then the outer FOR value is restored.
  • If ECHO was ON at the start of this phase, then phase 3) is repeated to show the parsed DO commands after the FOR variables have been expanded.
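
The FOR variable rules above can be tried out with the following batch sketch. The path C:\dir\file.txt is hypothetical and does not need to exist, since the ~n and ~x modifiers here only work on the text of the value.

```batch
@echo off
rem FOR variable names are case sensitive, so a and A are two variables.
for %%a in (lower) do for %%A in (UPPER) do echo %%a %%A

rem Modifiers are not case sensitive, so both forms give name plus extension.
for %%I in ("C:\dir\file.txt") do echo %%~nxI %%~NXI

rem A reused name is shadowed by the inner loop and restored afterwards.
for %%A in (outer) do (
  for %%A in (inner) do echo %%A
  echo %%A
)
```

The script prints lower UPPER, then file.txt file.txt, then inner followed by outer.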

---- From this point onward, each command identified in phase 2 is processed separately.
---- Phases 5 through 7 are completed for one command before moving on to the next.

Phase 5) Delayed Expansion: Only if delayed expansion is on, the command is not in a parenthesized block on either side of a pipe, and the command is not a "naked" batch script (script name without parentheses, CALL, command concatenation, or pipe).

  • Each token for a command is parsed for delayed expansion independently.
    • Most commands parse two or more tokens - the command token, the arguments token, and each redirection destination token.
    • The FOR command parses the IN clause token only.
    • The IF command parses the comparison values only - either one or two, depending on the comparison operator.
  • For each parsed token, first check if it contains any !. If not, then the token is not parsed - important for ^ characters. If the token does contain !, then scan each character from left to right:
    • If it is a caret (^) the next character has no special meaning, the caret itself is removed
    • If it is an exclamation mark, search for the next exclamation mark (carets are not observed anymore), expand to the value of the variable.
      • Consecutive opening ! are collapsed into a single !
      • Any remaining unpaired ! is removed
    • Expanding vars at this stage is "safe", because special characters are not detected anymore (even <CR> or <LF>)
    • For a more complete explanation, read the second half of dbenham's answer in this same thread: Exclamation Point Phase
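
The difference between phase 1 and phase 5 expansion, and the second round of caret handling, can be illustrated with a short batch file. This is a minimal sketch; the variable name var is arbitrary.

```batch
@echo off
setlocal EnableDelayedExpansion
set "var=before"

rem The whole parenthesized block is parsed once, so the percent form is
rem fixed in phase 1 while the exclamation form is resolved in phase 5.
(
  set "var=after"
  echo percent=%var% delayed=!var!
)

rem Phase 2 consumes one caret and phase 5 consumes another, so two carets
rem are needed to print a literal exclamation mark.
echo Hello^^!

rem With a single caret the mark arrives unescaped in phase 5 and is removed.
echo Hello^!
```

The block prints percent=before delayed=after. The last two lines print Hello! and Hello respectively.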

Phase 5.3) Pipe processing: Only if commands are on either side of a pipe
Each side of the pipe is processed independently and asynchronously.

  • If the command is internal to cmd.exe, or it is a batch file, or it is a parenthesized command block, then it is executed in a new cmd.exe process via %comspec% /S /D /c" commandBlock", so the command block gets a phase restart, but this time in command line mode.
    • If a parenthesized command block, then all <LF> with a command before and after are converted to <space>&. Other <LF> are stripped.
  • This is the end of processing for the pipe commands.
  • See Why does delayed expansion fail when inside a piped block of code? for more about pipe parsing and processing
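
The interaction between pipes and delayed expansion can be probed with the sketch below. It assumes that delayed expansion is not enabled by default for new cmd.exe instances (the usual registry setting); SORT is used only as a harmless command that passes its input through.

```batch
@echo off
setlocal EnableDelayedExpansion
set "var=value"

rem Not parenthesized, so phase 5 expands the variable in the parent
rem before the command is handed to the child cmd.exe.
echo !var! | sort

rem Parenthesized, so phase 5 is skipped in the parent and the block is
rem parsed again by a child cmd.exe in command line mode.
(echo !var!) | sort
```

Following the model above, the first line should print the value, while the second should print the text !var! unexpanded, because the child process parses the block without delayed expansion.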

Phase 5.5) Execute Redirection: Any redirection that was discovered in phase 2 is now executed.

Phase 6) CALL processing/Caret doubling: Only if the command token is CALL, or if the text before the first occurring standard token delimiter is CALL. If CALL is parsed from a larger command token, then the unused portion is prepended to the arguments token before proceeding.

  • Scan the arguments token for an unquoted /?. If found anywhere within the tokens, then abort phase 6 and proceed to Phase 7, where the HELP for CALL will be printed.
  • Remove the first CALL, so multiple CALL's can be stacked
  • Double all carets
  • Restart phases 1, 1.5, and 2, but do not continue to phase 3
    • Any doubled carets are reduced back to one caret as long as they are not quoted. But unfortunately, quoted carets remain doubled.
    • Phase 1 changes a bit - Expansion errors in step 1.2 or 1.3 abort the CALL, but the error is not fatal - batch processing continues.
    • Phase 2 tasks are altered a bit
      • Any newly appearing unquoted, unescaped redirection that was not detected in the first round of phase 2 is detected, but it is removed (including the file name) without actually performing the redirection
      • Any newly appearing unquoted, unescaped caret at the end of the line is removed without performing line continuation
      • The CALL is aborted without error if any of the following are detected
        • Newly appearing unquoted, unescaped & or |
        • The resultant command token begins with unquoted, unescaped (
        • The very first token after the removed CALL began with @
      • If the resultant command is a seemingly valid IF or FOR, then execution will subsequently fail with an error stating that IF or FOR is not recognized as an internal or external command.
      • Of course the CALL is not aborted in this 2nd round of phase 2 if the resultant command token is a label beginning with :.
  • If the resultant command token is CALL, then restart Phase 6 (repeats until no more CALL)
  • If the resultant command token is a batch script or a :label, then execution of the CALL is fully handled by the remainder of Phase 6.
    • Push the current batch script file position on the call stack so that execution can resume from the correct position when the CALL is completed.
    • Setup the %0, %1, %2, ...%N and %* argument tokens for the CALL, using all resultant tokens
    • If the command token is a label that begins with :, then
      • Restart Phase 5. This can impact what :label is CALLed. But since the %0 etc. tokens have already been setup, it will not alter the arguments that are passed to the CALLed routine.
      • Execute GOTO label to position the file pointer at the beginning of the subroutine (ignore any other tokens that may follow the :label) See Phase 7 for rules on how GOTO works.
        • If the :label token is missing, or the :label is not found, then the call stack is immediately popped to restore the saved file position, and the CALL is aborted.
        • If the :label happens to contain /?, then GOTO help is printed instead of searching for the :label. The file pointer does not move, such that code after the CALL is executed twice, once in the CALL context, and then again after the CALL return. See Why CALL prints the GOTO help message in this script? And why command after that are executed twice? for more info.
    • Else transfer control to the specified batch script.
    • Execution of the CALLed :label or script continues until either EXIT /B or end-of-file is reached, at which point the CALL stack is popped and execution resumes from the saved file position.
      Phase 7 is not executed for CALLed scripts or :labels.
  • Else the result of phase 6 falls through into phase 7 for execution.
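
Two visible consequences of phase 6 are the extra round of percent expansion and the doubling of quoted carets. The batch sketch below shows both, plus a plain subroutine CALL; the variable names and the :show label are made up for the demo.

```batch
@echo off
set "var=inner value"
set "name=var"

rem Phase 1 turns the argument into the text of a percent expansion of var,
rem and the restart triggered by CALL then expands that a second time.
call echo %%%name%%%

rem Quoted carets are doubled by CALL and stay doubled.
call echo "a^b"
echo "a^b"

rem Arguments for a CALLed label are set up from the resultant tokens.
call :show first "second arg"
exit /b

:show
echo arg1=[%1] arg2=[%2]
exit /b
```

The script prints inner value, then "a^^b" for the CALLed ECHO and "a^b" for the plain one, and finally arg1=[first] arg2=["second arg"].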

Phase 7) Execute: The command is executed

  • 7.1 - Execute internal command - If the command token is quoted, then skip this step. Otherwise, attempt to parse out an internal command and execute.
    • The following tests are made to determine if an unquoted command token represents an internal command:
      • If the command token exactly matches an internal command, then execute it.
      • Else break the command token before the first occurrence of + / [ ] <space> <tab> , ; or =
        If the preceding text is an internal command, then remember that command
        • If in command line mode, or if the command is from a parenthesized block, IF true or false command block, FOR DO command block, or involved with command concatenation, then execute the internal command
        • Else (must be a stand-alone command in batch mode) scan the current folder and the PATH for a .COM, .EXE, .BAT, or .CMD file whose base name matches the original command token
          • If the first matching file is a .BAT or .CMD, then goto 7.3.exec and execute that script
          • Else (match not found or first match is .EXE or .COM) execute the remembered internal command
      • Else break the command token before the first occurrence of . \ or :
        If the preceding text is not an internal command, then goto 7.2
        Else the preceding text may be an internal command. Remember this command.
      • Break the command token before the first occurrence of + / [ ] <space> <tab> , ; or =
        If the preceding text is a path to an existing file, then goto 7.2
        Else execute the remembered internal command.
    • If an internal command is parsed from a larger command token, then the unused portion of the command token is included in the argument list
    • Just because a command token is parsed as an internal command does not mean that it will execute successfully. Each internal command has its own rules as to how the arguments and options are parsed, and what syntax is allowed.
    • All internal commands will print help instead of performing their function if /? is detected. Most recognize /? if it appears anywhere in the arguments. But a few commands like ECHO and SET only print help if the first argument token begins with /?.
    • SET has some interesting semantics:
      • If a SET command has a quote before the variable name and extensions are enabled
        set "name=content" ignored --> value=content
        then the text between the first equal sign and the last quote is used as the content (first equal and last quote excluded). Text after the last quote is ignored. If there is no quote after the equal sign, then the rest of the line is used as content.
      • If a SET command does not have a quote before the name
        set name="content" not ignored --> value="content" not ignored
        then the entire remainder of the line after the equal is used as content, including any and all quotes that may be present.
    • An IF comparison is evaluated, and depending on whether the condition is true or false, the appropriate already parsed dependent command block is processed, starting with phase 5.
    • The IN clause of a FOR command is iterated appropriately.
      • If this is a FOR /F that iterates the output of a command block, then:
        • The IN clause is executed in a new cmd.exe process via CMD /C.
        • The command block must go through the entire parsing process a second time, but this time in a command line context
        • ECHO will start out ON, and delayed expansion will usually start out disabled (dependent on the registry setting)
        • All environment changes made by the IN clause command block will be lost once the child cmd.exe process terminates
      • For each iteration:
        • The FOR variable values are defined
        • The already parsed DO command block is then processed, starting with phase 4.
    • GOTO uses the following logic to locate the :label
      • Parse the label from the first argument token
      • Scan for the next occurrence of the label
        • Start from the current file position
        • If end of file is reached, then loop back to the beginning of file and continue to the original starting point.
      • The scan stops at the first occurrence of the label that it finds, and the file pointer is set to the line immediately following the label. Execution of the script resumes from that point. Note that a successful true GOTO will immediately abort any parsed block of code, including FOR loops.
      • If the label is not found, or the label token is missing, then the GOTO fails, an error message is printed, and the call stack is popped. This effectively functions as an EXIT /B, except any already parsed commands in the current command block that follow the GOTO are still executed, but in the context of the CALLer (the context that exists after EXIT /B)
      • See https://www.dostips.com/forum/viewtopic.php?t=3803 for a more precise description of label parsing rules, and https://www.dostips.com/forum/viewtopic.php?t=8988 for label scanning rules.
    • RENAME and COPY both accept wildcards for the source and target paths. But Microsoft does a terrible job documenting how the wildcards work, especially for the target path. A useful set of wildcard rules may be found at How does the Windows RENAME command interpret wildcards?
  • 7.2 - Execute volume change - Else if the command token does not begin with a quote, is exactly two characters long, and the 2nd character is a colon, then change the volume
    • All argument tokens are ignored
    • If the volume specified by the first character cannot be found, then abort with an error
    • A command token of :: will always result in an error unless SUBST is used to define a volume for ::
      If SUBST is used to define a volume for ::, then the volume will be changed, it will not be treated as a label.
  • 7.3 - Execute external command - Else try to treat the command as an external command.
    • If in command line mode and the command is not quoted and does not begin with a volume specification, white-space, ,, ;, = or + then break the command token at the first occurrence of <space> , ; or = and prepend the remainder to the argument token(s).
    • If the 2nd character of the command token is a colon, then verify the volume specified by the 1st character can be found.
      If the volume cannot be found, then abort with an error.
    • If in batch mode and the command token begins with :, then goto 7.4
      Note that if the label token begins with ::, then this will not be reached because the preceding step will have aborted with an error unless SUBST is used to define a volume for ::.
    • Identify the external command to execute.
      • This is a complex process that may involve the current volume, current directory, PATH variable, PATHEXT variable, and/or file associations.
      • If a valid external command cannot be identified, then abort with an error.
    • If in command line mode and the command token begins with :, then goto 7.4
      Note that this is rarely reached because the preceding step will have aborted with an error unless the command token begins with ::, and SUBST is used to define a volume for ::, and the entire command token is a valid path to an external command.
    • 7.3.exec - Execute the external command.
  • 7.4 - Ignore a label - Ignore the command and all its arguments if the command token begins with :.
    Rules in 7.2 and 7.3 may prevent a label from reaching this point.
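
The SET quoting rules and the wrap-around label scan performed by GOTO, both described in 7.1, can be checked with a small batch file. This is only a sketch; the variable name and the labels are arbitrary.

```batch
@echo off
rem Quote before the name, so the content ends at the last quote.
set "name=content" ignored
echo [%name%]

rem No quote before the name, so everything after the equal sign is stored.
set name="content" not ignored
echo [%name%]

rem GOTO scans forward from the current position and wraps around at the
rem end of the file, so it also finds a label that lies above the GOTO.
goto :skip_ahead

:earlier
echo reached the earlier label
exit /b

:skip_ahead
goto :earlier
```

The two ECHO lines print [content] and ["content" not ignored], and the final GOTO reaches the label located earlier in the file.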

Command Line Parser:

Works like the batch line parser, except:

Phase 1) Percent Expansion:

  • No %*, %1 etc. argument expansion
  • If var is undefined, then %var% is left unchanged.
  • No special handling of %%. If var=content, then %%var%% expands to %content%.
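
These differences are easy to see by typing a few lines at an interactive prompt rather than running them from a script. The sketch assumes that no variable named undefined_var exists in the session.

```batch
rem Type these lines at an interactive prompt, not in a batch file.
set "var=content"
echo %undefined_var%
echo %%var%%
for %I in (1 2 3) do @echo %I
```

At the prompt, the first ECHO prints %undefined_var% unchanged and the second prints %content%. The FOR loop works with a single percent sign because command line mode does no special handling of doubled percent signs in phase 1.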

Phase 3) Echo the parsed command(s)

  • This is not performed after phase 2. It is only performed after phase 4 for the FOR DO command block.

Phase 5) Delayed Expansion: only if DelayedExpansion is enabled

  • If var is undefined, then !var! is left unchanged.

Phase 7) Execute Command

  • Attempts to CALL or GOTO a :label result in an error.
  • As already documented in phase 7, an executed label may result in an error under different scenarios.
    • Batch executed labels can only cause an error if they begin with ::
    • Command line executed labels almost always result in an error

Parsing of integer values

There are many different contexts where cmd.exe parses integer values from strings, and the rules are inconsistent:

  • SET /A
  • IF
  • %var:~n,m% (variable substring expansion)
  • FOR /F "TOKENS=n"
  • FOR /F "SKIP=n"
  • FOR /L %%A in (n1 n2 n3)
  • EXIT [/B] n

Details for these rules may be found at Rules for how CMD.EXE parses numbers
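
As a small taste of those inconsistencies, the batch sketch below shows two commonly encountered cases: the number bases accepted by SET /A, and the way IF switches between numeric and string comparison. The variable names are arbitrary, and the full rules are in the linked post.

```batch
@echo off
rem SET /A accepts decimal, hexadecimal with a 0x prefix, and octal with a
rem leading zero.
set /a "dec=10, hex=0x10, oct=010"
echo dec=%dec% hex=%hex% oct=%oct%

rem IF compares numerically when both sides parse as numbers, and falls
rem back to a string comparison otherwise, for example when quoted.
if 10 gtr 9 (echo numeric: 10 is greater) else (echo numeric: 10 is not greater)
if "10" gtr "9" (echo string: "10" is greater) else (echo string: "10" is not greater)
```

The script prints dec=10 hex=16 oct=8, then reports that 10 is greater in the numeric comparison but not in the quoted string comparison.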


For anyone wishing to improve the cmd.exe parsing rules, there is a discussion topic on the DosTips forum where issues can be reported and suggestions made.

Jan Erik (jeb) - Original author and discoverer of phases
Dave Benham (dbenham) - Much additional content and editing

Elainaelaine answered 4/11, 2010 at 7:39 Comment(30)
Hello jeb, thank you for your insight… It might be hard to understand, but I will try to think it through! You seem to have performed many tests! Thank you for translating (administrator.de/…)Doublehung
Batch phase 5) - %%a will have already been changed to %a in Phase 1, so for-loop expansion really expands %a. Also, I added a more detailed explanation of Batch phase 1 in an answer below (I don't have edit privilege)Elainaelaine
Jeb - perhaps phase 0 could be moved and combined with phase 6? That makes more sense to me, or is there a reason why they are separated like that?Elainaelaine
@Elainaelaine Which parsing applies when cmd.exe is used to execute a command by the _popen or system functions? I think that these work using cmd /c command line. What logic parses the part after the /c?Dejecta
@jeb I'm in the middle of it. Also, I cannot seem to be able to find a reference in any of the answers here to the quoting rules applied to the command. That is to say, cmd.exe does process double quotes, so that commands which contain spaces can be issued. It doesn't process double quotes in the rest of the command line.Dejecta
@jeb, not sure if the following phase 2 statement is true: »If the parenthesis counter is = 0, and the parser is looking for a command, then ) and all remaining characters on line are ignored«; I think the extra ) is simply no longer treated as a special character in this case; what do you think?Edea
@Edea - I updated that section. The ) really does function almost like a REM command when the parenthesis counter is 0. Try both of these from the command line: ) Ignore this, and echo OK & ) Ignore thisElainaelaine
Thank you, @dbenham! So there is only one thing left that is not clear to me: phase 5: "If it is a caret (^) the next character has no special meaning, the caret itself is removed"; which characters are considered special at that point? I believe there are only ^ and !, because all the others have already been recognised in phase 2 (" & | < > ( ) <LF>), am I right? I mean, this was the only way I could explain why I had to do echo ^^! to get a literal ! echoed, and why ^^^ in front of any other char. simply became ^ without changing the meaning of the char....Edea
@Edea - That is exactly what jeb is referring to.Elainaelaine
I'm not sure about the point 2.2 "If it is a caret (^) the next character has no special meaning," in case there's a quote (") after it. It's possible that it is so, but if it is, the quotes are still interpreted specially by many commands, for example echo ^"x^" gives the same result as echo "x", and if ^"%1^" is very different from if "%1" (the first one works even when the argument is quoted and contains spaces, the second one not - e.g. passing a first argument " " a command if "^%1^"=="" works and the result is true, while with if "%1"=="" an error is raised).Lapidify
@gbr, also quotes lose their special meaning in case they are escaped by preceding ^, so the quote flag is not toggled in that case; for example, set a VAR to ",", (the , is a token separator just like space) and do comparisons like if "%VAR%"=="" and if ^"%VAR%^"==""; you will see that both variants fail... echo does not care about quotes on its own, so it works even with my example, echo %VAR%...Edea
...to continue the explanation, @gbr, if needs to recognise and to distinguish between the left comparison expression, the comparison operator, the right comparison expression and the command to execute conditionally, for which it uses the token separators space, tab, ,, ;, = and non-break space (char. code 0xFF); in contrast, echo does not need to detect any parts individually, so it interprets everything after it as a single string...Edea
@jeb, when I get the description of phase 5 right, the caret handling ("If it is a caret (^) the next character has no special meaning, the caret itself is removed") is done only in case there is at least one ! in the entire string or command line, so if there is no ! at all (when entering phase 5), no such second caret escaping occurs, is that true?Edea
@Edea yes that's correct, therefore you sometimes see 'set "var=%expr%" ! '; the last exclamation mark will be removed, but it forces phase 5Pianism
Ah, that explains why I sometimes have to double the ^ in for /F %%L in ('findstr /N /R "^" "<<file>>"') do and sometimes not: as soon as <<file>> is a delayed expanded variable, a single ^ disappears... thank you very much!Edea
EDIT - Phase 2) Added detection of label, plus link to description of some label detection complications. Phase 5) Added link to a more detailed explanation, as well as a link to known cases where rules fail. Phase 6) Refined the rules, and added a link to known cases where rules fail.Elainaelaine
EDIT - Added Phase 0 which reads a line and converts <Ctrl-Z> into <LF>Elainaelaine
@Elainaelaine Thanks for clarifying the points, but I think the complete text becomes a bit too long. Perhaps it could be split into a short overview and a detailed explanation?Pianism
Is this all guess work based on black-box testing or did you actually get a debugger out and verify cmd.exe's actual internal behavior?Bridge
@Bridge I did it completely without a debugger. As said, I made many, many tests. It took months to realize how the phases work. And still the rules aren't completePianism
some of cmd's parsing rule is described hereQianaqibla
Phase 7.1: I think it would be worth mentioning that for the special set semantics set "name=content" ignored the command extensions need to be enabled, otherwise a syntax error arises: what do you think, @Elainaelaine or @jeb?Edea
You're welcome, @dbenham! I noticed the size limit as I tried to update it on my own ;-); anyway, on the other hand, it might be good I couldn't do it, as I'm not sure how far all the rules regard the case of disabled command extensions (particularly because the term is never mentioned in the whole answer anywhere, not even for the delayed expansion phase which only exists with enabled extensions)...Edea
@Edea In my opinion, the SET-syntax it's not related to the batch parser phases, more to a special parser of the SET-command. But better discuss this on dostipsPianism
Edits: Phase 6 change - expansion errors in phase 1 after CALL are not fatal. Phase 1 changed to account for issues at dostips.com/forum/…. Phase 7.3 changed to account for issues at dostips.com/forum/viewtopic.php?f=3&t=9124Elainaelaine
Random trivia: one of the strangest things to me is that a batch file is read a line at a time. If you edit the file while the batch file is running, the running batch file will pick up your changes as it's running. This means editing long running batch files (like a build script) can be an issue. You start a 10 minute build, then add some new features or echos to the script in your editor, and if you hit save you'll likely break the running build.Ecology
The 'command string' in a FOR /F loop is not a 'quoted string' -- delimiters are elided from inside the single-quoted text: FOR /F %R IN ('DIR=A.*') DO echo %R becomes FOR /F %R IN ('DIR A.*') DO echo %RIndecorum
@Indecorum I don't get your point, how is it related to the parsing phases? For discussions, better use dostips.comPianism
Yes, it's not clear from the answer at what stage of parsing tokenization of a FOR /F bracketed single-quoted command line occurs. Probably phase 2: perhaps the ( ) makes it a compound statement, clearly single-quotes aren't a 'quoted string', but I don't see the place where it says "and all token delimiters are replaced with space".Indecorum
@Indecorum - No, not the compound command statement section, but rather the section in phase 2 that discusses special processing for IF, FOR, and REM. Within the FOR section there is a statement that reads: "Consecutive unescaped/unquoted token delimiters collapse into a single space throughout the FOR command through DO." That includes the parenthesized single quoted command. Also remember that the parenthesized command will go through phase 2 multiple times because it eventually is executed in a new cmd.exe process.Elainaelaine
B
65

When invoking a command from a command window, tokenization of the command line arguments is not done by cmd.exe (a.k.a. "the shell"). Most often the tokenization is done by the newly formed process's C/C++ runtime, but this is not necessarily so -- for example, if the new process was not written in C/C++, or if the new process chooses to ignore argv and process the raw commandline for itself (e.g. with GetCommandLine()). At the OS level, Windows passes command lines untokenized as a single string to new processes. This is in contrast to most *nix shells, where the shell tokenizes arguments in a consistent, predictable way before passing them to the newly formed process. All this means that you may experience wildly divergent argument tokenization behavior across different programs on Windows, as individual programs often take argument tokenization into their own hands.

If it sounds like anarchy, it kind of is. However, since a large number of Windows programs do utilize the Microsoft C/C++ runtime's argv, it may be generally useful to understand how the MSVCRT tokenizes arguments. Here is an excerpt:

  • Arguments are delimited by whitespace characters, which are either spaces or tabs.
  • A string surrounded by double quote marks is interpreted as a single argument, whether it contains whitespace characters or not. A quoted string can be embedded in an argument. The caret ^ isn't recognized as an escape character or delimiter. Within a quoted string, a pair of double quote marks is interpreted as a single escaped double quote mark. If the command line ends before a closing double quote mark is found, then all the characters read so far are output as the last argument.
  • A double quote mark preceded by a backslash \" is interpreted as a literal double quote mark ".
  • Backslashes are interpreted literally, unless they immediately precede a double quote mark.
  • If an even number of backslashes is followed by a double quote mark, then one backslash \ is placed in the argv array for every pair of backslashes \\, and the double quote mark " is interpreted as a string delimiter.
  • If an odd number of backslashes is followed by a double quote mark, then one backslash \ is placed in the argv array for every pair of backslashes \\. The double quote mark is interpreted as an escape sequence by the remaining backslash, causing a literal double quote mark " to be placed in argv.

NB from dbenham: "This is great information, but the Microsoft documentation is incomplete! The missing rules are documented at daviddeley.com"
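
The documented rules quoted above can be turned into a small tokenizer. What follows is only a sketch under stated assumptions: split_args, MAX_ARGS and MAX_LEN are hypothetical names invented for illustration, the function handles only the argument part of the command line (not the program name), and it models only the rules listed above, not the undocumented ones dbenham mentions.

```c
#include <stddef.h>

#define MAX_ARGS 16   /* illustrative limits, not taken from any runtime */
#define MAX_LEN  256

/*
 * Hypothetical helper: splits the argument part of a command line
 * according to the documented rules quoted above. Returns the number
 * of arguments written to args. Over-long arguments are truncated.
 */
int split_args(const char *p, char args[MAX_ARGS][MAX_LEN])
{
    int argc = 0;
    int inquote = 0;

    for (;;) {
        /* Arguments are delimited by spaces or tabs. */
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0' || argc == MAX_ARGS)
            break;

        size_t len = 0;
        for (;;) {
            int copychar = 1;
            int numslash = 0;

            /* Count a run of backslashes; its meaning depends on what follows. */
            while (*p == '\\') { p++; numslash++; }

            if (*p == '"') {
                if (numslash % 2 == 0) {
                    if (inquote && p[1] == '"') {
                        p++;                 /* "" inside quotes: one literal " */
                    } else {
                        copychar = 0;        /* the quote is a string delimiter */
                        inquote = !inquote;
                    }
                }
                numslash /= 2;               /* one \ per pair before a quote */
            }

            while (numslash-- > 0)
                if (len < MAX_LEN - 1) args[argc][len++] = '\\';

            /* End of input, or unquoted whitespace, ends the argument. */
            if (*p == '\0' || (!inquote && (*p == ' ' || *p == '\t')))
                break;
            if (copychar && len < MAX_LEN - 1)
                args[argc][len++] = *p;
            p++;
        }
        args[argc][len] = '\0';
        argc++;
    }
    return argc;
}
```

Called on the argument string "a ""b"" c" (quotes included), this sketch produces the single argument a "b" c, and on a \"b\" c it produces the three arguments a, "b" and c. A program that tokenizes by other rules, as TinyPerl apparently does, will of course give different results.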


The Microsoft "batch language" (.bat) is no exception to this anarchic environment, and it has developed its own unique rules for tokenization and escaping. It also looks like cmd.exe's command prompt does do some preprocessing of the command line argument (mostly for variable substitution and escaping) before passing the argument off to the newly executing process. You can read more about the low-level details of the batch language and cmd escaping in the excellent answers by jeb and dbenham on this page.


Let's build a simple command line utility in C and see what it says about your test cases:

#include <stdio.h>  /* printf */

int main(int argc, char* argv[]) {
    int i;
    /* Print every argument exactly as the C runtime tokenized it. */
    for (i = 0; i < argc; i++) {
        printf("argv[%d][%s]\n", i, argv[i]);
    }
    return 0;
}

(Notes: argv[0] is always the name of the executable, and is omitted below for brevity. Tested on Windows XP SP3. Compiled with Visual Studio 2005.)

> test.exe "a ""b"" c"
argv[1][a "b" c]

> test.exe """a b c"""
argv[1]["a b c"]

> test.exe "a"" b c
argv[1][a" b c]

And a few of my own tests:

> test.exe a "b" c
argv[1][a]
argv[2][b]
argv[3][c]

> test.exe a "b c" "d e
argv[1][a]
argv[2][b c]
argv[3][d e]

> test.exe a \"b\" c
argv[1][a]
argv[2]["b"]
argv[3][c]
Blackjack answered 4/11, 2010 at 7:39 Comment(7)
Thank you for your answer. It puzzles me even more to see that TinyPerl will not output what your program outputs, and I have difficulties to understand how [a "b" c] could become [a "b] [c] doing post-processing.Doublehung
Now that I think about it, this tokenization of the command line is probably done entirely by the C runtime. An executable could be written such that it doesn't even use the C runtime, in which case I think it would have to deal with the command line verbatim, and be responsible for doing its own tokenization (if it wanted to.) Or even if your application does use the C runtime, you could choose to ignore argc and argv and just get the raw command line via e.g. Win32 GetCommandLine. Perhaps TinyPerl is ignoring argv and simply tokenizing the raw command line by its own rules.Blackjack
"Remember that from Win32's point of view, the command line is just a string that is copied into the address space of the new process. How the launching process and the new process interpret this string is governed not by rules but by convention." -Raymond Chen blogs.msdn.com/b/oldnewthing/archive/2009/11/25/9928372.aspxBlackjack
Thank you for that truly nice answer. That explains a lot in my opinion. And that also explains why I sometimes find that truly crappy to work with Windows…Doublehung
I found this regarding backslashes and quotes during transformation from commandline to argv's, for Win32 C++ programs. Backslashes count is only divided by two when the last backslash is followed by a dblquote, and the dblquote terminates a string when there is an even number of backslashes before.Doublehung
@Doublehung not sure why you think anything is counting backslashes, but also superuser.com/questions/182454/… and look here msdn.microsoft.com/en-us/library/a1y7w461.aspx it has a description about the backslashes and double quotes. . And there can be something to know there about even vs odd number of backslashes one after another like `\\\\\`Katakana
This is great information, but the Microsoft documentation is incomplete! (big surprise) The actual missing rules are documented at daviddeley.com/autohotkey/parameters/parameters.htm#WINCRULES.Elainaelaine
E
59

Percent Expansion Rules

Here is an expanded explanation of Phase 1 in jeb's answer (valid for both batch mode and command line mode).

Phase 1) Percent Expansion

Starting from left, scan each character for % or <LF>. If found then

  • 1.05 (truncate line at <LF>)
  • If the character is <LF> then
    • Drop (ignore) the remainder of the line from the <LF> onward
    • Goto Phase 2.0
  • Else the character must be %, so proceed to 1.1
  • 1.1 (escape %) skipped if command line mode
  • If batch mode and followed by another % then
    Replace %% with single % and continue scan
  • 1.2 (expand argument) skipped if command line mode
  • Else if batch mode then
    • If followed by * and command extensions are enabled then
      Replace %* with the text of all command line arguments (Replace with nothing if there are no arguments) and continue scan.
    • Else if followed by <digit> then
      Replace %<digit> with argument value (replace with nothing if undefined) and continue scan.
    • Else if followed by ~ and command extensions are enabled then
      • If followed by optional valid list of argument modifiers followed by required <digit> then
        Replace %~[modifiers]<digit> with modified argument value (replace with nothing if not defined or if specified $PATH: modifier is not defined) and continue scan.
        Note: modifiers are case insensitive and can appear multiple times in any order, except $PATH: modifier can only appear once and must be the last modifier before the <digit>
      • Else invalid modified argument syntax raises fatal error: All parsed commands are aborted, and batch processing aborts if in batch mode!
  • 1.3 (expand variable)
  • Else if command extensions are disabled then
    Look at next string of characters, breaking before % or end of buffer, and call them VAR (may be an empty list)
    • If next character is % then
      • If VAR is defined then
        Replace %VAR% with value of VAR and continue scan
      • Else if batch mode then
        Remove %VAR% and continue scan
      • Else goto 1.4
    • Else goto 1.4
  • Else if command extensions are enabled then
    Look at next string of characters, breaking before %, :, or end of buffer, and call them VAR (may be an empty list). If VAR breaks before : and the subsequent character is % then include : as the last character in VAR and break before %.
    • If next character is % then
      • If VAR is defined then
        Replace %VAR% with value of VAR and continue scan
      • Else if batch mode then
        Remove %VAR% and continue scan
      • Else goto 1.4
    • Else if next character is : then
      • If VAR is undefined then
        • If batch mode then
          Remove %VAR: and continue scan.
        • Else goto 1.4
      • Else if next character is ~ then
        • If next string of characters matches pattern of [integer][,[integer]]% then
          Replace %VAR:~[integer][,[integer]]% with substring of value of VAR (possibly resulting in empty string) and continue scan.
        • Else goto 1.4
      • Else if followed by = or *= then
        Invalid variable search and replace syntax raises fatal error: All parsed commands are aborted, and batch processing aborts if in batch mode!
      • Else if next string of characters matches pattern of [*]search=[replace]%, where search may include any set of characters except =, and replace may include any set of characters except %, then
        Replace %VAR:[*]search=[replace]% with value of VAR after performing search and replace (possibly resulting in empty string) and continue scan
      • Else goto 1.4
  • 1.4 (strip %)
    • Else If batch mode then
      Remove % and continue scan starting with the next character after the %
    • Else preserve the leading % and continue scan starting with the next character after the preserved leading %

The above helps explain why this batch

@echo off
setlocal enableDelayedExpansion
set "1var=varA"
set "~f1var=varB"
call :test "arg1"
exit /b  
::
:test "arg1"
echo %%1var%% = %1var%
echo ^^^!1var^^^! = !1var!
echo --------
echo %%~f1var%% = %~f1var%
echo ^^^!~f1var^^^! = !~f1var!
exit /b

Gives these results:

%1var% = "arg1"var
!1var! = varA
--------
%~f1var% = P:\arg1var
!~f1var! = varB

Note 1 - Phase 1 occurs prior to the recognition of REM statements. This is very important because it means even a remark can generate a fatal error if it has invalid argument expansion syntax or invalid variable search and replace syntax!

@echo off
rem %~x This generates a fatal argument expansion error
echo this line is never reached

Note 2 - Another interesting consequence of the % parsing rules: Variables containing : in the name can be defined, but they cannot be expanded unless command extensions are disabled. There is one exception - a variable name containing a single colon at the end can be expanded while command extensions are enabled. However, you cannot perform substring or search and replace operations on variable names ending with a colon. The batch file below (courtesy of jeb) demonstrates this behavior

@echo off
setlocal
set var=content
set var:=Special
set var::=double colon
set var:~0,2=tricky
set var::~0,2=unfortunate
echo %var%
echo %var:%
echo %var::%
echo %var:~0,2%
echo %var::~0,2%
echo Now with DisableExtensions
setlocal DisableExtensions
echo %var%
echo %var:%
echo %var::%
echo %var:~0,2%
echo %var::~0,2%

Note 3 - An interesting outcome of the order of the parsing rules that jeb lays out in his post: When performing find and replace with delayed expansion, special characters in both the find and replace terms must be escaped or quoted. But the situation is different for percent expansion - the find term must not be escaped (though it can be quoted). The percent replace string may or may not require escape or quote, depending on your intent.

@echo off
setlocal enableDelayedExpansion
set "var=this & that"
echo %var:&=and%
echo "%var:&=and%"
echo !var:^&=and!
echo "!var:&=and!"
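
Steps 1.1, 1.3 and 1.4 of the Phase 1 rules can be sketched in C. This is a simplified model, not cmd.exe's implementation: it follows the "command extensions disabled" branch of 1.3 (no colon syntax), skips 1.05 and 1.2 (so it assumes the input contains no <LF>, %<digit>, %* or %~), and compares variable names case-sensitively, which real cmd.exe does not. The names struct var, lookup_var and percent_expand are hypothetical, and the variable table stands in for the real environment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the environment: a table of name/value pairs. */
struct var { const char *name; const char *value; };

/* Returns the value of the variable whose name is name[0..len), or NULL. */
const char *lookup_var(const struct var *env, size_t n,
                       const char *name, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (strlen(env[i].name) == len && strncmp(env[i].name, name, len) == 0)
            return env[i].value;
    return NULL;
}

/*
 * Expands src into dst (capacity cap, at least 1).
 * batch_mode != 0 selects the batch rules, 0 the command line rules.
 */
void percent_expand(const char *src, int batch_mode,
                    const struct var *env, size_t n,
                    char *dst, size_t cap)
{
    size_t out = 0;

    while (*src != '\0' && out + 1 < cap) {
        if (*src != '%') {              /* ordinary character: copy it */
            dst[out++] = *src++;
            continue;
        }
        /* 1.1: in batch mode, %% collapses to a single % */
        if (batch_mode && src[1] == '%') {
            dst[out++] = '%';
            src += 2;
            continue;
        }
        /* 1.3: VAR runs up to the next % (or the end of the buffer) */
        const char *end = strchr(src + 1, '%');
        if (end != NULL) {
            const char *val = lookup_var(env, n, src + 1,
                                         (size_t)(end - src - 1));
            if (val != NULL) {          /* defined: substitute the value */
                while (*val != '\0' && out + 1 < cap)
                    dst[out++] = *val++;
                src = end + 1;
                continue;
            }
            if (batch_mode) {           /* undefined in batch mode: remove %VAR% */
                src = end + 1;
                continue;
            }
        }
        /* 1.4: batch mode strips the %, command line mode preserves it */
        if (!batch_mode)
            dst[out++] = '%';
        src++;
    }
    dst[out] = '\0';
}
```

With a single variable named "a " (a followed by a space) set to b, the input %a %b% c% expands to bb in batch mode and to bb% c% in command line mode under this model. The difference comes entirely from how 1.3 and 1.4 treat the undefined variable " c".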

Delayed Expansion Rules

Here is an expanded, and more accurate explanation of Phase 5 in jeb's answer (valid for both batch mode and command line mode)

Phase 5) Delayed Expansion

This phase is skipped if any of the following conditions apply:

  • Delayed expansion is disabled.
  • The command is within a parenthesized block on either side of a pipe.
  • The incoming command token is a "naked" batch script, meaning it is not associated with CALL, parenthesized block, any form of command concatenation (&, && or ||), or a pipe |.

The delayed expansion process is applied to tokens independently. A command may have multiple tokens:

  • The command token. For most commands the command name itself is a token. But a few commands have specialized regions that are considered a TOKEN for Phase 5.
    • for ... in(TOKEN) do
    • if defined TOKEN
    • if exist TOKEN
    • if errorlevel TOKEN
    • if cmdextversion TOKEN
    • if TOKEN comparison TOKEN, where comparison is one of ==, equ, neq, lss, leq, gtr, or geq
  • The arguments token
  • The destination token of redirection (one per redirection)

No change is made to tokens that do not contain !.

For each token that does contain at least one !, scan each character from left to right for ^ or !, and if found, then

  • 5.1 (caret escape) Needed for ! or ^ literals
  • If character is a caret ^ then
    • Remove the ^
    • Scan the next character and preserve it as a literal
    • Continue the scan
  • 5.2 (expand variable)
  • If character is !, then
    • If command extensions are disabled then
      Look at next string of characters, breaking before ! or <LF>, and call them VAR (may be an empty list)
      • If next character is ! then
        • If VAR is defined, then
          Replace !VAR! with value of VAR and continue scan
        • Else if batch mode then
          Remove !VAR! and continue scan
        • Else goto 5.2.1
      • Else goto 5.2.1
    • Else if command extensions are enabled then
      Look at next string of characters, breaking before !, :, or <LF>, and call them VAR (may be an empty list). If VAR breaks before : and the subsequent character is ! then include : as the last character in VAR and break before !
      • If next character is ! then
        • If VAR exists, then
          Replace !VAR! with value of VAR and continue scan
        • Else if batch mode then
          Remove !VAR! and continue scan
        • Else goto 5.2.1
      • Else if next character is : then
        • If VAR is undefined then
          • If batch mode then
            Remove !VAR: and continue scan
          • Else goto 5.2.1
        • Else if next character is ~ then
          • If next string of characters matches pattern of [integer][,[integer]]! then
            Replace !VAR:~[integer][,[integer]]! with substring of value of VAR (possibly resulting in empty string) and continue scan.
          • Else goto 5.2.1
        • Else if next string of characters matches pattern of [*]search=[replace]!, where search may include any set of characters except =, and replace may include any set of characters except !, then
          Replace !VAR:[*]search=[replace]! with value of VAR after performing search and replace (possibly resulting in an empty string) and continue scan
        • Else goto 5.2.1
      • Else goto 5.2.1
    • 5.2.1
      • If batch mode then remove the leading !
        Else preserve the leading !
      • Continue the scan starting with the next character after the preserved leading !
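
The per-token scan of Phase 5 can be sketched the same way. Again this is a simplified model under stated assumptions, not cmd.exe's code: it implements 5.1, the "command extensions disabled" branch of 5.2 (no colon syntax) and 5.2.1, it operates on a single token as it looks after Phase 2, and it compares names case-sensitively. The names struct dvar, lookup_dvar and delayed_expand are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the environment: a table of name/value pairs. */
struct dvar { const char *name; const char *value; };

/* Returns the value of the variable whose name is name[0..len), or NULL. */
const char *lookup_dvar(const struct dvar *env, size_t n,
                        const char *name, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (strlen(env[i].name) == len && strncmp(env[i].name, name, len) == 0)
            return env[i].value;
    return NULL;
}

/*
 * Applies the Phase 5 scan to one token. dst has capacity cap (at least 1).
 * batch_mode != 0 selects the batch rules, 0 the command line rules.
 */
void delayed_expand(const char *tok, int batch_mode,
                    const struct dvar *env, size_t n,
                    char *dst, size_t cap)
{
    size_t out = 0;

    /* Tokens without any ! are left untouched, carets included. */
    if (strchr(tok, '!') == NULL) {
        while (*tok != '\0' && out + 1 < cap)
            dst[out++] = *tok++;
        dst[out] = '\0';
        return;
    }

    while (*tok != '\0' && out + 1 < cap) {
        if (*tok == '^') {              /* 5.1: drop the caret ...          */
            tok++;
            if (*tok != '\0')           /* ... and keep the next character  */
                dst[out++] = *tok++;
            continue;
        }
        if (*tok != '!') {              /* ordinary character: copy it      */
            dst[out++] = *tok++;
            continue;
        }
        /* 5.2: VAR runs up to the next ! or <LF>, whichever comes first */
        const char *end = strpbrk(tok + 1, "!\n");
        if (end != NULL && *end == '!') {
            const char *val = lookup_dvar(env, n, tok + 1,
                                          (size_t)(end - tok - 1));
            if (val != NULL) {          /* defined: substitute the value    */
                while (*val != '\0' && out + 1 < cap)
                    dst[out++] = *val++;
                tok = end + 1;
                continue;
            }
            if (batch_mode) {           /* undefined in batch: remove !VAR! */
                tok = end + 1;
                continue;
            }
        }
        /* 5.2.1: batch mode removes the leading !, command line keeps it */
        if (!batch_mode)
            dst[out++] = '!';
        tok++;
    }
    dst[out] = '\0';
}
```

In this model the token a^b passes through unchanged because it contains no !, while a^!b becomes a!b: the caret is consumed only because the token contains an exclamation mark. That is the effect discussed in the comments on jeb's answer, where a trailing ! is used to force the second round of caret processing.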
Elainaelaine answered 4/11, 2010 at 7:39 Comment(30)
+1, Only the colon syntax and rules are missing here for %definedVar:a=b% vs %undefinedVar:a=b% and the %var:~0x17,-010% formsPianism
Good point - I expanded the variable expansion section to address your concerns. I also expanded the argument expansion section to fill in some missing details.Elainaelaine
After getting some additional private feedback from jeb, I added a rule for variable names ending with colon, and added note 2. I also added note 3 simply because I thought it was interesting and important.Elainaelaine
Added %* expansion to phase 1.2Elainaelaine
@129130 - Shell script highlighting is not appropriate for Windows batch files. I rolled back the edit to the prior version.Elainaelaine
perhaps a note that shift does not change value of %* would be a good addition - it does not seem obvious.Seguidilla
EDIT - Modified search/replace rules in step 1.3 based on information at dostips.com/forum/viewtopic.php?f=3&t=7234Elainaelaine
This is slightly out of scope and (I think) unintended behavior, but it might be worth mentioning that call set ... alters the parsing/processing of the set statement. (this is the case, at least, with the versions in Windows XP-Windows 10) Assuming the VARNAME variable is "VARA" and SUBSTR is "_NT", call set %VARNAME%=%%OS:%SUBSTR%=ZZ%% would expand to and execute: set VARA=%OS:_NT=ZZ%Kilometer
@CharlesGrunwald - That is not out of scope, but it is already accounted for. This answer is intended to be combined with jeb's answer, and phase 1 coupled with phase 6 predicts the behavior you describe. Whether originally intended by MS or not, this behavior is frequently exploited by advanced batch programmers.Elainaelaine
For the substring expansion syntax %VAR:~[integer][,[integer]]% (phase 1.3) it would be worth mentioning that integer could also be hexadecimal (preceding 0x) or octal numbers (leading 0), don't you think?Edea
@Edea - Yeah, I considered going into more detail about that, but didn't want to go down that rabbit hole. I was intentionally non-committal when I used the term [integer].There is more info at Rules for how does CMD.EXE parses numbers.Elainaelaine
The % sign seems not to be allowed in the search string portion of the substring replacement syntax %VAR:[*]search=[replace]% (phase 1.3); I think this should be mentioned...Edea
@Edea - Not true! I intentionally worded 1.3 as I did because it is allowed. See New/unknown behaviour in percent/delayed expansion.Elainaelaine
Alright, thanks! sorry for drawing a wrong conclusion! I tried with a single % as the search string which failed, but it works if the search string begins with a character other than %; in case of the :* syntax, the % may even occur as the first character in the search string, which is the behaviour your post reflects...Edea
I'm missing the expansion rules for the cmd context, like that there are no reserved characters for the first character of the variable name like %<digit>, %* or %~. And the behaviour changes for undefined variables. Perhaps you need to open a second answerPianism
@Pianism - Rather than start a new answer, I modified this answer to account for both batch mode and command line mode. Check it out, and tell me if I missed something.Elainaelaine
EDIT - Phase 1 changes to account for issues at dostips.com/forum/…Elainaelaine
2019-06-24 Correction - 1.3 and 5.2 both allow <LF> characters within search and replace termsElainaelaine
Concerning Note 3 (percent expansion): you say that special characters should not be escaped, but actually this is only true for the search string; the replace string, when unquoted, still exposes special characters to the parser; I think this is worth an update -- what do you think?Edea
@Edea - That one is a bit tricky. When I wrote that note I was thinking from the standpoint of what is required for the search/replace operation to succeed. I was not concerned about what might happen afterward. Whether the percent replace string characters should be escaped is dependent on your intent. But I can see that clarification is needed.Elainaelaine
@Edea - DoneElainaelaine
I'm afraid I found a situation these rules do not explain: when the pattern between %var:~ and % does not match the pattern [integer][,[integer]], the result in batch mode is var:~ plus the non-matching pattern, then the character scan is continued at the closing %, meaning that this could be the opening % of a following variable; in CMD mode, the result is almost the same except that the opening % is not removed. Strange, isn't it?Edea
@Edea - You misinterpreted the rules (not hard to do). The rules actually work as they stand. Both cases continue the scan after the first % - They abandon the variable expansion and jump to 1.4 once an invalid substring operation is detected. The only difference is batch strips the leading % and command line preserves it, as described in 1.4.Elainaelaine
@Edea - The scan of the variable in 1.3 is provisional - if the detected VAR construct proves to be invalid, then that provisional scan is discarded and the scanner resumes after the initial % in 1.4.Elainaelaine
Oh damn, 1.4 is talking about the opening %, now it all makes perfect sense! sorry for that!Edea
EDIT - Removed phase 1.5 and replaced with item in phase 2, and modified phase 7.3Elainaelaine
Fixed delayed expansion rule 5.2 to properly disallow find string beginning with ~ when doing find/replace.Elainaelaine
In section 1.05 there is a reference to section 1.5 which does not exist (any more) – is it true that phase 1 is just left at this point (since said Strip CR usually happens in phase 2)?Edea
@Edea - Good catch - I edited to correct. Yeah, exit phase 1 and continue on to phase 2. It is hard to believe this is my first SO action in over 1 year! I've been concentrating on music as a hobby, and my new fascinations with synthesizers (VCV Rack) is a real time suck. I've had an old YouTube overtone flute video go viral - over 600k views!Elainaelaine
@Edea - Strike that - Apparently I can't do calendar arithmetic. It had been 3 months. But it feels like a year!Elainaelaine
I
8

As pointed out, commands are passed the entire argument string in μSoft land, and it is up to them to parse this into separate arguments for their own use. There is no consistency in this between different programs, and therefore there is no one set of rules to describe this process. You really need to check each corner case for whatever C library your program uses.

As far as the system .bat files go, here is that test:

c> type args.cmd
@echo off
echo cmdcmdline:[%cmdcmdline%]
echo 0:[%0]
echo *:[%*]
set allargs=%*
if not defined allargs goto :eof
setlocal
@rem Wot about a nice for loop?
@rem Then we are in the land of delayedexpansion, !n!, call, etc.
@rem Plays havoc with args like %t%, a"b etc. ugh!
set n=1
:loop
    echo %n%:[%1]
    set /a n+=1
    shift
    set param=%1
    if defined param goto :loop
endlocal

Now we can run some tests. See if you can figure out just what μSoft are trying to do:

C>args a b c
cmdcmdline:[cmd.exe ]
0:[args]
*:[a b c]
1:[a]
2:[b]
3:[c]

Fine so far. (I'll leave out the uninteresting %cmdcmdline% and %0 from now on.)

C>args *.*
*:[*.*]
1:[*.*]

No filename expansion.

C>args "a b" c
*:["a b" c]
1:["a b"]
2:[c]

No quote stripping, though quotes do prevent argument splitting.

c>args ""a b" c
*:[""a b" c]
1:[""a]
2:[b" c]

Consecutive double quotes cause them to lose any special parsing abilities they may have had. @Beniot's example:

C>args "a """ b "" c"""
*:["a """ b "" c"""]
1:["a """]
2:[b]
3:[""]
4:[c"""]

Quiz: How do you pass the value of any environment var as a single argument (i.e., as %1) to a bat file?

c>set t=a "b c
c>set t
t=a "b c
c>args %t%
1:[a]
2:["b c]
c>args "%t%"
1:["a "b]
2:[c"]
c>Aaaaaargh!

Sane parsing seems forever broken.

For your entertainment, try adding miscellaneous ^, \, ', & (&c.) characters to these examples.

Isidor answered 4/11, 2010 at 7:39 Comment(2)
To pass %t% as single argument you could use "%t:"=\"%" That is, use the %VAR:str=replacement% syntax for variable expansion. Shell metacharacters like | and & in the variable contents can still be exposed and mess up the shell though, unless you escape them again....Greenock
@Greenock So, in my example, t is a "b c. Do you have a recipe for getting those 6 characters (a, 2 × space, ", b, and c) to appear as %1 inside a .cmd? I like your thinking though. args "%t:"=""%" is pretty close :-)Isidor
S
5

You have some great answers above already, but to answer one part of your question:

set a =b, echo %a %b% c% → bb c%

What is happening there is that because you have a space before the =, a variable is created called %a<space>% so when you echo %a % that is evaluated correctly as b.

The remaining part b% c% is then evaluated as plain text + an undefined variable % c%, which should be echoed as typed; for me, echo %a %b% c% returns bb% c%

I suspect that the ability to include spaces in variable names is more of an oversight than a planned 'feature'

Scorpaenid answered 4/11, 2010 at 7:39 Comment(0)
E
4

FOR-Loop Meta-Variable Expansion

This is an extended explanation of Phase 4) in the accepted answer (applicable for both batch file mode and command line mode). Of course a for command must be active. The following describes the processing of the command line portion after the do clause. Note that in batch file mode, %% has already been converted to % due to the foregoing immediate %-expansion phase (Phase 1).

  • scan for %-sign, beginning from the left up to the end of the line; if one is found, then:
    • if Command Extensions are enabled (default), check if next character is ~; if yes, then:
      • take as many as possible of the following characters in the case-insensitive set fdpnxsatz (even multiple times each) that are preceding a character that defines a for variable reference or a $-sign; if such a $-sign is encountered, then:
        • scan for a : (see note 1); if found, then:
          • if there is a character following the :, use it as a for variable reference and expand as expected, unless it is not defined, then do not expand and continue scan at that character position;
          • if the : is the last character, cmd.exe will crash!
        • else (no : is found) do not expand anything;
      • else (if no $-sign is encountered) expand the for variable using all the modifiers, unless it is not defined, then do not expand and continue scan at that character position;
    • else (if no ~ is found or Command Extensions are disabled) check the next character:
      • if there is no more character available, do not expand anything;
      • if the next character is %, do not expand anything and go back to the beginning of the scan at this character position (see note 2);
      • else use the next character as a for variable reference and expand, unless such is not defined, then do not expand;
  • go back to the beginning of the scan at the next character position (as long as there are still characters available);

1) The string between $ and : is considered as the name of an environment variable, which may even be empty; since an environment variable cannot have an empty name, the behaviour is just the same as for an undefined environment variable.
2) This implies that a for meta-variable named % cannot be expanded without a ~-modifier.
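
The branch of the scan that applies when no ~ follows the %-sign (or when Command Extensions are disabled) can be sketched as follows. This is a minimal model of the rules listed above, written in Python for illustration only: the function name expand_for_vars and the for_vars dictionary are hypothetical, and the ~-modifier and $-sign handling are deliberately left out.

```python
def expand_for_vars(text: str, for_vars: dict[str, str]) -> str:
    """Model of the scan without ~-modifiers.

    for_vars maps a single-character meta-variable name to its value.
    Meta-variable names are case-sensitive, so no case folding is done.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != '%':
            out.append(ch)
            i += 1
        elif i + 1 >= len(text):
            # No more characters after the percent sign: do not expand.
            out.append('%')
            i += 1
        elif text[i + 1] == '%':
            # Next character is %: do not expand, rescan at that position.
            out.append('%')
            i += 1
        elif text[i + 1] in for_vars:
            # Defined meta-variable: expand and skip both characters.
            out.append(for_vars[text[i + 1]])
            i += 2
        else:
            # Undefined meta-variable: keep % and continue with the next character.
            out.append('%')
            i += 1
    return ''.join(out)


print(expand_for_vars('echo %i %%i %j %', {'i': 'X'}))
```

In this model, %%i becomes %X because the scan restarts at the second %-sign, and a meta-variable named % is never expanded, which is what note 2 states.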


Original source: How to safely echo FOR variable %%~p followed by a string literal

Edea answered 4/11, 2010 at 7:39 Comment(2)
There is no special percent rule for the character after the colon in %~$:<any-meta-var>Pianism
True, @jeb, I adapted the rules accordingly; the key is the continuation of the scan at the current character position when the meta-variable is not defined…Edea
D
0

edit: see accepted answer, what follows is wrong and explains only how to pass a command line to TinyPerl.


Regarding quotes, I have the feeling that the behaviour is the following:

  • when a " is found, string globbing begins
  • when string globbing occurs:
    • every character that is not a " is globbed
    • when a " is found:
      • if it is followed by "" (thus a triple ") then a double quote is added to the string
      • if it is followed by " (thus a double ") then a double quote is added to the string and string globbing ends
      • if the next character is not ", string globbing ends
    • when line ends, string globbing ends.
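
The rules above can be sketched as a small tokenizer. As the note at the top of this answer says, they do not describe cmd.exe itself but how the command line ends up split into arguments for the TinyPerl executable, so this is only a model of the listed rules. The function name split_args is hypothetical, whitespace outside a string is assumed to separate arguments, and backslash handling is ignored entirely.

```python
def split_args(line: str) -> list[str]:
    """Split a command line into arguments following the quote rules above."""
    args = []
    current = []        # characters of the argument being built
    started = False     # True once the current argument has begun
    in_string = False   # True while "string globbing" is active
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch != '"':
                current.append(ch)
                i += 1
            elif line[i + 1:i + 3] == '""':
                # Triple quote: add one quote, stay inside the string.
                current.append('"')
                i += 3
            elif line[i + 1:i + 2] == '"':
                # Double quote: add one quote and end the string.
                current.append('"')
                in_string = False
                i += 2
            else:
                # Single quote: just end the string.
                in_string = False
                i += 1
        elif ch == '"':
            in_string = True
            started = True
            i += 1
        elif ch in ' \t':
            # Unquoted whitespace closes the current argument, if any.
            if started:
                args.append(''.join(current))
                current, started = [], False
            i += 1
        else:
            current.append(ch)
            started = True
            i += 1
    if started:
        # End of line also ends an open string.
        args.append(''.join(current))
    return args


for arg in split_args('"a ""b"" c"'):
    print('*' + arg, end='')
print()
```

For the two command lines from the question, "a ""b"" c" and """a b c""", this model produces the arguments a "b, c and "a, b, c" respectively, which agrees with the observed output.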

In short:

"a """ b "" c""" consists of two strings: a " b " and c"

"a"", "a""" and "a"""" are all the same string if at the end of a line

Doublehung answered 4/11, 2010 at 7:39 Comment(3)
The tokenizer and string globbing depend on the command! A "set" works differently than a "call" or even an "if"Pianism
yes, but what about external commands? I guess cmd.exe always passes the same arguments to them?Doublehung
cmd.exe always passes the expansion result to an external command as a string, not as tokens. How to escape and tokenize it depends on the external command: findstr uses backslash, another one may use something else.Pianism
B
-3

Note that Microsoft has published the source code of its Terminal. It may work similarly to the command line with respect to syntax parsing. Maybe someone is interested in testing the reverse-engineered parsing rules against the Terminal's parsing rules.

Link to the source code.

Belford answered 4/11, 2010 at 7:39 Comment(1)
The Terminal has nothing to do with the shell, therefore, you will not find anything there related to the shell's syntax.Milburr
