Exactly how do backslashes work within backticks?

From the Bash FAQ:

Backslashes (\) inside backticks are handled in a non-obvious manner:

 $ echo "`echo \\a`" "$(echo \\a)"
 a \a
 $ echo "`echo \\\\a`" "$(echo \\\\a)"
 \a \\a

But the FAQ does not break down the parsing rules that lead to this difference. The only relevant quote from man bash I found was:

When the old-style backquote form of substitution is used, backslash retains its literal meaning except when followed by $, `, or \.

The "$(echo \\a)" and "$(echo \\\\a)" cases are easy enough: backslash, the escape character, is escaping itself into a literal backslash. Thus every instance of \\ becomes \ in the output. But I'm struggling to understand the analogous logic for the backtick cases. What is the underlying rule, and how does the observed output follow from it?
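
That reading of the $(...) cases can be checked with a small model. The Python below is only a sketch, not bash code: it assumes the one relevant rule is that an unquoted backslash quotes the next character and is then removed, and the name remove_unquoted_backslashes is made up for illustration.

```python
def remove_unquoted_backslashes(word):
    r"""Model of quote removal for an unquoted word: each backslash quotes
    the character after it and is itself dropped."""
    out = []
    i = 0
    while i < len(word):
        if word[i] == "\\" and i + 1 < len(word):
            i += 1  # drop the quoting backslash, keep what it quotes
        out.append(word[i])
        i += 1
    return "".join(out)


# The arguments of the two $(...) commands from the FAQ: 2 and 4 backslashes.
for n in (2, 4):
    word = "\\" * n + "a"
    print(word, "->", remove_unquoted_backslashes(word))
```

Under that assumption, two backslashes collapse to one and four collapse to two, which is the \a and \\a printed by the $(...) forms.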

Finally, a related question... If you don't quote the backticks, you get a "no match" error:

$ echo `echo \\\\a`
-bash: no match: \a

What's happening in this case?

update

Re: my main question, I have a theory for a set of rules that explains all the behavior, but still don't see how it follows from any of the documented rules in bash. Here are my proposed rules....

Inside backticks, a backslash in front of a character simply returns that character. I.e., a single backslash has no effect. This is true for all characters except backslash itself and backticks. In the case of backslash itself, \\ becomes an escaping backslash: it will escape the next character.

Let's see how this plays out in an example:

a=xx
echo "`echo $a`"      # prints the value of $a
echo "`echo \$a`"     # single backslash has no effect: equivalent to above
echo "`echo \\$a`"    # escaping backslash makes $ literal

prints:

xx
xx
$a

Let's analyze the original examples from this perspective:

echo "`echo \\a`"

Here the \\ produces an escaping backslash, but when we "escape" a we just get back a, so it prints a.

echo "`echo \\\\a`"

Here the first pair \\ produces an escaping backslash which is applied to \, producing a literal backslash. That is, the first 3 \\\ become a single literal \ in the output. The remaining \a just produces a. Final result is \a.
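
The theory can be tested against any number of backslashes with a two-stage model. This is a sketch under stated assumptions, not bash's implementation: stage one is the backtick rule quoted from man bash (a backslash is removed only before $, a backtick, or another backslash), stage two is the ordinary unquoted-word rule applied by the shell that runs the inner command, and echo is assumed to print its argument unchanged. All function names are invented for illustration.

```python
def backtick_stage(text):
    r"""Stage 1: inside backticks, drop a backslash only when the next
    character is $, a backtick, or another backslash."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in "$`\\":
            i += 1
        out.append(text[i])
        i += 1
    return "".join(out)


def inner_shell_stage(word):
    r"""Stage 2: in the inner command, a backslash quotes any next character."""
    out, i = [], 0
    while i < len(word):
        if word[i] == "\\" and i + 1 < len(word):
            i += 1
        out.append(word[i])
        i += 1
    return "".join(out)


def backtick_echo(n):
    """Predicted output of echo with n backslashes followed by a, run inside
    double-quoted backticks."""
    return inner_shell_stage(backtick_stage("\\" * n + "a"))


for n in range(1, 9):
    print(n, backtick_echo(n))
```

In this model, 1 or 2 backslashes give a, 3 to 6 give \a, and 7 or 8 give \\a, which agrees with the FAQ output for 2 and 4 backslashes.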

File answered 11/8, 2019 at 5:49 Comment(15)
Can't reproduce that error on bash 5.0.7. Which version do you have? - Skiascope
@oguzismail GNU bash, version 5.0.2(1)-release (x86_64-apple-darwin16.7.0) - File
That "no match" error looks like you have failglob set and for some reason bash is trying to treat \a as a filename pattern. I have no idea why it would do this. - Enslave
Backslash inside backticks definitely escapes a backtick. Inside that new backtick expression, a double backslash is required to introduce another level of nested backticks. - Faina
@Faina Good point, I updated my "update" section to include that. I'm still not sure my theory is right, but it's the best I have so far that explains what I'm seeing. - File
I vaguely recall that backticks are generally similar to double quotes, but I don't have a reference, and may be mixing this with Perl's rules. - Faina
@File your theory is the best I've seen so far. Thanks for doing the analysis. I noticed (and tested), and this follows from your theory, that echo `echo \\\a` also outputs \a. Weird stuff. Now I understand better why so many say to use $(...) instead! - Retrenchment
PS: I suggest you copy your theory into an answer and accept it, or at least let us upvote it. - Retrenchment
@Retrenchment Thanks. I will do that if no one else answers. I'm hoping a true bash expert will see this and elaborate on the theory, connect it to documentation, etc. I think my theory is functionally correct, at least for most cases, but I still feel I'm missing the full story. - File
OK, here's a kicker, consistent with your theory: you need 7 backslashes to get \\a as output! The first 4 give you a literal backslash, and as before you need at least three more before you get a second literal backslash. - Retrenchment
Yes, I think my theory is good for explaining any number of such backslashes. But things get interesting when you consider nested backticks, and escaped nested backticks: Try it online! - File
I think that shopt -s failglob; echo $(<<<"\a" cat) is wrongly parsed. Why does filename expansion trigger there? @Retrenchment, the behavior also happens with $(, e.g. with shopt -s failglob; echo $(echo "\\a") I get a no match error. This looks like a bug. Filename expansion shouldn't happen on the string \a, only when the argument after word splitting contains *, ? or [. - Immixture
@Immixture I can't reproduce your errors - in my bash 4.4.12 shell, both commands output \a. - Retrenchment
Indeed, docker run -ti --rm bash:4.4.12 -c 'shopt -s failglob; echo $(<<<"\a" cat)' works, but docker run -ti --rm bash:5 -c 'shopt -s failglob; echo $(<<<"\a" cat)' fails. Looks like a bug. At least it is inconsistent between versions. 4.4.23 looks like the last working version. - Immixture
@Immixture so this is a recently introduced bug. Rather disappointing I have to say, but I guess it's going to happen, with such complex software tools. Might be worth reporting to the bash bug tracker, assuming there is such a thing. - Retrenchment

The logic is quite simple. Let's look at the bash source code (4.4) itself:

subst.c:9273

case '`': /* Backquoted command substitution. */
{
    t_index = sindex++;

    temp = string_extract(string, &sindex, "`", SX_REQMATCH);
    /* The test of sindex against t_index is to allow bare instances of
        ` to pass through, for backwards compatibility. */
    if (temp == &extract_string_error || temp == &extract_string_fatal)
    {
        if (sindex - 1 == t_index)
        {
            sindex = t_index;
            goto add_character;
        }
        last_command_exit_value = EXECUTION_FAILURE;
        report_error(_("bad substitution: no closing \"`\" in %s"), string + t_index);
        free(string);
        free(istring);
        return ((temp == &extract_string_error) ? &expand_word_error
                                                : &expand_word_fatal);
    }

    if (expanded_something)
        *expanded_something = 1;

    if (word->flags & W_NOCOMSUB)
        /* sindex + 1 because string[sindex] == '`' */
        temp1 = substring(string, t_index, sindex + 1);
    else
    {
        de_backslash(temp);
        tword = command_substitute(temp, quoted);
        temp1 = tword ? tword->word : (char *)NULL;
        if (tword)
            dispose_word_desc(tword);
    }
    FREE(temp);
    temp = temp1;
    goto dollar_add_string;
}

As you can see, it calls de_backslash(temp) on the extracted string, which modifies that string in place. The code of that function is below:

subst.c:1607

/* Remove backslashes which are quoting backquotes from STRING.  Modifies
   STRING, and returns a pointer to it. */
char *
de_backslash(string)
    char *string;
{
  register size_t slen;
  register int i, j, prev_i;
  DECLARE_MBSTATE;

  slen = strlen(string);
  i = j = 0;

  /* Loop copying string[i] to string[j], i >= j. */
  while (i < slen)
  {
    if (string[i] == '\\' && (string[i + 1] == '`' || string[i + 1] == '\\' ||
                              string[i + 1] == '$'))
      i++;
    prev_i = i;
    ADVANCE_CHAR(string, slen, i);
    if (j < prev_i)
      do
        string[j++] = string[prev_i++];
      while (prev_i < i);
    else
      j = i;
  }
  string[j] = '\0';

  return (string);
}

The function does one simple thing: if it sees a \ character and the next character is \, a backtick, or $, it skips this \ and copies the next character. Every other character is copied unchanged.

Converted to Python for simplicity:

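text = r"\\\\$a"  # the argument as written between the backticks

slen = len(text)
i = 0
data = ""
while i < slen:
    # Skip a backslash that precedes `, \ or $. The bounds check keeps a
    # trailing backslash from raising IndexError.
    if text[i] == '\\' and i + 1 < slen and text[i + 1] in '`\\$':
        i += 1
    data += text[i]
    i += 1

print(data)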

This prints \\$a, so the backtick form ends up running the same inner command as $(echo \\$a). Now let's test that in bash:

$ a=xxx

$ echo "$(echo \\$a)"
\xxx

$ echo "`echo \\\\$a`"
\xxx
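
The Python translation above covers only de_backslash. To connect its result to what echo prints, the inner command still has to go through ordinary word expansion. The following sketch models that second step in a deliberately simplified way: it handles only unquoted backslashes and $name variables, and expand_unquoted_word is a hypothetical helper, not a bash function.

```python
import re

NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def de_backslash(text):
    r"""Backtick pre-pass: drop a backslash only when the next character is
    a backtick, a backslash, or $."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in "`\\$":
            i += 1
        out.append(text[i])
        i += 1
    return "".join(out)


def expand_unquoted_word(word, env):
    r"""Simplified inner-shell expansion: a backslash makes the next character
    literal, and $name is looked up in env. Quotes, ${...}, globbing and
    word splitting are not modelled."""
    out, i = [], 0
    while i < len(word):
        ch = word[i]
        if ch == "\\" and i + 1 < len(word):
            out.append(word[i + 1])  # quoted character, taken literally
            i += 2
            continue
        if ch == "$":
            m = NAME.match(word, i + 1)
            if m:
                out.append(env.get(m.group(), ""))
                i = m.end()
                continue
        out.append(ch)
        i += 1
    return "".join(out)


env = {"a": "xxx"}
# $(echo \\$a): no pre-pass, the inner shell sees two backslashes and $a.
print(expand_unquoted_word("\\" * 2 + "$a", env))
# `echo \\\\$a`: the pre-pass first reduces four backslashes to two.
print(expand_unquoted_word(de_backslash("\\" * 4 + "$a"), env))
```

Both lines print \xxx in this model, matching the bash session above.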
Expel answered 14/8, 2019 at 18:7 Comment(0)

I did some more research to find the reference and the rule behind what is happening. The GNU Bash Reference Manual states:

When the old-style backquote form of substitution is used, backslash retains its literal meaning except when followed by ‘$’, ‘`’, or ‘\’. The first backquote not preceded by a backslash terminates the command substitution. When using the $(command) form, all characters between the parentheses make up the command; none are treated specially.

In other words, \\, \$, and \` inside `` are processed by the CLI parser before the command substitution runs. Everything else is passed to the command substitution unchanged.
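
The manual's sentence about the terminating backquote can be sketched in the same spirit. The Python below is an illustrative model, not bash's implementation: extract_backtick_command is a hypothetical helper that scans from an opening backquote to the first backquote not preceded by a backslash, removing the backslashes that precede $, a backquote, or a backslash along the way.

```python
def extract_backtick_command(source, start):
    r"""Given source[start] == '`', return the command text with the special
    backslashes removed, plus the index just past the closing backquote."""
    if source[start] != "`":
        raise ValueError("expected an opening backquote")
    out = []
    i = start + 1
    while i < len(source):
        ch = source[i]
        if ch == "`":
            # First backquote not preceded by a backslash: end of substitution.
            return "".join(out), i + 1
        if ch == "\\" and i + 1 < len(source) and source[i + 1] in "$`\\":
            out.append(source[i + 1])  # keep the character, drop the backslash
            i += 2
            continue
        out.append(ch)
        i += 1
    raise ValueError("no closing backquote")


# Nested substitution: the escaped backquotes do not end the outer one and
# reach the inner shell as plain backquotes.
src = r"`echo \`echo hi\``"
command, end = extract_backtick_command(src, 0)
print(command)
```

Here the inner shell would receive echo `echo hi`, i.e. a command that itself contains a backquoted substitution.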

Let's step through each example from the question. After each # I show the command text left once the CLI parser has processed it, i.e. what `` or $() actually executes.

Your first example explained:

$ echo "`echo \\a`"   # echo \a
a
$ echo "$(echo \\a)"  # echo \\a
\a

Your second example explained:

$ echo "`echo \\\\a`"   # echo \\a
\a
$ echo "$(echo \\\\a)"  # echo \\\\a
\\a

Your third example:

$ a=xx
$ echo "`echo $a`"    # echo $a
xx
$ echo "`echo \$a`"   # echo $a
xx
$ echo "`echo \\$a`"  # echo \$a
$a

Your third example using $():

$ echo "$(echo $a)"     # echo $a
xx
$ echo "$(echo \$a)"    # echo \$a
$a
$ echo "$(echo \\$a)"   # echo \\$a
\xx
Ovi answered 13/8, 2019 at 23:3 Comment(0)