Why does this parameter expansion replacement fail in bash 4.2 but work in 5.1?
B

2

6

I'm trying to port some code from bash 5.1 to 4.2.46. One function which tries to strip color codes from a specifically formatted string stopped working.

Here is a sample string, text, in that format. I also turn on extended globbing.

text="$(printf -- "%b%s%b" "\[\e[31m\]" "hello" "\[\e[0m\]")"
shopt -s extglob

In bash 5.1, building up the parameter expansion step by step removes all the color codes and escape characters:

bash-5.1$ echo "${text//$'\[\e'\[/}"
31m\]hello0m\]
bash-5.1$ echo "${text//$'\[\e'\[+([0-9])/}"
m\]hellom\]
bash-5.1$ echo "${text//$'\[\e'\[+([0-9])m$'\]'/}"
hello

In bash 4.2.46, I get different behavior as I build up the same parameter expansion.

bash-4.2.46$ echo "${text//$'\[\e'\[/}"
\31m\]hello\0m\]
bash-4.2.46$ echo "${text//$'\[\e'\[+([0-9])/}"
\[\]hello\[\]  ## no longer matches because `+([0-9])` doesn't follow `\[`

The difference comes from this line: echo "${text//$'\[\e'\[/}"

bash-5.1:    31m\]hello0m\]
bash-4.2.46: \31m\]hello\0m\]

Here's what printf "%q" "${text//$'\[\e'\[/}" shows:

bash-5.1:    31m\\\]hello0m\\\]
bash-4.2.46: \\31m\\\]hello\\0m\\\]

Where is the extra \ coming from in 4.2.46?

Even when I try to remove it, the pattern stops matching:

bash-4.2.46$ echo "${text//$'\[\e'\[\\/}"
\[\]hello\[\]  ## no longer matches because `\\` doesn't follow `\[`

I'm guessing there may be a bug related to parameter expansion, backslash escaping, and extended globbing.

I am aiming to write code that works on bash 4.0 onward, so I'm primarily looking for a workaround. An explanation (bug report, etc.) of why the behavior differs would be great, though.

Broch answered 5/1, 2022 at 0:44 Comment(7)
I tested this in more versions of bash. The error (?) is reproducible in 4.0.0(1) and 3.2.57(1) too. In 4.4.0(1) the expansion works just as it does in 5. - Barta
All these years I never realized you could combine those. Thanks for the suggestion! Edited the question, though it's still an eyesore. :') - Broch
I think it's worth making 5.0 the minimum version instead of 4. - Binghi
Unfortunately, it's difficult to trace the changes. Maybe consider asking about it through bashbug. - Binghi
In the end my workaround was to surround the escape characters with an invisible sentinel character (unit separator) when I generate the colored text. The stripping code then just finds those sentinel characters and rips out the escape characters between them, leaving the real text. - Broch
I agree about requiring 5.x+, but unfortunately I'm stuck on 4.x in some enterprise environments (CentOS…) - Broch
The thought was it could inspire a workaround. But I ended up going in a completely different direction. I still like to understand these kinds of things, as I work on very old, disparate versions of bash. - Broch
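
For reference, the sentinel approach described in the comments could be sketched roughly as follows. The function names colorize and strip_colors, the variable us, and the exact layout are assumptions made for illustration, not the asker's actual code. The stripping loop deliberately avoids both extglob and $'...' inside an expansion, so it should not depend on the behavior discussed in the question, though it has not been checked on every 4.x release.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the sentinel idea; names and layout are assumptions.
us=$'\x1f'    # ASCII unit separator, used as an invisible sentinel

# colorize CODE TEXT: emit TEXT wrapped in color codes, with each
# escape sequence enclosed in a pair of sentinels.
colorize() {
  local esc=$'\e'
  printf '%s' "${us}\\[${esc}[${1}m\\]${us}${2}${us}\\[${esc}[0m\\]${us}"
}

# strip_colors STRING: drop everything between each pair of sentinels,
# including the sentinels themselves.
strip_colors() {
  local s=$1 out=
  while [[ $s == *"$us"*"$us"* ]]; do
    out+=${s%%"$us"*}   # keep the text before the opening sentinel
    s=${s#*"$us"}       # drop through the opening sentinel
    s=${s#*"$us"}       # drop through the closing sentinel
  done
  printf '%s\n' "$out$s"
}

colored=$(colorize 31 hello)
plain=$(strip_colors "$colored")
echo "$plain"
```

The trade-off is that the text must be generated by code you control; strings colored elsewhere carry no sentinels and pass through unchanged.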
M
2

The problem seems to be how $'...' is parsed inside ${text//<here>} when the whole expansion is inside double quotes.

$ test='f() { "${text//\[$'\''\e'\''\[+([0-9])/}"; }; printf "%q\n" "$(declare -f f)"'; echo -n 'bash4.1 '; docker run bash:4.1 bash -c "$test" ; echo -n 'bash5.1 '; bash -c "$test"
bash4.1 $'f () \n{ \n    "${text//\\[\E\\[+([0-9])/}"\n}'
bash5.1 $'f () \n{ \n    "${text//\\[\'\E\'\\[+([0-9])/}"\n}'

Just use a variable.

esc=$'\e'
echo "${text//\\\[$esc\[+([0-9])/}"
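
Extending the same idea, here is a minimal sketch that strips the whole sequence by keeping every backslash and escape character in variables, so no $'...' appears inside the expansion. The names open, close_seq, and stripped are illustrative only. The output shown is what bash 5.x produces; the older releases have not all been checked, so treat it as something to try rather than a guaranteed fix.

```shell
#!/usr/bin/env bash
shopt -s extglob                      # needed for +([0-9])

text=$'\[\e[31m\]hello\[\e[0m\]'      # same sample string as in the question

open=$'\\[\e['                        # literal: backslash, [, ESC, [
close_seq='m\]'                       # literal: m, backslash, ]

# Quoted variables inside the pattern are matched literally; the
# assignment itself is left unquoted so no outer double quotes are involved.
stripped=${text//"$open"+([0-9])"$close_seq"/}
echo "$stripped"                      # prints: hello
```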
Malcom answered 5/1, 2022 at 2:17 Comment(0)
B
5

Seems like a bug in bash. By bisecting the available versions, I found that 4.2.53(1)-release was the last version with this bug. Version 4.3.0(1)-release fixed the problem.

The list of changes mentions a few bug fixes in this area. It may have been one of the following:

This document details the changes between this version, bash-4.3-alpha, and the previous version, bash-4.2-release.
[...]
zz. When using the pattern substitution word expansion, bash now runs the replacement string through quote removal, since it allows quotes in that string to act as escape characters. This is not backwards compatible, so it can be disabled by setting the bash compatibility mode to 4.2.
[...]
eee. Fixed a logic bug that caused extended globbing in a multibyte locale to cause failures when using the pattern substititution word expansions.

Workaround

Instead of using parameter expansions with extglobs, use bash pattern matching with actual regexes (available in bash 3.0.0 and higher):

text=$'\[\e[31m\]hello\[\e[0m\]'
while [[ "$text" =~ (.*)$'\[\e['[0-9]*'m\]'(.*) ]]; do
  text="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
done
echo "$text"

or rely on an external (but POSIX-standardized) tool like sed:

text=$'\[\e[31m\]hello\[\e[0m\]'
text=$(sed $'s#\\\[\e[[0-9]*m\\\]##g' <<< "$text")
echo "$text"
Barta answered 5/1, 2022 at 1:59 Comment(1)
"When using the pattern substitution word expansion, bash now runs the replacement string through quote removal." My guess is that it's this one. - Malcom