How does negative matching work in extglob in parameter expansion

Problem

The behaviour of

!(pattern-list)

does not work the way I would expect when used in parameter expansion, specifically

${parameter/pattern/string}

Input

a="1 2 3 4 5 6 7 8 9 10"

Test cases

$ printf "%s\n" "${a/!([0-9])/}"
[blank]
#expected 12 3 4 5 6 7 8 9 10

$ printf "%s\n" "${a/!(2)/}"
[blank]
#expected  2 3 4 5 6 7 8 9 10

$ printf "%s\n" "${a/!(*2*)/}"
2 3 4 5 6 7 8 9 10
#Produces the behaviour expected in previous one, not sure why though

$ printf "%s\n" "${a/!(*2*)/,}"
,2 3 4 5 6 7 8 9 10
#Expected after previous worked

$ printf "%s\n" "${a//!(*2*)/}"
2
#Expected again previous worked

$ printf "%s\n" "${a//!(*2*)/,}"
,,2,
#Why are there 3 commas???

Specs

GNU bash, version 4.2.46(1)-release (x86_64-redhat-linux-gnu)

Notes

These are very basic examples, so if it is possible to include more complex examples with explanations in the answer then please do.

Any more info or examples needed let me know in the comments.

Have already looked at How does extglob work with shell parameter expansion?, and have even commented on what the problem is with that particular problem, so please don't mark as a dupe.

Captious answered 29/5, 2017 at 12:3 Comment(6)
I think I can explain all of those except the last one (which looks like a bug). – Briney
@123, I generally use it like ls !(*.txt) (everything except files ending with .txt) or ls !(*.log|*.sh) (everything except files ending with .log or .sh), etc. – Relations
@Briney These are only basic examples. The whole thing seems super buggy, but I don't think it is; I reckon it just isn't doing what I thought it did. Feel free to post an answer though! – Captious
@Relations Yes, it works as expected when matching filenames. – Captious
@Captious But I don't get why ls *.!(log|sh) or ls foo*!(bar) (starting with foo but not ending with bar) doesn't do what I expect... – Relations
@Relations That is because foo* will match foobar and !(bar) will match nothing/null at the end, so the match will still be successful. – Captious

Parameter expansion of the form ${parameter/pattern/string} (where pattern doesn't start with a /) works by finding the leftmost longest substring in the value of the variable parameter that matches pattern and replacing it with string. In other words, $parameter is decomposed into three parts, prefix, match, and suffix, such that

  1. $parameter == "${prefix}${match}${suffix}"
  2. $prefix is the shortest possible string enabling the other requirements to be fulfilled (i.e. the match, if at all possible, occurs in the leftmost position)
  3. $match matches pattern and is as long as possible
  4. any of $prefix, $match and/or $suffix can be empty

and the result of ${parameter/pattern/string} is "${prefix}string${suffix}".
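
The decomposition rules above can be tried out with a small brute-force sketch. This is not how bash implements the search internally; decompose is a hypothetical helper written only for illustration. It tries every start position from left to right and, for each one, every length from longest to shortest, using [[ ... == $pattern ]] to test each candidate substring as a whole.

```shell
#!/usr/bin/env bash
shopt -s extglob

# decompose VALUE PATTERN (hypothetical helper, for illustration only)
# Finds the leftmost, then longest, substring of VALUE that matches PATTERN
# and stores the three parts in the global variables prefix, match, suffix.
# Returns 1 if no substring (not even the empty one) matches.
decompose() {
    local value=$1 pattern=$2 start len
    for (( start = 0; start <= ${#value}; start++ )); do          # leftmost first
        for (( len = ${#value} - start; len >= 0; len-- )); do    # longest first
            if [[ ${value:start:len} == $pattern ]]; then
                prefix=${value:0:start}
                match=${value:start:len}
                suffix=${value:start+len}
                return 0
            fi
        done
    done
    return 1
}

a="1 2 3 4 5 6 7 8 9 10"

decompose "$a" '!(*2*)'
printf 'prefix=[%s] match=[%s] suffix=[%s]\n' "$prefix" "$match" "$suffix"
# prints: prefix=[] match=[1 ] suffix=[2 3 4 5 6 7 8 9 10]

decompose "$a" '!(*2)'
printf 'prefix=[%s] match=[%s] suffix=[%s]\n' "$prefix" "$match" "$suffix"
# prints: prefix=[] match=[1 2 3 4 5 6 7 8 9 10] suffix=[]
```

Gluing the parts back together as "${prefix}string${suffix}" should give the same result as the built-in ${a/pattern/string} for these inputs.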

For the global replacement form (${parameter//pattern/string}) of this type of parameter expansion, the same process is applied recursively to the suffix part. However, a zero-length match is handled as a special case (in order to prevent infinite recursion):

  • if "${prefix}${match}" != ""

    "${parameter//pattern/string}" = "${prefix}string${suffix//pattern/string}"

  • else suffix=${parameter:1} and

    "${parameter//pattern/string}" = "string${parameter:0:1}${suffix//pattern/string}"
    
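The global rule, including the zero-length special case, can be sketched as a loop. This is a simplified model under the assumptions stated in this answer, not bash's actual source; find_match and replace_all are hypothetical helpers, and the search is a plain brute force.

```shell
#!/usr/bin/env bash
shopt -s extglob

# find_match VALUE PATTERN (hypothetical helper)
# Brute-force leftmost-longest search; on success sets the globals
# m_start (offset of the match) and m_len (its length).
find_match() {
    local value=$1 pattern=$2 start len
    for (( start = 0; start <= ${#value}; start++ )); do
        for (( len = ${#value} - start; len >= 0; len-- )); do
            if [[ ${value:start:len} == $pattern ]]; then
                m_start=$start
                m_len=$len
                return 0
            fi
        done
    done
    return 1
}

# replace_all VALUE PATTERN STRING (hypothetical helper)
# Models ${VALUE//PATTERN/STRING}; the outcome is left in the global result.
replace_all() {
    local rest=$1 pattern=$2 string=$3
    result=
    while [[ -n $rest ]]; do
        if ! find_match "$rest" "$pattern"; then
            break                                  # nothing left to replace
        fi
        result+=${rest:0:m_start}$string           # prefix, then the replacement
        if (( m_len == 0 )); then
            # Zero-length match: copy one character unchanged and step past it,
            # otherwise the loop would never make progress.
            result+=${rest:m_start:1}
            rest=${rest:m_start+1}
        else
            rest=${rest:m_start+m_len}             # continue with the suffix
        fi
    done
    result+=$rest                                  # unmatched tail
}

a="1 2 3 4 5 6 7 8 9 10"

replace_all "$a" '!(*2*)' ','
printf 'model:    %s\n' "$result"                  # model:    ,,2,
printf 'built-in: %s\n' "${a//!(*2*)/,}"           # the question reports ,,2, on bash 4.2.46
```

The three commas come from three separate replacements: the match '1 ', the empty match in front of the 2, and the match ' 3 4 5 6 7 8 9 10'.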

Now let's analyze the cases individually:

  • "${a/!([0-9])/}" --> prefix='' match='1 2 3 4 5 6 7 8 9 10' suffix=''. Indeed, '1 2 3 4 5 6 7 8 9 10' is not a string consisting of a single digit, and therefore it matches the pattern !([0-9]). Hence the empty result of expansion.

  • "${a/!(2)/}" --> prefix='' match='1 2 3 4 5 6 7 8 9 10' suffix=''. Similar to the above, '1 2 3 4 5 6 7 8 9 10' is not a string consisting of the single character '2', and therefore it matches the pattern !(2). Hence the empty result of expansion.

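  • "${a/!(*2*)/}" --> prefix='' match='1 ' suffix='2 3 4 5 6 7 8 9 10'. The substring '1 ' doesn't match the pattern *2*, and therefore it matches the pattern !(*2*). Any longer substring starting at the same position contains a 2, so it matches *2* and fails !(*2*); that is why the match stops right before the 2.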

  • "${a/!(*2*)/,}". There were no surprises here, so no need to elaborate.

  • "${a//!(*2*)/}". There were no surprises here, so no need to elaborate.

  • "${a//!(*2*)/,}" --> prefix='' match='1 ' suffix='2 3 4 5 6 7 8 9 10'. Then ${suffix//!(*2*)/,} expands to ",2," as follows. The empty string in the beginning of suffix matches the pattern !(*2*), producing an extra comma in the result. Since the zero-length match special case (described above) was triggered, the first character of suffix is forcibly consumed, leaving us with ' 3 4 5 6 7 8 9 10', which matches the !(*2*) pattern in its entirety and is replaced with the last comma that we see in the final result of the expansion.
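
The claims used in the cases above can be checked directly with [[ ... ]], which tests a whole string against a pattern. The sketch below is only an illustration; matches is a hypothetical helper, and the pattern is passed in a variable so that it is interpreted at run time. The last lines show one way to get what the question's first test case expected, assuming the intent was to delete the first single non-digit character: the bracket expression [!0-9] matches exactly one character, whereas !([0-9]) matches any string that is not a single digit.

```shell
#!/usr/bin/env bash
shopt -s extglob

a="1 2 3 4 5 6 7 8 9 10"

# matches STRING PATTERN (hypothetical helper)
# Succeeds if the WHOLE string matches the pattern.
matches() {
    [[ $1 == $2 ]]
}

matches "$a" '!([0-9])' && echo "whole value matches !([0-9])"
matches "$a" '!(2)'     && echo "whole value matches !(2)"
matches "$a" '!(*2*)'   || echo "whole value does not match !(*2*)"
matches "1 " '!(*2*)'   && echo "'1 ' matches !(*2*)"
matches ""   '!(*2*)'   && echo "the empty string matches !(*2*)"

# A bracket expression negates a single character, not a whole string.
first_non_digit_removed=${a/[!0-9]/}
printf '%s\n' "$first_non_digit_removed"   # 12 3 4 5 6 7 8 9 10
```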

Briney answered 29/5, 2017 at 14:54 Comment(3)
This seems on the right track, but I'm still not entirely convinced. For example, following this logic with ${a/!(*2)/,}, 1 would not match *2 and would be the longest possible string from the left that doesn't, so surely the output should be ,2 3 4 5 6 7 8 9 10. Yet it just leaves a single comma, meaning it matched the entire string. Not saying anything in your answer is incorrect, just that it still isn't entirely clear to me what is happening. – Captious
The pattern *2 means a string ending with the character "2", and !(*2) means a string NOT ending with the character "2". That's why the longest possible substring of '1 2 3 4 5 6 7 8 9 10' that matches the pattern !(*2) is the entire string (as it doesn't end with 2). – Briney
Makes sense, so basically unless you have *string* it's gonna eat the entire string. – Captious
