How to use jq to select items not in a list?

In jq, I can select an item in a list fairly easily:

$ echo '["a","b","c","d","e"]' | jq '.[] | select(. == ("a","c"))'

Or if you prefer to get it as an array:

$ echo '["a","b","c","d","e"]' | jq 'map(select(. == ("a","c")))'

But how do I select all of the items that are not in the list? Certainly . != ("a","c") does not work:

$ echo '["a","b","c","d","e"]' | jq 'map(select(. != ("a","c")))'
[
  "a",
  "b",
  "b",
  "c",
  "d",
  "d",
  "e",
  "e"
]

The above gives every item twice, except for "a" and "c", which appear once.

Same for:

$ echo '["a","b","c","d","e"]' | jq '.[] | select(. != ("a","c"))'
"a"
"b"
"b"
"c"
"d"
"d"
"e"
"e"

How do I filter out the matching items?

Penney answered 15/6, 2017 at 8:56 Comment(4)
That was brutally painful, but I did manage to get it. - Penney
Your filter is effectively the same as . != "a" or . != "c". That, of course, is always true, so nothing gets filtered out. However, you're now getting duplicates because you're using the comma operator. Remember, for every value produced by the commas, the expression is re-evaluated with the new value. So select(. != ("a","c")) becomes select(. != "a"), select(. != "c"). Then it should be very clear what's happening. - Mudpack
Thanks for the explanation @JeffMercado. I could not figure out why it didn't work. Essentially . != ("a","c") is a logical OR, where I was expecting a logical AND (even though . == ("a","c") is a logical OR). - Penney
Not really. It's more like ("a","c") is two values, "a" and "c". For any expression that uses it, make one copy of the expression per value, substituting "a" in one copy and "c" in the other. - Mudpack
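
The expansion described in the comments above can be checked directly. The following is only a sketch, assuming jq is on the PATH; the variable names (input, with_comma, expanded) are made up for illustration. It runs the original filter and the hand-expanded form side by side, and both should yield the same duplicated list shown in the question.

```shell
input='["a","b","c","d","e"]'

# Original filter: the comma inside the parentheses generates two values,
# so the comparison is evaluated twice per element.
with_comma=$(echo "$input" | jq -c '[.[] | select(. != ("a","c"))]')

# Hand-expanded form: each element goes through both selects in turn.
expanded=$(echo "$input" | jq -c '[.[] | (select(. != "a"), select(. != "c"))]')

echo "$with_comma"
echo "$expanded"
```

An element is emitted once for every comparison that comes out true, which is why "a" and "c" show up once and everything else twice.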

The simplest and most robust (w.r.t. jq versions) approach would be to use the builtin array subtraction operator, -:

$ echo '["a","b","c","d","e"]' | jq -c '. - ["a","c"]'
["b","d","e"]

If the blacklist is very long and riddled with duplicates, then it might be appropriate to remove them (e.g. with unique).
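
As a rough illustration of that remark, the blacklist can be deduplicated before the subtraction. This is a sketch only: the blacklist value below is invented, and passing it with --argjson assumes jq 1.5 or later. The result is the same with or without unique; the only point is to shrink a long, repetitive blacklist first.

```shell
# Hypothetical blacklist containing duplicates; unique sorts and deduplicates it.
deduped=$(echo '["a","b","c","d","e"]' |
  jq -c --argjson blacklist '["c","a","c","a"]' '. - ($blacklist | unique)')

echo "$deduped"
```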

Variations

The problem can also be solved (in jq 1.4 and up) using index and not, e.g.

["a","c"] as $blacklist
| .[] | select( . as $in | $blacklist | index($in) | not) 

Or, with a variable passed in from the command-line (jq --argjson blacklist ...):

.[] | select( . as $in | $blacklist | index($in) | not) 

To preserve the list structure, one can use map( select( ...) ).
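
Put together, the index variant with a command-line variable and map might look like the sketch below. The shell variable blacklist and its contents are assumptions chosen to match the question, and --argjson itself needs jq 1.5 or later even though index and not work in older versions.

```shell
# Hypothetical blacklist, passed to jq as a JSON value.
blacklist='["a","c"]'

kept=$(echo '["a","b","c","d","e"]' |
  jq -c --argjson blacklist "$blacklist" \
    'map(select(. as $in | $blacklist | index($in) | not))')

echo "$kept"
```

index returns null when the item is absent, and not turns null into true, so only the items missing from the blacklist survive. A match at position 0 is still treated as "found" because 0 is truthy in jq.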

With jq 1.5 or later, you could also use any or all, e.g.

def except(blacklist):
  map( select( . as $in | blacklist | all(. != $in) ) );
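
A possible way to call except, together with an equivalent filter written with any, is sketched below. It assumes jq 1.5 or later; the variable names via_all and via_any are only for illustration.

```shell
input='["a","b","c","d","e"]'

# Using the except definition from above (built on all).
via_all=$(echo "$input" | jq -c '
  def except(blacklist):
    map( select( . as $in | blacklist | all(. != $in) ) );
  except(["a","c"])
')

# Same idea with any: keep an item unless some blacklist entry equals it.
via_any=$(echo "$input" | jq -c '
  map( select( . as $in | ["a","c"] | any(. == $in) | not ) )
')

echo "$via_all"
echo "$via_any"
```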

Special case: strings

See e.g. Select entries based on multiple values in jq

Eparch answered 15/6, 2017 at 14:45 Comment(6)
How do you use any here? Can you share an example? - Penney
FYI, what I did was: def inarray($val;ary): ary | any(. == $val); def notinarray($val;ary): ary | all(. != $val); - Penney
Aha! The - operator! Thank you @peak. So - is equivalent to "not in "a" AND not in "c""? - Penney
For the - variant (by far the simplest), what do you do if the input is an array of objects, e.g. [{"val":"a"},{"val":"b"},{"val":"c"},{"val":"d"},{"val":"e"}], and you want to filter by .val - ["a","c"] (which doesn't work)? - Penney
@Penney - I would suggest you create a new SO question. - Eparch
With jq 1.6 or later, IN can be used: echo '["a","b","c","d","e"]' | jq '.[] | select(. | IN("a", "c") | not)' - Toscano
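
For the array-of-objects case raised in the comments, one option (not the only one) is to combine IN with a blacklist variable. This is a sketch assuming jq 1.6 or later; the blacklist value and the variable name objs_kept are illustrative.

```shell
# $blacklist[] streams the blacklist entries into IN, which tests .val against each.
objs_kept=$(echo '[{"val":"a"},{"val":"b"},{"val":"c"},{"val":"d"},{"val":"e"}]' |
  jq -c --argjson blacklist '["a","c"]' \
    'map(select(.val | IN($blacklist[]) | not))')

echo "$objs_kept"
```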

I'm sure it is not the simplest solution, but it works :)

$ echo '["a","b","c","d","e"]' | jq '.[] | select(test("[^ac]"))'

Edit: one more solution - this is even worse :)

$ echo '["a","b","c","d","e"]' | jq '.[] | select(. != "a" and . != "c")'
Childe answered 15/6, 2017 at 9:24 Comment(7)
Using the regex is a nice idea, but this is actually just a simple sample. I am comparing against an array of items. I wish it were just single chars. - Penney
@deitch: You can still use test, just invert the result with not, e.g.: test("^(abc|bcd)$") | not - Suckow
@Suckow that is interesting. Could I do it with a variable, e.g. jq --arg match "abc" '.[] | select(test("^($match)$") | not)'? - Penney
@Picard, I originally did it with your alternate solution. The problem is that the list of items to match against is not known in advance. - Penney
From what I understand, your initial solution doesn't work because it compares each item of the input array with each item of the matching list. So it compares the input "a" with the "a" from the list (!= is false), then "a" with "c" (!= is true), so it outputs the input "a" (although you'd think it shouldn't). If there were a "set of elements" type, things might be different, but given how lists work I don't think there's a short, one-operator solution to this. - Childe
@deitch: Yes: jq --arg match 'bcd' '.[] | select(test("^("+$match+")$") | not)' - Suckow
In any case, I do have a solution, but cannot post it for 2 days. Oh well. - Penney
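
Following the regex idea from the comments above, a list that is not known in advance could be joined into an alternation. Treat this as a sketch under assumptions: it needs jq 1.5 or later built with regex support, the input and blacklist values are invented, and it is only safe when the blacklist entries contain no regex metacharacters.

```shell
# Builds the pattern ^(abc|bcd)$ from the hypothetical blacklist, then inverts the match.
not_matched=$(echo '["abc","bcd","cde"]' |
  jq -c --argjson blacklist '["abc","bcd"]' \
    'map(select(test("^(" + ($blacklist | join("|")) + ")$") | not))')

echo "$not_matched"
```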

As suggested by @gobenji, since jq 1.6 you can use the IN function:

jq \
  '
    .[] |
    select(
      . | IN("a", "c") | not
    )
  ' \
  <<<'["a","b","c","d","e"]'
Kitchenware answered 29/7, 2024 at 12:51 Comment(0)
