How can I tell if a jq filter successfully pulls data from a JSON data structure?

I want to know if a given filter succeeds in pulling data from a JSON data structure. For example:

###### For the user steve...

% Name=steve
% jq -j --arg Name "$Name" '.[]|select(.user == $Name)|.value' <<<'
[
   {"user":"steve", "value":false},
   {"user":"tom", "value":true},
   {"user":"pat", "value":null},
   {"user":"jane", "value":""}
]'
false
% echo $?
0

Note: successful results can include boolean values, null, and even the empty string.

###### Now for user not in the JSON data...

% Name=mary
% jq -j --arg Name "$Name" '.[]|select(.user == $Name)|.value' <<<'
[
   {"user":"steve", "value":false},
   {"user":"tom", "value":true},
   {"user":"pat", "value":null},
   {"user":"jane", "value":""}
]'
% echo $?
0

If the filter does not pull data from the JSON data structure, I need to know this. I would prefer the filter to return a non-zero return code.

How would I go about determining if a selector successfully pulls data from a JSON data structure vs. fails to pull data?

Important: The above filter is just an example; the solution needs to work for any jq filter.

Note: the evaluation environment is Bash 4.2+.

Sevenup answered 13/12, 2016 at 15:7 Comment(5)
For some reason I could not run the command from your question; otherwise I could have demonstrated the -e flag with your logic. It throws parse error: Invalid numeric literal at line 6, column 25 with jq-1.5 on my Mac.Scorcher
Fixed. Please try again. The value key wasn't quoted. I've also simplified the above examples, replacing $(whoami) with $Name where the latter is clearly defined on the line above the scripts.Sevenup
Speaking of quoting, use --arg Name "$Name". Otherwise the number of shell words your name parses into is entirely undefined.Dundalk
@CharlesDuffy ;-) Yup, I was just keeping the example simple as typically Linux usernames don't have spaces. Nevertheless, your point has technical merit. I've updated the question above.Sevenup
Nod. bash supports operating systems other than Linux, though -- on cygwin, for instance, names with spaces do happen in practice.Dundalk

I've found a solution that meets all of my requirements! Please let me know what you think!

The idea is to use jq -e "$Filter" as a first-pass check. Then, for a return code of 1, do a jq "path($Filter)" check. The latter will only succeed if, in fact, there is a path into the JSON data.

Select.sh

#!/bin/bash

Select()
{
   local Name="$1"
   local Filter="$2"
   local Input="$3"
   local Result Status

   Result="$(jq -e --arg Name "$Name" "$Filter" <<<"$Input")"
   Status=$?

   case $Status in
   1) jq --arg Name "$Name" "path($Filter)" <<<"$Input" >/dev/null 2>&1
      Status=$?
      ;;
   *) ;;
   esac

   [[ $Status -eq 0 ]] || Result="***ERROR***"
   echo "$Status $Result"
}

Filter='.[]|select(.user == $Name)|.value'
Input='[
   {"user":"steve", "value":false},
   {"user":"tom", "value":true},
   {"user":"pat", "value":null},
   {"user":"jane", "value":""}
]'

Select steve "$Filter" "$Input"
Select tom   "$Filter" "$Input"
Select pat   "$Filter" "$Input"
Select jane  "$Filter" "$Input"
Select mary  "$Filter" "$Input"

And the execution of the above:

% ./Select.sh
0 false
0 true
0 null
0 ""
4 ***ERROR***
Sevenup answered 13/12, 2016 at 17:53 Comment(7)
Sounds good! You should accept the answer, for making the future reference valid!Scorcher
I think I have to wait 2 days before I can accept it. Please upvote this answer... I'm just getting started in putting stuff here and I need reputation points! :-) I really enjoyed the back-and-forth with you. I hope to have conversations with you in the future! Many thanks!Sevenup
This seems overly convoluted, take a look at @peak's answer, or mine.Sought
@Sought Unfortunately, your solution is too narrow as the jq filter is not necessarily a simple key filter. So, the has() function is inadequate. I need to cater to a wide variety of possible filters and I cannot know in advance what the filter is. That is, I am writing a Bash function that takes as its input the jq filter (and other arguments). The solution I show above is even a simplification of what I'm dealing with. So, when dealing with complexity to begin with, a more general solution is needed. However, thank you for your thoughts!Sevenup
@SteveAmerige - If your solution meets your needs, fine, but there is a fundamental problem here, which in brief is that the question is misguided. Consider, for example, the following case: with no input (jq -n), the filter range(0,10) produces 10 JSON values. Are these values "pulled from the data"? Similarly, if the input is 1, the jq filter 2 will produce one value (2). ...Inextirpable
@Inextirpable saying a question is misguided isn't part of the stackexchange mantras: focus on questions about an actual problem. That's what I've done. Part of the problem is that jq provides no global option to say that accessing non-existent keys should raise an error. The function has is unfortunately inadequate because it requires a priori knowledge of the filter. So, my focus is to find a solution to my problem. I do agree that "pulled from the data" could be made clearer: I want to be able to take a filter and operate upon it to know if it tries to access any non-existent keys or indices.Sevenup
@SteveAmerige - When SO users write misguided or problematic questions, the appropriate way to respond is via "comments" such as this, in the hope that the OP will make appropriate modifications or clarifications. The comment can then easily be deleted. You are of course quite correct to observe that jq does not have an option to change the semantics of the built-in functions for accessing the contents of compound objects. jq does not provide the meta-programming builtins that would be required to write a function in jq to ascertain whether a filter attempts to access a non-existent key, etc.Inextirpable

You can use the -e / --exit-status flag from the jq Manual, which says

Sets the exit status of jq to 0 if the last output values was neither false nor null, 1 if the last output value was either false or null, or 4 if no valid result was ever produced. Normally jq exits with 2 if there was any usage problem or system error, 3 if there was a jq program compile error, or 0 if the jq program ran.

I can demonstrate the usage with a basic filter as below, as your given example is not working for me.

For a successful query,

dudeOnMac:~$ jq -e '.foo?' <<< '{"foo": 42, "bar": "less interesting data"}'
42
dudeOnMac:~$ echo $?
0

For an invalid query, done with a non-existent entity zoo,

dudeOnMac:~$ jq -e '.zoo?' <<< '{"foo": 42, "bar": "less interesting data"}'
null
dudeOnMac:~$ echo $?
1

For an error scenario that returns exit code 2, which I created by double-quoting the jq input so that the shell splits it into extra file arguments,

dudeOnMac:~$ jq -e '.zoo?' <<< "{"foo": 42, "bar": "less interesting data"}"
jq: error: Could not open file interesting: No such file or directory
jq: error: Could not open file data}: No such file or directory
dudeOnMac:~$ echo $?
2
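
The exit statuses quoted from the manual can be seen side by side with a small helper. This is only a sketch: jq_status is a hypothetical function name, not part of jq, and the inputs are made up for illustration. It shows that with -e a missing key and a present key holding null both give status 1, while a filter that produces no output at all gives 4.

```shell
#!/bin/bash
# Hypothetical helper (not part of jq): run a filter with -e, discard the
# output, and print only jq's exit status.
jq_status() {
  local filter=$1 input=$2
  jq -e "$filter" <<<"$input" >/dev/null 2>&1
  echo "$?"
}

jq_status '.foo'  '{"foo": 42}'     # 0: key present, value is neither false nor null
jq_status '.zoo'  '{"foo": 42}'     # 1: key missing, so the last output is null
jq_status '.foo'  '{"foo": null}'   # 1: key present, but its value is null
jq_status '.zoo?' '"foo"'           # 4: the error is suppressed by ?, so nothing is output
```

Statuses 0 and 4 are unambiguous; status 1 is the case that needs a second check, because it cannot separate "missing" from "present but null or false".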
Scorcher answered 13/12, 2016 at 15:54 Comment(7)
Try the examples above again please. I didn't have value quoted for the jane user. I've fixed it above. Note that while using the -e flag gives you the non-zero error code for a non-existent key, it also badly gives a non-zero error code for an existing key whose value is either false or null. So, this definitely won't work. The question is very clear that values can include booleans, empty strings, and null. Do you have other thoughts?Sevenup
@SteveAmerige: That's a nice observation. I'm afraid these are the only exit codes jq supports, but exit code 1 is clearly distinguishable from the others; maybe you can use that logic?Scorcher
With the -e flag, this problem allows us to conclude that for 0 return status: result is from the JSON data. For non-0, non-1 error status: the result is bad in some way (real error case). For 1 error status: if result is false, then this is actually from the JSON data, so treat it as 0 return status. The final case is where the return status is 1 and the result is null. I have not yet figured out how this can indicate whether the data comes from the JSON input or not. If we can solve this, the problem is solved. Keep in mind the jq filter can be anything. So, has() doesn't help.Sevenup
@SteveAmerige: Tried several filters, for it, just couldn't reproduce that particular combination, with 1 and returning nullScorcher
Simple cases: Want non-zero return status: jq -je '.a' <<<'{}' Want zero return status: jq -je '.a' <<<'{"a":null}' In both cases, these return 1 and the result is null.Sevenup
@SteveAmerige: Am out of luck on this, tried parsing the entire jq documentation without any luck, :(. Do update the answer or post it here, if you happen to find the logic, will be happy to update the answerScorcher
You may want to remove the ? from your first two examples, they have no bearing on this situation and add confusion. They only affect the filter if the input isn't an object (but it is). I'd also add an example like % jq -e '.zoo?' <<< '"foo"' to show an exit status of 4.Weylin

I've added an updated solution below

The fundamental problem here is that when you try to retrieve a value from an object using the .key or .[key] syntax, jq (by definition) can't distinguish a missing key from a key with a value of null.

You can instead define your own lookup function:

def lookup(k):if has(k) then .[k] else error("invalid key") end;

Then use it like so:

$ jq 'lookup("a")' <<<'{}' ; echo $?
jq: error (at <stdin>:1): invalid key
5

$ jq 'lookup("a")' <<<'{"a":null}' ; echo $?
null
0

If you then use lookup consistently instead of the builtin method, I think that will give you the behaviour you want.
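
The two commands above assume the lookup definition is already in scope. One way to arrange that from Bash, sketched here under my own assumptions (the names lookup_def and strict_lookup are hypothetical), is to prepend the definition to the jq program text and pass the key with --arg:

```shell
#!/bin/bash
# The lookup definition from above, kept in a shell variable so it can be
# prepended to any jq program.
lookup_def='def lookup(k): if has(k) then .[k] else error("invalid key") end;'

# Hypothetical wrapper: look up one key strictly, failing if it is absent.
strict_lookup() {
  local key=$1 input=$2
  jq --arg key "$key" "${lookup_def} lookup(\$key)" <<<"$input"
}

strict_lookup a '{"a":null}'; echo "status=$?"   # prints null, then status=0
strict_lookup a '{}';         echo "status=$?"   # error on stderr, then status=5
```

Passing the key through --arg rather than splicing it into the program keeps shell quoting out of the jq source.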


Here's another way to go about it, with less bash and more jq.

#!/bin/bash

lib='def value(f):((f|tojson)//error("no such value"))|fromjson;'

users=( steve tom pat jane mary )

Select () {
  local name=$1 filter=$2 input=$3
  local result
  local -i status=0
  # Quote the here-string so the input reaches jq unmodified.
  result=$( jq --arg name "$name" "${lib}value(${filter})" <<<"$input" 2>/dev/null )
  status=$?
  (( status )) && result="***ERROR***"
  printf '%s\t%d %s\n' "$name" "$status" "$result"
}

filter='.[]|select(.user == $name)|.value'

input='[{"user":"steve","value":false},
        {"user":"tom","value":true},
        {"user":"pat","value":null},
        {"user":"jane","value":""}]'

for name in "${users[@]}"
do
  Select "$name" "$filter" "$input"
done

This produces the output:

steve   0 false
tom     0 true
pat     0 null
jane    0 ""
mary    5 ***ERROR***

This takes advantage of the fact that the absence of input to a filter acts like empty, and empty will trigger the alternative of //, but a string (like "null" or "false") will not.

It should be noted that value/1 will not work for filters that are simple key/index lookups on objects/arrays, but neither will your solution. I'm reasonably sure that to cover all the cases, you'd need something like this (or yours) and something like get or lookup.
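
The limitation described above can be checked directly. In this sketch (the input {} and the filter .a are just illustrative), a plain key lookup on an object that lacks the key passes both the value/1 check and the path() check, while a has-based test does notice the missing key:

```shell
#!/bin/bash
# value/1 exactly as defined in the script above.
lib='def value(f):((f|tojson)//error("no such value"))|fromjson;'

# .a on {} yields null rather than empty, so value/1 reports success.
jq "${lib} value(.a)" <<<'{}';  echo "value status: $?"   # null, status 0

# path(.a) also succeeds: jq builds the path without requiring the key to exist.
jq -c 'path(.a)' <<<'{}';       echo "path status: $?"    # ["a"], status 0

# has("a") is false here, and -e turns that into exit status 1.
jq -e 'has("a")' <<<'{}';       echo "has status: $?"     # false, status 1
```

This is why a has-based lookup such as get or lookup is still needed for simple key and index filters.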

Sought answered 13/12, 2016 at 19:40 Comment(6)
This solution is too narrow. The jq filters are not necessarily simple key selectors. See my comment in my own solution. Thanks again for your participation.Sevenup
A more interesting solution would be to look up recursively, for instance lookup(.a.b.c): if .a.b.c exists in the JSON, return it; otherwise throw an error. I believe we need to use jq functions like has, path, map and walk to check each of .[a], .[b], .[c] in turn to find out whether .a.b.c exists.Rumormonger
Sorry for the delay in responding... the holiday season and all. Your update is very interesting. So much so that I'm now considering changing this to the accepted answer. It seems to me that on the Bash side that if I do a regex check to inspect the filter to see if it is a simple path filter (e.g., .a[1].b), then I can treat that as a separate case that requires walking to determine whether the path exists. If not a simple path filter, then use your above approach. Do you believe that this will then work for all filters? Can you describe how to do the walk?Sevenup
To do the simple path check, I've considered that in Bash I could take a filter such as .a[1].b and determine the parent and the key. For example, for the above example, the parent is .a[1] and the key is b. Then, I can do jq "$parent | has($key)". This might be easier than trying to implement a lookup or get in jq. But, I'm open to your thoughts on this.Sevenup
I don't think there is going to be an even remotely elegant way to do this properly until (if ever) jq changes the way it deals with null values. However, I've come up with another abuse of (to|from)jsondef unnullify: "(?six)(?<string>\"([^\"\\\\\\\\]*|\\\\\\\\[\"\\\\\\\\bfnrt\\/]|\\\\\\\\u[0-9a-f]{4})*\"){0}(?<chunk>\\g<string>|[^\"]+)" as $rx|tojson|[scan($rx)]|map(.[2]|sub("\\A\"null\"\\Z";"\"\\\"null\\\"\"")|sub(":null,";":\"null\","))|join("")|fromjson; — that'll turn all "null" values to '"\"null\"" and all null values into "null". To sidestep the issue entirely.Sought
And, a slightly less ridiculous version: def unnullify: (..|objects)?|=(with_entries(.value|=(if ([.]|inside([null,"null"])) then tojson else . end)));Sought

Given that jq is the way it is, and in particular that it is stream-oriented, I'm inclined to think that a better approach would be to define and use one or more filters that make the distinctions you want. Thus rather than writing .a to access the value of a field, you'd write get("a") assuming that get/1 is defined as follows:

def get(f): if has(f) then .[f] else error("\(type) is not defined at \(f)") end;

Now you can easily tell whether or not an object has a key, and you're all set to go. This definition of get can also be used with arrays.
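
For a nested location, the same has-then-index check can be applied one step at a time. The following is a rough sketch rather than a definitive implementation: get_path, strict_def, and get_strict are hypothetical names, the path is passed as a JSON array such as ["a",1,"b"], and --argjson assumes jq 1.5 or later.

```shell
#!/bin/bash
# Hypothetical helper: walk a path array one step at a time and raise an
# error at the first key or index that does not exist.
strict_def='def get_path(p):
  reduce p[] as $k (.;
    if has($k) then .[$k]
    else error("\(type) is not defined at \($k)") end);'

get_strict() {
  local path_json=$1 input=$2
  jq -c --argjson p "$path_json" "${strict_def} get_path(\$p)" <<<"$input"
}

input='{"a":[{"b":0},{"b":null}]}'
get_strict '["a",1,"b"]' "$input"; echo "status=$?"   # prints null, then status=0
get_strict '["a",2,"b"]' "$input"; echo "status=$?"   # error on stderr, then status=5
```

A null stored at an existing location comes back with status 0, while any missing step along the way ends the run with a non-zero status.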

Inextirpable answered 13/12, 2016 at 19:14 Comment(2)
You beat me to it! Also, I'd like to point out that your get as defined works fine with arrays already.Sought
@Sought - I updated the def so that now it makes sense to use it for both objects and arrays.Inextirpable
