using jq to assign multiple output variables

3

41

I am trying to use jq to parse information from the TVDB API. I need to pull a couple of fields and assign the values to variables that I can continue to use in my bash script. I know I can easily assign the output to one variable in bash with variable="$(command)", but I need the output to populate multiple variables and I don't want to have to run multiple commands.

I read this documentation:

https://stedolan.github.io/jq/manual/v1.5/#Advancedfeatures

but I don't know if this is relevant to what I am trying to do.

jq '.data' produces the following output:

[
  {
    "absoluteNumber": 51,
    "airedEpisodeNumber": 6,
    "airedSeason": 4,
    "airedSeasonID": 680431,
    "dvdEpisodeNumber": 6,
    "dvdSeason": 4,
    "episodeName": "We Will Rise",
    "firstAired": "2017-03-15",
    "id": 5939660,
    "language": {
      "episodeName": "en",
      "overview": "en"
    },
    "lastUpdated": 1490769062,
    "overview": "Clarke and Roan must work together in hostile territory in order to deliver an invaluable asset to Abby and her team."
  }
]

I tried jq '.data | {episodeName:$name}' and jq '.data | .episodeName as $name' just to try and get one working. I don't understand the documentation or even if it's what I'm looking for. Is there a way to do what I am trying to do?

Partizan answered 8/4, 2017 at 7:26 Comment(5)
Can you post the complete JSON and the actual fields needed? - Margerymarget
Agreed, the current jq docs are not user-friendly. SO's own list of questions tagged jq and ranked by votes may help. - Syrinx
.foo as $var creates a jq variable. That variable doesn't last beyond the point in time when jq exits. If you want a bash variable, you need to do that with... well... bash facilities. - Chapel
I would start considering if a language other than bash might be more appropriate. - Amaral
I am limited to what is available on the server. - Partizan
63

You can read the values into separate variables with read:

read var1 var2 var3 < <(echo $(curl -s 'https://api.github.com/repos/torvalds/linux' | 
     jq -r '.id, .name, .full_name'))

echo "id        : $var1"
echo "name      : $var2"
echo "full_name : $var3"

Using an array:

read -a arr < <(echo $(curl -s 'https://api.github.com/repos/torvalds/linux' | 
     jq -r '.id, .name, .full_name'))

echo "id        : ${arr[0]}"
echo "name      : ${arr[1]}"
echo "full_name : ${arr[2]}"
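If any value can contain spaces (as episodeName and overview do in the question), word splitting with read will break it apart. One way around that is to let jq print one value per line and load the lines into an array. This is a minimal sketch, assuming bash 4+ (for mapfile) and that no value contains a newline; the json variable is a hypothetical sample standing in for the live API response:

```shell
#!/usr/bin/env bash
# Sample document standing in for the API response (hypothetical data).
json='{"id": 2325298, "name": "linux", "full_name": "torvalds/linux", "description": "Linux kernel source tree"}'

# jq -r prints one raw value per line; mapfile -t stores each line as one
# array element, so spaces inside a value are preserved.
mapfile -t fields < <(jq -r '.id, .name, .description' <<<"$json")

echo "id          : ${fields[0]}"
echo "name        : ${fields[1]}"
echo "description : ${fields[2]}"
```

Because each element is a whole line, no echo $(...) wrapper is needed here.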

You can also split the output on a delimiter character:

IFS='|' read var1 var2 var3 var4 < <(curl '......' | jq -r '.data | 
    map([.absoluteNumber, .airedEpisodeNumber, .episodeName, .overview] | 
    join("|")) | join("\n")')

Or use an array:

set -f; IFS='|' data=($(curl '......' | jq -r '.data | 
    map([.absoluteNumber, .airedEpisodeNumber, .episodeName, .overview] | 
    join("|")) | join("\n")')); set +f

absoluteNumber, airedEpisodeNumber, episodeName and overview are available as ${data[0]}, ${data[1]}, ${data[2]} and ${data[3]} respectively. set -f disables globbing before the assignment and set +f re-enables it afterwards. Note that because IFS='|' here is a plain assignment rather than a prefix to a command, IFS stays set to '|' for the rest of the script unless you restore it.

For the jq part, all your required fields are collected into an array and joined with a '|' delimiter by join("|").

If you are using jq < 1.5, you'll have to convert each Number field to a String with tostring, e.g.:
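To see why globbing matters for the unquoted array assignment, here is a small sketch that needs neither jq nor curl. The record in line is a hypothetical value whose second field happens to be "*", and the scratch directory and file names are made up for the demonstration:

```shell
#!/usr/bin/env bash
# Work in a scratch directory so the glob has something to match.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch match1 match2

line='51|*|We Will Rise'     # hypothetical record whose second field is "*"

old_ifs=$IFS
IFS='|'
unsafe=($line)               # globbing on: "*" is replaced by file names
set -f
safe=($line)                 # globbing off: "*" stays literal
set +f
IFS=$old_ifs                 # restore default word splitting

echo "unsafe has ${#unsafe[@]} elements: ${unsafe[*]}"
echo "safe has ${#safe[@]} elements: ${safe[*]}"

cd / && rm -rf "$scratch"
```

Saving and restoring IFS around the assignment keeps the '|' separator from leaking into the rest of the script.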

IFS='|' read var1 var2 var3 var4 < <(curl '......' | jq -r '.data | 
    map([.absoluteNumber|tostring, .airedEpisodeNumber|tostring, .episodeName, .overview] | 
    join("|")) | join("\n")')
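A single read only consumes the first line, so if .data holds several episodes a loop is needed. The following is a sketch under stated assumptions: the two-episode json value is made-up sample data, no field contains '|' or a newline, and map(tostring) is used so that join should also work on numbers with older jq versions:

```shell
#!/usr/bin/env bash
# Two-episode sample standing in for the API response (hypothetical data).
json='{"data":[
  {"absoluteNumber":51,"airedEpisodeNumber":6,"episodeName":"We Will Rise"},
  {"absoluteNumber":52,"airedEpisodeNumber":7,"episodeName":"Gimme Shelter"}]}'

names=()
# .data[] emits one array per episode; map(tostring) converts the numbers
# to strings before join builds the "|"-delimited line.
while IFS='|' read -r absolute aired name; do
    echo "absolute=$absolute aired=$aired name=$name"
    names+=("$name")
done < <(jq -r '.data[] | [.absoluteNumber, .airedEpisodeNumber, .episodeName] | map(tostring) | join("|")' <<<"$json")

echo "collected ${#names[@]} episode names"
```

Prefixing read with IFS='|' scopes the separator to that one command, and feeding the loop through < <(...) keeps it in the current shell so names is still set afterwards.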
Agra answered 8/4, 2017 at 11:30 Comment(10)
I think that's what I'm looking for. I didn't know I could do it from the bash side like that. However, both suggestions give me the following error: jq: error (at <stdin>:26): string ("") and number (51) cannot be added. It seems to be a number/string issue. It works if I use strings like episodeName and overview, but then the variable values are wrong because they are parsed on a space and there are spaces. episodeName and overview are the ones I need, so those need to work. - Partizan
I still get that error if using one of the fields that returns a number instead of a string. I don't need any of those fields for my purposes but just mentioning it for future reference for others. Thanks for the help. I was approaching this all wrong, trying to solve it with jq instead of the shell. - Partizan
Also, what's the difference between < <(curl... and how you had it before, <<< $(curl ...? - Partizan
I've updated my post with string conversion for jq < 1.5. < <(...) is process substitution, and for <<< the string on the right is expanded. Please see this post for a full explanation. - Agra
The read approach is certainly better practice than the data=($(curl ...)) one -- what if one of the fields you were reading contained *? Even a name like Foo [Bar] is syntactically a glob expression, and would cause a failure with failglob, or evaluate to an empty string with nullglob. - Chapel
Thank you for pointing that out. I've added set -f before the operation to disable globbing, and set +f just after to re-enable it. - Agra
I am facing an issue with read, my return string looks like this: "2.3.0 Runner - Core tests". The part "2.3.0" is getting separated from the rest and is written to the second variable. Any idea how to avoid that? - Vichy
The first snippet above spared me an undesirable dependency on another language in a case where I already had bash. I tried to copy-paste it for 45 minutes before a coworker explained why the enclosing echo $(...) is necessary. In my case, I knew that the values contained no spaces, so I could leave out the echo and add one further transformation, i.e. jq -r '[ .id, .name, .full_name ] | join(" ")'. Thanks for your meticulous breakdown! - Arango
@Arango you can also use @sh, i.e. jq -r '[ .id, .name, .full_name ] | @sh' - Riha
The first command gives me -bash: command substitution: line 1360: unexpected EOF while looking for matching )' - Garzon
8

jq always produces a stream of zero or more values. For example, to produce the two values corresponding to "episodeName" and "id", you could write:

.data[] | ( .episodeName, .id )

For your purposes, it might be helpful to use the -c command-line option, to ensure each JSON output value is presented on a single line. You might also want to use the -r command-line option, which removes the outermost quotation marks from each output value that is a JSON string.
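The effect of the two options can be seen side by side in this minimal sketch. The json variable is a trimmed, hypothetical stand-in for the document in the question:

```shell
#!/usr/bin/env bash
# Trimmed sample of the question's data (hypothetical stand-in).
json='{"data":[{"episodeName":"We Will Rise","id":5939660,"language":{"episodeName":"en","overview":"en"}}]}'

quoted=$(jq '.data[].episodeName' <<<"$json")      # JSON string, quotes kept
raw=$(jq -r '.data[].episodeName' <<<"$json")      # -r strips the outer quotes
compact=$(jq -c '.data[].language' <<<"$json")     # -c keeps the object on one line

echo "$quoted"
echo "$raw"
echo "$compact"
```

The raw form is usually what a shell variable should hold, while the compact form is convenient when each output value must stay on its own line for line-by-line reading.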

For further variations, please see the jq FAQ https://github.com/stedolan/jq/wiki/FAQ, e.g. the question:

Q: How can a stream of JSON texts produced by jq be converted into a bash array of corresponding values?

Fluter answered 8/4, 2017 at 11:32 Comment(2)
That shows me how to filter out just those two values, which I can already do, but how do I use those values as variables further in my script? - Partizan
This -c command option is the solution for so many questions! - Employment
-2

Experimental conversion of the quoted OP input (tv.dat) to a series of bash variables (and an array). The jq code is mostly borrowed from here and there, but I don't know how to get jq to unroll an array within an array, so the sed code does that (that's only good for one level, but so are bash arrays):

jq -r ".[] | to_entries | map(\"DAT_\(.key) \(.value|tostring)\") | .[]" tv.dat | 
while read a b ; do echo "${a,,}='$b'" ; done |
sed -e '/{.*}/s/"\([^"]*\)":/[\1]=/g;y/{},/() /' -e "s/='(/=(/;s/)'$/)/"

Output:

dat_absolutenumber='51'
dat_airedepisodenumber='6'
dat_airedseason='4'
dat_airedseasonid='680431'
dat_dvdepisodenumber='6'
dat_dvdseason='4'
dat_episodename='We Will Rise'
dat_firstaired='2017-03-15'
dat_id='5939660'
dat_language=([episodeName]="en" [overview]="en")
dat_lastupdated='1490769062'
dat_overview='Clarke and Roan must work together in hostile territory in order to deliver an invaluable asset to Abby and her team.'
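The output above only becomes shell variables once something like eval or source loads it. A route that avoids evaluating the data at all is to read key=value lines straight into an associative array, as suggested in the comments on this answer. This is a sketch under assumptions: bash 4+ for declare -A, values without newlines, nested objects and arrays simply skipped, and a hypothetical json sample in place of tv.dat:

```shell
#!/usr/bin/env bash
# Sample standing in for tv.dat (hypothetical, trimmed data).
json='[{"episodeName":"We Will Rise","id":5939660,"firstAired":"2017-03-15","language":{"overview":"en"}}]'

declare -A vars=()
# Emit one key=value line per non-nested field; read -r splits each line
# at the first "=" and nothing is ever evaluated as shell code.
while IFS='=' read -r key value; do
    vars[$key]=$value
done < <(jq -r '.[0] | to_entries[]
                | select(.value | type != "object" and type != "array")
                | "\(.key)=\(.value)"' <<<"$json")

echo "episodeName : ${vars[episodeName]}"
echo "id          : ${vars[id]}"
echo "firstAired  : ${vars[firstAired]}"
```

Values are then used as "${vars[episodeName]}" and so on, so quotes or $(...) sequences inside the data remain plain text.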
Syrinx answered 8/4, 2017 at 11:55 Comment(12)
Imaginative, but evaling data received off the Internet munged through a sed script strikes me as a recipe for security vulnerabilities, particularly when anyone who reads StackOverflow knows the sed script you're using. :) - Chapel
...now, if you were just generating key=value pairs without the array syntax, you could feed it into declare -A vars=( ); while IFS== read -r key value; do vars[$key]=$value; done or such safely. - Chapel
'$b' is absolutely not safe -- a value containing literal single-quotes can trivially escape it. (Hence touch $'$(rm -rf $HOME)\'$(rm -rf $HOME)\'' as my usual example of creating a malicious filename that avoids naive attempts at escaping). - Chapel
@CharlesDuffy, Re eval: There's no eval in this answer. (Though I do approve of a judicious use of eval more than some.) - Syrinx
Granted that there isn't an eval, but the output is in a format which appears to be anticipating eval, source, or some equivalent to actually load those variables into a running shell. - Chapel
@CharlesDuffy, Re "absolutely": we seem to be of different schools here. With due respect to defensive programming practices, this answer is specific to TVDB data (which should not contain anything like that), and therefore is not intended as a universal defensive-programming answer. If we suppose TVDB data is likely to actually be an attack vector, then TVDB should be fixed, or not used at all. - Syrinx
@CharlesDuffy, That malicious touch code might not be within my current view of this answer's scope, but it's interesting code either way. Please provide a link to a more thorough description of that code, or something like it. - Syrinx
If TVDB has a security breach, do you really want that to be your problem rather than theirs? And then there are MITM attacks. (Granted, the only MITM attack I've actually been part of was an April Fool's joke almost 20 years ago, translating web pages requested by a specific system into pig latin... actually, no, several more recently, mostly intercepting and rewriting network calls made by games in attempts to "cheat" / avoid grinding). Re: the code above, I don't have a good link handy, but why not change it from rm to touch and play with it yourself? - Chapel
Defense-in-depth exists for a reason: Small security breaches get escalated into big ones. If I've 0wned the web proxy at a company's boundary, then looking for places where they're, say, downloading and executing shell scripts is prime territory. And if I know they're running code that mishandles data from TVDB (maybe because someone there asked a question on StackOverflow and accepted a less-than-cautious answer -- or maybe because I have read-only access to -- or an ex-employee's snapshot of -- their source control and spotted some vulnerable code), yes, I might target that too. - Chapel
The above is to say -- you seem to be advocating trusting TVDB. I advocate trusting no-one, except to the extent that you actually have a reason to do so and make a deliberate decision that such trust is appropriate. In this case, reducing the level of trust in the data could be as easy as sanitization with printf '%q=%q\n' "${a,,}" "$b" (lowercase per pubs.opengroup.org/onlinepubs/9699919799/basedefs/…). - Chapel
@CharlesDuffy, Thanks for the lowercase note, fixed. While your advocacy of defense-in-depth is both valid and interesting, there's more than one side to such matters; unfortunately a comment section would not be the best forum to contrast alternate schools of programming prophylaxis. - Syrinx
@CharlesDuffy, After some experimenting with your touch example, it's unclear how it applies to this instance. Could you provide a specific value of a (post-jq) read a b input line that would, if this answer's current code were run and then eval'd once, execute (at eval time) mplayer baz? - Syrinx
