jq: Object cannot be csv-formatted, only array
Asked Answered
K

2

14

I am new to jq and I have a JSON file from a DynamoDB table which I want to convert to CSV. This is my JSON file.

[
    {
        "SnsPublishTime": {
            "S": "2019-07-27T15:07:38.904Z"
        },
        "SESreportingMTA": {
            "S": "dsn; a8-19.smtp-out.amazonses.com"
        },
        "SESMessageType": {
            "S": "Bounce"
        },
        "SESDestinationAddress": {
            "S": "[email protected]"
        },
        "SESMessageId": {
            "S": "0100016c33f91857-600a8e44-c419-4a02-bfd6-7f6908f5969e-000000"
        },
        "SESbounceSummary": {
            "S": "[{\"emailAddress\":\"[email protected]\",\"action\":\"failed\",\"status\":\"5.1.1\",\"diagnosticCode\":\"smtp; 550 5.1.1 user unknown\"}]"
        }
    }
]

I get the correct output if I run

jq -r '.[] ' test.json

but if I run

jq -r '.[] |@csv' test.json

Then I am getting an error:

jq: error (at test.json:22): object ({"SnsPublis...) cannot be csv-formatted, only array

How can I convert this JSON to a CSV properly? I tried googling for over an hour and can't seem to be able to figure it out.

Thank you!

Kimon answered 28/7, 2019 at 15:19 Comment(2)
There is no simple, straightforward way to represent nested structures in CSV. If you can specify how you expect this to be serialized into CSV (and, probably, demonstrate a feasible attempt to implement that yourself) perhaps we can help you get there.Sama
Thanks @tripleee, your keyword "nested" really helped me to narrow down my search. This command is enough to what I was looking for. code jq -r '.[] | [ .SnsPublishTime.S, .SESreportingMTA.S, .SESMessageType.S, .SESDestinationAddress.S]| @csv' test.json Now, for some reason, the result still contains the quotes and I am using the "-r" option, by I can live with that.Kimon
P
17

Here is a generic JSON-to-CSV converter that makes just one major assumption and one minor assumption.

The major assumption is that all the JSON entities are conformal. In your case, it means that all corresponding objects have the same keys (though they may be in different order). If this assumption is ever violated, an error condition is raised, and processing stops.

The minor assumption is that key names do not contain a dot ("."); if any key name does contain a dot, then some of the header names might be difficult to read or parse, since the headers are formed by using the dot as the join character. If that is a problem, then you might wish to use a different join character.

The jq program

def json2header:
  [paths(scalars)];

def json2array($header):
  json2header as $h
  | if $h == $header or (($h|sort) == ($header|sort))
    then [$header[] as $p | getpath($p)]
    else "headers do not match: expected followed by found paths:" | debug
    | ($header|map(join(".")) | debug)
    | ($h|map(join(".")) | debug)
    | "headers do not match" | error
    end ;

# given an array of conformal objects, produce "CSV" rows, with a header row:
def json2csv:
  (.[0] | json2header) as $h
  | ([$h[]|join(".")], (.[] | json2array($h))) 
  | @csv ;

# `main`
json2csv

The invocation

jq -rf json2csv.jq INPUT.json

The output

"SnsPublishTime.S","SESreportingMTA.S","SESMessageType.S","SESDestinationAddress.S","SESMessageId.S","SESbounceSummary.S"
"2019-07-27T15:07:38.904Z","dsn; a8-19.smtp-out.amazonses.com","Bounce","[email protected]","0100016c33f91857-600a8e44-c419-4a02-bfd6-7f6908f5969e-000000","[{""emailAddress"":""[email protected]"",""action"":""failed"",""status"":""5.1.1"",""diagnosticCode"":""smtp; 550 5.1.1 user unknown""}]"

Variation: reading a JSON stream

With the above infrastructure, it is also easy to convert a stream of conformal JSON entities into the CSV format, with headers.

def inputs2csv:
  json2header as $h
  | [$h[]|join(".")],
    json2array($h),
    (inputs|json2array($h))
  | @csv ;

# `main`
inputs2csv

Illustration showing that keys in corresponding objects need not be in the same order

[ {a:1, b: {c:3, d: [{e:4},{e:5, f:6}]}},
  {b: {d: [{e:4},{f:6, e:5}], c:3}, a:1}
 ] 
| json2csv

produces:

"a","b.c","b.d.0.e","b.d.1.e","b.d.1.f"
1,3,4,5,6
1,3,4,5,6

Another variation

Under some circumstances, the checking for conformity might not be necessary, so you'd be left with:

def json2array($header):
  [$header[] as $p | getpath($p)];
Predispose answered 28/7, 2019 at 19:11 Comment(2)
Hi @peak, I created a jq file with the contents of the jq program above, getting a 'Cannot index object with number' error message when running as advised. Any thoughts on what/how to debug?Colwen
You could perhaps use the debug filter.Predispose
P
9

For the record, here is a generic JSON-to-CSV converter for converting any array of JSON objects to CSV (with headers). There are no restrictions on these objects, but the transformation is not always invertible, and the output cells might include stringified compound entities -- see "Caveats".

json2csv

# emit a stream
def json2headers:
  def isscalar: type | . != "array" and . != "object";
  def isflat: all(.[]; isscalar);
  paths as $p
  | getpath($p)
  | if type == "array" and isflat then $p
     elif isscalar and (($p[-1]|type) == "string") then $p
     else empty end ;

def json2array($header):
  def value($p):
    try getpath($p) catch null
    | if type == "object" then null else . end;
  [$header[] as $p | value($p)];

def json2csv:
  ( [.[] | json2headers] | unique) as $h
  | ([$h[]|join("_") ],
     (.[]
      | json2array($h)
      | map( if type == "array" then map(tostring)|join("|") else tostring end)))
  | @csv ;

Usage

One way to use json2csv.jq as specified above is as a jq module, e.g.

jq -r -L. 'include "json2csv"; json2csv' input.json

If the input is a stream of JSON objects:

jq -rn -L. 'include "json2csv"; [inputs]|json2csv' input.json

Caveats

  • For each object in the top-level array, the set of paths to all scalars and scalar-valued arrays is computed; if any such path is object-valued or invalid for another object, the corresponding cell in the output for that object will be "null".

  • Flat arrays are converted to pipe-separated values, so that if the input includes an array such as ["1|2", ["3|4"], it will be indistinguishable from the string value, "1|2|3|4", etc. If this is a problem, the character used as a separator value for array items can of course be changed.

  • Similar collisions can occur amongst the headers.

Conversion to TSV

sed 's/@csv/@tsv/' json2csv.jq > json2tsv.jq
Predispose answered 17/9, 2019 at 23:12 Comment(1)
For those like me that have never imported a custom jq module before the -L . part of the command specifies where the json2csv.jq file is located. If your script isn't in the current directory change the . to the path of the script. Took me a while to figure that part out.Coccidiosis

© 2022 - 2024 — McMap. All rights reserved.