jq: group and key by property

I have a list of objects that look like this:

[
  {
    "ip": "1.1.1.1",
    "component": "name1"
  },
  {
    "ip": "1.1.1.2",
    "component": "name1"
  },
  {
    "ip": "1.1.1.3",
    "component": "name2"
  },
  {
    "ip": "1.1.1.4",
    "component": "name2"
  }
]

Now I'd like to group that list by component, keyed by the component name, with the list of IPs belonging to each component:

{
  "name1": [
    "1.1.1.1",
    "1.1.1.2"
  ]
},{
  "name2": [
    "1.1.1.3",
    "1.1.1.4"
  ]
}
Plasmolysis asked 5/4, 2017 at 3:46

I figured it out myself. I first group by .component and then, for each group, build an object keyed by the component of the group's first element, whose value is the list of that group's IPs:

jq ' group_by(.component)[] | {(.[0].component): [.[] | .ip]}'
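
To see what each stage contributes, here is a sketch that runs the filter on the question's sample input, once stopping right after group_by and once in full. It assumes jq is on the PATH; -c only makes the output compact, and the variable names are illustrative.

```shell
# Sample input from the question, kept in a shell variable for illustration.
input='[
  {"ip": "1.1.1.1", "component": "name1"},
  {"ip": "1.1.1.2", "component": "name1"},
  {"ip": "1.1.1.3", "component": "name2"},
  {"ip": "1.1.1.4", "component": "name2"}
]'

# Stage 1: group_by sorts by .component and returns an array of arrays,
# one inner array per distinct component value.
groups=$(printf '%s' "$input" | jq -c 'group_by(.component)')
echo "$groups"

# Stage 2: [] streams each group; the key comes from the first element's
# .component and the value collects every .ip in that group.
stream=$(printf '%s' "$input" | jq -c 'group_by(.component)[] | {(.[0].component): [.[] | .ip]}')
echo "$stream"
```

The second command prints two separate objects, one per line, which is the stream the comments below discuss.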

Plasmolysis answered 5/4, 2017 at 3:55
And what if the key is of type number? (Presentationism)
@Presentationism It can’t. Object keys are always strings in JSON. If you have JSON like {"0": 1}, you can get the "0" key using ."0". (Unknow)
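
A related case: the property you group by may itself hold numbers. Since object keys must be strings, jq's {(expr): ...} construction fails on a numeric key, so one option is to convert it with tostring first. A sketch under that assumption, using a hypothetical input whose .component values are numbers (jq assumed to be on the PATH):

```shell
# Hypothetical input: the grouping property is numeric instead of a string.
input='[{"ip": "1.1.1.1", "component": 1}, {"ip": "1.1.1.2", "component": 1}, {"ip": "1.1.1.3", "component": 2}]'

# Without tostring, jq stops with "Object keys must be strings" and a non-zero exit status.
if printf '%s' "$input" | jq -c 'group_by(.component)[] | {(.[0].component): [.[] | .ip]}' >/dev/null 2>&1; then
  rejected=no
else
  rejected=yes
fi
echo "numeric key rejected: $rejected"

# Converting the key with tostring yields the string keys "1" and "2".
result=$(printf '%s' "$input" | jq -c 'group_by(.component)[] | {(.[0].component | tostring): [.[] | .ip]}')
echo "$result"
```
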
The output produced by this query isn't a single valid JSON document; it's a stream of separate objects. You can find further explanations in this answer I posted recently: https://mcmap.net/q/348096/-jq-group-and-key-by-property (Aircrew)
Pipe the output to jq -s . to collect the objects into a single JSON array. So the final answer is: jq 'group_by(.component)[] | {(.[0].component): [.[] | .ip]}' | jq -s . (Perishing)
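
For comparison, a sketch of the two-process pipeline from the last comment next to a single jq invocation that wraps the same filter in [ ] to collect the stream. Both should produce the same array of one-key objects (an array, not one merged object). The input is the question's sample, and jq is assumed to be installed.

```shell
input='[{"ip":"1.1.1.1","component":"name1"},{"ip":"1.1.1.2","component":"name1"},{"ip":"1.1.1.3","component":"name2"},{"ip":"1.1.1.4","component":"name2"}]'
filter='group_by(.component)[] | {(.[0].component): [.[] | .ip]}'

# Two processes: the first jq emits a stream, jq -s slurps it into one array.
slurped=$(printf '%s' "$input" | jq -c "$filter" | jq -c -s .)

# One process: wrapping the filter in [ ] collects the stream directly.
wrapped=$(printf '%s' "$input" | jq -c "[$filter]")

echo "$slurped"
echo "$wrapped"
```
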

The accepted answer doesn't produce a single valid JSON document, but rather this stream of separate objects:

{
  "name1": [
    "1.1.1.1",
    "1.1.1.2"
  ]
}
{
  "name2": [
    "1.1.1.3",
    "1.1.1.4"
  ]
}

The objects for name1 and name2 are each valid JSON on their own, but the output as a whole isn't.

The following jq filter produces the desired grouping as a single valid JSON object:

group_by(.component) | map({ key: .[0].component, value: [.[] | .ip] }) | from_entries

Output:

{
  "name1": [
    "1.1.1.1",
    "1.1.1.2"
  ],
  "name2": [
    "1.1.1.3",
    "1.1.1.4"
  ]
}

Suggestions for simpler approaches are welcome.
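
To make the from_entries step visible, here is a sketch that prints the intermediate array of {key, value} entries and then the complete result, using the question's sample input. It assumes jq is on the PATH; -c only compacts the output.

```shell
input='[{"ip":"1.1.1.1","component":"name1"},{"ip":"1.1.1.2","component":"name1"},{"ip":"1.1.1.3","component":"name2"},{"ip":"1.1.1.4","component":"name2"}]'

# Intermediate step: one {key, value} entry per component group.
entries=$(printf '%s' "$input" | jq -c 'group_by(.component) | map({key: .[0].component, value: [.[] | .ip]})')

# Full filter: from_entries turns the entries into a single object.
merged=$(printf '%s' "$input" | jq -c 'group_by(.component) | map({key: .[0].component, value: [.[] | .ip]}) | from_entries')

echo "$entries"
echo "$merged"
```
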

If human readability is preferred over valid json, I'd suggest something like ...

jq -r 'group_by(.component)[] | "IPs for " + .[0].component + ": " + (map(.ip) | tostring)'

... which results in ...

IPs for name1: ["1.1.1.1","1.1.1.2"]
IPs for name2: ["1.1.1.3","1.1.1.4"]
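
The same report line can also be written with jq's string interpolation, \(...), which converts non-string values with tostring, so the array should render as compact JSON just like the concatenation version. A sketch comparing both on the question's sample (jq assumed to be installed):

```shell
input='[{"ip":"1.1.1.1","component":"name1"},{"ip":"1.1.1.2","component":"name1"},{"ip":"1.1.1.3","component":"name2"},{"ip":"1.1.1.4","component":"name2"}]'

# Original version: string concatenation with an explicit tostring.
concat=$(printf '%s' "$input" | jq -r 'group_by(.component)[] | "IPs for " + .[0].component + ": " + (map(.ip) | tostring)')

# Interpolation version: \(...) applies tostring to the array implicitly.
interp=$(printf '%s' "$input" | jq -r 'group_by(.component)[] | "IPs for \(.[0].component): \(map(.ip))"')

echo "$interp"
```
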
Aircrew answered 27/9, 2021 at 11:49
This is quite flexible once you get the hang of it! I actually ended up nesting two constructs like that in order to parse logs and count errors and warnings independently, indexed according to another criterion. By the way, isn’t key: (.[0].component) equivalent to key: .[0].component? Is that for readability? (Cyprinid)
@AliceM, yes, it's equivalent, and I don't think the parentheses increase readability. It was probably a copy/paste leftover. I'll remove them now, thanks for the hint! (Aircrew)

As a further example of @replay's technique, after many failures using other methods, I finally built a filter that condenses this Wazuh report (excerpted for brevity):

{
  "took" : 228,
  "timed_out" : false,
  "hits" : {
    "total" : {
      "value" : 2806,
      "relation" : "eq"
    },
    "hits" : [
      {
        "_source" : {
          "agent" : {
            "name" : "100360xx"
          },
          "data" : {
            "vulnerability" : {
              "severity" : "High",
              "package" : {
                "condition" : "less than 78.0",
                "name" : "Mozilla Firefox 68.11.0 ESR (x64 en-US)"
              }
            }
          }
        }
      },
      {
        "_source" : {
          "agent" : {
            "name" : "100360xx"
          },
          "data" : {
            "vulnerability" : {
              "severity" : "High",
              "package" : {
                "condition" : "less than 78.0",
                "name" : "Mozilla Firefox 68.11.0 ESR (x64 en-US)"
              }
            }
          }
        }
      },
      ...

Here is the jq filter I use to produce a stream of objects, each keyed by an agent name and holding an array of the names of that agent's vulnerable packages:

jq ' .hits.hits |= unique_by(._source.agent.name, ._source.data.vulnerability.package.name) | .hits.hits | group_by(._source.agent.name)[] | { (.[0]._source.agent.name): [.[]._source.data.vulnerability.package | .name ]}'

Here is an excerpt of the output produced by the filter:

{
  "100360xx": [
    "Mozilla Firefox 68.11.0 ESR (x64 en-US)",
    "VLC media player",
    "Windows 10"
  ]
}
{
  "WIN-KD5C4xxx": [
    "Windows Server 2019"
  ]
}
{
  "fridxxx": [
    "java-1.8.0-openjdk",
    "kernel",
    "kernel-headers",
    "kernel-tools",
    "kernel-tools-libs",
    "python-perf"
  ]
}
{
  "mcd-xxx-xxx": [
    "dbus",
    "fribidi",
    "gnupg2",
    "graphite2",
    ...
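
To check the filter's behavior without a full Wazuh export, here is a sketch that runs it unchanged on a small hypothetical input. The agent names agent-a/agent-b and the package names are invented, and only the fields the filter reads are included. Passing two paths to unique_by deduplicates on the pair, since jq compares the collected outputs of its argument as an array, so the duplicated pkg-x entry should appear once.

```shell
# Hypothetical, minimal input shaped like the report excerpt above.
input='{"hits": {"hits": [
  {"_source": {"agent": {"name": "agent-a"}, "data": {"vulnerability": {"package": {"name": "pkg-x"}}}}},
  {"_source": {"agent": {"name": "agent-a"}, "data": {"vulnerability": {"package": {"name": "pkg-x"}}}}},
  {"_source": {"agent": {"name": "agent-a"}, "data": {"vulnerability": {"package": {"name": "pkg-y"}}}}},
  {"_source": {"agent": {"name": "agent-b"}, "data": {"vulnerability": {"package": {"name": "pkg-z"}}}}}
]}}'

# The filter from this answer, unchanged apart from -c for compact output.
report=$(printf '%s' "$input" | jq -c '.hits.hits |= unique_by(._source.agent.name, ._source.data.vulnerability.package.name) | .hits.hits | group_by(._source.agent.name)[] | { (.[0]._source.agent.name): [.[]._source.data.vulnerability.package | .name ]}')

echo "$report"
```
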
Cooker answered 11/8, 2020 at 21:16

As a simpler alternative to yaccob's answer, you can just use the add filter to merge the array of JSON objects into a single object.

jq '[group_by(.component)[] | {(.[0].component): [.[] | .ip]}] | add'

Output:

{
  "name1": [
    "1.1.1.1",
    "1.1.1.2"
  ],
  "name2": [
    "1.1.1.3",
    "1.1.1.4"
  ]
}
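
One difference between the two merging styles shows up on an empty input array: add over an empty array returns null, while from_entries returns {}. A sketch under that assumption, appending jq's alternative operator // {} if an empty object is preferred (jq assumed to be on the PATH):

```shell
filter='[group_by(.component)[] | {(.[0].component): [.[] | .ip]}] | add'

# add over an empty array yields null.
empty_add=$(echo '[]' | jq -c "$filter")

# // substitutes {} when the left-hand side is null (or false).
empty_default=$(echo '[]' | jq -c "$filter // {}")

# For comparison, the from_entries version already yields {} here.
empty_entries=$(echo '[]' | jq -c 'group_by(.component) | map({key: .[0].component, value: [.[] | .ip]}) | from_entries')

echo "$empty_add $empty_default $empty_entries"
```
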
Giaour answered 29/1 at 19:5
