Group Buckets by Similarity

I'm looking to group the terms returned from the significant terms aggregation.

Something that would take a significant terms response from this:

[
  {
    "key" : "ok",
    "doc_count" : 200,
    "score" : 8.583258052060206E-4,
    "bg_count" : 213
  },
  {
    "key" : "okay",
    "doc_count" : 117,
    "score" : 4.814546694690713E-4,
    "bg_count" : 126
  },
  {
    "key" : "something else",
    "doc_count" : 100,
    "score" : 2.3240213379936128E-4,
    "bg_count" : 78
  }
]

and change it to something like this:

[
  {
    "grouped_keys" : ["ok","okay"],
    "doc_count" : 317,
    "score" : 8.583258052060206E-4,
    "bg_count" : 339
  },
  {
    "grouped_keys" : ["something else"],
    "doc_count" : 100,
    "score" : 2.3240213379936128E-4,
    "bg_count" : 78
  }
]

I haven't tried much yet, as I have no idea where to start. I did read the following thread, but I'm not sure how relevant it is: https://discuss.elastic.co/t/group-documents-by-similarity-using-elser/342913/3

Tamekia answered 3/4 at 12:47 Comment(0)

The only way I see is to group the terms manually in a runtime field.

Documents

PUT /grouped_keys/_bulk
{"create":{"_id":1}}
{"key":"ok","fictive":1}
{"create":{"_id":2}}
{"key":"okay","fictive":1}
{"create":{"_id":3}}
{"key":"okay","fictive":1}
{"create":{"_id":4}}
{"key":"ok","fictive":1}
{"create":{"_id":5}}
{"key":"something else","fictive":1}
{"create":{"_id":6}}
{"key":"ok","fictive":1}
{"create":{"_id":7}}
{"key":"something else","fictive":1}
{"create":{"_id":8}}
{"key":"ok","fictive":2}
{"create":{"_id":9}}
{"key":"something else","fictive":2}

significant_terms Query

GET /grouped_keys/_search?filter_path=aggregations
{
    "runtime_mappings": {
        "grouped_key": {
            "type": "keyword",
            "script": {
                "source": """
                    List groupedKeys = new LinkedList();
                    groupedKeys.add(['ok', 'okay']);
                    groupedKeys.add(['something else']);
                    groupedKeys.add(['default key']);
                    
                    String key = doc['key.keyword'].value;
                    for (List groupedKey : groupedKeys) {
                        // contains() works regardless of list order; binarySearch() requires a sorted list
                        if (groupedKey.contains(key)) {
                            emit(String.join(', ', groupedKey));
                            return;
                        }
                    }

                    emit(String.join(', ', groupedKeys[groupedKeys.size() - 1]));
                """
            }
        }
    },
    "fields": [
        "grouped_key"
    ],
    "query": {
        "term": {
            "fictive": "1"
        }
    },
    "aggs": {
        "by_grouped_key": {
            "significant_terms": {
                "field": "grouped_key"
            }
        }
    }
}

Response

{
    "aggregations" : {
        "by_grouped_key" : {
            "doc_count" : 7,
            "bg_count" : 9,
            "buckets" : [
                {
                    "key" : "ok, okay",
                    "doc_count" : 5,
                    "score" : 0.05102040816326537,
                    "bg_count" : 6
                }
            ]
        }
    }
}
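
If the groups are not known ahead of time, another option is to merge the buckets on the client after the aggregation returns. The following is a minimal Python sketch and an assumption on my part, not something Elasticsearch provides: `merge_similar_buckets` and the 0.6 threshold are hypothetical, and plain string similarity from the standard library's `difflib` stands in for whatever similarity measure suits your data. It sums `doc_count` and `bg_count` and keeps the highest `score`, as in the output the question asks for.

```python
from difflib import SequenceMatcher


def merge_similar_buckets(buckets, threshold=0.6):
    """Greedily merge significant_terms buckets whose keys look alike.

    Each bucket is compared with the first key of every existing group;
    it joins the first group whose similarity ratio reaches the threshold.
    """
    groups = []
    for bucket in buckets:
        for group in groups:
            representative = group["grouped_keys"][0]
            ratio = SequenceMatcher(None, representative, bucket["key"]).ratio()
            if ratio >= threshold:
                group["grouped_keys"].append(bucket["key"])
                group["doc_count"] += bucket["doc_count"]
                group["bg_count"] += bucket["bg_count"]
                # Keep the best score; it is not recomputed for the merged group.
                group["score"] = max(group["score"], bucket["score"])
                break
        else:
            # No similar group found, so this bucket starts a new one.
            groups.append({
                "grouped_keys": [bucket["key"]],
                "doc_count": bucket["doc_count"],
                "score": bucket["score"],
                "bg_count": bucket["bg_count"],
            })
    return groups


# Buckets copied from the question.
buckets = [
    {"key": "ok", "doc_count": 200, "score": 8.583258052060206e-4, "bg_count": 213},
    {"key": "okay", "doc_count": 117, "score": 4.814546694690713e-4, "bg_count": 126},
    {"key": "something else", "doc_count": 100, "score": 2.3240213379936128e-4, "bg_count": 78},
]
merged = merge_similar_buckets(buckets)
```

Note that the summed counts are only exact when no document contains more than one key from the same group, and the kept `score` is the largest of the originals rather than a significance score for the merged group.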
Norwich answered 6/4 at 10:8 Comment(0)