Elasticsearch aggregation with hierarchical category, subcategory; limit the levels

I have products with a categories field. Using a terms aggregation I can get the full category paths with all subcategories, but I want to limit the number of levels returned in the facet.

For example, I have facets like:

auto, tools & travel    (115)
auto, tools & travel > luggage tags (90)
auto, tools & travel > luggage tags > luggage spotters  (40)
auto, tools & travel > luggage tags > something else    (50)
auto, tools & travel > car organizers   (25)

Using an aggregation like:

"aggs": {
    "cat_groups": {
      "terms": {
        "field": "categories.keyword",
        "size": 10,
        "include": "auto, tools & travel > .*"
      }
    }
}

I am getting buckets like

"buckets": [
        {
          "key": "auto, tools & travel > luggage tags",
          "doc_count": 90
        },
        {
          "key": "auto, tools & travel > luggage tags > luggage spotters",
          "doc_count": 40
        },
        {
          "key": "auto, tools & travel > luggage tags > something else",
          "doc_count": 50
        },
        {
          "key": "auto, tools & travel > car organizers",
          "doc_count": 25
        }
]

But I want to limit the level. For example, I want only the buckets for auto, tools & travel > luggage tags. How can I limit the levels? By the way, "exclude": ".* > .* > .*" does not work for me.

I need buckets for different levels depending on the search: sometimes the first level, sometimes the second or third. When I want the first level, the second-level entries should not appear in the buckets, and likewise for the other levels.

Elasticsearch version 6.4

Breakthrough answered 23/10, 2018 at 3:36 Comment(0)

I was finally able to work out the technique below.

I implemented a custom analyzer using the Path Hierarchy Tokenizer and made categories a multi-field, so that you can use categories.facet for aggregations/facets and run normal text search against categories.

The custom analyzer applies only to categories.facet.

Note the property "fielddata": "true" on the categories.facet field; it is required to run a terms aggregation on a text field.

Mapping

PUT myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": ">"
        }
      }
    }
  },
  "mappings": {
    "mydocs": {
      "properties": {
        "categories": {
          "type": "text",
          "fields": {
            "facet": { 
              "type":  "text",
              "analyzer": "my_analyzer",
              "fielddata": "true"
            }
          }
        }
      }
    }
  }
}
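
To see which terms end up in categories.facet, here is a minimal Python sketch that emulates what a path_hierarchy tokenizer with delimiter ">" emits for one value. The function name path_hierarchy_tokens is made up for this illustration and is not part of any Elasticsearch client; the sketch only mimics the tokenizer, it does not call Elasticsearch.

```python
def path_hierarchy_tokens(path, delimiter=">"):
    """Emulate the prefixes a path_hierarchy tokenizer emits for one value.

    Hypothetical helper for illustration only.
    """
    parts = path.split(delimiter)
    # Token i is everything before the (i+1)-th delimiter, so the
    # whitespace around the delimiter stays inside the token.
    return [delimiter.join(parts[: i + 1]) for i in range(len(parts))]


tokens = path_hierarchy_tokens(
    "auto, tools & travel > luggage tags > luggage spotters"
)
for token in tokens:
    print(repr(token))
```

Each document therefore contributes one term per ancestor level, which is why the aggregation returns buckets for every level of the path, with a trailing space on the shorter keys.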

Sample Documents

POST myindex/mydocs/1
{
    "categories" : "auto, tools & travel > luggage tags > luggage spotters"
}

POST myindex/mydocs/2
{
    "categories" : "auto, tools & travel > luggage tags > luggage spotters"
}

POST myindex/mydocs/3
{
    "categories" : "auto, tools & travel > luggage tags > luggage spotters"
}

POST myindex/mydocs/4
{
    "categories" : "auto, tools & travel > luggage tags > something else"
}

Query

You can try the query below, which should give what you are looking for. I've wrapped the Terms Aggregation in a Filter Aggregation because you only need the facets for documents matching specific words.

POST myindex/_search
{
  "size": 0,
  "aggs":{
    "facets": {
      "filter": {
        "bool": {
          "must": [
            { "match": { "categories": "luggage"} }
          ]
        }
      },
      "aggs": {
        "categories": {
          "terms": {
            "field": "categories.facet"
          }
        }
      }
    }
  }
}

Response

{
    "took": 43,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 11,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "facets": {
            "doc_count": 4,
            "categories": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {
                        "key": "auto, tools & travel ",
                        "doc_count": 4
                    },
                    {
                        "key": "auto, tools & travel > luggage tags ",
                        "doc_count": 4
                    },
                    {
                        "key": "auto, tools & travel > luggage tags > luggage spotters",
                        "doc_count": 3
                    },
                    {
                        "key": "auto, tools & travel > luggage tags > something else",
                        "doc_count": 1
                    }
                ]
            }
        }
    }
}

Final Answer Post Discussion On Chat

POST myindex/_search
{
  "size": 0,
  "aggs":{
    "facets": {
      "filter": {
        "bool": {
          "must": [
            { "match": { "categories": "luggage"} }
          ]
        }
      },
      "aggs": {
        "categories": {
          "terms": {
            "field": "categories.facet",
            "exclude": ".*>{1}.*>{1}.*"
          }
        }
      }
    }
  }
}

Note that I've added exclude with a regular expression so that any facet containing more than one occurrence of > is dropped from the buckets.
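
The effect of that exclude pattern can be checked offline with a small Python sketch. It assumes that Elasticsearch matches include/exclude patterns against the whole term, which re.fullmatch emulates; Python's regex dialect is not identical to Lucene's, so treat this as an approximation. The helper exclude_deeper_than is hypothetical, and the pattern it generates has not been run against Elasticsearch here.

```python
import re

# Regex copied from the query above.
EXCLUDE = re.compile(r".*>{1}.*>{1}.*")

# Bucket keys as returned by the unfiltered aggregation.
bucket_keys = [
    "auto, tools & travel ",
    "auto, tools & travel > luggage tags ",
    "auto, tools & travel > luggage tags > luggage spotters",
    "auto, tools & travel > luggage tags > something else",
]

# Keep only the keys the exclude pattern does not fully match.
kept = [key for key in bucket_keys if not EXCLUDE.fullmatch(key)]


def exclude_deeper_than(max_depth):
    """Hypothetical helper: build an exclude pattern that drops every key
    with more than max_depth levels (i.e. at least max_depth delimiters)."""
    return "(.*>){%d}.*" % max_depth


# With max_depth=1 only top-level keys survive.
top_only = [
    key
    for key in bucket_keys
    if not re.fullmatch(exclude_deeper_than(1), key)
]
print(kept)
print(top_only)
```

Changing the repetition count is one way to ask for the first, second, or third level per search without touching the mapping.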

Let me know if this helps.

Kareykari answered 23/10, 2018 at 5:56 Comment(9)
Thank you, but that does not do anything about limiting the category level. - Breakthrough
@Breakthrough Could you check the answer now? I've updated it. - Kareykari
Can you please check the chatroom? - Breakthrough
@Breakthrough I've finally been able to figure it out. Please check the section Final Answer Post Discussion On Chat, where I've added an exclude clause. Please accept the answer if it solves what you are looking for, and if you have any doubts please do let me know. - Kareykari
@Breakthrough Posted an answer there! :) - Kareykari
Hi, I just wonder whether there is a solution without setting fielddata to true... It seems impossible to me, as the path_hierarchy tokenizer only applies to text fields. Am I correct? - Antisthenes
@Antisthenes Yes, that is correct. It only applies to text fields, as keyword fields don't use an analyzer. Unless you have a huge amount of data in that field, you do not have to worry about performance. The usual advice is to use keyword rather than text with fielddata: true for sorting and aggregations, but that doesn't apply here; this is the only option we have. Hope that clarifies. - Kareykari
@Kamal Thanks for the detailed explanations. I have implemented it this way and it works well! - Antisthenes
@Kamal Just did :) - Antisthenes

Just add an integer field named level that records the category's depth in the hierarchy: count the occurrences of your delimiter '>' in the path and store that number. Then add a range query on level to your bool query.

Add this to your schema:

"level": {
    "type": "integer",
    "store": "true",
    "index": "true"
}

In your indexing code, something like the following counts the delimiters to derive the hierarchy level (no delimiter means a main category):

public Builder(final String path) {
    this.path = path;
    this.level = StringUtils.countMatches(path, DELIMITER);
}
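
The same depth computation can be sketched in Python. The names DELIMITER, category_level, and to_document are made up for this illustration, and the category field name simply follows the query in this answer; adjust both to your own mapping.

```python
DELIMITER = ">"  # assumed to be the separator used in the category paths


def category_level(path, delimiter=DELIMITER):
    """Return the hierarchy level: 0 for a main category, 1 for a
    direct subcategory, and so on (one per delimiter)."""
    return path.count(delimiter)


def to_document(path):
    """Build the document body to index, with the precomputed level."""
    return {"category": path, "level": category_level(path)}


doc = to_document("auto, tools & travel > luggage tags > luggage spotters")
print(doc)
```

Computing the level at index time keeps the query side simple, at the cost of reindexing if the delimiter or hierarchy changes.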

Your search query could then look something like this:

{
    "query": {
        "bool": {
            "filter": [
                {
                    "prefix": {
                        "category": {
                            "value": "auto, tools & travel",
                            "boost": 1
                        }
                    }
                },
                {
                    "range": {
                        "level": {
                            "from": 2,
                            "to": 4,
                            "include_lower": true,
                            "include_upper": true,
                            "boost": 1
                        }
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1
        }
    }
}
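
As a rough sketch, the request body for a chosen level range could be built in Python like this. The function name level_filtered_query is hypothetical, the category and level field names follow this answer, and the range bounds use the gte/lte parameters of the range query. Nothing here talks to a cluster; it only produces the JSON body.

```python
import json


def level_filtered_query(prefix, min_level, max_level):
    """Hypothetical helper: build a search body restricted to one
    category prefix and an inclusive range of hierarchy levels."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"prefix": {"category": {"value": prefix}}},
                    {"range": {"level": {"gte": min_level, "lte": max_level}}},
                ]
            }
        }
    }


# Only direct subcategories (level 1) under the given main category.
body = level_filtered_query("auto, tools & travel", 1, 1)
print(json.dumps(body, indent=2))
```

Setting min_level and max_level to the same value restricts the results to exactly one level, which matches the requirement of showing only the first, second, or third level per search.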
Vtarj answered 28/1, 2019 at 16:3 Comment(1)
You can help him better by providing some demo code :) - Delapaz
