aggregation on path hierarchy tokens

I've read a lot over the last couple of days but couldn't find a solution that works for me. I also came across examples that use the string type, which is deprecated in the ES version I'm using.

I'm on Elasticsearch: 5.6.4

I've indexed some documents, played around with the mapping, tried analyzers (the path_hierarchy tokenizer), and took a look at Ingest Node, but nothing seems to suit my case. The question is about the category_tags field (see the example at the bottom). I generate this data myself, so I can restructure it however I like if necessary.

I would like to build a typical e-commerce navigation, so I think this should be done with aggregations on category_tags? I've used an array to show that a document can have multiple categories, and each hierarchy can be arbitrarily deep. I don't expect it to go deeper than 4 or 5 levels, but it could happen.

My dynamic template looks like:

      ...
    "analyzer": {
      "my_path_hierarchy_analyzer": {
        "type": "custom",
        "tokenizer": "my_path_hierarchy_tokenizer"
      },
      "my_pipe_analyzer": {
        "type": "custom",
        "tokenizer": "my_pipe_tokenizer"
      }
    },
    "tokenizer": {
      "my_path_hierarchy_tokenizer": {
        "type": "path_hierarchy",
        "delimiter": "|"
      },
      "my_pipe_tokenizer": {
        "type": "pattern",
        "pattern": "\\|"
      }
    }
  }
},
"mappings": {
  "item": {
    "dynamic_templates": [
      {
        "category_tags_analyzed": {
          "match": "category_tags",
          "mapping": {
            "type": "text",
            "analyzer": "my_path_hierarchy_analyzer",
            "fields": {
              "textsearch": {
                "type": "text",
                "analyzer": "my_pipe_analyzer"
              }
            }
          }
        }
      },
      ...

Running aggs on a text field throws an error because fielddata is disabled on text fields by default, and I don't think I should set it to true here anyway. With a keyword field, the document wasn't even indexed; it threw an error, so I guess that wasn't allowed.

The documents would look like:

"hits": [
  {
    "_index" : "my_index",
    "_type" : "my_type",
    "_id" : "1",
    "_score" : 1.0,
    "_source" : {
      ...,
      "category_tags" : [
        "Men|Tops|Shirts",
        "Men|Sale",
        "Men|Whatever"
      ],
      ...
    }
  }
]

Now I don't know whether I have to use the Path Hierarchy tokenizer, the nested type, a combination of both, or something else ES offers. Is it even possible to run, e.g., a terms aggregation on category_tags and get a "useful" result?

Useful would mean the data is structured so that I can use it for an e-commerce style (tree-like) navigation: a user clicks through the navigation tree, each click sends a request to ES that filters on the selected node, and the results shown reflect whatever was clicked.

Dorser answered 8/11, 2017 at 22:20 Comment(0)

I found a couple of great articles on this issue (here and here), and also experimented a bit. While the two articles reference an older version, a few tweaks got things working with ES 6.

Here's what worked for me. I haven't tested it with multiple categories per item (e.g. your array example), but it would likely still work:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "path-analyzer": {
          "tokenizer": "path-tokenizer"
        }
      },
      "tokenizer": {
        "path-tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "|"
        }
      }
    }
  },
  "mappings": {
    "item": {
      "dynamic": "strict",
      "properties": {
        "category_path": {
          "type": "text",
          "analyzer": "path-analyzer",
          "search_analyzer": "keyword",
          "fielddata": true
        }
      }
    }
  }
}
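To make it easier to see why this mapping yields one bucket per hierarchy level, here is a rough Python sketch of what the path_hierarchy tokenizer emits for a "|"-delimited value, and how a terms aggregation would then count documents per token. The function names (path_hierarchy_tokens, simulate_terms_agg) are made up for illustration, and this is a simplified simulation, not Elasticsearch itself: it ignores edge cases such as leading delimiters and the tokenizer's other options. It also hints that array values should behave as expected, since each value is tokenized separately and a document is counted once per distinct token, though I haven't verified that against a real index.

```python
from collections import Counter


def path_hierarchy_tokens(value, delimiter="|"):
    """Return every prefix of a delimited path, shortest first.

    Simplified stand-in for the path_hierarchy tokenizer (assumption:
    no leading/trailing delimiters, default options otherwise).
    """
    parts = value.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


def simulate_terms_agg(docs, field="category_path"):
    """Count, per token, how many documents contain it at least once."""
    counts = Counter()
    for doc in docs:
        values = doc[field]
        if isinstance(values, str):
            values = [values]  # a single value behaves like a one-item array
        tokens = set()
        for value in values:
            tokens.update(path_hierarchy_tokens(value))
        counts.update(tokens)  # each document contributes once per token
    return counts


docs = [
    {"category_path": ["Men|Tops|Shirts", "Men|Sale", "Men|Whatever"]},
    {"category_path": "Men|Tops"},
]

print(path_hierarchy_tokens("Men|Tops|Shirts"))
# ['Men', 'Men|Tops', 'Men|Tops|Shirts']

counts = simulate_terms_agg(docs)
print(counts["Men"], counts["Men|Tops"], counts["Men|Sale"])
# 2 2 1
```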

Your aggregations request would look something like this:

{
  "aggs": {
    "category": {
      "terms": {
        "field": "category_path"
      }
    }
  },
  "size": 0
}
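For the click-through navigation described in the question, one possible approach is to filter on the selected path and restrict the buckets to its direct children using the terms aggregation's include regex. The Python sketch below only builds the request body as a dict. The helper names (escape_lucene_regex, build_nav_request) are hypothetical, the field name category_path is taken from the mapping above, and I'm assuming a term query against the path-analyzed field matches the exact indexed token. I haven't run this against a cluster, so treat it as a starting point.

```python
import json

# Characters that are reserved in Lucene regular expressions.
LUCENE_REGEX_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regex(text):
    """Backslash-escape reserved characters so the text matches literally."""
    return "".join("\\" + ch if ch in LUCENE_REGEX_RESERVED else ch for ch in text)


def build_nav_request(selected_path=None, field="category_path", delimiter="|"):
    """Build a search body returning only the direct children of a node.

    selected_path=None asks for the top-level categories.
    """
    # Matches exactly one path segment (anything except the delimiter).
    one_level = "[^" + escape_lucene_regex(delimiter) + "]+"
    if selected_path is None:
        query = {"match_all": {}}
        include = one_level
    else:
        # Keep only documents that contain the selected path as a token.
        query = {"bool": {"filter": [{"term": {field: selected_path}}]}}
        include = escape_lucene_regex(selected_path + delimiter) + one_level
    return {
        "size": 0,
        "query": query,
        "aggs": {"category": {"terms": {"field": field, "include": include}}},
    }


print(json.dumps(build_nav_request("Men|Tops"), indent=2))
```

With "Men|Tops" selected, the include pattern is Men\|Tops\|[^\|]+, so only buckets one level below the clicked node (such as Men|Tops|Shirts) should come back.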

And your results:

  "buckets": [
    {
      "key": "Men",
      "doc_count": 11
    },
    {
      "key": "Men|Sale",
      "doc_count": 4
    },
    {
      "key": "Men|Tops",
      "doc_count": 3
    },
    {
      "key": "Men|Tops|Shirts",
      "doc_count": 2
    }
  ]
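
Since the buckets come back as a flat list, the client still has to fold them into a tree for the navigation. A minimal Python sketch of that step follows, assuming the bucket shape shown above and the "|" delimiter. buckets_to_tree is a hypothetical helper name, and a level that is missing from the response (for example because of the aggregation's size limit) is left with doc_count set to None.

```python
def buckets_to_tree(buckets, delimiter="|"):
    """Fold flat terms-aggregation buckets into a nested dict keyed by segment."""
    root = {"children": {}}
    for bucket in buckets:
        node = root
        for part in bucket["key"].split(delimiter):
            # Create intermediate nodes on demand; their count is unknown
            # until the bucket for that exact path is seen.
            node = node["children"].setdefault(
                part, {"doc_count": None, "children": {}}
            )
        node["doc_count"] = bucket["doc_count"]
    return root["children"]


buckets = [
    {"key": "Men", "doc_count": 11},
    {"key": "Men|Sale", "doc_count": 4},
    {"key": "Men|Tops", "doc_count": 3},
    {"key": "Men|Tops|Shirts", "doc_count": 2},
]

tree = buckets_to_tree(buckets)
print(tree["Men"]["doc_count"])                      # 11
print(sorted(tree["Men"]["children"]))               # ['Sale', 'Tops']
print(tree["Men"]["children"]["Tops"]["children"])
# {'Shirts': {'doc_count': 2, 'children': {}}}
```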
Genesa answered 18/4, 2018 at 16:23 Comment(0)
