Elasticsearch geospatial search implementation

I am trying to understand how Elasticsearch supports geospatial search internally.

For basic text search it uses the inverted index, but how does that combine with additional search criteria, such as searching for particular text within a certain radius?

I would like to understand the internals of how the index is stored and queried to support these queries.

Counterrevolution answered 17/5, 2020 at 14:46 Comment(3)
What do you mean by "searching for a particular text within a certain radius"? - Kanazawa
Let's say you are searching for the keyword "Pizza" and expect to find a list of nearby places (restaurants etc.) that match your keyword. - Counterrevolution
@Counterrevolution did you find the answer elsewhere? If so, could you please share it here? - Bomar

Text & geo queries are executed independently of one another. Let's take a concrete example:

PUT restaurants
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point"
      },
      "menu": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

POST restaurants/_doc
{
  "name": "rest1",
  "location": {
    "lat": 40.739812,
    "lon": -74.006201
  },
  "menu": [
    "european",
    "french",
    "pizza"
  ]
}

POST restaurants/_doc
{
  "name": "rest2",
  "location": {
    "lat": 40.7403963,
    "lon": -73.9950026
  },
  "menu": [
    "pizza",
    "kebab"
  ]
}

You'd then match a text field and apply a geo_distance filter:

GET restaurants/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "menu": "pizza"
          }
        },
        {
          "geo_distance": {
            "distance": "0.5mi",
            "location": {
              "lat": 40.7388,
              "lon": -73.9982
            }
          }
        },
        {
          "function_score": {
            "query": {
              "match_all": {}
            },
            "boost_mode": "avg",
            "functions": [
              {
                "gauss": {
                  "location": {
                    "origin": {
                      "lat": 40.7388,
                      "lon": -73.9982
                    },
                    "scale": "0.5mi"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
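
To make "executed independently" a bit more concrete, here is a toy Python sketch of the idea: the text clause is answered from an inverted index, the geo clause produces its own set of matching documents, and the bool query keeps the intersection. This is only a mental model under stated assumptions, not how Lucene actually implements it. The `inverted_index` dict, the brute-force `geo_matches` scan and the `search` helper are all made up for illustration; a real engine uses a spatial index instead of scanning every document.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0088
KM_PER_MILE = 1.609344


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The two restaurants indexed above, keyed by a made-up numeric doc id.
docs = {
    1: {"name": "rest1", "location": (40.739812, -74.006201), "menu": ["european", "french", "pizza"]},
    2: {"name": "rest2", "location": (40.7403963, -73.9950026), "menu": ["pizza", "kebab"]},
}

# Text side: a toy inverted index mapping each term to the doc ids that contain it.
inverted_index = defaultdict(set)
for doc_id, doc in docs.items():
    for term in doc["menu"]:
        inverted_index[term].add(doc_id)


def text_matches(term):
    """Doc ids whose menu contains the term (a posting-list lookup)."""
    return inverted_index.get(term, set())


def geo_matches(origin, radius_km):
    """Doc ids within radius_km of origin (brute force, for illustration only)."""
    return {
        doc_id
        for doc_id, doc in docs.items()
        if haversine_km(*origin, *doc["location"]) <= radius_km
    }


def search(term, origin, radius_miles):
    """Both clauses must match, so keep the intersection of the two result sets."""
    return sorted(text_matches(term) & geo_matches(origin, radius_miles * KM_PER_MILE))


hits = search("pizza", (40.7388, -73.9982), 0.5)
print([docs[doc_id]["name"] for doc_id in hits])
```

Neither clause needs to know about the other; they only have to agree on document ids.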

Since the geo_distance query only yields a yes/no answer (it just checks whether the location is within the given radius, so every match gets the same score of 1), you may want to apply a Gaussian function_score to boost the locations that are closer to the given origin.
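
For intuition on what the gauss function contributes, here is a small Python sketch of the Gaussian decay formula as I understand it from the function_score documentation, assuming the default decay of 0.5 and an offset of 0. The function name `gauss_decay` and the sample distances are my own; distances are passed in directly (in miles) rather than computed from coordinates.

```python
import math


def gauss_decay(distance, scale, offset=0.0, decay=0.5):
    """Gaussian decay: 1.0 at the origin, falling to `decay` at `offset + scale`."""
    # sigma^2 is chosen so that the curve passes through `decay` at distance == scale.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    d = max(0.0, distance - offset)
    return math.exp(-(d ** 2) / (2.0 * sigma_sq))


# Sample distances in miles, with the same 0.5mi scale as the query above.
for miles in (0.0, 0.2, 0.5, 1.0):
    print(f"{miles} mi -> {gauss_decay(miles, scale=0.5):.4f}")
```

With "scale": "0.5mi", a restaurant exactly half a mile from the origin gets half the decay score of one at the origin, and the score keeps dropping smoothly beyond that.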

Finally, you can override these scores with a _geo_distance sort, which orders the results by proximity (while of course keeping the match query intact):

...
  "query": {...},
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 40.7388,
          "lon": -73.9982
        },
        "order": "asc"
      }
    }
  ]
}
Kanazawa answered 17/5, 2020 at 21:37 Comment(1)
I wanted to understand how geo_distance works internally. Normally, if it's just a text search, it works on top of the inverted index; but with the geo_distance filter included, what additional data structures would be required to handle such a query? - Counterrevolution

Since 2018, Elasticsearch has used quadtrees under the hood. See the Elastic blog: https://www.elastic.co/blog/supercharging-geopoint

If you want to find out more about quadtrees, there are countless blog posts and videos online that explain them. Wikipedia is always a great place to start: https://en.wikipedia.org/wiki/Quadtree
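
To give a feel for the data structure, below is a minimal, generic point quadtree in Python. It is a textbook-style sketch and not Elasticsearch's or Lucene's actual implementation: the `QuadTree` class, its `capacity` and `max_depth` parameters and the `far_away` point are all assumptions made for illustration. The idea is that each node covers a lat/lon box and splits into four quadrants when it fills up, so a box query can skip whole quadrants that cannot contain a match.

```python
from dataclasses import dataclass, field


@dataclass
class QuadTree:
    """Point quadtree over a half-open box [min_lat, max_lat) x [min_lon, max_lon)."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float
    capacity: int = 1    # points a leaf holds before it splits
    depth: int = 0
    max_depth: int = 16  # stop splitting here so duplicate points cannot recurse forever
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def contains(self, lat, lon):
        return self.min_lat <= lat < self.max_lat and self.min_lon <= lon < self.max_lon

    def insert(self, lat, lon, payload):
        if not self.contains(lat, lon):
            return False
        if not self.children:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((lat, lon, payload))
                return True
            self._split()
        # The four child boxes tile this node, so exactly one child accepts the point.
        return any(child.insert(lat, lon, payload) for child in self.children)

    def _split(self):
        mid_lat = (self.min_lat + self.max_lat) / 2
        mid_lon = (self.min_lon + self.max_lon) / 2
        boxes = [
            (self.min_lat, self.min_lon, mid_lat, mid_lon),
            (self.min_lat, mid_lon, mid_lat, self.max_lon),
            (mid_lat, self.min_lon, self.max_lat, mid_lon),
            (mid_lat, mid_lon, self.max_lat, self.max_lon),
        ]
        self.children = [
            QuadTree(*box, self.capacity, self.depth + 1, self.max_depth) for box in boxes
        ]
        # Push the points this leaf was holding down into the new children.
        old_points, self.points = self.points, []
        for lat, lon, payload in old_points:
            self.insert(lat, lon, payload)

    def query_box(self, min_lat, min_lon, max_lat, max_lon):
        """Payloads of all points inside the closed query box."""
        # Prune: skip this whole subtree if its box cannot overlap the query box.
        if (max_lat < self.min_lat or min_lat >= self.max_lat
                or max_lon < self.min_lon or min_lon >= self.max_lon):
            return []
        found = [
            payload for lat, lon, payload in self.points
            if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
        ]
        for child in self.children:
            found.extend(child.query_box(min_lat, min_lon, max_lat, max_lon))
        return found


tree = QuadTree(-90.0, -180.0, 90.0, 180.0)
tree.insert(40.739812, -74.006201, "rest1")
tree.insert(40.7403963, -73.9950026, "rest2")
tree.insert(48.8566, 2.3522, "far_away")  # hypothetical extra point on another continent

# A small bounding box around the search origin used in the other answer.
print(sorted(tree.query_box(40.73, -74.01, 40.75, -73.99)))
```

A radius query would typically first query the bounding box of the circle like this, then check the exact distance only for the few candidates that come back.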

Garrity answered 22/6, 2024 at 15:54 Comment(0)
