ElasticSearch QuadPrefixTree vs GeohashPrefixTree
Asked Answered
A

3

6

I am new to ElasticSearch and I want to understand the difference between using geohashes and quadtree. In the reference it is written:

  • GeohashPrefixTree - Uses geohashes for grid squares. Geohashes are base32 encoded strings of the bits of the latitude and longitude interleaved. So the longer the hash, the more precise it is. Each character added to the geohash represents another tree level and adds 5 bits of precision to the geohash. A geohash represents a rectangular area and has 32 sub rectangles. The maximum amount of levels in Elasticsearch is 24.
  • QuadPrefixTree - Uses a quadtree for grid squares. Similar to geohash, quad trees interleave the bits of the latitude and longitude the resulting hash is a bit set. A tree level in a quad tree represents 2 bits in this bit set, one for each coordinate. The maximum amount of levels for the quad trees in Elasticsearch is 50.

I don't get the difference, for example if I take the point: Latitude / Longitude = 42.9123456, 21.799986 the geohash is srxs05fq8. Can anyone explain how can I calculate the quadtree representation?

Moreover, when is it better to prefer one over the other?

Anhedral answered 12/5, 2015 at 5:36 Comment(1)
Geohash itself is a kind of quatree: see "Point quadtree" or "Point-region quadtree" broader concept.Egesta
M
1

Within ElasticSearch you can select the use of quadtrees by setting the tree option to quadtree

If you want to calculate the quadtree value yourself (outside of ElasticSearch) I recommend the python-geohash module which also includes a robust quadtree implementation. With this library, calculating a quadtree is as simple as:

quad = quadtree.encode(38.90533, -77.01965)

I've create a sample of using the quadtree library in python

There are several benefits to using geohash over quadtree:

  • cross language support. geohash has open source implementations in a number of languages (c, java, python, ruby, perl, javascript). If you need external contributors to connect with your spatial data, you gain maximum flexibility with geohash. If you are using python as an application programming language, there is a pre-built c-extension that helps compute geohash values with greater speed than quadtree.

    • bug hunting. Geohash is widely adopted in the open source community (e.g. ElasticSearch, MongoDB, and others) implement geohash indexes as a form of spatial indexing in their systems. If you run into something weird with your data, your odds of finding the problem (and solution) are better with geohash.

    • community. You can create URLs that convert a geohash to a visible map via a shareable URL at geohash.org. For example, here's a URL for Washington, DC

Malave answered 15/7, 2015 at 22:2 Comment(0)
C
1

A quadtree can be implemented to interleave the bits more precisely a quadtree is nearly the same. The geohash is adding to the data structure the special variable string identifier with the variable precision. Both methods uses a z order space filling curve by interleaving the bits. Translate the coordinate to a binary and interleave it. Treat it as base-4 number.

Carpathoukraine answered 4/2, 2016 at 13:24 Comment(0)
S
1

Quadtrees are more predictable in that each level spans a square, whereas in geohash representation sometimes squares sometimes rectangles are spanned.

See: geohash, quadtree

So, if you are presenting a heatmap where the user can adjust the resolution, quadtree will give a smoother interface as the user will know that the width and height of the squares will be increased/decreased 2 times at each level increment/decrement.

Surgeonfish answered 18/5, 2017 at 18:12 Comment(1)
Geohash is a base32 integer number, so you can convert to any other base... The hierarchy of Geohash is based in recursive 4-region splits, so the correct base to represent simetric structure (with predicable geometry) is base4. See geohash4.Egesta

© 2022 - 2024 — McMap. All rights reserved.