I would like to know what people suggest as efficient ways of doing a spatial query in an Amazon Web Services SimpleDB?
By spatial query I mean finding objects in a given radius of a latitude and longitude.
I would like to know what people suggest as efficient ways of doing a spatial query in an Amazon Web Services SimpleDB?
By spatial query I mean finding objects in a given radius of a latitude and longitude.
SimpleDB doesn't currently offer any built-in spatial search operations but that doesn't mean it can't be done. There's several methods of implementing geospatial searches in non-geospatially aware databases such as SimpleDB and all of them center around the idea of using the database to retrieve a rough first selection based on a geospatial bounding box and then filtering the returned data in your application using more accurate algorithms such as the Haversine formula.
You could store the latitude and longitude as (zero-padded and normalized) numeric attributes and then perform a double range query (lat >= minLat and lat <= maxLat and lon >= minLat and lon <= maxLat
) but since neither of theese predicates are selective (each predicate matches a lot of items) it's not ideal (see Tuning Queries).
A better way would be using GeoHashes.
Geohashes offer properties like arbitrary precision, similar prefixes for nearby positions, and the possibility of gradually removing characters from the end of the code to reduce its size (and gradually lose precision).
As a practical example, the Geohash 6gkzwgjzn820 decodes to the coordinates -25.382708 and -49.265506, while the Geohash 6gkzwgjz will decode to -25.383 and -49.266, and if we take a similar position in the same region, such as -25.427 and -49.315, we can see it being encoded as 6gkzmg1w (note the similar prefix).
From http://geohash.org/site/tips.html
With your item positions as GeoHashes you could use the like
operator to search for a bounding box (where GeoHash like '6gkzmg1w%'
) but since the like
operator is expensive (Comparison Operators) a better way would be to denormalize the data by storing each GeoHash prefix level (how many depends on your required search precision) as a separate attribute (GeoHash6 GeoHash8 etc) and then use a simple equality predicate (where Geohash8 = '6gkzmg1w'
).
Now on to the downside of GeoHashes. Since you can't make any assumption of a GeoHash being centered within your search box you have to search all neighboring prefixes as well. The process is excellently described by geohash-js
Geohash also has the property that as the number of digits decreases (from the right), accuracy degrades. This property can be used to do bounding box searches, as points near to one another will share similar Geohash prefixes.
However, because a given point may appear at the edge of a given Geohash bounding box, it is necessary to generate a list of Geohash values in order to perform a true proximity search around a point. Because the Geohash algorithm uses a base-32 numbering system, it is possible to derive the Geohash values surrounding any other given Geohash value using a simple lookup table.
So, for example, 1600 Pennsylvania Avenue, Washington DC resolves to: 38.897, -77.036
Using the geohash algorithm, this latitude and longitude is converted to: dqcjqcp84c6e
A simple bounding box around this point could be described by truncating this geohash to: dqcjqc
However, 'dqcjqcp84c6e' is not centered inside 'dqcjqc', and searching within 'dqcjqc' may miss some desired targets.
So instead, we can use the mathematical properties of the Geohash to quickly calculate the neighbors of 'dqcjqc'; we find that they are: 'dqcjqf','dqcjqb','dqcjr1','dqcjq9','dqcjqd','dqcjr4','dqcjr0','dqcjq8'
This gives us a bounding box around 'dqcjqcp84c6e' roughly 2km x 1.5km and allows for a database search on just 9 keys: SELECT * FROM table WHERE LEFT(geohash,6) IN ('dqcjqc', 'dqcjqf','dqcjqb','dqcjr1','dqcjq9','dqcjqd','dqcjr4','dqcjr0','dqcjq8');
Translated to a SimpleDB query that'd be where GeoHash6 in('dqcjqc', 'dqcjqf', 'dqcjqb', 'dqcjr1', 'dqcjq9', 'dqcjqd', 'dqcjr4', 'dqcjr0', 'dqcjq8')
and then you'll do your Haversine filtering on the results in order to only get the items that's within your search radius.
I'm going to leave this here because it might help you!
14 years ago we tried to do a geo lookup table of locations within a radius. There was obviously no geospatial indexes or anything like that. There was literally only standard SQL and Oracle... anyway, we ended up converting all lat/lng into kilometers from a fixed plane field. Essentially what geospatial indexes do these days.
To explain what exactly it does, it turns the world into a flat surface and with a bit of SQL trickery you can even select by radius, you even get the distance from the two points you're selecting. Since it's also raw full integers the queries are blazing fast.
Here is a simple example in PHP and a very complex looking but pretty easy once you understand it SQL query:
© 2022 - 2024 — McMap. All rights reserved.