Missing results due to geo proximity formula (store locator)
Asked Answered
M

5

7

OK - I've been wrestling with this for about 3 months on and off and since I've exhausted every geo proximity formula out there that I've come across and I'm no closer to getting the right results I figured it time to ask for some help.

THE AIM

I'm setting up a fairly basic implementation of a store locator. The user enters their postcode and selects from a predefined list of search radii. The gmaps API generates lat/long coordinates for this address and passes them to a php script. In this script the user coords are queried against a mysql database table (structure below)

post_id int(11)                             
post_type varchar(20)                                
lat   float(10,6)                               
lng   float(10,6)

The results of this query (post ids) are entered into a wordpress query which generates the XML that contains the map marker data. (the wordpress query uses post__in and posts_per_page -1 to display info for all ID generated by the query

THE PROBLEM

In a nutshell, every implementation of the Haversine formula I've come across seems to result in missing markers - specifically any markers that are very close to the users entered coordinates (don't know precisely but I think it's within about 500m). This is a big problem as if the user enters their postcode and there is a store very close to their location it won't show up.

I've tried about 8 different permutations of the forumla that I've dug up from various tutorials with the same results. Below is the formula that I'm currently using on the site which provides all markers except for the those very close to the users entered position:

$center_lat = $_GET["lat"];
$center_lng = $_GET["lng"];
$radius = $_GET["radius"];

// Calculate square radius search

$lat1 = (float) $center_lat - ( (int) $radius / 69 );
$lat2 = (float) $center_lat + ( (int) $radius / 69 );
$lng1 = (float) $center_lng - (int) $radius / abs( cos( deg2rad( (float) $center_lat ) ) * 69 );
$lng2 = (float) $center_lng + (int) $radius / abs( cos( deg2rad( (float) $center_lat ) ) * 69 );

$sqlsquareradius = "
SELECT 
post_id, lat, lng
FROM
wp_geodatastore
WHERE
lat BETWEEN ".$lat1." AND ".$lat2."
AND
lng BETWEEN ".$lng1." AND ".$lng2."
"; // End $sqlsquareradius

// Create sql for circle radius check
$sqlcircleradius = "
SELECT
t.post_id,
3956 * 2 * ASIN(
    SQRT(
        POWER(
            SIN(
                ( ".(float) $center_lat." - abs(t.lat) ) * pi() / 180 / 2
            ), 2
        ) + COS(
            ".(float) $center_lat." * pi() / 180
        ) * COS(
            abs(t.lat) * pi() / 180
        ) * POWER(
            SIN(
                ( ".(float) $center_lng." - t.lng ) * pi() / 180 / 2
            ), 2
        )
    )
) AS distance
FROM
(".$sqlsquareradius.") AS t
HAVING
distance <= ".(int) $radius."
ORDER BY distance
"; // End $sqlcircleradius


$result = mysql_query($sqlcircleradius);

$row = mysql_fetch_array( $result );

while($row = mysql_fetch_array( $result )) {
// the contents of each row
$post_ids[] = $row['post_id'];
}

There was 1 formula that I tried that was suggested by Mike Pelley here: Geolocation SQL query not finding exact location

This formula seemed to show markers that were very close to the users entered location but missed out others that should have been displayed within the given radius. To clear up any confusion this is the code I used:

$center_lat = $_GET["lat"];
$center_lng = $_GET["lng"];
$radius = $_GET["radius"];

$sql = "
SELECT post_id, lat, lng, 
truncate((degrees(acos( sin(radians(lat)) 
* sin(radians(".$center_lat.")) 
+ cos(radians(lat)) 
* cos(radians(".$center_lat.")) 
* cos(radians(".$center_lng." - lng) ) ) ) 
* 69.09*1.6),1) as distance 
FROM wp_geodatastore HAVING distance <= ".$radius." ORDER BY distance desc
"; // End $sqlcircleradius


$result = mysql_query($sql);

$row = mysql_fetch_array( $result );

while($row = mysql_fetch_array( $result )) {
// Print out the contents of each row
$post_ids[] = $row['post_id'];
}

THE REQUEST

Basically I would like to know why neither of these blocks of code are displaying the correct markers. If anyone can suggest any improvements to the code or can point me towards some resource that I might have missed that would be great

EDIT

Thought my psudeo answer was working but as it turns out that was still having problems. I've ended up going for a very different tack now and I'm using a very good jquery store locator which can be found here: http://www.bjornblog.com/web/jquery-store-locator-plugin

Won't work for every project out there but for my needs it's perfect (and works!)

Monomial answered 4/1, 2012 at 13:30 Comment(11)
Is there a reason you're not using MySQL's built in geospacial capabilities? dev.mysql.com/doc/refman/5.0/en/…Neon
I'm struggling to understand some stuff in your code. Why HAVING instead of WHERE? What's going on with the int stuff in (float) $center_lat - ( (int) $radius / 69 ); and the truncate in your other query? Keep this fact in mind as you apply the formulae: a minute of longitude at the equator is defined as a nautical mile. An integer degree is 60 nautical miles. Finally, try getting rid of BETWEEN and using 'WHERE a>=lat1 AND a <=lat2` instead. It should give the same complexity of query and it spells out the inclusivity /exclusivity of the range you're searching.Pickmeup
@Neon - I'm not using the geospatial queries for a couple of reasons. First, I'm using the geo data store plugin to create and maintain my table of marker data. This plugin generates the table structure I showed above. Second, the vast majority of the tutorials I've come across for creating store locator type maps seem to recommend a table structure similar to the one listed. Perhaps these aren't the best reasons but I've come this far with the current setup and I'm pretty sure that what I'm trying to do should be possible.Monomial
@OllieJones Appreciate the tips. The HAVING (instead of WHERE seems to crop up on almost all the formulae I've come across (including the one of the google maps tutorial. The int stuff probably doesn't need to be there - I was getting so frustrated that I began leaving in stuff that could probably have been stripped out from formulas that I found (didn't seem to make any difference to the results either way. I tried your BETWEEN/WHERE suggestion but I'm afraid it broke the query - see next comment for what I tried:Monomial
WHERE lat >=".$lat1." AND lat<=".$lat2." AND lng >=".$lng1." AND lat<=".$lng2."Monomial
You may want to review the answers to this question #575191Neon
@OllieJones if you're filtering on a column alias, especially of a calculated value (like distance in this case) you need to use HAVING instead of WHERE. This is because the at the point at which the WHERE clause is evaluated, the value(s) of the alias column may not yet be known. The WHERE clause selects the records to be used, then any column aliases are calculated based on those records, then finally, the HAVING clause filters the column alias values.Blowtube
@LJ902 have you tried running the two queries (square radius and circle radius) separately for a given location and checking the results for why the 'missing' records are missing? I would assume that they show up in the square radius query but are somehow excluded in the circle radius query. What distance is that query calculating for those stores?Blowtube
@mliusbrown - hadn't thought of running the two queries independently (although earlier versions of the formula were basically the circle radius query part and they missed out the nearby markers). I tried the square radius calculation by itself and it again produced the missing results. So I'm basically none the wiser as to what's causing the issue.Monomial
I'm experiencing the same issue.Edmundoedmunds
@Sid - See my 'sort-of' solution in the answers below. I've got it all working as it should be now though I'm sure it's not the most efficient way of doing things. There are some other really good answers below too which I'm sure should probably work but it might be an issue with the rest of my setup that's preventing them from being as accurate as they should beMonomial
M
0

Thinking a little laterally I've come up with a 'sort of' solution to the problem of the missing markers. The two equations I posted originally gave the correct results but each missed out either markers close to the target or on the edges of the search radius

It's not very elegant but I figured that running both equations and producing 2 arrays which I then combined (removing any duplicates) would give me all the markers I'm looking for. This does work (obviously a performance hit but it's not a high traffic application) so I'll work with this for the time being but I'm still after a more practical solution if anyone has one!

Monomial answered 8/1, 2012 at 22:35 Comment(0)
P
2

EDIT This location-finder comes up often enough that I've written an article on it.

http://www.plumislandmedia.net/mysql/haversine-mysql-nearest-loc/

Original Post

Let's start by dealing with the haversine formula once for all, by putting it into a stored function so we can forget about its gnarly details. NOTE: This whole solution is in statute miles.

DELIMITER $$

CREATE
    FUNCTION distance(lat1 FLOAT, long1 FLOAT, lat2 FLOAT, long2 FLOAT)
    RETURNS FLOAT
    DETERMINISTIC NO SQL
    BEGIN
    RETURN (3959 * ACOS(COS(RADIANS(lat1)) 
                 * COS(RADIANS(lat2)) 
                 * COS(RADIANS(long1) - RADIANS(long2)) 
                 + SIN(RADIANS(lat1)) 
                 * SIN(RADIANS(lat2)) 
                )); 
    END$$

DELIMITER ;

Now let's put together a query that searches on the bounding box, and then refines the search with our distance function and orders by distance

Based on the PHP code in the your question:

Assume $radius is your radius, $center_lat, $center_lng is your reference point.

$sqlsquareradius = "
SELECT post_id, lat, lng
  FROM
(
    SELECT post_id, lat, lng,
           distance(lat, lng, " . $center_lat . "," . $center_lng . ") AS distance
      FROM wp_geodatastore
     WHERE lat >=  " . $center_lat . " -(" . $radius . "/69)
       AND lat <=  " . $center_lat . " +(" . $radius . "/69)
       AND lng >=  " . $center_lng . " -(" . $radius . "/69)
       AND lng <=  " . $center_lng . " +(" . $radius . "/69)
)a
WHERE distance <= " . $radius . "
ORDER BY distance
";

Notice a few things about this.

First, it does the bounding box computation in SQL rather than in PHP. There's no good reason for that, except keeping all the computation in one environment. (radius / 69) is the number of degrees in radius statute miles.

Second, it doesn't fiddle with the size of the longitudinal bounding box based on latitude. Instead it uses a simpler, but slightly too large, bounding box. This bounding box catches a few extra records, but the distance measurement gets rid of them. For your typical postcode / store finder app the performance difference is negligible. If you were searching many more records (e.g. a database of all utility poles) it might not be so trivial.

Third, it uses a nested query to do the distance elimination, to avoid having to run the distance function more than once for each item.

Fourth, it orders by distance ASCENDING. This means your zero-distance results should show up first in the result set. It usually makes sense to list nearest things first.

Fifth, it uses FLOAT rather than DOUBLE throughout. There's a good reason for that. The haversine distance formula is not perfect, because it makes the approximation that the earth is a perfect sphere. That approximation happens to break down at roughly the same level of accuracy as the epsilon for FLOAT numbers. So DOUBLE is deceptive numerical overkill for this problem. (Don't use this haversine formula to do civil engineering work like parking lot drainage, or you will get big puddles a couple of epsilon, a few inches, deep, I promise.) It's fine for store-finder applications.

Sixth, you are definitely going to want to create an index for your lat column. If your table of locations doesn't change very often, it will help to create an index for your lng column as well. But your lat index will give you most of your query performance gain.

Lastly, I tested the stored procedure and the SQL, but not the PHP.

Reference: http://www.scribd.com/doc/2569355/Geo-Distance-Search-with-MySQL Also my experience with a bunch of proximity finders for health care facilities.

--------------- EDIT --------------------

If you don't have a user interface that lets you define a stored procedure, that's a nuisance. At any rate, PHP lets you use numbered parameters in the sprintf call, so you can generate the whole nested statement like this. NOTE: You might need %$1f etc. You'll need to experiment with this.

$sql_stmt = sprintf ("
  SELECT post_id, lat, lng
    FROM
  (
    SELECT post_id, lat, lng,
           (3959 * ACOS(COS(RADIANS(lat)) 
                 * COS(RADIANS(%$1s)) 
                 * COS(RADIANS(lng) - RADIANS(%$2s)) 
                 + SIN(RADIANS(lat)) 
                 * SIN(RADIANS(%$1s)) 
            ))
           AS distance
      FROM wp_geodatastore
     WHERE lat >=  %$1s -(%$3s/69)
       AND lat <=  %$1s +(%$3s/69)
       AND lng >=  %$2s -(%$3s/69)
       AND lng <=  %$2s +(%$3s/69)
  )a
   WHERE distance <= %$3s
   ORDER BY distance
",$center_lat,$center_lng, $radius);
Pickmeup answered 7/1, 2012 at 2:40 Comment(3)
Thanks for the very detailed answer Ollie. Having some trouble implementing it to test it though. Sorry to be a pain but I'm really not familiar with MySQL stored procedures (I can work my way around a MySQL database with PhpMyAdmin but that's about where my MySQL knowledge dries up). Had a look around for some tutorials on creating stored functions using this GUI but haven't found anything to guide me through. Is it vital that the haversine formula is in this format or could it be included in with the rest of the php query?Monomial
Thanks for the edit Ollie - appreciate you converting it to php and all the help so far. Unfortunately it doesn't seem to produce any ID values at all. Doubled checked the code to make sure I didn't do anything stupid - couldn't see anything. Also tried your suggestion of changing the type specifier in the sprintf but no change. Wasn't sure if the lower case 'a' in your code was intentional or a typo but either way it produced no result.Monomial
I've awarded the bounty to this answer because it was the most complete and fully-explained answer (despite not actually using it in the end - see my own answer below which I'm rolling with for now). That said this question generated some very good answers and I'm sure that the equations listed below would probably work for other peoples projects with a different setup to mine so they are all well worth a look.Monomial
D
0

Here's a solution I used successfully for a while in my own geo proximity calculations:

/**
 * This portion of the routine  calculates the minimum and maximum lat and
 * long within a given range.  This portion of the code was written
 * by Jeff Bearer (http:return true;//www.jeffbearer.com).
 */

$lat = somevalue;      // The latitude of our search origin
$lon = someothervalue; // The longitude of our search origin
$range = 50;   // The range of our search, in miles, of your zip

// Find Max - Min Lat / Long for Radius and zero point and query only zips in that range.
$lat_range = $range / 69.172;
$lon_range = abs($range / (cos($lon) * 69.172));
$min_lat = number_format($lat - $lat_range, '4', '.', '');
$max_lat = number_format($lat + $lat_range, '4', '.', '');
$min_lon = number_format($lon - $lon_range, '4', '.', '');
$max_lon = number_format($lon + $lon_range, '4', '.', '');

/* Query for matching zips:

    SELECT post_id, lat, lng
    FROM wp_geodatastore
    WHERE
    lat BETWEEN $min_lat AND $max_lat
    AND lng BETWEEN $min_lon AND $max_lon
*/
Dyeing answered 6/1, 2012 at 20:57 Comment(1)
Thanks for posting the code @Fleep. Unfortunately whilst it worked in general I still ended up with the same missing marker problem. Appreciate the attempt though!Monomial
E
0

You can try my class at http://www.phpclasses.org/package/6202-PHP-Generate-points-of-an-Hilbert-curve.html. It uses the harvesine formula and a hilbert curve to compute a quadkey. You can then search the quadkey from left to right. Every position of the key is a point on the monster curve. A better explanation of the curve can be found at Nick's spatial index quadtree hilbert curve blog. It's like using the spatial index extension from mysql but you have more control. You can use a z curve or moore curve or you can change the look.

Esma answered 7/1, 2012 at 22:27 Comment(0)
G
0

This is code from a working production system,

6371.04 * acos(cos(pi()/2-radians(90-wgs84_lat)) * cos(pi()/2-radians(90-$lat)) * cos(radians(wgs84_long)-radians($lon)) + sin(pi()/2-radians(90-wgs84_lat)) * sin(pi()/2-radians(90-$lat))) as distance

Uses a different distance formular, but for a store locator the difference is minimal.

Granth answered 8/1, 2012 at 18:27 Comment(0)
M
0

Thinking a little laterally I've come up with a 'sort of' solution to the problem of the missing markers. The two equations I posted originally gave the correct results but each missed out either markers close to the target or on the edges of the search radius

It's not very elegant but I figured that running both equations and producing 2 arrays which I then combined (removing any duplicates) would give me all the markers I'm looking for. This does work (obviously a performance hit but it's not a high traffic application) so I'll work with this for the time being but I'm still after a more practical solution if anyone has one!

Monomial answered 8/1, 2012 at 22:35 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.