Proximity Search
Asked Answered
T

3

21

How does an application perform a proximity search? For example, a user types in a postal code, then the application lists all the businesses within 20 miles ordered by proximity.

I want to build something like that in PHP and MySQL. Is this approach correct?

  1. Get the addresses for locations I'm interested in and store in my database
  2. Geocode all the addresses with Google's geocoding service
  3. Write a database query that includes Haversine formula to do the proximity search and ordering

Is this OK? In step 3, I'm going to calculate the proximity for every query. Is it better to have a PROXIMITY table that lists the distance between every business and a few reference locations?

Tronna answered 3/11, 2008 at 23:33 Comment(1)
See also the fine movable-type.co.uk/scripts/latlong.html#cosine-lawAppease
P
13

If there are enough records for speed to matter, here's a way to index them ahead of time.

Define a grid of bins about 20 miles on a side. Store the bin number with each store's record. At search time, compute the numbers of all bins that intersect a 20-mile radius from your search point. Then retrieve all stores in any of those bins, and proceed as before.

Proclivity answered 3/11, 2008 at 23:38 Comment(0)
B
13

We use this to do many thousands of points. It is important if you are performing this in SQL to have an index on the Latitude and Longitude column. We tried doing this in SQL 2008 with spatial indexes but we really didn't see the performance increase we expected. Though if you want to calculate within a certain distance from a ZIP you need to think about if you are going to use the ZIP centroid or a polygon representation of the ZIP code.

Haversine forumla is a good place to start.

We have not had performance problems calculating the distance on the fly, we do calculate it ahead of time for some applications where we know the points ahead of time and there are going to be millions of records.

SELECT
        [DistanceRadius]=
        69.09 *
        DEGREES(
          ACOS(
            SIN( RADIANS(latitude) )*SIN( RADIANS(@ziplat) ) 
           +
            COS( RADIANS(latitude) )*COS( RADIANS(@ziplat) ) 
           *
            COS( RADIANS(longitude - (@ziplon)) )
          )
        )
        ,*
        FROM
            table

    ) sub
WHERE
    sub.DistanceRadius < @radius
Bemock answered 4/11, 2008 at 0:18 Comment(0)
G
2

We do this for about 1200 locations. I would just use the Haversine formula on the fly although depending on you application, it might be better to store it in PHP instead of SQL. (Our implementation is in .net so your milage may vary).

Really our biggest drawback with the way we implemented it, is that every calculation (up until recently) had to be calculated on the data tier which was painfully slow (when I say slow, I really mean non-instantaneous it took a second or so), but that was due to the fact that it had to calculate the distance for all 1200 locations based on the supplied zip code.

Depending on the route you choose, there are ways of speeding up the number distance calculations, by looking at the longitude and latitude and removing the ones outside of a predefined range (for example if you are looking at all address within 20 miles there is a longitude range you can calculate which all addresses have to fall in to be 20 miles away.) That can speed up you query if need be.

We actually looked at storing all possible combinations in our database. In reality it sounds like it could be a large data store, but it's really not in the big scope of things. With indexes it can be quite fast, and you don't have to worry about algorithm optimization etc. We decided against it, because we had the equation in C#, and it allowed us to cache the information necessary to do all the calculations in the business tier. Either will work just fine, it's just a matter of what your preference is.

Galeiform answered 4/11, 2008 at 0:3 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.