Fastest way to find the location (zip, city, state) given latitude/longitude
A

10

12

I need a free (open-source) solution that, given a lat/lng, returns the closest city/state or zip. MySQL is not an option; a small, lightweight database would be best if possible.

Updates: No web services. With 50 million impressions a day, even the smallest add-on hurts, so adding a service request would kill response time. I would prefer not to add more than 200 milliseconds to the request.

I have the data (lat/lon/zip/city/state) in CSV; the question is how to store it and, more importantly, how to retrieve it as quickly as possible.

Aletaaletha answered 11/8, 2009 at 14:1 Comment(4)
I have the data for city, state, zip, lat, lng, but I would still need an algorithm to match any lat/lng to the closest city.Aletaaletha
I'd edit your comment into the question itself. Everyone here (myself included) assumed you were looking for a data source, not a search algorithm.Dinnie
NO web services...any hit to a web service will add at least 300-400 milliseconds to each request if the service is up and reliable.Aletaaletha
Again, put that in your original question.Roughhew
D
10

Brute force: pre-load all of your data into an array. Calculate the distance between your current point and each point in the array (there's a method to do this calculation that uses linear algebra instead of trig functions, but I don't recall what it is offhand) to find the closest point.

Please read this before down-voting: there are ways to speed up a brute force search like this, but I've found that they're usually not worth the trouble. Not only have I used this approach before to find nearest zip from latitude/longitude, I've used it in a Windows Mobile application (where the processing power is not exactly overwhelming) and still achieved sub-second search times. As long as you avoid the use of trig functions, this is not an expensive process.
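
A minimal sketch of the brute-force search, assuming the "linear algebra" method alluded to above is the usual unit-vector trick (that is a guess, the answer does not name it): each record is converted to a 3D unit vector once at load time, and each query is then a loop of dot products with no trig per record. The sample rows, PLACES, VECTORS, and the function names are illustrative and are not taken from the poster's CSV.

```python
import math

def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

# Illustrative rows with approximate coordinates: (zip, city, state, lat, lon).
PLACES = [
    ("94024", "Los Altos", "CA", 37.3541, -122.0865),
    ("10001", "New York", "NY", 40.7506, -73.9972),
    ("60601", "Chicago", "IL", 41.8858, -87.6229),
]

# Trig runs once per record at load time, not once per record per query.
VECTORS = [to_unit_vector(lat, lon) for _, _, _, lat, lon in PLACES]

def nearest_place(lat_deg, lon_deg):
    """Return the record whose unit vector has the largest dot product with the query."""
    qx, qy, qz = to_unit_vector(lat_deg, lon_deg)
    best_index, best_dot = -1, -2.0
    for i, (x, y, z) in enumerate(VECTORS):
        dot = qx * x + qy * y + qz * z  # larger dot product means smaller angle
        if dot > best_dot:
            best_index, best_dot = i, dot
    return PLACES[best_index]
```

Because the dot product of two unit vectors is the cosine of the angle between them, the largest dot product identifies the smallest great-circle distance without computing the distance itself.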

Update: you can speed up the search time by apportioning your zip data into sub-regions (quadrants, for example, like northwest, southeast etc.) and saving the region ID with each data point. In the search, then, you first determine what region your current location is in, and compare only to those data points.

To avoid boundary errors (like when your current location is near the edge of its region but is actually closest to a zip in the neighboring region), your regions should overlap to some extent. This means some of your zip records will be duplicated, so your overall dataset will be a bit larger.
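
The overlapping-region idea could be sketched roughly as follows. CELL, MARGIN, and all function names are assumptions chosen for illustration; the sketch uses a regular grid in degrees rather than named quadrants, ignores longitude wrap-around at 180 degrees, and only finds the true nearest record when that record lies within MARGIN of the query's region.

```python
import math
from collections import defaultdict

CELL = 5.0    # region size in degrees (assumed value)
MARGIN = 1.0  # overlap in degrees (assumed value); must be smaller than CELL

def cell_of(lat, lon):
    """Region ID of a coordinate: integer grid indices."""
    return (math.floor(lat / CELL), math.floor(lon / CELL))

def build_regions(places):
    """Map region ID -> records; records inside an overlap strip are duplicated.

    Each record is a tuple whose last two fields are lat and lon.
    """
    regions = defaultdict(list)
    for rec in places:
        lat, lon = rec[-2], rec[-1]
        ci, cj = cell_of(lat, lon)
        for i in (ci - 1, ci, ci + 1):
            for j in (cj - 1, cj, cj + 1):
                lat_ok = i * CELL - MARGIN <= lat < (i + 1) * CELL + MARGIN
                lon_ok = j * CELL - MARGIN <= lon < (j + 1) * CELL + MARGIN
                if lat_ok and lon_ok:
                    regions[(i, j)].append(rec)
    return regions

def nearest_in_region(regions, lat, lon):
    """Compare only against records stored under the query's region."""
    candidates = regions.get(cell_of(lat, lon), [])
    if not candidates:
        return None
    k = math.cos(math.radians(lat))  # shrink longitude differences by latitude

    def dist2(rec):
        dlat = rec[-2] - lat
        dlon = (rec[-1] - lon) * k
        return dlat * dlat + dlon * dlon

    return min(candidates, key=dist2)
```

In this sketch, a record at latitude 39.9 is stored both in its own region and in the region that starts at latitude 40, so a query at 40.1 still sees it.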

Dinnie answered 11/8, 2009 at 14:35 Comment(5)
This is my fallback, I'm assuming it will be fast and not take up too much memory so if nothing else comes up this is what I'll have to do.Aletaaletha
I updated my answer a bit. If you break your data up into regions, you can avoid pre-loading everything, although unless I'm hallucinating there are only about 75,000 zipcodes in the US so the memory consumption would be trivial.Dinnie
What you're describing (breaking the data up into quads, recursively) is called a quadtree. But you're right - for small(ish) datasets, the brute force approach is probably just fine - and way simpler than any indexing scheme.Roughhew
@Nick: I didn't suggest doing the quadrant approach recursively, although that's a good idea - you're giving me too much credit. :)Dinnie
The distance between two locations can't be done with linear algebra (see #28428). If your list has even a reasonable fraction of the cities in the US or world, the brute force method will not achieve your 200ms target. See my answer for an approach that does 3000 queries per second and uses a proper map, not a list of points.Indoiranian
I
10

This is a very interesting question with a complex answer.

You mention a database of cities with lat/lon, but cities are not single points, and this can make a big difference in densely populated areas where large parts of city A might be closer to the "center" of city B than to the center of city A. Take a big city surrounded by smaller suburbs. The outlying parts of the big city might be closer to the centers of the suburbs than to the center of the big city itself. Snapping to the nearest city center implies a map that is the Voronoi diagram of city center points. Such a map would not look anything like an actual map of urban areas.

If you want to know the city and state for a given lat/lon, you need to query a proper map and do point-in-polygon tests to find out which polygon the point is in. This sounds computationally expensive, but it is actually not bad if you use a proper spatial index and are careful in your coding. I run a web site that sells API access to this and other geographical queries, and our underlying engine (written in Java) can return the containing or nearest city in the US with an average query time of 3e-4 seconds (more than 3,000 queries per second).

Even though we are selling it, I'm happy to explain how it works, since it would be way cheaper to buy it from us than to build it yourself, even with instructions. So here they are:

  • Find the map that you want. For US locations, the US Census offers extremely accurate maps at: http://www.census.gov/geo/www/tiger/tgrshp2010/tgrshp2010.html. I've not found global maps that are as good as the US census maps, but they may exist.
  • Find or write a parser for the ESRI shapefile format. I don't have a specific link for this, as it is highly language dependent, but there are numerous parsers, both free and commercial, available on the web. Just do a search for "shapefile parser" along with your programming language.
  • Load the map into memory. A digital map consists of a list of polygons represented by a list of lat/lon pairs, typically ordered in a counter clockwise direction. Most maps allow for cut-outs (e.g., Lesotho in South Africa), which are just listed as polygons where the lat/lon pairs are listed in the clockwise direction. For performance and memory consumption reasons, you will want to use raw float arrays (avoid double precision, as it wastes memory, and use native arrays where possible, to avoid boxing).
  • Next, you will need code to answer whether a given query point is contained in a given polygon. Here is an excellent discussion of the point-in-polygon problem: How can I determine whether a 2D Point is within a Polygon?
  • In my experience, the brute force technique suggested in another answer (checking every entity) does not work well on national or world maps. Instead, I strongly suggest a fast spatial index that returns a list of candidate polygons for a given lat/lon. Here there are a lot of options. A lot of people would suggest tree based indexes, but I tend to prefer grid indexes, as they are faster and modern servers tend to have a lot of memory. I wrote the only such index that I've worked with. I know they exist in GIS libraries, but I find most GIS code is overly complex, slow, and hard to use. So given a query lat/lon, you get a list of candidate polygons from the spatial index and use the point-in-polygon function to find which of the candidates contains the query point.
  • It is also important to handle cases where the query point is not contained by any polygon. In such a case, you will presumably want to find the nearest such polygon up to a specified maximum distance. To do this, you need to make sure that your spatial index can return a list of nearby polygons, and not just a list of candidate containing polygons. You will also need code to compute the distance between a query point and a lat/lon line segment (this is hard because lat/lon is not a Euclidean space). I've not found any good discussion of how to do this online, so I devised my own method. It works by creating a linearized space around the query point (which becomes (0, 0) in the new space) in which the relative longitude is re-scaled such that a degree of the modified longitude is the same distance as a degree of latitude (involves multiplying the relative longitude by the cosine of the latitude). In this linearized space you find the nearest point on the line segment using standard methods (see Shortest distance between a point and a line segment), and then convert that point back into lat/lon and use the Haversine formula to compute the distance between the two points (see Calculate distance between two latitude-longitude points? (Haversine formula)).
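
Two of the steps above can be sketched in a few lines of Python: a ray-casting point-in-polygon test, and the linearized point-to-segment distance followed by Haversine. This is an illustration under stated assumptions, not the engine described in this answer: the function names are made up, the Earth radius is the common mean value of 6371 km, cut-outs and the spatial index are left out, and the linearization assumes the query point is not at a pole and the segment does not cross the 180 degree meridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

def point_in_polygon(lat, lon, ring):
    """Ray-casting test; ring is a list of (lat, lon) vertices of one polygon."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        lat_i, lon_i = ring[i]
        lat_j, lon_j = ring[j]
        if (lat_i > lat) != (lat_j > lat):  # edge straddles the query latitude
            cross_lon = lon_i + (lat - lat_i) * (lon_j - lon_i) / (lat_j - lat_i)
            if lon < cross_lon:
                inside = not inside
        j = i
    return inside

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distance_to_segment_km(lat, lon, a, b):
    """Distance from (lat, lon) to the segment a-b; a and b are (lat, lon) pairs."""
    k = math.cos(math.radians(lat))  # rescales longitude; zero at the poles
    # Linearized space: the query point becomes (0, 0).
    ax, ay = (a[1] - lon) * k, a[0] - lat
    bx, by = (b[1] - lon) * k, b[0] - lat
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        t = max(0.0, min(1.0, -(ax * dx + ay * dy) / seg_len2))
    px, py = ax + t * dx, ay + t * dy
    # Convert the nearest point back to lat/lon, then measure with Haversine.
    return haversine_km(lat, lon, lat + py, lon + px / k)
```

A caller would run point_in_polygon over the candidate polygons returned by the spatial index, and fall back to the minimum of distance_to_segment_km over the edges of nearby polygons when no polygon contains the point.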

And that's it. I built such a system on and off for about half a year. My estimate is that there are at least three man months of serious coding in it, and that's someone familiar with the subject matter (so beware if you are making a buy-or-build decision).

Indoiranian answered 7/5, 2012 at 22:50 Comment(0)
L
3

Use a kd-tree to speed up the nearest-neighbor search. There should be lots of free implementations available whatever your platform is.

Longinus answered 11/8, 2009 at 15:26 Comment(2)
A vanilla kd-tree won't find the nearest point because lat/lon are a spherical coordinate system and kd-trees only work on Cartesian coordinate systems.Indoiranian
A kd-tree or an in-memory Voronoi diagram is the best answer if the problem is "find the nearest city center". The Cartesian vs. lat/lng problem can be solved very easily by converting lat/lngs to 3D Cartesian coordinates, with (0, 0, 0) at the center of the earth, (0, 1, 0) at the north pole, etc.Annunciator
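
A rough sketch of the conversion suggested in the last comment, paired with a small hand-written kd-tree so that it runs without third-party libraries. The names to_xyz, build, and nearest are illustrative; a production setup would more likely use an existing kd-tree implementation. Straight-line (chord) distance between points on the unit sphere grows with great-circle distance, so the nearest neighbor in 3D is also the nearest on the sphere.

```python
import math

def to_xyz(lat_deg, lon_deg):
    """Unit-sphere coordinates with the north pole at (0, 1, 0)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.sin(lat),
            math.cos(lat) * math.sin(lon))

def build(items, depth=0):
    """Build a kd-tree from (xyz, payload) pairs as nested (item, left, right) tuples."""
    if not items:
        return None
    axis = depth % 3
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return (items[mid],
            build(items[:mid], depth + 1),
            build(items[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    """Return (squared chord distance, payload) for the item closest to target."""
    if node is None:
        return best
    (xyz, payload), left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(xyz, target))
    if best is None or d2 < best[0]:
        best = (d2, payload)
    axis = depth % 3
    diff = target[axis] - xyz[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    if diff * diff < best[0]:  # the far side may still hold a closer point
        best = nearest(far, target, depth + 1, best)
    return best
```

For example, build([(to_xyz(lat, lon), city) for ...]) creates the tree once at startup, and nearest(tree, to_xyz(query_lat, query_lon)) answers each query, including queries on either side of the 180 degree meridian.
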
C
1

It's not open-source, but maybe you could use the Google Maps API:

Reverse Geocoding

Calysta answered 11/8, 2009 at 14:4 Comment(2)
Slow, once you rely on another source things can go downhill fast. This solution needs to work all the time which would not work if Google decided to start charging.Aletaaletha
A good software architecture should overcome this kind of issue. You request the data from a class of your own, which returns what you need upstream regardless of where it fetches it from. This approach has saved me on many occasions, no matter how many sources I have used. BTW, if the only service you use stops providing its API, you're still up to your neck in trouble ;)Locomotive
H
1

You should check out GeoNames. They have an API that returns XML and/or JSON. You can also download their database.

Homogenetic answered 11/8, 2009 at 14:38 Comment(0)
D
0

Another thread recommends mod_geoip via MaxMind. It runs at the Apache level, before it even gets to the PHP/.NET/Java. Maxmind geolocation apis: Apache vs PHP

Dragoman answered 11/8, 2009 at 14:23 Comment(0)
I
0

If you have both the longitude and the latitude for each zip and for the current location, you could just pick a radius and find the points within that circle. If you assume a boundary for each zip code range, you could speed up the search.

If you can use SQL Server 2008 (Standard or Express), you could use its spatial data types.

Intervention answered 11/8, 2009 at 14:28 Comment(0)
P
0

The Yahoo! Placemaker is a free web service that can do this. It can look up place names (“New York City”, “Buckingham Palace”) but it can also look up latitudes and longitudes by using the Geo microformat.

To use the service, you submit a POST request, and it returns XML:

A small command-line example (I’ve obscured my Yahoo! app ID; you’ll need to register your own):

$ curl -X POST -ddocumentContent='<div class="geo">GEO: <span class="latitude">37.386013</span>, <span class="longitude">-122.082932</span></div>' -ddocumentType='text/html' -dappid='your_yahoo_app_id' http://wherein.yahooapis.com/v1/document

This returns a very detailed XML document, part of which is:

<type>Town</type>
<name><![CDATA[Los Altos, CA, US]]></name>

It also contains the following data:

<type>Zip</type>
<name><![CDATA[94024, Los Altos, CA, US]]></name>

I have not used Placemaker very much, but I have used their Geocoding API and it is very fast. Couple this with a local memcached and users have no idea the data isn’t local.

Pul answered 11/8, 2009 at 14:35 Comment(0)
T
0

Look at the geonames.org database for source data.

For a light database, SQLite is a good choice.

geonames also does a webservice, but if you want to do it yourself without a web call (and it sounds as though you do) then you will need a local database. Then, you just need to do the right trig calculations to work out the great circle distance (google that) between a pair of lat / lng points and then order the results by distance. You can also use a bounding box or radius, if you want to limit the search radius before doing the calculations.

If your local database can be SQL-based (which SQLite is), then that all adds up to a SQL query that uses a bunch of trig calculations to compute a 'distance' column, and maybe also a 'where' clause to limit the search to a radius or bounding box. Once the query has the distance column, it is easy to order by distance and add any other criteria you like. If you know Ruby/Rails and want to see a nice example of how this is done, look at the GeoKit Rails plugin source.
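
A minimal sketch of that kind of query, using Python's built-in sqlite3 module. The table name, column names, sample rows, and the haversine_miles helper are all assumptions made for illustration. The distance function is registered from Python with create_function because trig functions are not guaranteed to be available in every SQLite build.

```python
import math
import sqlite3

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (Earth radius taken as 3958.8 mi)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(a))

conn = sqlite3.connect(":memory:")
conn.create_function("haversine_miles", 4, haversine_miles)
conn.execute(
    "CREATE TABLE places (zip TEXT, city TEXT, state TEXT, lat REAL, lon REAL)")
conn.execute("CREATE INDEX idx_places_lat_lon ON places (lat, lon)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?, ?, ?)",
    [  # illustrative rows with approximate coordinates
        ("94024", "Los Altos", "CA", 37.3541, -122.0865),
        ("10001", "New York", "NY", 40.7506, -73.9972),
        ("60601", "Chicago", "IL", 41.8858, -87.6229),
    ],
)

def nearest_place(lat, lon, box_deg=1.0):
    """Closest row inside a +/- box_deg bounding box, or None if the box is empty."""
    return conn.execute(
        """
        SELECT zip, city, state, haversine_miles(?, ?, lat, lon) AS distance
        FROM places
        WHERE lat BETWEEN ? AND ?
          AND lon BETWEEN ? AND ?
        ORDER BY distance
        LIMIT 1
        """,
        (lat, lon, lat - box_deg, lat + box_deg, lon - box_deg, lon + box_deg),
    ).fetchone()
```

The bounding-box WHERE clause keeps the distance function from running on every row; box_deg is a tuning knob that trades speed against the risk of returning no match in sparse areas.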

Thurgau answered 11/8, 2009 at 14:41 Comment(0)
C
0

How far from your source location would you expect the closest city to be? 50 miles? 200 miles? 500 miles? If two cities are nearly equidistant, does it matter if your algorithm picks the exactly closer one? You can use this information to help speed your search.

If you can reasonably assume that the distance difference is small (~250 mi or so is probably close enough to be considered 'small'), and your distance calculation can be a bit 'fuzzy', then you can optimize the 'brute force' check by limiting your search space to +/- 5 lat from the source (~70 miles per lat, so this gives you 350 or so miles to the north and south), and +/- 5 long (presuming you aren't searching for cities at the poles, this is anywhere from ~350 mi at the equator to ~100 mi in northern Canada). Adjust these ranges to what you feel is appropriate for your problem space.

While trig functions will help give you a precise indication of distance, for smaller distances such as these Pythagorean is generally close enough for a 'best guess' answer, with x = 69.1 * (sourcelat - citylat) and y = 53.0 * (sourcelong - citylong).
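
A small sketch of this flat approximation. The fixed factor 53.0 equals 69.1 times the cosine of roughly 40 degrees, so it only fits locations near that latitude; the version below computes the longitude factor from the mean latitude of the two points instead. The function name and the constant name are illustrative.

```python
import math

MILES_PER_DEGREE_LAT = 69.1  # factor used in the answer above

def approx_miles(lat1, lon1, lat2, lon2):
    """Pythagorean distance estimate, reasonable for short distances away from the poles."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    x = MILES_PER_DEGREE_LAT * (lat1 - lat2)
    # A degree of longitude shrinks with the cosine of the latitude.
    y = MILES_PER_DEGREE_LAT * math.cos(mean_lat) * (lon1 - lon2)
    return math.hypot(x, y)
```

When the result is only used to rank candidates, the square root inside math.hypot can be skipped by comparing x * x + y * y directly.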

Calque answered 11/8, 2009 at 15:31 Comment(1)
This is not true except near the equator. In the US and Europe, for example, you need to take into account that a change in longitude means much less distance than the same change in latitude. If you want a simple approximation, rescale the longitude difference by the cosine of the latitude (you could use the mean latitude of the two points). For the correct algorithm, see #28428Indoiranian
