Geocoding 5000 addresses in a PHP script
H

5

7

I'm looking to geocode over 5000 addresses at once in a PHP script (this will only ever be run once).

I have been looking into Google as a potential resource for this; however, I've read reports that after 200 or so queries Google will cut you off for the day.

Is there any other way to geocode 5000 or so addresses, such as another service similar to Google's that I could use?

Or will I just have to stagger this? The problem is that I don't have much time, and doing 200 or 300 a day for 5000 addresses would take almost 5 (working) weeks.

Thanks

Tom

Headwards answered 13/10, 2010 at 11:24 Comment(1)
Are you willing to pay for this? - Niall
I
13

You could use Bing Maps instead: the Spatial Data API is made for batch geocoding thousands of addresses at once (that link is even a detailed tutorial on how to use it with PHP).

You just need to register a key at http://www.bingmapsportal.com but that's free and fast (you get the confirmation email within minutes).

Illyes answered 13/10, 2010 at 11:55 Comment(0)
T
6

Is there a limit to the number of geocode requests I can submit?

If more than 2,500 geocode requests in a 24 hour period are received from a single IP address, or geocode requests are submitted from a single IP address at too fast a rate, the Google Maps API geocoder will begin responding with a status code of 620.

[...]

If you need to submit a very large set of addresses to the Geocoding Web Service to cache for later use, you should consider Google Maps API Premier, which provides a separate batch geocoding quota for this purpose.

-- http://code.google.com/apis/maps/faq.html#geocoder_limit

As @Pekka mentioned: note that Google's terms of service forbid geocoding stuff for purposes other than showing it on a map.

Thrown answered 13/10, 2010 at 11:29 Comment(3)
+1 Note that Google's terms of service forbid geocoding stuff for purposes other than showing it on a map. - Rustcolored
Not that it's really enforceable, now that they don't even require an API key any more. But as a company, I would think twice about doing this on a big scale. - Rustcolored
They will be shown on a map, in an iPhone application, and the lat and long will be used to direct people to Google Maps for driving directions and such. All Google logos and trademarks are shown and respected, and all TOS are being followed. That's interesting in the FAQ; however, the script will be doing all the calculations at high speed. I could put sleep(1) into my script to stagger the requests a bit, but I need to guarantee that the results are getting geocoded. - Headwards
D
2

The most reliable solution is to download a geolocation database to your host so that you can run unlimited queries.

http://www.google.de/search?q=geolocation+database&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:ru:official&client=firefox#hl=en&expIds=17259,17315,18168,23628,25646,25834,26637,26746,26761,26849&xhr=t&q=geolocation+database+download&cp=22&pf=p&sclient=psy&client=firefox&hs=MvK&rls=org.mozilla:ru%3Aofficial&source=hp&aq=f&aqi=&aql=&oq=geolocation+database+d&gs_rfai=&pbx=1&fp=d950b79c3319a56e

Demodulation answered 13/10, 2010 at 11:30 Comment(2)
Good idea, but will these be accurate? I saw on the first result from your link that even the paid one will only be 80% accurate. Is this normal? - Headwards
@Thomas Clayson: Yes, that's normal; there won't be perfect accuracy from Google's geolocation either. The good news is that most lookups are more accurate than that. In other words, the number is highly influenced by the "middle-of-nowhere" spots that you'll rarely need to look up. - Declivitous
D
1

As @Bart Kiers says, there's a limit on the number of requests you can make in a 24-hour period; there's also a "not too fast" per-hour (?) limit. I'd suggest dividing the seconds in a day (86,400) by the daily limit (2,500) to get a query rate that shouldn't trip the "too fast" limit. That comes out to about one query every 35 seconds, which should get you all 5000 results in two days.

However, do check the return codes: if the service starts returning 620, stop and give it a rest for some time, otherwise you risk a ban.
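
The pacing and back-off advice above can be sketched roughly as follows. This is only an illustration, written in Python for brevity (the loop ports almost line for line to PHP), not a definitive client. `geocode_one` is a hypothetical callable standing in for whatever HTTP request you make, assumed to return a `(status, result)` pair. The 620 status code and the 2,500-per-day limit come from the FAQ quoted in another answer; the 600-second back-off and the retry cap are arbitrary assumptions.

```python
import time

SECONDS_PER_DAY = 86400
DAILY_LIMIT = 2500        # per-IP limit quoted in the Google FAQ
TOO_MANY_QUERIES = 620    # status code quoted in the Google FAQ


def query_interval(daily_limit=DAILY_LIMIT):
    """Seconds to wait between requests to spread them evenly over a day."""
    return SECONDS_PER_DAY / daily_limit


def geocode_all(addresses, geocode_one, interval=None, backoff=600,
                max_retries=3, sleep=time.sleep):
    """Geocode addresses one at a time, pausing between requests.

    geocode_one is a hypothetical callable: address -> (status, result).
    Returns (results, failed): a dict of address -> result, plus the
    addresses that still returned 620 after max_retries back-offs.
    """
    if interval is None:
        interval = query_interval()
    results, failed = {}, []
    for address in addresses:
        for attempt in range(max_retries + 1):
            status, result = geocode_one(address)
            if status != TOO_MANY_QUERIES:
                # Any non-620 response is stored as-is; real code should
                # also inspect the status for other kinds of failure.
                results[address] = result
                break
            if attempt < max_retries:
                sleep(backoff)  # give the service a rest before retrying
        else:
            # Every attempt was throttled: record the address for a later run.
            failed.append(address)
        sleep(interval)
    return results, failed
```

Returning the `failed` list makes it possible to re-run only the addresses that were throttled, which helps if you need to be sure every address eventually gets geocoded.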

Declivitous answered 13/10, 2010 at 11:46 Comment(0)
M
0

What you're trying to do is indeed not according to Google's terms of service.

That said, Google will start returning 'over-quota' responses if you don't pause at least 250 ms between geocoding requests.

In practice, if you only make 2 requests a second you won't get throttled until you hit the daily limit of 2,500.
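
To make the arithmetic concrete, here is a small sketch (in Python, purely for illustration) that splits the address list into chunks that each fit one day's quota and estimates how long a daily run takes at 2 requests a second. The function names and the placeholder address list are made up for this example, and the estimate ignores network latency.

```python
def daily_batches(addresses, daily_limit=2500):
    """Split addresses into chunks that each fit one day's quota."""
    return [addresses[i:i + daily_limit]
            for i in range(0, len(addresses), daily_limit)]


def batch_duration_seconds(batch_size, pause=0.5):
    """Approximate time for one batch, counting only the pauses."""
    return batch_size * pause


addresses = [f"address {n}" for n in range(5000)]  # placeholder data
batches = daily_batches(addresses)
print(len(batches))                             # number of daily runs needed
print(batch_duration_seconds(len(batches[0])))  # seconds per daily run
```

For 5000 addresses this gives two batches of 2,500, each taking roughly 1,250 seconds (about 21 minutes) of pausing, so the job is two short runs at least 24 hours apart rather than one long one.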

Match answered 13/10, 2010 at 12:09 Comment(3)
If they're geocoding to show the results on GMaps (without looking them up every time), they could be well within the ToS. Without more detail, it's hard to tell. - Declivitous
The detail is that we have a list of specific business listings that we need to search through. Using the Google Local API isn't going to work; it's only within this subset of about 5k businesses that we want to search. The application displays these on a Google map with pins at the lat/long coordinates to show where they are. The only reason we need to store the lat/long is to work out the distance. The device will get the user's location via GPS and we need to return the nearest 10 or 20 results. AFAIK this is within the TOS, and is nothing dubious at all. - Headwards
So your problem is to geocode the 5,000 businesses once and for all, to know where they are? You can do this as I suggested (I have done it), but it'll take a couple of days to do 5,000. If you get them from Google, then the TOS don't allow you to store them, even if it's only a one-time thing. Again, I'm not trying to tell you what to do, just drawing your attention to something I shouldn't theoretically have done either :) As an aside, here's the resulting page: www.calvert.ch/geodesix/offices.htm - Match
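
The "nearest 10 or 20 results" lookup described in the comments above could be done with a plain great-circle distance once the coordinates are stored. The following is a minimal sketch in Python, assuming the businesses are held as hypothetical `(name, lat, lon)` tuples; it uses the haversine formula with a mean Earth radius, which is an approximation and not necessarily what the asker's application does.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(businesses, user_lat, user_lon, count=10):
    """Return the `count` businesses closest to the user.

    businesses is an iterable of (name, lat, lon) tuples (hypothetical shape).
    """
    return sorted(
        businesses,
        key=lambda b: haversine_km(user_lat, user_lon, b[1], b[2]),
    )[:count]
```

With only about 5,000 entries, sorting the whole list per request is cheap, so no spatial index should be needed for a data set of this size.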
