MySQL - Perl: How to get array of zip codes within submitted "x" miles of submitted "zipcode" in Perl example

V

3

0

I have found many calculations here and some php examples and most are just over my head.

I found this example:

SELECT b.zip_code, b.state,
       (3956 * (2 * ASIN(SQRT(
       POWER(SIN(((a.lat-b.lat)*0.017453293)/2),2) +
       COS(a.lat*0.017453293) *
       COS(b.lat*0.017453293) *
       POWER(SIN(((a.lng-b.lng)*0.017453293)/2),2))))) AS distance
FROM zips a, zips b
WHERE
       a.zip_code = '90210' ## I would use the users submitted value
GROUP BY distance
having distance <= 5; ## I would use the users submitted value

But, I am having trouble understanding how to implement the query with my database.

It looks like that query has all I need.

However, I cannot even find/understand what b.zip_code actually is! (whats the b. and zips a, zips b?)

I also do not need the state in the query.

My mySQL db structure is like this:

    ZIP | LAT     | LONG
  33416 | 26.6654 | -80.0929

I wrote this in attempt to return some kind of results (not based on above query) but, it only kicks out one zip code.

## Just for a test BUT, in reality I desire to SELECT a zip code WHERE ZIP = the users submitted zip code
## not by a submitted lat lon. I left off the $connect var, assume it's there.
my $set1 = (26.6654 - 0.20);
my $set2 = (26.6654 + 0.20);
my $set3 = (-80.0929 - 0.143);
my $set4 = (-80.0929 + 0.143);
my $test123 = $connect->prepare(qq{SELECT `ZIP` FROM `POSTAL`
WHERE `LAT` >= ? AND `LAT` <= ? 
AND `LONG` >= ? AND `LONG` <= ?})  or die "$DBI::errstr";
$test123->execute("$set1","$set2","$set3","$set4") or die "$DBI::errstr";
my $cntr;
while(@zip = $test123->fetchrow_array()) {
    print qq~$zip[$cntr]~;
    push(@zips,$zip[$cntr]);
    $cntr++;
}

As you can see, I am quite the novice so, I need some hand holding here with verbose explanation.

So, in Perl, how can I push zip codes into an array from a USER SUBMITTED ZIP CODE and user submitted DISTANCE in miles. Can be a square instead of a circle, not really that critical of a feature. Faster is better.

Virtual answered 28/2, 2011 at 16:1 Comment(0)

D

3

I don't see what's wrong with your first query. You have latitude and longitude in your database (if I'm understanding, you're comparing a single entry to all others). You don't need to submit or return the state that's just part of the example. Make the first query work like this:

my $query = "SELECT b.zip_code,
       (3956 * (2 * ASIN(SQRT(
       POWER(SIN(((a.lat-b.lat)*0.017453293)/2),2) +
       COS(a.lat*0.017453293) *
       COS(b.lat*0.017453293) *
       POWER(SIN(((a.lng-b.lng)*0.017453293)/2),2))))) AS distance
FROM zips a, zips b WHERE
       a.zip_code = ? 
GROUP BY distance having distance <= ?";

my $sth = $dbh->prepare($query);
$sth->execute( $user_submitted_zip, $user_submitted_distance );
while( my ($zip, $distance) = $sth->fetchrow() ) ) {
     # do something
}

This won't be that fast, but if you have a small record set ( less than 30k rows ) it should be fine. If you really want to go faster you should look into a search engine such as Sphinx which will do this for you.

Dunning answered 28/2, 2011 at 16:15 Comment(9)

Hmm, thank you. I do not have $1500 for Sphinx but, my zip code db IS 40k lines. I am trying the example you gave me based on what I have learned here in the last few minutes. I will get back to everyone soon. – Virtual 28/2, 2011 at 16:31

Sphinx is free. Support is what costs money. There are a few other similar search engines but I'm not as familiar with them – Dunning 28/2, 2011 at 16:44

@Cfreak, since my zip db is so large, would I be better off / faster using a square area? If they get a few more results (going out to the corners) that is just fine. If so, can you provide an example? This is for a band website for folks to find future gigs in their area so, no budget here. Thanks again for your help. – Virtual 28/2, 2011 at 16:44

Also 40k rows is probably pushing it but if everything is indexed it will probably be fast enough – Dunning 28/2, 2011 at 16:45

latitude and longitude must be calculated as a circle. The calculations would be wrong otherwise. Your other option would be to build a large table that basically holds a matrix of the distance between each zip-code but that's get's very large, very quickly. – Dunning 28/2, 2011 at 16:47

Thank you @Cfreak. Your example and some tutoring from @Jaddo above helped me figure it out. That was difficult for me but, thanks to stackoverflow and you guys for helping me... so much! xxx – Virtual 28/2, 2011 at 17:44

The formula gives great-circle distances and there is some discussion of its "magic numbers" here. I love Sphinx but it wouldn't give any great advantage over MySQL for this small problem. You can improve MySQL's efficiency greatly by limiting the possible b's to a square: add AND b.lat BETWEEN a.lat - ? AND a.lat + ? AND b.lng BETWEEN a.lng - ? AND a.lng + ? with your radius for those placeholders. Otherwise MySQL scans all 40K rows for every query. – Perplexed 1/3, 2011 at 13:49

Also, since each zip is unique in your table (right?), the GROUP BY ... HAVING is unnecessary. I would write the above query as ... FROM zips AS a, zips AS b WHERE a.zip_code = ? AND distance <= ?. If you later change that query to find zips within a given distance of multiple origins, you'd want to limit by MIN(distance) and in that case the GROUP BY form of the query would be appropriate. – Perplexed 1/3, 2011 at 13:58

@Jamie - sphinx has built in distance calculation that is orders of magnitude faster than MySQL. Try that query on a table that has more than 10 million rows and tell me about how fast it is. – Dunning 1/3, 2011 at 15:12

J

5

I'll tackle the small but crucial part of the question:

However, I cannot even find/understand what b.zip_code actually is! (whats the "b." and "zips a, zips b"?)

Basically, the query joins two tables. BUT, both tables being joined are in fact the same table - "zips" (in other words, it joins "zips" table to itself"). Of course, since the rest of the query needs to understand when you are referring to the first copy of the "zips" table and when to the second copy of the "zips" table, you are giving a table alias to each copy - to wit, "a" and "b"'.

So, "b.xxx" means "column xxx from table zips, from the SECOND instance of that table being joined".

Jaddo answered 28/2, 2011 at 16:12 Comment(4)

Ahh, thank you. Looks like I am due for a bunch more reading. Based on now knowing a little more thanks to you, I will re-visit the query and attempt to visualize what is going on there. – Virtual 28/2, 2011 at 16:16

so.. the lat and lng in the first example are coming from where? "a.lat-b.lat and a.lng-b.lng" That is still a bit confusing for me. Are those values pulled from the db? Should they (and other references in there) be "a.LAT-b.LAT and a.LONG-b.LONG" ?? – Virtual 28/2, 2011 at 16:36

From 2 copies of zips table, and yes, what you said at the end of the second comment is exactly right – Jaddo 28/2, 2011 at 17:0

I get an error: Can't use string ("45418") as an ARRAY ref while "strict refs" ???? 45418 was the zip I tried. I even tried with placeholders as in the example below. I am really trying, really. – Virtual 28/2, 2011 at 17:7

D

3

I don't see what's wrong with your first query. You have latitude and longitude in your database (if I'm understanding, you're comparing a single entry to all others). You don't need to submit or return the state that's just part of the example. Make the first query work like this:

my $query = "SELECT b.zip_code,
       (3956 * (2 * ASIN(SQRT(
       POWER(SIN(((a.lat-b.lat)*0.017453293)/2),2) +
       COS(a.lat*0.017453293) *
       COS(b.lat*0.017453293) *
       POWER(SIN(((a.lng-b.lng)*0.017453293)/2),2))))) AS distance
FROM zips a, zips b WHERE
       a.zip_code = ? 
GROUP BY distance having distance <= ?";

my $sth = $dbh->prepare($query);
$sth->execute( $user_submitted_zip, $user_submitted_distance );
while( my ($zip, $distance) = $sth->fetchrow() ) ) {
     # do something
}