MySQL - Perl: How to get array of zip codes within submitted "x" miles of submitted "zipcode" in Perl example
Asked Answered
V

3

0

I have found many calculations here and some php examples and most are just over my head.

I found this example:

SELECT b.zip_code, b.state,
       (3956 * (2 * ASIN(SQRT(
       POWER(SIN(((a.lat-b.lat)*0.017453293)/2),2) +
       COS(a.lat*0.017453293) *
       COS(b.lat*0.017453293) *
       POWER(SIN(((a.lng-b.lng)*0.017453293)/2),2))))) AS distance
FROM zips a, zips b
WHERE
       a.zip_code = '90210' ## I would use the users submitted value
GROUP BY distance
having distance <= 5; ## I would use the users submitted value

But, I am having trouble understanding how to implement the query with my database.

It looks like that query has all I need.

However, I cannot even find/understand what b.zip_code actually is! (whats the b. and zips a, zips b?)

I also do not need the state in the query.

My mySQL db structure is like this:

    ZIP | LAT     | LONG
  33416 | 26.6654 | -80.0929

I wrote this in attempt to return some kind of results (not based on above query) but, it only kicks out one zip code.

## Just for a test BUT, in reality I desire to SELECT a zip code WHERE ZIP = the users submitted zip code
## not by a submitted lat lon. I left off the $connect var, assume it's there.
my $set1 = (26.6654 - 0.20);
my $set2 = (26.6654 + 0.20);
my $set3 = (-80.0929 - 0.143);
my $set4 = (-80.0929 + 0.143);
my $test123 = $connect->prepare(qq{SELECT `ZIP` FROM `POSTAL`
WHERE `LAT` >= ? AND `LAT` <= ? 
AND `LONG` >= ? AND `LONG` <= ?})  or die "$DBI::errstr";
$test123->execute("$set1","$set2","$set3","$set4") or die "$DBI::errstr";
my $cntr;
while(@zip = $test123->fetchrow_array()) {
    print qq~$zip[$cntr]~;
    push(@zips,$zip[$cntr]);
    $cntr++;
}

As you can see, I am quite the novice so, I need some hand holding here with verbose explanation.

So, in Perl, how can I push zip codes into an array from a USER SUBMITTED ZIP CODE and user submitted DISTANCE in miles. Can be a square instead of a circle, not really that critical of a feature. Faster is better.

Virtual answered 28/2, 2011 at 16:1 Comment(0)
D
3

I don't see what's wrong with your first query. You have latitude and longitude in your database (if I'm understanding, you're comparing a single entry to all others). You don't need to submit or return the state that's just part of the example. Make the first query work like this:

my $query = "SELECT b.zip_code,
       (3956 * (2 * ASIN(SQRT(
       POWER(SIN(((a.lat-b.lat)*0.017453293)/2),2) +
       COS(a.lat*0.017453293) *
       COS(b.lat*0.017453293) *
       POWER(SIN(((a.lng-b.lng)*0.017453293)/2),2))))) AS distance
FROM zips a, zips b WHERE
       a.zip_code = ? 
GROUP BY distance having distance <= ?";

my $sth = $dbh->prepare($query);
$sth->execute( $user_submitted_zip, $user_submitted_distance );
while( my ($zip, $distance) = $sth->fetchrow() ) ) {
     # do something
}

This won't be that fast, but if you have a small record set ( less than 30k rows ) it should be fine. If you really want to go faster you should look into a search engine such as Sphinx which will do this for you.

Dunning answered 28/2, 2011 at 16:15 Comment(9)
Hmm, thank you. I do not have $1500 for Sphinx but, my zip code db IS 40k lines. I am trying the example you gave me based on what I have learned here in the last few minutes. I will get back to everyone soon.Virtual
Sphinx is free. Support is what costs money. There are a few other similar search engines but I'm not as familiar with themDunning
@Cfreak, since my zip db is so large, would I be better off / faster using a square area? If they get a few more results (going out to the corners) that is just fine. If so, can you provide an example? This is for a band website for folks to find future gigs in their area so, no budget here. Thanks again for your help.Virtual
Also 40k rows is probably pushing it but if everything is indexed it will probably be fast enoughDunning
latitude and longitude must be calculated as a circle. The calculations would be wrong otherwise. Your other option would be to build a large table that basically holds a matrix of the distance between each zip-code but that's get's very large, very quickly.Dunning
Thank you @Cfreak. Your example and some tutoring from @Jaddo above helped me figure it out. That was difficult for me but, thanks to stackoverflow and you guys for helping me... so much! xxxVirtual
The formula gives great-circle distances and there is some discussion of its "magic numbers" here. I love Sphinx but it wouldn't give any great advantage over MySQL for this small problem. You can improve MySQL's efficiency greatly by limiting the possible b's to a square: add AND b.lat BETWEEN a.lat - ? AND a.lat + ? AND b.lng BETWEEN a.lng - ? AND a.lng + ? with your radius for those placeholders. Otherwise MySQL scans all 40K rows for every query.Perplexed
Also, since each zip is unique in your table (right?), the GROUP BY ... HAVING is unnecessary. I would write the above query as ... FROM zips AS a, zips AS b WHERE a.zip_code = ? AND distance <= ?. If you later change that query to find zips within a given distance of multiple origins, you'd want to limit by MIN(distance) and in that case the GROUP BY form of the query would be appropriate.Perplexed
@Jamie - sphinx has built in distance calculation that is orders of magnitude faster than MySQL. Try that query on a table that has more than 10 million rows and tell me about how fast it is.Dunning
J
5

I'll tackle the small but crucial part of the question:

However, I cannot even find/understand what b.zip_code actually is! (whats the "b." and "zips a, zips b"?)

Basically, the query joins two tables. BUT, both tables being joined are in fact the same table - "zips" (in other words, it joins "zips" table to itself"). Of course, since the rest of the query needs to understand when you are referring to the first copy of the "zips" table and when to the second copy of the "zips" table, you are giving a table alias to each copy - to wit, "a" and "b"'.

So, "b.xxx" means "column xxx from table zips, from the SECOND instance of that table being joined".

Jaddo answered 28/2, 2011 at 16:12 Comment(4)
Ahh, thank you. Looks like I am due for a bunch more reading. Based on now knowing a little more thanks to you, I will re-visit the query and attempt to visualize what is going on there.Virtual
so.. the lat and lng in the first example are coming from where? "a.lat-b.lat and a.lng-b.lng" That is still a bit confusing for me. Are those values pulled from the db? Should they (and other references in there) be "a.LAT-b.LAT and a.LONG-b.LONG" ??Virtual
From 2 copies of zips table, and yes, what you said at the end of the second comment is exactly rightJaddo
I get an error: Can't use string ("45418") as an ARRAY ref while "strict refs" ???? 45418 was the zip I tried. I even tried with placeholders as in the example below. I am really trying, really.Virtual
D
3

I don't see what's wrong with your first query. You have latitude and longitude in your database (if I'm understanding, you're comparing a single entry to all others). You don't need to submit or return the state that's just part of the example. Make the first query work like this:

my $query = "SELECT b.zip_code,
       (3956 * (2 * ASIN(SQRT(
       POWER(SIN(((a.lat-b.lat)*0.017453293)/2),2) +
       COS(a.lat*0.017453293) *
       COS(b.lat*0.017453293) *
       POWER(SIN(((a.lng-b.lng)*0.017453293)/2),2))))) AS distance
FROM zips a, zips b WHERE
       a.zip_code = ? 
GROUP BY distance having distance <= ?";

my $sth = $dbh->prepare($query);
$sth->execute( $user_submitted_zip, $user_submitted_distance );
while( my ($zip, $distance) = $sth->fetchrow() ) ) {
     # do something
}

This won't be that fast, but if you have a small record set ( less than 30k rows ) it should be fine. If you really want to go faster you should look into a search engine such as Sphinx which will do this for you.

Dunning answered 28/2, 2011 at 16:15 Comment(9)
Hmm, thank you. I do not have $1500 for Sphinx but, my zip code db IS 40k lines. I am trying the example you gave me based on what I have learned here in the last few minutes. I will get back to everyone soon.Virtual
Sphinx is free. Support is what costs money. There are a few other similar search engines but I'm not as familiar with themDunning
@Cfreak, since my zip db is so large, would I be better off / faster using a square area? If they get a few more results (going out to the corners) that is just fine. If so, can you provide an example? This is for a band website for folks to find future gigs in their area so, no budget here. Thanks again for your help.Virtual
Also 40k rows is probably pushing it but if everything is indexed it will probably be fast enoughDunning
latitude and longitude must be calculated as a circle. The calculations would be wrong otherwise. Your other option would be to build a large table that basically holds a matrix of the distance between each zip-code but that's get's very large, very quickly.Dunning
Thank you @Cfreak. Your example and some tutoring from @Jaddo above helped me figure it out. That was difficult for me but, thanks to stackoverflow and you guys for helping me... so much! xxxVirtual
The formula gives great-circle distances and there is some discussion of its "magic numbers" here. I love Sphinx but it wouldn't give any great advantage over MySQL for this small problem. You can improve MySQL's efficiency greatly by limiting the possible b's to a square: add AND b.lat BETWEEN a.lat - ? AND a.lat + ? AND b.lng BETWEEN a.lng - ? AND a.lng + ? with your radius for those placeholders. Otherwise MySQL scans all 40K rows for every query.Perplexed
Also, since each zip is unique in your table (right?), the GROUP BY ... HAVING is unnecessary. I would write the above query as ... FROM zips AS a, zips AS b WHERE a.zip_code = ? AND distance <= ?. If you later change that query to find zips within a given distance of multiple origins, you'd want to limit by MIN(distance) and in that case the GROUP BY form of the query would be appropriate.Perplexed
@Jamie - sphinx has built in distance calculation that is orders of magnitude faster than MySQL. Try that query on a table that has more than 10 million rows and tell me about how fast it is.Dunning
I
0

fetchrow_array returns a list of list references, essentially a two-dimensional array, where each row represents a different result from the database query and each column represents a field from the query (in your case, there is only one field, or column, per row).

Calling while ($test123->fetchrow_array()) will cause an infinite loop as your program executes the query over and over again. If the query returns results, then the while condition will be satisfied and the loop will repeat. The usual idiom would be to say something more like for my $row ($test123->fetchrow_array()) { ..., which will only execute the query once and then iterate over the results.

Each result is a list reference, and the zip code you are interested in is in the first (and only) column, so you could accumulate the results in an array like this:

my @zips = ();    # for final results
for my $row ($test123->fetchrow_array()) {
    push @zips, $row->[0];
}

or even more concisely with Perl's map statement:

my @zips = map { $_->[0] } $test123->fetchrow_array()

which does the same thing.

Irremovable answered 28/2, 2011 at 16:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.