Database/SQL: How to store longitude/latitude data?
Asked Answered
C

9

75

Performance question ...

I have a database of houses that have geolocation data (longitude & latitude).

What I want to do is find the best way to store the locational data in my MySQL (v5.0.24a) using InnoDB database-engine so that I can perform a lot of queries where I'm returning all the home records that are between x1 and x2 latitude and y1 and y2 longitude.

Right now, my database schema is

---------------------
Homes   
---------------------
geolat - Float (10,6)
geolng - Float (10,6)
---------------------

And my query is:

SELECT ... 
WHERE geolat BETWEEN x1 AND x2
AND geolng BETWEEN y1 AND y2
  • Is what I described above the best way to store the latitude and longitude data in MySQL using Float (10,6) and separating out the longitude/latitude? If not, what is? There exist Float, Decimal and even Spatial as a data type.
  • Is this the best way to perform the SQL from a performance standpoint? If not, what is?
  • Does using a different MySQL database-engine make sense?

UPDATE: Still Unanswered

I have 3 different answers below. One person say to use Float. One person says to use INT. One person says to use Spatial.

So I used MySQL "EXPLAIN" statement to measure the SQL execution speed. It appears that absolutely no difference in SQL execution (result set fetching) exist if using INT or FLOAT for the longitude and latitude data type..

It also appears that using the "BETWEEN" statement is SIGNIFICANTLY faster than using the ">" or "<" SQL statements. It's nearly 3x faster to use "BETWEEN" than to use the ">" and "<" statement.

With that being said, I still am unceratin on what the performance impact would be if using Spatial since it's unclear to me if it's supported with my version of MySQL running (v5.0.24) ... as well as how I enable it if supported.

Any help would be greatly appreacited

Calvo answered 2/9, 2009 at 21:16 Comment(4)
Use FLOAT if degree is the natural format for your application so you don't have to convert. Microdegrees stored as INT gives you more precision. You can't beat these two on performances. Only consider spatial extensions if spatial index is needed.Scour
@ZZ Coder, I don't mind converting - I'm simply looking to find what data type based on my use case above would return the fastest SQL result-set. Also, is how I have written my SQL most efficient?Hectorhecuba
As long as you have indices on lat-lon, your query should be fast. We went with INT because our DBA told us the index on float is slow but I never tried myself so no idea how slow it is. You can also get a circular selection fairly fast by using @geodist() on view returned by this query.Scour
Similar question with accepted answer: #159755Requiescat
S
31

float(10,6) is just fine.

Any other convoluted storage schemes will require more translation in and out, and floating-point math is plenty fast.

Shannan answered 2/9, 2009 at 21:58 Comment(3)
@richardtallent, are you saying that 1) how I'm storing the dataing, 2) the datatype that I've picked ,3) my SQL statement and 4) my InnoDB database engine ... all of these are already picked/designed to be the absolutely most efficient use (best performance)?Hectorhecuba
@richardtallent, meaning - I can't change anything to make it perform betterHectorhecuba
+1 Have a read of coordinate.codeplex.com where Jaime Olivares explains why floats are the correct type for coordinates...Henceforth
T
12

I know you're asking about MySQL, but if spatial data is important to your business, you might want to reconsider. PostgreSQL + PostGIS are also free software, and they have a great reputation for managing spatial and geographic data efficiently. Many people use PostgreSQL only because of PostGIS.

I don't know much about the MySQL spatial system though, so perhaps it works well enough for your use-case.

Telescopium answered 2/9, 2009 at 21:16 Comment(2)
I might have to look into that. Any info on the performance of fetching all the records between a box coordinate (lat/long)?Hectorhecuba
Starting 5.6 MySQL is now a serious contender for GIS functions.Lys
L
6

Google uses float(10,6) in their "Store locator" example. That's enough for me to go with that.

https://mcmap.net/q/74104/-what-is-the-ideal-data-type-to-use-when-storing-latitude-longitude-in-a-mysql-database

Also, starting MySQL 5.6.x, spatial extensions support is much better and comparable to PostGIS in features and performance.

Lys answered 2/9, 2009 at 21:16 Comment(0)
C
6

The problem with using any other data type than "spatial" here is that your kind of "rectangular selection" can (usually, this depends on how bright your DBMS is - and MySQL certainly isn't generally the brightest) only be optimised in one single dimension.

The system can pick either the longitude index or the latitude index, and use that to reduce the set of rows to inspect. But after it has done that, there is a choice of : (a) fetching all found rows and scanning over those and test for the "other dimension", or (b) doing the similar process on the "other dimension" and then afterwards matching those two result sets to see which rows appear in both. This latter option may not be implemented as such in your particular DBMS engine.

Spatial indexes sort of do the latter "automatically", so I think it's safe to say that a spatial index will give the best performance in any case, but it may also be the case that it doesn't significantly outperform the other solutions, and that it's just not worth the bother. This depends on all sorts of things like the volume of and the distribution in your actual data etc. etc.

It is certainly true that float (tree) indexes are by necessity slower than integer indexes, because of the longer time it usually takes to execute '>' on floats than it does on integers. But I would be surprised if this effect were actually noticeable.

Canoness answered 3/9, 2009 at 13:45 Comment(3)
I tested both Float vs INT using the "IN BETWEEN" SQL statement and measuring the time using both the EXPLAIN and BENCHMARK command and the different in result set querying was less than 1% difference in speed. HOWEVER, using the ">" or "<" operator, instead of "in between", resulted in a query execution time nearly 3x longer.Hectorhecuba
So it appears that no difference exist between using FLOAT vs INT as long as you use the "IN BETWEEN" SQL statement.Hectorhecuba
I haven't been able to test Spatial since it's unclear to me if my version (v5.0.24) supports Spatial out-of-the-box or if I have to somehow install an extension/plug-inHectorhecuba
O
4

Float (10,6)

Where is latitude or longitude 5555.123456?

Don't you mean Float(9,6) instead?

Onomatology answered 2/9, 2009 at 21:16 Comment(2)
MySQL ignores the resolution specification when storing FLOAT and DOUBLE. It simply uses 32- or 64-bit floating point. On modern machines that's very likely to be following the IEEE 754 binary32 and binary64 standards.Howlyn
@OllieJones can you please explain this further. I have the same doubt, why not float(9,6)?Potted
S
4

I would store it as integers (int, 4-bytes) represented in 1/1,000,000th degrees. That would give you a resolution of few inches.

I don't think there is any intrinsic spatial datatype in MySQL.

Scour answered 2/9, 2009 at 21:33 Comment(9)
@ZZ Coder, info on Spatial datatype --> dev.mysql.com/doc/refman/5.0/en/spatial-extensions.htmlHectorhecuba
Also, you state using an INT. How can you store data like 93.2343213 as an INT?Hectorhecuba
@Javier, what about negative long/lat? This assumes that all long/lat are positive values - which is only true for the North Eastern hemisphere --> en.wikipedia.org/wiki/ISO_6709Hectorhecuba
@Timtom: Since this is a performance question, you probably don't want use the GIS/Spatial extension unless you want spatial index. A point represented in WKB takes almost 20 bytes. It's not supported by all engines either.Scour
@Timtom: INT is signed so you can store negative values. Simply multiple your degree values by 1,000,000.Scour
@ZZ Coder, don't you mean divide by 1,000,000 (or multiple 0.000001)Hectorhecuba
Integer operations are always faster than floating point. In special, integer comparisonBigoted
@Rodrigo, any reference material supporting this claim?Hectorhecuba
All, please note my updated original post. There is no difference in INT vs FLOAT performance.Hectorhecuba
I
2

I have the exact same schema (float(10,6)) and query (selecting inside a rectangle) and I found that switching the db engine from innoDB to myisam doubled the speed for a "point in rectangle look-up" in a table with 780,000 records.

Additionally, I converted all lng/lat values to cartesian integers (x,y) and created a two-column index on the x,y and my speed went from ~27 ms to 1.3 ms for the same look-up.

Israel answered 2/9, 2009 at 21:16 Comment(0)
C
0

I know probably you would have moved past this problem. I just wanted to add another approach to this question, in case someone is looking to store geolocation data. You could encode latitude and longitude information into a geohash. Since they are prefixed searchable to a required degree of precision. It seems you can convert your query to a start and end prefix and do a prefix search with LIKE query.

Consume answered 2/9, 2009 at 21:16 Comment(0)
C
0

It really depends on how you are using the data. But in a gross over-simplification of the facts, decimal is faster but less accurate in aproximations. More info here:

http://msdn.microsoft.com/en-us/library/aa223970(SQL.80).aspx

Also, The standard for GPS coordinates is specified in ISO 6709:

http://en.wikipedia.org/wiki/ISO_6709

Cytology answered 2/9, 2009 at 21:37 Comment(3)
@Armitage, let's assume I have a column for latitude and a coloum for longtitude. And that I'm storing the data as, for example, 93.12342342Hectorhecuba
@Armitage, maybe I should ask - what's the best why I should be storing the long/lat data in MySQL? Assume I want to do lots of queries where I'm returning all the records that are between x1 and x2 latitude and y1 and y2 longitude.Hectorhecuba
Sounds like the performance difference is negligible. From what I am reading, Spacial will perform better and probably save you time coding your project, but I've never used it.Cytology

© 2022 - 2024 — McMap. All rights reserved.