Averaging out Lat/longs in SQL Server database
Asked Answered
N

1

7

I'm new to SQL Server. I'm trying to figure out how I can get the below one done:

I have thousands of lat/long positions pointing to the same OR very close by locations. It's all stored flat in a SQL Server table as LAT & LONG columns.

Now to cluster the lat/longs and pick one representation per cluster, what I must be doing?

I read through a method called "STCentroid" : https://msdn.microsoft.com/en-us/library/bb933847.aspx

But is it worth letting the Server do a polygon with all these million rows and find the center point? Which would implicitly mean a single representation for all the near by duplicates. Might be an in efficient/wrong way?

Only points around few meters must be considered as duplicate entries. I'm thinking how I can pick the right representation.

In better words:

If there's a group of points G1{} (GPS positions) trying to point to a location L1. (Physical loc). & There's a group of points G2{}, trying to point to a location L2. How do I derive Center Point CP1 from G1{}. & CP2 from G2{}, such that CP1 is very close to L1 & CP2 is very close to L2.

And the fact is, L1 & L2 could be very near to each other say, 10 feet.

Just thinking how do I approach this problem. Any help please?

Nerveracking answered 23/8, 2015 at 6:46 Comment(8)
Why is the question down voted? -1?Nerveracking
Why not just average all the latitudes to get the average latitude and average all the longitudes to get the average longitude? As the points are all close together should give reasonable results.Doloroso
@Martin. Thanks for the reply. The suggested Answer would work if I have to just get one final Lat/long as result. But here I need a final bunch of Lat/long(S) derived out of each segment. So , out of thousands of records, final output would be CP1, CP2, CP3,CP4...CPn.Nerveracking
Are you trying to produce a cluster and output the coordinates of that?Polydactyl
@Jon Bellamy. Yeah. Correct. To be precise: The records are randomly populated in the DB. Trying to Group them into similar/close/same Lat/Longs & tag them as Cluster-1, and repeat this until we form Cluster-2, Cluster-3..Cluster-N. (Here I assume, Cluster means a group of similar Coordinates with some X Meters/Feet) Once we made this clusters group, I would like to pick 1 representation per cluster. So ultimately, the result would be N records. Each record representing one cluster from the DB.Nerveracking
@Nerveracking OK, understood. OK - do you need this at the database level, or can you do it in (for example) c#? Also, if we're clustering, are you happy with the an arbitrary point representing the centre of the cluster's grid, or do you want the cluster to be the coordinates of the singular record closest to the center of the grid?Polydactyl
Thanks for the response Jon Bellamy. As we are counting in meter/feet, I guess an arbitrary lat/long would be good enough. And I was only thinking about solving this in terms of SQL Server Spatial APIs. If there's another simpler way we could do this, I would certainly like to listen.Nerveracking
Very similar indeed: #330556. Note that SQL Server Analysis Services will do this for you, if your licence extends to including it...Racialism
M
3

Clustering points will be problematic. You are going to have issues if you have two potential clusters close together, and if you need precision or optimization, then you will need to do some research on your implementation. Try: Wiki-Cluster Analysis

However, if the points clusers are fairly far apart, then you could try a fairly simple cluster and then find the envelopes.

Something like this may work, although you would be well served to actually make a spatial column and add a spatial index.

ALTER TABLE Recordset ADD (ClusterID INT) -- Add a grouping ID
GO
DECLARE @i INT --Group Counter
DECLARE @g GEOGRAPHY --Point from which the cluster will be made
DECLARE @Limit INT --Distance limitation
SET @Limit = 10

SET @i = 0
WHILE (SELECT COUNT(*) FROM Recordset R WHERE ClusterID IS NULL) > 0 --Loop until all points are clustered
BEGIN
  SET @g = (SELECT TOP 1 GEOGRAPHY::STPointFromText('POINT(' + CAST(LAT AS VARCHAR(20)) + ' ' + CAST(LONG AS VARCHAR(20)) + ')', 4326) WHERE ClusterID IS NULL) --Point to cluster on
  UPDATE Recordset SET ClusterID = @i WHERE GEOGRAPHY::STPointFromText('POINT(' + CAST(LAT AS VARCHAR(20)) + ' ' + CAST(LONG AS VARCHAR(20)) + ')', 4326).STDistance(@g) < @Limit AND ClusterID IS NULL--update all points within the limit circle

  SET @i = @i + 1
END

SELECT --Clustered centers
  ClusterID,
  GEOGRAPHY::ConvexHullAggregate(GEOGRAPHY::STPointFromText('POINT(' + CAST(LAT AS VARCHAR(20)) + ' ' + CAST(LONG AS VARCHAR(20)) + ')', 4326)).EnvelopeCenter().Lat AS 'LatCenter',
  GEOGRAPHY::ConvexHullAggregate(GEOGRAPHY::STPointFromText('POINT(' + CAST(LAT AS VARCHAR(20)) + ' ' + CAST(LONG AS VARCHAR(20)) + ')', 4326)).EnvelopeCenter().Long AS 'LatCenter',
FROM
  RecordSet
GROUP BY
  ClusterID
Mcwilliams answered 27/8, 2015 at 22:21 Comment(2)
Thanks for the neat reply hcaelxxam. Let me try this and get back.Nerveracking
No problem. Hey, I just realized I left out a condition on the update part, i.e. AND CluserID IS NULL. Sorry.Mcwilliams

© 2022 - 2024 — McMap. All rights reserved.