Dimension for geozones or Lat & Long in data warehouse
Asked Answered
Y

1

6

I have a DimPlace dimension that has the name of the place (manually entered by the user) and the latitude and longitude of the place (automatically captured). Since the Places are entered manually the same place could be in there multiple time with different names, additionally, two distinct places could be very close to each other.

We want to be able to analyze the MPG between two "places" but we want to group them to make a larger area - i.e. using lat & long put all the various spellings of one location, as well as distinct but very close locations, in one record.

I am planning on making a new dimension for this - something like DimPlaceGeozone. I am looking for a resource to help with loading all the lat & long values mapped to ... something?? Maybe postal code, or city name? Sometimes you can find a script to load common dimensions (like DimTime) - I would love something similar for lat & long values in North America?

Yanyanaton answered 25/7, 2017 at 21:25 Comment(3)
Not sure why this was downvoted, it is an interesting question. Think about using a structure like ISO-3166 and ISO-3166-2 (en.wikipedia.org/wiki/ISO_3166-2) for your grouping, and use a geocoding API for your place names.Boggess
Can you recommend any APIs? So I would write a script (not in SQL I am assuming) to lookup lat&long via the API to get a place name returned?Yanyanaton
Here's one: developers.google.com/maps/documentation/javascript/examples/…Boggess
S
3

I've done something similar in the past... The one stumbling block I hit up front was that 2 locations, straddling a border could be physically closer together than 2 locations that are both in the same area.

I got around it by creating a "double grid" system that causes each location to fall into 4 areas. That way 2 locations that share at least 1 "area" you know they are within range of each other.

Here's an example, covering most of the United States...

IF OBJECT_ID('tempdb..#LatLngAreas', 'U') IS NOT NULL 
DROP TABLE #LatLngAreas;
GO

WITH 
    cte_Lat AS (
        SELECT 
            t.n,
            BegLatRange = -37.9 + (t.n / 10.0),
            EndLatRange  = -37.7 + (t.n / 10.0)
        FROM
            dbo.tfn_Tally(1030, 0) t
        ),
    cte_Lng AS (
        SELECT 
            t.n,
            BegLngRange = -159.7 + (t.n / 10.0),
            EndLngRange = -159.5 + (t.n / 10.0)
        FROM
            dbo.tfn_Tally(3050, 0) t
        )
SELECT 
    Area_ID = ROW_NUMBER() OVER (ORDER BY lat.n, lng.n),
    lat.BegLatRange, 
    lat.EndLatRange, 
    lng.BegLngRange, 
    lng.EndLngRange
    INTO #LatLngAreas
FROM
    cte_Lat lat
    CROSS JOIN cte_Lng lng;


SELECT 
    b3.Branch_ID,
    b3.Name,
    b3.Lat,
    b3.Lng,
    lla.Area_ID
FROM
    dbo.ContactBranch b3    -- replace with DimPlace
    JOIN #LatLngAreas lla
        ON b3.Lat BETWEEN lla.BegLatRange AND lla.EndLatRange
        AND b3.lng BETWEEN lla.BegLngRange AND lla.EndLngRange;

HTH, Jason

Shorts answered 1/8, 2017 at 21:8 Comment(4)
This looks like it will work - or is a very good start anyway. Thank you!Yanyanaton
No problem. Glad to help.Shorts
What does your function tfn_Tally() do?Yanyanaton
It's just a function version of a traditional tally table (others call it them "numbers tables" too). It's an idea borrowed from Itzik Beg-Gan. itprotoday.com/microsoft-sql-server/…Shorts

© 2022 - 2024 — McMap. All rights reserved.