Display data on a real map based on postal code

4

6

I am trying to display data on a real map (the data should all fall within Ontario, Canada). I have a .csv file with two columns, A and B: A is a postal code and B is its associated value (an integer from 1 to 5). I want to find the area that A maps to and color it based on the associated value. For example, given postal codes P0G and P0A (the first 3 characters of a Canadian postal code, which represent an area) with associated values 2 and 5, I want to show these two areas in different colors (maybe green for 2 and red for 5) on a real map.

I don't know how to do this, to be honest. Maybe with Python, some website service, or some API? I have no experience visualizing data on a real map.

The question described above is a simplified version. The actual data has more columns (domains), with one postal code per row, and I need to generate one map per domain, |domains| maps in total. But I figure that once I know how to do one domain, I should be able to do all of them.

I tried an online analytical tool, SimplyAnalytics, but couldn't figure out how to display many areas with different colors at the same time. I also searched for similar questions, but the end goals and data formats differ so much that the code and methods vary vastly.

Thanks!!!

Some sample data:

[image: sample data, not preserved]

Diocese answered 21/9, 2019 at 20:47 Comment(0)
4

If you want to do it in Python, you could use the geopandas library. Below is some sample code (GitHub Gist). First we need to get the shapefile that defines the area for each postal code's Forward Sortation Area (link). Then filter the postal codes for Ontario and join the result with the data you want to plot.

import geopandas
import pandas as pd
import pandas_bokeh  # adds the plot_bokeh method to (Geo)DataFrames

# Render Bokeh output inline in a Jupyter notebook
pandas_bokeh.output_notebook()

# Forward Sortation Area boundaries for all of Canada
canada = geopandas.read_file("./gfsa000b11a_e.shp")
# Keep only Ontario (province code '35')
ontario = canada[canada['PRUID'] == '35']

# Sample data to plot
df = pd.DataFrame({'PCODE': ['P0V', 'P0L', 'P0T', 'P0Y', 'P0G', 'P2N'], 'A': [6, 3, 5, 2, 2, 4]})

# Join the Ontario dataset with the sample data on the FSA code
new_df = ontario.join(df.set_index('PCODE'), on='CFSAUID')

new_df.plot_bokeh(simplify_shapes=20000,
                  category="A",
                  colormap="Spectral",
                  hovertool_columns=["CFSAUID", "A"])

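For the multi-domain case in the question, the sample DataFrame above has to be built once per value column. A minimal sketch of that preparation step, using only the standard library: it assumes a hypothetical CSV layout with a `postal_code` column followed by one column per domain (the names `postal_code`, `health` and `income` are made up for illustration), and reduces every code to its three-character Forward Sortation Area so it can be matched against `CFSAUID`.

```python
import csv
import io


def to_fsa(postal_code):
    """Return the Forward Sortation Area: first three characters, upper-cased, spaces removed."""
    return postal_code.replace(" ", "").upper()[:3]


def values_by_domain(csv_text, code_column="postal_code"):
    """Map each domain column to a {FSA: value} dict.

    If the same FSA appears on several rows, the last row wins.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    domains = [name for name in reader.fieldnames if name != code_column]
    result = {name: {} for name in domains}
    for row in reader:
        fsa = to_fsa(row[code_column])
        for name in domains:
            result[name][fsa] = int(row[name])
    return result


# Hypothetical sample data; the column names are assumptions, not from the question.
sample = """postal_code,health,income
P0G,2,4
p0a,5,1
"""
per_domain = values_by_domain(sample)
```

Each `per_domain[name]` can then be turned into a two-column DataFrame and passed through the same join and `plot_bokeh` call, giving one map per domain.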

Rachael answered 21/9, 2019 at 23:45 Comment(10)
This is amazing! Thank you so much!!! Another question: within a particular 3-character postal code, say M5H, I have more specific 6-character codes like ['M5H3G8', 'M5H2N2', 'M5H2G4', 'M5H3Y2'] with corresponding values [3, 2, 4, 1]. How can I produce a similar map with the more specific postal codes?Diocese
@Diocese It appears geographical boundaries for 6-character postal codes are not publicly available for free. They are available either through a university or with a paid licensing option. If you have access to that data, you can follow a similar approach as above.Rachael
@Sanik I really appreciate your help. I downloaded ONldu.zip from the link. After re-pointing to the right .shp file and updating df=pd.DataFrame({'PCODE': ['P0V','P0L','P0T','P0Y', 'P0G', 'P2N'], 'A':[6,3,5,2,2,4] }) to df=pd.DataFrame({'PCODE': ['M5H3G8', 'M5H2N2', 'M5H2G4', 'M5H3Y2'], 'A':[6,3,5,2] }), I get the error KeyError: 'PRUID'. I don't know what key I should use here (I found [www150.statcan.gc.ca/n1/pub/92-162-g/2012001/tech-eng.htm#a1] and [www150.statcan.gc.ca/n1/pub/92-162-g/2012001/tbl/… but still couldn't figure it out). Can you please help?Diocese
@Kenni Since the data file you have downloaded is only for Ontario, you do not need the line ontario = canada[canada['PRUID'] == '35'] which I had used to filter the data only for Ontario.Rachael
@Sanik Thanks. I commented out the ontario line and used new_df=canada.join(df.set_index('PCODE'), on='CFSAUID') instead, which throws KeyError: 'CFSAUID'. So I removed on='CFSAUID' and used hovertool_columns=["A"], but now it runs forever. Any hint as to what is happening? Where can I find documentation about this?Diocese
You will have to check what fields are available in the dataset (run canada.columns) and modify the code accordingly. I don't have access to the data, so cannot help much on that.Rachael
Thank you. I read the documentation and changed it to new_df=canada.join(df.set_index('PCODE'), on='POSTALCODE'). There is no error before plot_bokeh, but because the data is massive (more than 555696 rows), plot_bokeh is extremely slow. Do you know any way to speed it up?Diocese
You could try reducing simplify_shapes to say 5000. It would reduce the resolution/quality of the map, but should be faster.Rachael
Thanks for all the help.Diocese
@Rachael Do you know how the colormap can be inverted? Unfortunately, colormap="Spectral_r" doesn't work and the documentation of pandas-bokeh is horribly bad and does not even mention such a basic feature for inverting a color map.Sessile
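One possible workaround for the inversion question, offered as a sketch rather than a confirmed answer: the pandas-bokeh README describes `colormap` as accepting either a palette name or a list of colors, so an explicit list can be reversed by hand. The five hex values below are the five-class ColorBrewer Spectral colors typed out manually, and the `plot_bokeh` call in the last comment is assumed usage that was not run here.

```python
# Five-class ColorBrewer "Spectral" colours, written out by hand (red to blue).
SPECTRAL_5 = ["#d7191c", "#fdae61", "#ffffbf", "#abdda4", "#2b83ba"]


def reversed_palette(colours):
    """Return a new list with the colour order flipped; the input is left unchanged."""
    return list(reversed(colours))


inverted = reversed_palette(SPECTRAL_5)

# Assumed usage (not verified): new_df.plot_bokeh(category="A", colormap=inverted, ...)
```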
2

@Samik's answer is great; it works perfectly for 3-character postal codes. However, for 6-character codes, plot_bokeh is really slow. In my case, the Ontario boundary shapefile took 21 hours to render (I timed it in Python; maybe my machine is slow)! With multiple domains it will take 21*|domains| hours, so time becomes a huge issue.

A better way for 6-character codes (and large files in general) is to use Tableau: load the spatial files, render the map, and select the proper parameters to customize it. It is much quicker than plot_bokeh. However, Tableau doesn't involve programming, so it is better suited to general users.
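Another option that may be worth trying before switching tools: the join in the first answer keeps every polygon in the shapefile, including the hundreds of thousands with no value attached, and all of them get rendered. Keeping only the records whose postal code appears in the data should cut the work considerably. The idea, sketched in plain Python on hypothetical records (the `POSTALCODE` key follows the column name from the comments above; the geometry strings are placeholders, and `keep_matching` is a made-up helper name):

```python
def keep_matching(records, values, key="POSTALCODE"):
    """Return only records whose postal code has a value, with the value attached under 'A'."""
    return [
        {**record, "A": values[record[key]]}
        for record in records
        if record[key] in values
    ]


# Placeholder stand-ins for shapefile rows; real rows would carry polygon geometries.
shapes = [
    {"POSTALCODE": "M5H3G8", "geometry": "polygon-1"},
    {"POSTALCODE": "M5H2N2", "geometry": "polygon-2"},
    {"POSTALCODE": "K1A0B1", "geometry": "polygon-3"},  # no value, so it is dropped
]
values = {"M5H3G8": 3, "M5H2N2": 2}

to_plot = keep_matching(shapes, values)
```

With geopandas the equivalent would be a boolean filter such as `canada[canada['POSTALCODE'].isin(df['PCODE'])]` before the join. Whether this brings the render time down far enough was not tested here.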

Diocese answered 26/9, 2019 at 17:56 Comment(0)
2

Alternatively, you can make use of the pgeocode library to convert postal codes into lat/long coordinates. It returns only the middle point of the polygon, but that is enough for a lot of scenarios.

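import pandas as pd
import pgeocode

nomi = pgeocode.Nominatim("ca")

# df is assumed to already hold the postal codes in a "zipcode" column.
# Each lookup returns a Series (latitude, longitude, ...), so apply() yields one row per code.
dfzip = df["zipcode"].apply(lambda x: nomi.query_postal_code(x))
df = pd.concat([df, dfzip], axis="columns")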

The rest is just plotting points with the library and technique of your choice.
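As one illustration of that last step, here is a sketch that turns looked-up coordinates into a GeoJSON point layer, which most mapping tools can load. It assumes the rows have already been reduced to `(code, latitude, longitude, value)` tuples, that unmatched codes come back with NaN coordinates, and that values run from 1 to 5 as in the question. The color choices, the helper name `to_geojson`, and the sample coordinates are all made up for the example.

```python
import json
import math

# Colours for values 1..5, green through red (an illustrative choice).
COLOURS = {1: "#1a9641", 2: "#a6d96a", 3: "#ffffbf", 4: "#fdae61", 5: "#d7191c"}


def to_geojson(rows):
    """Build a GeoJSON FeatureCollection of points from (code, lat, lon, value) tuples.

    Rows with a missing (NaN) coordinate are skipped.
    """
    features = []
    for code, lat, lon, value in rows:
        if math.isnan(lat) or math.isnan(lon):
            continue
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"postal_code": code, "value": value, "colour": COLOURS[value]},
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up coordinates for illustration only.
rows = [
    ("P0G", 45.3, -80.0, 2),
    ("P0A", 45.5, -79.3, 5),
    ("XXX", float("nan"), float("nan"), 1),
]
geojson_text = json.dumps(to_geojson(rows))
```

The resulting string can be written to a `.geojson` file and styled by the `colour` property in whichever viewer is used.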

Swacked answered 7/12, 2020 at 19:4 Comment(0)
0

@Kenny, have you tried playing with the simplify_shapes option of plot_bokeh? If your shapefile is very big, however, pandas-bokeh will not be the right choice for your problem.

Aerometry answered 3/10, 2019 at 14:46 Comment(1)
I have not but as I said in my own answer, Tableau does an excellent job. As my OP says, it's not restricted to programming as long as it gives me good data visualization. But thanks anyway.Diocese
