How to best use zipcodes in Random Forest model training?

2

5

I have a dataset with a zip code column. Zip codes have some influence on the output, so I want to use them as a feature. I am using a random forest model.

I need suggestions on the best way to use the zip code column as a feature. For example, should I look up the latitude/longitude for each zip code rather than feeding the zip codes in directly?

Thanks in advance!

Barbbarba answered 11/9, 2018 at 17:23 Comment(1)
Take a look at H2O's GLRM algorithm and this tutorial on handling zip codes using GLRM plus another supervised learning algorithm. - Domeniga
5

A common way of handling zip codes, or any high-cardinality categorical column, is called "target encoding" or "impact encoding". In H2O, you can apply target encoding to any categorical column. As of H2O 3.20, this is only available in R, but in the next stable release, 3.22, it will be available in all clients (JIRA ticket here).

If you are using R, my advice is to try both target encoding and also the GLRM method mentioned by Lauren and compare the results. If you're in Python or another language, then try GLRM for now and give target encoding a try when H2O 3.22 is released.
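To make the idea concrete, here is a minimal plain-Python sketch of smoothed target encoding. It is not H2O's implementation: the function names and the `smoothing` parameter are assumptions made for illustration. Each zip code is replaced by its mean target value, shrunk toward the global mean so that zip codes with few rows do not get extreme values.

```python
from collections import defaultdict


def fit_target_encoding(zips, targets, smoothing=10.0):
    """Learn a smoothed mean target per zip code from training rows.

    `smoothing` (hypothetical parameter) acts like a number of pseudo-rows
    carrying the global mean; larger values shrink rare zip codes harder.
    """
    if len(zips) != len(targets) or not zips:
        raise ValueError("zips and targets must be non-empty and equal length")
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for z, y in zip(zips, targets):
        sums[z] += y
        counts[z] += 1
    mapping = {
        z: (sums[z] + smoothing * global_mean) / (counts[z] + smoothing)
        for z in counts
    }
    return mapping, global_mean


def apply_target_encoding(zips, mapping, global_mean):
    """Encode zip codes; unseen zip codes fall back to the global mean."""
    return [mapping.get(z, global_mean) for z in zips]


# Toy training data, made up for illustration only.
train_zips = ["90210", "90210", "10001", "10001", "10001", "60601"]
train_y = [1, 1, 0, 0, 1, 0]

mapping, global_mean = fit_target_encoding(train_zips, train_y, smoothing=2.0)
encoded = apply_target_encoding(["90210", "10001", "99999"], mapping, global_mean)
print(encoded)
```

Fit the mapping on training rows only. Encoding a row with a mean that includes its own target leaks the label into the feature, so in practice the training set is usually encoded out-of-fold.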

Nonfulfillment answered 11/9, 2018 at 18:52 Comment(1)
Thanks, Erin, for your response. I will try it out with H2O 3.22 in Python. - Barbbarba
3

I'd second what Erin LeDell says about target encoding.

Here are some other options; not all of them may apply:

  • Reduce the granularity of the zip code to its first 1, 2, 3, or 4 digits. For example, zip code 90210 becomes 902 (902XX), a prefix that covers part of Los Angeles County.
  • Can you group zip codes by metropolitan statistical area (MSA) or core-based statistical area (CBSA)?
  • Is there an attribute of the zip code that can be appended, e.g. city, urban, or rural?
  • Can you pull in zip code demographics, such as population size or income?
  • Distance to/from a key location (airport, city center, etc.)
  • Target encode, then bin the encoded values into groups such as very high, high, medium, and low (or whatever makes sense). This helps prevent overfitting your models.
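A few of these options can be sketched in plain Python. The helper names, the bin cutoffs, and the coordinates below are assumptions for illustration. In a real project the latitude/longitude per zip code would come from a lookup table that you supply.

```python
import math


def zip_prefix(zipcode, digits=3):
    """Keep the first `digits` characters of a 5-digit zip code.

    Zip codes stored as integers lose leading zeros (02139 becomes 2139),
    so the value is zero-padded back to five characters first.
    """
    return str(zipcode).strip().zfill(5)[:digits]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def bin_encoded(value, cutoffs=(0.25, 0.5, 0.75)):
    """Map a target-encoded value to a coarse label (cutoffs are arbitrary)."""
    labels = ("low", "medium", "high", "very high")
    for cutoff, label in zip(cutoffs, labels):
        if value < cutoff:
            return label
    return labels[-1]


# Coarser geography: 90210 -> "902"; an integer-typed zip keeps its leading zero.
print(zip_prefix("90210"), zip_prefix(2139))

# Distance from a zip code centroid to a key location.
# Both coordinate pairs are rough, illustrative values, not authoritative data.
zip_centroid = (34.10, -118.41)   # approximate centroid for 90210
airport = (33.94, -118.41)        # approximate location of LAX
print(round(haversine_km(*zip_centroid, *airport), 1), "km")

# Binning a target-encoded value into a coarse group.
print(bin_encoded(0.4))
```

Tree models can split on the raw distance directly, so binning is mostly useful for the target-encoded value, where a coarse group carries less risk of memorizing individual zip codes.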
Lavoie answered 13/9, 2018 at 22:54 Comment(1)
Very interesting and useful information, Ryan. Thank you so much. - Barbbarba
