Normalize an Address
Asked Answered
P

6

5

I am trying to normalize an address.

The diagram below shows the relevant tables for this question I believe. I want to know how ZipCodes should be integrated into the model. This would be for international addresses so I know that a Zip/PostalCode is not used everywhere. I think City::ZipCode is 1::0-n (I have read others saying this is not always the case but they never provided evidence). If they are correct then I guess this would be a many-to-many relationship. Since each Address can only have at most one ZipCode while a ZipCode can contain many addresses I am lost at how to normalize this model.

Since the Address may or may not contain a contain a ZipCode I need to refrain from having that as a nullable FK in the Address table.

EDIT: Just want to emphasize that the entities and attributes provided are drastically scaled back from the actual DB. It is only used as a reference and to address my concern of where to include zipcodes into the model.

enter image description here

Pruritus answered 3/4, 2011 at 15:57 Comment(1)
USPS ZIP codes consist of five digits. ZIP+4 consists of nine digits. But Canadian postal codes, to pick one neighborly example, consist of six characters, alternating alpha and numeric. Do you think all those values are drawn from the same domain?Studhorse
S
7

To normalise the schema you have; add a table Address-ZipCode table, with foreign keys Address ID and Zip Code; and primary key Address Id - identical to that in the Address table. Then include the Zip codes by using a Left Join between address and the new table. The new table will only be populated when an address has a zipcode.

However, I would suggest that if you are trying to accommodate international addresses, the schema you have is likely to be inadequate - you will need multiple address lines and more levels of category than shown in your diagram. Categories missed include country, sub-region, town, and possibly others.

My answer here (which is extremely long) shows what is needed to deal with international addresses (and other things) comprehensively. This is massive overkill unless you are dealing with millions of addresses in each of multiple countries.

Sailer answered 3/4, 2011 at 16:27 Comment(0)
R
3

I haved struggled with this for several different apps over the years. How you set this up depends upon your needs. I work in affordable housing, and one of the things we need to do is relate different geographical components (City, county, state, etc) to various REGIONS as defined by HU (Housing and Urban Development in USA).

What I ended up with looks a little like THIS:

tblState:
    StateID
    StateCode (AL, AK, AR . . . etc)
    StateName (Alabama, Alaska, Arkansas,  . . . etc)

tblCounty
    CountyID
    HUDRegionID FK to tblHUDRegion
    StateID FK to tbleState
    CountyName (Pierce County, WA; Lane County, OR)
NOTE: I recognize I could normalize even further and create a table of count names, many-to-many related to States ON stateID, but there's a limit, man!)

tblCity
    CityID
    CountyID
    CityName

tblZIPCOde
    ZIPCodeID
    CityID

tblHUDRegion
    HUDRegionID
    HUDRegionCode
    HUDRegionName

In my case, HUD regions are defined at the county level (one HUD region includes one or more counties (or "County-Towns" in some cases). Each HUD region actually has a Unique Identifier defined ny HUD (The HUD CBSA_Sub), which I use as the "HUD-region_code". Also important to note is that HUD regions can include counties in one or more states. Therefore, the HUD region identifier is related to county, but only indirectly to state, THROUGH each county. For example, the HUD "Portland/Vancouver/Beaverton" HUD MSA includes counties (and cities) in both Oregon and Washington states.

In YOUR case, you will need to define one more top-layer, tblCountry. Further, you may need to adapt the concenpt of "county" and "state" a bit to accomodate other countries ("Province" and whatever they use for subdivisions greater than City, but less than state. "Region" may work in this case as well - I believe many european coutnries use "regions").

A country has one or more States (or equivelent). A state has one or more Counties (or equivelent). A County Has one or more Cities. And Cities tend to have at least one postal code.

Regions such as the HUD region, in my case, tend to be defined as aggregations at one of these levels.

In many cases, outside of this HUD-driven model I had to develop (it is often the case one needs to ascertain which HUD MSA one is working by ZIP, or by County. In all cases, it is not safe to assume that HUD regions are contained within a apecific state.

Also important to note is that the USPS periodically change the ZIP codes for certain areas.

Riposte answered 3/4, 2011 at 16:55 Comment(1)
Also note that a city can have multiple zip codes AND a zip code can serve multiple cities. (My zip zip code serves 2 cities. As did my previous zip code with 2 different cities that were in two different counties.) Zip codes are weird.Muddy
S
3

Depending on what country you are in the rules for postal codes can get pretty dicey. You are pretty safe to assume that a postal code has one official city name, but both the U.S. and Canada allow for alternative city names for a postal code. I know this for a fact because I develop postal address validation software for North America. The non-official names are often recognized by the postal authorities and you typically have to permit their use.

Therefore, if you want to be able to use non-official names you need m:n between city and postal code. I would question why you want a code table of postal codes in any case. It is better for address storage to treat these as independent attributes rather than trying to normalize them.

If you somehow think that you will be able to use some data in your database to go backwards from postal code to city name or forwards from city name to postal code, then you are setting yourself up for disappointment! There are USPS and Canada Post recognized software solutions for doing address validation and if you spend any time actually looking into it, you'll find that the problem domain of address validation is much more complicated than you think it is. If address accuracy is important to your app (and it should be in most cases) then go buy 3rd party tools to do your address validation and store your addresses in a single table with as many columns as makes sense to you.

Sverre answered 3/4, 2011 at 17:33 Comment(1)
In my response I focused on what to do about ZIP codes. You can go with the ADDRESS >---|- CITY >---|- REGION structure if your business requirements are that simple. Just don't try to insinuate ZIP code into this hierarchy because it is bound to be too restrictive.Sverre
B
2

Normalizing, or standardizing addresses, is a huge problem for most entities that need accurate, regularly-formatted addresses. (I work in the address validation industry - for SmartyStreets - so I've dealt with this a lot.) Because of the complexity of different delivery endpoints, address changes, updates to the components of an address, and many other things, it's best to recruit a certified service to take care of that for you.

Assuming you're working with US addresses, there are APIs or list processing services you can hook into pretty easily to get the data you need. For example, if you're having issues with a NULLable ZipCode FK, then you might as well append the ZIP code to every address (if it can't find one, then why keep it because it is a bad address anyway).

One such service is SmartyStreets' Address Validation API, which handles API requests, or you could process an existing list/table of addresses with our Bulk Address Validation Tool.

Brassie answered 26/1, 2012 at 19:40 Comment(0)
A
0

zip codes has_many addresses / address belongs_to zip_code. Do you need to normalize? Most apps it's best to just have a zip_code column in the addresses table. Maintaining all the zipcodes for international addresses is an uphill battle.

Also you are duplicating region_id in address and city. You might need to explain what region is in your app but this looks like it only needs to be in city.

Alienee answered 3/4, 2011 at 16:32 Comment(1)
See my response to Chris, this diagram is drastically simplified just to ask the question. Region is a generic term with addresses to capture State / Provinces / Regions as many different terms are used throughout the world.Pruritus
E
0

Worldwide, 119 of 190 countries use postal codes. Notable countries which do not use them include Ireland and Panama.[1]

Besides supporting that fact, it would be a very annoying system which insisted on having a postal code. It should also allow postal code unknown.

In the U.S., each "city" has at least one zip code, so the relationship is correct. I know this from developing a zip code database for about a year.

Eustazio answered 3/4, 2011 at 16:34 Comment(3)
I was trying to refrain from having to use Nulls. At this point of course I am still questioning what approach to take with zipcodes. I know that USPS changes zip codes all the time (I believe monthly) but handling the few exceptions that might arise might be less of a worry then keeping up with their constant changes. Not to mention the unknown changes that might arise around the world. I have a fairly large list of a worldwide breakdown at this point which is a great starting point.Pruritus
Actually zip codes don't change all that fast. A given address, on average, is affected by a zip code change less than once every 20 years. There are only hundreds of zip code table updates each month, and most of those don't affect many addresses. Most of those monthly changes affect the FIPS fields and the allowable alternative names for a zip code. One can easily use five-year-old data without a complaint, even from databases with tens of thousands of consumers.Eustazio
Cool thanks for the added insight. A lot to consider with capturing and managing this type of info. I am sure that with the housing burst a lot of changes were made. My neighborhood experienced it since USPS decided our initial zip code was overcrowded or something and depending on when your house was built you may have switched zips within a very short time after moving in. I am not sure how old the dataset I have is, although it can't be that old. Regardless it is still a decent starting point.Pruritus

© 2022 - 2024 — McMap. All rights reserved.