best database design for city zip & state tables

D

5

My application will need to reference addresses. Street info will be stored with my main objects but the rest needs to be stored seperately to reduce redundancy. How should I store/retrieve ZIPs, cities and states? Here are some of my ideas.

single table solution (cant do relationships)

[locations] locationID locationParent (FK for locationID - 0 for state entries) locationName (city, state) locationZIP

two tables (with relationships, FK constraints, ref integrity)

[state] stateID stateName

[city] cityID stateID (FK for state.stateID) cityName zipCode

three tables

[state] stateID stateName

[city] cityID stateID (FK for state.stateID) cityName

[zip] zipID cityID (FK for city.cityID) zipName

Then I read into ZIP codes amd how they are assigned. They aren't specifically related to cities. Some cities have more than one ZIP (ok will still work) but some ZIPs are in more than one city (oh snap) and some other ZIPs (very few) are in more than one state! Also some ZIPs are not even in the same state as the address they belong to at all. Seems ZIPs are made for carrier route identification and some remote places are best served by post offices in neighboring cities or states.

Does anybody know of a good (not perfect) solution that takes this into consideration to minimize discrepencies as the database grows?

Dickens answered 5/1, 2010 at 16:31 Comment(2)

@Mitch: Link? (15 chars) – Ribonuclease 5/1, 2010 at 16:37

Searched quite a bit before posting. I'd love to see another documented case with an explanation. This type of solution is implemented in SO MANY applications. Does anybody care to share their approach? – Dickens 5/1, 2010 at 16:58

R

2

I don't know if you're internationalizing your app, but the general construct is this, with a one-to-many relationship with the following item:

Country
Region (state/province)
City

That's usually sufficient to be able to filter your data in a meaningful way. Trust me on this: you don't want to get into the technicalities of geographic land division.

For an address, store the data above plus street address, postal code (international version of zip code), etc. down to the resolution you need. I say resolution because you could split the address field into things like apartment number, street number, street name, street direction, etc. -- but that data may be dependent on the location, so I would avoid doing that if you are going to internationalize your app. Just a street address field is sufficient 99.99% of the time.

Ribonuclease answered 5/1, 2010 at 16:54 Comment(1)

I agree with the basic principle here. The main issue I wanted to build around was location redundancy and location misspelling. Storing states & cities in tables was key to preventing that. Since I will be coding for the US only, it will prob be a states table and a cities table with a one state to many cities. ZIPs may be an afterthought. – Dickens 5/1, 2010 at 17:24

D

3

Thanks for all the replies. I wanted to give a review & my solution incase someone was interested. The question was "How should I store/retrieve ZIPs, cities and states?"

Jon Seigel gave me a fairly reassuring answer about using: Country Region (state/province) City with one to many relationships.

My reasons were redundancy and misspelling. Allowing any free-input of cities and states columns stored within the address records opens up a slew of issues with queries. Not having relational integrity could allow incorrect cities to states. I just wanted to store locations in a uniform way for users to be able to lookup.

For anybody interested my solution is this:

[state]; stateID; stateName

[location]; locationID; stateID (FK); cityName; zipID

The [location.stateID] is a foreign key relation with a one-to-many to [state.stateID]. I decided to keep the ZIP with the location table as unique ZIPs are not directly relational to a unique city. Also it seems ZIPs are not a basis for city/state boundary determination, rather they are for USPS purposes and actually indicate a carrier route and postal delivery zone which can span cities or even states. Another location record can be added with the same city name and the additional ZIP. This way ZIP searches can result in all cities & city searches can result in all zips if need be.

Dickens answered 5/1, 2010 at 16:31 Comment(1)

Thanks @ryan a, I was looking for just this sort of thing. – Possessory 28/8, 2015 at 13:52

R

3

There is actually some database(with a single table) that the USPS puts out every year with ZIP codes and state and counties and state/county codes. I would look into it. I have an (outdated) copy of it. The schema is pretty simple:


ZIPCODE nvarchar(5) not null
CITY nvarchar(50) null
STATE nvarchar(2) null
STATECODE nvarchar(50) null
COUNTY nvarchar(50) null
COUNTYCODE nvarchar(50) null

(see below)

edit: Also, I would allow your users to add a new zipcode(with city and county and such) because zipcodes are being added all the time..

http://www.usps.com/ncsc/addressinfo/addressinfomenu.htm

edit: Actually, I guess I'm wrong. I don't have an official copy of their database.. I downloaded one of their sample files and their schema seems quite complex.

Rostand answered 5/1, 2010 at 16:59 Comment(3)

thats a great resource. however right now i'm just concerned with getting a design going and not populating yet. – Dickens 5/1, 2010 at 17:7

Well, I mean the UPS gives it to you populated. This way the user can just type their zipcode and you'll automatically know the city and state and such. – Rostand 5/1, 2010 at 17:8

That link is now dead. – Gessner 12/3, 2019 at 15:0

R

2

I don't know if you're internationalizing your app, but the general construct is this, with a one-to-many relationship with the following item:

Country
Region (state/province)
City

That's usually sufficient to be able to filter your data in a meaningful way. Trust me on this: you don't want to get into the technicalities of geographic land division.

For an address, store the data above plus street address, postal code (international version of zip code), etc. down to the resolution you need. I say resolution because you could split the address field into things like apartment number, street number, street name, street direction, etc. -- but that data may be dependent on the location, so I would avoid doing that if you are going to internationalize your app. Just a street address field is sufficient 99.99% of the time.

Ribonuclease answered 5/1, 2010 at 16:54 Comment(1)

I agree with the basic principle here. The main issue I wanted to build around was location redundancy and location misspelling. Storing states & cities in tables was key to preventing that. Since I will be coding for the US only, it will prob be a states table and a cities table with a one state to many cities. ZIPs may be an afterthought. – Dickens 5/1, 2010 at 17:24

S

1

It depends on whether data integrity, normalization is more important, or performance.

However, for most applications, what you really want is a household. So, this information should be stored separately from your customer, so you can represent multiple customers living at the same household.

The household must have the street address, apt, city, state/province, country, zip/postal code.

I would not bother to normalize this by including only a reference to the city (which would be a key since there could be multiple cities with the same name), but you should store a separate table with cities, countries, postal codes just for validation and integrity purposes.

I would put these foreign keys as fields in the household.

I've worked with marketing databases, and developed an AI system for constructing customer keys and household keys for householding purposes for a bank, and this was one of the primary issues. For analysis purposes, we needed to aggregate accounts to the customer level, and customers to the household level. So your representation should support this for future analytical purposes.

Strident answered 5/1, 2010 at 17:3 Comment(0)

S

0

There's not a single right model for this need - there are dozens. To know which is best for you depends on some additional info, like:

performance & capacity - what's driving concern about redundancy?
functionality - what kind of data analysis will be performed?
historical data - do you have to maintain old data? note that zip codes change, and this invalidates some of the offered solutions
internationalization
language
do you have other kinds of locations? You may want a more abstract solution that can consolidate physical with electronic locations - if for example your user wants to pick preferred contact method, etc.
do you want to allow locations to be shared?
any other physical location info also kept or highly likely to be added? Like county, country, lat & long, etc?

Swirsky answered 5/1, 2010 at 18:37 Comment(4)

I thought my intention was pretty straightforward. I need to store location data for retrieval. Cities, states and ZIPs. My inquiry was in regards to the structural design of my tables. I said US only. I did not have any analysis or language requirements. Just Cities, states and ZIPs. One, two or three tables and why. – Dickens 5/1, 2010 at 19:36

Well, no. You didn't say US only, didn't say what kinds of queries you'll be doing (need a drop down list of cities for a given state?), nor what kind of performance requirements to try to meet, nor what languages to support, etc. There's a reason why there are many ways to model this - and it's because of these kinds of requirements. – Swirsky 5/1, 2010 at 23:47

Sorry the US only was a comment to Jon Seigel. I didn't think normalization and good practice depended on exact queries - rather designing correctly for reliable storage and retrieval independant of how I want to display the data which of course should be handled by how I perform my app logic. The relation of Cities, States and Zips I thought would correspond to what they are in reality. What belongs in what and how my tables should reflect that. Sorry for the confusion. – Dickens 7/1, 2010 at 15:50

No problem. The kind of queries don't affect the logical model - but they can affect the physical model, depending on other performance factors. For example, if you have a primarily transactional database with a small surgical queries dipping in via the index for a single row - your physical model can look a lot like the logical. On the other hand, if you're doing a lot of reporting then a logical or transactional model will kill your performance and you'll want to denormalize. – Swirsky 8/1, 2010 at 14:23

Recommended topics

Hot tags