Best practices for storing postal addresses in a database (RDBMS)?
Asked Answered
T

15

130

Are there any good references for best practices for storing postal addresses in an RDBMS? It seems there are lots of tradeoffs that can be made and lots of pros and cons to each to be evaluated -- surely this has been done time and time again? Maybe someone has at least written done some lessons learned somewhere?

Examples of the tradeoffs I am talking about are storing the zipcode as an integer vs a char field, should house number be stored as a separate field or part of address line 1, should suite/apartment/etc numbers be normalized or just stored as a chunk of text in address line 2, how do you handle zip +4 (separate fields or one big field, integer vs text)? etc.

I'm primarily concerned with U.S. addresses at this point but I imagine there are some best practices in regards to preparing yourself for the eventuality of going global as well (e.g. naming fields appropriately like region instead of state or postal code instead of zip code, etc.

Tartan answered 21/11, 2008 at 23:28 Comment(3)
Right off the bat zip has to be a char field - otherwise certain zipcodes which start with 0 would become inaccurate.Calefaction
As a rule of thumb, when you need to do math calculations with the number, it should be integer. If you only display it, it should be char (telephone, zip code, etc.)Esbjerg
OP is only concerned by US addresses, but since this came up a lot in searches I'd like to point out that zipcodes in other countries can have letters. Like in Canada which alternates letters and numbers. Also, house numbers can possibly have letters in them which are not the same as apartment/suite etc...Baucom
Z
59

For more international use, one schema to consider is the one used by Drupal Address Field. It's based on the xNAL standard, and seems to cover most international cases. A bit of digging into that module will reveal some nice pearls for interpreting and validating addresses internationally. It also has a nice set of administrative areas ( province, state, oblast, etc ) with ISO codes.

Here's the gist of the schema, copied from the module page:

country => Country (always required, 2 character ISO code)
name_line => Full name (default name entry)
first_name => First name
last_name => Last name
organisation_name => Company
administrative_area => State / Province / Region (ISO code when available)
sub_administrative_area => County / District (unused)
locality => City / Town
dependent_locality => Dependent locality (unused)
postal_code => Postal code / ZIP Code
thoroughfare => Street address
premise => Apartment, Suite, Box number, etc.
sub_premise => Sub premise (unused)

A lessons I've learned:

  • Don't store anything numerically.
  • Store country and administrative area as ISO codes where possible.
  • When you don't know, be lax about requiring fields. Some country may not use fields you take for granted, even basic things like locality & thoroughfare.
Zwinglian answered 17/1, 2015 at 2:8 Comment(3)
May I ask what the "name_line" is intended for? I din't really find an explanation in the Drupal Docs or xNal Standard. How I understand it the name_line is for sending real letters or parcels by mail. The first_name / last_name are only needed if you want to address the customer directly, e.g. by email ("Dear Mister <last_name>"). Or is there any other purpose/ benefit to it?Devastation
When delivering to (large) commercial premises, a name is often necessary for the internal mail delivery system (consider office buildings with mail rooms)Kimberly
Address Field has been replaced by Address. Looks like the fields might be a bit differentTammy
F
29

As an 'international' user, there is nothing more frustrating than dealing with a website that is oriented around only US-format addresses. It's a little rude at first, but becomes a serious problem when the validation is also over-zealous.

If you are concerned with going global, the only advice I have is to keep things free-form. Different countries have different conventions - in some, the house number comes before the street name, in some it comes after. Some have states, some regions, some counties, some combinations of those. Here in the UK, the zipcode is not a zipcode, it's a postcode containing both letters and numbers.

I'd advise simply ~10 lines of variable-length strings, together with a separate field for a postcode (and be careful how you describe that to cope with national sensibilities). Let the user/customer decide how to write their addresses.

Fabrienne answered 22/11, 2008 at 1:28 Comment(2)
For what it's worth, this isn't for a web site, but the point about international addresses is still well taken.Tartan
While I don't disagree with the message, and in fact I applaud you for the stance you take, I've had to downvote you because I abhor the fact as someone who spends the large majority of my time writing tools to cleanse address data of storage of address data in a free form format. Addresses may be formatted differently, but the data is still largely the same. Whether a street number is displayed prior to the street name or after is largely irrelevant for storage purposes - only for display purposes.Materialism
V
26

If you need comprehensive information about how other countries use postal addresses, here's a very good reference link (Columbia University):

Frank's Compulsive Guide to Postal Addresses
Effective Addressing for International Mail

Vespertine answered 13/9, 2010 at 7:33 Comment(0)
L
18

You should definitely consider storing house number as a character field rather than a number, because of special cases such as "half-numbers", or my current address, which is something like "129A" — but the A is not considered as an apartment number for delivery services.

Loci answered 21/11, 2008 at 23:53 Comment(0)
R
12

I've done this (rigorously model address structures in a database), and I would never do it again. You can't imagine how crazy the exceptions are that you'll have to take into account as a rule.

I vaguely recall some issue with Norwegian postal codes (I think), which were all 4 positions, except Oslo, which had 18 or so.

I'm positively sure that from the moment we started using the geographically correct ZIP codes for all of our own national addresses, quite a few people started complaining that their mail arrived too late. Turned out those people were living near a borderline between postal areas, and despite the fact that someone really lived in postal area, say, 1600, in reality his mail should be addressed to postal area 1610, because in reality it was that neighbouring postal area that actually served him, so sending his mail to his correct postal area would take that mail a couple of days longer to arrive, because of the unwanted intervention that was required in the correct postal office to forward it to the incorrect postal area ...

(We ended up registering those people with an address abroad in the country with ISO-code 'ZZ'.)

Rebound answered 9/6, 2009 at 0:31 Comment(0)
A
9

Ive found that listing all possible fields from smallest discrete unit to largest is the easiest way. Users will fill in the fields they see fit. My address table looks like this:

*********************************
  Field              Type
*********************************
  address_id (PK)    int
  unit               string
  building           string        
  street             string
  city               string
  region             string
  country            string
  address_code       string
*********************************
Arabinose answered 19/2, 2013 at 9:41 Comment(2)
How do you store PO Boxes?Mainstream
just add another column PO_box If you have to do this retrospectively, that means non of the previous addresses needed a PO Box, so it can be set to nullArabinose
B
8

You should certainly consult "Is this a good way to model address information in a relational database", but your question is not a direct duplicate of that.

There are surely a lot of pre-existing answers (check out the example data models at DatabaseAnswers, for example). Many of the pre-existing answers are defective under some circumstances (not picking on DB Answers at all).

One major issue to consider is the scope of the addresses. If your database must deal with international addresses, you have to be more flexible than if you only have to deal with addresses in one country.

In my view, it is often (which does not mean always) sensible to both record the 'address label image' of the address and separately analyze the content. This allows you to deal with differences between the placement of postal codes, for example, between different countries. Sure, you can write an analyzer and a formatter that handle the eccentricities of different countries (for instance, US addresses have 2 or 3 lines; by contrast, British addresses can have considerably more; one address I write to periodically has 9 lines). But it can be easier to have the humans do the analysis and formatting and let the DBMS just store the data.

Beckon answered 21/11, 2008 at 23:46 Comment(0)
A
8

Unless you are going to do maths on the street numbers or zip / postal codes, you are just inviting future pain by storing them as numerics.

You might save a few bytes here and there, and maybe get a faster index, but what do you when US postal, or whatever other country you are dealing with, decides the introduce alphas into the codes?

The cost of disk space is going to be a lot cheaper than the cost of fixing it later on... y2k anybody?

Austreng answered 22/11, 2008 at 0:7 Comment(0)
S
8

Adding to what @Jonathan Leffler and @Paul Fisher have said

If you ever anticipate having postal addresses for Canada or Mexico added to your requirements, storing postal-code as a string is a must. Canada has alpha-numeric postal codes and I don't remember what Mexico's look like off the top of my head.

Spinney answered 22/11, 2008 at 0:9 Comment(0)
C
3

Where's the "trade off" in storing the ZIP as a NUMBER or VARCHAR? That's just a choice -- it's not a trade off unless there are benefits to both and you have to give up some benefits to get others.

Unless the sum of zips has any meaning at all, Zips as number is not useful.

Cameleer answered 21/11, 2008 at 23:54 Comment(10)
One tradeoff might be database size. In mysql 5, a mediumint row would only take 3 bytes per row while a varchar(5) would take twice as much. I also thought that numeric searches were faster than text ones, but I am not positive on that.Earthy
Pro of using an integer: storage size, con of using an integer: you lose leading 0's and have to put them back after reading. Also if going the character route I would use a CHAR not a VARCHAR in most databases.Tartan
one should use a varchar. Canadian postal code use an alpha numeric encoding, which wouldn't fit well in a number.Faeroese
The trade off, is a varchar2 will work, and a number won't. Reread the question. He has some concern about global postal codes.Faeroese
Just to elaborate on John's comment, many US zip codes have a leading zero (e.g. "06070"), so that info is lost if you store them as a number.Rhodes
While I do understand the "forwards-compatible" logic behind using varchar in this sense, the claim that "zips as number is not useful" is a bit too dogmatic. If you know you are going to be working with US-only zip codes, it makes sense to store zip codes as integers, just like when writing in a strictly typed language, you don't define everything as type String... If you know it's going to be a number, why not lean on the type checking of the DB/programming language and call it what it is - an Integer?Mariel
@Mariel one argument for using varchar is that zip codes are not numeric in the mathematical sense; it doesn't make sense to do addition or subtraction on them; they are merely encoded with a restricted character set. https://mcmap.net/q/162334/-is-it-a-good-idea-to-use-an-integer-column-for-storing-us-zip-codes-in-a-databaseTernary
@SteveFolly And in further support of Zip codes being strings, the leading characters have special significance: en.wikipedia.org/wiki/ZIP_Code#Primary_state_prefixes If one is going to be implementing logic like "what are the left-most characters of the value?" then that sure sounds more like a string than an integer.Kulturkampf
Another reason not to use integer storage (even if only storing US zipcodes) is that it won't work with Zip+4 codes used in the US. An insert of one of these like 04451-2209 could potentially go into the DB as "2242".Fussell
the tradeoff is: if you use an integer your app will use approximately 0.0023% less storage space; whereas if you use a varchar readers of your code won't think you're an ignorant american.Wensleydale
O
2

This might be an overkill, but if you need a solution that would work with multiple countries and you need to programmatically process parts of the address:

you could have country specific address handling using two tables: One generic table with 10 VARCHAR2 columns, 10 Number columns, another table which maps these fields to prompts and has a country column tying an address structure to a country.

Overdone answered 22/6, 2009 at 10:40 Comment(1)
I have actually considered that myself. In addition to, or perhaps instead of a table which maps columns to prompts based on country I was thinking of creating updatable views for each specific address format. Have not pulled the trigger yet, but have thought about it.Imperfect
M
2

Inspired by Database Answers

Line1
Line2
Line3
City
County_Province
PostalCode
CountryId
OtherDetails
Mainstream answered 15/1, 2015 at 12:37 Comment(1)
Should be County_Province not Country_Province.Unwinking
W
1

If you ever have to verify an address or use it to process credit card payments, you'll at least need a little structure. A free-form block of text does not work very well for that.

Zip code is a common optional field for validating payment card transactions without using the whole address. So have a separate and generously sized field for that (at least 10 chars).

Wellrounded answered 23/8, 2013 at 21:51 Comment(0)
O
1

At the moment, I'm developing an international ecommerce website.

It should cover almost all addresses in this world as shown below:

*****************************************************************
Type            Field name    Displayed name in your form         
*****************************************************************
INT             id (PK)            
VARCHAR(100)    building      Apt, office, suite, etc. (Optional)
VARCHAR(100)    street        Street address     
VARCHAR(100)    city          City
VARCHAR(100)    state         State, province or prefecture
VARCHAR(100)    zip_code      Zip code 
VARCHAR(100)    country       Country
*****************************************************************
Offwhite answered 14/6, 2022 at 21:58 Comment(0)
M
-4

I would just put all the fields together in a large NVARCHAR(1000) field, with a textarea element for the user to enter the value for (unless you want to perform analysis on eg. zip codes). All those address line 1, address line 2, etc. inputs are just so annoying if you have an address that doesn't fit well with that format (and, you know, there are other countries than the US).

Moraine answered 13/9, 2010 at 7:43 Comment(7)
What a horrible idea! There isn't enough space in a "Comment" to describe the nightmare this invites. Better to spend a little extra time designing it properly than trying to untangle the mess afterward. See Samm Cooper's answer. I think I have only down voted one other answer here on SO, but this one definitely earned a down vote from me.Imperfect
Which mess? What do you need the data for? Often you only need it to pass it directly to some label printer or similar, and then you can just treat it as a blob of text. Other times you might care about cities and zip codes (but you better make sure you only have customers in supported countries then)Moraine
OP did not mention "only needing to pass it to a label printer" and at every job I have ever had we have used the address as "data", running reports, collecting taxes (Colorado sales tax for appliances being put in a new home vary from one side of the street to the other), assigning leads to sales people, satisfying government compliance requirements, the list goes on and on. "Destroying" data (by mashing distinct items into one field or not capturing data that is available) is a "sin" in my book and has always proven to be the nightmare I warned about when people ignored me.Imperfect
If you later discover that you did not need a piece of data you can always "destroy" it later. "Creating" data, ranges from nightmare (splitting information into separate fields) to impossible (capturing data after the fact). If the OP had said, "only need to send it to label printer" I would have applauded and up-voted your answer. However, without specific mention of something like that a suggestion to "destroy" data, IMO, verges on the brink of irresponsible or even mean.Imperfect
Where I've worked (mostly e-commerce), we tend to store it in 5-6 different fields, but we never, ever, do anything with the information other than use it to send to delivery.Moraine
Consider yourself luckyImperfect
I dare say "proper design" is also apt to create mess. Like, when your users are forced to put the data into the wrong fields to workaround bad assumptions. For example, when they will end up putting their postal code into the city field because otherwise they can't get the correct addressing order.Deify

© 2022 - 2024 — McMap. All rights reserved.