I read this article a long time ago and I took it seriously. So instead of asking for address, postal/zip code, city, state/province, I just put up a big text area labeled "Full address" so that people have complete freedom about what to fill in.
80% of the users ended up only filling in their street address, not their postal/zip code, city and state/province, even though they're from countries where (most?) addresses satisfy that format.
We ended up reverting to the previous form where we explicitly asked for postal/zip code etc.
This is where most of the "falsehoods programmers believe" articles fall flat.
The thing to do is not try to design a single form that accommodates every possible address or name or whatever. The thing to do is examine your use case, design a form that works for, say, 99.9% or more of the people you want it to reach, and then if you really want to get the last 0.1% have someone who can do customer service and has access to a freeform text box into which they -- not the end user -- will enter the information.
Because when you get right down to it, the freeform text box is the only thing that accepts 100% of valid addresses (or valid names, or whatever). But it also necessarily accepts a ton of invalid ones, too, and so it causes more trouble for the common cases than it saves in the uncommon ones.
So maybe have a typical address form and add a checkbox labelled "My address doesn't fit the required format" that would replace your usual form with a freeform text field?
I've found that if you have authoritative addressing data for your target market(s), a search/autocomplete field is a great alternative to the multi-field form being talked about in this discussion. Having access to that sort of data is, of course, a pretty big caveat as much of it is currently proprietary but efforts are being made globally to address (couldn't resist!) this (e.g. http://openaddresses.io, http://alpha.openaddressesuk.org).
"Punt and let customer support handle it" is a pretty lazy solution, in my view. Additionally, what a developer might think of as a "99.9%" solution is more often than not a 99% solution, or a 90% solution. With enough customers, dealing with everyone who can't use your slapped-together form will be more expensive than doing it right.
Software doesn't become great by assuming someone else will handle the edge cases and details, although you may get away with selling it for a while until your user base grows.
> With enough customers, dealing with everyone who can't use your slapped-together form will be more expensive than doing it right.
The way you write shows some deep assumptions: that this is a case of "slapped-together" vs. "great" software, for example. And those assumptions are, I'd assert, counterproductive to the discussion at hand.
The comment I was replying to actually was a real-world example of what you're missing, namely that this is a trade-off. Every time you support another 0.0001% of corner cases, you're increasing the surface area of potential failures for other cases. And since the other cases are far more common, they're going to bite you more often.
Supporting the most common cases, to a 99.9% or so level, directly and then having an escape hatch for the rest is not "slapped together" -- it's deliberate design which takes the problem and the cost/benefit of different solutions into account. It's also, from that perspective, almost certainly the correct design.
And yet it gets blasted repeatedly in articles like the OP here, which assume that the only possible reason for such design is ignorance/incompetence on the part of the developer. That simply is not true, and you should stop implicitly buying into that (it's another assumption that's hidden in the way you write).
Except, in this scenario, it's relatively easy to come up with a good 99.9% solution:
Seven text fields will cover greater than 99.9% of users:
Name
Address Line 1
Address Line 2
City
State
Country
Postal Code
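For what it's worth, here is a minimal sketch of the seven fields above as a record. The class name, the field names and the choice of which fields are optional are my own assumptions, not anything from the article. The postal code is kept as a string and only four fields are required.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SevenFieldAddress:
    """Hypothetical record for the seven-field form; names are illustrative."""
    name: str
    line1: str
    city: str
    country: str
    line2: str = ""        # optional: many addresses have no second line
    state: str = ""        # optional: not every country has states
    postal_code: str = ""  # optional, and a string so leading zeros survive

    def label_lines(self) -> List[str]:
        """Return the non-empty lines in a simple, assumed label order."""
        locality = " ".join(
            part for part in (self.city, self.state, self.postal_code) if part
        )
        candidates = [self.name, self.line1, self.line2, locality, self.country]
        return [line for line in candidates if line]
```

With only name, line 1, city and country filled in, `label_lines()` returns those four lines and skips the empty ones.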
So much of that article discussed stuff that was irrelevant - Live in Singapore? Great, just fill in Singapore, Singapore, Singapore. I've done that on many, many sites and it's always worked fine. Don't have a postal code in Ireland? Most people there learn to try EIRE or 00000.
I think a better article would have been:
"Here are the scenarios in which the 7-text address system doesn't work, and how you can make it better." - But I couldn't find anything that suggested it wouldn't work just fine.
My country unfortunately doesn't fit that scenario. Here in the UAE there is no residential postal delivery, so if you want to receive letters you have them sent to a PO Box. Most people usually use their office's, but I have my own. That means you can reach me with simply:
PO Box XXXX
Dubai
UAE
For sending parcels to be delivered by a courier you can include a physical address, however most streets here don't have names (and even if they do people have no idea what they are) so people go by directions. Which means something like:
Flat XXXX
Building Name next to / opposite / near Landmark
Area Name
In most cases it's best to include a phone number so the courier can phone for directions if they get lost (without street names it's easy), and of course everyone requires a name even though I'm the only person living here and using this PO Box. That means my 'full address' ends up being:
Name Surname
Tel: 05XXXXXXXX
Flat XXXX, Building Name
Next To Landmark
Area Name
PO Box XXXX
Dubai
UAE
In most cases that never fully fits in the length of the fields, or they do something silly like requiring a postcode but limiting it to 5 digits (luckily my PO Box number is only 4 digits in length, but in reality they are [0-9]*). Anything I order from eBay has "NOTPROVIDED" on it as I left out an optional field they think should always be included :D
Edit: Area names have their own fun: There is a road going from one end of the country to the other which in Dubai is called Sheikh Zayed Road (it has a number and different names in other emirates :D), but that is also the name of an area along part of the road.
I think this is the key point. Sure, you can come up with some clever and unique system to accommodate the one percent of your user base who isn't well served by standard forms, but that 1% is already used to adapting their unique information into a standard format.
Web forms for things like name and address are a sort of ad-hoc standardization of a nonstandard data type.
The main thing is not to make every field required. I've got nothing meaningful to fill in at state, nor at address line 2. Too many sites make too many false assumptions about addresses.
I have never seen anywhere require "Address Line 2." It's only used as a specifier to the address when there are multiple units at the address (office buildings, apartments, and the like).
Great software is that which solves your business problem.
If not "punt to a human", what's the solution? We've already seen that giving users a free-form text box is a cure worse than the disease. Do you analyze every country's address scheme and make a dynamic form that covers all the cases? (worse IMO - it's not going to cover all the cases, and it's going to be even more infuriating for users when it goes wrong. See the Jersey example in another comment) Some wizard solution that I can't think of? Saying this approach is "lazy" is all well and good, but do you have a positive proposal?
At least it helps if you don't block on empty fields then - I keep getting annoyed when sites demand I enter "state/province" for international shipping when my country doesn't have that concept.
Or demand that I enter 6-digit postal code when we only use 4. Or not being able to handle non-ASCII chars. And things like that - helping people is fine and useful, but blocking a user because you set up assumptions valid only for your locality is a huge annoyance.
If your country doesn't have the concept of separate city/country, (Singapore doesn't) - you can just enter "Singapore, Singapore, Singapore"
Or UK, London, London.
Totally agree that systems should not presume to know (often incorrectly) the format/content of such fields, outside of areas where they can. (I.E. Validating Postal Codes in regions like Canada, United States, etc..)
> I keep getting annoyed when sites demand I enter "state/province" for international shipping when my country doesn't have that concept.
And uses a select box of US states, so even if your country did have such a concept (and used it for mail, which may not be the case) you couldn't give it.
Yup, I’ve done that far too often, I even have a bookmark to a small script that just replaces the next input box I click on with a freeform text input.
We tried the same basic move — one open-ended address field instead of a set of more specific parts — in an attempt to simplify our sign-up flow. We too have since reverted to a more conventional multi-part form.
First thing we discovered: We might have been happy with one big box, but none of the payment services we use worked the same way. This more than cancelled out any usability benefit, because we could no longer reliably pre-fill the address fields on any form for card or bank details. Our customers would still wind up entering the specific parts of their address anyway, but now they effectively had to type the same address twice during the sign-up process as well! (Edit: There’s also a related issue that some browsers will remember and pre-fill fields that look like common address parts automatically, which to my knowledge no browser does for an open-ended address.)
Second thing we discovered: Same as 'FooBarWidget, given a freeform text field, some customers will give you beautifully formatted multi-line addresses, some will stick everything on one line, some will stick just the first line on that one line, some will assume you meant e-mail address(!) and so on.
More recently, we also discovered a third issue because we’re in the EU: I can see no reasonable way to automatically parse any common address format that is sufficient to comply fully with the new VAT place-of-supply rules, but to have a chance of even getting close in the 99% case you at least need to have a separate country indicator.
I do sympathise with the frustration of having to break down addresses into different fields. If the big bureaucracies like payment schemes and governments had more practical rules for working with customers in different locations, one big box really is all we should need, though getting customers to enter something valid in it is still a tricky issue. But since those more practical rules seem unlikely to happen any time soon, there is little either we or our immediate service providers can do but follow the dubious but widely accepted conventions anyway.
Did you need it for delivery of something? Was that made very clear to the user? I can't believe so many people, wanting to receive something through the post, would wilfully screw up their address - maybe there's more to your use-case than meets the eye?
Simple clerical errors rarely invalidate a contract. I'm not a lawyer though so I don't actually know.
I would have done address line 1, address line 2, line3, etc. I think people got confused by your form because they are used to typing in 1234 Sample Street <tab> expecting to be asked the rest of the information further down. The other option would be on form submit to show what they have put in and say "make sure this is your full address [CONTINUE] [EDIT ADDRESS]."
Don't know about the US, but here in the UK having an incorrect address in a contract may make it trickier to enforce a court decision against a company as certain kinds of documents used in the court process are only valid if served to the Registered Office of the company.
Why would the address on the contract matter? If the contract lists 'ABC' as their address, you still need to serve papers to their registered office which may have always been XYZ, or may have changed after the contract was signed - so in any case you have to ignore the address on the form and use the official one that you look up from the registry.
The only case where I've seen address used as a disambiguator is when treating multiple private individuals with the same name, but in that case also you want some official ID, not the address which may change frequently (and may contain a different John Smith than the one who lived there a year ago).
Well, if putting down the wrong address on a contract is enough to invalidate the contract, then people would be "accidentally" fudging their addresses all the time. :)
Really, I am no legal expert but I did once sue someone in small claims court. They wanted to know what I did to verify the other party's address, not just taking their word for it.
I cannot imagine how such a law, dealing with international addresses and all the issues identified in the article could ever be practically applied, but I guess YANAL.
Did you try pre-filling the address field with an example address (John Doe, 123 Example Avenue, Hobbiton 1234) to make it clear what is required? (Although it seems like placeholder text only works with textarea fields since HTML5...)
Side by side examples would be better, because the example is still there for double checking after you've filled in the data. Pre-filled obliterates the example.
Indeed, I've been on the receiving end of somebody taking this kind of advice to heart once, and things not working out too well.
At that job we had a legal obligation to let anyone in the country opt their building out from a certain database, by entering the address on a web form. We'd then send a snail mail verification code to that address, and enact the opt out on entry of the code.
This was covered incredibly well in the media, and the number of requests was in the millions (don't know how many of those actually turned out to be valid). I suspect the people who'd made the web form had read this exact rant about addresses, since contrary to how things were normally done, they'd just put in a big textarea for the address. Now, this was completely unnecessary even by the standards of the "falsehoods" article, since it was a country with very well established address conventions, and since by definition no addresses from other countries should be entered. And as might be predicted, it caused some problems.
First, just as you note, people didn't really understand what they needed to enter into a textarea like that. They didn't do as bad a job as in your case, but you'd have things like people leaving out zip codes, leaving out the city, putting their name on the first line (even though the name had been asked for separately), entering all address components on the same line, entering addresses in different countries, and so on. And unfortunately the geocoding service used to let them check whether the address had been interpreted correctly (for the purpose of "this is the address that should be expunged from the db") was very good at finding the right location even with badly malformed addresses, and no other input validation was done. BTW, it's quite likely that without this geocoding step the amount of bizarrely formatted or just outright invalid addresses would have been higher.
More importantly, even ignoring the data quality issues, the services used sending out millions of letters in bulk would not accept free form addresses in general. No, they needed the address broken out in separate fields, exactly in the way they would have already been stored if we'd had the kind of structured input form that everyone uses.
My part of the story was then to clean up the mess, a task I got despite being in a totally different group, since I happened to work on the address extraction parts of the geocoder at the time. A disaster always takes precedence over real work :-/
Writing the code to do the right thing 99% of the time took a few days, and I don't want to know how much time was spent by someone manually on the remaining 1% that were flagged by the program. I somehow doubt that anywhere near as much time was saved on punting on the web form.
Knowing that addresses may not conform to any arbitrary rules should lead to disabling automatic validation (or downgrading it to nothing more but suggestions, "are you sure that's correct?" hints).
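A rough sketch of what "downgrading to suggestions" could look like: the checker returns hints instead of rejecting the input, and the form submits either way. The function name and the two rules are assumptions for illustration only. The one format fact relied on is that US ZIP codes are five digits with an optional four-digit extension.

```python
import re
from typing import List


def address_hints(postal_code: str, state: str, country: str) -> List[str]:
    """Return non-blocking hints; an empty list means nothing looked odd.

    Hypothetical helper: the caller shows the hints and still accepts
    the address if the user confirms it.
    """
    hints = []
    if country == "US":
        # ZIP or ZIP+4, checked as text so leading zeros are fine.
        if not re.fullmatch(r"\d{5}(-\d{4})?", postal_code.strip()):
            hints.append("That does not look like a US ZIP code. Are you sure it is correct?")
        if not state.strip():
            hints.append("No state was given. Are you sure that is correct?")
    # Other countries: no assumptions made here, so no hints.
    return hints
```

Nothing here blocks submission. An address from a country the code knows nothing about produces no hints at all.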
Pretty much sums it up. Instead of a simple form and some "I have an unusual address format" check box that gives a more free form field we have people abandoning all restrictions.
And being burned by the total anarchy tried, there is the usual discussion about localization not being worth it that simplifies to "I don't believe those not like me should get my attention".
"An address will exist in the country's postal service's database"
This is the one that I've run afoul of -- not because I live in a brand-new building, or a houseboat, or 30 miles from anywhere on an unnamed road; it's just that there is no door-to-door delivery for houses within a 2-block radius of the local post office, so we have a PO box.
The trouble happens when a business that needs my street address uses the USPS database for address verification. One example is online stores that don't ship to PO boxes. Some of these sites have a form with a sort of "Are you sure" prompt when my street address isn't recognized; others just refuse to accept it.
Even worse was when the local company that picked up my trash was bought by one of the larger regional "waste management" operations, and all the drivers' routes were re-planned for "efficiency" (evidently using software that hit some USPS database); the upshot was that everyone on my street had their address removed from the pickup routes.
Your address should be included in the postcode database if you are registered to vote. Sometimes new addresses take a few months but they should get added eventually.
Related--the "commonly used" address is different from the "official" address. When I moved into my current house, its official USPS address differed from what many of the local service people thought was the address (and, indeed, what many maps showed to be the address).
In these days of GPS--and just the passage of time--it's not a problem any longer. But for the first five years or so I lived in the house, I had to be careful to explain the situation to people coming to the house.
That people have addresses at all, or can describe their residence in an unambiguous or clear way (even using GPS coordinates).
I used to live in a place I couldn't even remotely give directions to. It was deep within a neighborhood of a poorer country, none of the streets had names, none of the buildings were numbered. I lived in a building where none of the apartments had numbers or names.
If I wanted something delivered, I would go to a local shop for the company delivering it, and show my ID, and they would have it routed there if it wasn't in the building already.
If you wanted a billing address for, I don't know, tracking me down, initiating lawsuits, something like that? I honestly just assume that's impossible.
Furthermore, just because you don't have an address doesn't mean you can't receive mail! At least as recently as a few decades ago, there were rural communities in the US where nobody had a street address; the postal service knew where everybody lived and would deliver mail given just a name and town. I'm not sure whether this arrangement still exists in the US, but I'm pretty sure it still does in Ireland and probably elsewhere.
The push for 911 changed a lot of places by assigning street names, but the 911 folks are sometimes the only ones who know the names. UPS still delivers to some vague addresses on the reservation. Shipping to "House 311 behind the school" does tend to confuse a few Internet merchants.
When I was growing up in the 1960s, we just had a street name (which probably wasn't even required) and RFD #1 and the name of the adjacent town that handled rural delivery for the area. And this wasn't the back of beyond; it was just a couple miles outside one of the main Philadelphia suburban spokes. At some point they gave us a Box number to use although we didn't actually have a postal box and they continued to deliver in the usual manner at least until we moved around 1980.
In the US though, there was a real push to rationalize street addresses for emergency services in the late 90s or so. It may not be universal but I know of even summer camps at the end of dirt roads in Maine that have street addresses now.
How can GPS coordinates be ambiguous? I can see different units of measure/notations being used, and the accuracy might be a problem for extremely close addresses, but wasn't it the point of GPS to provide an exact reference for any point on the planet?
Bingo. In my apartment building none of the units had numbers. I could point you to my unit, which was one of two doors on my floor, the other leading to a unit directly above me. So even GPS + Floor wouldn't quite be enough, although that's close enough that we could figure out who the package was for.
Then the full name should be enough to determine the floor. More problematic though is the inaccuracy of GPS. Where I live, my smartphone's GPS encircles 3 buildings.
In the UK, locations may be encoded as Ordnance Survey map references (I've heard of people using OS references as addresses). It is what most geo databases in the UK were keyed on, until a few years ago.
The OS uses a different datum to the GPS system, so if you convert to lat/lon you'll be out by about 150 metres (where I am, at least). You need to re-project the coordinates onto the other datum to get the real location.
Is it common to believe that post codes don't start with zero? All of New England (ME, MA, NH, CT, RI, VT) have zero starting post codes. Plus apparently a part of New Jersey. Map here: http://en.wikipedia.org/wiki/ZIP_code
"Correspondence to and from Rensselaer Polytechnic Institute uses the official address of 110 Eighth Street, Troy, NY 12180. This address serves as a mailing address only; you will not find a building with that number on 8th Street."
I got in to an argument in school with my computer programming teacher. A BASIC course, we were having to design a system to accept an address, and I was treating my ZIP code as a string.
IIRC, something like
50 INPUT "Your ZIP code?", ZIP$
Everything was reviewed via handwritten code and flowchart before we were allowed to type it in in the lab, and I was told "ZIP code is a number, but you're putting it in to a string, that's wrong, fix it".
"But a ZIP code starting with 0 would then not have the 0 at the front when we show it back to the user" I said.
"ZIP codes don't start with 0"
Me: "Some do"
Her: "No, they don't"
I lived in Michigan, but was a huge infocom fan, and they were in MA, and had a leading 0 in their ZIP code. We argued about this for a good few minutes, and it wasn't until I brought in something the next day that demonstrated legit ZIP codes starting with 0 that I 'won'. Crap like this reinforced my distrust of authority and cynicism in life (for better or worse).
EDIT: Anyone else remember the "New Zork Times" newsletter? :)
EDIT 2: It wasn't until much later that I learned the ZIP code system we had wasn't actually completely formed until after she was at least a teen, if not a full adult - she simply wasn't exposed to stuff that I was earlier, it didn't really impact her, and she just assumed they were all numbers with no leading 0s. In some ways a minor point, but... it also taught me about stuff I took for granted not always having been there, even something as basic as addresses. Didn't really learn that until much later after that class.
The worst is when you store it as a string, having learned from others' mistakes, but then upon exporting to CSV... Excel kindly strips the leading 0 and you get a bug report that your software is screwing up the ZIP codes!
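A small illustration of the point, using a made-up ZIP with a leading zero. Python's `csv` module writes and reads the value as text, so the zero survives the round trip. It only disappears when some consumer decides to treat the column as a number.

```python
import csv
import io

zip_code = "02142"  # example value with a leading zero

# Treating the ZIP as a number silently drops the zero.
as_number = int(zip_code)
print(as_number)  # 2142

# Written and read back as text, the value is unchanged.
buffer = io.StringIO()
csv.writer(buffer).writerow(["Example Name", zip_code])
buffer.seek(0)
row = next(csv.reader(buffer))
print(row[1])  # 02142
```

So the exported file itself is fine. The damage happens in whichever tool coerces the column to a numeric type on import.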
Excel is the bane of my existence.
I give a lot of speeches about how to preserve data precision.
However, it seems to be largely in vain, as there is often a lot of back and forth when handling data between different groups; even programmers don't seem immune from screwing up a CSV file with Excel.
Can't blame them. The only other choice they usually have is some crappy, poorly thought "database system" written by IT people who are poster children of this article. I'm a programmer and I defend use of Excel in the offices - I've actually worked in such an office before and I know that all the alternatives suck more.
And I guess that's one of the biggest mistake programmers make - assuming that real-life data will conform to some imaginary, bureaucratic, fixed format. Even schemas ain't fixed in real life. That's why people use Excel.
The other awesome is taking a screenshot, pasting it in to MS Word, then mailing the docx file. The entire screenshot then ends up being 3 inches across.
I remember when we did MS Access basics in my ICT class in what Americans would call middle school, and my teacher correctly mentioned that a telephone number is text, not a number. :)
Yea, this seems to be the case. I've had the leading zero stripped on a few occasions when receiving items at an FPO AE address (all APO/DPO/FPO AE zip codes start with a zero).
There are so many misconceptions from developers, even more when building apps used worldwide. If you plan to accept data from different countries, free text with no validation is the only acceptable answer.
I remember once we had to remove validation from names because some countries don't even have last names, and others have real names with two or even one characters.
> free text with no validation is the only acceptable answer.
Unfortunately, it's not an acceptable answer in practice.
In my experience, users aren't clear what to do when presented with a free-form, multi-line text box in which they can enter their address. This results in frequent missing data – users aren't aware they need to include a postal code, or county, or country…
This is probably because users are generally conditioned to expect separate text fields for separate tokens in their addresses.
There's a middle ground though – extract the tokens you need (postal code, country) and allow the user to freeform the rest. And don't even think about trying to validate addresses in any real way – you'll fail!
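A minimal sketch of that middle ground, with hypothetical names: country and postal code are captured as their own fields, everything else stays in one free-form block, and the only check is whether something was entered. The postal code is left optional here on the assumption that some users have none.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LooseAddress:
    """Hypothetical model: two extracted tokens plus a free-form remainder."""
    country: str
    free_text: str         # street, building, directions, whatever applies
    postal_code: str = ""  # kept as text; optional because not everyone has one

    def missing_fields(self) -> List[str]:
        """Presence checks only; no attempt to validate the content."""
        missing = []
        if not self.country.strip():
            missing.append("country")
        if not self.free_text.strip():
            missing.append("free_text")
        return missing
```

The form can then prompt for whatever `missing_fields()` reports without ever judging whether the address "looks right".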
Of course I'm not saying single multiline textbox for every app out there. That would be crazy. :-)
But I'm always surprised how many sites ask for information that will never be used for anything, and assume things like the length or characters valid in zip codes or phone numbers.
I heard these days on "There is no such thing as a fish" podcast there is a country somewhere where the post office locates places by directions given by the sender! And that is officially accepted! Crazy world, try validating that. :-)
Ask a series of questions and adapt the follow up to the questions depending on the answer given.
Starting out with 'select your country' and then expand from there, the more you know the more you can narrow down the remainder of the input.
That would be a nice little widget to be able to throw onto a form 'world accurate address input fields'.
And for some localities it will indeed display a freeform text field, but for localities where there is more structure it could supply that structure and make certain bits mandatory.
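Something like this could be the core of such a widget: a lookup from country code to a field layout, falling back to a single free-form field for anywhere the table doesn't cover. The table contents, field names and required flags below are illustrative assumptions, not a researched dataset.

```python
from typing import Dict, List, Tuple

Field = Tuple[str, bool]  # (field name, is it mandatory?)

# Fallback for any locality without a known structure.
FREEFORM: List[Field] = [("free_text", True)]

# Illustrative layouts only; a real table would need per-country research.
FIELD_LAYOUTS: Dict[str, List[Field]] = {
    "US": [("line1", True), ("line2", False), ("city", True),
           ("state", True), ("zip", True)],
    "GB": [("line1", True), ("line2", False), ("town", True),
           ("postcode", True)],
}


def fields_for(country_code: str) -> List[Field]:
    """Return the structured layout if one is known, else the free-form field."""
    return FIELD_LAYOUTS.get(country_code.upper(), FREEFORM)
```

Asking for "US" gives the structured fields with state and ZIP mandatory. Asking for a code that isn't in the table, such as "AE", gives the single free-form box.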
Picking an arbitrary starting point like 'country' still has problems. I'm in Jersey Channel Islands, which isn't a country at all (much like Vatican City & other territories).
We are British but not part of the UK nor members of the EU. We're served by the British Royal Mail system and use UK-style postcodes, but sometimes when I input my address I must choose a country, so I pick UK, which triggers UK VAT on my order. We're exempt from UK VAT so dealing with this is frustrating. Some retailers do waive it when I raise the point but I have to remember every time.
Total edge case I know, but I hope it demonstrates that even picking a really broad starting point like 'country' can still fail sometimes.
I have an unlimited Gigabit fibre to the home connection here at home, and I pay £60/month for it. My 'high score' so far was 1.4TB transferred one month but typical use is 600GB or so.
We are in a pretty unique situation here with a government minister betting big on switching our entire island's copper telephone lines over to fiber optics, part subsidised with taxpayer funds.
Jersey is a tax haven (40% of the economy is evading tax), lying 20km from France and 160km from Britain, so there's no reason the internet access has to be bad.
Citation needed. You are welcome to pay maximum taxes where you live if you like, but jurisdictions with lower tax rates keep the pressure on governments to deliver a good return on investment to their citizens. When I moved from San Diego to Dallas after the dotcom bust I slashed my taxes which was great (no income tax in TX). This wasn't tax evasion, it was common sense. San Diego was no longer a good investment.
Also, you totally missed the Gigabit fiber-optic broadband plans[0]. I have a 1GB unlimited plan from their competitor for £55/month[1]. Standard fair use clause applies but I've never been throttled and I use massive amounts of data every month.
That place could be Dubai, as that was how addresses were handled as of a few years ago. In such a quickly developing city, the typical street naming process cannot keep up.
I'm in Italy and, years ago, I happened to send postcards with directions from a known landmark to the building (because I didn't have the real addresses of some friends with me). The postcards always arrived (e.g. third palace going west from FAMOUS_HOSPITAL_MAIN_ENTRANCE, City, Italy).
> In my experience, users aren't clear what to do when presented with a free-form, multi-line text box in which they can enter their address. This results in frequent missing data – users aren't aware they need to include a postal code, or county, or country…
Suggestion: users are deliberately not inserting data which is not relevant to the service they are getting.
I have a one-word name. It's my "legal," wallet-name. The only corporation/government/monoliths that address me with that one name are the department of licensing (my drivers license) and the State Department (my passport), both of which just show my one name.
I have to just make things up for one or the other field (first, last). Usually I go with my initial as my first name, and my name as my last name. Or my one name in both fields. I once signed up to some web service that I didn't care about with a last name of IHaveNoLastNameAndThisFieldIsTotallyMadeUp.
This one's particularly common in Edinburgh. In fact in the example they use - "Regent Road" connects to Princes Street, which becomes Shandwick Place, then Atholl Place. At this point the main fork becomes Dalry Road which then becomes Gorgie Road which becomes Stenhouse Road and then Calder Road - all of which are roughly a straight line:
I've relatives about six miles away from there in a residential close whose name is Ravenshaugh Crescent on the left, and Ravensheugh Crescent on the right.
Local council clearly made an error when putting the two named street signs on each side of the corner, nobody has ever known which one it was supposed to be in the first place, and so... this.
There's also the chance a single segment of the road has several names. For example when crossing a bridge or a round-about. Never mind all local names vs official names, old names and translations.
Correct, and indeed having different names on different sides of the same road. I just recalled another example from Edinburgh[0] where one side of the street is called "Lochrin Buildings" while the other side of the street is called "Gilmore Place" (then both sides of the street become Gilmore Place, which then invisibly changes its name to Granville Place before changing into Polwarth Gardens).
Seldom are the only restrictions that apply to an address only the ones in a single software system. In fact, your address data could be the least of the problems you have to worry about.
When actually using all the addresses you stored for shipping stuff, it is almost guaranteed that the shipping company will cut off or drop lines from labels, and of course every shipping company is going to have its own quirks. Maybe just because not every address is going to fit onto a fixed label area in a fixed font size.
I have in fact lost several shipments to my address(es) due to every single kind of the above caveats.
Of course, they're falsehoods that everyone believes about addresses (except maybe for postal workers), but programmers are the only ones who have to actually think about them.
It would be nice to come up with some sort of conclusion or recommendation. Should addresses just be used as one big blob of text, and never parsed at all? Should there be individual per-country libraries for parsing them? Should we just address everything by coordinates (which doesn't solve the houseboat problem)? How about a unique identifier for every person on the planet, plus a GPS tracking system that guarantees big brother can deliver to you whenever, wherever?
Now that the average piece of post is a big package that needs signing for, rather than a small letter that can just go through the letterbox, I quite like the idea of centralised pigeon-hole buildings, which have long existed in many towns as the only method of delivery, but are now being born everywhere thanks to Amazon, etc. That's quite a different problem, though :-)
I think the consensus is to go with the "big blob of text" approach. You mentioned that postal workers are the only people qualified to parse addresses: let them do it. The validation done on your end shouldn't be much more than asserting that addresses contain non-whitespace characters.
I don't know how well this works in practice, but it's the most "correct" thing to do.
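To make the "big blob" validation concrete, here is a minimal sketch in Python. The function name `is_acceptable_address` is made up for illustration; the only rule it enforces is the one described above, namely that the blob contains at least one non-whitespace character.

```python
def is_acceptable_address(blob: str) -> bool:
    """Accept any address blob that has at least one non-whitespace character.

    Hypothetical helper: it deliberately does no parsing and no format checks,
    and it treats non-Latin text the same as any other text.
    """
    # str.strip() removes leading/trailing whitespace, including Unicode spaces.
    return bool(blob.strip())


print(is_acceptable_address("1 Regent Road\nEdinburgh"))  # a plain multi-line blob
print(is_acceptable_address("   \n\t  "))                 # whitespace only
```

Anything stricter than this starts rejecting real addresses, which is the trade-off the rest of the thread argues about.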
Name of recipient, state/province/etc., postal code and country, maybe. But for the rest, that bit (at least in the UK) that goes in the middle... what policing can you do? That part is just a text blob. You can't do much meaningful with it except for showing it to somebody (e.g., on a label affixed to the package you're sending) and have them figure it out.
Well sure, once you've separated name and country and postal code you've already pulled out most of the data.
The blob has been tamed to 1-2 lines, and you're probably best off giving 'line 1' 'line 2' 'line 3' fields. At this point the chance of confusion is minimal, even if you can't validate very well.
Make sure it's valid text: handle all the high-ASCII characters and do something sensible with non-Latin text.
I once spent the best part of a day tracking down a problem with a single bad address for a major UK directory: somehow, someone had entered an address in Egypt in Arabic.
As soon as you don't control the whole thing end-to-end, i.e. from the user entering the data to printing it on an envelope: it won't work at all. It would require every program in the whole chain to accept this "big blob", which is simply not going to happen.
I can understand not validating fields, but users are hopeless and like guidance. Giving them some structure in address fields helps the user through the system.
"Wikipedia has a photo of a parcel where a Russian/Cyrillic address was displayed on a computer with the wrong character encoding, and transcribed from that. Reportedly a Russian postal worker was able to reverse the mapping and deliver the parcel."
The linked URL doesn't work anymore but thanks to Archive.org and reverse image search in Google I managed to find it:
It's not even true that a single building has only one address, must exist in one town or even in one country. There's a house that has one address in Baarle Hertog (Belgium) and another address in Baarle Nassau (Netherlands), with different house numbers too.
The number of organizations that accept the address, but chop off the apartment number when they send mail because it is too long is ridiculous. Even better, it tends to be government departments. For example, the IRS does this.
Martin Luther King Jr Way has to be one of the most common street names in the US. Many orgs can't even get the simple cases right; I don't hold out much hope for the obscure cases.
See the Google 'real-names' debacle for just how wrong people can get this kind of thing, even when they're being loudly told that they're doing it wrong.
Pro-tip: no one will ever, ever misdeliver a package addressed to {house#} MLK #{Apt#}. I've lived on an MLK. If you live on Lakeshore Drive in Chicago, LSD works just as well. Where MLK is concerned, he has an official holiday, Americans know what those three letters mean.
That article can also be read as a list of things that need to be fixed by the various postal systems.
We can issue addresses to computers, many of which cannot be considered to be in a fixed place, yet somehow we can't issue a permanent, unique address to something that's not likely to move around much.
Reminds me of a quote from a geocoding session at an OpenStreetMap conference. "Addresses are not a theoretically hard problem. The problem is that people don't follow standards, or have the same standard."
Trying to get everyone on the planet to massively change how they view the world is not easy.
Zip+4 was/is an attempt to get close to that, but not right up to it. I have no idea what my +4 is. In the two or three cases over many years that I actually had to fill it in (and cared enough to continue), I had to look on the USPS web site.
One more: in SW Portland there is a section east of the 0 line on the street grid where all buildings have a leading zero. 0634 and 634 are different addresses on the same street.
This article was submitted here nearly two years ago (as you will find out if you click the link to the HN discussion at the bottom of the article). But I thought of one not included in the article.
From Portland's Wikipedia page:
> On the west side, the RiverPlace, John's Landing and South Waterfront Districts lie in a "sixth quadrant" where addresses go higher from west to east toward the river ... East-West addresses in this area are denoted with a leading zero (instead of a minus sign). This means 0246 SW California St. is not the same as 246 SW California St. Many mapping programs are unable to distinguish between the two.
The city deserves a "bug" for that. Computers might not be able to distinguish between those easily, but I'd bet most humans (especially ones not from Portland) would have no idea either. Addresses are not set in stone -- it's easy enough for them to fix and avoid the entire issue.
edit: I've had my zip code changed within the last few years. That causes the same amount of pain as changing any other part of the address and is done without much fanfare.
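The Portland leading-zero case is easy to reproduce in code. A small sketch, assuming a made-up `StreetAddress` type: if the house number is stored as text, 0246 and 246 stay distinct, but the moment anything converts it to an integer the two addresses collapse into one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreetAddress:
    # Hypothetical record type for illustration only.
    house_number: str  # kept as text so a leading zero survives
    street: str


a = StreetAddress("0246", "SW California St.")
b = StreetAddress("246", "SW California St.")

# Compared as text, the two addresses are different places.
distinct_as_text = a != b

# Parsed as integers, the leading zero is lost and they collide.
collide_as_int = int(a.house_number) == int(b.house_number)

print(distinct_as_text, collide_as_int)
```

This is presumably what happens inside the mapping programs the Wikipedia quote mentions, though that is a guess about their internals.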
I'm not sure how many of these falsehoods programmers actually believe. But one falsehood I've actually seen in the wild isn't included:
"(Direction) Street" is necessarily the same (or different) than "Street." Or even if they're different, users will understand the difference, at least on a local basis.
I had a GPS that would always omit any directions that prefixed a street name. I was occasionally thrown for a loop when it told me to turn on Beacon St in Boston, when it really meant North Beacon St, which is a nearby, but unrelated, street.
I live on a road that has two sets of numbers, both identical (but several hundred meters removed from each other) in two different towns but with the same name. Getting mail and packages delivered here is for want of a better word a challenge.
Is it because the post sucks at its job? I would expect this to be handled correctly for the sole reason that you could have two unconnected roads with the same name in two different cities and end up in essentially the same situation, no?
Now I understand why PayPal lets scammers register with an address that reads "asd, asdf,asdff, Turkey" and immediately allows somebody with that address to send me funds. (Ultimately stealing/using the credits they purchased instantly from my site with the fake PayPal account set up on a stolen credit card.)
Without a current streetmap of the entire world, how would you really know it's a bogus address?
Similar to "Street names don't recur" -
A named road will be continuous: a friend lives in a condominium,
in one of a group of 8 buildings bounded by 3 fairly normal roads.
There are 4 separate driveways between buildings leading to parking.
At some point the condo units were renumbered with street numbers,
and the disjoint driveways were all given the same name: <Condo> Lane.
Regular delivery drivers - USPS, UPS, FedEx, and pizza seem to cope,
but taxi drivers or other irregular visitors who expect numbers
to be continuous along streets are almost always baffled.
Similar to "A road will only have one name",
for emergency services - fire, ambulance, and police - a similar
case arises when a route passes through several small towns, each
with its own set of street numbers, possibly with variations of proper
street name, and perhaps with different direction/cardinality mappings.
For a motorist calling 911, reporting that one is at street number
123 on El Camino Real (on the SF peninsula) will probably map to
several possible locations, depending which of the 12 or so towns
one is in.
When I moved to London and opened up a Lloyds Bank account (then Lloyds TSB), I was confused to find they did not consider my office postcode W1F 7RB valid. I poked at it a bit and found some programmer assumed that the first half of a UK postcode was \w+\d+. The hilarious part was the branch I was opening an account at was in W1S, so their form wouldn't even take their own postcode.
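For anyone curious why `\w+\d+` rejects W1F, here is a short Python sketch. It assumes the bank's check was anchored to the whole first half of the postcode, which is a guess. The `LENIENT_OUTWARD` pattern is a simplified approximation of the outward-code shape (one or two letters, a digit, then an optional digit or letter), not an official validation rule, and it ignores special cases.

```python
import re

# The pattern the commenter inferred: word characters, then it must END in digits.
NAIVE_OUTWARD = re.compile(r"\w+\d+")

# Simplified assumption: 1-2 letters, one digit, then an optional digit or letter.
LENIENT_OUTWARD = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]?")


def accepts(pattern: re.Pattern, outward: str) -> bool:
    """Return True if the whole outward code matches the pattern."""
    return pattern.fullmatch(outward.strip().upper()) is not None


for code in ["SW1", "W1F", "W1S", "EC1A"]:
    print(code, accepts(NAIVE_OUTWARD, code), accepts(LENIENT_OUTWARD, code))
```

The naive pattern fails on any outward code that ends in a letter, which covers a lot of central London.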
Isn't it a false dichotomy that you can either have a complicated multipart form or a freeform text box for addresses? Why not by default show the multiform box that provides some nice (optional) validation to catch that vast majority of cases and also give the user an option to fill out a freeform text box if the former doesn't work for them?
Maybe a good approach would be to use a format that somehow fits 99% of the addresses and a link on the bottom of the form with the text "Problems fitting your address in that form?". When the user clicks the link, all the fields of the form would be substituted by just a multi-line input field. Then you have a solution for the 1%.
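On the storage side, the "structured form with a freeform escape hatch" idea could look something like the sketch below. All the names (`StructuredAddress`, `FreeformAddress`, `to_label`) are hypothetical; the point is only that the rest of the system handles both shapes through one function, so the freeform case is not a special path bolted on later.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class StructuredAddress:
    # Filled in by the regular multi-field form.
    lines: List[str]
    city: str
    region: str       # state/province; may be empty where none exists
    postal_code: str
    country: str


@dataclass
class FreeformAddress:
    # Filled in when the user clicks "Problems fitting your address in that form?"
    text: str


Address = Union[StructuredAddress, FreeformAddress]


def to_label(address: Address) -> str:
    """Render either kind of address as the text block printed on a label."""
    if isinstance(address, FreeformAddress):
        return address.text.strip()
    parts = [*address.lines, address.city, address.region,
             address.postal_code, address.country]
    # Skip fields the user left blank instead of printing empty lines.
    return "\n".join(p for p in parts if p.strip())


print(to_label(FreeformAddress("Building 1234\nIqaluit NU\nCanada")))
```

Validation can then be strict on the structured variant and nearly absent on the freeform one.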
This hit home for me as a couple of months ago we faced the choice between continuing to develop a parser-based approach to location extraction from free text, or moving to an entity extraction and search approach, i.e. geotagging or geocoding. Notably we were just trying to get city and state, and even limiting ourselves to the U.S. the combinations we were seeing made parsing seem like a game of increasing complexity delivering diminishing returns. We ultimately went with a search-based approach and it's been working much better and is more tolerant of format variations.
My mainland European address saves me a fortune in online shopping - payment and delivery both usually fail.
My two neighbours and I share a driveway but we have our own gates and house numbers. The street is unnamed and unnumbered like the other roads in the immediate area.
There are at least two valid postcodes for the property, which is a few minutes' walk from a major administrative boundary. Postal deliveries might turn up once in four to six weeks.
When I put this info into card validators, then tell them that my bank is in a different country to the one I'm ordering from, they generally barf.
I've personally seen a building with a fractional street number, in Kingston, Ontario, Canada. I've also had to deal with irregular addresses in Canada. Working on a Canada-only program, I was expecting addresses to have the components:
[unit-number, ]building-number street-name
city/town, province/territory, country
postal-code
I was fortunately already expecting characters from the two official languages of Canada, English and French, so I was prepared to deal with accented characters.
Later, I had the opportunity to work in Iqaluit, Nunavut, Canada, which violated most of my assumptions, both explicit and implicit. First, the territory (not province) of Nunavut is a relatively recent creation, having been created by splitting off a part of the Northwest Territories on April 1st, 1999. Before that, the addresses were all in a different territory.
Second, Iqaluit uses a system where every building in the city has a unique number. Currently (2015) the highest number is rapidly approaching 7000, but at the time it was in the 5000s. In addition to their unique number, some buildings also have a name, which is sometimes written only in the Latin alphabet, sometimes written only in the Inuktitut syllabary, sometimes either, and in at least one case both. When a building has both a name and a number, people may use just one or the other. (I haven't found a building without a number yet, but I'm no longer going to assume there aren't any.)
Street names were not introduced until 2003, and when they were, all street signs were labeled in both the Latin alphabet and the Inuktitut syllabary. Since the system of uniquely numbering every building is continuing, most people ignore the street names unless they're actually talking about streets, not buildings. Nonetheless, some attempts have been made to get everyone to change their mailing addresses to include the street. In every case, everyone has agreed that use of the Inuktitut syllabary should be encouraged.
All these peculiarities are in the territorial capital, where almost all the territorial government and law-enforcement addresses are, so anyone dealing with addresses for the Canadian government should be aware of this (but probably isn't).
On a related topic, the US has long had a system of two-letter abbreviations for its states, commonly used in its addresses. Canada eventually introduced a standard set of two-letter abbreviations for all its provinces and territories, being careful not to duplicate any of the US state abbreviations. However, many people still use the traditional abbreviations, which are of variable length, sometimes have completely different French and English versions, and sometimes include hyphens to prevent confusion with US state abbreviations. (So 'T-N' might appear, meaning 'Terre-Neuve', the French name for Newfoundland, with the hyphen mandatory to prevent it from being mistaken for the US abbreviation for Tennessee. Periods and capital letters with accents also appear, e.g. 'Î.P.É.')
Since its introduction, the "standard" two-letter system has seen at least three name changes. Quebec was PQ before 1991 and is now QC, although sometimes QU or QB show up, Nunavut was added in 1999 (previously part of NT, now NU), and Newfoundland changed its name to Newfoundland and Labrador in 2001, and its abbreviation from NF to NL in 2002. Also, the territory formerly known as "Yukon Territory" officially changed its name to just "Yukon" on April 1st, 2003. (What is it with the Canadian territories and changing important stuff on April 1st?) Their postal abbreviation did not change however. It's still YT, not YK, despite the latter being used fairly often and making more sense now.
This matters because not all two-letter abbreviations appearing in the database (this includes your database) are on the standard list, either because they were entered incorrectly, or because they were correct when they were entered, but have since changed, and the database wasn't updated for fear of breaking working code. As a result, a naive lookup-table to get the full province name from the two-letter abbreviation will fail.
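A rough Python sketch of a lookup that tolerates the legacy codes mentioned above. The `LEGACY_ALIASES` table contains only the variants named in this comment and is certainly incomplete; `province_name` is a made-up helper, not part of any library.

```python
import unicodedata
from typing import Optional

# Current two-letter symbols for the provinces and territories.
CURRENT = {
    "AB": "Alberta", "BC": "British Columbia", "MB": "Manitoba",
    "NB": "New Brunswick", "NL": "Newfoundland and Labrador",
    "NS": "Nova Scotia", "NT": "Northwest Territories", "NU": "Nunavut",
    "ON": "Ontario", "PE": "Prince Edward Island", "QC": "Quebec",
    "SK": "Saskatchewan", "YT": "Yukon",
}

# Old or unofficial spellings mapped to the current symbol (assumed, partial list).
LEGACY_ALIASES = {
    "PQ": "QC", "QU": "QC", "QB": "QC",
    "NF": "NL", "T-N": "NL",
    "YK": "YT",
    "Î.P.É.": "PE",
}


def province_name(raw: str) -> Optional[str]:
    """Map a stored abbreviation to a full name, or None if unrecognised."""
    # Normalise accents to one form so "Î" typed different ways compares equal.
    key = unicodedata.normalize("NFC", raw.strip()).upper()
    key = LEGACY_ALIASES.get(key, key)
    # Note: an "NT" row written before 1999 may really be in Nunavut;
    # no lookup table can resolve that without the rest of the address.
    return CURRENT.get(key)


print(province_name("PQ"), province_name("nf"), province_name("ZZ"))
```

Returning `None` for unknown codes, instead of raising, leaves the caller free to keep the original text and flag the row for a human.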
These two are quite annoying whenever I have to write my address somewhere that presumes them. Here an address is city, city area (by name), block inside the area (by number, sometimes a letter). Blocks are numbered in the order they're built, so their numbering doesn't follow any pattern. And while there is a street passing by and it does have a name, the building itself doesn't have an address on the street.
And no, we don't have states. From the description above, you might think that the city:city area might be used like state:city, but no. City is city-sized, city area is neighbourhood-sized.
Additionally, there are addresses on streets, but those are not the same places as area-block numbers. Sites around here that need to get your address either include every possible field and ask you to only fill in the applicable ones, or give you a free-form text area after asking for which city and post code you're at.
I think Amazon gets it right - country, administrative area (state in the US, something else in other countries, maybe nothing in some), city, postal code, two freeform lines.
Developers can't know everything there is to know in the world. Developers aren't supposed to know specific stuff like this. Some parts of modern society are easy to digitize, other (often historical) parts aren't. I think it's up to entrepreneurs to find ways to solve these problems, and create a better world by doing that. Don't blame/shame developers for stuff like this. It's not even remotely fair.
When you're training developers remember that you're not training demigods.
Well, after reading this article and other variations on the theme that's one less set of mistakes to make.
Even if developers are not demigods they shouldn't be above learning.
Entrepreneurs have very little chance of fixing this, it's mostly a local government thing and since it isn't actually broken I highly doubt that something will change. So 'deal with it' is the appropriate response.
Learning is important. Part of being a developer is that you must keep learning new techniques and facts that help you accomplish your goals or your clients' goals. So I'm not saying we don't need to deal with internationalization, localization, and globalization. When the time comes, deal with it. I'm just saying there's no shame in being a developer with little or no expertise in those subjects. Many commenters here don't seem to acknowledge the many amazing technical things that developers generally do know about.