You Suck (On web apps that mismanage text) (raganwald.com)
102 points by raganwald on March 19, 2010 | 75 comments



I just tried to sign up for Notifo and wasn't allowed to use my real name because it has a - in it. So now I'm waiting for that to be fixed.

And don't get me started on how profanity filters consider my last name profane. Years ago I had a Hotmail account registered to the name Ivana C. Teens-Give-Head because they wouldn't accept John Graham-Cumming.

edit: Notifo telling me this will be fixed shortly.


We should start a club of people whose names vex software.

I could talk your ear off with how many ways I've broken systems in Japan, but one of my Asian American friends takes the cake. Her name is not Kim Kim, but it could be. Apparently some systems check to make sure you don't enter your first name twice...


Can't find the reference, but Caterina Fake has complained about her name being rejected as, well, fake.


Penenberg: Fake is real, right?

Fake: Yes. I can't tell you how many times I've booked an air ticket only to get to the airport and find out they killed my ticket because it goes into the system and the program tosses a ticket that says "fake" on it. Twice I've gone to the counter for a KLM flight through Northwest and have been rejected. They say, "You don't have a ticket." I give them a confirmation and after some investigation I learn my ticket has been cancelled because the system deleted it. For a while I couldn't join Facebook because of my last name. During the registration process I was asked for my real name and when I wrote "Fake" it rejected me. Finally a friend working for Facebook took care of me.

http://www.fastcompany.com/blog/adam-penenberg/penenberg-pos...



I love Hotmail because my surname (Message) is an "illegal word". Apple shouldn't have targeted IBM in the 1984 adverts; it's actually Microsoft that is instituting Newspeak.


No - that's Gilad Bracha. ;)


Just updated Notifo to accept hyphens and apostrophes. That was a horrible oversight on my part. Please try it now.


Why are you filtering any characters out?


Cross-site scripting and/or injection attacks.
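For injection attacks specifically, the usual alternative to stripping characters is a parameterized query: the database driver receives the value separately from the SQL text, so an apostrophe or hyphen in a name is just data. A minimal sketch using Python's built-in `sqlite3` module; the in-memory database, the `users` table, and the `save_and_load` helper are hypothetical, for illustration only.

```python
import sqlite3


def save_and_load(name: str) -> str:
    """Store a name with a bound parameter and read it back (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE users (name TEXT NOT NULL)")  # hypothetical schema
    # The "?" placeholder binds the value separately from the SQL text,
    # so quotes, hyphens, and semicolons are stored verbatim, never executed.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    (stored,) = conn.execute("SELECT name FROM users").fetchone()
    conn.close()
    return stored


print(save_and_load("John Graham-Cumming"))
print(save_and_load("Coeur d'Alene"))
print(save_and_load("Robert'); DROP TABLE users;--"))
```

Nothing has to be filtered out for this to be safe; the same approach works with most database drivers, though the placeholder syntax varies.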


Why filter characters if you can just HTML-encode them?
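A minimal sketch of the "encode, don't filter" idea for HTML output, using the standard library's `html.escape`; the `render_name` helper and the `<span>` markup are hypothetical.

```python
import html


def render_name(name: str) -> str:
    """Escape at output time instead of rejecting characters at input time."""
    # html.escape converts & < > and, with the default quote=True, " and ' too,
    # so the result is also safe inside an attribute value.
    return f'<span class="name">{html.escape(name)}</span>'


print(render_name("John Graham-Cumming"))
print(render_name("Coeur d'Alene"))
print(render_name("<script>alert(1)</script>"))
```

Hyphens pass through untouched, the apostrophe becomes `&#x27;`, and the script tag is rendered as text rather than executed.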


Done and connected it to HN.


I interviewed for a company in Coeur d'Alene once. We went to lunch and everyone complained about how no one accepted apostrophes in the city name.


Obviously you should be yelling at your parents for naming you so disrespectfully. Product of the '70s? :P


No, Graham-Cumming is a family name going back at least to the 17th century.


Thank goodness we don't live in a P.G. Wodehouse novel, we'd be getting cranky emails from Cyril Bassington-Bassington, whose name would probably crash three quarters of the systems for having a hyphen and the other three quarters for being too long.


A good friend of mine was blessed by her parents with three middle names and the last names of both parents. Her parents divorced and her mother remarried, adding another last name. In 2008 she married and added her husband's last name.

Her full legal name (in first-middle-middle-middle-last-last-last-last form) no longer fits into any database known to man, nor even on her driver's license, and is a consistent source of amusement.


Considering that the author of this world in which we are meeting is P.G., are you sure we don't already? :-)

And I'd be delighted to live in a Wodehouse novel.


Sorry, it seems people missed the <sarcasm>. I really didn't mean any disrespect.


For any "human" data: trim and escape, and you're done. If you want to validate it, just ask the party that knows for sure (send an email, run a transaction, visit the URL).

This includes names, addresses, phone numbers, emails, URLs, CC/account numbers, user names, passwords (maybe tell them that caps-lock is on or any other weird keyboard state if you can).


As long as you wait until you know where the data is going, escaping is always a good idea.
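A minimal sketch of "trim, store as typed, escape once you know the destination", using only standard-library functions (`html.escape`, `urllib.parse.quote`, `shlex.quote`). The helper names are hypothetical; the point is that the same stored value gets a different escaping function for each context.

```python
import html
import shlex
from urllib.parse import quote


def clean(raw: str) -> str:
    # The "trim" step: drop surrounding whitespace, keep everything else as typed.
    return raw.strip()


def for_html(value: str) -> str:
    return html.escape(value)


def for_url_path(value: str) -> str:
    # safe="" means "/" is percent-encoded as well.
    return quote(value, safe="")


def for_shell(value: str) -> str:
    return shlex.quote(value)


name = clean("  Coeur d'Alene  ")
print(name)                # Coeur d'Alene
print(for_html(name))      # Coeur d&#x27;Alene
print(for_url_path(name))  # Coeur%20d%27Alene
print(for_shell(name))     # 'Coeur d'"'"'Alene'
```

Escaping for the wrong context (for example, HTML-escaping before writing to a database) tends to produce double-escaped junk later, which is why waiting until the destination is known matters.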


Yes! For years I thought I was the crazy one for telling our clients, "we don't really need to validate names. if they give you the wrong name, it's their problem; and it's more work for us; and we'll probably fuck it up and make someone mad". The answer was always "do it anyway, it's what we agreed on." The client feels like we are screwing them if we make the work easier, even though the end result is a higher-quality more usable website. Sigh.

The comment about developers making work for themselves is also spot on. I answer a lot of programming questions, and the questions are always asked because the programmer has reached the end of a twisty maze of his own creation. Turn around, walk, spin around, and try again. You'll find a better solution.

And oh yeah, I do this all the fucking time. Pick any random github project of mine, and you'll see 8 revisions of the API before I finally pick one that's not retarded. Even then. (Side note: I don't change the API after I release.)

Anyway, best rant ever.


> The comment about developers making work for themselves is also spot on. I answer a lot of programming questions, and the questions are always asked because the programmer has reached the end of a twisty maze of his own creation. Turn around, walk, spin around, and try again. You'll find a better solution.

This deserves to be repeated a thousand times.

How often do bad code and bad ideas stick around simply because those who came up with them can't even imagine doing without them?

I have run into this many times with people who try Plan 9 and ask, 'Where is my pet Unix "feature"?!?' Guess what? It was not a feature, it causes untold pain, and that is why it is not in Plan 9.

Just last week somebody was on the Go mailing list asking why there is no preprocessor! Sigh.


// from appendix B of rfc 3986 (http://www.ietf.org/rfc/rfc3986.txt)

'&^(([^:/?#]+):)?(//([^/?#]))?([^?#])(\?([^#]))?(#(.))?&'

The above regular expression is meant to match URIs. Since almost anything can be a URI, the regex also matches almost everything.


You know you have done too much work with regular expressions when you think "Hey, wait a second, that can't possibly work" and start trying to debug it in the Ruby console for 10 minutes before realizing "Oh, HN is italicizing it because of the asterisks it is silently stripping."


  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
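A short sketch of what that Appendix B pattern actually does: it splits a URI reference into components rather than validating it, which is why it "matches almost everything". The `split_uri` helper and the dictionary keys are hypothetical names; the group numbers follow the RFC's own breakdown.

```python
import re

# Regular expression from RFC 3986, Appendix B (asterisks restored).
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri: str) -> dict:
    m = URI_RE.match(uri)  # every group is optional, so this never fails
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
print(split_uri("not a uri at all"))  # "matches": the whole string lands in path
```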


The article was about e-mail addresses. Here's a link to an RFC-compliant regular expression for matching valid e-mail addresses: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

Despite that evil voice telling you "There is nothing Perl cannot do!" (http://www.bastichlabz.org/bastich/Strips/ba980225.gif), regular expressions alone are not sufficient for real parsing.


I had this booking a flight. The system mangled a hyphenated surname, so the "pay now" page was wrong. With no way to go back and modify it, we had to return to the start and try all over again. On the third failed attempt the clever system had detected a certain interest in our flights, and socked up the price by $200!


> On the third failed attempt the clever system had detected a certain interest in our flights, and socked up the price by $200!

Or the inventory you were trying to buy was no longer available. Seats are not all the same price; they are divided into "fare buckets" that are usually lettered. There are very few cheap fares on each flight, more medium fares, and even more full-fares. You should check the code as you are booking; if the fare code changed, they ran out of inventory. If the fare code stayed the same, then they raised the price of that inventory.
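A toy sketch of the fare-bucket behaviour described above: the quoted price is the cheapest bucket that still has seats, so when someone else takes the last cheap seat, the price jumps and the fare code changes at the same time. The bucket letters, prices, and seat counts are entirely hypothetical, and real airline inventory systems are far more involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FareBucket:
    code: str        # hypothetical bucket letter
    price: int       # hypothetical price in dollars
    seats_left: int


def cheapest_available(buckets: List[FareBucket]) -> Optional[FareBucket]:
    # Quote from the cheapest bucket that still has inventory.
    open_buckets = [b for b in buckets if b.seats_left > 0]
    return min(open_buckets, key=lambda b: b.price) if open_buckets else None


flight = [FareBucket("Q", 199, 1), FareBucket("M", 399, 5), FareBucket("Y", 899, 20)]
first = cheapest_available(flight)
print(first.code, first.price)    # Q 199
first.seats_left -= 1             # someone else books the last Q seat
second = cheapest_available(flight)
print(second.code, second.price)  # M 399: new fare code, so inventory ran out
```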

Just saying -- it wasn't some conspiracy. Someone just bought them out from under you. Most airlines let you hold a reservation for a few hours, so if this ever hits you again, just hold the reservation, call the web services desk (not the general reservations desk), ask web services to fix your name, and then continue the ticketing process online.

(This is the procedure for AA, anyway. Dunno about other airlines, as I've never used them.)

Personally, I always hold, triple-check my plans, and then buy. So I have never had inventory disappear out from under me, and I have never needed to change a non-changeable fare :)


Dammit, Reg -- I thought you were back in business when I saw this. Call me a greedy bastard if you must, but I've been suffering major withdrawal and the methadone I've found out there just ain't cuttin' it anymore.


He's still publishing, at least as of February 2010: http://github.com/raganwald/homoiconic/tree


Reg's posts were more like black beauties for me.

Even the best coffee is a poor substitute.

C'mon Reg; write a book or something.


I just went through a website sign-up yesterday, and got my password e-mailed to me in cleartext. These anti-patterns will always exist.


Hey, we mailed your password in cleartext. But you can trust us with your privacy and your money!


Hacker News emails passwords in plain text :-)


Not if you use your own OpenID.


Pretty sure that if I use an OpenID, Hacker News still emails passwords in cleartext.


Meta-comment: The comments on the OP and here are far, far better than the post itself.


Very interesting. Normally I don't read blog comments, because usually they're dumb, but these aren't too bad. I think you wrote a rant that everyone can agree with. We have all been burned by validation before, and we have all been forced to write it. It's boring and annoying for everyone. (Watching the clients test the website usually consists of them typing stuff to test the validation rules. They don't check the spelling, they don't check that it works as they specified, but they do check that they can't put 999999 as their zip code. Sigh!)

You also have the right readership -- the people that will disagree with your post don't even know what a "raganwald" is.

A perfect storm, if you will, for constructive blog comments :)


Thanks for the rant, raganwald. Just wanted to say that you're my favorite commenter on Hacker News. Always insightful and level-headed.


Here's another thing that needs to stop: Asking people for their first name + surname.

Not every culture has the concept of a surname. If you need to ask people for their names, just do so via a single text field.


Way back when, I added code to an application to validate Swiss ZIP codes, assuming that the 8000 area must be the highest-numbered one.

Of course, it's actually 9000, and ever since then the ZIP code field has been a non-validated text field in every application I've built :-)


Not only that, but they're only called ZIP codes in the US (this is a peeve of mine). In Switzerland they're "post codes" ("code postal" in French, don't know the Swiss German name). They also write addresses in a different order. As an example, here's the address of a Kebap shop I used to frequent:

  Avenue de la Sallaz 29
  1010 Lausanne
  Suisse
1010 is the post code, identifying La Sallaz (or a part of it?), in Lausanne. My point? All this stuff is very local and if you want to do it right you should just ask for addresses free-form and if you need to extract information from it then you should use a geocoding library (e.g. http://geocoder.rubyforge.org/) to normalize it for you.


(the german term is "Postleitzahl", usually abbreviated as PLZ)

Considering this and all the other issues related to addresses, I really wonder why we are still trying to store them in separate fields.

Why can't the address just be one big multiline text field where the user types whatever would be needed to receive a postal letter? If we need the data in structured form, we could always write a locale-aware parser that extracts the needed information.

Splitting the address in multiple fields (sometimes even labelled address_line_1, address_line_2 and so on) is probably a relic from the times where databases had nothing but CHAR (with a maximum length of 100 or something) and where applications were created for the local market only.

I would need to do some A/B testing, but I really doubt that it'll be easier for a user to fill out the traditional

  Street 1: _______________
  Street 2: _______________
  ZIP: _____  City: _____________

form instead of just one big text field labelled "Postal Address" - personally, I'd probably be WAY faster filling out that one.
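A minimal sketch of the "store it free-form, parse later if needed" approach from the comment above: keep exactly what the user typed and extract structure on a best-effort basis. The pattern is a hypothetical heuristic that only fits Swiss-style "NNNN Locality" lines (it would happily misread a US street line such as "1600 Pennsylvania Avenue"), which is exactly why a proper geocoder is the better tool.

```python
import re
from typing import Optional, Tuple

# Hypothetical heuristic, Swiss-style addresses only: a line consisting of a
# four-digit post code followed by the locality, e.g. "1010 Lausanne".
SWISS_POSTCODE_LINE = re.compile(r"^\s*(\d{4})\s+(.+?)\s*$")


def extract_swiss_postcode(address: str) -> Optional[Tuple[str, str]]:
    # The address itself is stored as typed; this is an optional extra step
    # that may return None rather than rejecting the input.
    for line in address.splitlines():
        m = SWISS_POSTCODE_LINE.match(line)
        if m:
            return m.group(1), m.group(2)
    return None


address = """Avenue de la Sallaz 29
1010 Lausanne
Suisse"""
print(extract_swiss_postcode(address))  # ('1010', 'Lausanne')
```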


We really should go with one big text box. I've tried to do it as often as clients will let me get away with it.

A recent client demanded the address be broken out into fields, but I at least swayed them into accepting a big-text-box for "international" addresses after showing them a few examples.


My pet peeve is the field labelled "State or province" which can't be left empty. This is surprisingly common.

Most countries on Earth are not federations; dozens of nations are small enough that dividing into "provinces" would be meaningless; and even larger countries that do have regions may not include their names in mailing addresses.


Clearly the country is "European Union" and the state is "France".


Americans back in the '50s did it almost the same way, but flipped. For example, the old way to write the address (with its postal zone) of City Hall in Manhattan, NYC:

  260 Broadway
  New York 17
  New York
But then the 17, originally a numbering system for big cities, became 100-07, the postcode 10007. (Compare the London postcodes EC1, N1, SW1, etc., which were originally just postal districts for sorting mail within London.)


Oh interesting. I had assumed that the transition to zip codes always added digits at the start (like San Francisco, which prepended 941 to the old codes) and didn't know there were places that inserted new digits in the middle instead.


Oof, my bad. It was New York 7, no insertion of digits, same as SF.


If the email address was literally "foo+bar@domain", they may not have gone out of their way to screw you; there are lots of web apps that treat "+" as a special character, so all they had to do was pass it over another HTTP connection.
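A minimal illustration of that failure mode with Python's standard `urllib.parse`: pasting the raw address into a query string lets the receiving side decode `+` as a space, while encoding it first (`urlencode`, which uses `quote_plus`) turns it into `%2B` so it survives the trip.

```python
from urllib.parse import parse_qs, urlencode

email = "foo+bar@example.com"

# Buggy: the raw value goes into the query string unencoded,
# so the receiver's form decoding turns "+" into a space.
naive_query = "email=" + email
print(parse_qs(naive_query)["email"][0])  # foo bar@example.com

# Correct: encode the value first; "+" becomes "%2B" and "@" becomes "%40".
safe_query = urlencode({"email": email})
print(safe_query)                         # email=foo%2Bbar%40example.com
print(parse_qs(safe_query)["email"][0])   # foo+bar@example.com
```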


I use these as labels to auto-file registrations in Gmail, and I've seen a few anti-patterns:

1. Reject at data entry even though they make me type it twice and they're going to do a round-trip click-to-verify. Too many to name fail this way.

2. Accept at data entry, convert it to a space (in a URL query string, + means space). Pray that the login routine accepts an email address with a space (it probably won't). Tirerack fails this way.

3. Accept at data entry, then fail to create the account in other internal systems. Allow login using the +, but once logged in, data from internal systems is unavailable, and the portal errors in unusual ways. VMWare fails this way.

Failing using anti-pattern 1 is preferable to pattern 2 (replace with space then fail login) or 3 (accept and allow login but fail to interoperate with other systems internally).

After 45+ emails and calls about anti-pattern 3 over 18 months, VMWare still hasn't successfully delivered a VMWare license to me. By now they're on version 3 (a free upgrade if v2 was bought recently) but still haven't delivered me version 2 or 3. Next email, perhaps I should send them raganwald's article.

// EDIT: Added Tirerack.


This is pretty close to an argument for writing the extra line of code to reject email addresses with "plus" characters in them: the front-end team might not know how the backend team will screw up.


No, they probably didn't actually get up in the morning and decide to do me in. I suspect that particular problem was either giving it to an overzealous cleaning algorithm or--as you say--passing it over an http connection without escaping it properly. A third possibility is that it was given to a mainframe application written in the 1970s that uses brutal hackery to deal with email addresses. Such kludges are often redolent with broken edge cases.


Hey. Why were you working with a travel agent anyways?


The problem is that the actual spec for validating email addresses is preposterously long and complex, and can't even be implemented as a regexp since it requires nested parsing. So everyone just writes /^\w+@\w+\.[\.\w]+$/ or something lame.
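A quick check of how that "lame" pattern behaves on a few ordinary addresses. The addresses are made up (using the reserved `.example` and `example.com` domains); note that in Python 3, `\w` also matches non-ASCII letters, which differs from JavaScript's `\w`.

```python
import re

# The "lame" pattern quoted above.
LAME = re.compile(r"^\w+@\w+\.[\.\w]+$")

addresses = [
    "jane@example.com",
    "foo+bar@example.com",         # plus addressing
    "jane.doe@example.com",        # dot in the local part
    "john@graham-cumming.example",  # hyphen in the domain
]
results = {a: bool(LAME.match(a)) for a in addresses}
for address, ok in results.items():
    print(address, ok)
```

Only the first address passes; the other three are perfectly deliverable formats that the pattern rejects.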


No, the problem is people trying to validate email addresses when they shouldn't. Similar to the credit card name example in TFA, you're trying to save a call to the MTA, when in reality, a user that doesn't want to be contacted will enter foo@foo.com, and you have to send it anyway.


You want to validate email addresses, because a surprising number of users are incapable of typing their email address correctly in one try. Validation saves a lot of rework.


What I do is validate and warn if it looks bad, but not prevent.
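A minimal sketch of "warn, but don't prevent": the checks return messages for the UI and never block submission. The `email_warnings` function and its specific checks are hypothetical choices, not a standard.

```python
from typing import List


def email_warnings(address: str) -> List[str]:
    # Soft checks only: each one yields a message to show the user,
    # who may still submit the address unchanged.
    warnings = []
    local, at, domain = address.strip().rpartition("@")
    if not at:
        warnings.append("There is no '@' in this address.")
    elif not local:
        warnings.append("Nothing appears before the '@'.")
    elif "." not in domain:
        warnings.append("The domain has no dot; is it complete?")
    return warnings


print(email_warnings("foo+bar@example.com"))  # []
print(email_warnings("jane.example.com"))
print(email_warnings("jane@localhost"))
```

Even the last case can be a legitimate address on some networks, which is the argument for warning rather than rejecting.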


The 'spec' for validating an email address is to send mail containing a token to it and require the user to respond with it.
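A minimal sketch of the send-a-token approach using the standard `secrets` and `hmac` modules. The in-memory `pending` dictionary and the function names are hypothetical; a real system would email a link containing the token, persist it, and expire it after some time.

```python
import hmac
import secrets

# Hypothetical in-memory store of pending confirmations: address -> token.
pending = {}


def start_verification(address: str) -> str:
    # Unguessable token; a real system would send it to the address by email.
    token = secrets.token_urlsafe(32)
    pending[address] = token
    return token


def confirm(address: str, token: str) -> bool:
    expected = pending.get(address)
    if expected is None:
        return False
    # Constant-time comparison so the token can't be guessed via timing.
    if hmac.compare_digest(expected.encode(), token.encode()):
        del pending[address]  # one-time use
        return True
    return False


t = start_verification("foo+bar@example.com")
print(confirm("foo+bar@example.com", "wrong-token"))  # False
print(confirm("foo+bar@example.com", t))              # True
```

No regex is involved: if the user can read mail at the address, it is valid for your purposes.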

Bang paths are valid but aren't going to be routable these days. Different hosts allow different characters in usernames, and have different meta-replacement rules for stuff like periods and pluses.

You don't validate a domain name with a regex, you use a goddamn DNS resolver. Email addresses are a superset of that! Don't use a fucking regex.
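A rough sketch of asking the resolver instead of a regex, using only the standard library. Assumption to flag: `socket.getaddrinfo` looks up address records (A/AAAA), not MX records, so a domain that only has MX records would be reported as unresolvable; a proper MX lookup needs a DNS library such as dnspython. Treat a `False` here as grounds for a warning, not a rejection.

```python
import socket


def domain_of(address: str) -> str:
    # Everything after the last "@" (a quoted local part may itself contain "@").
    return address.rpartition("@")[2]


def domain_resolves(domain: str) -> bool:
    # Ask the system resolver rather than pattern-matching the name.
    if not domain:
        return False
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


print(domain_of("foo+bar@example.com"))  # example.com
```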


No, the problem is people trying to validate email addresses.


I cannot include my middle name on my twitter profile because they have a limit on how long a name can be.


Ragan is totally right about this being wrong and needing to be fixed.

However, the idea that websites should use the bank's payment gateway for validation is misguided. Your fees will be increased (or your account will be suspended) on many payment gateways if you do this.


I think it depends on what you're validating. If you are trying to validate a name, you had better get it right! For example, my Visa still says Reginald Braithwaite-Lee, but I might register on your site as Reg Braithwaite. Did I misspell my name or is that what's on my Visa? Is the hyphen a typo?

OTOH, some things seem to be more certain, like rules about checksums. I like the approach suggested by many folks: use JS to validate on the client, and put up an "Are you really, really sure?" message for things that seem unusual, like a single name or a funny character.
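The Luhn check digit used in payment card numbers is a good example of a rule that really is certain. A minimal sketch in Python (the comment above suggests doing this in client-side JS; the logic is the same). It only catches typos: a number that passes can still be declined, so the gateway remains the final word.

```python
def luhn_ok(number: str) -> bool:
    # Ignore spaces and hyphens people type between digit groups.
    digits = [c for c in number if c not in " -"]
    if not digits or any(c not in "0123456789" for c in digits):
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, c in enumerate(reversed(digits)):
        d = int(c)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(luhn_ok("4111 1111 1111 1111"))  # True (a widely used test number)
print(luhn_ok("4111 1111 1111 1112"))  # False
```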


I wonder how much you could get out of their support as an apology, i.e., free flights and so on? I've never really tried that, but I hear some people extract bonus offers routinely.


REDDIT MODE=ON

Apology? Mwahahaha! What happened was that they sent me an email saying the electronic ticket would follow by email within 24 hours. This was on a Friday for a Monday morning flight at 8:00. When no ticket arrived by Sunday morning, I called them to discover their office was closed. I called again at 5AM on Monday morning and they were still closed, so I waited until their office opened at 8:00.

Wrong move. They charged me for the flight, saying that even though they promised an e-ticket and didn't send it, and even though their offices were closed, I should have shlepped out to the airport where the airline would have resolved my problem. I appealed to Visa but lost.


This reminds me -- I never look for my e-tickets in my email. I just show up at the airport and print them there.

I have never needed email to fly on an airplane.


Isn't the surefire way of validating an email address to connect to the mail server and see if the address exists?


Many mail daemons will "accept" mail no matter who the recipient is. Internally, unmatched mail may be discarded or forwarded to a "catch-all" account, but all the sender sees is "Recipient ok"

I think this started as a response to spam bots that used to "RCPT TO:" random strings and save a list of valid addresses.


Pretty sure that's a MUST NOT in the RFC


In the era of server-side JavaScript, there's no excuse for not using the exact same validation code everywhere.


...and that's why I have a simple email address: only letters and numbers (plus the @ sign).


tl;dr, couldn't get past the absurdity. Did this post have a point?


This was not one of my best, so I forgive your snarky tl;dr :-)

Here's an even longer but (IMO) much better article about much the same thing: http://weblog.raganwald.com/2007/09/we-have-lost-control-of-...


It is a humor-tempered response to an otherwise aggravating situation.



