An employee, whose last name is Null, kills our employee lookup app (stackoverflow.com)
538 points by willvarfar on April 27, 2012 | 148 comments



This is funny, but it's also a real world example of the kind of encoding nightmare that made SOAP RPC encoding really awkward. Various SOAP toolkits used to serialize a missing value as the empty string, or a literal value like "null" or 0, or all sorts of awfulness. I think the correct thing for the spec is to set xsi:nil="true" as an attribute on the XML tag in question, but IIRC about half the toolkits didn't understand that.

(I speak in the past tense of SOAP because I am an optimist.)


I worked at a company where we were replacing the user-facing component of our giant, ugly PHP storefront with a Rails version; in doing so, our developers implemented a JSON bridge between the two, allowing the frontend and backend to operate separately, using separate databases (and actually, they were in separate data centres).

As we were testing, we found that some products in our database would cause a JSON decoding error on the Rails side. After a few minutes, we realized the problem. We had a string field for something (product IDs, manufacturer SKU, etc). On the PHP side, the JSON encoder was using PHP's is_numeric() for each field to see if the field was a number (to determine how to encode it). Some of the SKUs, however, happened to be composed entirely of digits, and for those, PHP encoded them into the JSON as integer values. This, of course, broke on the Rails end, because Rails was expecting a string value and got an integer value.

In the end, we had to write a surprising amount of code to work around the brain damage involved, since regardless of what we tried to do PHP wanted, by default, to send things as integers whenever possible. I believe the final fix was to actually patch the JSON encoder library and special-case that field.
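
For anyone who hasn't hit this, here is a minimal Python sketch of the failure mode. The `sniffing_encode` helper is hypothetical (it is not the PHP library from the story); it just mimics an encoder that guesses types from content, the way an is_numeric() check would, next to the standard `json.dumps`, which goes by the value's actual type. The product record is made up.

```python
import json

def sniffing_encode(record):
    """Hypothetical encoder that guesses types from content, like an is_numeric() check."""
    guessed = {}
    for key, value in record.items():
        # Anything made only of digits is sent as a number, even a SKU string.
        if isinstance(value, str) and value.isdigit():
            guessed[key] = int(value)
        else:
            guessed[key] = value
    return json.dumps(guessed)

product = {"name": "Widget", "sku": "00123"}  # made-up example data

broken = json.loads(sniffing_encode(product))
correct = json.loads(json.dumps(product))

print(type(broken["sku"]).__name__, broken["sku"])    # int 123
print(type(correct["sku"]).__name__, correct["sku"])  # str 00123
```

Note that the guessing version also silently drops the leading zeros, which is arguably worse than the type change.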


Heh. I recently had a password reset function break. Problem not reproducible on the test system. Turns out the reset email is produced by a Freemarker template engine, which is "smart" about datatypes: "oh, a number! I need to Format that nicely with commas to separate the thousands!" Too bad that number was the user ID - Not a problem on the test system with its 300 users, but in production...
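
A rough Python analogue, in case it helps (this is not Freemarker, just the same idea using Python's format spec; `render_reset_link` and the URL are invented for illustration):

```python
def render_reset_link(user_id, pretty_numbers):
    """Build a reset URL; the base URL is a made-up placeholder."""
    # A "helpful" formatter adds thousands separators to anything numeric.
    shown = f"{user_id:,}" if pretty_numbers else str(user_id)
    return f"https://example.invalid/reset?user={shown}"

print(render_reset_link(300, pretty_numbers=True))       # ...user=300
print(render_reset_link(1234567, pretty_numbers=True))   # ...user=1,234,567
print(render_reset_link(1234567, pretty_numbers=False))  # ...user=1234567
```

With only a few hundred test accounts, the first case is the only one you ever see.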


Your test system has 300 users? Thanks for depressing me...


That's 300 user accounts, created by testers over the course of several releases.


Next time use is_int() instead of is_numeric().

is_int() checks the type of the field, while is_numeric() looks for strings that look like numbers.

You will also need to use settype() when getting your data from the database since integers from the database will pass through as strings (since the database range and PHP range aren't necessarily the same, use a float if you need unsigned ints).

Or just use the built in json_encode().


> use a float if you need unsigned ints

what.


(I assume that "what." is a request for an explanation.)

A float can store an exact integer of up to 53 bits even on a 32 bit machine.

PHP only has signed ints. If you need to store an unsigned int you can either store it internally as signed and only convert it to unsigned with printf() when you output it (and deal with the complexity of comparisons), or use a float and limit yourself to 53 bits.

If you have a 64 bit machine then of course you can easily fit an unsigned 32 bit int in that range. But it's wise not to rely on that at least for another few years.

In short, if you need more than 32 signed bits of range, and you want to make sure your code will run on any machine, then use a float. If you know you only use 64 bit machines then you have more flexibility. (You can use PHP_INT_SIZE and PHP_INT_MAX to check.)

If you need even more range than that then use the built in GMP library.

Also, PHP will automatically convert numbers that are too large from ints to float, so normally you don't see any of this. It's only if you use settype() to force an int that you have to pay attention to this.
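
The 53-bit limit is easy to check. This sketch uses Python rather than PHP, on the assumption that its floats are IEEE 754 doubles (true on mainstream platforms), so the same arithmetic applies:

```python
LIMIT = 2 ** 53

# Every integer up to 2**53 survives a round trip through a float.
assert int(float(LIMIT)) == LIMIT
assert int(float(LIMIT - 1)) == LIMIT - 1

# One past the limit cannot be represented and rounds to a neighbour.
print(float(LIMIT + 1) == float(LIMIT))  # True

# The largest unsigned 32-bit value fits comfortably.
print(int(float(2 ** 32 - 1)) == 2 ** 32 - 1)  # True
```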


Did you homebrew your own JSON encoder in PHP? Sounds that way. The standard encoder respects types.


Could you have added some text at the end of the ID before sending it, and then removed the extra bits from the end on the receiving side? Something like a parity value.


Or just call String#to_s on the Rails side.


That would probably work, but it means that the problem + workaround is spread out across two systems rather than being contained in just one. Also, working around it on the consumer side means that any new consumers (or any new string fields!) will need to use the workaround too, further spreading out the problem. Better to keep it encapsulated in one system if possible.


If you have a method that breaks when an int is passed in, just seems like good defensive programming to call to_s in Ruby. Any other caller could make the same mistake. But, I also understand/agree with fixing the root issue for the sake of other clients.


This is really a fundamental problem: how do you indicate operation failure? This relies on two things: the range (the valid output values) of the operation itself, and the range of the datatype you're mapping the operation's result to.

If the operation and the datatype's range are not equal, then you can indicate failure inside the return value by applying special meaning to invalid values. But if the operation and the datatype's range are equal, then you need another distinct value to indicate failure. The difficulty is in recognizing which situation you're in, and as you point out, this is one where, effectively, the operation and the datatype have the same range.
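
A small Python sketch of the two situations. All three functions and the directory are hypothetical, written only to illustrate the distinction:

```python
from typing import Optional

def find_index_in_band(items, target):
    """Return -1 on failure: safe, because enumerate() never yields -1."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def lookup_surname_in_band(directory, employee_id):
    """Return "Null" on failure: broken, because "Null" is also a valid surname."""
    return directory.get(employee_id, "Null")

def lookup_surname(directory, employee_id) -> Optional[str]:
    """Return None on failure: a value outside the set of possible surnames."""
    return directory.get(employee_id)

directory = {1: "Smith", 2: "Null"}  # made-up data

# In-band: a real Mr. Null and a missing employee look identical.
print(lookup_surname_in_band(directory, 2) == lookup_surname_in_band(directory, 99))  # True

# Out-of-band: the two cases stay distinct.
print(lookup_surname(directory, 2), lookup_surname(directory, 99))  # Null None
```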


Wow. I love that the S stands for Simple. "You keep using that word. I do not think it means what you think it means."



God, I love that rant.

"I trust that the guys who wrote this have been shot." :-)

People who all run the same version of Visual Studio think SOAP is awesome. Get handed somebody else's "whiz-dull" a few times, and see how much fun it is to generate a working client using a different brand/version client stack.


I had the pleasure of writing a client for a SOAP service in a Titanium/JavaScript app not too long ago. If you don't have visual studio generating those proxy classes for you then indeed it's a huge pain.


Well, at least SOAP uses XML, which has defined the basic formats. I hate that there are at least three different datetime formats in JSON and they are all used. WTF! SOAP isn't that bad if you stay away from the WS-* extensions.


A lot of protocols and standards containing the word 'simple' aren't. Most of them, in fact! I have a suspicion that this is because these designs start as antitheses to existing complex designs. 'Aha!' say the designers. 'We won't repeat those mistakes!' But because they proceed from the same basic assumptions as the complex designs they try to replace, they always produce something complex in the end, because they never really understood simplicity.


SMTP is pretty simple I would say


Depends how many of the encoding options you want to support. http://fanf.livejournal.com/64533.html


The S in SOAP is for Simple as the L in LDAP is for lightweight


People who mock LDAP for not being lightweight have obviously never dealt with DAP.


The existence of a worse thing does not justify a bad thing.


What makes it so bad? I haven't looked at it in a while, but I don't remember having a beef with the original LDAP:

http://www.ietf.org/rfc/rfc1777.txt


I think while LDAP may qualify as "lightweight" the fact it uses ASN.1 BER does, in my opinion, make it fail the "simple" test.


For what it's worth, the "lightweight" here is not an assertion that it is lightweight in an absolute sense. It's a modifier on OSI's Directory Access Protocol: http://en.wikipedia.org/wiki/Directory_Access_Protocol

It's hard for people to imagine now, but at the time the Internet was just one of many competing network standards. Had this been developed after the rise of the web, I'm sure it would have been a very different protocol.


LDAP was, in fact, developed after the rise of the Web. By Netscape!

(And SMTP, gopher, and finger were developed before the rise of the Web.)


Definitely not true. The first LDAP implementation was published in 1993, and was worked on for a while before that internally at the University of Michigan:

http://en.wikipedia.org/wiki/Tim_Howes

The first real browser, Mosaic, was released the same year.

Eventually Howes went to Netscape, but Netscape didn't exist until 1994, and Howes didn't join them until 1996


It started out that way, but then they made all of CORBA's mistakes.


Yeah -- before the enterprise types got a hold of it, SOAP was actually fairly pleasant to work with. Sigh. Oh well.

You can kind of get a flavor of what pre-enterprise-jackassery SOAP was like to work with by looking at Dave Winer's XML-RPC (spec: http://xmlrpc.scripting.com/spec.html), which was one of the precursors of SOAP.


That's a good point - didn't CORBA also start fairly straightforward (I can't believe I said that) but then grew extra layers of mind numbing complexity for transactions, security etc. - pretty much like the various weird WS-* specifications that most people seem to ignore?


S for simple really comes apart in "SNMP".


Hmph. Having tried to use SNMP on occasion, I always assumed S was for Sinister. Or perhaps Special.


I just figured the 'S' was a shortened form of "WTF?!?"

I'll admit I can't figure how they got to 'S' from there, however.


They must have got the absolutely brilliant idea of conflating null with the empty string from Oracle.


Ran into this with a REST XML API recently where someone was trying to do some reflection-type serialization of XML. The API had longitude and latitude of all train stations, and some genius decided to call the tags 'lat' and 'long'. 'long' conflicted with the datatype Long and it wasn't fun. Version 2 of the API has fixed this issue, luckily.


"Lat" and "Long" seem like great tag names for this purpose. It sounds like the problem wasn't this guy, but the "reflection-type serialization of XML".


Sure, it's partly both, but 'lat' and 'lon' are used almost universally across Geo-related APIs. See Google Maps for instance.


I think the absence of the element/attribute is the best way to define null assuming your XSD is set up properly. Many XML marshalling libraries work well with this approach.

(note, I too have long since abandoned SOAP)


I've had the joy of working with a SOAP endpoint that doesn't recognise <element /> syntax, which left me having to create attributes assigned to '' in my Python code, so SUDS would generate the <element></element> syntax for me.

They also massively over-engineered the endpoint, constantly wrapping elements within elements, for no real reason.


That's always annoyed me: sure, XML can be heavyweight but if we're going to use it we should at least get the benefits.

Naturally that line of reasoning didn't get very far with the “maintainers” of an internal purported-SSO system with a SOAP endpoint which crashed on non-ASCII data or SQL special characters in the submitted username / password values.


I have joked that I might change my name to Sample User, develop a piece of land in the country, and name my road Example Avenue, taking address 123. This would make me impervious to datamining, because my results would always be thrown out.

But a last name of 'Null' may be even better. :)


On the contrary, you'd probably receive a lot of "test" mail that leaked through.


I've done data-mining on customers, and truth be told, they'll send that mail without human intervention. You wouldn't be impervious to ye olde mail merge!


Given how I fill out a lot of webforms, I feel sorry for whoever lives at 123 Fake St.


well, Google Maps says this guy is not happy with you http://maps.google.com/maps?q=123+Fake+Street,+Chuxiong,+Yun...


I've always had my mail forwarded to 1 Long St, Testville. I hope that guy has a big mailbox.


Hehe, the person at 123 street ave will be similarly pissed because of me.


I wonder if the guys who own asdf.com ever check the email going through asdf@asdf.com


They do (or at least have in the past). http://www.asdf.com/asdfemail.html. Interestingly, their real e-mail address is jklsemicolon@asdf.com


There is also the guy who owns bar.com, which received a lot of mail to foo@bar.com. It's worth a read, if only for this paragraph:

I MX’d the mail over to a friend’s spam-detection system for about 4 hours one time, but the volume crashed his server and he asked for relief.


When AOL first started allowing screen names longer than 8 characters, I knew someone who registered the name "My Documents". That got some ... interesting emails from people trying to save their downloads.


If a patient with the last name of "Mouse" ever checks in to the hospital where I work, I have doubts about whether any of his labs will be performed. Standard practice is when creating a test user in production or placing a test order, name him Anything Mouse and people know to simply delete the request from the system.


If others create dummy content anything like I do, you'd do better to name yourself asdffsadf asfafs.


When a website asks me for my birthday, I usually put 01.01.1970 into it.

Any system administrator looking at that will either be amused or search for the error in his date time parser.


Oh man, reminds me of the time I was working on an intranet app for a big furniture company and all test user signups were coming through as 01/01/1970. After several frustrating hours trying to track down the source of the error I had the client enter a new user in front of me to see why it was happening for them & not me. I watched in horror as he set the birth date to January 1st 1970.

He had some limited exposure to development in the past and had got into his head that this date was the Universal Developer Test Date.


Funny, I've taken to using 31-Dec-1969.


> develop a piece of land in the country

Careful about picking a low-populated area like this. I used to live in a town with population of about 2,000 and the post office clerks knew most everyone by name. One time I signed up for a site and just used "123 Blah St." as a placeholder address. Months later, some letter was mailed to that address, but the mail clerk, recognizing my name, just helpfully put it in my PO Box anyway!


I once worked for a medical records software company. We received a bug report that a particular patient's record could not be viewed. Our support engineer remoted into the client's site and asked the secretary for the patient's name. It was Bobby Null. You can imagine what sort of underlying assumption about String serialization led to this issue. [A preemptive aside: We had proper confidentiality agreements in place. No HIPAA rules were violated.]


Doesn't telling us the patient's name violate HIPAA in itself?


Good question. I recall having done a Google search and noting that there were not an insignificant number of people with the last name Null in the US, so I wasn't too concerned about posting this. Probably a HIPAA violation, but not a major one.


Methinks the "Bobby" part is fictional. Probably a reference to Bobby Tables.


I wasn't that clever.


You lose plausible deniability by admitting that.


Clearly the patient was an xkcd fan :-)


In this case, probably yes. Might want to remove the post, it's a fairly major violation.

Often, names alone wouldn't necessarily constitute a violation as names are generally not sufficient to count as personally identifiable information... but a name like 'Bobby Null' is, I think, quite unique.

When I was being trained on HIPAA compliance I was told that sole first names are generally perfectly fine, and sole last names can often be fine but should be avoided for very common names. But I should also say that I am not an expert on HIPAA compliance.


I don't know the ins and outs of HIPAA, largely because I don't have to deal with them at all, but I don't see how this should be a violation. That's not to say that it's not, but rather that it seems like an odd rule.

All the post tells us is that a person named "Bobby Null" exists and has medical records, as do most people. It doesn't say anything about this person's medical issues/history at all.

I could learn more about someone by sitting a touch too close to the reception area at a doctor's office.


Also not an expert, but I agree. The violation is only if there is PHI - personal health information released. Stating that John Doe was present at X Clinic is a problem; stating that he exists is not.


Having a record implies that you were present at X Clinic. If it's a specialist clinic, then confirming the existence of patient record could allow someone to infer the condition or a range of conditions. Most clinics won't confirm or deny that a patient is there (or has records) without a release. In this case, though, we don't know where the record was stored.


Good point. My training said no full names, but that was because we were directly associated with a specific product/analysis, so any full names would associate the patient with a particular health... thing.

A name by itself, you are quite right, is not PHI. Thanks for the reminder!


For very _common_ or very _uncommon_ ?


According to howmanyofme.com there are 9 Bobby Nulls in the U.S.


My favorite along these lines:

http://caterina.net/archive/001011.html

Flickr cofounder Caterina Fake couldn't fly on Northwest Airlines because their system silently deleted her tickets.


Sort of weirdly classical, like Odysseus identifying himself to the Cyclops as "Noman".


Or the schizophrenic in Hitchcock's "Psycho" called Norman.


We should add these to the list!

http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-b...

* No one has a name that is a reserved system keyword (Null, Nan, Unknown...)


I have received mail from SSA addressed to <myname> Unknown, because of confusion over my name.


My license plate when I lived in Texas was "NULL". I never got a ticket when running the toll booth and the camera OCR'd my license plate.




Mr Null, the uncle of famous Bobby Tables.

http://xkcd.com/327/


Have a look at the response headers from any reddit page http://www.reddit.com/ and you'll see:

Server '; DROP TABLE servertypes; --

I thought it was some sort of bug or attack and reported it to them. Here's the response I got:

"It's a nod to http://xkcd.com/327/

Hope you parameterized your queries ;)"

I love when companies do fun things like this.
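
For anyone who missed the joke, a minimal sketch of what "parameterized" means, using Python's built-in sqlite3 module with a made-up in-memory table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

name = "Robert'); DROP TABLE students;--"

# The ? placeholder sends the name as data, never as SQL text.
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

stored = conn.execute("SELECT name FROM students").fetchone()[0]
print(stored == name)  # True: the table survives and the name is stored verbatim
```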


This is probably a direct consequence of the fact that XML (unlike S-expressions, or JSON) fails to be self-describing. See [PDF]: http://homepages.inf.ed.ac.uk/wadler/papers/xml-essence/xml-...


XML is self-describing, it just so happens that XML's data model is not identical (and actually not even close) to SOAP's data model, or the typical programming language's data model.

XML itself only describes a text encoding, XML infoset describes node labeled trees, possibly graphs through xml:id and idref.

Unlike JSON it doesn't have a concept of null, it only has absence of a node. The authors of SOAP just invented a truly terrible way of mapping XML into a programming language's constructs (which are typically edge labeled trees with typed nodes).

XML is actually a decent data format for markup. Using it for other purposes (RPC format, configuration files, ...) usually doesn't end well.


With xml, I thought <tag/> was null and <tag></tag> was an empty string.


No, those two forms are equivalent: both indicate an empty string.
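
You can see this with any parser. A quick check with Python's standard xml.etree.ElementTree (which happens to report the text of an empty element as None for both forms):

```python
import xml.etree.ElementTree as ET

self_closing = ET.fromstring("<tag/>")
open_close = ET.fromstring("<tag></tag>")

# ElementTree reports no text for either form.
print(self_closing.text, open_close.text)  # None None

# Once parsed, the two serialize identically.
print(ET.tostring(self_closing) == ET.tostring(open_close))  # True
```

So after parsing there is nothing left to hang a null-versus-empty distinction on; that information has to travel some other way.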


19k in parking tickets for XXXXXXX license plate - http://blog.al.com/spotnews/2009/10/the_price_of_vanity_plat...

I heard a similar story about a student in Birmingham whose license plate was 'null'.


It's probably a good time to bring up "Falsehoods Programmers Believe About Names": http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-b...


This fellow Adam https://twitter.com/#!/undefined signed up for twitter as "undefined", scaring up twitter bugs like this one: "While visiting websites like Twitter & ESPN, the webpage will suddenly switch to the twitter page of the username 'undefined', who is not one of my twitter friends or followers"... https://getsatisfaction.com/twitter/topics/when_i_visit_twit....


Ironic, considering that the language behind it (ColdFusion) doesn't even have a concept of null (it just uses the empty string).


Well, my last name has an ñ. So, for example, my credit card has a weird character like "&". Others just change it to n. My last name crashed an educational site when I registered.


Could you not replace it with 'ny' or something similar?


Using a name other than your official legal name is frowned upon in many contexts. In some countries, although not the US, it's actually illegal under many circumstances.


Something that worries me about perl to no end is tests like:

if ($lastname) { ... }

This fails when $lastname="0". But I am constantly seeing perl code that does it.


Actually, what worries you is not Perl per se, but people that write Perl code and don't know what they want to test for. The code shown interrogates $lastname for values that represent truth in Perl, while it ought to be checking for definedness:

    if(defined $lastname) { ... }
The two are totally different cases. I would also argue that the problem lies elsewhere if you have values for a 'lastname' field in your data set that consist of a single letter.


Considering that there are plenty of people with no last name at all, I don't find it at all hard to imagine that there might also be people with a real last name consisting of only one letter.

There is a town famously called simply "Y" in France.


One letter strings evaluate as true. This would only be false for a last name comprised of the single digit '0'


It's not Perl. I recently had a conversation with a friend who didn't like me using "if myvar == 0:" or "if myvar is 0:" in Python code rather than "if myvar:". Call me paranoid, but I like to be as explicit as possible in my checks; you never know when magic conversion tricks (which are often platform- or implementation-dependent) will end up biting you in the ass.


Python is a little better than Perl or PHP on this; it won't treat the string "0" as false. It does, however, treat both 0 and None as false, and also 0.0 == 0 == False, which is the same kind of potential bug.

Generally I find that Python's avoidance of implicit string type conversions means that I almost never have this kind of bug in my Python.

On another note, `myvar is 0` is undefined behavior; Python implementations can perfectly legitimately return False for that even if myvar is, in fact, the integer 0. Try this, in Python 2.7.3:

  >>> x = 257
  >>> x is 257
  False
  >>> x = 257; x is 257
  True
  >>> 257 is 2**8 + 1
  False
  >>> 256 is 2**8
  True
  >>> x = 256
  >>> x is 256
  True
That's because `is` denotes object identity, not value equality, and for immutable objects like integers, strings, and tuples of immutable objects, object identity is fair game for optimization. In the above, "is" gives us a fascinating window into the particular optimization decisions taken by the CPython 2.7.3 interpreter. But, child, if you want your code's behavior to depend on some problem domain instead of interpreter optimizations, don't use "is" to compare integers!
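
A sketch of what explicit checks can look like without leaning on either truthiness or `is` with numbers. The `describe` function is made up for illustration:

```python
def describe(value):
    """Classify a value without relying on truthiness or on `is` with numbers."""
    if value is None:  # identity is appropriate for the None singleton
        return "missing"
    if value == 0 and not isinstance(value, bool):  # value equality for numbers
        return "zero"
    if value == "":
        return "empty string"
    return "something else"

print(describe(None))  # missing
print(describe(0))     # zero
print(describe(0.0))   # zero
print(describe(""))    # empty string
print(describe("0"))   # something else: the string "0" is not the number 0
print(bool("0"))       # True: unlike Perl or PHP, Python treats "0" as truthy
```

The isinstance guard is there because False == 0 holds in Python, as noted above.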


Yes child, check this out: http://lateral.netmanagers.com.ar/weblog/posts/BB979.html , in particular my comment from two months ago. http://lateral.netmanagers.com.ar/weblog/posts/BB979.html#co... and see where I'm coming from.


Using "if myvar is 0" exposes you to more implementation dependent behavior than "if myvar".


I agree, but it's still more explicit in terms of what process is used to evaluate the content of myvar and what it should match, especially in a context where you're expecting a numeric value rather than a boolean.


The same problem exists in PHP, where "", "0" and 0 are all treated as false if you don't check the type. Welcome to the trap of loose typing.



17 years ago, when I got my second Internet account with my ISP, I filled in these 3 names for my choice of email address on their paper signup form.

root@ , nobody@ and daemon@

They gave me "daemon". I've terminated that account long ago, but last I checked (6 years ago?), I could still retrieve emails and dial in using a modem using that account.


This is hilarious. Seriously, the question votes were being incremented live. :)


This is exactly why you should never mix data and code/markup. When the semantic barrier is broken, all shit breaks loose.

I've always wondered if SICP style scheme would cause these sort of problems.


It's more an issue of in-band or out-of-band signaling.

It's hard to do in-band signaling properly, but oftentimes you only have a single data channel, and then you have no choice.


I believe that these errors are so common they represent a cognitive bias on the part of programmers. At some point every developer wants to execute a one line command and have the system "do something". If they cannot get that one line, then they have two options: wrap up more abstraction code until one line executes (the SOAP solution), or think deeply about what they are trying to do and take things away until one line is clear and obvious (the REST solution).


Dear god, so many deleted answers from people trying to be funny instead of informative! (I am counting 4 from the last hour and 3 more from the previous years)


I think this is a joke, possibly inspired by the XKCD comic (link posted in the stackoverflow comments). The string "Null" would not cause this behavior.


I have personally encountered "Null" as a surname in a system used by job applicants world-wide. The system's session layer encodes absent values as the string "null" at some point. The Null clan is the only problematic case, and they are numerous enough that the maintainers are aware of the problem but not so numerous as to get the session layer fixed.

I wish I could listen in on a dinner conversation at Null house. They must have an interesting perspective about computers.


'The string "Null" should not cause this behavior.'

FTFY. It's certainly possible there's a bug in the library.


The OP claims it's real. After dealing with some SOAP systems I would entirely believe this is possible.


I know someone named Null who's had similar problems as a user of web apps. In fact I mentioned it here a while back: http://news.ycombinator.com/item?id=1440890


It could very easily be real. I was in a Fantasy Football league on Yahoo a few years ago, and there was a player named Keith Null who played briefly after another player was injured. His name just showed up as Keith.


Is this real? I don't really understand what language they're discussing here, but wouldn't the string "Null" be distinct from the protected Null?


Not when naive serialization turns both of those into the same string "Null", which the other end deserializes into the keyword.
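
A minimal sketch of that round trip in Python. The two `naive_*` functions are hypothetical stand-ins, not the code from the question, shown next to the standard json module for contrast:

```python
import json

def naive_serialize(value):
    """Hypothetical encoder: writes a missing value as the text "Null"."""
    return "Null" if value is None else str(value)

def naive_deserialize(text):
    """Hypothetical decoder: treats the text "Null" as a missing value."""
    return None if text == "Null" else text

print(naive_deserialize(naive_serialize("Smith")))  # Smith
print(naive_deserialize(naive_serialize("Null")))   # None: the surname vanished
print(naive_deserialize(naive_serialize(None)))     # None: indistinguishable

# A format that quotes strings keeps the two apart.
print(json.dumps("Null"), json.dumps(None))  # "Null" null
print(json.loads(json.dumps("Null")))        # Null
```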


At least your employee has a last name. I had an Indonesian hacker in my team, he had no last name...

It is all about assumptions. OP assumed nobody would be called Null. The MusicBrainz index assumes no band chose to name themselves "Various Artists" or "[unknown]". These are advisable, but how do you not assume that people have last names?


It once took me more time than I would like to admit to realize that the string "false" is still true.
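
The same trap in Python, for illustration, with a hypothetical `parse_bool` helper as one way around it (the accepted spellings are an arbitrary choice):

```python
def parse_bool(text):
    """Hypothetical helper: map common spellings to a real boolean."""
    normalized = text.strip().lower()
    if normalized in ("true", "1", "yes"):
        return True
    if normalized in ("false", "0", "no"):
        return False
    raise ValueError(f"not a boolean: {text!r}")

print(bool("false"))        # True: any non-empty string is truthy
print(parse_bool("false"))  # False
```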


I wonder if the "employee" is a doctor. Something similar showed up on TheDailyWTF back in 2007:

http://thedailywtf.com/Articles/Paging_Dr_0x2e__Null.aspx


Naturally the very first thing I did when opening this discussion thread was search the page for “bobby tables” and “xkcd”. Of course, there were already three separate mentions of that comic in the thread.


Me too. Bobby tables is classic.

http://xkcd.com/327/


And this is why it is a bad idea to look for null or nil as a value representation in place of text or number. Instead, use a different representation, like an empty or non-existent element/attribute, etc.


Funny, just watched monsters inc last night and saw a guy named Jonathan Null in the credits. Also, Tom Duff was there too.


I have problems using data analysis software when a company's stock ticker is "NAN".


It's actually one of my friend's database challenges. Indeed, it is a funny fact.


I wish I could upvote this twice.


type or untype?


In other news, employee with last name NaN gets a huge paycheck due to software glitch...


Nan is actually a legitimate name (http://en.wikipedia.org/wiki/Nan) so if you're putting everything into upper case and not sanitizing, you're gonna have a bad time.
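
Here is how that bad time can come about, sketched in Python. The `naive_parse` loader is hypothetical; the only real behavior relied on is that float() accepts "nan" in any letter case:

```python
import math

def naive_parse(cell):
    """Hypothetical loader: anything float() accepts is treated as a number."""
    try:
        return float(cell)
    except ValueError:
        return cell

print(naive_parse("IBM"))              # IBM: stays a string
print(math.isnan(naive_parse("NAN")))  # True: the ticker became not-a-number
print(math.isnan(naive_parse("Nan")))  # True: so did the name
```

Keeping name and ticker columns as strings, instead of letting the loader guess, avoids the problem.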


And Dev is a common first name in India. It's a valid surname too, but that spoils the pun.


Somewhere in the world exists a poor soul named Dev Null.


Null is also a legitimate name. When names are Anglicized from foreign languages there are often dozens of alternate spellings.


Patio11's article "Falsehoods Programmers Believe About Names"

http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-b...


This post reminded me of that also. I find myself sending someone that post every quarter or so :)


That's a good list. I felt a bit less smug when I got to number 40.


I actually got a credit card receipt for $NaN once from a Pitney Bowes postage kiosk. I wondered if I'd need to send my bank a check for $0/0.

Picture at http://www.arcfn.com/2008/05/importance-of-software-testing....


This is what bounties are for.


The post is from December 2010 with no follow-up. It was likely a joke. HN got trolled.


There is a followup explaining how he solved the problem. Right there on the page.


There is a follow-up...


But is his first name Bobby?


Little Bobby'); DROP TABLE Students;-- Null, always getting into trouble!


little bobby tables we call him ....


Ahaha that's amazing!



