Nulls Nullified (2005) (dbazine.com)
28 points by declanhaigh on July 24, 2023 | 44 comments



The author talks a lot about how NULL in SQL is supposedly unnecessary, and claims that it is totally possible to handle unknown SQL values without NULL. Yet he successfully avoids explaining how his solution is supposed to work. At the end he links to a (paid?) paper where his solution is apparently explained, though the link has since ceased to work. Great.


There are many papers describing how NULL as an explicit “value” (you still at least need something to present empty results from left or right joins) can be made unnecessary – just look back to the development of relational databases and the theory they are based upon. No need to pay for this author's paper when it probably just restates things that are found in many open-access papers, books, and elsewhere.

Representing unknown/non-existent values without allowing an explicit NULL can be modelled by making the property another entity: you have a child table that lists the values of the property, so where you would otherwise store a NULL there simply exists no row in that table. In SQL this means extra joins, and in most (all?) SQL implementations this results in (sometimes significantly) lower performance. The big argument there is whether this points to a problem in the theory (if arguing that NULL should not be “stored”), in SQL as an implementation of relational database theory, or in the implementations of SQL…
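
A minimal sketch of that modelling style, with hypothetical table and column names:

    -- Instead of a nullable people.middle_name column, the property
    -- becomes its own child table; "missing" is the absence of a row.
    CREATE TABLE people (
      id   INTEGER PRIMARY KEY,
      name TEXT NOT NULL
    );

    CREATE TABLE people_middle_names (
      person_id   INTEGER PRIMARY KEY REFERENCES people(id),
      middle_name TEXT NOT NULL
    );

    -- Reading the full record back needs the extra join mentioned
    -- above - and note that the LEFT JOIN reintroduces NULL anyway:
    SELECT p.name, m.middle_name
    FROM people p
    LEFT JOIN people_middle_names m ON m.person_id = p.id;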

[I'm aware my terminology is all over the place, as most people's is: when talking about relational theory rather than SQL as a specific implementation of it I should use tuple not row, relation not table, etc., but getting that right only serves to confuse people (a great many of those with a more self-taught and/or field-trained programming background than one that involves any computer science study) who know only SQL and have thus far not needed to be aware of the theory or history].


It's not a problem with the theory, because the theory was not created to be a high-performance application.

It's also clearly not a problem with SQL since people very loudly and clearly prefer the version with nulls and operations that create nulls.

It is a problem with the idea that people should program in relational algebra. They shouldn't. It's the same kind of issue as functional languages that extend lambda calculus; logical languages with explicit evaluation strategies; type systems with undecidable situations; module systems that allow access to private features... No actual usable system is a perfect representation of the math behind it.


But how to represent missing values in left/right/outer joins?


I think this is one of the many things that the author wanted to point out: outer join is not a valid relational-model operator, and thus leads to generating nulls. Outer joins would have to be some kind of subqueries in the relational model, which would likely cause either far too subtle querying semantics or, however small, a performance cost that is still unacceptable given how databases are benchmarked.


NULL makes little sense from first principles.

Either get rid of it, or allow nulls (and logically any other union) by supporting union types.

It is quite arbitrary to allow types that make sense (string, int, etc.) and then also allow nulls, so that you get the INT | NULL union but no other unions. So you have this multi-purpose "other value" whose meaning is inferred by the application.

I guess they were added as a pragmatic "I dunno" field for CRUD systems without needing to go to all the effort to support unions. For example not everyone has a middle name, but you don't want to go all 5th Normal Form on it.


I don't know about not making sense from first principles, but if a principal concern is bug-free code then NULL is a disaster. I've seen so many bugs caused by people forgetting a column could be NULL and using '<' or '<=' that I avoid them like the plague now. It's not just the unfamiliar behaviour, it's also that everyone forgets to write tests for it.

A simple fix would be to add a dialect option that makes any operation on a potentially null value a syntax error. This is possible because in SQL it's almost always possible to determine whether a value X could potentially be NULL or not. If it could be NULL, something like (X = y) or (X + y) should generate a syntax error. Instead you should have to write (X IS NOT NULL AND X = y) or ISNULL(X + y, 0).
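
A sketch of what that would look like in practice, with a hypothetical table t and nullable column x (COALESCE is the portable spelling of ISNULL):

    -- Rejected under the proposed option, because x is nullable:
    --   SELECT * FROM t WHERE x = 1;

    -- Accepted: NULL-ness is acknowledged explicitly...
    SELECT * FROM t WHERE x IS NOT NULL AND x = 1;

    -- ...or a default is supplied:
    SELECT COALESCE(x + 1, 0) FROM t;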


NULL/missing is so common that it makes sense to provide special language support, even if you don't have full unions. See e.g. Kotlin and Typescript.

What doesn't make sense is to treat a nullable type the same as a non-nullable type, causing errors everywhere (Java), or to make it so special that it basically has completely separate logic and operators attached (SQL)


A lot of confusion comes from SQL nulls having the same name as nulls in programming languages. They are conceptually completely different things.

Nulls in programming languages are a special “sentinel” value indicating the lack of an object. Nulls in SQL mean “unknown” or “I am not able to answer that question”.

This is why in most programming languages two nulls are equal, since they are the same sentinel value. Not so in SQL, since one unknown value is not necessarily the same as some other unknown value.
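
You can see this directly in standard SQL (t and x here are hypothetical):

    SELECT NULL = NULL;   -- NULL (unknown), not TRUE
    SELECT NULL <> NULL;  -- NULL (unknown), not FALSE

    -- which is why testing for NULL needs its own operator:
    SELECT * FROM t WHERE x IS NULL;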


Worse: in JSON, null is considered a type, not a value of other types, which can make validating nullable values with JSON Schema a royal pain.


JSON does not define a type system, but JavaScript defines null as the only value of the null type. This works fine in e.g. TypeScript, since you can use a type union of null and any other type if you want to indicate a nullable value.


Don't forget that in addition to null, JavaScript has undefined and undefined.

(the value undefined and the property-does-not-exist undefined)


but don't forget that `typeof null === 'object'` :)


They're null pointer objects /s


Typescript does have full unions, although only partially discriminated ones (in the sense that `number | string` is discriminated but `number | number` is not - custom discriminators can be written fairly easily though). This resolves the problem that the previous poster had - nullability is not some magic trait of certain types, but rather a function of the general case of union types. In that case, `string | null` is no different from `string | number`, and null is just one possible type among many.


What does being "discriminated" mean here?



> NULL/missing is so common that it makes sense to provide special language support, even if you don't have full unions. See e.g. Kotlin and Typescript.

No it doesn't. It's a bad idea in programming languages just as it's a bad idea in SQL.


Given that a lot of non-niche modern languages disagree with you and you provide no rationale, I remain unconvinced.

Swift, Kotlin, Javascript/Typescript along with some older languages (PHP, C#, probably more) have special null-handling in the language. Go and Rust do it differently, but do have an extremely strong opinion on how missing values should be handled.


There are tradeoffs - if you go against the zeitgeist and ban nulls then you risk the people you need to please for success turning their noses up at you. It is a bit like car safety features - if it is such a good idea, why wasn't it taken seriously until, say, the 90s?


In theory NULL (or any such sentinel value, such as the infamous NaN in floating-point arithmetic) makes sense because there are partially defined functions. Division is the most famous example, but most basic operations on lists share that property too.

The problem is the way it's handled in many programming languages, where it can blend so easily with legitimate values and blow the whole program up at execution time.

It's a practical problem, not a theoretical one.

And it's not the same as union types. The division of two numbers always returns a number, and nothing else, it's just that, for some values, it's not defined.


I could argue that { number that is defined | number that is not defined } is in fact a union type (or, perhaps more correctly, a sum type).


Being pedantic here but "sentinel" means "signalling the end of a sequence" (such as a 0 byte for C strings, or visually when you close an array or a block of code with `}`)


As far as I understand NULLs were indeed a pragmatic choice, but the chief reason was the need to compute derived table values. E.g. in a join it is normal to get unknown values for a cell. This has to be expressed somehow. How? NULL seems to be a reasonably good generic solution to this. It may be possible to come up with a different generic solution, but such a solution would probably require something like conditional fields in a table and this is a whole new level of relational logic.
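
A quick illustration with hypothetical tables - the join itself manufactures values that exist in no base table:

    SELECT c.name, o.id AS order_id
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id;
    -- Customers with no orders still appear in the result,
    -- with order_id = NULL even though neither table stores a NULL.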


I wonder how Codd's original relational algebra handled this case, as he didn't have NULL cells.


This part I do not know, but NULLs are either Codd’s idea or the one he sided with.

Another example when an invention goes against the established logic would be quaternions.


>Another example when an invention goes against the established logic would be quaternions.

How so?


I don’t think it had outer joins.


I'll have to disagree.

We can all agree that the "silently accept null" approach is, by now, a quadrillion dollar mistake.

But having a safe "or null" union type as the _only_ union type in your language, isn't as far fetched as you make it sound.

It's the good old "allow zero, one or an infinite amount of any feature" rule. Sometimes "one" is the right choice.

And the same principle applies to values. A plain value is one value. A list/collection is an arbitrary number of values. And `null` is no value. Some would, rightly, say that `void` is no value, so `null` is really a value representing no value. And then you will also want to represent zero-or-one value, which is where the -or-null type comes in.

This is a more fundamental union than just "a Foo or a Bar", which is always a value; then we're just arguing over the type.

You can always use an `Option` class instead. If you have classes. But if you're going to use `Option.none` to represent the value of an uninitialized variable, you're going to build it into the language anyway, and then you might as well admit to that one union type.


While I'm a big proponent of union types in programming languages (against Haskell-like Option wrapping and unwrapping), it is not clear to me that they would work for something like SQL. Would it mean allowing columns with union types?


As soon as you have composite types, you will want to filter and join on the individual components, which would require extending the query language to allow drilling down into nested composite types. Basically you are back to hierarchical databases. This is why Codd recommends expressing composite types as relations rather than complex values. You can express the same information, but can use relational algebra rather than special-case operators.
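
A rough sketch of the two styles, using Postgres-flavoured JSON drill-down purely as an illustration (table and column names are made up):

    -- Composite value: components need special drill-down operators.
    CREATE TABLE people_j (
      id      INTEGER PRIMARY KEY,
      address JSONB  -- e.g. {"city": "Oslo", "zip": "0150"}
    );
    SELECT * FROM people_j WHERE address ->> 'city' = 'Oslo';

    -- Relational form: same information, plain relational operators.
    CREATE TABLE addresses (
      person_id INTEGER PRIMARY KEY,
      city      TEXT NOT NULL,
      zip       TEXT NOT NULL
    );
    SELECT * FROM addresses WHERE city = 'Oslo';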

Nulls are a special case though - even if null was not allowed in base tables, you would still need them in outer joins.


> For example not everyone has a middle name, but you don't want to go all 5th Normal Form on it.

(actual question) how does 5NF apply here?

I thought that this case could be handled null-free by introducing a separate table

CREATE TABLE people_middlenames (human_id INTEGER PRIMARY KEY, middlename TEXT NOT NULL);

Is it already 5NF? I thought 5NF required a more complex structure of data, like the examples explained on Wikipedia.

Thanks,


Looks 5NF enough to me.

That being said, names are an awful example; if you have to assume a specific structure for someone's name you've kinda already lost [1][2].

Better to just assume it's an opaque mutable unicode string and hope for the best.

[1] https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...

[2] https://shinesolutions.com/2018/01/08/falsehoods-programmers...


> NULL makes little sense from first principles.

You need something to represent the lack of a value on LEFT JOINs. The author acknowledges as much; the two things he seems to be complaining about are:

1. It's impossible to distinguish between true lack of value from LEFT/RIGHT JOIN and a value that was directly set as NULL.

- Interestingly, this is exactly why JavaScript has both null and undefined, a design decision that many people also complain about and grumble that it should have been unified into one.

2. There's no reason for NULL to not equal NULL, which makes NULL a headache to handle and requires special IS NULL/IS NOT NULL operators to test for NULL-ness.

- Postgres 15 now addresses this with the DDL option UNIQUE NULLS NOT DISTINCT.
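
For reference, a sketch of that option (it only changes how unique constraints treat NULLs, not comparison semantics in general; the table is hypothetical):

    -- Postgres 15+: without NULLS NOT DISTINCT, any number of rows
    -- with email = NULL would satisfy the unique constraint.
    CREATE TABLE users (
      id    INTEGER PRIMARY KEY,
      email TEXT,
      UNIQUE NULLS NOT DISTINCT (email)
    );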

Finally, the author claims to have a solution that solves all these problems but sadly locked it behind a paywall on a website which is now defunct.

> Until recently, there was no logically correct, relational solution to missing data. We offer an outline of a possible such a solution in Practical Database Foundations paper #8, “The Final NULL in the Coffin,” which also summarizes the problems with NULLs.

http://www.dbdebunk.citymax.com/page/page/1396241.htm

> This website has been cancelled.

> Click here to go to the 5-minute website builder.

https://web.archive.org/web/20041209115415/http://www.dbdebu...

> ORDERING AND PRICING

> Delivery will be in PDF format. We strongly recommend to upgrade to the latest version of Acrobat reader and to have the following fonts installed: Verdana, Courier, Arial Narrow, Arial Black and Wingdings, so we don't have to embed these fonts and enlarge the files.


> There's no reason for NULL to not equal NULL

Only if you consider NULL to be an exact value, to have one meaning, i.e. “does not exist”. In the mathematical set theory upon which relational database theory was built, this is not the case.

NULL really represents “unknown”, not just the subset that is “unknowable”/“does not exist”, at which point not all NULLs are equal as they may represent entirely different things that you currently don't know - this is why NULL=<anything> == NULL (including NULL=NULL == NULL) and NULL≠<anything> == NULL.


It seems to be available for $14.99 on Amazon.


It's not a union type, but an option. INT | NULL --> Option<INT>.


Option is a union type


It's a sum type, which is not quite the same thing.


Why do pointed sets not make sense from first principles?


I assume the author's alternative to null is to punt nullable values into their own table, where the presence of the row tells you that the value is in fact there. Thing is, if your table has three such columns then you're going to need three extra tables. The whole argument is reminiscent of the polemic against surrogate keys, viz. that they are a violation of Codd. No-one is arguing that nulls in SQL aren't a mess, but an RDBMS is a tool, not a holy shrine, and most of us prefer to just use the bloody things to get work done and ignore the purist bikeshedding.


The author is right about one thing. If NULL means "unknown", then the result of the comparison A > B where either or both values is NULL should itself be NULL, not FALSE.
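
For what it's worth, most SQL implementations do evaluate the bare comparison to UNKNOWN (displayed as NULL); it's the WHERE clause that then collapses UNKNOWN to false, so rows with NULLs silently match neither a predicate nor its negation (hypothetical table t, nullable column x):

    SELECT 1 > NULL;               -- NULL, not FALSE
    SELECT * FROM t WHERE x > 1;   -- drops rows where x IS NULL
    SELECT * FROM t WHERE x <= 1;  -- also drops rows where x IS NULL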


Null means unknown. I think the main issue is that people use it to mean “not applicable to this row”, which is different than null’s initial purpose. I think the problems with null could be solved by adding in another None value to distinguish between these two situations.


LOL @ "I'm smarter than the authors of the SQL standard, buy my paper to find out why!"



