NULL: The worst mistake of computer science? (2015) (lucidchart.com)
209 points by BerislavLopac on Dec 4, 2018 | 368 comments



NULL can mean and be different things in different domains of computer science. NULL in the database world isn't the same thing as NULL in the programming world. In the programming world, null is a result of the system architecture, systems programming, etc. In SQL, NULL represents a lack of data. There have been debates on whether there should be different types of NULL. A NULL type for "data that is available but we don't have it yet" - like car make and model for a car owner. A NULL type for "data that does not apply" - like car make and model for an adult who doesn't own a car. A NULL type for "data that never applies" - like car make and model for a child. Then you get into the philosophical debate on whether a NULL can ever equal a NULL. Does it even make sense to think of NULL in terms of equality? How can an unknown entity ever equal another unknown entity? But what if you are just asking "are they both unknown"? Then you probably can think of two NULLs as "equal".

In higher-level programming, the consensus seems to be the fewer nulls the better, which is why languages like C++, C#, etc. are introducing Option-like syntax (mostly to accommodate the database world and its NULLs).

NULL exists to solve particular problems in computer science. It can also cause a lot of problems. You can argue it's the best solution and worst mistake depending on the situation.


All of those different nulls can be solved by not having null as a special case of your database specification, but as a first class type construct.

    data MightBeData a = Yes a | Unavailable | NotApplicable | NeverApplicable
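
For comparison, roughly the same idea sketched as a TypeScript discriminated union (the constructor names here just mirror the Haskell above):

    type MightBeData<A> =
        | { kind: "yes"; value: A }
        | { kind: "unavailable" }
        | { kind: "notApplicable" }
        | { kind: "neverApplicable" };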


NULL in databases has many properties that save a shitload of coding time and help you write more secure code. To cite only one of these useful properties: NULL automatically propagates through all operations and aggregations.


Is that the behavior you actually want, though? In many cases "this value is explicitly unknown" has dramatically different semantics from "the computation that produced this value had an unexpected NULL input", and if you interpret the latter as the former, you've likely just corrupted your data.

Monadic Maybe (in higher-level languages like Haskell or Rust) has the semantics you describe, but with the advantage that you only get it when you explicitly ask for it. If you care about data integrity you usually want to be particularly precise about the results of your computation; it's helpful if your type system can sanity-check them as they propagate through every operation.
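
A rough TypeScript sketch of that "only when you explicitly ask for it" property, with made-up names:

    function mapNullable<T, U>(x: T | null, f: (t: T) => U): U | null {
        return x === null ? null : f(x);
    }

    function doublePrice(price: number | null): number | null {
        return mapNullable(price, p => p * 2); // null propagates here only because we opted in
    }
    doublePrice(null); // null
    doublePrice(10);   // 20
    // a plain (p: number) => p * 2 simply won't accept a possibly-null price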


Yes, of course: when I deal with data, if I compute an average over 20 records and one has a NULL value, I want to know that something is wrong with that specific operation. Silently returning something wrong would be really bad. Crashing the whole query would be annoying as well, because if you compute 3 billion aggregates at the same time and only 0.1% return NULL, you might want to filter them out instead of doing nothing at all or correcting the input.

Of course, all of this would be feasible with an arbitrary no-data value, but you'd basically have to rewrite all the propagation functions that are built in for NULL. As a DB admin I would consider a database without NULL handling utterly flawed.

In fact, with modern SQL databases you often get twice the fun, because there is a second propagating special value for numeric types called NaN (not a number), to further distinguish lack of data from invalid data if need be!


I think you can say something similar about programming languages that don't have automatic memory assignment.

It can be perfectly valid to have a pointer reserved in memory, without having given it a value to point to yet. I see why this isn't much of an issue in 2015, but it once was.

I don't think NULL is a problem at all, really. It sure isn't fun in languages like C# where you can have nullable types, but I think that's a problem with C# and not with the concept of NULL.


This is hard to reconcile with type theory for me.

NULL, to me, implies an uninhabited type, i.e. there can never be a value with a NULL type. Using null for "data isn't there / doesn't apply / isn't available, etc." seems like an abuse of the type system. I see no reason that the former needs to be supported at the type level. These properties are just responses to queries, not some mystical, uninhabitable oblivion. Unnecessary type features just make verification and learning a language much more difficult.


NULL isn't the uninhabited type, that's the bottom type. NULL is a value that inhabits every type.


Not in all languages. In Lisp dialects related to classical Lisp, like ANSI Lisp, there is a unique nil object which is a kind of symbol. It is self-evaluating and serves as (the one and only) Boolean false value, and also as a representation for the empty list, which terminates all non-circular lists.

There is no nil-like value in the domain of any other type. If you hold a value which satisfies stringp then you know you have a string of zero or more characters and not some null reference.

Because the typing is dynamic, then if you have a situation in which you would like to either have a string, or a value indicating "there is no string here now", you can use nil for that. But replacing a string variable with nil changes the type; it is not a string reference that has been nulled out, but a string object previously stored in the variable has been replaced with the nil object.


I think he's speaking specifically of the Algol-derived languages that the article is talking about, i.e. C, C++, Java, etc. Other languages (eg. SML, Ocaml, Haskell, and Rust) force you to make None an explicit value in an algebraic type (eg. Maybe/Optional), and that's what the article is arguing for. In dynamically-typed languages (Python, Javascript, Ruby) the question is irrelevant because there's no static type checking anyways. Static type systems bolted on top of dynamic languages (eg. CMUCL, Closure Compiler, TypeScript, Python typing) often get this right - they treat a nullable type as distinct from a non-nullable one and perform checking upon entry. There're also some languages (Kotlin, Java8 with @NonNull) that are fundamentally saddled with null because it's part of the platform APIs, but have built layers on top of it similar to these to perform nullability checks.


> NULL is a value that inhabits every type.

Not in all type systems. In particular, type systems which have union types may choose to simply create a separate type for NULL. There are also variants that have separate NULL types for different base types, which also invalidates your claim.


> Not in all type systems

I think this is inaccurate. We are talking about computer science, which matters here and constrains general type theory. A type system is different from how you interact with it, so dispensing with language-specific symbolic representation further normalizes the discussion.

Fundamentally, a (computer science) type is a representation of data, binary for the most part. That representation has to inhabit some part of bounded memory. When that memory is initialized (empty), it's some form of NULL, for lack of a better term. It exists for every type system in computer science.

> There are also variants that have separate NULL types for different base types, which also invalidate your claim

That's not the same thing. Different NULL types make sense for different-sized discrete (fixed-bounds) memory allocations. A Unicode character has a fixed-size allocation, while a string might be an unbounded allocation (it grows in some fashion, as needed).

Edit: Kneejerk downvoting, classy.


This does not make sense. Null is not the same as zero and neither is the same as an all-zeros bit pattern. A memory word interpreted as an integer has no meaningful null value. A memory word interpreted as a pointer may or may not have a special bit pattern (which may or may not be the same as integer zero) that represents a pointer to nowhere; it all depends on language semantics. The address 0x0 can be perfectly valid on some architectures. Even though in C the literal 0 denotes a null pointer constant, it does not mean that the value of a null pointer is literally zero.


> A memory word interpreted as an integer has no meaningful null value.

What a type is depends on what the runtime operates on. You can make a runtime that just grabs random bits of data as a type and says "that's an integer", but it's not a useful construct/example. A runtime keeps track of types in some way external to the data itself. So I'll disagree that all-zeros is not the same as null, because it's a common way to initialize the data that is identified with a type (like in a pointer table). It's not 1:1, but it's common. There's not always a formalized name, but it (an uninitialized state) always exists as part of the type system (when not reusing an existing memory allocation, which is an initialized state). Always.


It was not I who downvoted you, but I agree with the downvoter in that you are not making good arguments. There is basically nothing in common between NULL as used by programming languages in general and how you want to define it in your comment. How NULL is represented internally is totally language dependent. C and its ilk have defined it one way (that is reasonably close to your description ON SOME ARCHITECTURES - not all), but how other languages define it differs wildly.

And strings may only grow if strings are mutable, which AGAIN is extremely language dependent.


I’m not sure I buy this definition of type even in theory.

From a category-theoretic point of view, a type would be nothing more than the constraints on how terms may be composed.

Either you simply view types as objects in some category or, perhaps a bit more interestingly, as a functor. See f.ex. http://noamz.org/papers/funts.pdf

Or rather it seems to be a common theme in language design to conflate these two notions of types, and we should probably stop doing that.


Category theory doesn't require NULL, but physical state does. Given the state of the machine as a constraint, there must be an uninitialized state of unknown or undefined (but allocated) for each type to ensure types are run as functors. I don't see why talking about the theory is useful, given that the practical constraint will always provide an asterisk of "given you're working on physical memory".

All types are functors in practice... which always involves an element of initialization to ensure the type is defined in memory.


But that is a big asterisk, which was my point. Only the compiled and running program is actually forced to work with physical memory. All stages before that are just modeling.

My belief is that we should stop conflating data types (input and state modeling) and program types (domain modeling) so we can advance to more productive workflows. F.ex. runtime reflection should never have been a thing; instead the focus should have been on macros and staged compilation. Most metaprogramming can probably be better evaluated at design time rather than run time.


Well said.


Sorry, I was talking specifically about C++ and friends, which is the set of languages for which NULL is considered problematic. For the languages you're talking about, NULL is much more well-behaved, so there's less reason to complain.


Thank you. I got confused on nomenclature because of a null pointer's similarity to the bottom type.

I was trying to say that having a NULL value that inhabits every type seems silly as not every set of values has the NULL value. Considering that NULL can be represented as a special case of sum types, it seems even sillier to mandate such a value on all types.

Using both NULL and nullable seems very confusing as well.


Consider Javascript, where undefined and null are both unique primitive values.


Is anyone aware of a situation where code would fault if we replaced all undefineds with nulls in JS?

General code, not things specifically testing differentiation between undefined and null.


This is clear in (Common) Lisp.

There is a null class/type whose one and only instance is the special object nil.

This nil also itself names a type which contains no instance: its domain of values is the empty set. This is the type at the bottom of the type spindle, corresponding to the NULL that you're talking about.

nil is a special self-evaluating symbol representing Boolean false, the empty list, and this bottom type.

So you see, if we just don't conflate the null object with the null set type, everything works out.


It needs to be supported at the type level, whether by null or by options, simply because “data not available” is a common value people need to use. When there’s no good way to express it, they’re forced to invent special sentinel values, and you end up in the situation where array index -1 means “value not found in the array”.
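
A small TypeScript sketch of the difference (function names are hypothetical):

    // Sentinel: callers must remember the -1 convention, and nothing checks that they do.
    function indexOfSentinel(xs: number[], x: number): number {
        for (let i = 0; i < xs.length; i++) if (xs[i] === x) return i;
        return -1;
    }

    // Absence in the return type: callers are forced to deal with it before using the index.
    function indexOfChecked(xs: number[], x: number): number | undefined {
        for (let i = 0; i < xs.length; i++) if (xs[i] === x) return i;
        return undefined;
    }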


I have worked with a MySQL database where the designer(s) decided that, on many of the tables, -1 should represent no value instead of null. As you can imagine, this has caused some problems when they've done this with columns representing dollar amounts.

This was done because of their belief that having any nulls in the table is the kiss of death for performance; they've used the phrase "tablescan" a lot. I have not been able to find a basis for these claims.


Databases, and MySQL in particular because so many of its defaults are pants-on-head nonsensical, are a haven of cargo-cult performance rituals. I have a suspicion this is because, rather than analysing their own N! loops/queries, it's easier for mediocre programmers just to blame the database.


Yes, option types are awesome. No, they are not nulls.

Algebraic data types are not direct support for "no data found" at the type level. Algebraic types are really just fancier enum/union types. It just so happens that inventing special sentinel values is awesome when you have an algebraic type system to check your work.


Take a look at Kotlin or Typescript†. Basically, they decided to fully design the language with support for null-as-option.

That means several things:

  * T (non-nullable) and T? (nullable) are different types. T? = T | null
  * Where T is expected T? is not accepted, but where T? is specified T is also accepted
  * T? is automatically cast to T in the places where it's asserted to be not null, e.g. within an if(x != null) branch
  * Method calls are not allowed on T? (unless the function specifies it can handle null)
  * There's syntax for providing a value in case of null. (x ?: fallback) in Kotlin, (x || fallback) in Typescript. 
It's a much more pleasant developer experience than the Option ADTs/enum types from Scala or Haskell, with the same amount of safety.

Typescript goes even further and has the best enumeration support I've seen any language have. T | U is a fully valid type, and if T | U is asserted to be one of them it is automatically cast to T/U. It is a very natural and efficient way of building ADTs [1]
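
A minimal TypeScript sketch of those rules, assuming --strictNullCheck is on (all identifiers here are made up):

    function shout(s: string): string { return s.toUpperCase(); }

    function greet(nickname: string | null): void {  // "T?" is just string | null
        // shout(nickname);          // error: 'string | null' is not assignable to 'string'
        if (nickname !== null) {
            shout(nickname);          // ok: narrowed to plain string inside this branch
        }
    }
    greet("Ada");                     // plain T is accepted where T | null is expected
    greet(null);

    function displayName(nickname: string | null): string {
        return nickname || "anonymous";  // fallback syntax (beware: || also swallows "")
    }

    // T | U: asserting the tag narrows the union to one arm
    type Shape = { kind: "circle"; radius: number } | { kind: "square"; side: number };
    function area(s: Shape): number {
        return s.kind === "circle" ? Math.PI * s.radius * s.radius : s.side * s.side;
    }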

† Typescript 2.0+ with --strictNullCheck on

[1] https://www.typescriptlang.org/docs/handbook/advanced-types.... , Discriminated Unions header


These are great features, but they are really just syntactic sugar over algebraic types. It's not a more pleasant developer experience than sum (Option) types, it's a more pleasant experience with sum types. Ex:

    macro_rules! unwrap {
        ($x:expr, $fallback:expr) => {
            match $x {
                Some(n) => n,
                None => $fallback,
            }
        };
    }
> Typescript goes even further and has the best enumeration support I've seen any language have. T | U is a fully valid type, and if T | U is asserted to be one of them it is automatically cast to T/U. It is a very natural and efficient way of building ADTs

That's pretty cool. It seems like refinement types. [1]

[1]: https://en.wikipedia.org/wiki/Refinement_(computing)#Refinem...


It's subtly different because of the implicit coercions and smart casts. You can always pass T to a function that takes T?, while in Haskell/Rust you would need to pass Some(t). Similarly, Kotlin does control-flow analysis and converts all T?s inside an "if (t != null) ..." block or after an "if (t == null) return" statement into a T, which dramatically reduces the amount of try!/unwrap() calls that I used to see littering early Rust code. There's better syntactic sugar for it in Rust now, but the point is that Kotlin doesn't need nearly as much syntactic sugar because nullable is integrated with the typechecker and flow analysis.


> It's not a more pleasant developer experience than sum (Option) types, it's a more pleasant experience with sum types.

Having worked with Scala, Haskell, Kotlin and Typescript, I disagree: the dev experience for Kotlin/Typescript is miles ahead for optionality. It seems like a small difference vs. something like do notation or for comprehensions, but it really adds up.

It's hard to explain until you've tried it. The most succinct way to put it would be that optional code looks and feels nearly identical to non-optional code, instead of unwrap/do-notation, which makes a very large syntactical difference.

Take a look at a piece of async Kotlin co-routine code v.s. Java/Scala (Completable)Future, same effect.


If you like those features, I think you would like Crystal, which has much of the same. Full union types, flow typing and type inference for basically everything on the stack make for a very fluid, scripting-language-like experience while keeping type safety.


> * There's syntax for providing a value in case of null. (x ?: fallback) in Kotlin, (x || fallback) in Typescript.

Well, uh, unless T can be falsy. 0 || fallback === fallback. A proper ?? operator a la C# would be great, but they've pushed back against it because it doesn't entirely mesh well with nulls in the JS world.

JS is still probably my favorite language to work in despite stuff like this. But that's one footgun that everyone should be aware of.
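
A small sketch of the footgun, with a hypothetical config example:

    function portWithOr(configured: number | null): number {
        return configured || 8080;   // a configured port of 0 silently becomes 8080
    }

    function portWithCheck(configured: number | null): number {
        return configured !== null ? configured : 8080;  // 0 stays 0; only null falls back
    }

    portWithOr(0);    // 8080
    portWithCheck(0); // 0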


A true elvis and null-safe call operator like ?., ?? and/or ?: would be an improvement, but the idea doesn't seem to be gaining traction in the Javascript world where || is "good enough"


That's a terrible approach because it's inherently noncompositional and thus breaks parametricity. You can't tell whether T? is the same type as T without knowing what T is, so if you use T? and handle null in your generic functions then they might suddenly misbehave when passed a null.

(Previously said T?? rather than T?, thanks to corrections in replies)


This is incorrect. The type system of both languages makes sure that T does not include null, so the case where you "accidentally" handle the null of T is impossible.

More generally, with unions you always have this behavior but I've never seen this be a problem. If your function accepts T | U and you pass (T | U) | U it simplifies to T | U and in my experience the code that handles U is always the correct thing to do for U.


> The type system of both languages makes sure that T does not include null, so the case where you "accidentally" handle the null of T is impossible.

So does that mean you can't call generic methods with ? types? Because there's an excluded middle here: either something like String? is a first-class type, in which case you can invoke a <T> T ... method with T=String? and then any T?s inside that method have the potential to misbehave, or you can't, in which case String? becomes an awkward second-class type.


What? No. T is a subtype of T?, so I don't get where you are going.

If a generic function f accepts a covariant Dog, and Animal is a supertype of Dog, you cannot call f with an Animal because it requires a Dog. If f accepts a covariant Animal, you can call it with a Dog because Dog is a subtype of Animal.

Now replace Dog with T and Animal with T? and you can see it is perfectly fine.


We're talking about generics. <T> and the like. I can have a Map<String, Dog>, I can have a Map<String, Animal>, and these are different things but they both work fine because both Dog and Animal are first-class types.

Can I have a Map<String, Dog?> ? If no, then Dog? isn't a first-class type. If yes, we have all sorts of nasty surprises, because code written in terms of Map<String, T> will assume that if map.get(someKey) is null then that means someKey isn't in the map, and this code will work fine until someone uses a ? type for T and then break horribly.
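
The collision is easy to reproduce with TypeScript's built-in Map, where undefined plays the role of the null being discussed here (a sketch):

    const m = new Map<string, string | undefined>();
    m.set("a", undefined);
    m.get("a"); // undefined: key present, stored value is "no value"
    m.get("b"); // undefined: key absent; generic code can't tell these apart without m.has()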


> Can I have a Map<String, Dog?>

yes

> because code written in terms of Map<String, T> will assume that if map.get(someKey) is null that means someKey isn't in the map

And it can safely assume that. Map<String, T?> isn't a subtype of Map<String, T>, so passing Map<String, T?> to a place requiring Map<String, T> will not compile.

T is a subtype of T?, not the other way around. Map is defined as Map<K, out V>, meaning V is covariant.

So you can pass a Map<String, Dog> where a Map<String, Dog?> is required, but not a Map<String, Dog?> where a Map<String, Dog> is required.


> You can't tell whether T?? is the same type as T?

Huh? T?? would always be the same type as T?. `T | NULL | NULL == T | NULL`.


I think you have too many ? in there; you can't tell if T? is the same type as T unless you know if T is U? for some U. (This is true of untagged unions more generally, ? is just shorthand for | null.)

But T?? is always identical to T?.


They’re not direct support, but option types only work in languages with the right syntax and tooling to make them work. C union types, for example, aren’t really capable of providing a null replacement simply because they’re so complex to use.


> They’re not direct support, but option types only work in languages with the right syntax and tooling to make them work.

I strongly disagree.

You don't need pattern matching syntax to use option types effectively. All you need are the right methods - map, flatMap, getOrElse etc etc. Any language with inheritance and dynamic dispatch can implement a good option type.

Even in languages with pattern matching, I never reach for it when dealing with options.
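
A minimal TypeScript sketch of that, using nothing but inheritance and dynamic dispatch (class and method names are made up):

    abstract class Opt<T> {
        abstract map<U>(f: (t: T) => U): Opt<U>;
        abstract getOrElse(fallback: T): T;
    }

    class Some<T> extends Opt<T> {
        constructor(private readonly value: T) { super(); }
        map<U>(f: (t: T) => U): Opt<U> { return new Some(f(this.value)); }
        getOrElse(_fallback: T): T { return this.value; }
    }

    class None<T> extends Opt<T> {
        map<U>(_f: (t: T) => U): Opt<U> { return new None<U>(); }
        getOrElse(fallback: T): T { return fallback; }
    }

    new Some(2).map(n => n + 1).getOrElse(0);        // 3
    new None<number>().map(n => n + 1).getOrElse(0); // 0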


> Yes, option types are awesome. No, they are not nulls.

No, but they serve a superset of the semantic functions of nulls, better than nulls do.


All this is true. I immediately took it to mean the Tony Hoare invention of null: that of the zero memory pointer. Other types of null usually have other names, with the notable exception being databases. Also, Go uses nil for its null, which puzzled me for a while until I realized its behaviour is different; you can work with nils up to a point, e.g. with len() or append().


NULL should mean an intended lack of data. Undefined should be unintended.


You know who works on a platform with NULL but doesn't have quite so many problems with it? DBAs.

There's some need to draw a distinction between the basic idea of NULL, and the way that NULL has been implemented in most high-level programming languages.

In most RDBMSes, values can't be null unless you say they are. Sometimes explicitly, as in table definitions, sometimes implicitly, when you select a JOIN type. Either way, though, the fact that the developer is in control of when it can and cannot happen means that it always has a knowable meaning. (Or should, anyway.)

The problem with many programming languages is, you're given it whether you want it or not. In a low-level language like C, that's reasonable, because it takes a sensible approach to how it works: Only pointers can be null, and all pointers are nullable for obvious (especially in the 1970s) reasons.

More generally, I'm not going to fault languages from that era for trying it out, because this stuff was new, and things were still being felt out. So I don't really fault Tony Hoare for giving null references a try in ALGOL W.

What seems much more bothersome is high level languages like Java and C# cargo culting this behavior. They could have followed the lead from languages like SQL and let the programmer be in control. They should have. They already throw exceptions when a memory allocation fails, and they allow inline variable initialization, and declaring variables at the point of usage, and composite data types have constructors, so they lack all of (early) C's reasons why ubiquitous nulls were a good idea. They could have, I think quite easily, made nullability optional. At which point it'd have basically the same semantics as optional types from functional programming, so I doubt we'd be worrying about it anymore.

But they didn't.


NULL in SQL really isn't great. For one, nullable table columns are a bad default, and you have to explicitly write out "NOT NULL" to avoid this behavior. I'd say that 90% of the time I want not-null table columns, and only 10% of the time do I want a nullable column.

Secondly, NULL has weird arithmetic. It turns out that NULL=NULL is false, and NULL<>NULL is also false. (This is unlike C/Java/Python/etc. by the way.)

Thirdly, even if you design all your tables to have NOT NULL on all columns, your queries can still synthesize NULL values in the results. For example, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN (but not INNER JOIN). Or computing max(column) on a table with zero rows.


I can get behind your first statement. Having NULLable as a default on columns is "probably" a bad idea.

I'm not so sure I can agree with the other two. NULL<>NULL (and NULL=NULL) both return false for a very simple reason: truly missing data _can't_ be equal to anything, including missing data... Because it's missing. You cannot say with certainty that value1 and value2 are or are not equal to each other.

For the third point... What should max(column) return when there's no data? You're telling the engine "give me the maximum value of something that doesn't exist". That is, in my experience, "missing data."


For example, if it were the case that NULL = NULL, really counterintuitive stuff would happen on joins because a null cell would match with every other null cell you are joining on:

        person
   name      home_address
   ---------------------------
   "Alice"   NULL
   "Bob"     "123 Jump Street"

                letter
   return_address     description
   ----------------------------------
   NULL               "Ransom Letter"
   NULL               "Spy Document"
   "123 Jump Street"  "Hello, from Bob"
   
Then

    SELECT name, description FROM person INNER JOIN letter ON home_address = return_address
would return

    name     description
    ---------------------------
    "Alice"  "Ransom Letter"
    "Alice"  "Spy Document"
    "Bob"    "Hello, from Bob"
So now Alice is associated with a bunch of letters she didn't necessarily write because she doesn't have a home address.


I find NULL to be incredibly useful. I do agree that it has bad defaults in SQL, and its equality is annoying, similar to NaN in other languages.


> NULL in SQL really isn't great. For one, nullable table columns is a bad default, and you have to explicitly write out "NOT NULL" to avoid this behavior.

This is not true on many dbms. It's an implementation choice.


NULL in SQL is a notorious source of errors and confusion (particularly when it comes to e.g. tri-state boolean logic). It certainly can come from nowhere and surprise you - if anything the behaviour is even worse than in Java or C#. So I don't think there's anything to learn from there. (Rather what modern languages should have done - and increasingly do - is follow ML practice and avoid null entirely, implementing option types as ordinary library types where the programmer explicitly wants to represent absence).


That's not really NULL's fault that it causes confusion in SQL. That's just ternary logic. People who don't handle NULLs in SQL aren't really mishandling the NULL. NULL is just a value. They're simply failing to understand the Boolean value of UNKNOWN and what that means. They're so used to thinking only in bivalent logic that the additional complexity throws them off.

However, "It's more complex for me to think about," or, "I don't understand the convention," or even, "It's easy to forget the convention," are not a very convincing arguments. It's similar to arguments about little endian vs big endian. Yes, big endian is how we write our positional numbers, but little endian makes casts a noop. Or arguments about zero-based array indexing. These concepts aren't difficult. They're just more complex. Negative numbers aren't difficult, but they're more complicated than just cardinal numbers. Fractions and decimals aren't difficult, but they're more complex than integers. Multiplication and division aren't difficult, but they're more complex than addition and subtraction.


>That's not really NULL's fault that it causes confusion in SQL. That's just ternary logic

The problem is that it's not simply ternary logic. It's a ternary logic that gets mapped onto a Boolean algebra, which leads to the usual strange repercussions (particularly, the presence of nulls creates both false positives and false negatives, silently).

The SQL language goes out of its way to pretend it's not ternary, though in fact it is. You have to actively keep in mind when writing SQL that the database is trying to trick you. This is not a good thing, and it's hard to blame the programmer when they get tricked.


> Or arguments about zero-based array indexing. These concepts aren't difficult. They're just more complex. Negative numbers aren't difficult, but they're more complicated than just cardinal numbers. Fractions and decimals aren't difficult, but they're more complex than integers. Multiplication and division aren't difficult, but they're more complex than addition and subtraction.

We usually consider it a good thing when programming languages let you opt out of the complex thing. In a good language, you can do integer arithmetic if you don't want to deal with fractions or decimals. You can do cardinal arithmetic if you don't want to deal with negative numbers. You can do ordinary Boolean logic if you don't want to do ternary logic.

The problem with SQL isn't that it has NULL. It's that it's too hard to not have NULL. Which is the problem with null in general.


Yeah, but saying "I want to use SQL" and "I don't want to use NULL or ternary logic" is a bit like saying "I want to use the existing datetime types" and "I want all years to have 10 months, all months to have 30 days, etc." Or like saying, "I want everything to use integers" and "I need fractional components." Your requirements break the abstraction not because the system is constrained, but because you're breaking the conceptual model that's the foundation of what you're trying to use. It's not a language problem. It's not a data problem. It's not a computing problem at all. It's applying the wrong conceptual model to meet your needs. That isn't a problem with the conceptual model, either, since plenty of people use it very successfully.


> Your requirements break the abstraction not because the system is constrained, but because you're breaking the conceptual model that's the foundation of what you're trying to use.

How so? Elsewhere in the thread it's claimed that the original relational model didn't have nulls, which is what I'd expect.


Relational algebra doesn't have nulls, but there's a difference between the mathematical theory and concepts and the reality of a relational system.

As I mention elsewhere, Codd's own list of rules for a relational database [0] explicitly require nulls (see Rule 3).

[0]: https://en.wikipedia.org/wiki/Codd%27s_12_rules


I don't see any entanglement with the rest of the rules, or with what makes a relational database a relational database. "A systemic way to represent missing and inapplicable information" may be necessary, but better alternatives to null are imaginable. A relational database without nulls sounds like an ML without exceptions: actually a pretty good idea.


I guess. My tendency is to think that it's more a problem for developers who are new to SQL, and are surprised to find out that, despite having the same name, nulls in SQL don't have the same semantics as nulls in other languages.

Once you get a handle on the semantics, though, they make a lot of sense. The trick is to understand how SQL's NULL is rooted in mathematical formalism, not the pragmatics of dealing with pointers. It has more in common with NaN in floating-point numbers. So, for SQL, "null <> null" behaves like "NaN <> NaN". For C and friends, "null == null" for the same reason that "0 == 0".


In SQL, NULL<>NULL yields false. You use IS NULL / IS NOT NULL to test for NULL values. In programming languages, NaN!=NaN yields true. You use x!=x to test for NaN values.

Saying that SQL NULL is rooted in mathematical formalism doesn't explain anything, because anything (even nullptr and NaN) can be explained in mathematical formalism. What we want is a simple semantic model that a human can understand and one that lacks nasty unintuitive surprises.
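
For the NaN half of that comparison, a quick check in a JS/TS runtime:

    const x = Number.NaN;
    x === x;          // false: NaN compares unequal to itself
    x !== x;          // true:  the classic "is this NaN?" test
    Number.isNaN(x);  // true:  the clearer modern spelling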


> In SQL, NULL<>NULL yields false.

No, NULL <> NULL yields UNKNOWN. That's why NULL <> NULL and NOT (NULL <> NULL) behave the same: they have the same value. UNKNOWN is a first-order truth value in ternary logic.

The key is that in a WHERE clause, a record is only returned if the WHERE clause evaluates to TRUE. Not TRUE or UNKNOWN. TRUE.


> In SQL, NULL<>NULL yields false.

It yields NULL, not false. So do NULL = NULL and NOT NULL.


NULL isn't a Boolean value in ternary logic, any more than 3.2 or 'Hello' or December 12, 2018, are Boolean values. The ternary truth value is UNKNOWN. UNKNOWN is related to NULL, but they don't work identically.

NULL is a value that any column data type can potentially have. NULL is what comparison and evaluation operators work with. UNKNOWN is a value of the ternary Boolean type, and that Boolean type is what the Boolean operators (AND, OR, NOT) work with, and nothing else. This Boolean type in an RDBMS is unavailable to the user and is for internal evaluation purposes only. RDBMSs that support a "bool" type are not implementing the same thing. You can never say UPDATE MyTable SET Col = Value1 AND Value2. That's not going to work. Many RDBMSs have a documentation page that explains this difference, like this one[0] from Microsoft SQL Server.

Notably, NULL + 3 and NULL * 5 are both NULL. Any mathematical operation on NULL is NULL. But UNKNOWN AND FALSE is FALSE, and UNKNOWN OR TRUE is TRUE.

[0]: https://docs.microsoft.com/en-us/sql/t-sql/language-elements...
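
To make those evaluation rules concrete, here is a sketch of the three-valued connectives written out in TypeScript (SQL itself mostly hides UNKNOWN from you):

    type TV = "TRUE" | "FALSE" | "UNKNOWN";

    const and3 = (a: TV, b: TV): TV =>
        a === "FALSE" || b === "FALSE" ? "FALSE"
        : a === "UNKNOWN" || b === "UNKNOWN" ? "UNKNOWN"
        : "TRUE";

    const or3 = (a: TV, b: TV): TV =>
        a === "TRUE" || b === "TRUE" ? "TRUE"
        : a === "UNKNOWN" || b === "UNKNOWN" ? "UNKNOWN"
        : "FALSE";

    and3("UNKNOWN", "FALSE"); // "FALSE"
    or3("UNKNOWN", "TRUE");   // "TRUE"
    // whereas NULL + 3 is NULL: arithmetic on NULL always yields NULL,
    // and a WHERE clause only keeps rows whose condition evaluates to TRUE.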


NULL is an alias for UNKNOWN on many systems (like MySQL.) Other DBs don't even have UNKNOWN.

UPDATE table set col=value1 and value2 works fine IF value1 and value2 are booleans.


That's a great example of MySQL creating a proprietary extension of ANSI SQL that does little more than deliberately mislead users.


According to https://en.wikipedia.org/wiki/Null_%28SQL%29#BOOLEAN_data_ty... NULL is the same as UNKNOWN. The standard also asserts that NULL and UNKNOWN "may be used interchangeably to mean exactly the same thing"

In 20+ years of DB work, I have NEVER seen anyone use UNKNOWN. It is always NULL. Always.


Alright, I will withdraw my criticism of MySQL on this issue.

However....

> In 20+ years of DB work, I have NEVER seen anyone use UNKNOWN.

I mean, I've already shown where Microsoft does just that [0]. Oracle pretty clearly does the same [1] [2]. People don't use it because you can almost never refer to it directly. The language intentionally hides it. About the only place I know that you can is PostgreSQL [3], which supports the "boolean_expression IS UNKNOWN" predicate.

> The standard also asserts

I assume you've got the 2003 draft standard that's around [4]. I will use that because I don't see any more recent version of 9075-2 that's freely available.

Yes, the standard does say under 4.5 Boolean types:

> This specification does not make a distinction between the null value of the boolean data type and the truth value Unknown that is the result of an SQL <predicate>, <search condition>, or <boolean value expression>; they may be used interchangeably to mean exactly the same thing.

However, that's in the context of describing the Boolean user data type, a.k.a., BOOLEAN. You can tell because 4.2 describes character strings (CHAR, VARCHAR, etc), 4.3 describes binary strings, 4.4 describes the numeric data type, 4.6 describes DATETIME, and 4.7 describes user-defined types.

The standard is not saying that UNKNOWN and NULL are the same. It's saying that the Boolean user data type can use NULL to represent UNKNOWN. It's saying that if you choose to implement a BOOLEAN user data type, you can use NULL to represent UNKNOWN. If you choose to assign a boolean expression to a column, that is. Nevertheless, an SQL <predicate>, <search condition>, or <boolean value expression> has a value of True, False, or Unknown. This is shown by looking at 6.34 <boolean value expression>:

  <truth value> ::=
      TRUE
      | FALSE
      | UNKNOWN
Or by searching section 8 and seeing that, every time they talk about one of the value expressions being the null value, the predicate "is Unknown".

[0]: https://docs.microsoft.com/en-us/sql/t-sql/language-elements...

[1]: https://docs.oracle.com/cd/B19306_01/server.102/b14200/condi...

[2]: https://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_e...

[3]: https://www.postgresql.org/docs/11/functions-comparison.html

[4]: http://www.wiscorp.com/sql_2003_standard.zip

Edit: Bit of cleanup.


Ok, I will concede you are technically correct! However, I have never seen a developer use "is unknown", even with Postgres. (I have been working with Postgres for over 15 years.) They always use "is null", which is, for all intents and purposes, the same thing from a developer perspective.


I've only seen it once that I can think of, and I don't remember where. It might've been an example when they added or explained that predicate. I recall something like (Column1 = Column2) IS NOT UNKNOWN, but I don't know why you wouldn't use Column1 IS NOT NULL AND Column2 IS NOT NULL instead. I guess it might save a bit of rewriting, but it still seems pretty narrow.

It's really not useful unless you're talking about the value of a boolean expression or the underlying concepts of SQL, and most RDBMSs don't let you manipulate that directly with DML (MySQL is the first one I've seen that lets you do it, and you just taught me that was the case). It's somewhat hidden because of that.


I thought in most (many?) languages NaN != NaN ?


My degree's in mathematics and I share your disdain for pointer bit-twiddling. I still find SQL nulls difficult to reason about or diagnose. I'm sure there are times and places when their behaviour is what you want but most of the time they're just a big extra complication that you don't want or need.


In the tables I define everything is not null with sane defaults by default.

The places I do allow null are few and far between (e.g. updated_at) and I'm struggling to think of instances I've used them as anything other than absence indicators.

In fact I don't think I ever treat it as anything other than that in code either.

Was the purpose of null ever to mean anything other than I have not been defined/set?

All my objects are statically typed, so I never run into the issue of testing whether thing.x is a thing; it's always a thing, or it's a compile error. It's either set or not set, and thanks to the database convention I only have to worry about certain values being null, and most of the time it makes sense anyway. "Is updated_at truthy" doubles as a "has it been updated" test.

Am I incorrect in this method? With this method I fail to see the big extra complication. Will switching to option types help me? I'd argue they will not. But I'm happy to be convinced. I do avoid nulls; I just haven't seen a problem with them in my own code. (Not true for others.)

Specifically, with the caching problem, provided you constrain the cache to reason about null == not set. I see no problem.

    Cache.get(K) // null
    Cache.set(K,3) // void
    Cache.get(K) // 3
    Cache.set(K, null) // deletes, void
    Cache.get(K) // null
    Cache.set(K, false)
    Cache.get(K) // false
Only certain values of mine are going to potentially be null from the database, all of which will be contained within serialised objects.

I just never see the issue the author has. The times I do see it are when people get too clever with default values.

I understand it, I just don't see it in practice. Certainly not frequently enough to make language changes.

Title should just read "Stop abusing null" because the only time I've seen it be an issue is when people are dual encoding meaning.


> In fact I don't think I ever treat it as anything other than that in code either.

> Was the purpose of null ever to mean anything other than I have not been defined/set?

Different people understand null differently (it might mean "error", "value not in map", "invalid user input", ...) and there's never been a clear consensus. If "null" only ever has one meaning anywhere in your codebase, and any third-party libraries you use only ever use it to mean the same thing, you're probably ok. But as soon as there are multiple meanings you'll have confusion and bugs.

> Specifically, with the caching problem, provided you constrain the cache to reason about null == not set. I see no problem.

If you only have the one cache, sure. As soon as you have a two-level cache you start to have problems (you can no longer cache absent results, since you're using the same representation for absence from your outer cache). Or as soon as null shows up anywhere else.

It's the same problem as stuff that relies on evaluation order, or threadlocals: it's ok most of the time, as long as you're not combining it with something else that does the same thing. The trouble is the times when these noncompositional constructs break down are when you have complex nested code - which is precisely when you most need everything to work the way you'd expect.


> Different people understand null differently (it might mean "error", "value not in map", "invalid user input", ...)

Null only has one meaning, null. That's the point.

As soon as you start applying more to it than that you get problems.

> and there's never been a clear consensus.

This is simply not true. Null is null. That is all. Period. End of story. It has never been more than that.

If you have libraries, functions or existing code that ignores this, then that's on you, the developer, to reason about.

I guess I'm taking your meaning a bit out of context. I do understand that years of common practice have resulted in much abuse, but I don't think the language designers would ever have intended a double meaning in null values.

> If you only have one cache, sure

I feel like you're missing my point. If you need to handle more meaning than null == notset/unset/absent then you need to resort to a new data type. Null only has one meaning, null. You can't get two meanings out of one.

You, as the consumer of the cache must then decide on how to represent or encode further meaning. Either using the Some<T> pattern or an empty string or something like that.

This serialisation can easily be wrapped around a base cache class that just deals with simple storage where null == not set or unset (absent value.)

But the underlying pattern shouldn't involve itself with further concerns than it needs to.

This opinion is precisely because I've seen this sort of oh I'll just add a has function, oh and then I'll add a sub par serialisation library. Ok now I need a sometimes unserialize. Oh some legacy? Ok now I need to deserialize once to one level and twice to all levels. Oh yea, now I should throw not found.

Ok now every single call to cache.get must be wrapped in try catch and we must cast some values to false and others to empty string, oh yea and you have to call has before every get even if you just want to take advantage of a dynamically typed language and test for that falsy value. Cache.get(test) && okdothing();

It's two functions set sets the thing. Get gets it. If it's not there it returns null. That is the whole contract. Why do people try to over complicate the base contract? It's just crazy over engineering.

> which is precisely when you most need everything to work the way you'd expect

Indeed. And I expect null == null.

Not null == not yet set, set but cleared, set but empty, false, error, not found, or anything else for that matter.

I can use null to represent that my cache does not have a value for that key, because that is the design I chose.


> I guess I'm taking your meaning a bit out of context, I do understand years of common practice have resulted in much abuse but I don't think the language designers would have ever denoted double meanig in null values.

The language designers didn't give any single clear meaning to null. They just put it in the language, and so different library authors (entirely understandably) used it for different things, and it's now impossible to standardise on any one universal meaning.

> I feel like you're missing my point. If you need to handle more meaning than null == notset/unset/absent then you need to resort to a new data type. Null only has one meaning, null. You can't get two meanings out of one.

Indeed, because null is a language-level special case. (Whereas using Option you wouldn't have any problem: Option is just another normal user-defined type in the language, so Option<Option<T>> works no differently from any other Option).

> You, as the consumer of the cache must then decide on how to represent or encode further meaning. Either using the Some<T> pattern or an empty string or something like that.

So you have a bunch of awkward complexity in precisely the case where you least want extra trouble. You don't know how many places the cache might assume that null values have its particular meaning, and you have no way to know whether you've got them all. The most dangerous pitfalls in programming are things that usually work.

> It's two functions set sets the thing. Get gets it. If it's not there it returns null. That is the whole contract. Why do people try to over complicate the base contract? It's just crazy over engineering.

There's nothing complicated about using option. Set sets the thing. Get gives you an option that's either some if the thing was set, none if it wasn't. Perfectly normal datatype like you'd write yourself, no special cases anywhere.

> I can use null to represent that my cache does not have a value for that key because that is the design I chose that

Only if you write all your own code and never use anyone else's libraries. And even then, you have to remember all the things you used it to mean in all the places you used it. There's only one null and there's no way to define a user-defined thing that works like null, so it begs to be abused (I'd argue to use it at all is to abuse it, given that it has no particular meaning defined in the language).
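
A sketch of why the layering matters, using the get/set contract described upthread (get returns null on a miss); the cache classes here are hypothetical:

    class NullCache<V> {
        private readonly m = new Map<string, V>();
        set(key: string, value: V): void { this.m.set(key, value); }
        get(key: string): V | null { return this.m.has(key) ? this.m.get(key)! : null; }
    }

    // The inner layer already uses null for "user has no nickname" (V = string | null),
    // so a miss and a cached "no nickname" come back identical:
    const nc = new NullCache<string | null>();
    nc.set("alice", null);
    nc.get("alice"); // null: cached "no nickname"
    nc.get("bob");   // null: never cached; indistinguishable from the line above

    type Option<T> = { some: true; value: T } | { some: false };

    class OptionCache<V> {
        private readonly m = new Map<string, V>();
        set(key: string, value: V): void { this.m.set(key, value); }
        get(key: string): Option<V> {
            return this.m.has(key) ? { some: true, value: this.m.get(key)! } : { some: false };
        }
    }

    // Option nests instead of collapsing: a miss is { some: false }, while a cached
    // "no nickname" is { some: true, value: { some: false } }.
    const oc = new OptionCache<Option<string>>();
    oc.set("alice", { some: false });
    oc.get("alice"); // { some: true, value: { some: false } }
    oc.get("bob");   // { some: false }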


Lua's version of NULL, called nil, once bit me badly due to behavior in a sqlite library. The sqlite library I was using represented SQL NULL as nil, a perfectly reasonable choice. However, in lua there is also a convention to use nil as the end of table sentinel.

This meant innocent-looking code using ipairs() would stop iterating on a row of results once it hit a nil, which could occur anywhere. It meant we were missing data (the lua code uploaded locally collected data to a server) until I figured out the cause and explicitly iterated over the expected size of the table.


Are there any dbs based on modern type theory?


> NULL is a value that is not a value. And that’s a problem.

The problem isn't NULL, it's languages not enforcing the necessary checks for the "no data" condition. Option can still be NULL ("None" in rust), wrapping NULL in a struct doesn't provide any safety. The safety of Option wrapper types is from the other language features (like rust's "match") and a stricter compiler that forces the programmer to write the NULL check.

NULL would be fine if C required you to write this:

    foo_t *maybe_get_foo(/*...*/) {
        if (/*foo_is_available*/) {
            return foo;
        } else {
           return NULL;
        }
    }

    foo_t *f = maybe_get_foo();
    if (!f) { /*...*/ }   // REQUIRED or compile error
    do_something(f->bar); // only allowed after NULL check
Obviously implementing that requirement would be difficult in C. Languages like Rust were designed with enforcement features (match + None, much stronger type/borrow checking), but let you have "a value that is not a value".


> The problem isn't NULL, it's languages not enforcing the necessary checks for the "no data" condition.

Talking about "NULL" pretty much implies that. When Tony Hoare talks about null references, it's about every reference being nullable in languages like Java or C#, not about the ability to conceptually wrap/opt non-nullable references in a nullability thingie.


Psst: I think the thingie you're referring to is called a 'monad': https://en.wikipedia.org/wiki/Monad_(functional_programming)


It isn't. There exists a common monad that solves this problem but a wrapper type like Option or Maybe need not be a monad. For example, `Nullable<T>` in C# is not a monad.


Aha! You're right. I misremembered this excellent blog series from Eric Lippert (a member of the c# design team): https://ericlippert.com/2013/02/25/monads-part-two/


No. The thingie may or may not be monadic, but I'm using thingie because it could be a "library" sum type, or it could be a magical-ish builtin, or it could be some other mechanism.


I largely agree, but that would get very tedious because there's no way to create a pointer type that is guaranteed to be non-null in C. You'd end up having to do a lot of unnecessary checks because of that.

People say optionals are the solution, but the way I see it, it's the other way around. Pointer types that allow NULL are basically optionals, and the problem is that we use them everywhere, even for things that are not optional. What we are missing are pointer types for things that are not optional.

And the pointer type with a guaranteed value needs to be as easy to use as the nullable pointer type, if not easier. Otherwise it won't always be used when it should be.


> Option can still be NULL ("None" in rust)

'None' in Rust is part of the Option enum, not an equivalent to null.

  pub enum Option<T> {
      None,
      Some(T),
  }


It can be at the machine level. If T is NonZero, then sizeof(Option<T>) == sizeof(T).


You might instead argue that the problem is allowing checks for the null condition. If you make comparing a pointer with null crash, or just remove the keyword to make its use non-idiomatic (or only allow it as special initialization syntax), people won’t casually use null to mean something. It’ll serve its role as a pointer value that crashes cleanly if you dereference it.


Not very difficult. Just provide a not_null pointer attribute, just like const. Then require that all dereferenced pointers must have the not_null attribute. Problem solved.

(Other note: C++ has a not_null pointer-like type: it's references. Unfortunately, C++ references cannot be reseated, which makes wholesale replacement of pointers not feasible. Plus, the language doesn't actually force you to check pointers before assigning to a reference.)


> Option can still be NULL ("None" in rust), wrapping NULL in a struct doesn't provide any safety. The safety of Option wrapper types is from the other language features (like rust's "match")

The advantage here (especially true in Haskell) is that you can use monadic error handling to make this far more pleasant.


That's a secondary nicety (alongside the ability to make everything "nullable" not just the references/pointers subset), the primary advantage is that:

1. it clearly separates "nullable" and "non-nullable" providing better modelling tools

2. the compiler assists/mandates null-checking, providing better type-safety


Even Rust doesn't have the strictness in your comment. It's perfectly fine by the compiler to make use of `x.unwrap()`: if x is None (or Err, in the case of Result), you'll just get a panic at runtime. The features you note are superior to C's offering, but purely optional.


There's a fundamental difference between the Rust approach of the library providing a function for opting-in to potential crashes and the C/Java approach of not distinguishing that case at all. The programmer still is forced to write a null check, it's just a check that crashes the program.


No, the problem with null is the inability to enforce, at the type level, that a particular value is not null. C simply does not have a type for "guaranteed valid pointer to x".


C allows defining new types though, so it can be done.


It can only be done on a per-codebase level, really.

By which I mean, in my highly sensitive and correct program xyz I can (say) abstract all access to a type (A struct?) through a macro which would contain null checking/assert-ing. This could work perfectly well and even be static checked for (in principle).

However, if I tried to package this type up in a library it all basically falls flat - or at least it would be very easily broken.

Doing things like this is to static analysis (/the type system) as foreplay is to sex.


Unless I'm misreading OPs statement, it can't be done by defining a new type since C (unlike, say, Ada) doesn't support subrange types.


Could you provide an example? Either source code or a link. I want to see how you might define a type for "guaranteed to not be null pointer to char".


I'm thinking of just:

    struct char_nonnull_t { char *ptr; };

    char *char_nonnull_as_pointer(struct char_nonnull_t cp) { return cp.ptr; }
With the required extra accessor functions. Of course in C, we don't have generics, data hiding or other nice features, so we have to build a wall of conventions around our types instead. Types with invariants are totally possible in C. We just don't have a way to automatically enforce them. It's C, after all.


Since every type can be NULL that’s a lot of verbosity.


Yup, this is what happens when you use Typescript with strict null checking.


Or add exceptions like C++ did,

if (foo_is_available) return foo; else throw FooNotAvailable();


Not sure why I am getting downvotes for this. The parent proposed a language feature to support an enforced code execution when a return value was not available:

    foo_t *maybe_get_foo(/*...*/) {
        if (/*foo_is_available*/) {
            return foo;
        } else {
           return NULL;
        }
    }

    foo_t *f = maybe_get_foo();
    if (!f) { /*...*/ }   // REQUIRED or compile error
    do_something(f->bar); // only allowed after NULL check
And I was pointing out that C++ exceptions do exactly that.

    foo_t *maybe_get_foo(/*...*/) {
        if (/*foo_is_available*/) {
            return foo;
        } else {
           throw FooNotAvailable();
        }
    }

    try {
        foo_t *f = maybe_get_foo();
        do_something(f->bar); // f cannot be NULL
    }
    catch (const FooNotAvailable& ex) {
        /*...*/
    }
Sorry if that wasn't the answer you wanted because C++ is not fashionable. But it is a valid option for how to avoid the use of NULL.


I've made my peace with null. Null is basically just an implicit

    assert(valid(x))
before every time you call a method on x. Similarly, I think of exceptions as explicit "crash-unless-caught" commands.

If you write your program with the "blow up early" mentality anyway, or use static checking tools and a bit of discipline, I've found that null looses its terror.


“Looses its terror” here means the opposite of “loses its terror”.


Cry havoc and let loose the null pointer exceptions


In market terms, sir, you've entered the capitulation phase haha. It's actually not correct to say that accessing null will always blow up. In embedded systems without memory protection address 0 may well contain valid data, usually a vector table. In WASM address 0 is totally valid also, if I'm not mistaken, as memory is represented as a big ol' array with an offset and checking for 0 would be too inefficient.


If it's C, it's far more terrifying than that... there is no assertion! Just a vague threat that something will go wrong if you stick a null in there, with no guarantees and no checks.


> I've made my peace with null. Null is basically just an implicit

That very much depends on the language:

1. it can be a compile-time error (Swift)

2. it can be a runtime error (Java, C#)

3. it can depend on what you're actually using (Python, Ruby, and on runtime extensions you might have loaded in the latter case)

4. it can be a no-op (Objective-C)

5. it can depend on the combination of platform, compiler and surrounding code going from a segfault to deleting your program's security and/or causality (C, C++)

6. it can depend on the implementation and exact codepaths (Go)

etc...


Sure, null can be a non-problem for the low, low price of 3 or 4 extra lines on each function. But those extra lines distort your program architecture, pulling you away (perhaps without even noticing) from short, composable functions.


Oh I'm fairly disciplined. But we use a 15 year old code base which should've been made with more discipline than it was.

Additionally the program shouldn't "blow up early" but it was in many ways coded that way.


That is a very big IF, considering how many teams are actually structured.


Isn't there an inherent need in programming to express an explicit "nothing" value? Coming from Python and JS, I never found None/null to be much of a problem. I in fact like the distinction of null and undefined in JS. Using null allows you to distinguish from the accidental undefined.


The major criticism of NULLs does not apply to dynamic languages. The problem in languages like Java is that nullability is not represented in the type. This criticism is not relevant in a language without static type checking. The criticism is also not relevant in a language like TypeScript where nulls are specified in the type.

In short, the problem is not nulls per se, the problem is static type systems which do not allow you to specify whether a certain value can be null or not.

But I hereby predict that soon people are going to declare that "nulls are bad" and eliminate them even in languages where they are totally fine.


The OP specifically calls out ways that nulls cause problems in dynamic languages. In fact, they're very similar to the problems nulls cause in statically typed languages, just ignoring some of them because danger is already priced in with a dynamic language.

Dynamically typed programs are usually informally "duck typed," since it's impossible to do something meaningful with truly arbitrary types most of the time. But just like in statically typed languages, there's always the very real possibility that your duck will instead be a null — and the bug is not the function returning a non-duck, it's you ever assuming you have something that quacks like a duck.


Well in a dynamically typed language there is also the possibility that something you expect to be an integer is in fact a string. Null or Nil or None is not different in this respect than any other type.

But of course you can make bugs involving nulls. The OP shows an example in a dynamic language where null is returned instead of the regular value if a lookup key is not found. Obviously this is ambiguous when the found value can also be null. But the problem is not the null per se, the problem is using a sentinel value to indicate a special condition when the same value is also a legitimate regular value. This is just bad API design.


The way Clojure handles nulls is ideal for a dynamically typed language. In Clojure, a nil is a perfectly valid object, and the sequence functions treat it as an empty sequence (it's the empty list equivalent). That means that most standard library functions have no problems letting nil flow through them, returning nil. It winds up being quite similar to monadic Option/Maybe, although with no guarantees.


The key word here is explicit. Explicit Maybe/Optional types from Haskell, Swift or Rust are wonderful. After my experience with Elm and Swift, the mere thought of going back to implicit nullability everywhere hurts.


No, there’s no inherent need for every variable to allow a nothing.

There is a need for a nothing value in many cases and languages without an implicit null, such as Haskell, F# and Rust, employ an ‘option’ type. It’s a dedicated type that either contains a value or nothing. It forces you to declare when you expect a potential null and to check for it.
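A minimal Rust sketch of that idea (the `find_user` function and its behaviour are made up purely for illustration):

    // Absence is encoded in the return type rather than in an implicit null.
    fn find_user(id: u32) -> Option<String> {
        if id == 42 { Some(String::from("alice")) } else { None }
    }

    fn main() {
        // The compiler forces both cases to be handled before the value is used.
        match find_user(7) {
            Some(name) => println!("found {}", name),
            None => println!("no such user"),
        }
    }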


I didn't use the word variable. I think we're talking about the same thing here.


In C++ I practically never feel like I need a null for anything but a pointer. A null string or integer makes no sense to me.


std::optional has the potential to eliminate many of the null-pointer uses.


If my pointer already supports NULL, why would I put std::optional on top of it? I was saying I don't need a null for anything but a pointer.


Mainly to avoid scenarios where a pointer is used instead of a value type only because the author wishes to express that it is optional.


Because it's a 0 overhead abstraction and guards against the case when you (or a coworker) forget to check for null. Are we seriously gonna pretend like you never caused a null pointer exception?


I don't see how it's zero overhead? And honestly while it's obviously not never, it's not exactly frequent either.


It's a zero overhead abstraction because it'll compile to the same code.


std::optional doesn't do that though, you can deref an empty optional and the result's the same as deref'ing an empty unique_ptr or a null pointer: UB.


But it makes you stop and think before doing so. That's the whole point.


It doesn't do that any more than a pointer does.


std::optional is a null-pointer use; it has the exact same semantics.


Even in dynamic languages, it's generally bad practice to have functions that return multiple possible types, and you are often better served by returning a more proper "null object" that accepts the same messages/methods as what would normally be returned. But for the most part, the problem people have with null is related to compile time checks and it not being opt in, which is unrelated to dynamic languages where every return type could contain... whatever.


I agree that returning the same type, if possible, is better, since you can model it so that one specific value (zero, empty string, etc.) expresses the "nothing" accordingly. But I have found that several other developers don't share this view and prefer functions to return false, null, etc. instead of a value in the same space as the normal return type.


> Isn't there an inherent need in programming to express an explicit "nothing" value?

In numerical calculus: yes, most certainly. What do you expect the result of log(-1) to be? The alternative is to use specially tagged particular numbers as "no-data", and pray that they do not appear naturally as the result of computations.


Technically, you could force the value to be one constrained to a valid range, rather than augmenting the domain, but this is a lot of work and maybe not worth it for practical use.


Some systems already do that. C#'s Double type includes Double.NaN, Double.NegativeInfinity, and Double.PositiveInfinity. Math.Log(-1) does indeed return NaN.


That doesn't mean numbers need to always be nullable. It means that `log` doesn't return a number, it returns a nullable number.
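In Rust terms that might look like the following hedged sketch, where `checked_ln` is a made-up name:

    // A logarithm whose return type admits that a result may not exist.
    fn checked_ln(x: f64) -> Option<f64> {
        if x > 0.0 { Some(x.ln()) } else { None }
    }

    fn main() {
        assert_eq!(checked_ln(1.0), Some(0.0));
        assert_eq!(checked_ln(-1.0), None);
        // The primitive float version propagates NaN instead.
        assert!((-1.0_f64).ln().is_nan());
    }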


>What do you expect the result of log(-1) to be?

i pi, or more generally i pi n for odd n.


Or exceptions.


Progress, it's now a solved problem in modern languages like Rust, Swift or Kotlin. See for example: https://kotlinlang.org/docs/reference/null-safety.html


I mean it's been a solved problem since the 70s, if not earlier; the problem has always been uptake. And that is not solved: e.g. C++ only recently introduced std::optional, which is not a type-safe version of a null pointer but is instead a null pointer wrapper for value types.


It is not possible to have a NULL type that works for all situations and has stable semantics.

The issue is, NULL should be a concept, not a value. I see no problem with using sentinel values, so long as they are well designed, and such good design comes with skill and experience, just as with all other aspects of architecture. The quest to have a single value that can be used for all the various possible meanings of NULL, to me, is the root of the problem.


> The quest to have a single value that can be used for all the various possible meanings of NULL, to me, is the root of the problem.

Exactly right. In particular, the conflation of nulls to indicate both error and non-error conditions (e.g. out-of-memory vs end-of-linked list) makes it impossible to distinguish errors from non-errors in many situations, and that is obviously bad.

Ideally you want nulls/sentinels that carry information about where, when, and why they were generated. You want separate nulls for numerical overflow/underflow, end-of-linked-list, out-of-memory, timeout, suppressed error/exception, unspecified/unknown value (preferably a separate one for each type) yada yada yada.


I agree with pretty much everything in the article. However, I would give Java a lower score because no one uses java.util.Optional in practice, and there are too many legacy libraries and too much application code that cannot or will not be changed. Also, the @NotNull annotation isn't in Java SE; it is made available through various third-party libraries.

A language with a null value can dramatically simplify things for a language designer, though. In the case of Java, we know that every array of objects is initialized to null references. Thereafter, we can construct and assign objects to each slot of the array. Otherwise we run into issues that C++ faces: when we construct the array, the fields of every object are uninitialized, so they are potentially dangerous if read or destructed, and need the special syntax of placement new to be initialized. The trick to avoiding null here is to avoid pre-allocating an array, and instead to grow a vector one element at a time. The C++ std::vector<E> is very accessible and performant, whereas Java's java.util.List<E> is very clunky to use compared to native arrays.

Another case that gets simplified is object construction. When the memory for an object has been allocated but before the user's constructor code has run, what values should the fields have, assuming that they are observable? In a Java constructor, all fields are initially set to null/0, then you simply assign values to fields in the body code of the constructor. In C++ constructors however, you should initialize fields in the initializer list, and then you still have the option to initialize fields in the body.

I still think pervasive null values are bad for the programmer (rather than the language designer). Now that I have preliminary experience in Rust, I see that its design is much safer and still practical, so I think this language shows the way forward.


Re: Array Initialization

One approach you can take is the Rusty "hang up a technical difficulties sign" (unsafe) while you mess around with potentially uninitialized memory, which is valid, but places the burden on you as the library writer. Another would be to initialize your array of pointers as an array of Option<Box<T>> pre-filled with None. Due to pointer alignment you can actually optimize Option<Box<T>> by turning it into a tagged pointer (which I believe is what Rust does) so that None == null at the machine level, while the language exposes a safe interface on top. [1]
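As a hedged sketch of that second approach (pre-filling with None, relying on the layout optimization described above):

    use std::mem::size_of;

    fn main() {
        // Safe pre-initialization: every slot starts out explicitly empty.
        let mut slots: [Option<Box<u32>>; 4] = [None, None, None, None];
        slots[2] = Some(Box::new(7));
        // Box<T> is never null, so Option<Box<T>> needs no separate tag:
        // None is represented by the null pointer at the machine level.
        assert_eq!(size_of::<Option<Box<u32>>>(), size_of::<Box<u32>>());
        assert!(slots[0].is_none());
        assert_eq!(slots[2].as_deref(), Some(&7));
    }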

Re: Object Construction

With object construction in Rust you can either (a) create all fields in advance and specify them at construction [best], (b) use mem::uninitialized() [bad], or (c) create a builder which has optional fields for everything and yields a constructed object via (a) later [most work].

[1] https://doc.rust-lang.org/std/option/


It appears that Java will eventually get Value types, which will allow for Optional to be defined on the stack, at least making it actually do something useful.


The fact that 'nothing' is inconvenient applies just as much to zero in mathematics. Why do we have to have a number we can't divide by? What is 0 to the power of 0? It's a special case we always need to worry about. But its inclusion in the number system is not in question.

And I remember my distress using a financial package being told that my unused zero value still MUST have a currency! My pocket is empty, how can it have a currency? If a farmer's field is empty, must I say what it is empty of: cows, sheep, aardvarks?

I think worrying about inconsistency here is worrying about the inconsistency of the world we live in. 'Nothing' is a mysterious thing we need to accept and respect.


I recall having a friendly argument with a friend who insisted that 30°C was exactly twice as hot as 15°C.


It looks as if, even for temperature, zero is a bit problematic. https://en.m.wikipedia.org/wiki/Zero-point_energy


Rich Hickey's "Maybe Not" should be watched by anyone who thinks nulls/nils/undefineds are okay. It should also be watched by anyone who thinks that Optional/Maybe/Nullable are good enough:

https://www.youtube.com/watch?v=YR5WdGrpoug


To someone who has been using Asm and C for decades, these arguments just make no sense. Reading this article reminds me of the arguments against pointers, another thing that's frequently criticised by those who don't actually understand how computers work and try to "solve" problems by merely slathering everything in thicker and thicker layers of leaky abstraction. It's not far from "goto considered harmful" either.

> any reference can be null, and calling a method on null produces a NullPointerException.

...which immediately tells you to go fix the code.

> There are many times when it doesn’t make sense to have a null. Unfortunately, if the language permits anything to be null, well, anything can be null.

That's not an argument. See above.

> 3. NULL is a special-case

...because it indicates the absence of a value, which is a special case.

> though it throws a NullPointerException when run.

...and the cause is obvious. I'm not even a regular Java user (and don't much like the language myself, but for other reasons) and I know the difference between the Boxed types and the regular ones.

> NULL is difficult to debug

Seriously? A "nullpo crash" is one of the more trivial things to debug, because it's very distinctive and makes it easy to trace the value back (0 stands out; other addresses, not so much.) What's actually hard to debug? Extraneous null checks that silently cause failures elsewhere.

The proposed "solution" is straightforward, but if you reserve the special null value to indicate absence then you can make do with just one value instead of a pair, of which half the time half of the value is completely useless. If you can check for absence/null, you will have no problems using Maybe/Optional. If you can't, Maybe/Optional won't help you anyway --- because it's ultimately the same thing, using a value without checking for its absence.


> another thing that's frequently criticised by those who don't actually understand how computers work and try to "solve" problems by merely slathering everything in thicker and thicker layers of leaky abstraction

I think that you're being quite unkind. Haskell's Maybe type and Rust's Option types are very far from "leaky abstractions" and were developed by people who definitely understand how computers work. In fact, your description of them appears to indicate that you aren't really sure how they work (None doesn't take up "half of the value") -- the point of these option types is that cases where NULL is a reasonable value are explicit and your code won't compile if you don't handle NULL cases. Allowing NULL implicitly for many (if not all) types is where problems lie.

It also appears you're arguing that languages which don't have models that are strictly identical to the von Neumann architecture are "merely slathering everything [with] leaky abstraction". Would you argue that LISPs are just leaky abstractions?


Yep! It's worth restating the fact that wrapping a pointer in an Option in Rust actually takes up NO extra space, because Rust's smart enough to just optimize it back into a nullable pointer.

https://doc.rust-lang.org/std/ptr/struct.NonNull.html
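A hedged sketch checking that claim with size assertions:

    use std::mem::size_of;
    use std::ptr::NonNull;

    fn main() {
        // Wrapping a non-nullable pointer in Option costs no extra space:
        // the otherwise-forbidden null bit pattern is reused to encode None.
        assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<*mut u8>());
        assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
    }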


Well, a better comparison would be Typescript with strict null type checking enabled. You just use algebraic data types to specify whether a value can be null.


`Option<T>` in Rust is basically identical to `T | null` in TypeScript.


If your whole world is asm and C, then I take it you don't care much about type systems. Bless your heart, you lonely programmer of ephemeral software, may you be employed gainfully fixing your own bugs for decades. The saltiness is mostly for entertainment, please don't take too much offense. For everyone else working at a level of complexity where mistakes are inevitable and costly, types are an essential bicycle for the faulty minds of programmers.

The article is not arguing that we shouldn't express or model the absence of a value. It is arguing that all types having to support a "no-value" case leads to error-prone code that has historically cost an immeasurable amount of money and much credibility and respect. If everything can be null then it takes too much effort to null check every reference, so developers routinely forget or think they know better. Instead it argues that we should model the idea of a possibly empty value as a separate, composable type. Then you can write a large percentage of your code with a guarantee that objects/types/values are never going to be nil, while still handling that possibility in the smaller error checking parts of your code base.

One interesting anecdote is that our team, working in Swift, had to integrate a crash reporting tool and verify that it works. The challenge was that we hadn't seen a runtime crash in several months in production.

> A "nullpo crash" is one of the more trivial things to debug

If it happens in your debugging environment in front of your eyes then maybe. Some of us work on software that is used by millions over decades and would never get to see any reports from a majority of crashes.


Not sure what world you are from. But here on earth apparently garbage collection and JavaScript are all the rage ;)

Issues with null don't even register in comparison.


> > There are many times when it doesn’t make sense to have a null. Unfortunately, if the language permits anything to be null, well, anything can be null.

> That's not an argument. See above.

It is actually their best point, IMO. I really like how RDBMS/SQL solve this: fields hold values and you specify beforehand whether they can hold NULLs. The author is right, sometimes it does not make sense for variables to be null-able (think ID fields or usernames) but often it does (e.g. a user's avatar). Being able to indicate that would be a nice idea. C++ for example does that, as `Field x` is not nullable but `Field* x` is.


The thing is, you're focusing on when you've detected that there is an issue (a crash). A lot of the issues with NULL are the fact that you can't easily detect it beforehand. It's not indicated in the types, or the syntax. That means that it's incredibly easy for a NULL issue to sneak into an uncommon branch or scenario, only to be hit in production.


But why was the scenario not tested before production? Should that not be the case anyway?


You can never guarantee you really have 100% test coverage in all scenarios in complex software.


Indeed. It gets asymptotically more expensive. Whereas a typechecker is a system of tests that's able to "cover" 100% of the code.


It’s cute that you think tests find all problems.


> ...which immediately tells you to go fix the code.
Assuming your code isn't deeply nested. I've seen cases where null was triggered years after code went into production. In that case you have to:

A) Assume value isn't null and have more readable code

B) Litter the code with null checks.

e.g.

    if (a.getStuff().getValue() == "TEST")
becomes

    if (a != null && a.getStuff() != null && a.getStuff().getValue() == "TEST")

The thing with Maybe/Optional is that you have to check for the presence of None, otherwise your code won't compile. Another smart way is what C# did: int can't be null, int? can be null.
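For comparison, a hedged Rust sketch of the same chained access, with hypothetical `A` and `Stuff` types standing in for the Java classes above:

    // Hypothetical types mirroring the Java example.
    struct Stuff { value: Option<String> }
    struct A { stuff: Option<Stuff> }

    fn is_test(a: Option<&A>) -> bool {
        // Each step that can be absent is an Option; the chain short-circuits
        // on None instead of blowing up, and treating an Option as the value
        // directly won't compile.
        a.and_then(|obj| obj.stuff.as_ref())
            .and_then(|s| s.value.as_deref())
            == Some("TEST")
    }

    fn main() {
        assert!(!is_test(None));
        let a = A { stuff: Some(Stuff { value: Some(String::from("TEST")) }) };
        assert!(is_test(Some(&a)));
    }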


But expanded null checks could be automated by the compiler if so desired right? Without having to change the nature of null into an optional.

@MAYBE if(a.getStuff().getValue() == "TEST")


You could check every value for null, sure. But a) why would you want to? (and wouldn't it be bad for performance) and b) how would you handle it? Knowing that a value somewhere in your program was null doesn't really help you any.


> ...which immediately tells you to go fix the code.

But which code? The point where you observe the error could be many compilation units away from the code that's broken; it might be in a separate project, or even 3rd-party code.

> ...because it indicates the absence of a value, which is a special case.

Why does it need to be a special case? Is your language incapable of modelling something as simple as "maybe the presence of this kind of value, or maybe absence" with plain old ordinary, userspace values?

> Seriously? A "nullpo crash" is one of the more trivial things to debug, because it's very distinctive and makes it easy to trace the value back (0 stands out; other addresses, not so much.) What's actually hard to debug?

"Tracing the value back" is decidedly nontrivial. And totally unnecessary if you just don't allow yourself to create that kind of value in the first place.

> if you reserve the special null value to indicate absence then you can make do with just one value instead of a pair, of which half the time half of the value is completely useless.

What do you mean? If you're talking semantically, you want absence to be a different kind of thing from a value: it should be treated differently. If you're talking about runtime representation, you can pack an Option into the same space as a known-nonzero type if you want to (Rust does this), but that's an implementation detail.

(Confusing sum types with some kind of pair seems to be a common problem for programmers who haven't used anything but C; sum types are a different kind of thing and it's well worth understanding them in their own right).

> If you can check for absence/null, you will have no problems using Maybe/Optional. If you can't, Maybe/Optional won't help you anyway --- because it's ultimately the same thing, using a value without checking for its absence.

Nonsense on multiple levels. Maybes deliberately don't provide an (idiomatic) way to use them without checking. By having a Maybe type for values that can legitimately be absent, you don't have to permit values that can't be absent to be absent, and therefore you don't have to check most values - rather you handle absence of values that can be absent (the very notion of "checking" comes from a C-oriented way of thinking and isn't the idiomatic way to use maybes/options) and don't need to consider absence for things that can't be absent.


The whole point of using type systems is to prevent human errors; a "poka-yoke" for programming.

The great advantage of Maybe/Optional systems is that only some of your references have to use them. You can draw a clear boundary between the parts of the code that have to check everything, and those that can prove it's already been checked.
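A hedged Rust sketch of that boundary (all names invented): absence is handled once at the edge, and everything inside works with a plain, never-null value.

    // Inside the boundary: this function cannot receive "nothing",
    // so it never needs a null check.
    fn greet(name: &str) -> String {
        format!("hello, {}", name)
    }

    // At the boundary: absence is handled exactly once, here.
    fn handle_request(name: Option<&str>) -> String {
        match name {
            Some(n) => greet(n),
            None => String::from("hello, anonymous"),
        }
    }

    fn main() {
        assert_eq!(handle_request(Some("ada")), "hello, ada");
        assert_eq!(handle_request(None), "hello, anonymous");
    }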

In assembler we have no real type annotations, but for a long time I've considered trying to design a type-checking structure-orientated assembler.


> ...because it indicates the absence of a value, which is a special case.

But that's exactly the problem -- it's special, meaning it's only useful for certain situations. An ideal type system would provide compile-time guarantees, rather than having to wait for users to report issues. A type system which A) allows you to define variables as non-nullable and B) requires a null guard before every dereference of nullable variables eliminates this entire class of bug. What on earth is wrong with that?

EDIT:

> Seriously? A "nullpo crash" is one of the more trivial things to debug

Even if this were true [0], wouldn't it be easier if such a crash was just never even possible?

[0] which it's not - all a nullpo stack trace tells you is that something important didn't happen, at some point before this


In general, it's difficult to make the case that runtime errors are preferable over compile-time errors, unless the difficulty to enable compile-time errors is significant. In the case of Optional<> as a language/DB type, I can't imagine much effort involved, except for uptake.

Especially given that it can trivially be optimized out, since the eventual assembly should very well make use of the 0x0 property. But if you can encode the guarantee in your "high-level" language, why would you not want it?

In fact, I'm not sure how anyone could imagine assert(n != null) scattered throughout the codebase is a pleasant situation, unless of course, as most do, you're skipping the safety check for unsafe reasons.


Completely agree, this is CS theory gone off the deep end...


A simple, safe way of tracking nulls like option is "CS theory going off the deep end?"

Implicitly allowing all code to return nothing, and manually trying to remember what can return null and what can't, and checking that value, is incredibly error prone. It's really crazy that this has been the dominant way of handling the problem for many decades when there is a dead-simple way of ensuring it can't happen.

Null pointer errors, contrary to many claims, show up in production code all the time. Eliminating them is of huge value.


The NULL pointer errors you're referring to are in most cases resource issues, i.e. malloc returning NULL.

This is not the source of the vast majority of pointer errors.

Checking for (and trapping) NULL pointer dereferences is trivial, what is more difficult is the rest of the pointer range that doesn't get checked but is equally invalid, i.e. the other 4-billion (32-bit) possibilities.

Non-NULL-pointer checks are much more important than NULL checks.

The world of pointer issues is very much greater than "ASSERT(ptr!=NULL)".

...and as for correct error-recovery (not error-detection), well, don't get me started.


> This is not the source of the vast majority of pointer errors.

> Checking for (and trapping) NULL pointer dereferences is trivial, what is more difficult is the rest of the pointer range that doesn't get checked but is equally invalid, i.e. the other 4-billion (32-bit) possibilities.

I think we write vastly different types of software. I can assure you that null-related errors are extremely common in situations besides resource issues. If it were just a resource-related problem, garbage-collected languages would almost never have issues, yet Java is infamous for NPEs. In Scala, where Options are ubiquitous, I've literally never had a single NPE.

It is very common for libraries to return null just to represent the absence of a result (ex: a row returned from a SQL query has no value for a column). That sort of thing means you have NPEs wholly unrelated to malloc or anything similar. These nulls are expected under normal program operation. They aren't errors. So, it's crazy not to let the type system assist you in checking for nulls, so you don't forget and wind up with an NPE.


Two things are getting conflated here.

Pointer issues (that I was referring to) and a failure indication.

The most trivial pointer issue is a NULL pointer. This is such a trivial issue to catch that it's hardly even an error, yet people use that case as the exemplar for NULL issues.

Detecting (and handling) failures on the other hand is very much different and more in the spirit of what the option-type arguments are about. In that case, the difficulty is not in detecting the error (that option-types will help with) but in the application-level recovery. That is nothing that the language can aid you with; it's system-design and architecture related.

Basically, it's the wrong issue to be thinking about.


> The most trivial pointer issue is a NULL pointer. This is such a trivial issue to catch that it's hardly even an error, yet people use that case as the exemplar for NULL issues.

How can you claim that NPEs are "hardly ever an error." NPEs are the most common error there is! They are indeed easy to catch, but you need to do so nearly everywhere, obscuring the code and introducing potential for error. There is no real, conceptual difference between something like a malloc returning null or a database query result containing a null. It is the same thing.

A null absolutely is an error if you don't catch it. By not using Options, it's vastly easier for that to happen.


"hardly even an error"

not

"hardly ever an error".

in other words, NULL pointer errors are a trivial error to deal with.


They're even easier to deal with if your type system guarantees you can never get them in the first place.


But yet it is the most common type of bug, even in production. Clearly, it's not that easy to deal with for human programmers. What's the problem with letting the compiler help you?


Just look at the example on Wikipedia[0]. Tagged unions are super-simple, and make NULL completely unnecessary. NULL causes lots of headaches, while tagged unions have never caused anyone headaches, so removing NULL is kind of the obvious thing to do. In a sufficiently advanced language, such as Rust, they get optimized to equivalent code anyway, so there isn't even any performance loss.

[0]: https://en.wikipedia.org/wiki/Tagged_union#Examples
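A hedged Rust sketch in the spirit of those examples, using a linked list since that is the classic "pointer or NULL" case:

    // A tagged union: the "which case is this?" tag and the payload travel
    // together, and the compiler insists that every case is handled.
    enum List {
        Cons(i32, Box<List>),
        Nil,
    }

    fn sum(list: &List) -> i32 {
        match list {
            List::Cons(head, tail) => head + sum(tail),
            List::Nil => 0,
        }
    }

    fn main() {
        let list = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
        assert_eq!(sum(&list), 3);
    }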


Cannot understand this position... a non-nullable pointer is pretty much just a normal pointer but the compiler checks if you have tested for null. In Rust (and I believe Swift) optional references also have the same size as a normal pointer. In some versions of non-nullables you only need checks for dereferencing and not for other handling.


I might have confused optionality and non-nullability...


Calling this "CS Theory" is intellectual dishonesty.


Option<&T>, in Rust, will compile down to a C pointer to T, with Nothing represented by the null pointer.


Also, when you are aiming for full test coverage, null dereferences will be caught during testing.


Nobody does full path coverage, not even NASA.


SQLite is 100% branch test covered

https://www.sqlite.org/testing.html


Full path coverage is much more difficult than 100% branch coverage. It's next to impossible in any non trivial codebase that wasn't designed specifically for formal verification.


At least C# has the syntactic sugar to easily check for null references, which lets you avoid the horrors of code like `if (s != null && s.length)`. Instead you can type `s?.length`. Never have I appreciated syntactic sugar as much.


A lot of higher languages use the concept of truthiness such that null, or 0-length both evaluate to false:

    if (s) then do stuff with s;


Other candidates:

- null terminated strings

- machine dependent integer widths


I came here to write that ASCIIZ is a more costly design decision than null, one that not only led to crashing programs, but also to security vulnerabilities, sloppy APIs that cannot handle "binary data", and subpar performance.


Null-terminated strings are useful for buffering values of unknown length (e.g. copying from a network stream). Length-prefixed strings have issues of their own (e.g. what size length prefix to use, efficient encoding of variable-length integer prefixes, what happens if a length-prefixed string's calculated end-position is outside the process' memory space?)


> Null-terminated strings are useful for buffering values of unknown length (e.g. copying from a network stream).

That doesn't seem like a good example. The data arriving over the network arrives length-prefixed, because 0 is a legal byte value for arbitrary data. What do you then gain by throwing away your existing knowledge of the length?


> null terminated strings

This is mentioned.

> machine dependent integer widths

What exactly do you dislike about this?


It's an area of the language which can produce "works on my machine" bugs?

It becomes less important as we all converge on 64-bit machines with 32-bit "int" types, but it's still a monumental pain point for portability.


> What exactly do you dislike about this?

Or what do you even do about it?

A handful of solutions already exist: - Use a higher-level language - Java


> Or what do you even do about it?

All you need is a list of what width each type is.

> A handful of solutions already exist: - Use a higher-level language - Java

It doesn't require being "higher level". If anything it pushes code to a slightly lower level.


> All you need is a list of what width each type is.

This is exactly the kind of minutiae that GP was bemoaning.

> If anything it pushes code to a slightly lower level

Yeah by way of higher-level abstractions ...


> This is exactly the kind of minutiae that GP was bemoaning.

How is this “minutiae”? You should always know the possible range of a numeric variable or field when you create it, so why not just write what size it is? In Rust the main numeric types look like this: i32, u64, u8. You just pick the one you want.


I'm sorry, I misunderstood you.

These typedefs as I'm used to them do address cross-platform issues.

Storage classes are "minutiae" however when all you want is just a straight up number.

Python gives me an Integer type when I want a whole number, or a Float when I want to represent partials.

I don't really care to be honest how that gets represented in memory in this case.


A float in Python is an implementation-specific size, just like C. So I'm really confused about using it as an example here.

> Storage classes are "minutiae" however when all you want is just a straight up number.

You can have a single "straight up number" and mention the bit width in the language spec. The mere act of writing it down doesn't force coders to deal with any more minutiae than they already had to deal with.

> Yeah by way of higher-level abstractions ...

I strongly object to this. "float is at least x bits" and "float is exactly x bits" are the same level of abstraction, and almost every language, high or low level, picks one of those options.


You can strongly object all you want, but when I'm writing code in python, or any other high-level language I don't care one jot about storage size.


You could apply that same "who cares?" attitude to the size of "double" in C. Whether you burden yourself with that knowledge is not a feature of the language. More "C coders" care because they're micro-optimizing, but it's no more needed in C than Python.

Also you named Java as being on the easy side and that has four different integer sizes...


No ... not really.

Double doesn't behave like a whole number.

java only has a single int type, which is 32-bit regardless of machine architecture.


> Double doesn't behave like a whole number.

I was suggesting double for your partials, not your whole numbers.

> java only has a single int type, which is 32-bit regardless of machine architecture.

I'm so confused.

You said having a "list of what width each type is" is bad because it forces the user to deal with "minutiae".

But that's exactly what Java does. int is 32 bits, short is 16, long is 64

And then you praise a type in Python that does the same thing as "double" in C. It's usually 64 bits, but it might be something else.


the main numeric type is usize


That’s not correct, it’s i32. usize has a specific purpose: when you need a length of something in memory.


The entire idea of a "main numeric type" is sort of silly - know what data you're dealing with and pick the length accordingly!


Yeah, I only say "main" here because it's what you should pick if you're not sure.


Nulls in strongly typed languages can get rather weird but from a C/C++ perspective it is the same as 0. nullptr is just a correctly casted 0.


That's not correct.

In C, the literal "0" is a null pointer constant handled at the compilation stage, but casting a runtime zero is not specified to yield a null pointer (and the address zero can be perfectly valid and usable), nor are null pointers specified to be zero-valued (quite the opposite).


I disagree. nullptr is directly convertible to a boolean 0:

    #include <stdio.h>

    int main(int argc, char *argv[]) {
        printf("%d\n", nullptr == 0);
        return 0;
    }


1. You're comparing a null pointer literal with a null pointer constant. You're literally comparing two things guaranteed to be equal. How, exactly, is it surprising that they do? Did you even attempt to understand what I wrote?

2. The specific implementation you have on hand could use zero-valued null pointers, I'm telling you what the standard doesn't say, confirmed by the C FAQ:

http://c-faq.com/null/varieties.html

http://c-faq.com/null/confusion4.html

http://c-faq.com/null/machexamp.html

3. The null pointer is not "directly convertible to a boolean 0", it's the literal 0 which expresses a null pointer: http://c-faq.com/null/ptrtest.html


I see what you are saying. I mean zero as a null constant. I've never even seen anyone try to assign a null value with some integer zero at some memory address. Using the NULL constant in C/C++ is near effortless and 0 just works no matter what the underlying implementation is doing. So for all intents and purposes nullptr is just a cast NULL (0) constant.


This is only true in C, which is part of why C is less insane than C++.

The two languages should not be conflated anymore.


This is not even true in C. 0 is a "null pointer literal" when used in pointer context, this does not imply that the actual null pointer has a value of zero.


Ah yes, you are right; an interesting point, however:

    6.3.2.3 Pointers
    
       3. An integer constant expression with the value 0, or such an expression cast to type void *,
          is called a null pointer constant. If a null pointer constant is converted to a pointer type,
          the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer
          to any object or function.
    
       4. Conversion of a null pointer to another pointer type yields a null pointer of that type.
          Any two null pointers shall compare equal. 
Looking at the definition of NULL:

    7.17 Common definitions <stddef.h>
    
        3 The macros are
        
                NULL
       
          which expands to an implementation-defined null pointer constant;
Meaning that NULL is implementation defined, but is assured by the standard to compare equal to any 0-valued pointer of any type, even if it is itself another special value.


Saying “equal to any 0-valued pointer” there can be misleading. It is true of a pointer assigned from a (foo*)0 constant, but not true of a pointer to a hardware address 0 or a pointer with bits all 0 (assuming they exist).


Actually nullptr is a struct.



nullptr is actually "a prvalue of type std::nullptr_t" or something like that. Since C++11 NULL is the same as nullptr.


And it is directly convertible to a bool. The primary benefit seems to be with overloading, where a NULL could trigger an integer overload instead of a pointer overload.


Yes, that's the example given here: https://en.cppreference.com/w/cpp/language/nullptr So it's an improvement but not exactly a game changer.


To the small extent to which "since C++11" is even relevant to "from a C/C++ perspective", even in C++11 NULL is still allowed to simply be defined as 0.


NULL in 'relational' databases in particular is a disaster. Or at least according to the notorious Fabian Pascal.

http://www.dbdebunk.com/2017/04/null-value-is-contradiction-...

Codd never proposed it in his original relational model. For good reason.


Nonsense.

First of all, Codd's early work didn't propose it because Codd's propositions were idealized and rooted in mathematics and not software engineering. In Codd's world, you would never create a FirstName/GivenName and a LastName/Surname field. You'd just create a Name field. The fact that you might want to sort by last name or the fact that someone named "Alice Taylor Smith" has a surname of "Smith" while "Robert Taylor Smith" has a surname of "Taylor Smith" isn't relevant. In Codd's world, you'd just define Name as a Name type, and the Name type itself would know everything about the entire domain of names. It would be about as complicated as working with a DATETIME. The same would be true for an Address type. Such a type would support both Western and Japanese addresses, which are wildly different. Codd doesn't need NULL because he has complex types which perfectly and comprehensively represent their domains. That's not realistic. Parsing Names and Addresses for even a single culture is notoriously difficult.

Second of all, Codd's later major work, The Relational Model for Database Management, includes his Twelve Rules (numbered 0 to 12) for database design in order for a DBMS to be considered relational[0]. Rule number 3 is:

> Rule 3: Systematic treatment of null values:

> Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type.

So Codd clearly thought that nulls were essential.

[0]: https://en.wikipedia.org/wiki/Codd%27s_12_rules


Disagree that it is a disaster. Nullability is explicit in the column type, so it doesn't have the "billion dollar mistake".

Furthermore you need to represent missing values somehow if you perform a left join.


I think it would be nice if `NOT NULL` was set by default on columns. However there are a lot of legitimate use cases that can't be (practically) solved by restructuring.

Data can be incomplete. Maybe only because it is not (yet) known. If NULL values were impossible it would create the need for one additional table with a foreign key relationship for every attribute that can be independently NULL. Sometimes this pattern is a good idea, but I don't think I can be convinced it should be the only possible solution.


> I think it would be nice if `NOT NULL` was set by default on columns

Then make sure to have explicit defaults?


"Explicit" defaults are still quite implicit. Usually what I want is an error if I fail to provide a required value, not any kind of silent default, null or otherwise.


null is often used in relational databases to be semantically equivalent to "info missing". I get that this use of null isn't ideal.

However there are quite a lot of cases where it is semantically equivalent to "not applicable"; while some of these cases can be avoided with a restructure, some can't, and in either case I don't see a compelling argument to do so.

e.g. a parent_id column that is null for top-level objects in a hierarchy. Restructuring the database to avoid this seems like moving to an unintuitive paradigm in pursuit of rule-following for its own sake.
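The same shape carries over directly to languages with explicit option types; a hedged Rust sketch with an invented `Category` type:

    // "No parent" is spelled out in the type instead of hiding behind a null.
    struct Category {
        id: u64,
        parent_id: Option<u64>, // the NULL-able column, made explicit
    }

    fn main() {
        let root = Category { id: 1, parent_id: None };
        let child = Category { id: 2, parent_id: Some(root.id) };
        assert!(root.parent_id.is_none());
        assert_eq!(child.parent_id, Some(root.id));
        assert_eq!(child.id, 2);
    }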


How do you propose we represent the absence of data? At least in SQL "logically missing data" and "missing data in memory" are properly divorced, unlike many languages in which the latter is abused and conflated with the former.


I'm afraid current relational databases have little in common with Codd's ideas.


Julia Missing Values

"Julia provides support for representing missing values in the statistical sense, that is for situations where no value is available for a variable in an observation, but a valid value theoretically exists. Missing values are represented via the missing object, which is the singleton instance of the type Missing. missing is equivalent to NULL in SQL and NA in R, and behaves like them in most situations."

https://docs.julialang.org/en/v1/manual/missing/index.html

+

"First-Class Statistical Missing Values Support in Julia 0.7"

https://julialang.org/blog/2018/06/missing


The nice thing about Julia is that it separates `nothing` from `missing`. nothing<:Nothing is a null that does not propagate, i.e. `1+nothing` is an error. It's an engineering null, providing a type of null where you don't want to silently continue when doing something bad. On the other hand, missing propagates, so `1+missing` outputs a missing. This is a Data Scientist's null, where you want to calculate as much as possible and see what results are directly known given the data you have. The two are different concepts and when conflated it makes computing more difficult. By separating the two, Julia handles both domains quite elegantly.


NULL is certainly a mistake, but even more of a mistake is not allowing distinct states in variables.

NULL is just another case of a state of a variable. Other states are 1, 15, 0xffffffff, etc.

That mainstream languages don't handle this is the worst mistake of the computer industry.


Most languages do. Java will always initialise a non-initialised value to NULL for instance (EDIT or 0 or false for primitives).

It's simply a reality of how computers operate that when you allocate a piece of memory (a variable) it will have something in it that you'll need to clear or initialise.

In this respect, NULL is doing you a favour.


Only for field variables. Local variables most definitely have to be initialised by the programmer, otherwise it's a compiler error.


I've been using Kotlin and Swift. They've partly removed null with the 'maybe' feature.

So instead of calling methods on null objects the methods are just not called if the object is null.

This helps when there's a race condition, and you attempt to call a method on a null object, and then that solves itself by the same code being called again without the race condition.

But a lot of the time if the object is null and the method is not called you still have an error, but it's just not a null pointer error now.

This 'nullless' code is nice in some places, especially with UI lifecycles calling code repeatedly, but other times it just changes the type of error you debug.


Actually in Kotlin, nullability is part of the type system and denoted with a ?. So String and String? are two different types. Dereferencing a nullable type is a compile error until you do the null check. Doing that triggers a smart cast to the non-null type, so the inferred type becomes non-nullable and you don't have to do any casts. You can force this cast by using the operator !!, which you should avoid for obvious reasons but which is useful with some legacy code. If you get this wrong you still get an NPE.

It also provides backward compatibility with Java, where Java types are considered nullable by default unless otherwise annotated with a @NonNull. Also you get nice warnings about redundant null checks.

This provides for a lot of extra compile time safety and it largely removes the need for Maybe, Optional, and other kludges that people have been coming up with to force programmers to replace null checks with empty checks.


Nah. There will always be missing values, no matter how many layers of safety measures we wrap around the fact. Hitting a NULL in C is very unforgiving; but that's just the spirit of C, there are plenty of ways to provide a less bumpy ride in higher level languages.

My own baby, Snigl [0], uses the type system to trap runaway missing values without wrapping. Which means that you get an error as soon as you pass a NULL, rather than way down the call stack when it's used.

https://gitlab.com/sifoo/snigl#types


I don't write business software that needs many 9s of uptime so I find null to be fine. Yes, returning null is kind of throwing your hands up but I find that to be kind of the point. It allows my software to fail fast if it does and makes it incredibly obvious where things are going wrong. Generally if a reference has a value of null where it shouldn't, I can pinpoint the location of the bug within a few minutes or even seconds.

IMO it makes the program much easier to reason about compared to returning some sort of empty value and then failing much much later in the program.


> it means that C-strings cannot be used for ASCII or extended ASCII. Instead, they can only be used for the unusual ASCIIZ.

This is a very pedantic quibble, and I'm not even sure it's correct. ASCII has NUL as well, and ASCIIZ isn't a character set AFAIK.

> C++ NULL boost::optional, from Boost.Optional

First of all, nullptr, second, std::optional.

> Objective C nil, Nil, NULL, NSNull Maybe, from SVMaybe

Nil is not a thing in Objective-C, to my knowledge.

> Swift Optional

You're looking for nil. So it should be four stars?

> Swift’s UnsafePointer must be used with unsafeUnwrap or !

You're confusing Optional and UnsafePointer.


What the comment about ASCII means is that NUL is a valid character in an ASCII string, but it can't be represented in C's null-terminated string encoding as the format (sometimes called ASCIIZ, but yeah: not an encoding... but I mean, come on... the article is clear here) terminates at the first NUL.

Also, Nil is absolutely a thing in Objective-C: it is a null pointer of type Class (whereas nil is a null pointer of type id; you should avoid mixing them up, though I will admit nothing much bad will happen as Class and id are generally co-polymorphic due to the type system being kind of lame. I am not sure they always have to be, though).

(And as someone who has been programming in C++ since before it was standardized at all, I frankly think listing NULL and boost::optional is totally acceptable and complaining about it as if C++11 is more canonical is just being annoying.)

Doing a quick search for how nil works in Swift, it apparently isn't a null pointer, so you are wrong there as well :(.

> nil means "no value" but is completely distinct in every other sense from Objective-C's nil.

> It is assignable only to optional variables. It works with both literals and structs (i.e. it works with stack-based items, not just heap-based items).

> Non-optional variables cannot be assigned nil even if they're classes (i.e. they live on the heap).

> So it's explicitly not a NULL pointer and not similar to one. It shares the name because it is intended to be used for the same semantic reason.

Given that I don't think any of the rest of your comment was legitimate criticism, I am frankly betting that your comment about UnsafePointer is also not useful, but I am kind of tired of having to analyze this comment at this point (I stepped in due to the note about character sets and the floor kept sinking).


> What the comment about ASCII means is that NUL is a valid character in an ASCII string, but it can't be represented in C's null-terminated string encoding as the format (sometimes called ASCIIZ, but yeah: not an encoding... but I mean, come on... the article is clear here) terminates at the first NUL.

I think ASCIIZ is the more common format, so I replied with all that in response to ASCIIZ being called "unusual". Most popular languages that actually allow NUL bytes in strings usually tend to support some encoding of Unicode anyways…

> Also, Nil is absolutely a thing in Objective-C: it is a null pointer of type Class

You got me…in my defense, I didn't know about this to my knowledge. It was still stupid of me to assume that DuckDuckGo would be case-sensitive when I searched that. I guess I should use this for Classes now instead of nil.

> I frankly think listing NULL and boost::optional is totally acceptable and complaining about it as if C++11 is more canonical is just being annoying.

One ships with C++, one doesn't; that's like saying Joda-Time is the canonical date library for Java instead of java.time. Although, I should probably ask you if you consider Joda-Time to be "more canonical" before listing it as an example…

> Doing a quick search for how nil works in Swift, it apparently isn't a null pointer, so you are wrong there as well :(.

> I am frankly betting that your comment about UnsafePointer is also not useful

I should have been more explicit, since these kinds of things run together when you bring pointers into the mix, which I guess I should have realized once I read the footnote. I'm not really satisfied with the explanation given in the article, nor with your rebuttal of my argument. Swift's nil is overloaded in a sense: for native Swift structures, it's the whole "Optional-as-an-enum" abstraction that we know about. For class types, and pointers, it's a bit more complicated: you just cannot assign nil to a UnsafePointer or a SomeClass unless it's "Optional", but the "Optional-ness" is completely in the type system and under the hood; in order to facilitate interoperability with C, Objective-C et al., you need the type to actually have the size of sizeof(void *), hold zeroes in it, etc. You cannot actually set either of these types to nil if they are non-Optional unless you do illegal things. So when you set a "pointer" (being an Optional<SomeClass>, UnsafePointer?) here to nil, you are literally shoving a nil into it, which also happens to work well with Swift's type system and Optional.none abstraction because there is no way to subvert it legally. All of this was basically a long-winded way of saying that yes, Swift's nil is actually NULL, but the type system makes sure that you don't get a "bare NULL", which lets you pretend like the Optional enumeration abstraction works but under the hood, and semantically, it's the same thing as NULL.


> ASCII has NUL as well

Yes, this is why C-strings can't be used for ASCII, because C-strings can't contain NUL as a character.
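A hedged Rust illustration of the contrast between a counted string and a NUL-terminated one:

    use std::ffi::CString;

    fn main() {
        // A length-counted string may contain NUL like any other byte...
        let s = "ab\0cd";
        assert_eq!(s.len(), 5);
        // ...but a NUL-terminated C string cannot: the conversion refuses it.
        assert!(CString::new(s).is_err());
    }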

> You're looking for nil. So it should be four stars?

nil isn't a null by the definition this article is using, because it's not a subtype of every reference type.


This tidbit gets a ton of mileage but I think it's overrated. There are a lot of unsafe shortcuts we take to get better ergonomics and NULL is one of them.

I think it's a bit unlikely we'll fully get rid of null, but we can get rid of some of the pitfalls. TypeScript for example pretty much fixes the problem, by enforcing you check for null when needed, though TypeScript takes a handful of other soundness shortcuts. Go makes null less harmful by treating nil pointers like empty values by convention.


It's the worst mistake because it made you believe that its atrocious ergonomics are actually superior to more sensible solutions. Implicit nullability doesn't really save you any null checks. It just makes it possible to forget necessary checks.

It was fine to design a language with nullable pointers in the 70s. It's unacceptable nowadays. nil in Go is a major mistake.


Okay. So let's say we get rid of nil in Go. Now, structs with pointers have no zero value. Slices and maps have no zero value. Funcs have no zero value. Reflect can no longer create objects because it can't possibly enforce that you initialize the pointers. Functions that return either an error or a value now need a new pattern, probably requiring generics or another special type. Map access needs to return this special type.

Did we win? Did that make Go better? Fuck no. Most people aren't frequently hitting nil pointer errors in Go because unlike C the behavior is a lot more reasonable and the conventions a lot simpler. And by the way, we didn't fix all the runtime errors. Nil pointers are just one possible runtime error. How about out of bounds array access, memory exhaustion, race conditions?

And yeah, I get that you can also fix all of those things, which is then called Rust. But we don't need another Rust, Rust is a fine Rust. Go has, imo, much better ergonomics and most of the time it's just fine for what I'm doing. Like, writing small to medium size servers and utilities in Go has rarely been a regretful experience. And, even if we had no runtime errors we would still need unit testing to ensure our components are functioning correctly. So, most of the time I'm aware of when my code has runtime errors anyways.

Getting rid of null is not magic. It does not get rid of all runtime errors. And yes, it does impact ergonomics. I will take Go zero values at the cost of nil pointers, every day.


> Now, structs with pointers have no zero value.

A zero value is much better than undefined value, I'll grant you that. I prefer the forced initialization approach (Haskell, presumably Rust and many others). If I add a new field, I want to know where I need to populate it. Or if you must, maybe a default value defined on the struct (perhaps that's also "considered harmful" for reasons I can't think of at the moment).

But it seems you prefer the ergonomics of default-zero. I don't get it, but I can't argue with preference.


Easy: default zero is simple. It's predictable behavior. It's consistent.

By convention, you should design your code to also treat zero values as empty. In Go, the zero value of bytes.Buffer is a ready to use, empty buffer.
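
For example, a minimal sketch of what a useful zero value buys you:

    package main

    import (
        "bytes"
        "fmt"
    )

    func main() {
        var buf bytes.Buffer      // the zero value, no constructor needed
        buf.WriteString("hello ") // immediately usable as an empty buffer
        buf.WriteString("world")
        fmt.Println(buf.String()) // hello world
    }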

If you drop default zero, you lose a lot of convenience and gain a lot of ceremony. It's not the end of the world, but neither is the null pointer error. It's just another runtime error. Just like divide by zero.


> Implicit nullability doesn't really save you any null checks.

It sort of does, because explicit nullability forces you to do many redundant null checks when you actually know that something could not possibly be null.


If you know something can't possibly be null, then store it as a non-nullable type as soon as you know that.


> as soon

that would need dynamically typed data...


I totally disagree - NULL gives much worse ergonomics all-around. While modifying code, I'm constantly afraid of whether the value I'm accessing could be NULL. Most SQL schemas are filled with "NOT NULL" to the point of ridiculousness, and most Java methods that I've seen tend to have @NotNull used everywhere too. Not having NULL gives you a lot of confidence when reading and writing code, by guaranteeing that your object does indeed exist.


I would recommend looking at Haskell's Maybe and Rust's Option type to get a better idea of how this can be solved -- and why this article isn't really overrated (just commonly misunderstood).

They allow for explicit NULL-ness (which is a necessary concept) without falling into the trap of making everything implicitly possibly NULL. And when NULL-ness is explicit you are then forced to explicitly handle it in order for your program to compile (in the case of Haskell and Rust).


I dunno why everyone's assuming I don't know about the common solutions to the problem. TypeScript does explicit nullness without needing monads, and I actually mentioned that one. I still maintain that the problem is overrated.

Go's idea of nil, for example, seems OK to me, and the language would need to be way more complex to fix it. For example, it would need a type system with explicit nullness, or maybe even actual generics. But it mostly doesn't matter because doing things with nil doesn't crash nearly as much in Go. Like a nil slice just acts like an empty slice. You can even append to nil and it returns a non nil slice. You can call methods on nil. Etc.
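
Concretely, a quick sketch of those behaviors:

    package main

    import "fmt"

    func main() {
        var s []string            // a nil slice
        fmt.Println(len(s))       // 0 -- len on nil is fine
        s = append(s, "x")        // appending to nil hands back a real slice
        fmt.Println(s)            // [x]

        var m map[string]int      // a nil map
        fmt.Println(m["missing"]) // 0 -- reading from a nil map is fine
        // m["k"] = 1             // writing to a nil map is the case that panics
    }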

The trouble with getting rid of nil to me is that it requires you to either have values at all times, or deal with the possibility that you don't at all times. Go has the very very nice property that you can initialize any type to a zero value and it should work as an "empty" object. Pointers without nullability don't have a zero value. Fixing nulls at the cost of getting rid of Go's properties for zero values would not be worth it.


Don't get me wrong, Go does nil a lot better than some other languages (being able to call methods on a nil is sometimes a good thing depending on how your methods handle it -- most methods don't handle it well at all). The fact that even most map operations (access and deletion) also "just work" is really useful.
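
For concreteness, "handling it well" looks something like this (a made-up type, just to illustrate the nil-receiver check):

    package main

    import "fmt"

    type list struct{ items []string }

    // Len is written to tolerate a nil receiver, so a nil *list
    // simply behaves like an empty list.
    func (l *list) Len() int {
        if l == nil {
            return 0
        }
        return len(l.items)
    }

    func main() {
        var l *list          // nil pointer
        fmt.Println(l.Len()) // 0, no panic
    }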

But I think you're over-selling the zero values feature of Go. It is very rare to see third-party libraries whose zero values do anything but cause an NPE when you try to use them -- mainly because they embed pointers, and then you have the same implicit NULL-ness that causes NPEs everywhere. It is great that the core language managed to get zero values right in most cases, but it's far from being as widespread as you might hope.

Also (nil interface != nil pointer that fulfills interface) is a very common mistake I see in Go code, and while it's not necessarily related to the existence of a nil value it is still related to the general concept of nil in Go.
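
The classic form of that mistake, for reference:

    package main

    import "fmt"

    type myError struct{}

    func (*myError) Error() string { return "boom" }

    // fail returns a typed nil pointer inside the error interface.
    func fail() error {
        var e *myError // nil pointer
        return e       // the interface now holds (type *myError, value nil)
    }

    func main() {
        err := fail()
        fmt.Println(err == nil) // false: a non-nil interface wrapping a nil pointer
    }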

[And on the TypeScript comment -- you don't really need monads for Option<> or Maybe types. You just need algebraic types -- and TypeScript has those. Haskell does use Monads for Maybe, but that's because Haskell has many other type-theory things that make it necessary to support using Maybe as a monad.]


I'm not overselling zero values in Go. Simply try to envision the cascading consequences on the language if you removed the zero values; no existing Go code would work, and I think the language would need to shift so much that even hello world couldn't be automatically translated to such a language. All to prevent a single type of runtime error among many, one that Go developers are not complaining about the way that Java, JavaScript, C developers have.


Nobody is arguing that Go should have algebraic types and ditch zero values, so I don't know why you're harping on this point. Now -- it would be somewhat nice because errors would be much more reasonable to handle (the new "check" proposal is okay but still quite flawed) but you're right that it would either be far too complicated or old code wouldn't work anymore. Go has already made its bed when it comes to nil values, but that doesn't mean that all new languages should follow suit -- because Go's nil handling isn't all sunshine and roses (nil interfaces -- for obvious reasons -- cause NPEs).

As I've said, Go does nil basically as well as you can without having algebraic types. But given the semi-anecdotal evidence that I've definitely seen my fair share of NPEs in production Go code in the past 5 years, it's clear that it's not sufficient.


I believe I've had a single NPE in production Go in three years. It's a blip on the radar. If you are testing your code, how do you even get an NPE in production? Seriously, it should be rare. I've actually had more trouble with channels than pointers in Go.

Also, if the situation is really so bad.... How come no one cares? I saw no Go 2 proposal to fix this situation, just generics, less boilerplate for error handling. But nothing about nil pointers. It's not at the top of anyone's list. Generally, people feel comfortable with nil in Go in ways they don't in JS and C.

I will take that as fact even though you certainly disagree. So why do people feel more comfortable with nil in Go? Because nil is not an error.

In C and JS, there are cases where null is treated as an error. For example, getElementById returns null when it finds nothing. If you designed this API in Go, you'd instead return nil PLUS an error (or perhaps a bool, if no other possible errors exist). You can argue semantics, but in Go it's generally held that if you aren't returning what the user wanted, you should return an error. Exceptions to that exist, but they're probably mostly just string manipulation functions.
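
In sketch form, with hypothetical names (the API shape is the point):

    package main

    import (
        "errors"
        "fmt"
    )

    type Element struct{ ID string }

    var ErrNotFound = errors.New("element not found")

    // A made-up, Go-flavoured getElementById: nil is never handed back alone,
    // it always travels with an explicit error (a bool would also do).
    func getElementByID(id string) (*Element, error) {
        if id != "root" {
            return nil, ErrNotFound
        }
        return &Element{ID: id}, nil
    }

    func main() {
        el, err := getElementByID("nope")
        if err != nil {
            fmt.Println("no element:", err)
            return
        }
        fmt.Println(el.ID)
    }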

This convention is so strong, though, that it nearly eliminates nil pointer errors caused by edge cases. Most nil pointer errors you DO hit in Go are:

1. Set on nil map
2. Send on nil channel
3. Failing to check error

The thing is, if you are writing production code, and writing tests for your production code, this shouldn't even make it to your code repository. It's virtually a non issue.

Even better; in C and C++, when you hit a null pointer, you don't get an NPE. You get a segfault. Everything dies. Kaput. Go is obviously not alone in not having this issue, but it's surely worth noting that it doesn't.

Go is not going to die some day because it's "not safe enough" - it's absolutely safe enough to write reliable code. The difference is, writing reliable code in Go is easy because for the most part, all you have to do is follow conventions and unit test. Same cannot be said about C and JS where you will inevitably get blindsided by sharp edges.

If you are in a situation where you can't have runtime errors, it's a poor fit. I don't personally know any developers who are in that situation. If your code doesn't put lives on the line, it's OK to have a runtime error. Most of your real outages will be due to other bugs, and probably more of them will be due to other people's bugs, natural disasters, human operator error, bad configuration pushes, etc.

If you really think Go NPEs are even a significant portion of Go reliability issues I'm gonna need more evidence.


> If you really think Go NPEs are even a significant portion of Go reliability issues I'm gonna need more evidence.

Don't get me wrong, there are many other Go reliability issues. I just don't agree with hand-waving it away by saying that "you should have tested for it". I'm very in favour of testing (umoci is arguably the most rigorously-tested container image format tool, and it's written in Go) but I think that tests shouldn't be used as the solution for safety problems. The logical conclusion of such a view is that any language can be reasonably safe if you have enough tests -- and while this is true (just look at SQLite) it's hardly practical to replicate the degree of testing that SQLite does for every C project. In order for everyone to benefit, safety needs to be built in (and Go does have a lot of safety built in).

> Go is not going to die some day because it's "not safe enough"

You keep refuting claims I never made...


The very first post I made in this thread was that I believe that this claim, that null is the biggest mistake in computer science, is way overblown.

To this point, I bring up Go because Go is an example of a language where NPEs are basically a non-issue. There's not much to say there, if we still disagree on this point I'm not getting anywhere and I'm just going to give up.

Following that logic, though, it feels inappropriate that null pointers have this ridiculous stigma compared to other runtime errors. Would anyone care about Rust if its only promise were to get rid of null pointer errors? I'd argue that if Rust still had NPEs but effectively solved concurrency issues, it would be exactly as popular today.

I'm not claiming there's no value in alternatives to null. I am absolutely disputing the idea that null is the biggest mistake in computer science. Full stop, absolutely unconvinced. I can think of a lot of things I'd consider much worse.

I still prefer more runtime safety over less, but there is a balance to be had too.


But nobody's talking about "getting rid of nil" wholesale. They're talking about getting rid of null/nil reference exceptions via the mechanisms of non-nullability (having the option to define some variables as non-nullable) and null guards (a compiler which forces a null check before any operation which requires a non-null value). This way you still have null, but with compile-time guarantees that you'll never get null when you didn't expect it.


That's the same as the TypeScript approach, but it's never going to happen in Go because of the relatively small gain for a massive jump in complexity in the compiler.


Author forgot about Erlang, and its wonderful lack of NULL


So how does NaN in Python and NA in R relate to NULL? I know Python has the None type, but it's not the same as NaN. One of the most annoying things in Numpy is that there is no way to indicate that an integer value is "missing", similar to NaN for floats. In R both integers and strings can be NA (if I remember correctly). So for numeric types at least, there is definitely the need to somehow indicate that a value is "missing".


Numpy has masked arrays [1]. Though I can't say how well they work.

[1] https://docs.scipy.org/doc/numpy/reference/maskedarray.gener...


NaN is not the same as None/null, and using it that way is asking for bugs.

None/null is for missing/unknown values (which may be a reasonable thing in your domain), while NaN is for the result of an illegal/undefined operation (which is definitely a bug, such as 0.0/0.0).


Linux does well with some macros: IS_ERR, IS_ERR_VALUE, PTR_ERR, ERR_PTR, PTR_ERR_OR_ZERO, ERR_CAST, IS_ERR_OR_NULL

https://elixir.bootlin.com/linux/latest/source/include/linux...

No, it doesn't totally replace NULL, but it does solve some of the problems in a high-performance way.


Honestly, I feel that the problem isn't null but that type systems (at least earlier on) tended to allow other types to be null, willy-nilly. Null is best considered a separate type to non-null values, and is basically not a problem if the type system handles that in some way. Be it option or union types - both solve it and it mostly stops being an issue.


Checking if the optional is present is very similar to checking for NULL values. Now if you have a nice match statement like Rust's, and lambda functions for streams, that may make things a bit more readable.

You will still need analysis tools to check that all code paths check for None before accessing that value.


Do people generally agree that Java's Optional == Scala's Option / Haskell's Maybe?

Java's Optional seems fundamentally flawed in that Java allows any reference to hold a null value, so the Optional reference itself can be null and calling isPresent on it can still throw an NPE -- it still gives people a footgun.



Yes, I was going to mention that video, in which Rich makes an important point in my opinion: database tables, or objects, or structs, still live within the Place Oriented Programming (PLOP) mindset. That mindset was born in a time when disk and RAM were the expensive resources, so update-in-place was the default. I insist on the "in-place": you need to know where something is so you can update it. The downside of PLOP is that if you have no value for one of the slots in your generic form (be it a table, object or struct), then what can you put there?

The alternative is to use data shapes that do not require something to be in a certain place. Hence the use of maps as the most basic data shape: you either have an entry in it, or you don't; there is no need for a null entry. Expanding that thinking to databases, you realise the table is not the right aggregate; instead you need to go one level down, to something that Datomic calls a datom, or RDF calls a fact.

To summarise, PLOP forms that package together a set of slots to be filled magnify the issue of NULL/null/nil. Instead, make the slot your primary unit of composition, and aggregate slots in a way that doesn't force you to fill a slot with a null value when there is no value in the first place.


Yes, 'Maybe Not' is very relevant to this discussion, but few people seem to agree with my understanding of what he says about the right solution:

Optionality doesn't fit in the type system / schema, because it's context dependent. For some functions, one subset of the data is needed, for others a different subset. Trying to mash it into the type system / schema is just fundamentally misguided.


Yes, he's rather explicit in saying Maybe is a poor tool. I'll have to watch the talk a second time to be sure, but I'm not sure he proposes any solution at the level of type systems. Simply not using Maybe, or using a union instead, is not what he is advocating. For him (and me too) types are the wrong thing to put data in because, among other things, they force you back into PLOP. His point is to remove entirely the need to fill slots with nothing. Obviously the talk is more about specs than types. While tactfully avoiding the debate around types, he still starts the talk with types to help those that are only there to decomplect their thinking.


That's exactly what he says. People might disagree that this is the right approach, but I'm not sure what other ways anyone could interpret that talk.


> Java's Optional seems fundamentally flawed in that Java allows any reference to hold a null value and Optional can still throw a NPE when calling isPresent on it so it still gives people a footgun.

The existence of Optional makes it possible to migrate away from using "null". E.g. Map#get could be replaced with a method that returns Optional.

Calling isPresent is the wrong way to use an Optional. But yes, it's an ordinary Java value, it can be null as long as Java-the-language permits null. That's not a problem with Optional, it's a problem with Java, and introducing Optional as a plain old normal Java class is the first step to fixing it.


Conflating NULL with NUL terminated strings seems a stretch... both have problems, but separate problems. I suppose they're both related in that they provide a "special" value rather than separating that information though.


NULL is a convenient way to map singularities in your model of the problem.

I have on a few occasions tried to write NULL-less code, and it adds a good bit of work.

- model all possible states for a value

- determine appropriate default actions for all types

- meaningful place-holder values

It's a good exercise, and I think more code should be written this way, but as an engineer I'm trying to model just enough of the problem to solve it. I'm not trying to simulate every possible outcome in that domain.

Certain corners of your problem simply don't need to be modeled, and what's more the effort needed to model them can just be too much.

NULL is a great way to just throw up your hands and go "I don't know and I don't care". Much as when modelling a physical system singularities typically represent phenomena that the model doesn't take account of, so it goes with NULL. It simply says "Don't Go There".


Have you ever dealt with the Maybe(Haskell)/Option(F#) types? If not, then you don't understand what's wrong with NULL and how to easily avoid it without much work.


I find Maybe a bad idea. It forces me to write denormalized code when I know that something is not NULL. It's not possible to specify this knowledge as a data structure since data structures are static but context is dynamic. I much prefer the simple NULL sentinel that blows up like an assertion when I made a mistake. That said, there's not very often a need for NULL at all if you structure the code correctly.


If you know something can't be null, then don't use an option. Simple as that. For example, a SQL library can return a non-nullable column of String as just a String, not an Option[String]. Thus, you actually get a solid distinction that you don't get with null pointers.

There's no reason to include sentinels that will randomly blow up your program.


No. The point is that the data structure can't know if there's a NULL since the data structure is static. Context is dynamic. Code is dynamic as well, and it can know that some things must exist based on other dynamic conditions.

So this "solid" distinction often is just noise and actually blurs the intention of the programmer: An explicit unwrap is required syntactically while it should not be required semantically because really the option data is not an option but a requirement in certain contexts.


If it is a requirement for something not to be null, unwrap the option before you send it to the part of the program that can't accept nulls, and deal with the case of None in a sane way and in a predetermined place. Then you don't have to worry about unwrapping in the rest of the code. You can escape from Option; it's not like IO. You just have to check for None if you want to get something out, as you should.

In this fashion, you have type safety everywhere, and you deal with the case of a missing value in a predictable way, in a single spot.
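
The same discipline works even in a language with plain nullable pointers -- a rough Go sketch (invented names): check for absence once at the boundary, then hand the rest of the program a value it can trust:

    package main

    import "fmt"

    type Config struct{ Addr string }

    // loadConfig may legitimately have nothing to return (a missing config file, say).
    func loadConfig() *Config {
        return nil
    }

    // run only ever sees a real Config value; the "is it there?" question
    // has already been answered at the boundary.
    func run(cfg Config) {
        fmt.Println("listening on", cfg.Addr)
    }

    func main() {
        cfg := loadConfig()
        if cfg == nil { // the single, predetermined place where absence is handled
            cfg = &Config{Addr: ":8080"}
        }
        run(*cfg)
    }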


> I much prefer the simple NULL sentinel that blows up like an assertion when I made a mistake.

Haskell, for instance, has the 'fromJust :: Maybe a -> a' function that allows you to do just that. It unpacks the Maybe-typed value and throws a runtime error if it fails.


Yes. Most Haskellers will sneer at it, while personally I think it's the right thing to do because it conveys the programmer's ideas about invariants. But syntactically an explicit unwrapping function is still a lot of noise. Simple null pointers as we have in C, with an unmapped segment at address zero so that it throws a segmentation fault, are much better.


Replying to your lower comment (the coffee has kicked in):

The situation you describe is one where a null really is an unrecoverable error, and the program should terminate. That is the one case where it makes sense to just let a NPE happen.

However, the vast majority of time, a null is just an absence of value, and does not signify an unrecoverable error. Those are the kind of situations that an Option/Maybe helps with, since it doesn't let you forget to handle the null case.

Even if a null value returned from a function is abnormal, and the program shouldn't continue, an Option is still going to be better most of the time. After all, you probably have connections and stuff you want to cleanly terminate before shutting the program down.


I haven't drank all my coffee yet this morning, but are you saying that throwing a segfault can be a good thing?

Either you unwrap the Option, or you have to remember to do a manual null check. The second option is more verbose.


> but are you saying that throwing a segfault can be a good thing?

Sure, what's bad about it? A logic bug was detected, so the program should be terminated. Or how do you intend to continue?

Segfault is not so different from what happens if you do "fromJust Nothing" in Haskell or get a NullPointerException in Java. You can even write a handler for the segfault, but I guess that's rarely a good idea.


> Sure, what's bad about it? A logic bug was detected, so the program should be terminated. Or how do you intend to continue?

I intend to not have the logic bug in the first place, by encoding my invariants in the type system.

If you "know" that the value is present rather than absent, you must have a reason for knowing it, so explain that reason to the compiler. E.g. maybe you took the first element of a list value that you know is non-empty - so maybe you need to change the type of that value to a non-empty list type. That way the compiler can check your reasoning for you, and will catch the cases where you thought you "knew" but were actually wrong.


> by encoding my invariants in the type system

The way I program, that is nothing but a pipe dream.

> If you "know" that the value is present rather than absent, you must have a reason for knowing it, so explain that reason to the compiler.

I might know that it exists for example because it is computed in a post-processing step after a first stage but before a second stage. So it exists in the second stage but not in the first. Relying on global data (which I won't give up) makes it practically impossible to encode that the data is not there in the first stage.

And that's not a problem at all. I simply don't access that data table in the first stage... Trying to explain my processing strategy to a compiler would amount to headaches and no benefits.


> I might know that it exists for example because it is computed in a post-processing step after a first stage but before a second stage. So it exists in the second stage but not in the first.

So the first stage could create a handle to it, or even just a phantom "witness" that you treat as proof that the value is present.
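
A minimal sketch of the "handle" option in Go-ish terms (invented names):

    package main

    import "fmt"

    // Stage1Result is the "handle": the data the first stage produced.
    // Because Stage2 demands one in its signature, the dependency is visible
    // and checked by the compiler instead of living only in my head.
    type Stage1Result struct{ table map[string]int }

    func Stage1() Stage1Result {
        return Stage1Result{table: map[string]int{"answer": 42}}
    }

    func Stage2(r Stage1Result) {
        fmt.Println(r.table["answer"])
    }

    func main() {
        r := Stage1()
        Stage2(r) // Stage2 can't meaningfully run without Stage1's output in hand
    }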

> And that's not a problem at all. I simply don't access that data table in the first stage... Trying to explain my processing strategy to a compiler would amount to headaches and no benefits.

Shrug. I found that errors would make it into production, because human vigilance is always fallible. And the level of testing that I needed to adopt to catch errors was a lot more effort than using a type system.


Accessing unallocated global data is the kind of error you typically hit on the first test run. Another example would be function pointers loaded from DLLs.

I don't think type systems help all that much. Type + instead of -, and you're out of luck.


> Accesses to unallocated global data is the type of errors that you typically hit on the first test run.

Depends what conditions cause it; the hard part is being sure that every possible code path through the first stage will initialise the data, even the rare ones like cases where some things time out but not others.

> I don't think type systems help all that much. Type + instead of -, and you're out of luck.

Not my experience at all - what do you mean? If you declare a type as covariant instead of contravariant or vice versa, you'll almost certainly get errors when you come to use it.


1) Pretty easy to guarantee if main looks like stage1(); stage2(); stage3(); etc.

2) Change a plus for a minus and it is still an int.


> 1) Pretty easy to guarantee if main looks like stage1(); stage2(); stage3(); etc.

You can only use the global program order once, I'd rather save it to spend on something more important. val result1 = stage1(); val result2 = stage2(result1); ... means my code dependencies reflect my data dependencies and I'll never get confused about what needs what or comes before or after what (or the compiler will catch me if I do), so I can refactor fearlessly.

> 2) Change a plus for a minus and it is still an int.

True. If you get your core business logic wrong then types won't help you with that (though FWIW I'd argue that it's worth having a distinct type for natural numbers, in which case - and + return different types). But I found that at least 80% of production bugs weren't errors in core business logic but rather "silly little things": nulls, uncaught exceptions, transposed fields... and types catch those more cheaply than any alternative I've seen.


> I much prefer the simple NULL sentinel that blows up like an assertion when I made a mistake

Are you nuts? I prefer the compiler gives me an error instead of blowing up in production.


I see these as a more sophisticated way of dealing with NULL. They allow me to define alternative default behaviour beyond just throwing an undeclared exception.

They are still NULLs however under the hood and I still need to do the work of defining what I want to happen when they occur. It's just neater.


They're not null. You might use them to represent the same thing that you use null to represent, but the type system won't let you use them in expressions that aren't explicitly built to accept them.


You can conceptualize the Maybe based on the NULL, but there's no point in the compilation or runtime in which they actually become NULLs, it's just a regular container value.


Which is the same as if I go and ensure a default "noop" value is assigned ... it's logically a NULL. I don't know anything about it except that it hasn't been assigned.


It's not a NULL. A NULL is a value that is considered by the type system to be a valid instance of a given type, except it doesn't actually fulfill the type's contract. A Maybe is a completely different type, much like a list or a map or a tree.


Well at this stage you're just being narrow minded.

I think if you read back over this thread it should be clear what I mean.

I proposed that NULL has a purpose. You proposed that Option obviates this. I stress that it's just a neater way to manage the conditions you don't need to model. You point out that in terms of implementation it's different, where I explain that still, logically it's the same thing.

Of course NULL has a very specific meaning in the structure of the language, and when you start using things like Options it makes managing NULL easier, but it's a rose by another name, gift-wrapped, and bundled with some plant food.

Logically however, at the point where you're modelling your problem it's the same.


> I have on a few occasions tried to write NULL-less code, and it adds a good bit of work.

Really depends on the language. On Java / C#, it's hard to avoid null-checking a lot of stuff. In C++ you will never encounter a null std::string outside of code written by indian students learning on Turbo C++ 3.5. So you don't need to check for anything: if it's a value, it exists.


Strings are an easy case. Default initialisation to "" (empty string). You'll still encounter NULL when modelling other domains however.


From my practical experience, Null has a use, but is over-used or misused. If you concatenate a Null string to other non-Null strings, the Null should usually be treated like a zero-length string instead of nullifying the ENTIRE expression result. I know this differs from how numbers typically are treated, but so be it. Strings are not numbers.

Without that behavior, one often has to write verbose statements such as: denull(stringA,"") || denull(stringB,"") || denull(stringC,"") || denull(stringD,"") etc. ("denull" name varies per vendor. "||" is concatenate here.)


The problem is in tooling. If all compilers/builders out there could detect null for us, those kinds of errors could be taken care of with much more ease.


TypeScript can be easily configured to do this[0], and Kotlin always does this[1]. The future is now!

[0]: https://www.typescriptlang.org/docs/handbook/release-notes/t... [1]: https://kotlinlang.org/docs/reference/null-safety.html


In VS with ReSharper I get "Possible null exception" as a warning, but a null exception is a runtime thing and not compile time, so I don't see how it is a tooling problem. The compiler does not know that your user did not fill in some field in a form.


A lot of static analysis tools will do this.

I can't speak for other IDEs, but Intellij will always warn me if I'm exposing myself to some NULL operations.

I do think better first class language support is the way to go though.


A higher level programming language is a kind of tooling. Use a language that doesn't have NULL and problem solved.


In C, NULL is just 0. A nullptr in C++ is just a pointer which points to 0. So it's not an undefined value... it's set to 0 on purpose so you can check it.

consider this: char *ptr; ptr = (char *)0xb8000;

before assigning ptr, ptr can be ANY value from 'random memory'. (compiler trickery aside.. because it might initialise it to 0 anyway...)

so you want to have: char *ptr = NULL; ptr = (char *)0xb8000;

So you can then do if (ptr != NULL) { do_stuff(); }. You could not check the validity of the ptr value, or whether it is present, otherwise. An if (ptr) or if (!ptr) would only work if it's initialised and reset to NULL each time before assignment, so you can validate the assignment.

This is not a mistake but a tool.

For a hardcoded offset like this it might be fair to say you could do if (ptr == (char *)0xb8000) { do_stuff(); }, but what if it was a ptr returned by a new allocation or so? Or obtained by taking the address of another variable or object? In that case, setting things to NULL and checking them is absolutely essential to assuring your code works like you intended it to.

This whole article seems like just a bunch of nonsense. For some languages it might hold true, but I can't believe it would for all of them. Perhaps for the original ALGOL null... who knows.


Proper typesafe systems wouldn't let you use C-style reinterpretation casts either.

It's quite instructive to see how the low-level Rust people handle this.


So if you have memory-mapped IO, how would you write to a specific address? In C/C++, a reinterpret cast is exactly what you need there. What would you use in Rust?


There are various alternatives, but generally the approach is to know what address you need in advance and create an object for it at compile time that can be accessed type-safely.

e.g. https://zinc.rs/apidocs/ioreg/index.html

or the longer but more detailed http://blog.japaric.io/brave-new-io/ , which covers various approaches. It even points out that you can use the type system to enforce that peripherals are only accessed from multiple threads or parts of the program in a safe manner, which you can't do if you can just reinterpret-cast into anything.


Option monad fixes null reference problems. Does not fix the ambiguity between not found vs doesn't exist.


If you have multiple possible reasons for a value to be "absent", that sounds like a case for an "either" or "result" type.
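
In a language without a built-in either/result type -- Go, say -- a rough stand-in is a pair of distinguishable sentinel errors (hypothetical names, just a sketch):

    package main

    import (
        "errors"
        "fmt"
    )

    // Two distinguishable reasons for "no value".
    var (
        ErrNotFound      = errors.New("record not found")
        ErrNotApplicable = errors.New("field does not apply to this record")
    )

    func middleName(person string) (string, error) {
        switch person {
        case "alice":
            return "Marie", nil
        case "bob":
            return "", ErrNotApplicable // bob has no middle name at all
        default:
            return "", ErrNotFound // we don't know this person
        }
    }

    func main() {
        _, err := middleName("carol")
        fmt.Println(errors.Is(err, ErrNotFound))      // true
        fmt.Println(errors.Is(err, ErrNotApplicable)) // false
    }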


Why can't people just get over it and stop blaming the language for their own sloppy code?


Because everybody is working on the limit of complexity they can comprehend, so there is no headspace left for dealing with bad ergonomics. We need all the help we can get.


> everybody is working on the limit of complexity they can comprehend

That's the problem right there.


Because languages are tools, and tools should help you do your job, not hinder you.


Why can't people just get over it and stop blaming users for the tools failing them?


Well, apparently it doesn't work that way.


I don't care. java.util.Optional is a PITA in a way null never is.


Whether null/NULL is a good idea or not (I like it; just yesterday it saved my ass), it saddens me that more and more of the articles I stumble upon are written to criticize technologies rather than to talk about solutions or innovations.


The article talks about a solution, suggesting that optionals are a much better way to handle this.




Also...

https://news.ycombinator.com/item?id=10148972 (150 points | Aug 31, 2015 | 143 comments)


just about nothing is as popular as null is.


The problem is not the NULL but the lack of types.


No. Undefined behaviour is.


Thanks for the article! I've often heard that null is bad, but haven't ever seen such a thorough, readable explanation.

Just so I can think it fully through for myself, it seems that the problems with null are:

1. Its semantics are different from whatever type it is substituted for, so it can't be used as a value

2. Superficially, it looks identical to a missing record value. This difference might be something you want to ignore (isNullOrEmpty), or something you care about (cache miss or hit with null)

3. It is used both for missing data, and missing functionality, which confuses two separate systems.

I agree that null as a type generally works better than null as a value, but I don't know if you can always articulate it as a type, especially in dynamic languages. A pragmatic solution seems to be a combination of:

- A Maybe type or monad. This forces you to unpack the nullable semantics of the thing, either in the type system or by unwrapping the value. A Maybe monad is a well designed interface for dealing with the edge cases, but it doesn't make the edge cases go away. This eliminates problem #1, and manages problem #2.

- Nil punning. (concat nil nil) yields an empty list in clojure. Same for +, string/join, etc. This is really similar to Monads/Types, but switches the responsibility for handling null intelligently from the data structure to the standard library. Putting null in the type forces you to opt in to null; nil punning forces you to opt out. This makes for more terse code, which is nice, but probably has a slightly narrower scope of application than monads, since it tackles problem #1 by making it make sense in most cases rather than eliminating it entirely, and nil punning doesn't always make sense. Incidentally, this seems closest to PHP's and javascript's strategy; their real problem is that they extend nil punning to cases where nil isn't involved (1 + '1' anyone?).

- Key or attribute errors. This is sort of a fallback to compensate for failing to handle the null case, but often works well when something just "shouldn't be null". This is probably just a substitute for a lack of compiler checks, but works well enough in the python world; sometimes failing hard is the right thing.

- Distinction between code and data. I like higher-order functions, so I'll just say that "sometimes data includes functions". But in most cases, the function you're calling should be resolved at compile time. Interfaces should be fully implemented, and (as in python), there should be a distinction between missing functionality (AttributeError) and missing data (KeyError).

Ultimately, it seems to be a question of language/api/user interface design: there is a difference between present, present and empty, and absent. Regardless of what strategy you use to manage the difference, there has to be one.


If you have self-discipline and aren't lazy, NULL is totally fine. Maybe language designers could have made it behave more like a 0-dimensional sentinel for beginners.


I wonder whether the author also hates the 0 and 1 elements of the natural numbers, since they have the same flaw of having weird, special semantics that all the other numbers don't share. In fact 0 is not even a number, but a placeholder for the concept of the absence of a number. Just like NULL.


That's exactly the point the author isn't making - which is that there should be an explicit distinction between Exists But Holds No Value and Does Not Exist.

0 is neither. It's clearly a number which can be used in arithmetic and defines a specific integer quantity. (Unless you think 0 is identical to NaN...)

The problem is that in some languages it's used for either or both of the above logical definitions, when it shouldn't be used for either.


Zero's behavior is totally consistent with the other numbers, though - it doesn't break associativity, commutativity, or any of the other stuff you'd expect. On the other hand, NULL takes every type I've ever written and adds an instance whose behavior with every function is, at best, to crash my program, and at worst, completely undefined. Its behavior is not at all consistent with the other instances.


0 breaks division, though...


No, 0 is definitely a number.


I can assure you that the number 0 in 0℃ is a number and not the absence of temperature.

But other than that, excellent point about the wonkiness of special values.


> the number 0 in 0℃ is a number

If we're going to get pedantic ...

0 in this case is an offset. The number is what's behind it: The point at which water freezes (aka 273 °K).

In this case 0 indicates the absence of any offset.


Psst, in 1968, we renamed "degrees Kelvin" to "kelvins", and "°K" to "K", to make it clear that it was an absolute and not relative scale.


I wonder why this gets downvoted - seems pretty reasonable to me...


I imagine because the post is not pedantic but is just wrong. If you want to be pedantic and correct, the symbol 0 represents a real number, an integer, and other numbers. The symbol 0 is not a cardinal (counting) number. Some people consider the natural numbers to be either the cardinal numbers or the whole numbers which include 0. So some people consider 0 to not be a natural number.

As a mathematical entity it doesn't represent the absence of anything. It is just a symbol that has certain properties associated with it. There isn't a hole in the real number line where 0 should be: there is a number there.

It isn't pedantic to insist that 0 isn't a number, it is equivocal to do so. In most contexts it does not need to be treated like a special value. Temperature measured in degrees is an example where you don't need to treat 0 specially, at least not more than other values...


It's a rather odd argument and it falls apart as soon as you ask about 0 Kelvin.


> it falls apart as soon as you ask about 0 Kelvin

Huh? The whole point of Kelvins is that their zero point is an actual 0 value. That's why Kelvins are a unit and °C aren't.

Q: An object's temperature is 20°C. The object's temperature increases by 10%. What is the new temperature of the object?

A: 49°C. (20°C is 293.15 K; a 10% increase gives 322.465 K, which converts back to about 49°C.)


> That's why Kelvins are a unit and °C aren't

Could I ask that you update Wikipedia with your discovery? Sadly the page appears to be erroneously using the word "unit" all over the place! https://en.wikipedia.org/wiki/Celsius

Also, NIST might benefit from your guidance: https://physics.nist.gov/cuu/Units/kelvin.html


It's already there, if you know enough to recognize it:

> The degree Celsius (°C) can refer to a specific temperature on the Celsius scale or a unit to indicate a difference between two temperatures

In other words, you can add a quantity of °C to a temperature value and you'll get another temperature value. But you can't measure a temperature in °C.

Compare how, for example, the python datetime library uses a datetime type and a timedelta type. A datetime plus a timedelta is a datetime. datetimes refer to points in time, and timedeltas don't.

°C measures a temperature delta, but not a temperature.
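
Go's standard library draws the same line with time.Time versus time.Duration, for what it's worth:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        t := time.Date(2018, 12, 4, 0, 0, 0, 0, time.UTC) // a point in time
        d := 48 * time.Hour                               // a delta

        fmt.Println(t.Add(d)) // point + delta = another point
        // t + t does not compile: adding two points in time is meaningless,
        // just as adding two temperatures (as opposed to temperature deltas) is.
    }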


> It's already there, if you know enough to recognize it

"As an SI derived unit", "or a unit to indicate", "the unit was called"

Three mentions of it being a unit in the first paragraph, alone. I understand the point being made; the conclusion that "celsius is not a unit" is bogus, however, by any common definition, including NIST's.

In a now-deleted comment, you linked to a Wikipedia page on Dimensional Analysis, which includes the sentence "to convert from units of Fahrenheit to units of Celsius".


Except when you take into account that 0 Kelvin actually is theoretically "nothing" with regards to temperature. It's not a floating point, it's an absolute.


They have all the same semantics as regular numbers? They all derive naturally from the Peano axioms.

You may be thinking of NaN.


Yet `Maybe<T> | Option<T> | ...` is not an option (pun intended), as Rich Hickey explains here: https://www.youtube.com/watch?v=YR5WdGrpoug.

In effect, his argument is: 1) You have `public X Do(Y y)` changed into `public X Do(Option<Y> y)`. This will break your API. 2) You have `public X Do(Y y)` changed into `Option<X> Do(Y y)`. This will break your API.

Thus, do not use Option<T> or equivalent in your APIs. Only use a language-supported construct such as C# 8's upcoming `string?` and `string`.


This is a spot where I've got to respectfully disagree with Mr. Hickey.

Changing a public API call that used to guarantee that it returned a value so that it might now return nothing is a breaking change, and, as an API consumer, I want my APIs to broadcast that change loudly. Compiler errors are a good (but not the only) way to do that.

Changing a public API member so that its arguments are now `Maybe[T]` is just silly. There's no need to introduce a breaking change there. Just overload it so that you now have versions that do and do not take the argument and get on with life.

If there's an argument to be made here, it's that statically and dynamically typed languages require different ways of doing things. In a statically typed language, I expect the compiler to keep an eye on a lot of these things, and I'm used to leaning on the compiler to catch things like a function's return value changing. In a dynamic language, I'm not.

I'm also, when working in a dynamic language, used to having to deal with the possibility that, at all times, any variable could contain data of literally any type. Removing nullability there changes the set of possible "this reference does not refer to what I expected" situations from (excuse the hand waving) a set with infinite cardinality to a set whose cardinality is infinity minus 1. If you think of NULL as effectively being a special type with a single value (call it "void"), then eliminating it reduces the number of classes of errors I have to worry about in a dynamic language by 0. I'm hard pressed to see any value there.


This is backwards. Rich did not advocate for changes that break promises.

The point in the talk is that "strengthening a promise" should not be a breaking change. Changing return type from "T or NULL" to always returning T. The case where you previously couldn't guarantee a result, but now you can.

The other case "relaxing a requirement" also should not be a problem. The case where you previously had to give me a value, but now I don't need it and can do my calculation without it.


TBH, I'm happy with that being a breaking change, too. Just keep returning a T? that happens to always have a value until the next major version # increment (or whatever), and then make the breaking change, and then I get a clear signal that I can delete some lines of code.

The alternative seems like a path that, in any decently complex software project, ultimately leads to an accumulation of useless cruft that'll probably continue to grow over time as people keep copy/pasting code that contained the now-useless null-handling logic.


"The key point here is our programmers are Googlers, they’re not researchers. They’re typically, fairly young, fresh out of school, probably learned Java, maybe learned C or C++, probably learned Python. They’re not capable of understanding a brilliant language but we want to use them to build good software. So, the language that we give them has to be easy for them to understand and easy to adopt. – Rob Pike"

When did computer science become about hand-holding? Has it always been this way? Look at React: it was designed to force functional programming concepts in an OOP manner. Is the future of programming the implementation of tightly controlled interfaces with extreme type safety? I would argue that's where we are going. Things are becoming less expressive, not more.


>When did computer science become about hand holding?

When people with pragmatic goals want to get large teams of new programmers productive fast, and can't expect everyone to be able to fend for themselves or afford the cost of accumulated mistakes.

>Look at react. It was designed to force functional programming concepts in an OOP manner.

Whatever that means, as React has little to do with "OOP manner".


OOP meaning React.Component, functional meaning immutable html state, property inheritance, render(), etc


Component hierarchies != OOP. They are an inevitable part of UI, which is hierarchical.

React has moved to stateless components and functions over classes.


It's the Java approach. Don't give people footguns and force them to write software in a readable, testable, maintainable style. It works extremely well in software engineering, because you want systems that work reliably and that can be maintained/extended by any other engineer at your org. In the professional software engineering world, "clever" programmers are almost always a horrible drag on their team.


It's not the Java approach at all. The first languages to remove ubiquitous nulls (e.g. MLs) were looking to increase expressivity and modeling abilities, not to force bondage and discipline upon developers.


Prevention of user error is a good design principle: https://justuxdesign.com/blog/poka-yoke-design-the-art-of-er...



