NULL: The worst mistake of computer science? (2015) (lucidchart.com)
209 points by BerislavLopac on Dec 4, 2018 | 368 comments



NULL can mean and be different things in different domains of computer science. NULL in the database world isn't the same thing as NULL in the programming world. In the programming world, null is a result of the system architecture, systems programming, etc. In SQL, NULL represents a lack of data. There have been debates on whether there should be different types of NULL. A NULL type for "data that is available but we don't have it yet" - like car make and model for a car owner. A NULL type for "data that does not apply" - like car make and model for an adult who doesn't own a car. A NULL type for "data that never applies" - like car make and model for a child. Then you get into the philosophical debate on whether a NULL can ever equal a NULL. Does it even make sense to think of NULL in terms of equality? How can an unknown entity ever equal another unknown entity? But what if you are just asking "are they both unknown"? Then you probably can think of two NULLs as "equal".

In higher-level programming, the consensus seems to be the fewer nulls the better, which is why languages like C++, C#, etc. are introducing Option-like syntax (mostly to accommodate the database world and its NULLs).

NULL exists to solve particular problems in computer science. It can also cause a lot of problems. You can argue it's the best solution and worst mistake depending on the situation.


All of those different nulls can be solved by not having null as a special case of your database specification, but as a first class type construct.

    data MightBeData a = Yes a | Unavailable | NotApplicable | NeverApplicable
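
For comparison, roughly the same idea sketched as a TypeScript discriminated union (the constructor names here just mirror the Haskell above):

    type MightBeData<A> =
        | { kind: "yes"; value: A }
        | { kind: "unavailable" }
        | { kind: "notApplicable" }
        | { kind: "neverApplicable" };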


NULL in databases has many properties that save a shitload of coding time and help you write more secure code. To cite only one of these useful properties: NULL automatically propagates through all operations and aggregations.


Is that the behavior you actually want, though? In many cases "this value is explicitly unknown" has dramatically different semantics from "the computation that produced this value had an unexpected NULL input", and if you interpret the latter as the former, you've likely just corrupted your data.

Monadic Maybe (in higher-level languages like Haskell or Rust) has the semantics you describe, but with the advantage that you only get it when you explicitly ask for it. If you care about data integrity you usually want to be particularly precise about the results of your computation; it's helpful if your type system can sanity-check them as they propagate through every operation.
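
A rough TypeScript sketch of that "only when you explicitly ask for it" property, with made-up names:

    function mapNullable<T, U>(x: T | null, f: (t: T) => U): U | null {
        return x === null ? null : f(x);
    }

    function doublePrice(price: number | null): number | null {
        return mapNullable(price, p => p * 2); // null propagates here only because we opted in
    }
    doublePrice(null); // null
    doublePrice(10);   // 20
    // a plain (p: number) => p * 2 simply won't accept a possibly-null price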


Yes, of course: when I deal with data, if I compute an average over 20 records and one has a NULL value, I want to know that something is wrong with that specific operation. Silently returning something wrong would be really bad. Crashing the whole query would be annoying as well, because if you compute 3 billion aggregates at the same time and only 0.1% return NULL, you might want to filter them out instead of doing nothing at all or correcting the input.

Of course, all of this would be feasible with an arbitrary no-data value, but you'd basically have to rewrite all the propagation functions that are built in for NULL. As a DB admin I would consider a database without NULL handling utterly flawed.

In fact, with modern SQL databases you often get twice the fun, because there is a second propagating special value for numeric types called NaN (not a number), to further distinguish lack of data from invalid data if need be!


I think you can say something similar about programming languages that don't have automatic memory assignment.

It can be perfectly valid to have a pointer reserved in memory, without having given it a value to point to yet. I see why this isn't much of an issue in 2015, but it once was.

I don't think NULL is a problem at all, really. It sure isn't fun in languages like C# where you can have nullable types, but I think that's a problem with C# and not with the concept of NULL.


This is hard to reconcile with type theory for me.

NULL, to me, implies an uninhabited type, i.e. there can never be a value with a NULL type. Using null for "data isn't there / doesn't apply / isn't available, etc." seems like an abuse of the type system. I see no reason that the former needs to be supported at the type level. These properties are just responses to queries, not some mystical, uninhabitable oblivion. Unnecessary type features just make verification and learning a language much more difficult.


NULL isn't the uninhabited type, that's the bottom type. NULL is a value that inhabits every type.


Not in all languages. In Lisp dialects related to classical Lisp, like ANSI Lisp, there is a unique nil object which is a kind of symbol. It is self-evaluating and serves as (the one and only) Boolean false value, and also as a representation for the empty list, which terminates all non-circular lists.

There is no nil-like value in the domain of any other type. If you hold a value which satisfies stringp then you know you have a string of zero or more characters and not some null reference.

Because the typing is dynamic, then if you have a situation in which you would like to either have a string, or a value indicating "there is no string here now", you can use nil for that. But replacing a string variable with nil changes the type; it is not a string reference that has been nulled out, but a string object previously stored in the variable has been replaced with the nil object.


I think he's speaking specifically of the Algol-derived languages that the article is talking about, i.e. C, C++, Java, etc. Other languages (eg. SML, Ocaml, Haskell, and Rust) force you to make None an explicit value in an algebraic type (eg. Maybe/Optional), and that's what the article is arguing for. In dynamically-typed languages (Python, Javascript, Ruby) the question is irrelevant because there's no static type checking anyways. Static type systems bolted on top of dynamic languages (eg. CMUCL, Closure Compiler, TypeScript, Python typing) often get this right - they treat a nullable type as distinct from a non-nullable one and perform checking upon entry. There're also some languages (Kotlin, Java8 with @NonNull) that are fundamentally saddled with null because it's part of the platform APIs, but have built layers on top of it similar to these to perform nullability checks.


> NULL is a value that inhabits every type.

Not in all type systems. In particular, type systems which have union types may choose to simply create a separate type for NULL. There are also variants that have separate NULL types for different base types, which also invalidates your claim.


> Not in all type systems

I think this is inaccurate. We are talking about computer science, which matters here and constrains general type theory. A type system is different from how you interact with it, so dispensing with language-specific symbolic representation further normalizes the discussion.

Fundamentally, a (computer science) type is a representation of data, binary for the most part. That representation has to inhabit some part of bounded memory. When that memory is initialized (empty), it's some form of NULL, for lack of a better term. It exists for every type system in computer science.

> There are also variants that have separate NULL types for different base types, which also invalidate your claim

That's not the same thing. Different NULL types make sense for different-sized discrete (fixed-bounds) memory allocations. A Unicode character has a fixed-size allocation, while a string might be an unbounded allocation (it grows in some fashion, as needed).

Edit: Kneejerk downvoting, classy.


This does not make sense. Null is not the same as zero and neither is the same as an all-zeros bit pattern. A memory word interpreted as an integer has no meaningful null value. A memory word interpreted as a pointer may or may not have a special bit pattern (which may or may not be the same as integer zero) that represents a pointer to nowhere; it all depends on language semantics. The address 0x0 can be perfectly valid on some architectures. Even though in C the literal 0 denotes a null pointer constant, it does not mean that the value of a null pointer is literally zero.


> A memory word interpreted as an integer has no meaningful null value.

What a type is depends on what the runtime operates on. You can make a runtime that just grabs random bits of data as a type and says "that's an integer", but it's not a useful construct/example. A runtime keeps track of types in some way external to the data itself. So I'll disagree that all-zeros is not the same as null, because it's a common way to initialize the data that is identified with a type (like in a pointer table). It's not 1:1, but it's common. There's not always a formalized name, but it (an uninitialized state) always exists as part of the type system (when not reusing an existing memory allocation, which is an initialized state). Always.


It was not I who downvoted you, but I agree with the downvoter in that you are not making good arguments. There is basically nothing in common between NULL as used by programming languages in general and how you want to define it in your comment. How NULL is represented internally is totally language dependent. C and its ilk have defined it one way (that is reasonably close to your description ON SOME ARCHITECTURES - not all), but how other languages define it differs wildly.

And strings may only grow if strings are mutable, which AGAIN is extremely language dependent.


I’m not sure I buy this definition of type even in theory.

From a category-theoretic point of view, a type would be nothing more than the constraints on how terms may be composed.

Either you simply view types as objects in some category or, perhaps a bit more interestingly, as a functor. See f.ex. http://noamz.org/papers/funts.pdf

Or rather it seems to be a common theme in language design to conflate these two notions of types, and we should probably stop doing that.


Category theory doesn't require NULL, but physical state does. Given the state of the machine as a constraint, there must be an uninitialized state of unknown or undefined (but allocated) for each type to ensure types are run as functors. I don't see why talking about the theory is useful, given that the practical constraint will always provide an asterisk of "given you're working on physical memory".

All types are functors in practice... which always involves an element of initialization to ensure the type is defined in memory.


But that is a big asterisk, which was my point. Only the compiled and running program is actually forced to work with physical memory. All stages before that are just modeling.

My belief is that we should stop conflating data types (input and state modeling) and program types (domain modeling) so we can advance to more productive workflows. F.ex. runtime reflection should never have been a thing; instead the focus should have been on macros and staged compilation. Most metaprogramming can probably be better evaluated at design time rather than run time.


Well said.


Sorry, I was talking specifically about C++ and friends, which is the set of languages for which NULL is considered problematic. For the languages you're talking about, NULL is much more well-behaved, so there's less reason to complain.


Thank you. I got confused on nomenclature because of a null pointer's similarity to the bottom type.

I was trying to say that having a NULL value that inhabits every type seems silly as not every set of values has the NULL value. Considering that NULL can be represented as a special case of sum types, it seems even sillier to mandate such a value on all types.

Using both NULL and nullable seems very confusing as well.


Consider Javascript, where undefined and null are both unique primitive values.


Is anyone aware of a situation where code would fault if we replaced all undefineds with nulls in JS?

General code, not things specifically testing differentiation between undefined and null.


This is clear in (Common) Lisp.

There is a null class/type whose one and only instance is the special object nil.

This nil also itself names a type which contains no instance: its domain of values is the empty set. This is the type at the bottom of the type spindle, corresponding to the NULL that you're talking about.

nil is a special self-evaluating symbol representing Boolean false, the empty list, and this bottom type.

So you see, if we just don't conflate the null object with the null set type, everything works out.


It needs to be supported at the type level, whether by null or by options, simply because “data not available” is a common value people need to use. When there’s no good way to express it, they’re forced to invent special sentinel values, and you end up in the situation where array index -1 means “value not found in the array”.
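
A small TypeScript sketch of the difference (function names are hypothetical):

    // Sentinel: callers must remember the -1 convention, and nothing checks that they do.
    function indexOfSentinel(xs: number[], x: number): number {
        for (let i = 0; i < xs.length; i++) if (xs[i] === x) return i;
        return -1;
    }

    // Absence in the return type: callers are forced to deal with it before using the index.
    function indexOfChecked(xs: number[], x: number): number | undefined {
        for (let i = 0; i < xs.length; i++) if (xs[i] === x) return i;
        return undefined;
    }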


I have worked with a MySQL database where the designer(s) decided that, on many of the tables, -1 should represent no value instead of null. As you can imagine, this has caused some problems when they've done this with columns representing dollar amounts.

This was done because of their belief that having any nulls in the table is the kiss of death for performance; they've used the phrase "tablescan" a lot. I have not been able to find a basis for these claims.


Databases, and MySQL in particular because so many of its defaults are pants-on-head nonsensical, are a haven of cargo-cult performance rituals. I have a suspicion this is because, rather than analysing their own N! loops/queries, it's easier for mediocre programmers just to blame the database.


Yes, option types are awesome. No, they are not nulls.

Algebraic data types are not direct support for "no data found" at the type level. Algebraic types are really just fancier enum/union types. It just so happens that inventing special sentinel values is awesome when you have an algebraic type system to check your work.


Take a look at Kotlin or Typescript†. Basically, they decided to fully design the language with support for null-as-option.

That means several things:

  * T (non-nullable) and T? (nullable) are different types. T? = T | null
  * Where T is expected T? is not accepted, but where T? is specified T is also accepted
  * T? is automatically cast to T in the places where it's asserted to be not null, e.g. within an if(x != null) branch
  * Method calls are not allowed on T? (unless the function specifies it can handle null)
  * There's syntax for providing a value in case of null. (x ?: fallback) in Kotlin, (x || fallback) in Typescript. 
It's a much more pleasant developer experience than the Option ADTs/enum types from Scala or Haskell, with the same amount of safety.

Typescript goes even further and has the best enumeration support I've seen any language have. T | U is a fully valid type, and if T | U is asserted to be one of them it is automatically cast to T/U. It is a very natural and efficient way of building ADTs [1]
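
A minimal TypeScript sketch of those rules, assuming --strictNullCheck is on (all identifiers here are made up):

    function shout(s: string): string { return s.toUpperCase(); }

    function greet(nickname: string | null): void {  // "T?" is just string | null
        // shout(nickname);          // error: 'string | null' is not assignable to 'string'
        if (nickname !== null) {
            shout(nickname);          // ok: narrowed to plain string inside this branch
        }
    }
    greet("Ada");                     // plain T is accepted where T | null is expected
    greet(null);

    function displayName(nickname: string | null): string {
        return nickname || "anonymous";  // fallback syntax (beware: || also swallows "")
    }

    // T | U: asserting the tag narrows the union to one arm
    type Shape = { kind: "circle"; radius: number } | { kind: "square"; side: number };
    function area(s: Shape): number {
        return s.kind === "circle" ? Math.PI * s.radius * s.radius : s.side * s.side;
    }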

† Typescript 2.0+ with --strictNullCheck on

[1] https://www.typescriptlang.org/docs/handbook/advanced-types.... , Discriminated Unions header


These are great features, but they are really just syntactic sugar over algebraic types. It's not a more pleasant developer experience than sum (Option) types, it's a more pleasant experience with sum types. Ex:

    macro_rules! unwrap {
        ($x:expr, $fallback:expr) => {
            match $x {
                Some(n) => n,
                None => $fallback,
            }
        };
    }
> Typescript goes even further and has the best enumeration support I've seen any language have. T | U is a fully valid type, and if T | U is asserted to be one of them it is automatically cast to T/U. It is a very natural and efficient way of building ADTs

That's pretty cool. It seems like refinement types. [1]

[1]: https://en.wikipedia.org/wiki/Refinement_(computing)#Refinem...


It's subtly different because of the implicit coercions and smart casts. You can always pass T to a function that takes T?, while in Haskell/Rust you would need to pass Some(t). Similarly, Kotlin does control-flow analysis and converts all T?s inside an "if (t != null) ..." block or after an "if (t == null) return" statement into a T, which dramatically reduces the amount of try!/unwrap() calls that I used to see littering early Rust code. There's better syntactic sugar for it in Rust now, but the point is that Kotlin doesn't need nearly as much syntactic sugar because nullable is integrated with the typechecker and flow analysis.


> It's not a more pleasant developer experience than sum (Option) types, it's a more pleasant experience with sum types.

Having worked with Scala, Haskell, Kotlin and Typescript, I disagree: the dev experience for Kotlin/Typescript is miles ahead for optionality. It seems like a small difference vs. something like do notation or for comprehensions, but it really adds up.

It's hard to explain until you've tried it. The most succinct way to put it would be that optional code looks and feels nearly identical to non-optional code, instead of unwrap/do-notation, which makes a very large syntactical difference.

Take a look at a piece of async Kotlin co-routine code v.s. Java/Scala (Completable)Future, same effect.


If you like those features, I think you would like Crystal, which has much of the same. Full union types, flow typing and type inference for basically everything on the stack make for a very fluid, scripting-language-like experience while keeping type safety.


> * There's syntax for providing a value in case of null. (x ?: fallback) in Kotlin, (x || fallback) in Typescript.

Well, uh, unless T can be falsy. 0 || fallback === fallback. A proper ?? operator a la C# would be great, but they've pushed back against it because it doesn't entirely mesh well with nulls in the JS world.

JS is still probably my favorite language to work in despite stuff like this. But that's one footgun that everyone should be aware of.
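
A small sketch of the footgun, with a hypothetical config example:

    function portWithOr(configured: number | null): number {
        return configured || 8080;   // a configured port of 0 silently becomes 8080
    }

    function portWithCheck(configured: number | null): number {
        return configured !== null ? configured : 8080;  // 0 stays 0; only null falls back
    }

    portWithOr(0);    // 8080
    portWithCheck(0); // 0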


A true elvis and null-safe call operator like ?., ?? and/or ?: would be an improvement, but the idea doesn't seem to be gaining traction in the Javascript world where || is "good enough"


That's a terrible approach because it's inherently noncompositional and thus breaks parametricity. You can't tell whether T? is the same type as T without knowing what T is, so if you use T? and handle null in your generic functions then they might suddenly misbehave when passed a null.

(Previously said T?? rather than T?, thanks to corrections in replies)


This is incorrect. The type system of both languages makes sure that T does not include null, so the case where you "accidentally" handle the null of T is impossible.

More generally, with unions you always have this behavior but I've never seen this be a problem. If your function accepts T | U and you pass (T | U) | U it simplifies to T | U and in my experience the code that handles U is always the correct thing to do for U.


> The type system of both languages makes sure that T does not include null, so the case where you "accidentally" handle the null of T is impossible.

So does that mean you can't call generic methods with ? types? Because there's an excluded middle here: either something like String? is a first-class type, in which case you can invoke a <T> T ... method with T=String? and then any T?s inside that method have the potential to misbehave, or you can't, in which case String? becomes an awkward second-class type.


What? No. T is a subtype of T?, so I don't get where you are going.

If a generic function f accepts a covariant Dog, and Animal is a supertype of Dog, you cannot call f with an Animal because it requires a Dog. If f accepts a covariant Animal, you can call it with a Dog because Dog is a subtype of Animal.

Now replace Dog with T and Animal with T? and you can see it is perfectly fine.


We're talking about generics. <T> and the like. I can have a Map<String, Dog>, I can have a Map<String, Animal>, and these are different things but they both work fine because both Dog and Animal are first-class types.

Can I have a Map<String, Dog?> ? If no, then Dog? isn't a first-class type. If yes, we have all sorts of nasty surprises, because code written in terms of Map<String, T> will assume that if map.get(someKey) is null then that means someKey isn't in the map, and this code will work fine until someone uses a ? type for T and then break horribly.
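
The collision is easy to reproduce with TypeScript's built-in Map, where undefined plays the role of the null being discussed here (a sketch):

    const m = new Map<string, string | undefined>();
    m.set("a", undefined);
    m.get("a"); // undefined: key present, stored value is "no value"
    m.get("b"); // undefined: key absent; generic code can't tell these apart without m.has()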


> Can I have a Map<String, Dog?>

yes

> because code written in terms of Map<String, T> will assume that if map.get(someKey) is null that means someKey isn't in the map

And it can safely assume that. Map<String, T?> isn't a subtype of Map<String, T>, so passing Map<String, T?> to a place requiring Map<String, T> will not compile.

T is a subtype of T?, not the other way around. Map is defined as Map<K, out V>, meaning V is covariant.

So you can pass a Map<String, Dog> where a Map<String, Dog?> is required, but not a Map<String, Dog?> where a Map<String, Dog> is required.


> You can't tell whether T?? is the same type as T?

Huh? T?? would always be the same type as T?. `T | NULL | NULL == T | NULL`.


I think you have too many ? in there; you can't tell if T? is the same type as T unless you know if T is U? for some U. (This is true of untagged unions more generally, ? is just shorthand for | null.)

But T?? is always identical to T?.


They’re not direct support, but option types only work in languages with the right syntax and tooling to make them work. C union types, for example, aren’t really capable of providing a null replacement simply because they’re so complex to use.


> They’re not direct support, but option types only work in languages with the right syntax and tooling to make them work.

I strongly disagree.

You don't need pattern matching syntax to use option types effectively. All you need are the right methods - map, flatMap, getOrElse etc etc. Any language with inheritance and dynamic dispatch can implement a good option type.

Even in languages with pattern matching, I never reach for it when dealing with options.
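
A minimal TypeScript sketch of that, using nothing but inheritance and dynamic dispatch (class and method names are made up):

    abstract class Opt<T> {
        abstract map<U>(f: (t: T) => U): Opt<U>;
        abstract getOrElse(fallback: T): T;
    }

    class Some<T> extends Opt<T> {
        constructor(private readonly value: T) { super(); }
        map<U>(f: (t: T) => U): Opt<U> { return new Some(f(this.value)); }
        getOrElse(_fallback: T): T { return this.value; }
    }

    class None<T> extends Opt<T> {
        map<U>(_f: (t: T) => U): Opt<U> { return new None<U>(); }
        getOrElse(fallback: T): T { return fallback; }
    }

    new Some(2).map(n => n + 1).getOrElse(0);        // 3
    new None<number>().map(n => n + 1).getOrElse(0); // 0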


> Yes, option types are awesome. No, they are not nulls.

No, but they serve a superset of the semantic functions of nulls, better than nulls do.


All this is true. I immediately took it to mean the Tony Hoare invention of null: that of the zero memory pointer. Other types of null usually have other names, with the notable exception being databases. Also, Go uses nil for its null, which puzzled me for a while until I realized its behaviour is different; you can work with nils up to a point, e.g. with len() or append().


NULL should mean an intended lack of data. Undefined should be unintended.


You know who works on a platform with NULL but doesn't have quite so many problems with it? DBAs.

There's some need to draw a distinction between the basic idea of NULL, and the way that NULL has been implemented in most high-level programming languages.

In most RDBMSes, values can't be null unless you say they are. Sometimes explicitly, as in table definitions, sometimes implicitly, when you select a JOIN type. Either way, though, the fact that the developer is in control of when it can and cannot happen means that it always has a knowable meaning. (Or should, anyway.)

The problem with many programming languages is, you're given it whether you want it or not. In a low-level language like C, that's reasonable, because it takes a sensible approach to how it works: Only pointers can be null, and all pointers are nullable for obvious (especially in the 1970s) reasons.

More generally, I'm not going to fault languages from that era for trying it out, because this stuff was new, and things were still being felt out. So I don't really fault Tony Hoare for giving null references a try in ALGOL W.

What seems much more bothersome is high level languages like Java and C# cargo culting this behavior. They could have followed the lead from languages like SQL and let the programmer be in control. They should have. They already throw exceptions when a memory allocation fails, and they allow inline variable initialization, and declaring variables at the point of usage, and composite data types have constructors, so they lack all of (early) C's reasons why ubiquitous nulls were a good idea. They could have, I think quite easily, made nullability optional. At which point it'd have basically the same semantics as optional types from functional programming, so I doubt we'd be worrying about it anymore.

But they didn't.


NULL in SQL really isn't great. For one, nullable table columns are a bad default, and you have to explicitly write out "NOT NULL" to avoid this behavior. I'd say that 90% of the time I want not-null table columns, and only 10% of the time do I want a nullable column.

Secondly, NULL has weird arithmetic. It turns out that NULL=NULL is false, and NULL<>NULL is also false. (This is unlike C/Java/Python/etc. by the way.)

Thirdly, even if you design all your tables to have NOT NULL on all columns, your queries can still synthesize NULL values in the results. For example, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN (but not INNER JOIN). Or computing max(column) on a table with zero rows.


I can get behind your first statement. Having NULLable as a default on columns is "probably" a bad idea.

I'm not so sure I can agree with the other two. NULL<>NULL (and NULL=NULL) both return false for a very simple reason: truly missing data _can't_ be equal to anything, including missing data... Because it's missing. You cannot say with certainty that value1 and value2 are or are not equal to each other.

For the third point... What should max(column) return when there's no data? You're telling the engine "give me the maximum value of something that doesn't exist". That is, in my experience, "missing data."


For example, if it were the case that NULL = NULL, really counterintuitive stuff would happen on joins because a null cell would match with every other null cell you are joining on:

        person
   name      home_address
   ---------------------------
   "Alice"   NULL
   "Bob"     "123 Jump Street"

                letter
   return_address     description
   ----------------------------------
   NULL               "Ransom Letter"
   NULL               "Spy Document"
   "123 Jump Street"  "Hello, from Bob"
   
Then

    SELECT name, description FROM person INNER JOIN letter ON home_address = return_address
would return

    name     description
    ---------------------------
    "Alice"  "Ransom Letter"
    "Alice"  "Spy Document"
    "Bob"    "Hello, from Bob"
So now Alice is associated with a bunch of letters she didn't necessarily write because she doesn't have a home address.


I find NULL to be incredibly useful. I do agree that it has bad defaults in SQL, and its equality is annoying, similar to NaN in other languages.


> NULL in SQL really isn't great. For one, nullable table columns is a bad default, and you have to explicitly write out "NOT NULL" to avoid this behavior.

This is not true on many dbms. It's an implementation choice.


NULL in SQL is a notorious source of errors and confusion (particularly when it comes to e.g. tri-state boolean logic). It certainly can come from nowhere and surprise you - if anything the behaviour is even worse than in Java or C#. So I don't think there's anything to learn from there. (Rather what modern languages should have done - and increasingly do - is follow ML practice and avoid null entirely, implementing option types as ordinary library types where the programmer explicitly wants to represent absence).


That's not really NULL's fault that it causes confusion in SQL. That's just ternary logic. People who don't handle NULLs in SQL aren't really mishandling the NULL. NULL is just a value. They're simply failing to understand the Boolean value of UNKNOWN and what that means. They're so used to thinking only in bivalent logic that the additional complexity throws them off.

However, "It's more complex for me to think about," or, "I don't understand the convention," or even, "It's easy to forget the convention," are not a very convincing arguments. It's similar to arguments about little endian vs big endian. Yes, big endian is how we write our positional numbers, but little endian makes casts a noop. Or arguments about zero-based array indexing. These concepts aren't difficult. They're just more complex. Negative numbers aren't difficult, but they're more complicated than just cardinal numbers. Fractions and decimals aren't difficult, but they're more complex than integers. Multiplication and division aren't difficult, but they're more complex than addition and subtraction.


>That's not really NULL's fault that it causes confusion in SQL. That's just ternary logic

The problem is that it's not simply ternary logic. It's a ternary logic that gets mapped onto a Boolean algebra, which leads to the usual strange repercussions (particularly, the presence of nulls creates both false positives and false negatives, silently).

The SQL language goes out of its way to pretend it's not ternary, though in fact it is. You have to actively keep in mind when writing SQL that the database is trying to trick you. This is not a good thing, and it's hard to blame the programmer when they get tricked.


> Or arguments about zero-based array indexing. These concepts aren't difficult. They're just more complex. Negative numbers aren't difficult, but they're more complicated than just cardinal numbers. Fractions and decimals aren't difficult, but they're more complex than integers. Multiplication and division aren't difficult, but they're more complex than addition and subtraction.

We usually consider it a good thing when programming languages let you opt out of the complex thing. In a good language, you can do integer arithmetic if you don't want to deal with fractions or decimals. You can do cardinal arithmetic if you don't want to deal with negative numbers. You can do ordinary Boolean logic if you don't want to do ternary logic.

The problem with SQL isn't that it has NULL. It's that it's too hard to not have NULL. Which is the problem with null in general.


Yeah, but saying "I want to use SQL" and "I don't want to use NULL or ternary logic" is a bit like saying "I want to use the existing datetime types" and "I want all years to have 10 months, all months to have 30 days, etc." Or like saying, "I want everything to use integers" and "I need fractional components." Your requirements break the abstraction not because the system is constrained, but because you're breaking the conceptual model that's the foundation of what you're trying to use. It's not a language problem. It's not a data problem. It's not a computing problem at all. It's applying the wrong conceptual model to meet your needs. That isn't a problem with the conceptual model, either, since plenty of people use it very successfully.


> Your requirements break the abstraction not because the system is constrained, but because you're breaking the conceptual model that's the foundation of what you're trying to use.

How so? Elsewhere in the thread it's claimed that the original relational model didn't have nulls, which is what I'd expect.


Relational algebra doesn't have nulls, but there's a difference between the mathematical theory and concepts and the reality of a relational system.

As I mention elsewhere, Codd's own list of rules for a relational database [0] explicitly require nulls (see Rule 3).

[0]: https://en.wikipedia.org/wiki/Codd%27s_12_rules


I don't see any entanglement with the rest of the rules, or with what makes a relational database a relational database. "A systemic way to represent missing and inapplicable information" may be necessary, but better alternatives to null are imaginable. A relational database without nulls sounds like an ML without exceptions: actually a pretty good idea.


I guess. My tendency is to think that it's more a problem for developers who are new to SQL, and are surprised to find out that, despite having the same name, nulls in SQL don't have the same semantics as nulls in other languages.

Once you get a handle on the semantics, though, they make a lot of sense. The trick is to understand how SQL's NULL is rooted in mathematical formalism, not the pragmatics of dealing with pointers. It has more in common with NaN in floating-point numbers. So, for SQL, "null <> null" behaves like "NaN <> NaN". For C and friends, "null == null" for the same reason that "0 == 0".


In SQL, NULL<>NULL yields false. You use IS NULL / IS NOT NULL to test for NULL values. In programming languages, NaN!=NaN yields true. You use x!=x to test for NaN values.

Saying that SQL NULL is rooted in mathematical formalism doesn't explain anything, because anything (even nullptr and NaN) can be explained in mathematical formalism. What we want is a simple semantic model that a human can understand and one that lacks nasty unintuitive surprises.
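
For the NaN half of that comparison, a quick check in a JS/TS runtime:

    const x = Number.NaN;
    x === x;          // false: NaN compares unequal to itself
    x !== x;          // true:  the classic "is this NaN?" test
    Number.isNaN(x);  // true:  the clearer modern spelling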


> In SQL, NULL<>NULL yields false.

No, NULL <> NULL yields UNKNOWN. That's why NULL <> NULL and NOT (NULL <> NULL) behave the same: they have the same value. UNKNOWN is a first-order truth value in ternary logic.

The key is that in a WHERE clause, a record is only returned if the WHERE clause evaluates to TRUE. Not TRUE or UNKNOWN. TRUE.


> In SQL, NULL<>NULL yields false.

It yields NULL, not false. So do NULL = NULL and NOT NULL.


NULL isn't a Boolean value in ternary logic, any more than 3.2 or 'Hello' or December 12, 2018, are Boolean values. The ternary truth value is UNKNOWN. UNKNOWN is related to NULL, but they don't work identically.

NULL is a value that any column data type can potentially have. NULL is what comparison and evaluation operators work with. UNKNOWN is a value of the ternary Boolean type, and that Boolean type is what the Boolean operators (AND, OR, NOT) work with, and nothing else. This Boolean type in an RDBMS is unavailable to the user and is for internal evaluation purposes only. RDBMSs that support a "bool" type are not implementing the same thing. You can never say UPDATE MyTable SET Col = Value1 AND Value2. That's not going to work. Many RDBMSs have a documentation page that explains this difference, like this one[0] from Microsoft SQL Server.

Notably, NULL + 3 and NULL * 5 are both NULL. Any mathematical operation on NULL is NULL. But UNKNOWN AND FALSE is FALSE, and UNKNOWN OR TRUE is TRUE.

[0]: https://docs.microsoft.com/en-us/sql/t-sql/language-elements...
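
To make those evaluation rules concrete, here is a sketch of the three-valued connectives written out in TypeScript (SQL itself mostly hides UNKNOWN from you):

    type TV = "TRUE" | "FALSE" | "UNKNOWN";

    const and3 = (a: TV, b: TV): TV =>
        a === "FALSE" || b === "FALSE" ? "FALSE"
        : a === "UNKNOWN" || b === "UNKNOWN" ? "UNKNOWN"
        : "TRUE";

    const or3 = (a: TV, b: TV): TV =>
        a === "TRUE" || b === "TRUE" ? "TRUE"
        : a === "UNKNOWN" || b === "UNKNOWN" ? "UNKNOWN"
        : "FALSE";

    and3("UNKNOWN", "FALSE"); // "FALSE"
    or3("UNKNOWN", "TRUE");   // "TRUE"
    // whereas NULL + 3 is NULL: arithmetic on NULL always yields NULL,
    // and a WHERE clause only keeps rows whose condition evaluates to TRUE.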


NULL is an alias for UNKNOWN on many systems (like MySQL.) Other DBs don't even have UNKNOWN.

UPDATE table set col=value1 and value2 works fine IF value1 and value2 are booleans.


That's a great example of MySQL creating a proprietary extension of ANSI SQL that does little more than deliberately mislead users.


According to https://en.wikipedia.org/wiki/Null_%28SQL%29#BOOLEAN_data_ty... NULL is the same as UNKNOWN. The standard also asserts that NULL and UNKNOWN "may be used interchangeably to mean exactly the same thing"

In 20+ years of DB work, I have NEVER seen anyone use UNKNOWN. It is always NULL. Always.


Alright, I will withdraw my criticism of MySQL on this issue.

However....

> In 20+ years of DB work, I have NEVER seen anyone use UNKNOWN.

I mean, I've already shown where Microsoft does just that [0]. Oracle pretty clearly does the same [1] [2]. People don't use it because you can almost never refer to it directly. The language intentionally hides it. About the only place I know that you can is PostgreSQL [3], which supports the "boolean_expression IS UNKNOWN" predicate.

> The standard also asserts

I assume you've got the 2003 draft standard that's around [4]. I will use that because I don't see any more recent version of 9075-2 that's freely available.

Yes, the standard does say under 4.5 Boolean types:

> This specification does not make a distinction between the null value of the boolean data type and the truth value Unknown that is the result of an SQL <predicate>, <search condition>, or <boolean value expression>; they may be used interchangeably to mean exactly the same thing.

However, that's in the context of describing the Boolean user data type, a.k.a., BOOLEAN. You can tell because 4.2 describes character strings (CHAR, VARCHAR, etc), 4.3 describes binary strings, 4.4 describes the numeric data type, 4.6 describes DATETIME, and 4.7 describes user-defined types.

The standard is not saying that UNKNOWN and NULL are the same. It's saying that the Boolean user data type can use NULL to represent UNKNOWN. It's saying that if you choose to implement a BOOLEAN user data type, you can use NULL to represent UNKNOWN. If you choose to assign a boolean expression to a column, that is. Nevertheless, an SQL <predicate>, <search condition>, or <boolean value expression> has a value of True, False, or Unknown. This is shown by looking at 6.34 <boolean value expression>:

  <truth value> ::=
      TRUE
      | FALSE
      | UNKNOWN
Or by searching section 8 and seeing that, every time they talk about one of the value expressions being the null value, the predicate "is Unknown".

[0]: https://docs.microsoft.com/en-us/sql/t-sql/language-elements...

[1]: https://docs.oracle.com/cd/B19306_01/server.102/b14200/condi...

[2]: https://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_e...

[3]: https://www.postgresql.org/docs/11/functions-comparison.html

[4]: http://www.wiscorp.com/sql_2003_standard.zip

Edit: Bit of cleanup.


Ok, I will concede you are technically correct! However, I have never seen a developer use "is unknown", even with Postgres. (I have been working with Postgres for over 15 years.) They always use "is null", which is, for all intents and purposes, the same thing from a developer perspective.


I've only seen it once that I can think of, and I don't remember where. It might've been an example when they added or explained that predicate. I recall something like (Column1 = Column2) IS NOT UNKNOWN, but I don't know why you wouldn't use Column1 IS NOT NULL AND Column2 IS NOT NULL instead. I guess it might save a bit of rewriting, but it still seems pretty narrow.

It's really not useful unless you're talking about the value of a boolean expression or the underlying concepts of SQL, and most RDBMSs don't let you manipulate that directly with DML (MySQL is the first one I've seen that lets you do it, and you just taught me that was the case). It's somewhat hidden because of that.


I thought in most (many?) languages NaN != NaN ?


My degree's in mathematics and I share your disdain for pointer bit-twiddling. I still find SQL nulls difficult to reason about or diagnose. I'm sure there are times and places when their behaviour is what you want but most of the time they're just a big extra complication that you don't want or need.


In the tables I define everything is not null with sane defaults by default.

The places I do allow null are few and far between (e.g. updated_at) and I'm struggling to think of instances I've used them as anything other than absence indicators.

In fact I don't think I ever treat it as anything other than that in code either.

Was the purpose of null ever to mean anything other than I have not been defined/set?

All my objects are statically typed, so I never run into the issue of testing whether thing.x is a thing; it's always a thing, or it's a compile error. It's either set or not set, and thanks to the database convention I only have to worry about certain values being null, and most of the time it makes sense anyway. "Is updated_at truthy" doubles as a "has it been updated" test.

Am I incorrect in this method? With this method I fail to see the big extra complication. Will switching to option types help me? I'd argue they will not. But I'm happy to be convinced. I do avoid nulls; I just haven't seen a problem with them in my own code. (Not true for others.)

Specifically, with the caching problem, provided you constrain the cache to reason about null == not set. I see no problem.

    Cache.get(K) // null
    Cache.set(K,3) // void
    Cache.get(K) // 3
    Cache.set(K, null) // deletes, void
    Cache.get(K) // null
    Cache.set(K, false)
    Cache.get(K) // false
Only certain values of mine are going to potentially be null from the database, all of which will be contained within serialised objects.

I just never see the issue the author has. The times I do see it are when people get too clever with default values.

I understand it, I just don't see it in practice. Certainly not frequently enough to make language changes.

Title should just read "Stop abusing null" because the only time I've seen it be an issue is when people are dual encoding meaning.


> In fact I don't think I ever treat it as anything other than that in code either.

> Was the purpose of null ever to mean anything other than I have not been defined/set?

Different people understand null differently (it might mean "error", "value not in map", "invalid user input", ...) and there's never been a clear consensus. If "null" only ever has one meaning anywhere in your codebase, and any third-party libraries you use only ever use it to mean the same thing, you're probably ok. But as soon as there are multiple meanings you'll have confusion and bugs.

> Specifically, with the caching problem, provided you constrain the cache to reason about null == not set. I see no problem.

If you only have the one cache, sure. As soon as you have a two-level cache you start to have problems (you can no longer cache absent results, since you're using the same representation for absence from your outer cache). Or as soon as null shows up anywhere else.

It's the same problem as stuff that relies on evaluation order, or threadlocals: it's ok most of the time, as long as you're not combining it with something else that does the same thing. The trouble is the times when these noncompositional constructs break down are when you have complex nested code - which is precisely when you most need everything to work the way you'd expect.


> Different people understand null differently (it might mean "error", "value not in map", "invalid user input", ...)

Null only has one meaning, null. That's the point.

As soon as you start applying more to it than that you get problems.

> and there's never been a clear consensus.

This is simply not true. Null is null. That is all. Period. End of story. It has never been more than that.

If you have libraries, functions or existing code that ignores this, then that's on you, the developer, to reason about.

I guess I'm taking your meaning a bit out of context. I do understand that years of common practice have resulted in much abuse, but I don't think the language designers would ever have intended a double meaning in null values.

> If you only have one cache, sure

I feel like you're missing my point. If you need to handle more meaning than null == notset/unset/absent then you need to resort to a new data type. Null only has one meaning, null. You can't get two meanings out of one.

You, as the consumer of the cache must then decide on how to represent or encode further meaning. Either using the Some<T> pattern or an empty string or something like that.

This serialisation can easily be wrapped around a base cache class that just deals with simple storage where null == not set or unset (absent value.)

But the underlying pattern shouldn't involve itself with further concerns than it needs to.

This opinion is precisely because I've seen this sort of oh I'll just add a has function, oh and then I'll add a sub par serialisation library. Ok now I need a sometimes unserialize. Oh some legacy? Ok now I need to deserialize once to one level and twice to all levels. Oh yea, now I should throw not found.

Ok now every single call to cache.get must be wrapped in try catch and we must cast some values to false and others to empty string, oh yea and you have to call has before every get even if you just want to take advantage of a dynamically typed language and test for that falsy value. Cache.get(test) && okdothing();

It's two functions set sets the thing. Get gets it. If it's not there it returns null. That is the whole contract. Why do people try to over complicate the base contract? It's just crazy over engineering.

> which is precisely when you most need everything to work the way you'd expect

Indeed. And I expect null == null.

Not null == not yet set, set but cleared, set but empty, false, error, not found, or anything else for that matter.

I can use null to represent that my cache does not have a value for that key, because that is the design I chose.


> I guess I'm taking your meaning a bit out of context, I do understand years of common practice have resulted in much abuse but I don't think the language designers would have ever denoted double meanig in null values.

The language designers didn't give any single clear meaning to null. They just put it in the language, and so different library authors (entirely understandably) used it for different things, and it's now impossible to standardise on any one universal meaning.

> I feel like you're missing my point. If you need to handle more meaning than null == notset/unset/absent then you need to resort to a new data type. Null only has one meaning, null. You can't get two meanings out of one.

Indeed, because null is a language-level special case. (Whereas using Option you wouldn't have any problem: Option is just another normal user-defined type in the language, so Option<Option<T>> works no differently from any other Option).

> You, as the consumer of the cache must then decide on how to represent or encode further meaning. Either using the Some<T> pattern or an empty string or something like that.

So you have a bunch of awkward complexity in precisely the case where you least want extra trouble. You don't know how many places the cache might assume that null values have its particular meaning, and you have no way to know whether you've got them all. The most dangerous pitfalls in programming are things that usually work.

> It's two functions set sets the thing. Get gets it. If it's not there it returns null. That is the whole contract. Why do people try to over complicate the base contract? It's just crazy over engineering.

There's nothing complicated about using option. Set sets the thing. Get gives you an option that's either some if the thing was set, none if it wasn't. Perfectly normal datatype like you'd write yourself, no special cases anywhere.

> I can use null to represent that my cache does not have a value for that key because that is the design I chose that

Only if you write all your own code and never use anyone else's libraries. And even then, you have to remember all the things you used it to mean in all the places you used it. There's only one null and there's no way to define a user-defined thing that works like null, so it begs to be abused (I'd argue to use it at all is to abuse it, given that it has no particular meaning defined in the language).
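
A sketch of why the layering matters, using the get/set contract described upthread (get returns null on a miss); the cache classes here are hypothetical:

    class NullCache<V> {
        private readonly m = new Map<string, V>();
        set(key: string, value: V): void { this.m.set(key, value); }
        get(key: string): V | null { return this.m.has(key) ? this.m.get(key)! : null; }
    }

    // The inner layer already uses null for "user has no nickname" (V = string | null),
    // so a miss and a cached "no nickname" come back identical:
    const nc = new NullCache<string | null>();
    nc.set("alice", null);
    nc.get("alice"); // null: cached "no nickname"
    nc.get("bob");   // null: never cached; indistinguishable from the line above

    type Option<T> = { some: true; value: T } | { some: false };

    class OptionCache<V> {
        private readonly m = new Map<string, V>();
        set(key: string, value: V): void { this.m.set(key, value); }
        get(key: string): Option<V> {
            return this.m.has(key) ? { some: true, value: this.m.get(key)! } : { some: false };
        }
    }

    // Option nests instead of collapsing: a miss is { some: false }, while a cached
    // "no nickname" is { some: true, value: { some: false } }.
    const oc = new OptionCache<Option<string>>();
    oc.set("alice", { some: false });
    oc.get("alice"); // { some: true, value: { some: false } }
    oc.get("bob");   // { some: false }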


Lua's version of NULL, called nil, once bit me badly due to behavior in a sqlite library. The sqlite library I was using represented SQL NULL as nil, a perfectly reasonable choice. However, in lua there is also a convention to use nil as the end of table sentinel.

This meant innocent-looking code using ipairs() would stop iterating on a row of results once it hit a nil, which could occur anywhere. It meant we were missing data (the lua code uploaded locally collected data to a server) until I figured out the cause and explicitly iterated over the expected size of the table.


Are there any dbs based on modern type theory?


> NULL is a value that is not a value. And that’s a problem.

The problem isn't NULL, it's languages not enforcing the necessary checks for the "no data" condition. Option can still be NULL ("None" in rust), wrapping NULL in a struct doesn't provide any safety. The safety of Option wrapper types is from the other language features (like rust's "match") and a stricter compiler that forces the programmer to write the NULL check.

NULL would be fine if C required you to write this:

    foo_t *maybe_get_foo(/*...*/) {
        if (/*foo_is_available*/) {
            return foo;
        } else {
           return NULL;
        }
    }

    foo_t *f = maybe_get_foo();
    if (!f) { /*...*/ }   // REQUIRED or compile error
    do_something(f->bar); // only allowed after NULL check
Obviously implementing that requirement would be difficult in C. Languages like Rust were designed with enforcement features (match + None, much stronger type/borrow checking), but let you have "a value that is not a value".


> The problem isn't NULL, it's languages not enforcing the necessary checks for the "no data" condition.

Talking about "NULL" pretty much implies that. When Tony Hoare talks about null references, it's about every reference being nullable in languages like Java or C#, not about the ability to conceptually wrap/opt non-nullable references in a nullability thingie.


Psst: I think the thingie you're referring to is called a 'monad': https://en.wikipedia.org/wiki/Monad_(functional_programming)


It isn't. There exists a common monad that solves this problem but a wrapper type like Option or Maybe need not be a monad. For example, `Nullable<T>` in C# is not a monad.


Aha! You're right. I misremembered this excellent blog series from Eric Lippert (a member of the c# design team): https://ericlippert.com/2013/02/25/monads-part-two/


No. The thingie may or may not be monadic, but I'm using thingie because it could be a "library" sum type, or it could be a magical-ish builtin, or it could be some other mechanism.


I largely agree, but that would get very tedious because there's no way to create a pointer type that is guaranteed to be non-null in C. You'd end up having to do a lot of unnecessary checks because of that.

People say optionals are the solution, but the way I see it, it's the other way around. Pointer types that allow NULL are basically optionals, and the problem is that we use them everywhere, even for things that are not optional. What we are missing are pointer types for things that are not optional.

And the pointer type with a guaranteed value needs to be as easy to use as the nullable pointer type, if not easier. Otherwise it won't always be used when it should be.


> Option can still be NULL ("None" in rust)

'None' in Rust is part of the Option enum, not an equivalent to null.

  pub enum Option<T> {
      None,
      Some(T),
  }


It can be at the machine level. If T is NonZero, then sizeof(Option<T>) == sizeof(T).


You might instead argue that the problem is allowing checks for the null condition. If you make comparing a pointer with null crash, or just remove the keyword to make its use non-idiomatic (or only allow it as special initialization syntax), people won’t casually use null to mean something. It’ll serve its role as a pointer value that crashes cleanly if you dereference it.


Not very difficult. Just provide a not_null pointer attribute, just like const. Then require that all dereferenced pointers must have the not_null attribute. Problem solved.

(Other note: C++ has a not_null pointer-like type: it's references. Unfortunately, C++ references cannot be reseated, which makes wholesale replacement of pointers not feasible. Plus, the language doesn't actually force you to check pointers before assigning to a reference.)


> Option can still be NULL ("None" in rust), wrapping NULL in a struct doesn't provide any safety. The safety of Option wrapper types is from the other language features (like rust's "match")

The advantage here (especially true in Haskell) is that you can use monadic error handling to make this far more pleasant.


That's a secondary nicety (alongside the ability to make everything "nullable" not just the references/pointers subset), the primary advantage is that:

1. it clearly separates "nullable" and "non-nullable" providing better modelling tools

2. the compiler assists/mandates null-checking, providing better type-safety


Even Rust doesn't have the strictness in your comment. It's perfectly fine by the compiler to make use of `x.unwrap()`: if x is None (or Err, in the case of Result), you'll just get a panic at runtime. The features you note are superior to C's offering, but purely optional.


There's a fundamental difference between the Rust approach of the library providing a function for opting-in to potential crashes and the C/Java approach of not distinguishing that case at all. The programmer still is forced to write a null check, it's just a check that crashes the program.


No, the problem with null is the inability to enforce, at the type level, that a particular value is not null. C simply does not have a type for "guaranteed valid pointer to x".


C allows defining new types though, so it can be done.


It can only be done on a per-codebase level, really.

By which I mean, in my highly sensitive and correct program xyz I can (say) abstract all access to a type (A struct?) through a macro which would contain null checking/assert-ing. This could work perfectly well and even be static checked for (in principle).

However, if I tried to package this type up in a library it all basically falls flat - or at least it would be very easily broken.

Doing things like this is to static analysis (/the type system) as foreplay is to sex.


Unless I'm misreading OPs statement, it can't be done by defining a new type since C (unlike, say, Ada) doesn't support subrange types.


Could you provide an example? Either source code or a link. I want to see how you might define a type for "guaranteed to not be null pointer to char".


I'm thinking of just:

    struct char_nonnull_t { char *ptr; };

    char *char_nonnull_as_pointer(struct char_nonnull_t cp) { return cp.ptr; }
With the required extra accessor functions. Of course in C, we don't have generics, data hiding or other nice features, so we have to build a wall of conventions around our types instead. Types with invariants are totally possible in C. We just don't have a way to automatically enforce them. It's C, after all.


Since every type can be NULL that’s a lot of verbosity.


Yup, this is what happens when you use Typescript with strict null checking.


Or add exceptions like C++ did,

if (foo_is_available) return foo; else throw FooNotAvailable();


Not sure why I am getting downvotes for this. The parent proposed a language feature to support an enforced code execution when a return value was not available:

    foo_t *maybe_get_foo(/*...*/) {
        if (/*foo_is_available*/) {
            return foo;
        } else {
           return NULL;
        }
    }

    foo_t *f = maybe_get_foo();
    if (!f) { /*...*/ }   // REQUIRED or compile error
    do_something(f->bar); // only allowed after NULL check
And I was pointing out that C++ exceptions do exactly that.

    foo_t *maybe_get_foo(/*...*/) {
        if (/*foo_is_available*/) {
            return foo;
        } else {
           throw FooNotAvailable();
        }
    }

    try {
        foo_t *f = maybe_get_foo();
        do_something(f->bar); // f cannot be NULL
    }
    catch (const FooNotAvailable& ex) {
        /*...*/
    }
Sorry if that wasn't the answer you wanted because C++ is not fashionable. But it is a valid option for how to avoid the use of NULL.


I've made my peace with null. Null is basically just an implicit

    assert(valid(x))
before every time you call a method on x. Similarly, I think of exceptions as explicit "crash-unless-caught" commands.

If you write your program with the "blow up early" mentality anyway, or use static checking tools and a bit of discipline, I've found that null looses its terror.


“Looses its terror” here means the opposite of “loses its terror”.


Cry havoc and let loose the null pointer exceptions


In market terms, sir, you've entered the capitulation phase haha. It's actually not correct to say that accessing null will always blow up. In embedded systems without memory protection address 0 may well contain valid data, usually a vector table. In WASM address 0 is totally valid also, if I'm not mistaken, as memory is represented as a big ol' array with an offset and checking for 0 would be too inefficient.


If it's C, it's far more terrifying than that... there is no assertion! Just a vague threat that something will go wrong if you stick a null in there, with no guarantees and no checks.


> I've made my peace with null. Null is basically just an implicit

That very much depends on the language:

1. it can be a compile-time error (Swift)

2. it can be a runtime error (Java, C#)

3. it can depend on what you're actually using (Python, Ruby, and on runtime extensions you might have loaded in the latter case)

4. it can be a no-op (Objective-C)

5. it can depend on the combination of platform, compiler and surrounding code going from a segfault to deleting your program's security and/or causality (C, C++)

6. it can depend on the implementation and exact codepaths (Go)

etc...


Sure, null can be a non-problem for the low, low price of 3 or 4 extra lines on each function. But those extra lines distort your program architecture, pulling you away (perhaps without even noticing) from short, composable functions.


Oh I'm fairly disciplined. But we use a 15 year old code base which should've been made with more discipline than it was.

Additionally the program shouldn't "blow up early" but it was in many ways coded that way.


That is a very big IF, considering how many teams are actually structured.


Isn't there an inherent need in programming to express an explicit "nothing" value? Coming from Python and JS, I never found None/null to be much of a problem. I in fact like the distinction of null and undefined in JS. Using null allows you to distinguish from the accidental undefined.


The major criticism of NULLs does not apply to dynamic languages. The problem in languages like Java is that nullability is not represented in the type. This criticism is not relevant in a language without static type checking. The criticism is also not relevant in a language like TypeScript where nulls are specified in the type.

In short, the problem is not nulls per se, the problem is static type systems which do not allow you to specify whether a certain value can be null or not.

But I hereby predict that soon people are going to declare that "nulls are bad" and eliminate them even in languages where they are totally fine.


The OP specifically calls out ways that nulls cause problems in dynamic languages. In fact, they're very similar to the problems nulls cause in statically typed languages, just ignoring some of them because danger is already priced in with a dynamic language.

Dynamically typed programs are usually informally "duck typed," since it's impossible to do something meaningful with truly arbitrary types most of the time. But just like in statically typed languages, there's always the very real possibility that your duck will instead be a null — and the bug is not the function returning a non-duck, it's you ever assuming you have something that quacks like a duck.


Well in a dynamically typed language there is also the possibility that something you expect to be an integer is in fact a string. Null or Nil or None is not different in this respect than any other type.

But of course you can make bugs involving nulls. The OP shows an example in a dynamic language where null is returned instead of the regular value if a lookup key is not found. Obviously this is ambiguous when the found value can also be null. But the problem is not the null per se, the problem is using a sentinel value to indicate a special condition when the same value is also a legitimate regular value. This is just bad API design.


The way Clojure handles nulls is ideal for a dynamically typed language. In Clojure, a nil is a perfectly valid object, and the sequence functions treat it as an empty sequence (it's the empty list equivalent). That means that most standard library functions have no problems letting nil flow through them, returning nil. It winds up being quite similar to monadic Option/Maybe, although with no guarantees.


The key word here is explicit. Explicit Maybe/Optional types from Haskell, Swift or Rust are wonderful. After my experience with Elm and Swift, the mere thought of going back to implicit nullability everywhere hurts.


No, there’s no inherent need for every variable to allow a nothing.

There is a need for a nothing value in many cases and languages without an implicit null, such as Haskell, F# and Rust, employ an ‘option’ type. It’s a dedicated type that either contains a value or nothing. It forces you to declare when you expect a potential null and to check for it.
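A minimal Rust sketch of that idea (the `find_user` function and its behaviour are made up purely for illustration):

    // Absence is encoded in the return type rather than in an implicit null.
    fn find_user(id: u32) -> Option<String> {
        if id == 42 { Some(String::from("alice")) } else { None }
    }

    fn main() {
        // The compiler forces both cases to be handled before the value is used.
        match find_user(7) {
            Some(name) => println!("found {}", name),
            None => println!("no such user"),
        }
    }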


I didn't use the word variable. I think we're talking about the same thing here.


In C++ I practically never feel like I need a null for anything but a pointer. A null string or integer makes no sense to me.


std::optional has the potential to eliminate many of the null-pointer uses.


If my pointer already supports NULL, why would I put std::optional on top of it? I was saying I don't need a null for anything but a pointer.


Mainly to avoid scenarios where a pointer is used instead of a value type only because the author wishes to express that it is optional.


Because it's a 0 overhead abstraction and guards against the case when you (or a coworker) forget to check for null. Are we seriously gonna pretend like you never caused a null pointer exception?


I don't see how it's zero overhead? And honestly while it's obviously not never, it's not exactly frequent either.


It's a zero overhead abstraction because it'll compile to the same code.


std::optional doesn't do that though, you can deref an empty optional and the result's the same as deref'ing an empty unique_ptr or a null pointer: UB.


But it makes you stop and think before doing so. That's the whole point.


It doesn't do that any more than a pointer does.


std::optional is a null-pointer use; it has the exact same semantics.


Even in dynamic languages, it's generally bad practice to have functions that return multiple possible types, and you are often better served by returning a more proper "null object" that accepts the same messages/methods as what would normally be returned. But for the most part, the problem people have with null is related to compile time checks and it not being opt in, which is unrelated to dynamic languages where every return type could contain... whatever.


I agree that returning the same type, if possible, is better, since you can model it so that one specific value (zero, empty string, etc.) expresses the "nothing" accordingly. But I have found that several other developers don't share this view and prefer functions to return false, null, etc. instead of a value in the same space as the normal return type.


> Isn't there an inherent need in programming to express an explicit "nothing" value?

In numerical calculus: yes, most certainly. What do you expect the result of log(-1) to be? The alternative is to use specially tagged particular numbers as "no-data", and pray that they do not appear naturally as the result of computations.


Technically, you could force the value to be one constrained to a valid range, rather than augmenting the domain, but this is a lot of work and maybe not worth it for practical use.


Some systems already do that. C#'s Double type includes Double.NaN, Double.NegativeInfinity, and Double.PositiveInfinity. Math.Log(-1) does indeed return NaN.


That doesn't mean numbers need to always be nullable. It means that `log` doesn't return a number, it returns a nullable number.
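In Rust terms that might look like the following hedged sketch, where `checked_ln` is a made-up name:

    // A logarithm whose return type admits that a result may not exist.
    fn checked_ln(x: f64) -> Option<f64> {
        if x > 0.0 { Some(x.ln()) } else { None }
    }

    fn main() {
        assert_eq!(checked_ln(1.0), Some(0.0));
        assert_eq!(checked_ln(-1.0), None);
        // The primitive float version propagates NaN instead.
        assert!((-1.0_f64).ln().is_nan());
    }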


>What do you expect the result of log(-1) to be?

i pi, or more generally i pi n for odd n.


Or exceptions.


Progress, it's now a solved problem in modern languages like Rust, Swift or Kotlin. See for example: https://kotlinlang.org/docs/reference/null-safety.html


I mean it's been a solved problem since the 70s, if not earlier; the problem has always been uptake. And that is not solved: e.g. C++ only recently introduced std::optional, which is not a type-safe version of a null pointer but is instead a null pointer wrapper for value types.


It is not possible to have a NULL type that works for all situations and has stable semantics.

The issue is, NULL should be a concept, not a value. I see no problem with using sentinel values, so long as they are well designed, and such good design comes with skill and experience, just as with all other aspects of architecture. The quest to have a single value that can be used for all the various possible meanings of NULL, to me, is the root of the problem.


> The quest to have a single value that can be used for all the various possible meanings of NULL, to me, is the root of the problem.

Exactly right. In particular, the conflation of nulls to indicate both error and non-error conditions (e.g. out-of-memory vs end-of-linked list) makes it impossible to distinguish errors from non-errors in many situations, and that is obviously bad.

Ideally you want nulls/sentinels that carry information about where, when, and why they were generated. You want separate nulls for numerical overflow/underflow, end-of-linked-list, out-of-memory, timeout, suppressed error/exception, unspecified/unknown value (preferably a separate one for each type) yada yada yada.


I agree with pretty much everything in the article. However, I would give Java a lower score because no one uses java.util.Optional in practice, and there are too many legacy libraries and too much application code that cannot or will not be changed. Also, the @NotNull annotation isn't in Java SE; it is made available through various third-party libraries.

A language with a null value can dramatically simplify things for a language designer, though. In the case of Java, we know that every array of objects is initialized to null references. Thereafter, we can construct and assign objects to each slot of the array. Otherwise we run into issues that C++ faces: when we construct the array, the fields of every object are uninitialized, so they are potentially dangerous if read or destructed, and need the special syntax of placement new to be initialized. The trick to avoiding null here is to avoid pre-allocating an array, and instead to grow a vector one element at a time. The C++ std::vector<E> is very accessible and performant, whereas Java's java.util.List<E> is very clunky to use compared to native arrays.

Another case that gets simplified is object construction. When the memory for an object has been allocated but before the user's constructor code has run, what values should the fields have, assuming that they are observable? In a Java constructor, all fields are initially set to null/0, then you simply assign values to fields in the body code of the constructor. In C++ constructors however, you should initialize fields in the initializer list, and then you still have the option to initialize fields in the body.

I still think pervasive null values are bad for the programmer (rather than the language designer). Now that I have preliminary experience in Rust, I see that its design is much safer and still practical, so I think this language shows the way forward.


Re: Array Initialization

One approach you can take is the Rusty "hang up a technical difficulties sign" (unsafe) while you mess around with potentially uninitialized memory, which is valid, but places the burden on you as the library writer. Another would be to initialize your array of pointers as an array of Option<Box<T>> pre-filled with None. Due to pointer alignment you can actually optimize Option<Box<T>> by turning it into a tagged pointer (which I believe is what Rust does) so that None == null at the machine level, while the language exposes a safe interface on top. [1]
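As a hedged sketch of that second approach (pre-filling with None, relying on the layout optimization described above):

    use std::mem::size_of;

    fn main() {
        // Safe pre-initialization: every slot starts out explicitly empty.
        let mut slots: [Option<Box<u32>>; 4] = [None, None, None, None];
        slots[2] = Some(Box::new(7));
        // Box<T> is never null, so Option<Box<T>> needs no separate tag:
        // None is represented by the null pointer at the machine level.
        assert_eq!(size_of::<Option<Box<u32>>>(), size_of::<Box<u32>>());
        assert!(slots[0].is_none());
        assert_eq!(slots[2].as_deref(), Some(&7));
    }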

Re: Object Construction

With object construction in Rust you can either (a) create all fields in advance and specify them at construction [best], (b) use mem::uninitialized() [bad], or (c) create a builder which has optional fields for everything and yields a constructed object via (a) later [most work].

[1] https://doc.rust-lang.org/std/option/


It appears that Java will eventually get Value types, which will allow for Optional to be defined on the stack, at least making it actually do something useful.


The fact that 'nothing' is inconvenient applies just as much to zero in mathematics. Why do we have to have a number we can't divide by? What is 0 to the power of 0? It's a special case we always need to worry about. But its inclusion in the number system is not in question.

And I remember my distress using a financial package being told that my unused zero value still MUST have a currency! My pocket is empty, how can it have a currency? If a farmer's field is empty, must I say what it is empty of: cows, sheep, aardvarks?

I think worrying about inconsistency here is worrying about the inconsistency of the world we live in. 'Nothing' is a mysterious thing we need to accept and respect.


I recall having a friendly argument with a friend who insisted that 30°C was exactly twice as hot as 15°C.


It looks as if, even for temperature, zero is a bit problematic. https://en.m.wikipedia.org/wiki/Zero-point_energy


Rich Hickey's "Maybe Not" should be watched by anyone who thinks nulls/nils/undefineds are okay. It should also be watched by anyone who thinks that Optional/Maybe/Nullable are good enough:

https://www.youtube.com/watch?v=YR5WdGrpoug


To someone who has been using Asm and C for decades, these arguments just make no sense. Reading this article reminds me of the arguments against pointers, another thing that's frequently criticised by those who don't actually understand how computers work and try to "solve" problems by merely slathering everything in thicker and thicker layers of leaky abstraction. It's not far from "goto considered harmful" either.

> any reference can be null, and calling a method on null produces a NullPointerException.

...which immediately tells you to go fix the code.

> There are many times when it doesn’t make sense to have a null. Unfortunately, if the language permits anything to be null, well, anything can be null.

That's not an argument. See above.

> 3. NULL is a special-case

...because it indicates the absence of a value, which is a special case.

> though it throws a NullPointerException when run.

...and the cause is obvious. I'm not even a regular Java user (and don't much like the language myself, but for other reasons) and I know the difference between the Boxed types and the regular ones.

> NULL is difficult to debug

Seriously? A "nullpo crash" is one of the more trivial things to debug, because it's very distinctive and makes it easy to trace the value back (0 stands out; other addresses, not so much.) What's actually hard to debug? Extraneous null checks that silently cause failures elsewhere.

The proposed "solution" is straightforward, but if you reserve the special null value to indicate absence then you can make do with just one value instead of a pair, of which half the time half of the value is completely useless. If you can check for absence/null, you will have no problems using Maybe/Optional. If you can't, Maybe/Optional won't help you anyway --- because it's ultimately the same thing, using a value without checking for its absence.


> another thing that's frequently criticised by those who don't actually understand how computers work and try to "solve" problems by merely slathering everything in thicker and thicker layers of leaky abstraction

I think that you're being quite unkind. Haskell's Maybe type and Rust's Option types are very far from "leaky abstractions" and were developed by people who definitely understand how computers work. In fact, your description of them appears to indicate that you aren't really sure how they work (None doesn't take up "half of the value") -- the point of these option types is that cases where NULL is a reasonable value are explicit and your code won't compile if you don't handle NULL cases. Allowing NULL implicitly for many (if not all) types is where problems lie.

It also appears you're arguing that languages which don't have models that are strictly identical to the von Neumann architecture are "merely slathering everything [with] leaky abstraction". Would you argue that LISPs are just leaky abstractions?


Yep! It's worth restating the fact that wrapping a pointer in an Option in Rust actually takes up NO extra space, because Rust's smart enough to just optimize it back into a nullable pointer.

https://doc.rust-lang.org/std/ptr/struct.NonNull.html
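A hedged sketch checking that claim with size assertions:

    use std::mem::size_of;
    use std::ptr::NonNull;

    fn main() {
        // Wrapping a non-nullable pointer in Option costs no extra space:
        // the otherwise-forbidden null bit pattern is reused to encode None.
        assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<*mut u8>());
        assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
    }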


Well, a better comparison would be Typescript with strict null type checking enabled. You just use algebraic data types to specify whether a value can be null.


`Option<T>` in Rust is basically identical to `T | null` in TypeScript.


If your whole world is asm and C, then I take it you don't care much about type systems. Bless your heart, you lonely programmer of ephemeral software, may you be employed gainfully fixing your own bugs for decades. The saltiness is mostly for entertainment, please don't take too much offense. For everyone else working at a level of complexity where mistakes are inevitable and costly, types are an essential bicycle for the faulty minds of programmers.

The article is not arguing that we shouldn't express or model the absence of a value. It is arguing that all types having to support a "no-value" case leads to error-prone code that has historically cost an immeasurable amount of money and much credibility and respect. If everything can be null then it takes too much effort to null check every reference, so developers routinely forget or think they know better. Instead it argues that we should model the idea of a possibly empty value as a separate, composable type. Then you can write a large percentage of your code with a guarantee that objects/types/values are never going to be nil, while still handling that possibility in the smaller error checking parts of your code base.

One interesting anecdote is that our team, working in Swift, had to integrate a crash reporting tool and verify that it works. The challenge was that we hadn't seen a runtime crash in several months in production.

> A "nullpo crash" is one of the more trivial things to debug

If it happens in your debugging environment in front of your eyes then maybe. Some of us work on software that is used by millions over decades and would never get to see any reports from a majority of crashes.


Not sure what world you are from. But here on earth apparently garbage collection and JavaScript are all the rage ;)

Issues with null don't even register in comparison.


> > There are many times when it doesn’t make sense to have a null. Unfortunately, if the language permits anything to be null, well, anything can be null.

> That's not an argument. See above.

It is actually their best point, IMO. I really like how RDBMS/SQL solve this: fields hold values and you specify beforehand whether they can hold NULLs. The author is right, sometimes it does not make sense for variables to be null-able (think ID fields or usernames) but often it does (e.g. a user's avatar). Being able to indicate that would be a nice idea. C++ for example does that, as `Field x` is not nullable but `Field* x` is.


The thing is, you're focusing on when you've detected that there is an issue (a crash). A lot of the issues with NULL are the fact that you can't easily detect it beforehand. It's not indicated in the types, or the syntax. That means that it's incredibly easy for a NULL issue to sneak into an uncommon branch or scenario, only to be hit in production.


But why was the scenario not tested before production? Should that not be the case anyway?


You can never guarantee you really have 100% test coverage in all scenarios in complex software.


Indeed. It gets asymptotically more expensive. Whereas a typechecker is a system of tests that's able to "cover" 100% of the code.


It’s cute that you think tests find all problems.


> ...which immediately tells you to go fix the code.
Assuming your code isn't deeply nested. I've seen cases where null was triggered years after code went into production. In that case you have to:

A) Assume value isn't null and have more readable code

B) Litter the code with null checks.

e.g.

    if (a.getStuff().getValue() == "TEST")
becomes

    if (a != null && a.getStuff() != null && a.getStuff().getValue() == "TEST")

The thing with Maybe/Optional is that you have to check for the presence of None, otherwise your code won't compile. Another smart way is what C# did: int can't be null, int? can be null.
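For comparison, a hedged Rust sketch of the same chained access, with hypothetical `A` and `Stuff` types standing in for the Java classes above:

    // Hypothetical types mirroring the Java example.
    struct Stuff { value: Option<String> }
    struct A { stuff: Option<Stuff> }

    fn is_test(a: Option<&A>) -> bool {
        // Each step that can be absent is an Option; the chain short-circuits
        // on None instead of blowing up, and treating an Option as the value
        // directly won't compile.
        a.and_then(|obj| obj.stuff.as_ref())
            .and_then(|s| s.value.as_deref())
            == Some("TEST")
    }

    fn main() {
        assert!(!is_test(None));
        let a = A { stuff: Some(Stuff { value: Some(String::from("TEST")) }) };
        assert!(is_test(Some(&a)));
    }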


But expanded null checks could be automated by the compiler if so desired right? Without having to change the nature of null into an optional.

@MAYBE if(a.getStuff().getValue() == "TEST")


You could check every value for null, sure. But a) why would you want to? (and wouldn't it be bad for performance) and b) how would you handle it? Knowing that a value somewhere in your program was null doesn't really help you any.


> ...which immediately tells you to go fix the code.

But which code? The point where you observe the error could be many compilation units away from the code that's broken; it might be in a separate project, or even 3rd-party code.

> ...because it indicates the absence of a value, which is a special case.

Why does it need to be a special case? Is your language incapable of modelling something as simple as "maybe the presence of this kind of value, or maybe absence" with plain old ordinary, userspace values?

> Seriously? A "nullpo crash" is one of the more trivial things to debug, because it's very distinctive and makes it easy to trace the value back (0 stands out; other addresses, not so much.) What's actually hard to debug?

"Tracing the value back" is decidedly nontrivial. And totally unnecessary if you just don't allow yourself to create that kind of value in the first place.

> if you reserve the special null value to indicate absence then you can make do with just one value instead of a pair, of which half the time half of the value is completely useless.

What do you mean? If you're talking semantically, you want absence to be a different kind of thing from a value: it should be treated differently. If you're talking about runtime representation, you can pack an Option into the same space as a known-nonzero type if you want to (Rust does this), but that's an implementation detail.

(Confusing sum types with some kind of pair seems to be a common problem for programmers who haven't used anything but C; sum types are a different kind of thing and it's well worth understanding them in their own right).

> If you can check for absence/null, you will have no problems using Maybe/Optional. If you can't, Maybe/Optional won't help you anyway --- because it's ultimately the same thing, using a value without checking for its absence.

Nonsense on multiple levels. Maybes deliberately don't provide an (idiomatic) way to use them without checking. By having a Maybe type for values that can legitimately be absent, you don't have to permit values that can't be absent to be absent, and therefore you don't have to check most values - rather you handle absence of values that can be absent (the very notion of "checking" comes from a C-oriented way of thinking and isn't the idiomatic way to use maybes/options) and don't need to consider absence for things that can't be absent.


The whole point of using type systems is to prevent human errors; a "poka-yoke" for programming.

The great advantage of Maybe/Optional systems is that only some of your references have to use them. You can draw a clear boundary between the parts of the code that have to check everything, and those that can prove it's already been checked.
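A hedged Rust sketch of that boundary (all names invented): absence is handled once at the edge, and everything inside works with a plain, never-null value.

    // Inside the boundary: this function cannot receive "nothing",
    // so it never needs a null check.
    fn greet(name: &str) -> String {
        format!("hello, {}", name)
    }

    // At the boundary: absence is handled exactly once, here.
    fn handle_request(name: Option<&str>) -> String {
        match name {
            Some(n) => greet(n),
            None => String::from("hello, anonymous"),
        }
    }

    fn main() {
        assert_eq!(handle_request(Some("ada")), "hello, ada");
        assert_eq!(handle_request(None), "hello, anonymous");
    }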

In assembler we have no real type annotations, but for a long time I've considered trying to design a type-checking structure-orientated assembler.


> ...because it indicates the absence of a value, which is a special case.

But that's exactly the problem -- it's special, meaning it's only useful for certain situations. An ideal type system would provide compile-time guarantees, rather than having to wait for users to report issues. A type system which A) allows you to define variables as non-nullable and B) requires a null guard before every dereference of nullable variables eliminates this entire class of bug. What on earth is wrong with that?

EDIT:

> Seriously? A "nullpo crash" is one of the more trivial things to debug

Even if this were true [0], wouldn't it be easier if such a crash was just never even possible?

[0] which it's not - all a nullpo stack trace tells you is that something important didn't happen, at some point before this


In general, it's difficult to make the case that runtime errors are preferable over compile-time errors, unless the difficulty to enable compile-time errors is significant. In the case of Optional<> as a language/DB type, I can't imagine much effort involved, except for uptake.

Especially given that it can trivially be optimized out, since the eventual assembly should very well make use of the 0x0 property. But if you can encode the guarantee in your "high-level" language, why would you not want it?

In fact, I'm not sure how anyone could imagine assert(n != null) scattered throughout the codebase is a pleasant situation, unless of course, as most do, you're skipping the safety check for unsafe reasons.


Completely agree, this is CS theory gone off the deep end...


A simple, safe way of tracking nulls like option is "CS theory going off the deep end?"

Implicitly allowing all code to return nothing, and manually trying to remember what can return null and what can't, and checking that value, is incredibly error prone. It's really crazy that this has been the dominant way of handling the problem for many decades when there is a dead-simple way of ensuring it can't happen.

Null pointer errors, contrary to many claims, show up in production code all the time. Eliminating them is of huge value.


The NULL pointer errors you're referring to are in most cases resource issues, i.e. malloc returning NULL.

This is not the source of the vast majority of pointer errors.

Checking for (and trapping) NULL pointer dereferences is trivial, what is more difficult is the rest of the pointer range that doesn't get checked but is equally invalid, i.e. the other 4-billion (32-bit) possibilities.

Non-NULL-pointer checks are much more important than NULL checks.

The world of pointer issues is very much greater than "ASSERT(ptr!=NULL)".

...and as for correct error-recovery (not error-detection), well, don't get me started.


> This is not the source of the vast majority of pointer errors.

> Checking for (and trapping) NULL pointer dereferences is trivial, what is more difficult is the rest of the pointer range that doesn't get checked but is equally invalid, i.e. the other 4-billion (32-bit) possibilities.

I think we write vastly different types of software. I can assure you that null-related errors are extremely common in situations besides resource issues. If it were just a resource-related problem, garbage-collected languages would almost never have issues, yet Java is infamous for NPEs. In Scala, where Options are ubiquitous, I've literally never had a single NPE.

It is very common for libraries to return null just to represent the absence of a result (ex: a row returned from a SQL query has no value for a column). That sort of thing means you have NPEs wholly unrelated to malloc or anything similar. These nulls are expected under normal program operation. They aren't errors. So, it's crazy not to let the type system assist you in checking for nulls, so you don't forget and wind up with an NPE.


Two things are getting conflated here.

Pointer issues (that I was referring to) and a failure indication.

The most trivial pointer issue is a NULL pointer. This is such a trivial issue to catch that it's hardly even an error, yet people use that case as the exemplar for NULL issues.

Detecting (and handling) failures on the other hand is very much different and more in the spirit of what the option-type arguments are about. In that case, the difficulty is not in detecting the error (that option-types will help with) but in the application-level recovery. That is nothing that the language can aid you with; it's system-design and architecture related.

Basically, it's the wrong issue to be thinking about.


> The most trivial pointer issue is a NULL pointer. This is such a trivial issue to catch that it's hardly even an error, yet people use that case as the exemplar for NULL issues.

How can you claim that NPEs are "hardly ever an error." NPEs are the most common error there is! They are indeed easy to catch, but you need to do so nearly everywhere, obscuring the code and introducing potential for error. There is no real, conceptual difference between something like a malloc returning null or a database query result containing a null. It is the same thing.

A null absolutely is an error if you don't catch it. By not using Options, it's vastly easier for that to happen.


"hardly even an error"

not

"hardly ever an error".

in other words, NULL pointer errors are a trivial error to deal with.


They're even easier to deal with if your type system guarantees you can never get them in the first place.


But yet it is the most common type of bug, even in production. Clearly, it's not that easy to deal with for human programmers. What's the problem with letting the compiler help you?


Just look at the example on Wikipedia[0]. Tagged unions are super-simple, and make NULL completely unnecessary. NULL causes lots of headaches, while tagged unions have never caused anyone headaches, so removing NULL is kind of the obvious thing to do. In a sufficiently advanced language, such as Rust, they get optimized to equivalent code anyway, so there isn't even any performance loss.

[0]: https://en.wikipedia.org/wiki/Tagged_union#Examples
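A hedged Rust sketch in the spirit of those examples, using a linked list since that is the classic "pointer or NULL" case:

    // A tagged union: the "which case is this?" tag and the payload travel
    // together, and the compiler insists that every case is handled.
    enum List {
        Cons(i32, Box<List>),
        Nil,
    }

    fn sum(list: &List) -> i32 {
        match list {
            List::Cons(head, tail) => head + sum(tail),
            List::Nil => 0,
        }
    }

    fn main() {
        let list = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
        assert_eq!(sum(&list), 3);
    }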


Cannot understand this position... a non-nullable pointer is pretty much just a normal pointer but the compiler checks if you have tested for null. In Rust (and I believe Swift) optional references also have the same size as a normal pointer. In some versions of non-nullables you only need checks for dereferencing and not for other handling.


I might have confused optionality and non-nullability...


Calling this "CS Theory" is intellectual dishonesty.


Option<&T>, in Rust, will compile down to a C pointer to T, with Nothing represented by the null pointer.


Also, when you are aiming for full test coverage, null dereferences will be caught during testing.


Nobody does full path coverage, not even NASA.


SQLite is 100% branch test covered

https://www.sqlite.org/testing.html


Full path coverage is much more difficult than 100% branch coverage. It's next to impossible in any non trivial codebase that wasn't designed specifically for formal verification.


At least C# has the syntactic sugar to easily check for null references, which lets you avoid the horrors of code like `if (s != null && s.length)`. Instead you can type `s?.length`. Never have I appreciated syntactic sugar as much.


A lot of higher languages use the concept of truthiness such that null, or 0-length both evaluate to false:

    if (s) then do stuff with s;


Other candidates:

- null terminated strings

- machine dependent integer widths


I came here to write that ASCIIZ is a more costly design decision than null, one that not only led to crashing programs, but also to security vulnerabilities, sloppy APIs that cannot handle "binary data", and subpar performance.


Null-terminated strings are useful for buffering values of unknown length (e.g. copying from a network stream). Length-prefixed strings have issues of their own (e.g. what size length prefix to use, efficient encoding of variable-length integer prefixes, what happens if a length-prefixed string's calculated end-position is outside the process' memory space?)


> Null-terminated strings are useful for buffering values of unknown length (e.g. copying from a network stream).

That doesn't seem like a good example. The data arriving over the network arrives length-prefixed, because 0 is a legal byte value for arbitrary data. What do you then gain by throwing away your existing knowledge of the length?


> null terminated strings

This is mentioned.

> machine dependent integer widths

What exactly do you dislike about this?


It's an area of the language which can produce "works on my machine" bugs?

It becomes less important as we all converge on 64-bit machines with 32-bit "int" types, but it's still a monumental pain point for portability.


> What exactly do you dislike about this?

Or what do you even do about it?

A handful of solutions already exist: - Use a higher-level language - Java


> Or what do you even do about it?

All you need is a list of what width each type is.

> A handful of solutions already exist: - Use a higher-level language - Java

It doesn't require being "higher level". If anything it pushes code to a slightly lower level.


> All you need is a list of what width each type is.

This is exactly the kind of minutiae that GP was bemoaning.

> If anything it pushes code to a slightly lower level

Yeah by way of higher-level abstractions ...


> This is exactly the kind of minutiae that GP was bemoaning.

How is this “minutiae”? You should always know the possible range of a numeric variable or field when you create it, so why not just write what size it is? In Rust the main numeric types look like this: i32, u64, u8. You just pick the one you want.


I'm sorry, I misunderstood you.

These typedefs as I'm used to them do address cross-platform issues.

Storage classes are "minutiae" however when all you want is just a straight up number.

Python gives me an Integer type when I want a whole number, or a Float when I want to represent partials.

I don't really care to be honest how that gets represented in memory in this case.


A float in Python is an implementation-specific size, just like C. So I'm really confused about using it as an example here.

> Storage classes are "minutiae" however when all you want is just a straight up number.

You can have a single "straight up number" and mention the bit width in the language spec. The mere act of writing it down doesn't force coders to deal with any more minutiae than they already had to deal with.

> Yeah by way of higher-level abstractions ...

I strongly object to this. "float is at least x bits" and "float is exactly x bits" are the same level of abstraction, and almost every language, high or low level, picks one of those options.


You can strongly object all you want, but when I'm writing code in python, or any other high-level language I don't care one jot about storage size.


You could apply that same "who cares?" attitude to the size of "double" in C. Whether you burden yourself with that knowledge is not a feature of the language. More "C coders" care because they're micro-optimizing, but it's no more needed in C than Python.

Also you named Java as being on the easy side and that has four different integer sizes...


No ... not really.

Double doesn't behave like a whole number.

java only has a single int type, which is 32-bit regardless of machine architecture.


> Double doesn't behave like a whole number.

I was suggesting double for your partials, not your whole numbers.

> java only has a single int type, which is 32-bit regardless of machine architecture.

I'm so confused.

You said having a "list of what width each type is" is bad because it forces the user to deal with "minutiae".

But that's exactly what Java does. int is 32 bits, short is 16, long is 64

And then you praise a type in Python that does the same thing as "double" in C. It's usually 64 bits, but it might be something else.


the main numeric type is usize


That’s not correct, it’s i32. usize has a specific purpose: when you need a length of something in memory.


The entire idea of a "main numeric type" is sort of silly - know what data you're dealing with and pick the length accordingly!


Yeah, I only say "main" here because it's what you should pick if you're not sure.


Nulls in strongly typed languages can get rather weird but from a C/C++ perspective it is the same as 0. nullptr is just a correctly casted 0.


That's not correct.

In C, the literal "0" is a null pointer constant handled at the compilation stage, but casting a runtime zero is not specified to yield a null pointer (and the address zero can be perfectly valid and usable), nor are null pointers specified to be zero-valued (quite the opposite).


I disagree. nullptr is directly convertible to a boolean 0:

    #include <stdio.h>

    int main(int argc, char *argv[]) {
        printf("%d\n", nullptr == 0);
        return 0;
    }


1. You're comparing a null pointer literal with a null pointer constant. You're literally comparing two things guaranteed to be equal. How, exactly, is it surprising that they do? Did you even attempt to understand what I wrote?

2. The specific implementation you have on hand could use zero-valued null pointers, I'm telling you what the standard doesn't say, confirmed by the C FAQ:

http://c-faq.com/null/varieties.html

http://c-faq.com/null/confusion4.html

http://c-faq.com/null/machexamp.html

3. The null pointer is not "directly convertible to a boolean 0", it's the literal 0 which expresses a null pointer: http://c-faq.com/null/ptrtest.html


I see what you are saying. I mean zero as a null constant. I've never even seen anyone try to assign a null value with some integer zero at some memory address. Using the NULL constant in C/C++ is near effortless and 0 just works no matter what the underlying implementation is doing. So for all intents and purposes nullptr is just a cast NULL (0) constant.


This is only true in C, which is part of why C is less insane than C++.

The two languages should not be conflated anymore.


This is not even true in C. 0 is a "null pointer literal" when used in pointer context, this does not imply that the actual null pointer has a value of zero.


Ah yes, you are right; an interesting point, however:

    6.3.2.3 Pointers
    
       3. An integer constant expression with the value 0, or such an expression cast to type void *,
          is called a null pointer constant. If a null pointer constant is converted to a pointer type,
          the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer
          to any object or function.
    
       4. Conversion of a null pointer to another pointer type yields a null pointer of that type.
          Any two null pointers shall compare equal. 
Looking at the definition of NULL:

    7.17 Common definitions <stddef.h>
    
        3 The macros are
        
                NULL
       
          which expands to an implementation-defined null pointer constant;
Meaning that NULL is implementation defined, but is assured by the standard to compare equal to any 0-valued pointer of any type, even if it is itself another special value.


Saying “equal to any 0-valued pointer” there can be misleading. It is true of a pointer assigned from a (foo*)0 constant, but not true of a pointer to a hardware address 0 or a pointer with bits all 0 (assuming they exist).


Actually nullptr is a struct.



nullptr is actually "a prvalue of type std::nullptr_t" or something like that. Since C++11 NULL is the same as nullptr.


And it is directly convertible to a bool. The primary benefit seems to be with overloading, where a NULL could trigger an integer overload instead of a pointer overload.


Yes, that's the example given here: https://en.cppreference.com/w/cpp/language/nullptr So it's an improvement but not exactly a game changer.


To the small extent to which "since C++11" is even relevant to "from a C/C++ perspective", even in C++11 NULL is still allowed to simply be defined as 0.


NULL in 'relational' databases in particular is a disaster. Or at least according to the notorious Fabian Pascal.

http://www.dbdebunk.com/2017/04/null-value-is-contradiction-...

Codd never proposed it in his original relational model. For good reason.


Nonsense.

First of all, Codd's early work didn't propose it because Codd's propositions were idealized and rooted in mathematics and not software engineering. In Codd's world, you would never create a FirstName/GivenName and a LastName/Surname field. You'd just create a Name field. The fact that you might want to sort by last name or the fact that someone named "Alice Taylor Smith" has a surname of "Smith" while "Robert Taylor Smith" has a surname of "Taylor Smith" isn't relevant. In Codd's world, you'd just define Name as a Name type, and the Name type itself would know everything about the entire domain of names. It would be about as complicated as working with a DATETIME. The same would be true for an Address type. Such a type would support both Western and Japanese addresses, which are wildly different. Codd doesn't need NULL because he has complex types which perfectly and comprehensively represent their domains. That's not realistic. Parsing Names and Addresses for even a single culture is notoriously difficult.

Second of all, Codd's later major work, The Relational Model for Database Management, includes his Twelve Rules (numbered 0 to 12) for database design in order for a DBMS to be considered relational[0]. Rule number 3 is:

> Rule 3: Systematic treatment of null values:

> Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type.

So Codd clearly thought that nulls were essential.

[0]: https://en.wikipedia.org/wiki/Codd%27s_12_rules


Disagree that it is a disaster. Nullability is explicit in the column type, so it doesn't have the "billion dollar mistake".

Furthermore you need to represent missing values somehow if you perform a left join.


I think it would be nice if `NOT NULL` was set by default on columns. However there are a lot of legitimate use cases that can't be (practically) solved by restructuring.

Data can be incomplete. Maybe only because it is not (yet) known. If NULL values were impossible it would create the need for one additional table with a foreign key relationship for every attribute that can be independently NULL. Sometimes this pattern is a good idea, but I don't think I can be convinced it should be the only possible solution.


> I think it would be nice if `NOT NULL` was set by default on columns

Then make sure to have explicit defaults?


"Explicit" defaults are still quite implicit. Usually what I want is an error if I fail to provide a required value, not any kind of silent default, null or otherwise.


null is often used in relational databases to be semantically equivalent to "info missing". I get that this use of null isn't ideal.

However there are quite a lot of cases where it is semantically equivalent to "not applicable"; while some of these cases can be avoided with a restructure, some can't, and in either case I don't see a compelling argument to do so.

e.g. a parent_id column that is null for top-level objects in a hierarchy. Restructuring the database to avoid this seems like moving to an unintuitive paradigm in pursuit of rule-following for its own sake.
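The same shape carries over directly to languages with explicit option types; a hedged Rust sketch with an invented `Category` type:

    // "No parent" is spelled out in the type instead of hiding behind a null.
    struct Category {
        id: u64,
        parent_id: Option<u64>, // the NULL-able column, made explicit
    }

    fn main() {
        let root = Category { id: 1, parent_id: None };
        let child = Category { id: 2, parent_id: Some(root.id) };
        assert!(root.parent_id.is_none());
        assert_eq!(child.parent_id, Some(root.id));
        assert_eq!(child.id, 2);
    }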


How do you propose we represent the absence of data? At least in SQL "logically missing data" and "missing data in memory" are properly divorced, unlike many languages in which the latter is abused and conflated with the former.


I'm afraid current relational databases have little in common with Codd's ideas.


Julia Missing Values

"Julia provides support for representing missing values in the statistical sense, that is for situations where no value is available for a variable in an observation, but a valid value theoretically exists. Missing values are represented via the missing object, which is the singleton instance of the type Missing. missing is equivalent to NULL in SQL and NA in R, and behaves like them in most situations."

https://docs.julialang.org/en/v1/manual/missing/index.html

+

"First-Class Statistical Missing Values Support in Julia 0.7"

https://julialang.org/blog/2018/06/missing


The nice thing about Julia is that it separates `nothing` from `missing`. nothing<:Nothing is a null that does not propagate, i.e. `1+nothing` is an error. It's an engineering null, providing a type of null where you don't want to silently continue when doing something bad. On the other hand, missing propagates, so `1+missing` outputs a missing. This is a Data Scientist's null, where you want to calculate as much as possible and see what results are directly known given the data you have. The two are different concepts and when conflated it makes computing more difficult. By separating the two, Julia handles both domains quite elegantly.


NULL is certainly a mistake, but even more of a mistake is not allowing distinct states in variables.

NULL is just another case of a state of a variable. Other states are 1, 15, 0xffffffff, etc.

That mainstream languages don't handle this is the worst mistake of the computer industry.


Most languages do. Java will always initialise a non-initialised value to NULL for instance (EDIT or 0 or false for primitives).

It's simply a reality of how computers operate that when you allocate a piece of memory (a variable) it will have something in it that you'll need to clear or initialise.

In this respect, NULL is doing you a favour.


Only for field variables. Local variables most definitely have to be initialised by the programmer, otherwise it's a compiler error.


I've been using Kotlin and Swift. They've partly removed null with the 'maybe' feature.

So instead of calling methods on null objects the methods are just not called if the object is null.

This helps when there's a race condition, and you attempt to call a method on a null object, and then that solves itself by the same code being called again without the race condition.

But a lot of the time if the object is null and the method is not called you still have an error, but it's just not a null pointer error now.

This 'nullless' code is nice in some places, especially with UI lifecycles calling code repeatedly, but other times it just changes the type of error you debug.


Actually in Kotlin, nullability is part of the type system and denoted with a ?. So String and String? are two different types. Dereferencing a nullable type is a compile error until you do the null check. Doing that triggers a smart cast to the non-null type, so the inferred type becomes non-nullable and you don't have to do any casts. You can force this cast by using the operator !!, which you should avoid for obvious reasons but which is useful with some legacy code. If you get this wrong you still get an NPE.

It also provides backward compatibility with Java, where Java types are considered nullable by default unless otherwise annotated with a @NonNull. Also you get nice warnings about redundant null checks.

This provides for a lot of extra compile time safety and it largely removes the need for Maybe, Optional, and other kludges that people have been coming up with to force programmers to replace null checks with empty checks.


Nah. There will always be missing values, no matter how many layers of safety measures we wrap around the fact. Hitting a NULL in C is very unforgiving; but that's just the spirit of C, there are plenty of ways to provide a less bumpy ride in higher level languages.

My own baby, Snigl [0], uses the type system to trap runaway missing values without wrapping. Which means that you get an error as soon as you pass a NULL, rather than way down the call stack when it's used.

https://gitlab.com/sifoo/snigl#types


I don't write business software that needs many 9s of uptime so I find null to be fine. Yes, returning null is kind of throwing your hands up but I find that to be kind of the point. It allows my software to fail fast if it does and makes it incredibly obvious where things are going wrong. Generally if a reference has a value of null where it shouldn't, I can pinpoint the location of the bug within a few minutes or even seconds.

IMO it makes the program much easier to reason about compared to returning some sort of empty value and then failing much much later in the program.


> it means that C-strings cannot be used for ASCII or extended ASCII. Instead, they can only be used for the unusual ASCIIZ.

This is a very pedantic quibble, and I'm not even sure it's correct. ASCII has NUL as well, and ASCIIZ isn't a character set AFAIK.

> C++ NULL boost::optional, from Boost.Optional

First of all, nullptr, second, std::optional.

> Objective C nil, Nil, NULL, NSNull Maybe, from SVMaybe

Nil is not a thing in Objective-C, to my knowledge.

> Swift Optional

You're looking for nil. So it should be four stars?

> Swift’s UnsafePointer must be used with unsafeUnwrap or !

You're confusing Optional and UnsafePointer.


What the comment about ASCII means is that NUL is a valid character in an ASCII string, but it can't be represented in C's null-terminated string encoding as the format (sometimes called ASCIIZ, but yeah: not an encoding... but I mean, come on... the article is clear here) terminates at the first NUL.

Also, Nil is absolutely a thing in Objective-C: it is a null pointer of type Class (whereas nil is a null pointer of type id; you should avoid mixing them up, though I will admit nothing much bad will happen as Class and id are generally co-polymorphic due to the type system being kind of lame. I am not sure they always have to be, though).

(And as someone who has been programming in C++ since before it was standardized at all, I frankly think listing NULL and boost::optional is totally acceptable and complaining about it as if C++11 is more canonical is just being annoying.)

Doing a quick search for how nil works in Swift, it apparently isn't a null pointer, so you are wrong there as well :(.

> nil means "no value" but is completely distinct in every other sense from Objective-C's nil.

> It is assignable only to optional variables. It works with both literals and structs (i.e. it works with stack-based items, not just heap-based items).

> Non-optional variables cannot be assigned nil even if they're classes (i.e. they live on the heap).

> So it's explicitly not a NULL pointer and not similar to one. It shares the name because it is intended to be used for the same semantic reason.

Given that I don't think any of the rest of your comment was legitimate criticism, I am frankly betting that your comment about UnsafePointer is also not useful, but I am kind of tired of having to analyze this comment at this point (I stepped in due to the note about character sets and the floor kept sinking).


> What the comment about ASCII means is that NUL is a valid character in an ASCII string, but it can't be represented in C's null-terminated string encoding as the format (sometimes called ASCIIZ, but yeah: not an encoding... but I mean, come on... the article is clear here) terminates at the first NUL.

I think ASCIIZ is the more common format, so I replied with all that in response to ASCIIZ being called "unusual". Most popular languages that actually allow NUL bytes in strings usually tend to support some encoding of Unicode anyways…

> Also, Nil is absolutely a thing in Objective-C: it is a null pointer of type Class

You got me…in my defense, I didn't know about this to my knowledge. It was still stupid of me to assume that DuckDuckGo would be case-sensitive when I searched that. I guess I should use this for Classes now instead of nil.

> I frankly think listing NULL and boost::optional is totally acceptable and complaining about it as if C++11 is more canonical is just being annoying.

One ships with C++, one doesn't; that's like saying Joda-Time is the canonical date library for Java instead of java.time. Although, I should probably ask you if you consider Joda-Time to be "more canonical" before listing it as an example…

> Doing a quick search for how nil works in Swift, it apparently isn't a null pointer, so you are wrong there as well :(.

> I am frankly betting that your comment about UnsafePointer is also not useful

I should have been more explicit, since these kinds of things run together when you bring pointers into the mix, which I guess I should have realized once I read the footnote. I'm not really satisfied with the explanation given in the article, nor with your rebuttal of my argument. Swift's nil is overloaded in a sense: for native Swift structures, it's the whole "Optional-as-an-enum" abstraction that we know about. For class types, and pointers, it's a bit more complicated: you just cannot assign nil to a UnsafePointer or a SomeClass unless it's "Optional", but the "Optional-ness" is completely in the type system and under the hood; in order to facilitate interoperability with C, Objective-C et al., you need the type to actually have the size of sizeof(void *), hold zeroes in it, etc. You cannot actually set either of these types to nil if they are non-Optional unless you do illegal things. So when you set a "pointer" (being an Optional<SomeClass>, UnsafePointer?) here to nil, you are literally shoving a nil into it, which also happens to work well with Swift's type system and Optional.none abstraction because there is no way to subvert it legally. All of this was basically a long-winded way of saying that yes, Swift's nil is actually NULL, but the type system makes sure that you don't get a "bare NULL", which lets you pretend like the Optional enumeration abstraction works but under the hood, and semantically, it's the same thing as NULL.


> ASCII has NUL as well

Yes, this is why C-strings can't be used for ASCII, because C-strings can't contain NUL as a character.
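A hedged Rust illustration of the contrast between a counted string and a NUL-terminated one:

    use std::ffi::CString;

    fn main() {
        // A length-counted string may contain NUL like any other byte...
        let s = "ab\0cd";
        assert_eq!(s.len(), 5);
        // ...but a NUL-terminated C string cannot: the conversion refuses it.
        assert!(CString::new(s).is_err());
    }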

> You're looking for nil. So it should be four stars?

nil isn't a null by the definition this article is using, because it's not a subtype of every reference type.


This tidbit gets a ton of mileage but I think it's overrated. There are a lot of unsafe shortcuts we take to get better ergonomics and NULL is one of them.

I think it's a bit unlikely we'll fully get rid of null, but we can get rid of some of the pitfalls. TypeScript for example pretty much fixes the problem, by enforcing you check for null when needed, though TypeScript takes a handful of other soundness shortcuts. Go makes null less harmful by treating nil pointers like empty values by convention.


It's the worst mistake because it made you believe that its atrocious ergonomics are actually superior to more sensible solutions. Implicit nullability doesn't really save you any null checks. It just makes it possible to forget necessary checks.

It was fine to design a language with nullable pointers in the 70s. It's unacceptable nowadays. nil in Go is a major mistake.


Okay. So let's say we get rid of nil in Go. Now, structs with pointers have no zero value. Slices and maps have no zero value. Funcs have no zero value. Reflect can no longer create objects because it can't possibly enforce that you initialize the pointers. Functions that return either an error or a value now need a new pattern, probably requiring generics or another special type. Map access needs to return this special type.

Did we win? Did that make Go better? Fuck no. Most people aren't frequently hitting nil pointer errors in Go because unlike C the behavior is a lot more reasonable and the conventions a lot simpler. And by the way, we didn't fix all the runtime errors. Nil pointers are just one possible runtime error. How about out of bounds array access, memory exhaustion, race conditions?

And yeah, I get that you can also fix all of those things, which is then called Rust. But we don't need another Rust, Rust is a fine Rust. Go has, imo, much better ergonomics and most of the time it's just fine for what I'm doing. Like, writing small to medium size servers and utilities in Go has rarely been a regretful experience. And, even if we had no runtime errors we would still need unit testing to ensure our components are functioning correctly. So, most of the time I'm aware of when my code has runtime errors anyways.

Getting rid of null is not magic. It does not get rid of all runtime errors. And yes, it does impact ergonomics. I will take Go zero values at the cost of nil pointers, every day.


> Now, structs with pointers have no zero value.

A zero value is much better than undefined value, I'll grant you that. I prefer the forced initialization approach (Haskell, presumably Rust and many others). If I add a new field, I want to know where I need to populate it. Or if you must, maybe a default value defined on the struct (perhaps that's also "considered harmful" for reasons I can't think of at the moment).

But it seems you prefer the ergonomics of default-zero. I don't get it, but I can't argue with preference.


Easy: default zero is simple. It's predictable behavior. It's consistent.

By convention, you should design your code to also treat zero values as empty. In Go, the zero value of bytes.Buffer is a ready to use, empty buffer.
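
For example, a minimal sketch of what a useful zero value buys you:

    package main

    import (
        "bytes"
        "fmt"
    )

    func main() {
        var buf bytes.Buffer      // the zero value, no constructor needed
        buf.WriteString("hello ") // immediately usable as an empty buffer
        buf.WriteString("world")
        fmt.Println(buf.String()) // hello world
    }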

If you drop default zero, you lose a lot of convenience and gain a lot of ceremony. It's not the end of the world, but neither is the null pointer error. It's just another runtime error. Just like divide by zero.


> Implicit nullability doesn't really save you any null checks.

It sort of does, because explicit nullability forces you to do many redundant null checks when you actually know that something could not possibly be null.


If you know something can't possibly be null, then store it as a non-nullable type as soon as you know that.


> as soon

that would need dynamically typed data...


I totally disagree - NULL gives much worse ergonomics all-around. While modifying code, I'm constantly afraid of whether the value I'm accessing could be NULL. Most SQL schemas are filled with "NOT NULL" to the point of ridiculousness, and most Java methods that I've seen tend to have @NotNull used everywhere too. Not having NULL gives you a lot of confidence when reading and writing code, by guaranteeing that your object does indeed exist.


I would recommend looking at Haskell's Maybe and Rust's Option type to get a better idea of how this can be solved -- and why this article isn't really overrated (just commonly misunderstood).

They allow for explicit NULL-ness (which is a necessary concept) without falling into the trap of making everything implicitly possibly NULL. And when NULL-ness is explicit you are then forced to explicitly handle it in order for your program to compile (in the case of Haskell and Rust).


I dunno why everyone's assuming I don't know about the common solutions to the problem. TypeScript does explicit nullness without needing monads, and I actually mentioned that one. I still maintain that the problem is overrated.

Go's idea of nil, for example, seems OK to me, and the language would need to be way more complex to fix it. For example, it would need a type system with explicit nullness, or maybe even actual generics. But it mostly doesn't matter because doing things with nil doesn't crash nearly as much in Go. Like a nil slice just acts like an empty slice. You can even append to nil and it returns a non nil slice. You can call methods on nil. Etc.
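
Concretely, a quick sketch of those behaviors:

    package main

    import "fmt"

    func main() {
        var s []string            // a nil slice
        fmt.Println(len(s))       // 0 -- len on nil is fine
        s = append(s, "x")        // appending to nil hands back a real slice
        fmt.Println(s)            // [x]

        var m map[string]int      // a nil map
        fmt.Println(m["missing"]) // 0 -- reading from a nil map is fine
        // m["k"] = 1             // writing to a nil map is the case that panics
    }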

The trouble with getting rid of nil to me is that it requires you to either have values at all times, or deal with the possibility that you don't at all times. Go has the very very nice property that you can initialize any type to a zero value and it should work as an "empty" object. Pointers without nullability don't have a zero value. Fixing nulls at the cost of getting rid of Go's properties for zero values would not be worth it.


Don't get me wrong, Go does nil a lot better than some other languages (being able to call methods on a nil is sometimes a good thing depending on how your methods handle it -- most methods don't handle it well at all). The fact that even most map operations (access and deletion) also "just work" is really useful.
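
For concreteness, "handling it well" looks something like this (a made-up type, just to illustrate the nil-receiver check):

    package main

    import "fmt"

    type list struct{ items []string }

    // Len is written to tolerate a nil receiver, so a nil *list
    // simply behaves like an empty list.
    func (l *list) Len() int {
        if l == nil {
            return 0
        }
        return len(l.items)
    }

    func main() {
        var l *list          // nil pointer
        fmt.Println(l.Len()) // 0, no panic
    }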

But I think you're over-selling the zero values feature of Go. It is very rare to see third-party libraries whose zero values do anything but cause an NPE when you try to use them -- mainly because they embed pointers, and then you have the same implicit NULL-ness that causes NPEs everywhere. It is great that the core language managed to get zero values right in most cases, but it's far from being as widespread as you might hope.

Also (nil interface != nil pointer that fulfills interface) is a very common mistake I see in Go code, and while it's not necessarily related to the existence of a nil value it is still related to the general concept of nil in Go.
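
The classic form of that mistake, for reference:

    package main

    import "fmt"

    type myError struct{}

    func (*myError) Error() string { return "boom" }

    // fail returns a typed nil pointer inside the error interface.
    func fail() error {
        var e *myError // nil pointer
        return e       // the interface now holds (type *myError, value nil)
    }

    func main() {
        err := fail()
        fmt.Println(err == nil) // false: a non-nil interface wrapping a nil pointer
    }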

[And on the TypeScript comment -- you don't really need monads for Option<> or Maybe types. You just need algebraic types -- and TypeScript has those. Haskell does use Monads for Maybe, but that's because Haskell has many other type-theory things that make it necessary to support using Maybe as a monad.]


I'm not overselling zero values in Go. Simply try to envision the cascading consequences on the language if you removed the zero values; no existing Go code would work, and I think the language would need to shift so much that even hello world couldn't be automatically translated to such a language. All to prevent a single type of runtime error among many, one that Go developers are not complaining about the way that Java, JavaScript, C developers have.


Nobody is arguing that Go should have algebraic types and ditch zero values, so I don't know why you're harping on this point. Now -- it would be somewhat nice because errors would be much more reasonable to handle (the new "check" proposal is okay but still quite flawed) but you're right that it would either be far too complicated or old code wouldn't work anymore. Go has already made its bed when it comes to nil values, but that doesn't mean that all new languages should follow suit -- because Go's nil handling isn't all sunshine and roses (nil interfaces -- for obvious reasons -- cause NPEs).

As I've said, Go does nil basically as well as you can without having algebraic types. But given the semi-anecdotal evidence that I've definitely seen my fair share of NPEs in production Go code in the past 5 years, it's clear that it's not sufficient.


I believe I've had a single NPE in production Go in three years. It's a blip on the radar. If you are testing your code, how do you even get an NPE in production? Seriously, it should be rare. I've actually had more trouble with channels than pointers in Go.

Also, if the situation is really so bad.... How come no one cares? I saw no Go 2 proposal to fix this situation, just generics, less boilerplate for error handling. But nothing about nil pointers. It's not at the top of anyone's list. Generally, people feel comfortable with nil in Go in ways they don't in JS and C.

I will take that as fact even though you certainly disagree. So why do people feel more comfortable with nil in Go? Because nil is not an error.

In C and JS, there are cases where null is treated as an error. For example, getElementById returns null when it finds nothing. If you designed this API in Go, you'd instead return nil PLUS an error (or perhaps a bool, if no other possible errors exist). You can argue semantics, but in Go it's generally held that if you aren't returning what the user wanted, you should return an error. Exceptions to that exist, but they're probably mostly just string manipulation functions.
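
In sketch form, with hypothetical names (the API shape is the point):

    package main

    import (
        "errors"
        "fmt"
    )

    type Element struct{ ID string }

    var ErrNotFound = errors.New("element not found")

    // A made-up, Go-flavoured getElementById: nil is never handed back alone,
    // it always travels with an explicit error (a bool would also do).
    func getElementByID(id string) (*Element, error) {
        if id != "root" {
            return nil, ErrNotFound
        }
        return &Element{ID: id}, nil
    }

    func main() {
        el, err := getElementByID("nope")
        if err != nil {
            fmt.Println("no element:", err)
            return
        }
        fmt.Println(el.ID)
    }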

This convention is so strong, though, that it nearly eliminates nil pointer errors caused by edge cases. Most nil pointer errors you DO hit in Go are:

1. Set on nil map
2. Send on nil channel
3. Failing to check error

The thing is, if you are writing production code, and writing tests for your production code, this shouldn't even make it to your code repository. It's virtually a non issue.

Even better; in C and C++, when you hit a null pointer, you don't get an NPE. You get a segfault. Everything dies. Kaput. Go is obviously not alone in not having this issue, but it's surely worth noting that it doesn't.

Go is not going to die some day because it's "not safe enough" - it's absolutely safe enough to write reliable code. The difference is, writing reliable code in Go is easy because for the most part, all you have to do is follow conventions and unit test. Same cannot be said about C and JS where you will inevitably get blindsided by sharp edges.

If you are in a situation where you can't have runtime errors, it's a poor fit. I don't personally know any developers who are in that situation. If your code doesn't put lives on the line, it's OK to have a runtime error. Most of your real outages will be due to other bugs, and probably more of them will be due to other people's bugs, natural disasters, human operator error, bad configuration pushes, etc.

If you really think Go NPEs are even a significant portion of Go reliability issues I'm gonna need more evidence.


> If you really think Go NPEs are even a significant portion of Go reliability issues I'm gonna need more evidence.

Don't get me wrong, there are many other Go reliability issues. I just don't agree with hand-waving it away by saying that "you should have tested for it". I'm very in favour of testing (umoci is arguably the most rigorously-tested container image format tool, and it's written in Go) but I think that tests shouldn't be used as the solution for safety problems. The logical conclusion of such a view is that any language can be reasonably safe if you have enough tests -- and while this is true (just look at SQLite) it's hardly practical to replicate the degree of testing that SQLite does for every C project. In order for everyone to benefit, safety needs to be built in (and Go does have a lot of safety built in).

> Go is not going to die some day because it's "not safe enough"

You keep refuting claims I never made...


The very first post I made in this thread was that I believe that this claim, that null is the biggest mistake in computer science, is way overblown.

To this point, I bring up Go because Go is an example of a language where NPEs are basically a non-issue. There's not much to say there, if we still disagree on this point I'm not getting anywhere and I'm just going to give up.

Following that logic, though, it feels inappropriate that null pointers have this ridiculous stigma compared to other runtime errors. Would anyone care about Rust if its only promise were to get rid of null pointer errors? I'd argue that if Rust still had NPEs but effectively solved concurrency issues, it would be exactly as popular today.

I'm not claiming there's no value in alternatives to null. I am absolutely disputing the idea that null is the biggest mistake in computer science. Full stop, absolutely unconvinced. I can think of a lot of things I'd consider much worse.

I still prefer more runtime safety over less, but there is a balance to be had too.


But nobody's talking about "getting rid of nil" wholesale. They're talking about getting rid of null/nil reference exceptions via the mechanisms of non-nullability (having the option to define some variables as non-nullable) and null guards (a compiler which forces a null check before any operation which requires a non-null value). This way you still have null, but with compile-time guarantees that you'll never get null when you didn't expect it.


That's the same as the TypeScript approach, but it's never going to happen in Go because of the relatively small gain for a massive jump in complexity in the compiler.


Author forgot about Erlang, and its wonderful lack of NULL


So how does NaN in Python and NA in R relate to NULL? I know Python has the None type, but it's not the same as NaN. One of the most annoying things in Numpy is that there is no way to indicate that an integer value is "missing", similar to NaN for floats. In R both integers and strings can be NA (if I remember correctly). So for numeric types at least, there is definitely the need to somehow indicate that a value is "missing".


Numpy has masked arrays [1]. Though I can't say how well they work.

[1] https://docs.scipy.org/doc/numpy/reference/maskedarray.gener...


NaN is not the same as None/null, and using it that way is asking for bugs.

None/null is for missing/unknown values (which may be a reasonable thing in your domain), while NaN is for the result of an illegal/undefined operation (which is definitely a bug, such as 0.0/0.0).


Linux does well with some macros: IS_ERR, IS_ERR_VALUE, PTR_ERR, ERR_PTR, PTR_ERR_OR_ZERO, ERR_CAST, IS_ERR_OR_NULL

https://elixir.bootlin.com/linux/latest/source/include/linux...

No, it doesn't totally replace NULL, but it does solve some of the problems in a high-performance way.


Honestly, I feel that the problem isn't null but that type systems (at least earlier on) tended to allow other types to be null, willy-nilly. Null is best considered a separate type to non-null values, and is basically not a problem if the type system handles that in some way. Be it option or union types - both solve it and it mostly stops being an issue.


Checking if the optional is present is very similar to checking for NULL values. Now if you have a nice match statement like Rust's, and lambda functions for streams, that may make things a bit more readable.

You will still need analysis tools to check that all code paths check for None before accessing that value.


Do people generally agree that Java's Optional == Scala's Option / Haskell's Maybe?

Java's Optional seems fundamentally flawed in that Java allows any reference to hold a null value, so the Optional reference itself can be null and calling isPresent on it can still throw an NPE -- it still gives people a footgun.



Yes, I was going to mention that video, in which Rich makes an important point in my opinion: database tables, or objects, or structs, still live within the Place Oriented Programming (PLOP) mindset. That mindset was born in a time when disk and RAM were the expensive resources, so update-in-place was the default. I insist on the "in-place": you need to know where something is so you can update it. The downside of PLOP is that if you have no value for one of the slots in your generic form (be it a table, object or struct), then what can you put there?

The alternative is to use data shapes that do not require something to be in a certain place. Hence the use of maps as the most basic data shape: you either have an entry in it, or you don't; there is no need for a null entry. Expanding that thinking to databases, you realise the table is not the right aggregate; instead you need to go one level down, to something that Datomic calls a datom, or RDF calls a fact.

To summarise, PLOP forms that package together a set of slots to be filled magnify the issue of NULL/null/nil. Instead, make the slot your primary unit of composition, and aggregate slots in a way that doesn't force you to fill a slot with a null value when there is no value in the first place.


Yes, 'Maybe Not' is very relevant to this discussion, but few people seem to agree with my understanding of what he says about the right solution:

Optionality doesn't fit in the type system / schema, because it's context dependent. For some functions, one subset of the data is needed, for others a different subset. Trying to mash it into the type system / schema is just fundamentally misguided.


Yes, he's rather explicit in saying Maybe is a poor tool. I'll have to watch the talk a second time to be sure, but I'm not sure he proposes any solution at the level of type systems. Simply not using Maybe, or using a union instead, is not what he is advocating. For him (and me too) types are the wrong thing to put data in because, among other things, they force you back into PLOP. His point is to remove entirely the need to fill slots with nothing. Obviously the talk is more about specs than types. While tactfully avoiding the debate around types, he still starts the talk with types to help those that are only there to decomplect their thinking.


That's exactly what he says. People might disagree that this is the right approach, but I'm not sure what other ways anyone could interpret that talk.


> Java's Optional seems fundamentally flawed in that Java allows any reference to hold a null value and Optional can still throw a NPE when calling isPresent on it so it still gives people a footgun.

The existence of Optional makes it possible to migrate away from using "null". E.g. Map#get could be replaced with a method that returns Optional.

Calling isPresent is the wrong way to use an Optional. But yes, it's an ordinary Java value, it can be null as long as Java-the-language permits null. That's not a problem with Optional, it's a problem with Java, and introducing Optional as a plain old normal Java class is the first step to fixing it.


Conflating NULL with NUL terminated strings seems a stretch... both have problems, but separate problems. I suppose they're both related in that they provide a "special" value rather than separating that information though.


NULL is a convenient way to map singularities in your model of the problem.

I have on a few occasions tried to write NULL-less code, and it adds a good bit of work.

- model all possible states for a value

- determine appropriate default actions for all types

- meaningful place-holder values

It's a good exercise, and I think more code should be written this way, but as an engineer I'm trying to model just enough of the problem to solve it. I'm not trying to simulate every possible outcome in that domain.

Certain corners of your problem simply don't need to be modeled, and what's more the effort needed to model them can just be too much.

NULL is a great way to just throw up your hands and go "I don't know and I don't care". Much as when modelling a physical system singularities typically represent phenomena that the model doesn't take account of, so it goes with NULL. It simply says "Don't Go There".


Have you ever dealt with the Maybe(Haskell)/Option(F#) types? If not, then you don't understand what's wrong with NULL and how to easily avoid it without much work.


I find Maybe a bad idea. It forces me to write denormalized code when I know that something is not NULL. It's not possible to specify this knowledge as a data structure since data structures are static but context is dynamic. I much prefer the simple NULL sentinel that blows up like an assertion when I made a mistake. That said, there's not very often a need for NULL at all if you structure the code correctly.


If you know something can't be null, then don't use an option. Simple as that. For example, a SQL library can return a non-nullable column of String as just a String, not an Option[String]. Thus, you actually get a solid distinction that you don't get with null pointers.

There's no reason to include sentinels that will randomly blow up your program.


No. The point is that the data structure can't know if there's a NULL since the data structure is static. Context is dynamic. Code is dynamic as well, and it can know that some things must exist based on other dynamic conditions.

So this "solid" distinction often is just noise and actually blurs the intention of the programmer: An explicit unwrap is required syntactically while it should not be required semantically because really the option data is not an option but a requirement in certain contexts.


If it is a requirement for something not to be null, unwrap the option before you send it to the part of the program that can't accept nulls, and deal with the case of None in a sane way and in a predetermined place. Then you don't have to worry about unwrapping in the rest of the code. You can escape from Option; it's not like IO. You just have to check for None if you want to get something out, as you should.

In this fashion, you have type safety everywhere, and you deal with the case of a missing value in a predictable way, in a single spot.
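
The same discipline works even in a language with plain nullable pointers -- a rough Go sketch (invented names): check for absence once at the boundary, then hand the rest of the program a value it can trust:

    package main

    import "fmt"

    type Config struct{ Addr string }

    // loadConfig may legitimately have nothing to return (a missing config file, say).
    func loadConfig() *Config {
        return nil
    }

    // run only ever sees a real Config value; the "is it there?" question
    // has already been answered at the boundary.
    func run(cfg Config) {
        fmt.Println("listening on", cfg.Addr)
    }

    func main() {
        cfg := loadConfig()
        if cfg == nil { // the single, predetermined place where absence is handled
            cfg = &Config{Addr: ":8080"}
        }
        run(*cfg)
    }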


> I much prefer the simple NULL sentinel that blows up like an assertion when I made a mistake.

Haskell, for instance, has the 'fromJust :: Maybe a -> a' function that allows you to do just that. It unpacks the Maybe-typed value and throws a runtime error if it fails.


Yes. Most Haskellers will sneer at it, while personally I think it's the right thing to do because it conveys the programmer's ideas about invariants. But syntactically an explicit unwrapping function is still a lot of noise. Simple null pointers as we have in C, with an unmapped segment at address zero so that it throws a segmentation fault, are much better.


Replying to your lower comment (the coffee has kicked in):

The situation you describe is one where a null really is an unrecoverable error, and the program should terminate. That is the one case where it makes sense to just let a NPE happen.

However, the vast majority of time, a null is just an absence of value, and does not signify an unrecoverable error. Those are the kind of situations that an Option/Maybe helps with, since it doesn't let you forget to handle the null case.

Even if a null value returned from a function is abnormal, and the program shouldn't continue, an Option is still going to be better most of the time. After all, you probably have connections and stuff you want to cleanly terminate before shutting the program down.


I haven't drank all my coffee yet this morning, but are you saying that throwing a segfault can be a good thing?

Either you unwrap the Option, or you have to remember to do a manual null check. The second option is more verbose.


> but are you saying that throwing a segfault can be a good thing?

Sure, what's bad about it? A logic bug was detected, so the program should be terminated. Or how do you intend to continue?

Segfault is not so different from what happens if you do "fromJust Nothing" in Haskell or get a NullPointerException in Java. You can even write a handler for the segfault, but I guess that's rarely a good idea.


> Sure, what's bad about it? A logic bug was detected, so the program should be terminated. Or how do you intend to continue?

I intend to not have the logic bug in the first place, by encoding my invariants in the type system.

If you "know" that the value is present rather than absent, you must have a reason for knowing it, so explain that reason to the compiler. E.g. maybe you took the first element of a list value that you know is non-empty - so maybe you need to change the type of that value to a non-empty list type. That way the compiler can check your reasoning for you, and will catch the cases where you thought you "knew" but were actually wrong.


> by encoding my invariants in the type system

The way I program, that is nothing but a pipe dream.

> If you "know" that the value is present rather than absent, you must have a reason for knowing it, so explain that reason to the compiler.

I might know that it exists for example because it is computed in a post-processing step after a first stage but before a second stage. So it exists in the second stage but not in the first. Relying on global data (which I won't give up) makes it practically impossible to encode that the data is not there in the first stage.

And that's not a problem at all. I simply don't access that data table in the first stage... Trying to explain my processing strategy to a compiler would amount to headaches and no benefits.


> I might know that it exists for example because it is computed in a post-processing step after a first stage but before a second stage. So it exists in the second stage but not in the first.

So the first stage could create a handle to it, or even just a phantom "witness" that you treat as proof that the value is present.
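
A minimal sketch of the "handle" option in Go-ish terms (invented names):

    package main

    import "fmt"

    // Stage1Result is the "handle": the data the first stage produced.
    // Because Stage2 demands one in its signature, the dependency is visible
    // and checked by the compiler instead of living only in my head.
    type Stage1Result struct{ table map[string]int }

    func Stage1() Stage1Result {
        return Stage1Result{table: map[string]int{"answer": 42}}
    }

    func Stage2(r Stage1Result) {
        fmt.Println(r.table["answer"])
    }

    func main() {
        r := Stage1()
        Stage2(r) // Stage2 can't meaningfully run without Stage1's output in hand
    }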

> And that's not a problem at all. I simply don't access that data table in the first stage... Trying to explain my processing strategy to a compiler would amount to headaches and no benefits.

Shrug. I found that errors would make it into production, because human vigilance is always fallible. And the level of testing that I needed to adopt to catch errors was a lot more effort than using a type system.


Accessing unallocated global data is the kind of error you typically hit on the first test run. Another example would be function pointers loaded from DLLs.

I don't think type systems help all that much. Type + instead of -, and you're out of luck.


> Accesses to unallocated global data is the type of errors that you typically hit on the first test run.

Depends what conditions cause it; the hard part is being sure that every possible code path through the first stage will initialise the data, even the rare ones like cases where some things time out but not others.

> I don't think type systems help all that much. Type + instead of -, and you're out of luck.

Not my experience at all - what do you mean? If you declare a type as covariant instead of contravariant or vice versa, you'll almost certainly get errors when you come to use it.


1) Pretty easy to guarantee if main looks like stage1(); stage2(); stage3(); etc.

2) Change a plus for a minus and it is still an int.


> 1) Pretty easy to guarantee if main looks like stage1(); stage2(); stage3(); etc.

You can only use the global program order once, I'd rather save it to spend on something more important. val result1 = stage1(); val result2 = stage2(result1); ... means my code dependencies reflect my data dependencies and I'll never get confused about what needs what or comes before or after what (or the compiler will catch me if I do), so I can refactor fearlessly.

> 2) Change a plus for a minus and it is still an int.

True. If you get your core business logic wrong then types won't help you with that (though FWIW I'd argue that it's worth having a distinct type for natural numbers, in which case - and + return different types). But I found that at least 80% of production bugs weren't errors in core business logic but rather "silly little things": nulls, uncaught exceptions, transposed fields... and types catch those more cheaply than any alternative I've seen.


> I much prefer the simple NULL sentinel that blows up like an assertion when I made a mistake

Are you nuts? I prefer the compiler gives me an error instead of blowing up in production.


I see these as a more sophisticated way of dealing with NULL. They allow me to define alternative default behaviour beyond just throwing an undeclared exception.

They are still NULLs however under the hood and I still need to do the work of defining what I want to happen when they occur. It's just neater.


They're not null. You might use them to represent the same thing that you use null to represent, but the type system won't let you use them in expressions that aren't explicitly built to accept them.


You can conceptualize the Maybe based on the NULL, but there's no point in the compilation or runtime in which they actually become NULLs, it's just a regular container value.


Which is the same as if I go and ensure a default "noop" value is assigned ... it's logically a NULL. I don't know anything about it except that it hasn't been assigned.


It's not a NULL. A NULL is a value that is considered by the type system to be a valid instance of a given type, except it doesn't actually fulfill the type's contract. A Maybe is a completely different type, much like a list or a map or a tree.


Well at this stage you're just being narrow minded.

I think if you read back over this thread it should be clear what I mean.

I proposed that NULL has a purpose. You proposed that Option obviates this. I stress that it's just a neater way to manage the conditions you don't need to model. You point out that in terms of implementation it's different, where I explain that still, logically it's the same thing.

Of course NULL has a very specific meaning in the structure of the language, and when you start using things like Options it makes managing NULL easier, but it's a rose by another name, gift-wrapped, and bundled with some plant food.

Logically however, at the point where you're modelling your problem it's the same.


> I have on a few occasions tried to write NULL-less code, and it adds a good bit of work.

Really depends on the language. On Java / C#, it's hard to avoid null-checking a lot of stuff. In C++ you will never encounter a null std::string outside of code written by indian students learning on Turbo C++ 3.5. So you don't need to check for anything: if it's a value, it exists.


Strings are an easy case. Default initialisation to "" (empty string). You'll still encounter NULL when modelling other domains however.


From my practical experience, Null has a use, but is over-used or misused. If you concatenate a Null string to other non-Null strings, the Null should usually be treated like a zero-length string instead of nullifying the ENTIRE expression result. I know this differs from how numbers typically are treated, but so be it. Strings are not numbers.

Without that behavior, one often has to write verbose statements such as: denull(stringA,"") || denull(stringB,"") || denull(stringC,"") || denull(stringD,"") etc. ("denull" name varies per vendor. "||" is concatenate here.)


The problem is in tooling. If all compilers/builders out there could detect null for us, those kinds of errors could be taken care of with much more ease.


TypeScript can be easily configured to do this[0], and Kotlin always does this[1]. The future is now!

[0]: https://www.typescriptlang.org/docs/handbook/release-notes/t... [1]: https://kotlinlang.org/docs/reference/null-safety.html


In VS with ReSharper I get "Possible null exception" as a warning, but a null exception is a runtime thing and not compile time, so I don't see how it is a tooling problem. The compiler does not know that your user did not fill in some field in a form.


A lot of static analysis tools will do this.

I can't speak for other IDEs, but Intellij will always warn me if I'm exposing myself to some NULL operations.

I do think better first class language support is the way to go though.


A higher level programming language is a kind of tooling. Use a language that doesn't have NULL and problem solved.


In C, NULL is just 0. A nullptr in C++ is just a pointer which points to 0. So it's not an undefined value... it's set to 0 on purpose so you can check it.

consider this: char *ptr; ptr = (char *)0xb8000;

before assigning ptr, ptr can be ANY value from 'random memory'. (compiler trickery aside.. because it might initialise it to 0 anyway...)

so you want to have: char *ptr = NULL; ptr = (char *)0xb8000;

So you can then do if (ptr != NULL) { do_stuff(); }. You could not check the validity of the ptr value, or whether it is present, otherwise. An if (ptr) or if (!ptr) would only work if it's initialised and reset to NULL each time before assignment, so you can validate the assignment.

This is not a mistake but a tool.

For a hardcoded offset like this it might be fair to say you could do if (ptr == (char *)0xb8000) { do_stuff(); }, but what if it was a ptr returned by a new allocation or so? Or obtained by taking the address of another variable or object? In that case, setting things to NULL and checking them is absolutely essential to assuring your code works like you intended it to.

This whole article seems like just a bunch of nonsense. For some languages it might hold true, but I can't believe it would for all of them. Perhaps for the original ALGOL null... who knows.


Proper typesafe systems wouldn't let you use C-style reinterpretation casts either.

It's quite instructive to see how the low-level Rust people handle this.


So if you have memory-mapped IO, how would you write to a specific address? In C/C++, a reinterpret cast is exactly what you need there. What would you use in Rust?


There are various alternatives, but generally the approach is to know what address you need in advance and create an object for it at compile time that can be accessed type-safely.

e.g. https://zinc.rs/apidocs/ioreg/index.html

or the longer but more detailed http://blog.japaric.io/brave-new-io/ , which covers various approaches. It even points out that you can use the type system to enforce that peripherals are only accessed from multiple threads or parts of the program in a safe manner, which you can't do if you can just reinterpret-cast into anything.


Option monad fixes null reference problems. Does not fix the ambiguity between not found vs doesn't exist.


If you have multiple possible reasons for a value to be "absent", that sounds like a case for an "either" or "result" type.
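
In a language without a built-in either/result type -- Go, say -- a rough stand-in is a pair of distinguishable sentinel errors (hypothetical names, just a sketch):

    package main

    import (
        "errors"
        "fmt"
    )

    // Two distinguishable reasons for "no value".
    var (
        ErrNotFound      = errors.New("record not found")
        ErrNotApplicable = errors.New("field does not apply to this record")
    )

    func middleName(person string) (string, error) {
        switch person {
        case "alice":
            return "Marie", nil
        case "bob":
            return "", ErrNotApplicable // bob has no middle name at all
        default:
            return "", ErrNotFound // we don't know this person
        }
    }

    func main() {
        _, err := middleName("carol")
        fmt.Println(errors.Is(err, ErrNotFound))      // true
        fmt.Println(errors.Is(err, ErrNotApplicable)) // false
    }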


Why can't people just get over it and stop blaming the language for their own sloppy code?


Because everybody is working on the limit of complexity they can comprehend, so there is no headspace left for dealing with bad ergonomics. We need all the help we can get.


> everybody is working on the limit of complexity they can comprehend

That's the problem right there.


Because languages are tools, and tools should help you do your job, not hinder you.


Why can't people just get over it and stop blaming users for the tools failing them?


Well, apparently it doesn't work that way.


I don't care. java.util.Optional is a PITA in a way null never is.


Whether null/NULL is a good idea or not (I like it; just yesterday it saved my ass), it saddens me that more and more of the articles I stumble upon are written to criticize technologies rather than to talk about solutions or innovations.


The article talks about a solution, suggesting that optionals are a much better way to handle this.




Also...

https://news.ycombinator.com/item?id=10148972 (150 points | Aug 31, 2015 | 143 comments)


just about nothing is as popular as null is.


The problem is not the NULL but the lack of types.


No. Undefined behaviour is.


Thanks for the article! I've often heard that null is bad, but haven't ever seen such a thorough, readable explanation.

Just so I can think it fully through for myself, it seems that the problems with null are:

1. Its semantics are different from whatever type it is substituted for, so it can't be used as a value

2. Superficially, it looks identical to a missing record value. This difference might be something you want to ignore (isNullOrEmpty), or something you care about (cache miss or hit with null)

3. It is used both for missing data, and missing functionality, which confuses two separate systems.

I agree that null as a type generally works better than null as a value, but I don't know if you can always articulate it as a type, especially in dynamic languages. A pragmatic solution seems to be a combination of:

- A Maybe type or monad. This forces you to unpack the nullable semantics of the thing, either in the type system or by unwrapping the value. A Maybe monad is a well designed interface for dealing with the edge cases, but it doesn't make the edge cases go away. This eliminates problem #1, and manages problem #2.

- Nil punning. (concat nil nil) yields an empty list in clojure. Same for +, string/join, etc. This is really similar to Monads/Types, but switches the responsibility for handling null intelligently from the data structure to the standard library. Putting null in the type forces you to opt in to null; nil punning forces you to opt out. This makes for more terse code, which is nice, but probably has a slightly narrower scope of application than monads, since it tackles problem #1 by making it make sense in most cases rather than eliminating it entirely, and nil punning doesn't always make sense. Incidentally, this seems closest to PHP's and javascript's strategy; their real problem is that they extend nil punning to cases where nil isn't involved (1 + '1' anyone?).

- Key or attribute errors. This is sort of a fallback to compensate for failing to handle the null case, but often works well when something just "shouldn't be null". This is probably just a substitute for a lack of compiler checks, but works well enough in the python world; sometimes failing hard is the right thing.

- Distinction between code and data. I like higher-order functions, so I'll just say that "sometimes data includes functions". But in most cases, the function you're calling should be resolved at compile time. Interfaces should be fully implemented, and (as in python), there should be a distinction between missing functionality (AttributeError) and missing data (KeyError).

Ultimately, it seems to be a question of language/api/user interface design: there is a difference between present, present and empty, and absent. Regardless of what strategy you use to manage the difference, there has to be one.


If you have self-discipline and aren't lazy, NULL is totally fine. Maybe language designers could have made it behave more like a 0-dimensional sentinel for beginners.


I wonder whether the author also hates the 0 and 1 elements of the natural numbers, since they have the same flaw of having weird, special semantics that all the other numbers don't share. In fact 0 is not even a number, but a placeholder for the concept of the absence of a number. Just like NULL.


That's exactly the point the author isn't making - which is that there should be an explicit distinction between Exists But Holds No Value and Does Not Exist.

0 is neither. It's clearly a number which can be used in arithmetic and defines a specific integer quantity. (Unless you think 0 is identical to NaN...)

The problem is that in some languages it's used for either or both of the above logical definitions, when it shouldn't be used for either.


Zero's behavior is totally consistent with the other numbers, though - it doesn't break associativity, commutativity, or any of the other stuff you'd expect. On the other hand, NULL takes every type I've ever written and adds an instance whose behavior with every function is, at best, to crash my program, and at worst, completely undefined. Its behavior is not at all consistent with the other instances.


0 breaks division, though...


No, 0 is definitely a number.


I can assure you that the number 0 in 0℃ is a number and not the absence of temperature.

But other than that, excellent point about the wonkiness of special values.


> the number 0 in 0℃ is a number

If we're going to get pedantic ...

0 in this case is an offset. The number is what's behind it: The point at which water freezes (aka 273 °K).

In this case 0 indicates the absence of any offset.


Psst, in 1968, we renamed "degrees Kelvin" to "kelvins", and "°K" to "K", to make it clear that it was an absolute and not relative scale.


I wonder why this gets downvoted - seems pretty reasonable to me...


I imagine because the post is not pedantic but is just wrong. If you want to be pedantic and correct, the symbol 0 represents a real number, an integer, and other numbers. The symbol 0 is not a cardinal (counting) number. Some people consider the natural numbers to be either the cardinal numbers or the whole numbers which include 0. So some people consider 0 to not be a natural number.

As a mathematical entity it doesn't represent the absence of anything. It is just a symbol that has certain properties associated with it. There isn't a hole in the real number line where 0 should be: there is a number there.

It isn't pedantic to insist that 0 isn't a number, it is equivocal to do so. In most contexts it does not need to be treated like a special value. Temperature measured in degrees is an example where you don't need to treat 0 specially, at least not more than other values...


It's a rather odd argument and it falls apart as soon as you ask about 0 Kelvin.


> it falls apart as soon as you ask about 0 Kelvin

Huh? The whole point of Kelvins is that their zero point is an actual 0 value. That's why Kelvins are a unit and °C aren't.

Q: An object's temperature is 20°C. The object's temperature increases by 10%. What is the new temperature of the object?

A: 49°C. (20°C is 293.15 K; a 10% increase gives 322.465 K, which converts back to about 49°C.)


> That's why Kelvins are a unit and °C aren't

Could I ask that you update Wikipedia with your discovery? Sadly the page appears to be erroneously using the word "unit" all over the place! https://en.wikipedia.org/wiki/Celsius

Also, NIST might benefit from your guidance: https://physics.nist.gov/cuu/Units/kelvin.html


It's already there, if you know enough to recognize it:

> The degree Celsius (°C) can refer to a specific temperature on the Celsius scale or a unit to indicate a difference between two temperatures

In other words, you can add a quantity of °C to a temperature value and you'll get another temperature value. But you can't measure a temperature in °C.

Compare how, for example, the python datetime library uses a datetime type and a timedelta type. A datetime plus a timedelta is a datetime. datetimes refer to points in time, and timedeltas don't.

°C measures a temperature delta, but not a temperature.
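
Go's standard library draws the same line with time.Time versus time.Duration, for what it's worth:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        t := time.Date(2018, 12, 4, 0, 0, 0, 0, time.UTC) // a point in time
        d := 48 * time.Hour                               // a delta

        fmt.Println(t.Add(d)) // point + delta = another point
        // t + t does not compile: adding two points in time is meaningless,
        // just as adding two temperatures (as opposed to temperature deltas) is.
    }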


> It's already there, if you know enough to recognize it

"As an SI derived unit", "or a unit to indicate", "the unit was called"

Three mentions of it being a unit in the first paragraph, alone. I understand the point being made; the conclusion that "celsius is not a unit" is bogus, however, by any common definition, including NIST's.

In a now-deleted comment, you linked to a Wikipedia page on Dimensional Analysis, which includes the sentence "to convert from units of Fahrenheit to units of Celsius".


Except when you take into account that 0 Kelvin actually is theoretically "nothing" with regards to temperature. It's not a floating point, it's an absolute.


They have all the same semantics as regular numbers? They all derive naturally from the Peano axioms.

You may be thinking of NaN.


Yet `Maybe<T> | Option<T> | ...` is not an option (pun intended), as Rich Hickey explains here: https://www.youtube.com/watch?v=YR5WdGrpoug.

In effect, his argument is: 1) You have `public X Do(Y y)` changed into `public X Do(Option<Y> y)`. This will break your API. 2) You have `public X Do(Y y)` changed into `Option<X> Do(Y y)`. This will break your API.

Thus, do not use Option<T> or equivalent in your APIs. Only use a language-supported construct such as C# 8's upcoming `string?` and `string`.


This is a spot where I've got to respectfully disagree with Mr. Hickey.

Changing a public API call that used to guarantee that it returned a value so that it might now return nothing is a breaking change, and, as an API consumer, I want my APIs to broadcast that change loudly. Compiler errors are a good (but not the only) way to do that.

Changing a public API member so that its arguments are now `Maybe[T]` is just silly. There's no need to introduce a breaking change there. Just overload it so that you now have versions that do and do not take the argument and get on with life.

If there's an argument to be made here, it's that statically and dynamically typed languages require different ways of doing things. In a statically typed language, I expect the compiler to keep an eye on a lot of these things, and I'm used to leaning on the compiler to catch things like a function's return value changing. In a dynamic language, I'm not.

I'm also, when working in a dynamic language, used to having to deal with the possibility that, at all times, any variable could contain data of literally any type. Removing nullability there changes the set of possible "this reference does not refer to what I expected" situations from (excuse the hand waving) a set with infinite cardinality to a set whose cardinality is infinity minus 1. If you think of NULL as effectively being a special type with a single value (call it "void"), then eliminating it reduces the number of classes of errors I have to worry about in a dynamic language by 0. I'm hard pressed to see any value there.


This is backwards. Rich did not advocate for changes that break promises.

The point in the talk is that "strengthening a promise" should not be a breaking change. Changing return type from "T or NULL" to always returning T. The case where you previously couldn't guarantee a result, but now you can.

The other case "relaxing a requirement" also should not be a problem. The case where you previously had to give me a value, but now I don't need it and can do my calculation without it.


TBH, I'm happy with that being a breaking change, too. Just keep returning a T? that happens to always have a value until the next major version # increment (or whatever), and then make the breaking change, and then I get a clear signal that I can delete some lines of code.

The alternative seems like a path that, in any decently complex software project, ultimately leads to an accumulation of useless cruft that'll probably continue to grow over time as people keep copy/pasting code that contained the now-useless null-handling logic.


"The key point here is our programmers are Googlers, they’re not researchers. They’re typically, fairly young, fresh out of school, probably learned Java, maybe learned C or C++, probably learned Python. They’re not capable of understanding a brilliant language but we want to use them to build good software. So, the language that we give them has to be easy for them to understand and easy to adopt. – Rob Pike"

When did computer science become about hand-holding? Has it always been this way? Look at React: it was designed to force functional programming concepts in an OOP manner. Is the future of programming the implementation of tightly controlled interfaces with extreme type safety? I would argue that's where we are going. Things are becoming less expressive, not more.


>When did computer science become about hand holding?

When people with pragmatic goals want to get large teams of new programmers productive fast, and can't expect everyone to be able to fend for themselves or afford the cost of accumulated mistakes.

>Look at react. It was designed to force functional programming concepts in an OOP manner.

Whatever that means, as React has little to do with "OOP manner".


OOP meaning React.Component, functional meaning immutable html state, property inheritance, render(), etc


Component hierarchies != OOP. They are an inevitable part of UI, which is hierarchical.

React has moved to stateless components and functions over classes.


It's the Java approach. Don't give people footguns and force them to write software in a readable, testable, maintainable style. It works extremely well in software engineering, because you want systems that work reliably and that can be maintained/extended by any other engineer at your org. In the professional software engineering world, "clever" programmers are almost always a horrible drag on their team.


It's not the Java approach at all. The first languages to remove ubiquitous nulls (e.g. MLs) were looking to increase expressivity and modeling abilities, not to force bondage and discipline upon developers.


Prevention of user error is a good design principle: https://justuxdesign.com/blog/poka-yoke-design-the-art-of-er...



