Pointer Free Programming and the Future of Nim (nim-lang.org)
205 points by mindB on Oct 19, 2017 | 151 comments



Hello guys. I'm one of the core Nim developers, together with Araq (the creator of Nim) who should also be hanging around this thread. Feel free to ask us anything.

This submission is likely the result of the recent livestreams that Araq and I have been making. You can check out my past livestreams[1] and Araq's[2] at the links below. If you'd like to watch us live next time, follow d0m96 (https://go.twitch.tv/d0m96) and araq4k (https://go.twitch.tv/araq4k) on Twitch.

1 - https://www.youtube.com/watch?v=UQ4RvUlXIDI&index=3&list=PLm...

2 - https://www.youtube.com/watch?v=E2qlDKm_WzE, https://www.youtube.com/watch?v=UV38gQfcb9c


This one just to thank you for everything, Dom! BTW, your book is on its way to my desk. I briefly skimmed the pdf, but can't wait to get the dead tree version!:)

Don't forget about embedded systems: they could be the killer platform which makes Nim mainstream overnight.


Thanks for grabbing a copy and killing some trees in the best way possible :)

I won't forget about embedded systems, but I'm not sure this platform has that much power to change things overnight


You guys are doing amazing work! Found Nim through a mention on HN recently and having a ton of fun learning and playing with it!


Same here. Nim has this elegance that is hard to qualify and impossible to quantify, but it makes it a real pleasure to write code in it. Can't wait to get deeper into it.


I like it, too. Just wish mimx was in better shape. I need rich text with images and links.


I’m not a huge fan of the modern C++ style that obsessively avoids null pointers and instead uses object references that are created in an invalid/empty state. With a pointer there is a single way to represent “nothing”, regardless of what the object represents. With a reference I have to go read the API to understand the “nothingness” states.

Also, when you’re thinking in pointers, it’s easy to add levels of indirection with the same mental tools (pointers to pointers, etc.) Personally I find it’s easier to solve problems with a limited orthogonal toolset than a sprawling array of marginally differently balanced optimizations, which is what most of C++ has become.


While null may be the simplest way to represent nothing across a large number of data types, I think the biggest issue (beyond how far null can propagate from the source in error scenarios) is that it's overloaded with too many possible meanings. Does it mean nothing was found? Was an error thrown? Is the value itself null? At least with a well defined API, you can determine these scenarios at a glance. I think that handling types like None, Err("reason"), or Some(None) (using Rust types) is far clearer than looking at a null pointer and an int error code (which one may ignore).
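
A minimal Rust sketch of that distinction (the lookup function and its error type are hypothetical):

    use std::collections::HashMap;

    // Each meaning that a bare null would have to carry gets its own shape:
    //   Err("reason")  -> the lookup itself failed
    //   Ok(None)       -> nothing was found for this key
    //   Ok(Some(None)) -> the key exists, but its stored value is empty
    fn lookup(db: &HashMap<String, Option<i32>>, key: &str)
              -> Result<Option<Option<i32>>, String> {
        if key.is_empty() {
            return Err("empty key".to_string());
        }
        Ok(db.get(key).cloned())
    }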


I do like the Rust way because it’s standard (so far anyway). The situation in C++ is more like “a thousand blooms of nullishness”, where each library and API is frustratingly different in terms of something so fundamental. Objective-C with its messageable nil is so much better in this respect.


Messageable nil has a special place in my heart, but as a default mode of operation, it's certainly a fun way to hide bugs.


Messages to nil ... sure, OK, fine. Allow that, but CRASH because someone sent an unhandled message? That’s just mean.

Either crash when messaging nil, or don’t crash when improperly messaging an object.


If you're talking about Objective-C: it doesn't crash. It sends the message -forwardInvocation: to the receiver. (Nowadays it might try a couple of other things first).

Now the default implementation of -forwardInvocation: raises an exception, which in turn is by default unhandled and therefore goes to the default exception handler which logs and terminates.

However, you can implement your own behavior at every level:

1. Override -forwardInvocation: for your own objects (or one of the other mechanisms)

2. Override -forwardInvocation: in NSObject

3. Handle the exception

4. Install your own default exception handler


It doesn't matter how a language “handles nil or ‘not understood’ [sic] messages” - burning the computer would be okay. As with any other undesired behavior, it's your responsibility to prove it doesn't happen.


I think that everyone here knows that programming without bugs is the ideal, but having the language work against you doesn't really help in that regard, does it?


I wouldn't count “burning the computer under impossible conditions” as “working against”. It's just the principle of explosion at work, and pretty much any optimizing compiler that takes advantage of undefined behavior uses it.


No. In this case there is no undefined behaviour; the way this is handled is defined, but to the parent it seems inconsistent.

Undesired behaviour cannot really be leveraged for optimisation, and in any case, high(er) level languages should not have UB in their design.


If your program's correctness depends on `nil` being handled this or that way, your design is broken. Fortunately, most programs aren't designed this way: messages to `nil` are seldom actually intended, whether the language's semantics considers them a legitimate operation or not, and whether they are “helpfully” trapped at runtime or not.


This is where `Option` or `Optional` types shine. They force you to unwrap the nilable value at compile time, and are zero-cost abstractions. It's not difficult to use option types, either--especially if the language features syntactic sugar to promote their ergonomics.

Rust, Swift, Guava, etc. all get this right. Option types need to become a language feature for all statically-typed languages. C++ should adopt it too.
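
For instance, a minimal Rust sketch of the "forced to handle None" point (function and names hypothetical):

    // `maybe_port` cannot be used as a number directly; the compiler forces
    // the caller to handle the None case before reaching the Some payload.
    fn describe(maybe_port: Option<u16>) -> String {
        match maybe_port {
            Some(p) => format!("listening on port {}", p),
            None => "not listening".to_string(),
        }
    }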


> This is where `Option` or `Optional` types shine. They force you to unwrap the nilable value at compile time, and are zero-cost abstractions.

They aren't zero-cost abstractions in most languages. Every unwrap() potentially results in a dynamic check, even if one isn't necessary. There are a number of ways you could extend a type system to track the potential cases of sum types, which would help alleviate this cost.
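
To make the cost concrete: `unwrap` is essentially sugar for a runtime branch, roughly this sketch:

    // What Option::unwrap boils down to: a dynamic check happens even when
    // the caller "knows" the value must be Some at this point.
    fn my_unwrap<T>(x: Option<T>) -> T {
        match x {
            Some(v) => v,
            None => panic!("called unwrap on a None value"),
        }
    }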


I think in practice it isn't a higher cost: a lot of code in languages with nullable pointers also has a pile of unnecessary dynamic checks (e.g. checking arguments aren't null).


Exceptions (implemented zero-cost style) are truly zero cost. No dynamic checks as long as no exception is thrown.


Exceptions aren't zero-cost either: they require you to be able to unwind the stack, so you have to keep the frame pointer and have one fewer general-purpose register to work with.


You actually don't need the frame pointer to unwind the stack. DWARF frame info allows the compiler to specify tables that describe the stack layout at every program point, and this can be used to perform unwinding.

Another reason why exceptions aren't zero-cost in practice is that compilers need to model the exceptional control-flow, and they usually don't do a great job in this situation and miss a lot of potential optimizations.


The cost being compared here is the 'if x.is_none()' or 'if x == null' check, not what happens after it.


Option types for references need be no more expensive than a null pointer check, and I believe this is how they are implemented in Rust. If a dynamic check is not necessary, then you simply do not use an Option type.
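
That layout guarantee is easy to check; Rust documents this "null pointer optimization" for references and Box:

    use std::mem::size_of;

    fn main() {
        // The None case is represented by the (otherwise impossible) null
        // bit pattern, so Option<&T> is exactly one pointer wide.
        assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());
        assert_eq!(size_of::<Option<Box<u8>>>(), size_of::<Box<u8>>());
    }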


In C/C++, you quite often end up in situations when some other condition implies that a pointer is nonnull. In Rust, the equivalent code with Option would require an unnecessary dynamic check.


That's just a problem with ordering. If a condition implies a non null pointer then a non null pointer can imply the condition as well. This is true unless the condition being false does not guarantee a null pointer, in which case the condition is not useful. If some condition implies multiple non null pointers then the option type can be defined on a tuple of references
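
In Rust terms, that last idea might look like this sketch (names hypothetical):

    // If one condition guarantees several pointers at once, bundle them in a
    // single Option over a tuple of references: all or nothing.
    fn process(resources: Option<(&str, &[u8])>) {
        if let Some((name, data)) = resources {
            println!("{}: {} bytes", name, data.len());
        }
    }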


> in which case the condition is not useful

I contest. The reason for the condition need not be only to indicate whether the other value isn't null; if it was, you could just remove the condition and check nullity. In fact, unless there is either a case where the condition is true and the pointer is null or a case where the condition is false and the pointer is non-null, the condition is useless because checking the pointer would've been enough.

Consider the following silly example, where we have a function

    f(Conductor *c, bool isDriving, Train *t)
with the guarantees that c != NULL, and isDriving==true => t!=NULL. Note that the case isDriving==false does not imply t==NULL; indeed, c could just be waiting for the signal to start driving.


One can express the states more clearly with a slightly upgraded Option/Optional/Maybe, i.e. a full sum-type/enum. For instance, that function could be, in Rust syntax (but the same thing works in Haskell, OCaml, Swift, ...):

  enum ConductorTask {
    Nothing,
    Waiting(Train),
    Driving(Train)
  }
  fn f(c: Conductor, task: ConductorTask)
I think this says everything that your text says, but in a way the compiler understands and can assist the programmer with.

Alternatively, since isDriving only makes sense when there's a train, t could be Option<(bool, Train)>, i.e. an optional train along with a boolean for if it's being driven or not.
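
As a signature sketch, assuming the same Conductor and Train types as above:

    // The bool only means something when there is a train, so tie them
    // together; the three states are then:
    //   None             -> no train
    //   Some((false, t)) -> waiting with train t
    //   Some((true, t))  -> driving train t
    fn f2(c: Conductor, t: Option<(bool, Train)>) { /* ... */ }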


An Option type in Rust forces a particular memory layout by requiring the components to be contiguous, which might not be what you want if you are doing low-level optimizations of data layout.


That's true, and does mean one has to drop down to C/C++-style pointers and manual layout when the default enums don't work. However, that's somewhat orthogonal to the semantics about what they can express, especially for function arguments where the value isn't being stored in memory. One common approach is for an enum to be teased apart to be stored, and then rematerialised at the API surface (e.g. HashMap and it semantically storing an Option<(K, V)>).


So put a ref or a box in your optional.


I don't think adding indirection is what the parent was thinking of. It's more like storing the Option<T> Some vs. None bit in its own separate bitvector along with a packed vector of Ts. This layout, due to padding, may use significantly less memory than the alternating bool, T that a vector of Option<T>s gives, but it still retains properties like contiguous layout and minimal allocation/indirection.
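
A rough sketch of that layout (names hypothetical; a real implementation would pack the flags into actual bits rather than a Vec<bool>):

    // The Some/None flags live in their own vector, while the Ts stay
    // contiguous with no per-slot tag or padding. Vacant slots just hold a
    // default value.
    struct PackedOptions<T> {
        present: Vec<bool>,
        values: Vec<T>, // same length as `present`
    }

    impl<T: Default> PackedOptions<T> {
        fn get(&self, i: usize) -> Option<&T> {
            if self.present[i] { Some(&self.values[i]) } else { None }
        }
        fn set(&mut self, i: usize, v: Option<T>) {
            self.present[i] = v.is_some();
            self.values[i] = v.unwrap_or_default();
        }
    }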


isDriving remains connected to t. Specifying an Option reference for t allows you to convey two pieces of information: if None, that isDriving is false and you do not care about t, and if Some, you have train t


Put the condition in the type. If you truly know that the pointer is nonnull, you can explain how you know to the compiler.


If, after inlining and so on, the compiler can prove that the condition implies the non-null/Some case, then the check can be optimised away.

If it can't, then the version without the check is potentially unsafe, and the check is not unnecessary.


While I love option types, retrofitting them into languages which already have nulls has problems.

Ask me about using a Java library that could return an option type, which could be null, so you have to check for both the None return value AS WELL AS null.


> Ask me about using a Java library that could return an option type, which could be null, so you have to check for both the None return value AS WELL AS null.

No, you don't -- because those are two different kinds of null!

If your library's function returns Optional.empty(), then it's because it successfully returned no value. If it returns null, though, then the library has a bug and you should crash instead of trying to continue in a known invalid state.

Without the Optional type signature, the function would return null for both of these, and your program will appear to work even though it's just suffered a bug.


> retrofitting them into languages which already have nulls has problems

I'm not sure I agree with that. See how MS did that in C# 7:

https://www.kenneth-truyers.net/2016/01/25/new-features-in-c...


FYI that didn't ship in C# 7

More info: https://github.com/dotnet/csharplang/issues/36


The situation is known and will be fixed when the JVM gets value types, with minimal value types already on the horizon.

https://wiki.openjdk.java.net/display/valhalla/Minimal+Value...

Until then it is an issue that we have to live with.


When or if? I heard one talk about Valhalla some time ago. If this project succeeds it will either be a miracle of engineering or a nightmare to use.

Just one example: Java's type erasure trick does not work anymore when value types are type parameters, so it affects the build process. The current approach seems to be to construct specific instances at run time (or rather at class-loader time). That is very late though, because some information (like return types) is lost after compilation.


When.

The ongoing changes related to value types, AOT compilation, and overall mechanical sympathy are driven by pressure from the Fintech industry, which has been moving into Java during the last decade and is currently eyeing other stacks that could give them the benefits of Java alongside those features, e.g. Pony.

So of course Oracle wants Java to stay relevant in those domains.

You are required to annotate type parameters for the old reference behavior; currently that would be with the any modifier.

Something like

    class Data<any T> {
    }


Retrofitting anything into anything always results in warts. Moral of the story: design things right, right from the beginning.


... which is practically impossible with larger projects that you can't oversee at a glance.


So don't bite more than you can chew?


That's a problem you already have in Java with any other type though. If you have a List or a String or whatever it could always be null.


> C++ should adopt it too.

They apparently got the memo: http://en.cppreference.com/w/cpp/utility/optional


For people using boost, this feature has been available for some time: http://www.boost.org/doc/libs/1_60_0/libs/optional/doc/html/...


std::optional is available in C++17. It has also been available in boost for a veeery long time.

Lack of good sugar for pattern matching sometimes makes its use a bit awkward though.


A new proposal is bubbling up.

https://www.youtube.com/watch?v=HaZ1UQXnuC8


Creating objects in “empty” states is lazy programming. The point of references is to guarantee an object exists. But don’t pretend an object exists just to satisfy the reference. Rethink your code to take advantage of the guarantee.


You've illustrated why I'm against the kind of simplistic programming maxims you'll find in "Effective X" books. It's easy to brand certain syntactic constructs as "bad", but unless you teach people good taste (which is fiendishly difficult to learn), all you do by telling people to avoid certain easily-recognizable syntactic patterns is to get them to come up with new horrors.


> You've illustrated why I'm against the kind of simplistic programming maxims you'll find in "Effective X" books.

Another thing you won't find in this book is the C++ Gospel, the One True Path to perfect C++ software. Each of the Items in this book provides guidance on how to develop better designs, how to avoid common problems, or how to achieve greater efficiency, but none of the Items is universally applicable. ...

If you follow all the guidelines all the time, you are unlikely to fall into the most common traps surrounding C++, but guidelines, by their nature, have exceptions. That's why each Item has an explanation. The explanations are the most important part of the book.

Scott Meyers - Effective C++

The problem is not "Effective X" books. The problem is people treating the rules in these books as dogma.


Not following a dogma pretty consistently in a single code base (or part of a very large codebase) very quickly leads to inconsistent code and bikeshed-style decision making.


That, or it could lead to well architected code that changes according to its requirements. YMMV.

The one thing that never works is blindly following dogma.


> The problem is not "Effective X" books. The problems is people treating these rules in these books as dogma.

You put a bunch of stuff in a book and people are going to copy the stuff from the book into their programs. Bits of front-matter don't matter. Consider the GoF book and the havoc that mindless repetition of design patterns creates.


Still the problem is the people working mindlessly, not the books.

The alternative would be to not ever improve our knowledge.


As GP said, rethink the code, from as far out as you need to. Granted, much code has been written under the explicit constraint of not doing this. In that case, do the nasty thing and then if it matters that much to you, find somewhere else to work that cares a similar amount about producing maintainable software.


Just a reminder that there are many people who prefer not to think in objects.

Rant: Personally, since working on my game engine in C, my architecture has become so much simpler, and I can easily avoid unnecessary allocations. My code is much more coherent and aspect-oriented, which is not at all in the OO spirit, since OO encourages (trying) to bundle things into self-contained packages.

Yes, that seems hard to believe, since there is so much bad C code out there. With allocations, strings, file system operations etc. all over the place, just as in typical OO code. (Of course some things are really hard to do in C vs C++. The point is forcing myself to have a good architecture and avoid having to make many meaningless decisions).


The game engine people have the right idea with their entity-component-system [1] model. Essentially, it models the program as a big relational database instead of a graph of objects. It can work pretty well.

[1] https://en.wikipedia.org/wiki/Entity%E2%80%93component%E2%80...
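
A toy sketch of that idea in Rust (hypothetical component tables, not any particular engine's API):

    // Components live in parallel "tables" indexed by entity id, like rows
    // in a relational database, rather than behind pointers in an object
    // graph.
    struct World {
        positions: Vec<Option<(f32, f32)>>,
        healths: Vec<Option<u32>>,
    }

    // A "system" is a plain loop over one table.
    fn damage_all(world: &mut World, amount: u32) {
        for hp in world.healths.iter_mut().flatten() {
            *hp = hp.saturating_sub(amount);
        }
    }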


Could you provide more details / examples of this C code style?


I wrote a huge ranty reply, but deleted it because it leads OT. Basically, read up on the link to ECS in the sibling comment. Also, definitely check out "Data-oriented design". There's a talk by Mike Acton on Youtube. "Where there's one, there's many" is a pretty basic but super important insight.

I consider this to be important to code maintainability at least as much as to performance.


I don't get the buzz about null pointers. They are not a problem. They become a problem when you start checking for null where null is not acceptable (in most places null can't be a meaningful input), which is where the original intent starts to become unclear.

Let it go. Let it crash. Just assume inputs are non-null (except where null makes sense). Even C crashes safely on null pointer dereference.


The problem is that the type system doesn't encode whether a pointer can be null or not, so unless your documentation is amazing (I doubt it) you'll eventually end up getting passed a pointer from someone else, or giving a pointer to someone else who has a different assumption about whether null is allowed. Boom, segfault.

There are several solutions in C++.

1. Always check for null. Kind of annoying and lots of people don't for whatever reason.

2. Use references. As you say, annoying because then you can't have null (sensibly) even when it would be really useful.

3. Use std::optional<int&> or something like that. I only just thought of this and don't know if it would work, but I bet it's a pain.

So no great solution. Personally I would use a smart pointer type, document it as well as I can, and always check for null.


> so unless your documentation is amazing

As I pointed out, I usually don't have null pointers at all. There might be a few places where a value can very obviously be null, but they are few and far between. No need for documentation.

> Boom segfault.

Which is exactly the right thing to happen (it's the C version of Exceptions in dynamic languages) since the code was incorrect.

> Always check for null. Kind of annoying and lots of people don't for whatever reason.

And now what do you do if you detected null but it was not allowed? Throw an exception? You can have that for free by just not checking.

> Use references. As you say, annoying because then you can't have null (sensibly) even when it would be really useful.

C++ references don't really protect you from NULL. It's just a different syntax for the same thing.

> Use std::optional<int&> or something like that. I only just thought of this and don't know if it would work, but I bet it's a pain.

Yes, pain, big pain. So much line noise and typing (in both senses) around what is essentially an int.

Just don't check. I don't get why there are so many places for null pointers. (As I said elsewhere, I don't have a solution for JSON-style microscopic programming, and I don't care about that.)


How do C++ references not protect you from NULL?

From the standard:

> A reference shall be initialized to refer to a valid object or function. [Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field. ]


Better to try it than to read standards (and inevitably misread them, or read them differently than compiler authors do)...

    #include <stdio.h>
    void test(int& x)
    {
        printf("Hello, world\n"); fflush(stdout);
        printf("and the number is: %d\n", x);
    }
    int main(void)
    {
        int *x = NULL;
        test(*x);
        return 0;
    }
It's just a syntactic discipline. Null references are undefined in C++, just as NULL dereferences are undefined in C.


> Always check for null. Kind of annoying and lots of people don't for whatever reason.

use assert or some custom always-assert-even-on-release macro.

> Use references. As you say, annoying because then you can't have null (sensibly) even when it would be really useful.

well, only use references for non nullable pointers. Otherwise use plain (or smart) pointers.

> Use std::optional<int&> or something like that. I only just thought of this and don't know if it would work, but I bet it's a pain.

std::optional<T&> is unfortunately not part of C++17, mostly because people couldn't agree on operator= semantics:

    int x = 1;
    int y = 2;
    std::optional<int&> oint = x;

    oint = y;
    // which one is true?
    assert(x == y);       // deep assign
    assert(&*oint == &y); // rebind

boost::optional supports references and IIRC rebinds on plain assignment.


For parameters where you aren't transferring ownership, just use raw pointers/references. If it's a required parameter make it a reference. If it's optional, use a pointer and check for null.


The problem in C derivative languages is that it isn't enough.

There is no guarantee that the pointers are valid, even if not null.

So besides checking for null, to be correct we need to call OS APIs to check for pointer integrity as well.


That's on the client code. You can't stop callers from doing something stupid (or cosmic rays from flipping bits). At that point crashing fast is preferable.


I disagree; the premise of design-by-contract and clean-code methodologies is that functions must ensure the integrity of the data they handle.


That's theoretic talk. Try doing it in practice and still get things done / be able to maintain the code / see the forest for the trees.

But I have a feeling that we are lacking a bit of context here. Some people seem to focus on web-application style of programming (understandably) where you have lots of trust issues. Whenever data is carried across trust boundaries it needs to be checked (this applies to integrity in general, of which null safety is just a small part).

(On the other hand, deserialization is not about validation of function arguments. Deserializers should assert integrity on the spot before calling into deeper nested functions).


I did it in practice, during MFC's glory days, several years ago.

Making use of ASSERT_VALID(), VERIFY(), AfxCheckMemory(), AfxIsValidAddress(), AfxIsMemoryBlock() and many other helper functions.

A style enforced at the company's code reviews, which helped our code quality a lot.


> They become a problem when you start checking for null where null is not acceptable (in most places null can't be a meaningful input), which is where the original intent starts to become unclear.

The trouble is it's not always clear whether null is a meaningful value or not. The type of an expression should tell you what values it might take - that's the whole point of a type. Having the same type for pointers that could be null and pointers that will never be null is like having the same type for integers and floating-point numbers.

> Even C crashes safely on null pointer dereference.

False. A null pointer dereference in C is undefined behaviour and can therefore easily be a security flaw (there's been at least one local root vulnerability in Linux due to this).


In database speak, if your function takes nullable values it is "denormalized". If you design your data structures and flows right, you really don't have many of these.

> False. A null pointer dereference in C is undefined behaviour and can therefore easily be a security flaw (there's been at least one local root vulnerability in Linux due to this).

Yes, in theory... and yes, I heard there are also some stupid compiler optimizations around this. Still, the point is: what do you get from checking for a condition that shouldn't happen? At most an assert is justified!

You don't check for the infinitely many other conditions that shouldn't happen, either.


> If you design your data structures and flows right, you really don't have many of these.

So it's really important to be able to tell which ones they are!

> what do you get from checking for a condition that shouldn't happen? At most an assert is justified!

An assert is a check, and it still takes up a lot of reading time if you have to do it for every parameter of every function.

> You don't check for the infinitely many other conditions that shouldn't happen, either.

I check for all the things that the code allows to happen, because sooner or later they will happen. You have to do this if your code ever accepts user input; any input that is possible will, sooner or later, find its way in. So if something shouldn't happen the best course is to make it impossible in the code.


> I check for all the things that the code allows to happen, because sooner or later they will happen.

So if you know it will happen, why don't you fix the bug instead of "recovering"?

> So if something shouldn't happen the best course is to make it impossible in the code.

Exactly what I was suggesting. But adding a check is not "making it impossible". You need to go to the call site and fix the invalid call.


Code changes all the time; you can fix the calls you know about, right now, but you're relying on everyone who calls your function to infer some extra context about it and get it right every time. The defect will keep reoccurring. Whereas if you make it part of the function's type that it can't be called that way, then the compiler enforces that it will be used correctly, automatically, and you never have to fix the same problem again.
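
Concretely, a Rust-flavored sketch of moving the requirement into the type, so the compiler enforces it at every call site, present and future:

    // A nullable version, e.g. fn total(prices: Option<&[f64]>) -> f64, makes
    // every caller (current and future) decide what None means. Demanding a
    // real slice in the type removes that question at compile time:
    fn total(prices: &[f64]) -> f64 {
        prices.iter().sum()
    }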


That's programming. Defects come (and often stay) as you type.

Yes, it would help a tiny little bit if the compiler would check for nullable pointers. But this comes at a cost of annotation work / maintenance work / worse modularity, and it only helps a bit. There are so many more invariants (is this integer in the range 3-42 and odd or divisible by 12, is that integer a valid index into this other dynamic array...) that you can't practically check with a static system, and they are much much worse because many of them manifest themselves in much subtler ways.

All these formalisms work in some toy examples, but adhering to them makes a big mess as soon as it gets a little more complicated. Most basic example: const. I use const almost only for pointers to static const strings, because otherwise it always ends up in a big mess somehow (because const in one place is non-const in another place, guaranteed). And let's not talk about the const mess in C++, where so many functions are duplicated just for const, without any practical benefit.

I probably wouldn't mind a very light-weight opt-in guarantee-not-null syntax. But it doesn't matter for me, at all.


> Yes, it would help a tiny little bit if the compiler would check for nullable pointers. But this comes at a cost of annotation work / maintenance work / worse modularity, and it only helps a bit. There are so many more invariants (is this integer in the range 3-42 and odd or divisible by 12, is that integer a valid index into this other dynamic array...) that you can't practically check with a static system, and they are much much worse because many of them manifest themselves in much subtler ways.

Not my experience. It's no more work since you know when you're writing whether it's nullable, it's better for maintenance since you can immediately see which variables are nullable. As for other invariants: look at how often you see a real app you're using fail, or look at the bugs that manifest in production; they're rarely the subtle ones, most of them are the simple stupid cases.

> All these formalisms work in some toy examples, but adhering to them makes a big mess as soon as it gets a little more complicated. Most basic example, const.

No, const is a poor example; it's a wrong abstraction and it's difficult to use as a result.


>There are so many more invariants [...] that you can't practically check with a static system

You can't statically check the invariants directly, but you can make types that statically enforce that the invariants are dynamically checked.

That way you can be sure your assumptions hold when the data gets to your code, and any invalid data will be recognised as such, rather than manifesting as weird behaviour.
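
One common shape of this, sketched in Rust: a smart constructor performs the dynamic check once at the boundary, and the type then witnesses statically that the check happened (names hypothetical):

    // A Percent can only be obtained through `new`, so any function taking a
    // Percent knows the range was already validated at the trust boundary.
    pub struct Percent(f64);

    impl Percent {
        pub fn new(x: f64) -> Option<Percent> {
            if (0.0..=100.0).contains(&x) { Some(Percent(x)) } else { None }
        }
        pub fn value(&self) -> f64 { self.0 }
    }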


So that's managed languages? They exist and have their justifications, but come at a performance hit, and much like static types can practically only check a subset of invariants. Just because there are so many of them.

The way I think about it is simply that I write a program with code and the compiler statically enforces that what I specified there holds at runtime. Code is a pretty nice way to express invariants :-)


> Let it go. Let it crash. Just assume inputs are non-null (except where null makes sense).

That's just asking for a disaster. Null pointer dereferences are undefined behavior in C, and compilers can and do make these assumptions, leading to all kinds of horrors. (https://software.intel.com/en-us/blogs/2015/04/20/null-point...)


> most of the places null can't be a meaningful input

"Maybe something" is not rare. In fact it's often just as common as "Definitely something" in my experience. A customer maybe has a shipping address. You could bend over backwards and say that either he has an address or else the address reference must point to the special no-address value instad of being null.

But in that scenario you still need to check for whether it's the no-address or an actual adreess. Otherwise you might end up doing things like trying to set the street address of the no-address value which could either throw an exception or (worse) suddenly everyone without address lives at that address.

So the only real solution is to use proper types to model "maybe something". You can use a list of addresses for the customer. That naturally supports 0..N. Or you could use a proper option type (which is just a collection with max count 1).
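
In option-type terms, that example might look like this sketch (types hypothetical):

    struct Address { street: String }

    struct Customer {
        name: String,
        // "maybe has a shipping address", stated in the type itself: there
        // is no special no-address value to accidentally mutate.
        shipping_address: Option<Address>,
    }

    fn ship_to(customer: &Customer) {
        match &customer.shipping_address {
            Some(addr) => println!("shipping to {}", addr.street),
            None => println!("{} has no address on file", customer.name),
        }
    }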


Data design problem. C / pointer style is not good at this JSON-structs style of programming. I agree that algebraic (like Haskell's) datatypes could help ruling out some invalid objects here.

On the other hand, you can reap huge benefits from data-oriented programming (basically, make a separate email table that has a "foreign key" to the customer table) to do processing in the large. This is where C is a very good fit and you have no null pointers :-)


> Let it go. Let it crash

It must be nice to work in an environment where nobody cares about results.


> It must be nice to work in an environment where nobody cares about results.

Not crashing when there's a bug is not "success", it's just "zero visible errors"!

If your workplace really cares about results, they'll understand how futile it is to keep executing even though there's a bug. You start second-guessing your own program, which makes the code quality worse as the assumptions it makes are no longer clear; you won't be alerted to problems, as the program won't be aware of any; there'll be no guarantee that your code isn't just working "by accident" at the moment.

I'd love it if the code I deal with just crashed instead of pretending everything's alright, because it makes errors far harder to diagnose.


No one is talking about continuing execution instead of crashing. C++ smart pointers do not do that, optional types do not do that, no programming language whose creators are in their right mind has ever pursued that. You have just produced a classic example of a straw man argument.


>> Grandparent: "I think you should just let your code crash."

> Parent: "I think that is a bad idea."

It seems pretty clear that we're talking about crashing here.


There is a third option to crashing vs. not crashing: not letting the code compile if there is any chance it might crash. That is what this "buzz about null pointers" is about.


Oh, right. Yeah, that would be nice, but my company doesn't use languages with that feature, and we can't re-write our software, so we're stuck with Option types.


What magic compiler can certify my program will never abruptly stop, or worse do something unintended, because of hardware failure?


I detect a moving goalpost here, but if you're genuinely concerned that hardware might change the value of your bits then nothing else but lockstep execution will help. e.g. TI Hercules: http://www.ti.com/lsds/ti/microcontrollers-16-bit-32-bit/c20...

(Checking for NULL in your program won't save you from hardware failure either)


I hope I'm just being pedantic rather than moving goalposts, I'm all for better assurances. To me the only compiler "not letting the code compile if there is any chance it might crash" is either magical or never compiles a thing.


"not letting the code compile if there is any chance it might crash"; well, normally we assume correct operation for the hardware here in order to make that achievable, otherwise things could simply crash at any time. You're always vulnerable to single-event-upsets from cosmic rays or even alpha decay inside the chip packaging. Being pedantic about this in a discussion of compiler correctness is useless derailing.

"Rowhammer" over in https://news.ycombinator.com/item?id=15515044 is perhaps the most likely problem of hardware non-correctness to worry about.


When talking correctness (or security), I would think pedantry would be more welcome. You would have a point about derailment if the comment I replied to had given a specific example of a compiler that refuses to compile anything with a chance of crashing, e.g. through guarantees of its type system, and I had then come along asking about cosmic rays. But they didn't; they were vague, and I'm still left wondering about an example, because even Haskell programs crash with a segfault or bus error sometimes. It's perfectly fine that a compiler can't cover those cases, but it did sign off on a program with a chance of crashing, no cosmic rays involved; at best it only assures a lower likelihood of crashing compared to another compiler.


> Even C crashes safely on null pointer dereference.

Does it? What if you needed to clean up external resources? What if some data was left in an inconsistent state? What if you were in the middle of writing to a file? Etc.

I do agree that crashing is often the right thing to do and of course the machine can die anyway, but sometimes you want to at least attempt to recover.

Also, in much of the software I've written in recent years, crashing was not acceptable: just because one thing (e.g. a request of some kind, a message, a task, whatever) failed unexpectedly does not mean that others will too, and we still want the others to be attempted. Sometimes you can get away with letting the program crash and then restarting it to run the other work, but often things are done concurrently and restarting would lose any in-progress state. If the work takes a long time, restarting all work because one piece failed is often unacceptable.


Yes, considering communication-heavy architectures, it's unfortunate that with C the unit of encapsulation is the process. And there's bad support for inter-process communication, etc.

Maybe have a look at Erlang? I suppose it's more aligned with what you want to do. There you also "crash", but only managed green processes crash, and they have very convenient communication channels.


Yes, indeed.

Sadly I’ve never had the opportunity to use Erlang for work, I’d certainly like to give it a try. I suppose I should try it (or elixir) on a side project sometime since it’s always a bad idea to go blind.


> Let it go. Let it crash. Just assume inputs are non-null (except where null makes sense). Even C crashes safely on null pointer dereference.

If the amount of work done before the crash has some value and is lost by the crash, then crashing just isn't an option.


What everybody is missing here is that null pointers where none was expected are just an instance of wrong code. How do you protect yourself from wrong code? (Hint: not by adding more code!)


Perhaps you're just missing the complexity of the real world?

I'm working on quite big finite element desktop applications, where just reading the data can take quite a bit of time. The user can then apply all kinds of different operations on this data, and if one of these operations fails because of a null pointer, then sure, the code is wrong, but nevertheless the user doesn't want to lose all of his previous work and wants to be able to save the changed data.

Sure, there are cases where a failed operation might have corrupted the data, but you just can't tell this in every case, and often the data is still valid and only the operation couldn't be applied.

If I've learned something over the years, then that there's not one solution that works for all cases.


So you weren't expecting bad user input? Bad for you!

This is called validation, and in a validation routine you expect invalid data. Check inputs at the trust boundaries, and off you go.

Note this is NOT about just null pointers but about integrity in general.


Well, if you want a serious discussion then don't imply something just to be able to make a point.

But if you just want to win an argument: here you go.


Then you should run the risky operations in a different process. It will not only protect you from the operation crashing your main program, but also from data corruption if you use readonly access for the shared data.


>Even C crashes safely on null pointer dereference.

Uh, that's a rather optimistic way to look at this. In C, NULL pointer dereference is UB, so you shouldn't rely on this behaviour. For one thing, the compiler is allowed to do weird things if it can statically determine that a NULL dereference takes place, for instance by dropping any code that can be proved to lead to a NULL dereference. Not so safe or friendly.

Most of the time, for userland apps, dereferencing a NULL pointer for reading or writing will cause a segfault, but that might not be the case for bare-metal/MMU-less applications. Furthermore, even when it does crash, it simply terminates the process without any unwinding, which might leave non-volatile data in an inconsistent state.

The real problem with pointers is that they're effectively an algebraic data type, they can point to something or be NULL. But the type system can't enforce that, there's no concept of non-nullable pointers in C.

Take the prototype of nanosleep(2) for instance:

    int nanosleep(const struct timespec *req, struct timespec *rem);
Turns out that the 2nd parameter is NULLable if you don't care about the remaining time. The first one isn't, however, so passing NULL there is a big no-no.

In rust the equivalent signature would be something like:

    fn nanosleep(req: TimeSpec, rem: Option<&mut TimeSpec>) -> Result<(), i32>;
Which is a lot more explicit (mind you, that's probably not what the API of nanosleep would look like in rust since output parameters are not very common, you'd probably return rem in the Result).

The problem is the same on the library side. When you're passed a pointer, should you assume it's nullable? Should you check for it? If you want to be safe you do, and you end up with a bunch of redundant checks.

It's also easy to end up with a NULL-ish pointer that's not NULL if you happen to offset a NULL pointer by mistake (like a NULL array or NULL struct pointer). That can be pretty tricky to track down since the crash doesn't occur when the offset is calculated but rather when the bogus quasi-NULL pointer is dereferenced:

    #include <stdio.h>
    
    struct s {
      int a;
      int b;
    };
    
    void bar(int *i) {
      if (i == NULL) {
        printf("NULL\n");
      } else {
        printf("%d\n", *i);  /* i is invalid but not NULL, this (probably) crashes */
      }
    }
    
    void foo(struct s *ps) {
      int *b = &ps->b;      /* ps assumed not-NULL here */
    
      bar(b);
    }
    
    int main(void) {
      foo(NULL);            /* This is where the actual error takes place,
                               foo's argument is not NULLable */
      return 0;
    }
Also consider what happens if we're using a NULL pointer to a very large buffer. Dereferencing the beginning of the buffer would probably crash but higher addresses might end up in mapped pages and you start accessing and possibly overwriting random parts of the runtime.


There are so many ways things can break if used incorrectly. I don't disagree with most of the things you listed. But what's the conclusion?

> The real problem with pointers is that they're effectively an algebraic data type

No, they are not. Join my religion and preach: pointers are just data, pointers are just data. And be free.

C is about machine representation, and it's typed just enough to map to hardware (specify the ABI of functions). Don't be deluded into thinking you can represent all (or even many) invariants in types. Some other languages try to do that (and it always ends up in a big mess).

If you want more "safety" - which (for the things you still need to implement) just means crash in a more friendly way, not less - then use a managed language. But they come with their own pros and cons, of course.

To address the issue with nanosleep: this is not what we were talking about. nanosleep does accept a null pointer in the second argument. (And I think in this rare case an assert(req) would be justified. Or maybe just split it into 2 functions.)


But they are not just data because the language handles NULL differently from other values. The compiler is allowed to treat NULL pointers differently from other pointer values. NULL is special in C, it's not just a pointer to address 0.

>C is about machine representation, and it's typed just enough to map to hardware (specify ABI of functions).

I don't understand that at all. How do enums map to hardware? How do structs? Clearly the line is arbitrary. Would adding generics mean that C maps less to hardware? What would that even mean?

And if C is all about talking directly to the hardware, how come the language has no first class support for SIMD, for preload, for the various CPU flags? And to get back to our point, how does NULL map to hardware? A CPU has no issue addressing at 0, it's nothing special. An assembler doesn't treat address 0 differently from any other, unlike C explicitly does.

Join my religion and preach: C is not a macro assembler, C is not a macro assembler.

>Don't be deluded to think you can represent all (or even many) invariants in types. Some other languages try to do that (and it always ends up in a big mess).

[citation needed]

>To address the issue with nanosleep, this is not what we were talking about. nanosleep does expect a null pointer in the second argument. (And I think in this rare case an assert(req) would be justified. Or maybe just split it into 2 functions).

My point is that in this case the rem pointer is an algebraic data type. Passing NULL here doesn't mean "please store rem at (void*)0", it means "I don't want rem". Except this is not expressed in C's type system, so you don't know that unless you read nanosleep's implementation or its documentation. The rem pointer, unlike the req pointer, is nullable, therefore it's effectively an option type, but as far as C is concerned they're the same type.

If you assert() it then you can catch the problem early on at a runtime cost (and only when the situation eventually arises). If you make this part of the type system the compiler can validate that statically at compile time.


> How do enums map to hardware? How do structs?

As integers? Sequentially in memory? And that's about as much as I care.

> Would adding generics mean that C maps less to hardware? What would that even mean?

It means that you have control. That's what you need for good and (equally important) consistent performance and low memory footprint. You have pretty good basic low-level abstractions (integers, pointers, structs, functions) that are very convenient to work with (much more convenient than your typical assembler language), and at the same time as low-level as you care to go most of the time. And even then, toolchains have good support to go even lower when you need to.

> [citation needed]

My personal experience with simple languages like C, Python, and complex typed ones like C++, Haskell. (There are also boilerplate ones like Java (not interested, thanks), or ones that swallow errors too easily to be productive, like sh, Javascript).

Also, look at other projects. Look how successful game programmers use C++ (In my filter bubble I watched people like Jonathan Blow, Mike Acton, Casey Muratori on Youtube, and have been following Sean Barrett or looked at imgui or quake3). What they do is pretty much C. I have also been following the Haskell community quite a bit and watched C++ experiments like boost, and they simply have a different focus. They don't get things done (in general), their builds are time consuming and brittle, their code is hard to understand, etc...

> And if C is all about talking directly to the hardware, how come the language has no first class support for SIMD, for preload, for the various CPU flags?

It's not assembler... It's "portable assembler", as some people like to say. And if you need support for SIMD etc, just take second class support or drop to inline assembler. No point in arguing here.

> And to get back to our point, how does NULL map to hardware? A CPU has no issue addressing at 0, it's nothing special. An assembler doesn't treat address 0 differently from any other, unlike C explicitly does.

There are always these language lawyer type unfortunate exceptions and complexities that can be explained by a little bit of history. Personally I'm mostly on x86 and I sometimes assume it's binary 0, but actually I don't really care that much.

> A CPU has no issue addressing at 0, it's nothing special. An assembler doesn't treat address 0 differently from any other, unlike C explicitly does.

Making my point. It's just data. I think NULL is actually just ((void *)0), but it's the representation of 0 that is special on some architectures. That the representation might not be all binary 0s on some architectures is probably pretty complicated to explain. As far as I'm concerned it's a language lawyer thing and I don't care.

C might technically not be simple, but your use of it can (and maybe should) be simple.

> My point is that in this case the rem pointer is an algebraic data type.

What if it is? If you insist on looking at it like that, why don't you use Haskell? Though, you will have to take a performance hit and might have a harder time developing new features, because modularity is very bad due to complex typed interfaces.

By the way, I know how the nanosleep interface works. I don't have any problem understanding and remembering that and using it correctly (and it's not like I have used it more than 3 times in my life). I don't think I would make an error, but I wouldn't care if the null option went away completely, either.

It's much harder to use struct timeval / timespec correctly than to call nanosleep. The null thing is totally a theoretical non-issue.


Not in many important environments, e.g. especially those without virtual memory, where the zero page can't be mapped invalid.

Talk to me about how good an idea it is not to check potentially-NULL pointers in embedded systems.


Sure I know about that... I don't do embedded, but the point is, if your program is wrong, it goes wrong. So? Fix it!


An embedded program dereferencing NULL may cause some serious damage, even physical. Code has bugs: avoiding the most serious failure modes is important, pragmatically speaking.


Is this really considered modern C++ style? Best practice, as far as I've ever known it, is to disallow invalid states wherever possible, typically with constructors locking those details down.

Obviously references are better in most circumstances, but only because you don't have to check for an empty or invalid state. They offer no performance benefit or optimization on their own.


I think GP meant "smart pointers" where they said "object references". At least, that was the only reading that made sense to me, since it would be a little bit insane (and UB) to create a C++ reference (T&) from a null pointer.


But then what is the difference between what he is complaining about and a traditional pointer?

I always assumed that a default-constructed unique_ptr<T> was simply a NULL pointer under the hood. So you have exactly the same representations of "nothing" as you do with a plain T*.


Yes, both unique and shared ptrs default to null and in many ways are drop in replacements for naked pointers.

If the empty state thing was in reference to smart pointers, it'd be a remarkable misunderstanding of how they work and how they should be used.


> If the empty state thing was in reference to smart pointers, it'd be a remarkable misunderstanding of how they work and how they should be used.

No, I don't think so. Many C++ libraries, especially those originally developed before C++11, implement their own smart pointer templates, which may end up having diverging semantics. That's exactly what OP was describing.

Examples off the top of my head include: CEF, Chromium, v8, Qt, glibmm, and the internal codebase where I work.


Some smart pointer semantics are definitely better than others -- see the now-deprecated auto_ptr.

But it really doesn't fit the argument as framed. All smart pointer implementations are some valuable niceties wrapped around a naked pointer. It has nothing to do with preferring references to pointers; at the end of the day you just have a pointer -- just one that gets default initialized to 0, and deletes itself when you are done with it.

Frankly, the type of C++ developer that spends any serious amount of time trying to come to terms with a smart pointer API, even one they haven't seen before, is the type of developer I wouldn't really want manipulating naked pointers.


The only difference being that in the case of a smart pointer, you can't ignore the possibility of a null by accident.


You can easily ignore null by accident while using smart pointers:

    #include <memory>
    struct foo { int bar; };
    int main() {
        std::unique_ptr<foo> bug; // default-constructed: holds nullptr
        bug->bar = 123;           // null dereference: undefined behavior
    }


True, didn't think of that. Then indeed they are quite useless, and IIRC they are actually deprecated in C++17.


I don't agree they are useless - they don't stop you from accessing nullptrs, but they do help manage the lifetimes of your dynamically created objects.

And I haven't seen anything indicating unique_ptr is to be deprecated - maybe you're thinking about auto_ptr.


Smart pointers are not at all useless and not at all deprecated (you are thinking of auto_ptr; there are still unique_ptr and shared_ptr).

They don't try to address the check for null problem at all; that isn't their intent. All they mean to do is eliminate several classes of common, easy to make mistakes such as not zero initializing or not freeing the resource.


Have you heard of operator bool()? If the library is implemented by people who know their stuff, each object will define operator bool() and you can then have code like:

    Foo foo;
    foo.init_from_file("/tml/lol");
    if (foo) {
        // do some stuff
    }

We could be getting Foo objects by value from a factory, which constructs and moves the Foo objects. References are meant for situations where the object CANNOT be null/invalid by contract, so you are missing the point, amigo.


The accumulated corpus of C++ libraries has been implemented by people who may or may not have known their stuff at some point in the past 25 years.

You say operator bool() is the one true way. Many other people define "empty" states for their objects that sometimes make sense only to the library author. Others use class wrappers. Someone else came up with a cool trick for a Maybe-style functor, I'm sure. Yet another bright person has built his very own mind-bending approach using move semantics and some odd corner of the C++14 spec.

With C++, you never know what to expect until you dig into the API. It's an impediment to doing anything with the language unless you work on a huge project where coding standards are rigid and external dependencies come in every once a century (which many C++ projects are, certainly).


I fully agree. C++ never had a clear goal. The features of C++ were added almost at random. Stroustrup's original idea was essentially "C is cool and OOP is cool so let's bolt OOP onto C". Retrospectively, OOP was massively overhyped and is the wrong tool for the job for most of the people most of the time. Half of the GoF design patterns just emulate idioms from functional programming. Real OOP languages like Smalltalk and Io express many useful things that C++ cannot. The feature that C needed most was perhaps parametric polymorphism (aka generics in Java and C#, first seen in ML in the late 1970s), but instead of that C++ got templates, which weren't designed to solve any particular problem but rather to kind-of solve several completely unrelated problems (e.g. generics and metaprogramming). Someone actually discovered by accident that C++ templates are Turing complete and published a program that computed prime numbers at compile time. Wow. A remarkable observation that led to decades of template abuse where people used templates to solve problems much better solved by pre-existing solutions such as Lisp macros and ML polymorphism. Worse, this abuse led to even more language features being piled on top, like template partial specialization.

The massive incidental complexity in C++ made it almost impossible to write a working compiler. For example, it remains extremely difficult to write a parser for the C++ language. The syntax also has horrible aspects, like the >> in List<Set<int>> being lexed as a right-shift operator. None of the original C++ compilers were reliable. Only after two decades did we start to see solid C++ compilers (by which time C++ was in decline in industry due to Java and C#).

C++ is said to be fast but the reality is that C++ is essentially only fast when you write C-like code, and even then it is only fast for certain kinds of programs. Due to the "you don't pay for what you don't use" attitude, C++ is generally inefficient. RAII injects lots of unnecessary function calls at the end of scope, sometimes even expensive virtual calls. These calls often require data that would otherwise be dead, so the data are kept alive, increasing register pressure and spilling and decreasing performance. The C++ exception mechanism is very inefficient (~6x slower than OCaml) because it unwinds the stack frame by frame, calling destructors, rather than long jumping.

Allocation with new and delete is slow compared to a modern garbage collector, so people are encouraged to use STL collections, but these pre-allocate huge blocks of memory in comparison, so you've lost the memory-efficiency of C; then you are advised to write your own STL allocator, which is no better than using C in the first place. One of the main long-standing advantages of C over modern languages is avoiding the unpredictable latency incurred by garbage collectors. C++ offers the worst of both worlds by not having a garbage collector (making it impossible to leverage useful concepts like purely functional data structures properly) while encouraging all destructors to avalanche, so you get unbounded pause times (worse than any production GC).

Although templates are abused for metaprogramming they are very poor at it, and C++ has no real support for metaprogramming. For example, you cannot write an efficient portable regular expression library in C++ because there is no way to do run-time code generation and compilation as you can in Java, C# and languages dating back to Lisp (1960). So while Java and C# have had regular expressions in their standard libraries for well over 10 years, C++ only just got them and they are slow.

C++ is so complicated that even world experts make rookie mistakes with it. Herb Sutter works for Microsoft and sits on the C++ standards committee, where he influences the future of C++. In a lecture he presented his favorite 10-line C++ program, a thread-safe object cache.

My personal feeling is that the new Rust programming language is what C++ should have been. It has proven features like generics, discriminated unions and pattern matching, and genuinely new ones like memory safety without garbage collection.


C++17 has std::optional
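
For comparison, Nim ships the same idea in its standard library: Option[T] from the options module. A minimal sketch (findChar is just a made-up helper for illustration):

    import options

    proc findChar(s: string, ch: char): Option[int] =
      result = none(int)        # explicit "no result": no null pointer involved
      for i, c in s:
        if c == ch:
          return some(i)

    let hit = findChar("nim", 'i')
    if hit.isSome:
      echo "found at index ", hit.get

The signature tells the caller up front that there may be no answer, and the caller has to acknowledge that before touching the value.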

If you've ever used unstable software that crashes in a dozen different ways you can thank null pointers.

Modern C++ isn't about substituting references for pointers; it is about dealing with values and removing the level of indirection that pointers bring.

The old style of using pointers everywhere also implies a lot of small heap allocations, which is a big performance problem in itself. Both things end up being huge detriments to speed and stability.
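
Roughly the same trade-off exists in Nim terms (Vec3 is a made-up type for illustration): a plain object is a value stored inline with no heap traffic, while a ref object costs an allocation plus an indirection on every access:

    type
      Vec3 = object        # value type: stored inline, no heap allocation
        x, y, z: float

      Vec3Ref = ref object # reference type: separate heap cell, extra indirection
        x, y, z: float

    proc lengthSq(v: Vec3): float =
      v.x * v.x + v.y * v.y + v.z * v.z

    let a = Vec3(x: 1.0, y: 2.0, z: 3.0)    # no allocation at all
    let b = Vec3Ref(x: 1.0, y: 2.0, z: 3.0) # one GC-managed heap allocation
    echo lengthSq(a)
    echo b.x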


> I’m not a huge fan of the modern C++ style that obsessively avoids null pointers and instead uses object references that are created in an invalid/empty state.

If you have objects with an invalid state then you've gained nothing compared to a null pointer.

But the point is to make invalid state unrepresentable as much as possible; then reasoning about the code gets a lot easier, because invalid state can't spread throughout the application.
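
A rough sketch of that idea in Nim, using an object variant (ConnKind and Connection are made-up names): the socket field simply does not exist in the Disconnected branch, so the combination "disconnected but has a socket" cannot be expressed, and a wrong access fails immediately instead of propagating:

    type
      ConnKind = enum Disconnected, Connected

      Connection = object
        case kind: ConnKind
        of Connected:
          socket: int   # this field only exists in the Connected branch
        of Disconnected:
          discard       # nothing to dereference by accident

    var c = Connection(kind: Disconnected)
    # echo c.socket     # raises a field access error at run time

(Checked at run time rather than compile time, but the invalid state still can't silently spread.)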


Creating objects in invalid states is definitely not idiomatic C++.

I do agree that people are a little too fearful of using bare pointers for optional references; if you're using a linter that catches possible null dereferences, it's often the most sensible technique.


What is really undesirable is to have both pointers and references in one programming language as two separate constructs.


Completely off-topic, but I feel compelled to mention this: the timestamping of this article is exemplary.

Even if it only showed a date, and no time, the time zone is relevant. The Internet is supposed to be for the whole world, not just one time zone.


Does this mean that Nim 1.0 will be postponed even further?

That's OK, it's better to get it right, and it's still useful for small projects in the meantime. But Nim seems like a language that is never truly finished.

I like this direction. I tried to avoid "ref" in my last Nim project, but it's too hard to do that for every type the way the language is currently designed.


> Nim seems like a language that is never truly finished

True for every language.

"1.0" will freeze the spec and it's not like the current versions are unstable or unusable.


It won't be postponed AFAIK


Indeed, Araq's plan is to release v1 ASAP (he said by the end of this year on IRC) and implement the ideas described in this article afterwards for a Nim v2.


I'm a C++ dev and this sounds extremely interesting. I gave Nim a spin a year ago and really liked what I saw, but the GC was a real issue because I'm doing audio development, where you end up having to circumvent the GC anyway. I would love to let my users create MIDI plugins with Nim, and this is a key step towards that.


I'd like to add that "circumventing the GC" is supported by default. There are 3 object types in Nim:

- `type Foo = object` --> Value type, on the stack

- `type Foo = ref object` --> Reference type, managed by GC

- `type Foo = ptr object` --> Pointer type, manual memory management (with Nim's equivalents of malloc, free and memcpy; see the sketch at the end of this comment)

Also Nim does not have 1 GC but several:

- Deferred reference counting (the default) - not stop-the-world

- Boehm GC

- Mark and Sweep

- Memory Regions

- Real-time GC (with tunable max-pause)

- GC disabled

The GC can also be deactivated per thread (setupForeignThreadGc) or for a section of code (GC_disable/GC_enable), and individual ref objects can be pinned with GC_ref/GC_unref to improve interoperability with other languages.
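
To make that concrete, here is a minimal sketch of the three kinds side by side (Foo is a stand-in type; create and dealloc are system's typed malloc/free):

    type Foo = object
      x: int

    var a = Foo(x: 1)           # value: lives on the stack, copied on assignment

    type FooRef = ref Foo
    var b = FooRef(x: 2)        # ref: heap cell traced by the GC

    type FooPtr = ptr Foo
    var c: FooPtr = create(Foo) # ptr: untraced, yours to free
    c.x = 3
    echo a.x + b.x + c.x        # prints 6
    dealloc(c)

The collector itself is a compile-time switch, e.g. `nim c --gc:markAndSweep app.nim` or `--gc:none`.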


You make Zenaudio ALK! That is a really interesting sequencer and looper. Congrats on a great concept.


The syntax and title remind me of standard Pascal, which has very constrained pointers that, among other things, you can't do arithmetic on; this eventually leads to code that basically reinvents memory itself, using a large array and lots of indexing operations.

(Look at Donald Knuth's TeX for an example of this style.)
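
That style carries over directly to pointer-free Nim: use a seq as the "memory" and plain ints as the "pointers". A minimal sketch (Node, pool and push are made-up names):

    type
      Node = object
        value: int
        next: int            # index into `pool`; -1 plays the role of nil

    var pool: seq[Node] = @[]

    proc push(head: var int, value: int) =
      pool.add(Node(value: value, next: head))
      head = pool.high       # the freshly appended node becomes the new head

    var head = -1
    push(head, 1)
    push(head, 2)

    var it = head
    while it != -1:          # walk the list via indices, no pointers anywhere
      echo pool[it].value
      it = pool[it].next

And every pool[it] access is bounds-checked.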


With the added benefit of bounds checking.

However, all Pascal dialects had extensions for low-level pointer manipulation.


Nim was written in Pascal at the very beginning.

But you can't do pointer arithmetic in Nim.

Instead you choose between:

- `type Foo = object` -> Value type, on the stack

- `type Foo = ref object` -> Ref type, managed by the GC

- `type Foo = ptr object` -> Pointer type, manual memory management (Nim's equivalents of malloc, free, memcpy, ...)


> But you can't do pointer arithmetic in Nim.

Actually you can, by using casts.
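
A minimal sketch of what that looks like (arr is just a local array):

    var arr = [10'i32, 20'i32, 30'i32]
    let base = addr arr[0]

    # "base + 1" spelled out by hand: cast to an integer,
    # add the element size, cast back to a typed pointer
    let second = cast[ptr int32](cast[uint](base) + uint(sizeof(int32)))
    echo second[]   # prints 20

It's deliberately ugly, which is the point: the unsafe operation doesn't happen by accident.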


Oops, I obviously meant "you can" but I probably typed too fast :P


First let me say that I'm not familiar with Nim, but I understand that it compiles/transpiles to C/C++, and it sounds like they're now trying to move away from dependence on the GC. In that case, might I suggest they consider switching the transpile target to SaferCPlusPlus[1]? It might make things easier, as it already addresses the "efficient memory safety via scope lifetimes without a borrow checker" issue, as well as the (data-race-)safe sharing of objects between threads. (Note that the documentation is currently rather out of date; a major update is coming.)

[1] shameless plug: https://github.com/duneroadrunner/SaferCPlusPlus


Nim compiles to C/C++/Objective-C and JavaScript, and all memory safety is handled on the Nim side.

The C/C++ code generated by the Nim compiler is then free to use unsafe constructs including the dreaded goto.


The other year I tried to implement a GC on top of an existing scripting language, to handle reference cycles. A big problem, however, was that the existing system for referencing objects wasn't designed for this (e.g. no root set, no thread safety), and I couldn't come up with a solution I found satisfactory and fool-proof.

The fact the author had so many problems with GC bugs is somewhat reassuring. Would be interesting to see a GC-less Nim in a production use-case.


That sink concept sounds really cool. I would really like those semantics in my C and C++ code.
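
For reference, the proposal boils down to something like this (consume and data are made-up names, and the exact syntax may still change):

    proc consume(s: sink seq[int]) =
      # `s` is owned here: the caller's value is moved in, not copied
      echo s.len

    var data = @[1, 2, 3]
    consume(data)   # last use of `data`, so the compiler may move it

In C++ you'd approximate this with a by-value parameter plus std::move at the call site.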


Pretty sure that's what auto_ptr tried to be: http://www.cplusplus.com/reference/memory/auto_ptr/. Its copy-as-move semantics were broken enough that it was deprecated in C++11 and removed in C++17, though; std::unique_ptr plus std::move is the version that actually works.


Suggestion: add a date, the author, and possibly a version number indicating which iteration of Nim this refers to.



