If you're new to C++11, Cap'n Proto (particularly the kj library embedded within it, src/kj) is a great read: https://github.com/capnproto/capnproto kj semi-re-implements several core C++11 features like Own and Maybe (actually Maybe / std::optional is still pretty new!) https://github.com/capnproto/capnproto/blob/master/c%2B%2B/s... Why did Kenton do this? He can speak for himself, but the core of Cap'n Proto is /roughly/ like a serializable / portable memory arena, so it was necessary for the design. Reading through kj and _comparing_ it with C++11 will give you some great initial insight into why both are implemented the ways they are. I'm not really advocating that you use kj directly or adopt things like capnp's unique comment style, but the codebase is nevertheless very well organized and clear.
While I wouldn't necessarily recommend Boost as a model project / repo ( https://github.com/boostorg ), it's worth checking out to help understand why modern decisions were made the way they were.
> Why did Kenton do this? He can speak for himself,
An incomplete list of reasons:
1) At the time I started the project, a lot of things that KJ replaces, like std::optional, didn't actually exist in the C++ standard yet.
2) A lot of the stuff in the standard library is just badly designed. Take std::optional, for instance. You'd think that the whole point of using std::optional instead of a pointer would be to force you to check for null. Unfortunately, std::optional implements operator* and operator-> which are UB if the optional is null -- that's even worse than the situation with pointers, where at least a null pointer dereference will reliably crash. KJ's Maybe is designed to force you to use the KJ_IF_MAYBE() macro to unwrap it, which forces you to think about the null case.
3) A lot of older stuff in the C++ standard library hasn't aged well with the introduction of C++11, or was just awful in the first place (iostream). C++11 really changed the language, and KJ was designed entirely with those changes in mind.
4) The KJ style guide (https://github.com/capnproto/capnproto/blob/master/style-gui...) adopts some particular rules around the use of const to help enforce thread safety, the specific philosophy around exceptions, and some other things, which differ from the C++ standard library's design philosophies. KJ's rules have worked out pretty well in my experience, but they break down when building on an underlying toolkit that doesn't follow them.
5) This is a silly matter of taste, but I just can't stand the fact that type names are indistinguishable from variable names in C++ standard style.
6) Because it was fun.
Do these reasons add up to a good argument for reinventing the wheel? I dunno. I think it has worked well for me but smart people can certainly disagree.
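The operator* hazard from point 2 fits in a few lines (a sketch using std::optional, since that's the standard type in question; `demo` is an illustrative name):

```cpp
#include <optional>

// A minimal illustration of the hazard: all three accessors compile,
// but only two of them are checked.
int demo(const std::optional<int>& o) {
  // *o;            // UB when o is empty: no diagnostic, no guaranteed crash
  // o.value();     // checked: throws std::bad_optional_access when empty
  return o.value_or(-1);  // checked: falls back to -1 when empty
}
```

Nothing in the type system distinguishes the unchecked access from the checked ones at the call site.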
> Unfortunately, std::optional implements operator* and operator-> which are UB if the optional is null -- that's even worse than the situation with pointers,
Dereferencing null optionals is UB for consistency with dereferencing pointers. All uses of operator* should have the same semantics and the C++ standards committee did the right thing by ensuring that with optionals. Checking for null in operator* would break consistency.
If you want to dereference an optional that may be null, use the .value_or() method. For the times when you absolutely know the optional has a value use operator*.
If you’re wondering why you would use an optional over a pointer. The idea is that optionals allow you to pass optional data by value. Previously if you wanted to pass optional data, you’d have to do it by reference with a pointer. This is part of c++’s push towards a value-based style, which is more amenable to optimization and more efficient in general for small structs (avoiding the heap, direct access of data). Move semantics are a part of that same push.
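The value-based style described above can be sketched like this (`parsePort` is a made-up example name):

```cpp
#include <optional>
#include <string>

// "No value" travels by value: no heap allocation, no out-pointer.
std::optional<int> parsePort(const std::string& s) {
  if (s.empty()) return std::nullopt;
  return std::stoi(s);  // assumes s is numeric; real code would validate
}
```

Compare this with the pre-C++11 pattern of taking an `int*` out-parameter and returning a success flag.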
> The idea is that optionals allow you to pass optional data by value.
Yes, and kj::Maybe was doing the same before std::optional was standardized.
It's disappointing that the committee chose only to solve this problem while not also solving the problem of forgetting to check for null -- often called "the billion-dollar mistake".
> Dereferencing null optionals is UB for consistency with dereferencing pointers. All uses of operator* should have the same semantics
My argument is that `std::optional` should not have an operator* at all. `kj::Maybe` does not have operator* nor `operator->`.
> If you want to dereference an optional that may be null, use the .value_or() method. For the times when you absolutely know the optional has a value use operator*.
This is putting a lot of cognitive load on the developer. They must remember which of their variables are optionals, in order to remember whether they need to check for nullness. The fact that they use the same syntax to dereference makes it very easy to get wrong. This is especially true in modern IDEs where developers may be relying on auto-complete. If I don't remember the type of `foo`, I'm likely to write `foo->` and look at the list of auto-complete options, then choose one, without ever realizing that `foo` is an optional that needs to be checked for null.
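For contrast, KJ's unwrap pattern looks roughly like this. (A sketch: to stay self-contained it emulates KJ_IF_MAYBE over std::optional; the real macro lives in kj/common.h and operates on kj::Maybe. On success it binds a pointer to the wrapped value.)

```cpp
#include <optional>

// Simplified stand-in for kj's macro, for illustration only.
#define KJ_IF_MAYBE(name, exp) \
  if (auto* name = (exp).has_value() ? &*(exp) : nullptr)

int describe(std::optional<int> maybe) {
  KJ_IF_MAYBE(v, maybe) {
    return *v;   // non-null branch: v is a plain pointer, already checked
  } else {
    return -1;   // the null case must be confronted explicitly
  }
}
```

There is no way to reach the value without going through the branch.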
Or if you're really sure the maybe is non-null, you can write:
use(KJ_ASSERT_NONNULL(maybeValue));
This does a runtime check and throws an exception if the value is null. But more importantly, it makes it really clear to both the writer and the reader that an assumption is being made.
> It's disappointing that the committee chose only to solve this problem while not also solving the problem of forgetting to check for null -- often called "the billion-dollar mistake".
It's likely ~30 LOC to wrap std::optional in your own type that checks for null. If std::optional checked for null, and those `if` branches showed up as taking the couple of nanoseconds that push you past your time budget in your real-time system (especially if people were doing things like `if (b) { foo(b); bar(b); baz(*b); }`), then you'd have to reimplement the whole of it instead.
Don't forget that you can still use C++ on 8 MHz microcontrollers.
Again, I'm not arguing that operator* should check for nullness, I'm arguing that it shouldn't exist.
With `kj::Maybe` and `KJ_IF_MAYBE`, using the syntax I demonstrated above, you check for nullness once and as a result of the check you get a direct pointer to the underlying value (if it is non-null), which you then use for subsequent access, so you don't end up repeating the check. So, you get the best of both worlds.
> it's likely ~30 loc to wrap std::optional
It's even easier to replace std::optional rather than wrap it. The value of std::optional is that it's a standard that one would hope would be used across libraries. But because it's flawed, people like me end up writing their own instead.
The ideal situation is then to not use std::optional for those cases, rather than to make std::optional next to useless for its stated use case.
If it gets in the way of your goal on an 8 MHz controller, take the optionality check out of your tight loop and convert to a null pointer safely where it doesn't matter.
Deeply embedded is already used to picking and choosing features, or explicitly running with the bumper rails off in the subset of cases where it matters. We like the normal primitives not being neutered for us because we still use a lot of them outside of our tight loops.
> than to make std::optional next to useless for its stated use case.
This is such a ridiculous and obviously false assertion that it’s indistinguishable from satire. Optional is widely used and was modeled from a pre-existing boost class which was itself widely used. Do you actually write C++ professionally?
I know they're different (and the construct unfortunately named optional has some occasional uses); it's just that the semantics of optional don't help it be used as a classic optional type.
> it's just that the semantics of optional don't help it be used as a classic optional type.
what do you mean "classic optional type"? boost.optional has worked like that for something like 20 years - it's been in C++ for longer than Maybe has been in Haskell.
> My argument is that `std::optional` should not have an operator* at all. `kj::Maybe` does not have operator* nor `operator->`.
That’s fair.
>> If you want to dereference an optional that may be null, use the .value_or() method. For the times when you absolutely know the optional has a value use operator
> This is putting a lot of cognitive load on the developer. They must remember which of their variables are optionals,
Not really. Teams with programmers that are bad at keeping track of the state of their variables can simply have a policy to always use .value_or()/.value().
C++ doesn’t impose this on its users because it generally assumes its users are responsible enough to make their own policy.
> The fact that they use the same syntax to dereference makes it very easy to get wrong.
I disagree, the operator* has the same semantics as pointers did, making it no more semantically hazardous. There exists other methods on optional that have the behavior you want.
But neither of these solve the problem either. Neither one forces the programmer to really confront the possibility of nullness and write the appropriate if/else block. Throwing an exception rather than crashing is only a slight improvement IMO.
> I disagree, the operator* has the same semantics as pointers did, making it no more semantically hazardous.
It was already severely hazardous with pointers, that's the problem.
Both problems could have been solved looooooong ago by introducing a type modifier, akin to const, that carries whether a value is verified (or safe, or non-null, or other; pick your synonym).
  int * p;          // maybe null!
  int * verified p; // guaranteed non-null!
A looooong time ago (circa... 1994-1995) I designed a hierarchy of smart pointers and had a variety for non-null so that you could declare a function like:
  void foo(non_null_ptr<T>& p);
And know that you don't have to verify for null. All enforced at compile-time (via a function on ptr<T> returning a non_null_ptr<T>).
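A hypothetical reconstruction of that idea (the names are illustrative, not the 1994 originals): the null check happens exactly once, at the conversion.

```cpp
#include <stdexcept>

template <typename T>
class non_null_ptr {
public:
  explicit non_null_ptr(T& ref): p(&ref) {}  // only buildable from a real object
  T& operator*() const { return *p; }
  T* operator->() const { return p; }
private:
  T* p;  // invariant: never null
};

template <typename T>
class ptr {
public:
  ptr(T* raw = nullptr): raw(raw) {}
  non_null_ptr<T> require() const {          // the single point of verification
    if (raw == nullptr) throw std::runtime_error("null ptr");
    return non_null_ptr<T>(*raw);
  }
private:
  T* raw;
};

// Callees taking non_null_ptr never need to re-check:
inline void increment(non_null_ptr<int> p) { ++*p; }
```

Everything downstream of `require()` is statically known to be non-null.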
With language support around if() and others, C++ could have made it even more convenient. Even C could have introduced such a type modifier. Whenever I read about pointers being unsafe and how optionals and maybes are the solution, I roll my eyes, because a non-null pointer type does the exact same thing.
The funny thing is C++ has a non-null pointer (with no language-support guarantee, though): references. Unfortunately, the language made them not resettable, which makes them unusable in many scenarios where you'd want them to change value over time, like in most class members.
But the idea of a verified type can be extended by using the verified modifier on your own type. For example, you could have a verified matrix type, where the matrix is guaranteed to be valid, non-degenerate. You can apply it to:
- matrix
- vector
- input data of any sort
And if the compiler allowed the programmer to declare their own type modifiers, the world is your oyster: you could for example tag that one matrix is a world matrix while another is a local matrix and provide a function that converts from one to the other...
> But neither of these solve the problem either. Neither one forces the programmer to really confront the possibility of nullness and write the appropriate if/else block.
.value_or() actually does, and you can certainly add a lint check against dereferencing optionals or using .value() if you'd like. C++ does not yet provide match/case-style syntax for handling variants like Rust, outside of macros, and the standard library will certainly not define macros.
I think what you have done for your codebase makes sense based on your preferences, but I think the standard optional works pretty well given the variety of codebases and styles it's intended to support.
> It was already severely hazardous with pointers, that's the problem.
kj::Maybe has an `orDefault()` method that is like `.value_or()` but I find that it is almost never useful. You almost always want to execute different logic in the null case, rather than treat it as some default value.
> Dereferencing null optionals is UB for consistency with dereferencing pointers
The point of optional is to avoid being consistent with the bad parts of pointers. And making it undefined rather than a guaranteed crash is even crazier.
Ford doesn't sell cars that burst into flames for consistency with the Pinto.
And usage of the dereference operator isn’t intended for uses that would cause things to burst into flames. If you don’t know the state of your variables or you don’t trust your coworkers to know the state of their variables, you can enforce the use of value_or() in your own projects. You don’t get to force superfluous branch stalls on the general C++ user base.
I think your replies in this thread show a complete misunderstanding of what std::optional is for (or at least, what it should be for, in my opinion).
std::optional is for modeling a value that may be null. If the value may be null then you must check if it is null before you dereference it. There is no "forcing of branch stalls", because if used correctly (and designed correctly, which std::optional is not, sadly) it is merely a way for the programmer to use the type system to enforce the use of null checks that are necessary for the correctness of the program anyway.
If you and your coworkers find yourself in a situation in which you know that the value of a pointer cannot be null, then you should not model that value with an optional type, and then you will not only not be required to add a null check, it will be immediately obvious that you don't need one.
> If you and your coworkers find yourself in a situation in which you know that the value of a pointer cannot be null, then you should not model that value with an optional type, and then you will not only not be required to add a null check, it will be immediately obvious that you don't need one.
Hmm I think you’re suffering from a lack of imagination and real world experience with efficiently storing data in C++.
There are certainly cases where it makes the most sense to instantiate your value within an optional wrapper while at the same time there being instances within your codebase where that location is known to be non-null. I’m surprised I even have to say that.
An obvious case is when using optional as a global. Other cases are when you’ve checked the optional once at the top of a basic block to be used multiple times after.
> An obvious case is when using optional as a global.
Well, ok, although I think we were doing fine storing such values in unique_ptr. Now you're going to come back and say that you can't ever afford a double indirection when accessing globals, and if so, fine. But you still could have very easily written your own wrapper that suits your needs without demanding that std::optional be relaxed to the point where it cannot provide compile-time safety guarantees.
> Other cases are when you’ve checked the optional once at the top of a basic block to be used multiple times after.
Disagree. The way optional types are supposed to work (and the way I have used them in real code) is that you check it once, and in doing so, you obtain a reference to the stored value (assuming the check passes). Further accesses to the value thus do not require checks. The type system is thus used to model whether the check has been done or not, and helps you write code that does the minimal number of checks required.
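The check-once pattern even works with plain std::optional (a sketch; `greet` is an illustrative name):

```cpp
#include <optional>
#include <string>

// One branch, then direct access through a reference with no repeated checks.
std::string greet(const std::optional<std::string>& name) {
  if (name.has_value()) {
    const std::string& n = *name;  // safe: guarded by the check above
    return "hello, " + n;          // further uses of n need no checks
  }
  return "hello, stranger";
}
```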
You seem to think everyone else in this thread is an idiot, but I promise you I have written real code with very strict optional types (similar to kj::Maybe) without introducing unnecessary branches.
optional has a bunch of reasons why it is better than a pointer, too (it cannot be incremented, it's not a failure to initialize, it doesn't implicitly convert to things readily...). Unfortunately we don't have a monadic optional or an optional with a reference member. Both would be very useful. value_or suffers from having to evaluate the "or" part in all cases, but .transform/.and_then/.or_else members would be really nice. Optional of a reference would allow for a whole swath of try_ methods instead of the idiomatic at() interface for checking bounds and retrieving in one call. at() suffers in that it forces the choice of an exception, or checking the index for being in range outside the call, at which point operator[] is what you want.
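(C++23 did eventually add exactly these monadic members, and_then/transform/or_else, to std::optional.) In the meantime, a C++17-compatible free-function sketch of transform:

```cpp
#include <optional>

// Stand-in for the wished-for member: applies f only when a value is
// present, so the "or" side is never evaluated eagerly.
template <typename T, typename F>
auto transform(const std::optional<T>& o, F f)
    -> std::optional<decltype(f(*o))> {
  if (o) return f(*o);
  return std::nullopt;
}
```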
You have to add special handling for the case that T == std::reference_wrapper<T> so you can call .get() on it to expose the underlying value. In the case of std::optional vs a pointer type (raw or smart) you can consistently use operator* to get to the underlying value. I think this is what was meant.
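A sketch of that workaround (`registry` and `lookup` are illustrative names): std::optional<T&> is ill-formed, so a reference_wrapper stands in, at the cost of the extra .get() compared with a pointer's operator*.

```cpp
#include <functional>
#include <optional>

int registry = 0;

std::optional<std::reference_wrapper<int>> lookup(bool found) {
  if (found) return std::ref(registry);
  return std::nullopt;
}
```

Usage reads `lookup(true)->get() = 7;`, which writes through to `registry`.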
Wow thanks for the extra context here! My read of capnp was that you probably couldn't write capnp with std::unique_ptr and STL streams as-is (or relying on STL would make it really hard), and thus the capnp design necessitates the core of kj, and once you start there you have to add most of the other stuff in kj. I do very firmly agree that C++11 had holes in either features or support when capnp first rolled out, though C++ has caught up over the years.
I still think if you wanted to re-write a capnp library today, you'd still need kj, or at least most of it, simply for the memory control. The added benefit of kj is that you don't have to deal with C++ STL bugs and quirks. E.g. I believe C++ spec didn't require std::optional to use in-place new until recently ...
Also curious if you have any comments on this read of kj from a software management perspective. I imagine trying to sell the investment of writing something like kj at a company and it being a tough sell, even if capnp was approved. You clearly knew what you were doing from the outset and certainly nobody could have done it better. But I believe capnp sits close to the decision boundary of where many companies decide to invest in greatness or not, and reflection might shed light on why some managers make the wrong choice on something like this.
No, Cap'n Proto does not rely on memory layout of KJ types or anything like that. You could build it on the standard library approximately just as easily.
In 2013, the C++ standard library was sorely missing a string representation that allowed pointing into an existing buffer, which was important for Cap'n Proto to claim zero-copy. C++17 introduced std::string_view, which would fit the bill, but that wasn't there in 2013, so I wrote my own StringPtr. I added Maybe because I needed it all over the place (and std::optional didn't exist), and then for RPC I needed to create the Promise framework (std::future and such didn't exist yet). At that point I looked at these things and said "these are generally useful outside of Cap'n Proto, I should turn them into a library", and that's how KJ got started.
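The property described above, a view into an existing buffer with no copy, is what string_view later standardized (a sketch; `firstWord` is an illustrative name):

```cpp
#include <string_view>

// Returns a view into the caller's buffer; nothing is allocated or copied.
std::string_view firstWord(std::string_view text) {
  size_t space = text.find(' ');
  return space == std::string_view::npos ? text : text.substr(0, space);
}
```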
> I imagine trying to sell the investment of writing something like kj at a company and it being a tough sell
Well, many companies have things "like KJ". Google has Abseil, Facebook has Folly. But just like KJ, these things didn't start with someone saying "Hey I want to make a new C++ toolkit", they started with people writing utility code that they needed for other things, and then splitting that code out to make it reusable. Eventually the utilities accumulate into their own ecosystem. I generally don't add anything to KJ unless I explicitly need it for something else I'm working on. I actually would argue that it would be a bad business decision to spin up a project to create something like KJ or Abseil or Folly from scratch; such projects are likely to spend too much time solving the wrong problems. The best toolkits and platforms come from projects that were really trying to build something on top of that platform, and let their own real-world problems drive the design.
That said, arguably, Cap'n Proto itself is a bit of a counterpoint. I started Cap'n Proto after quitting Google. I did not have any real business purpose, I just wanted to play around with zero-copy, which I'd thought about a lot while working on Protobuf. That said, I did have the previous experience of maintaining Protobuf at Google for several years, which meant I already had a pretty good idea of what the real-world problems looked like, and I stuck pretty closely to Protobuf's design decisions in most ways. And then starting in 2014, I started working on Sandstorm, built on top of Cap'n Proto, and further development was driven by needs there. (And since 2017, Cloudflare Workers has been the main driver.)
I am not sure if the time I spent starting Cap'n Proto in 2013 would have made sense from a business perspective. If I'd wanted to start Sandstorm immediately, building on Protobuf would probably have been the right answer.
I would say that low-level developer tooling in general is pretty tough to make a business out of, because everyone expects it to be free and open source. It's also pretty tough to build as part of another business, because usually creating something new from scratch doesn't justify the cost, vs. using something off the shelf. I feel like the only people who can create new fundamental tools from scratch (especially things like programming languages) are giant companies like Google, and random hackers who are lucky enough to be able to mess around without funding.
Sorry, that probably isn't the answer you were looking for. I don't like it either. :/
Agree with you about StringPtr and string_view; also std::future; std::optional was not there and also not in-place new for a while at the start I think; lastly, I'm pretty sure unique_ptr would have been a headache over Own. I didn't really mean to suggest capnp relied on memory layout of kj types (and agree it doesn't), but rather I believe even today you'd be very hard pressed to get 100% zero-copy out of the STL.
Abseil and Folly are a lot, lot bigger than KJ (Folly is more of a playground), and I totally agree they are an amalgamation of utility code at team scale. KJ, though, had mainly only one author, and it seems I got it right that capnp wouldn't be possible with the STL (at least when it started).
Wasn't so much trying to poke at the question of "does the business say KJ/capnp is necessary?" -- I agree with you that posed that way it can be hard to get a good answer.
More like: how is it best to scope out something on the scale of capnp/kj in the context of a bigger company? Do you just give a team a year and let them run?
I'm excited about capnp in the long run as more and more storage moves to NVME. Zero copy and related tricks are already big parts of Apache Arrow / Parquet; it's an important area to explore.
> how is it best to scope out something on the scale of capnp/kj in the context of a bigger company? Do you just give a team a year and let them run?
No, frankly, I think that would be a recipe for disaster.
The team needs to instead be tasked with a specific use case, and they need to build the platform to solve the specific problems they face while working on that use case. If you tell people to develop a platform without a specific use case, they will almost certainly build something that solves the wrong problems. Programmers (like humans in general) are mostly really bad at guessing which features are truly needed, vs. what sounds neat but really won't ever be used.
So, sadly, I think that businesses should not directly engage in such projects. But, they should be on the lookout for useful tools that their developers have built in the service of other projects, and be willing to factor those out into a separate project later.
Unfortunately, this all makes it very hard to effect revolutionary change in infrastructure. When the infrastructure isn't your main project, you probably aren't going to make risky bets on new ideas there -- you're going to mostly stick to what is known to work.
So how do we get revolutionary changes? That's tough. I suppose that does require letting a team run wild, but you have to acknowledge that 90% of the time they will fail and produce something that is worthless. If the business is big enough that they can make such bets (Google), then great. But for most tech companies I don't think it's justifiable.
Tbh, I actually prefer C++ with your style guide to Rust. Now if C++ had a package manager and a crates.io equivalent, I wouldn't look back at Rust at all. Unfortunately, C++ is just too far behind.
(Btw, I gave a presentation about your style guide at my company two years ago, trying to convince people that we should be doing this stuff ;)
There are some differences in the details between KJ C++, and both Rust and my Rust-inspired C++ guidelines:
> Value types always have move constructors (and sometimes copy constructors). Resource types are not movable; if ownership transfer is needed, the resource must be allocated on the heap.
In Rust, all types (including resources) are movable.
> Value types almost always have implicit destructors. Resource types may have an explicit destructor.
What's an explicit destructor? Rust's File type closes upon destruction, and one criticism of the design is that it ignores all errors. The only way to know what errors occurred is to call sync_all() beforehand.
However, "Ownership" and "Reference Counting" (and "Exceptions" to an extent) feel very Rust-like.
> If a class's copy constructor would require memory allocation, consider providing a clone() method instead and deleting the copy constructor. Allocation in implicit copies is a common source of death-by-1000-cuts performance problems. kj::String, for example, is movable but not copyable.
When you include such a class in a larger structure, it breaks the ability for the outer class to derive a copy constructor automatically (even an explicit one, or a private one used by a clone() method). What's the best way to approach this?
> In Rust, all types (including resources) are movable.
Presumably not when pointers are pointing at them or their members.
In Rust, that is enforced by the compiler, but in C++ it is not. The rule that resource types are not movable is intended to provide some sanity here: this means a resource type can hand out pointers to itself or its members without worrying that it'll be moved at some point, invalidating those pointers.
> What's an explicit destructor? Rust's File type closes upon destruction, and one criticism of the design is that it ignores all errors. The only way to know what errors occurred is to call sync_all() beforehand.
I believe destructors should be allowed to throw, which solves that problem.
Obviously, this opinion is rather controversial. AFAICT, though, the main reason that people argue against throwing destructors is because throw-during-unwind leads to program termination. That, though, was an arbitrary decision that, in my opinion, the C++ committee got disastrously wrong. An exception thrown during the unwind of another exception is usually a side-effect of the first exception and could safely be thrown away, or perhaps merged into the main exception somehow. Terminating is the worst possible answer and I would argue is the single biggest design mistake in the whole language (which, with C++, is a high bar).
In KJ we let destructors throw, while making a best effort attempt to avoid throwing during unwind.
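That "best effort to avoid throwing during unwind" can be sketched with C++17's std::uncaught_exceptions(). This mirrors the idea only; it is not KJ's actual implementation, and Flusher is a made-up name.

```cpp
#include <exception>
#include <stdexcept>

class Flusher {
public:
  Flusher(): exceptionsAtEntry(std::uncaught_exceptions()) {}
  ~Flusher() noexcept(false) {
    bool unwinding = std::uncaught_exceptions() > exceptionsAtEntry;
    if (!flushSucceeded() && !unwinding) {
      throw std::runtime_error("flush failed");  // safe: not mid-unwind
    }
    // Mid-unwind, the secondary error is swallowed instead of terminating.
  }
private:
  bool flushSucceeded() { return false; }  // stand-in failure for the sketch
  int exceptionsAtEntry;
};
```

Outside an unwind the destructor reports the error by throwing; during an unwind it stays silent, so std::terminate is never reached.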
> When you include such a class in a larger structure, it breaks the ability for the outer class to derive a copy constructor automatically (even an explicit one, or a private one used by a clone() method). What's the best way to approach this?
In practice I find that this almost never comes up. Complex data structures rarely need to be copied/cloned. I have written very few clone() methods in practice.
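When it does come up, the pattern looks roughly like this (a sketch; a std::unique_ptr member stands in for a movable-but-not-copyable type like kj::String, and the names are illustrative):

```cpp
#include <memory>
#include <string>

struct Name {
  std::unique_ptr<std::string> text;
  Name clone() const { return Name{std::make_unique<std::string>(*text)}; }
};

struct Person {
  Name name;
  int age;
  // The defaulted copy constructor is implicitly deleted by the Name member,
  // so cloning is explicit and allocation never happens behind the caller's back.
  Person clone() const { return Person{name.clone(), age}; }
};
```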
I also absolutely love the kj style, but it's so different from basically everything else that I have a hard time incorporating anything into my daily work in the cesspit.
Some of the older glog code is pretty nice with regards to a very vanilla and portable treatment of macros https://github.com/google/glog/tree/master/src/glog