If you're new to C++11, Cap'n Proto (particularly the kj library embedded within it, src/kj) is a great read: https://github.com/capnproto/capnproto kj semi-re-implements several core C++11 features like Own and Maybe (actually Maybe / std::optional is still pretty new!) https://github.com/capnproto/capnproto/blob/master/c%2B%2B/s... Why did Kenton do this? He can speak for himself, but the core of Cap'n Proto is /roughly/ like a serializable / portable memory arena, so it was necessary for the design. Reading through kj and _comparing_ it with C++11 will give you some great initial insight into why both are implemented the ways they are. I'm not really advocating that you use kj directly or adopt things like capnp's unique comment style, but the codebase is nevertheless very well organized and clear.
While I wouldn't necessarily recommend Boost as a model project / repo ( https://github.com/boostorg ), it's worth checking out to help understand why modern decisions were made the way they were.
> Why did Kenton do this? He can speak for himself,
An incomplete list of reasons:
1) At the time I started the project, a lot of things that KJ replaces, like std::optional, didn't actually exist in the C++ standard yet.
2) A lot of the stuff in the standard library is just badly designed. Take std::optional, for instance. You'd think that the whole point of using std::optional instead of a pointer would be to force you to check for null. Unfortunately, std::optional implements operator* and operator-> which are UB if the optional is null -- that's even worse than the situation with pointers, where at least a null pointer dereference will reliably crash. KJ's Maybe is designed to force you to use the KJ_IF_MAYBE() macro to unwrap it, which forces you to think about the null case.
3) A lot of older stuff in the C++ standard library hasn't aged well with the introduction of C++11, or was just awful in the first place (iostream). C++11 really changed the language, and KJ was designed entirely with those changes in mind.
4) The KJ style guide (https://github.com/capnproto/capnproto/blob/master/style-gui...) adopts some particular rules around the use of const to help enforce thread safety, the specific philosophy around exceptions, and some other things, which differ from the C++ standard library's design philosophies. KJ's rules have worked out pretty well in my experience, but they break down when building on an underlying toolkit that doesn't follow them.
5) This is a silly matter of taste, but I just can't stand the fact that type names are indistinguishable from variable names in C++ standard style.
6) Because it was fun.
Do these reasons add up to a good argument for reinventing the wheel? I dunno. I think it has worked well for me but smart people can certainly disagree.
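The operator* hazard from point 2 fits in a few lines (a sketch using std::optional, since that's the standard type in question; `demo` is an illustrative name):

```cpp
#include <optional>

// A minimal illustration of the hazard: all three accessors compile,
// but only two of them are checked.
int demo(const std::optional<int>& o) {
  // *o;            // UB when o is empty: no diagnostic, no guaranteed crash
  // o.value();     // checked: throws std::bad_optional_access when empty
  return o.value_or(-1);  // checked: falls back to -1 when empty
}
```

Nothing in the type system distinguishes the unchecked access from the checked ones at the call site.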
> Unfortunately, std::optional implements operator* and operator-> which are UB if the optional is null -- that's even worse than the situation with pointers,
Dereferencing null optionals is UB for consistency with dereferencing pointers. All uses of operator* should have the same semantics and the C++ standards committee did the right thing by ensuring that with optionals. Checking for null in operator* would break consistency.
If you want to dereference an optional that may be null, use the .value_or() method. For the times when you absolutely know the optional has a value use operator*.
If you’re wondering why you would use an optional over a pointer. The idea is that optionals allow you to pass optional data by value. Previously if you wanted to pass optional data, you’d have to do it by reference with a pointer. This is part of c++’s push towards a value-based style, which is more amenable to optimization and more efficient in general for small structs (avoiding the heap, direct access of data). Move semantics are a part of that same push.
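The value-based style described above can be sketched like this (`parsePort` is a made-up example name):

```cpp
#include <optional>
#include <string>

// "No value" travels by value: no heap allocation, no out-pointer.
std::optional<int> parsePort(const std::string& s) {
  if (s.empty()) return std::nullopt;
  return std::stoi(s);  // assumes s is numeric; real code would validate
}
```

Compare this with the pre-C++11 pattern of taking an `int*` out-parameter and returning a success flag.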
> The idea is that optionals allow you to pass optional data by value.
Yes, and kj::Maybe was doing the same before std::optional was standardized.
It's disappointing that the committee chose only to solve this problem while not also solving the problem of forgetting to check for null -- often called "the billion-dollar mistake".
> Dereferencing null optionals is UB for consistency with dereferencing pointers. All uses of operator* should have the same semantics
My argument is that `std::optional` should not have an operator* at all. `kj::Maybe` does not have operator* nor `operator->`.
> If you want to dereference an optional that may be null, use the .value_or() method. For the times when you absolutely know the optional has a value use operator*.
This is putting a lot of cognitive load on the developer. They must remember which of their variables are optionals, in order to remember whether they need to check for nullness. The fact that they use the same syntax to dereference makes it very easy to get wrong. This is especially true in modern IDEs where developers may be relying on auto-complete. If I don't remember the type of `foo`, I'm likely to write `foo->` and look at the list of auto-complete options, then choose one, without ever realizing that `foo` is an optional that needs to be checked for null.
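For contrast, KJ's unwrap pattern looks roughly like this. (A sketch: to stay self-contained it emulates KJ_IF_MAYBE over std::optional; the real macro lives in kj/common.h and operates on kj::Maybe. On success it binds a pointer to the wrapped value.)

```cpp
#include <optional>

// Simplified stand-in for kj's macro, for illustration only.
#define KJ_IF_MAYBE(name, exp) \
  if (auto* name = (exp).has_value() ? &*(exp) : nullptr)

int describe(std::optional<int> maybe) {
  KJ_IF_MAYBE(v, maybe) {
    return *v;   // non-null branch: v is a plain pointer, already checked
  } else {
    return -1;   // the null case must be confronted explicitly
  }
}
```

There is no way to reach the value without going through the branch.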
Or if you're really sure the maybe is non-null, you can write:
use(KJ_ASSERT_NONNULL(maybeValue));
This does a runtime check and throws an exception if the value is null. But more importantly, it makes it really clear to both the writer and the reader that an assumption is being made.
> It's disappointing that the committee chose only to solve this problem while not also solving the problem of forgetting to check for null -- often called "the billion-dollar mistake".
It's likely ~30 LOC to wrap std::optional in your own type that checks for null. If std::optional checked for null, and those `if` branches showed up as taking the couple of nanoseconds that push you past your time budget in your real-time system (especially if people were doing things like `if (b) { foo(b); bar(b); baz(*b); }`), then you'd have to reimplement the whole of it instead.
Don't forget that you can still use C++ on 8 MHz microcontrollers.
Again, I'm not arguing that operator* should check for nullness, I'm arguing that it shouldn't exist.
With `kj::Maybe` and `KJ_IF_MAYBE`, using the syntax I demonstrated above, you check for nullness once and as a result of the check you get a direct pointer to the underlying value (if it is non-null), which you then use for subsequent access, so you don't end up repeating the check. So, you get the best of both worlds.
> it's likely ~30 loc to wrap std::optional
It's even easier to replace std::optional rather than wrap it. The value of std::optional is that it's a standard that one would hope would be used across libraries. But because it's flawed, people like me end up writing their own instead.
The ideal situation is then to not use std::optional for those cases, rather than to make std::optional next to useless for its stated use case.
If it gets in the way of your goal on an 8 MHz controller, take the optionality check out of your tight loop and convert to a null pointer safely where it doesn't matter.
Deeply embedded is already used to picking and choosing features, or explicitly running with the bumper rails off in the subset of cases where it matters. We like the normal primitives not being neutered for us because we still use a lot of them outside of our tight loops.
> than to make std::optional next to useless for its stated use case.
This is such a ridiculous and obviously false assertion that it’s indistinguishable from satire. Optional is widely used and was modeled from a pre-existing boost class which was itself widely used. Do you actually write C++ professionally?
I know they're different (and the construct unfortunately named optional has some occasional uses); it's just that the semantics of optional don't help it be used as a classic optional type.
> it's just that the semantics of optional don't help it be used as a classic optional type.
what do you mean "classic optional type"? boost.optional has worked like that for something like 20 years - it's been in C++ for longer than Maybe has been in Haskell.
> My argument is that `std::optional` should not have an operator* at all. `kj::Maybe` does not have operator* nor `operator->`.
That’s fair.
>> If you want to dereference an optional that may be null, use the .value_or() method. For the times when you absolutely know the optional has a value use operator
> This is putting a lot of cognitive load on the developer. They must remember which of their variables are optionals,
Not really. Teams with programmers that are bad at keeping track of the state of their variables can simply have a policy to always use .value_or()/.value().
C++ doesn’t impose this on its users because it generally assumes its users are responsible enough to make their own policy.
> The fact that they use the same syntax to dereference makes it very easy to get wrong.
I disagree, the operator* has the same semantics as pointers did, making it no more semantically hazardous. There exists other methods on optional that have the behavior you want.
But neither of these solve the problem either. Neither one forces the programmer to really confront the possibility of nullness and write the appropriate if/else block. Throwing an exception rather than crashing is only a slight improvement IMO.
> I disagree, the operator* has the same semantics as pointers did, making it no more semantically hazardous.
It was already severely hazardous with pointers, that's the problem.
Both problems could have been solved looooooong ago by introducing a type modifier, akin to const, that carries whether a value is verified (or safe, or non-null, or other; pick your synonym).
  int * p;          // maybe null!
  int * verified p; // guaranteed non-null!
A looooong time ago (circa... 1994-1995) I designed a hierarchy of smart pointers and had a variety for non-null so that you could declare a function like:
  void foo(non_null_ptr<T>& p);
And know that you don't have to verify for null. All enforced at compile-time (via a function on ptr<T> returning a non_null_ptr<T>).
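A hypothetical reconstruction of that idea (the names are illustrative, not the 1994 originals): the null check happens exactly once, at the conversion.

```cpp
#include <stdexcept>

template <typename T>
class non_null_ptr {
public:
  explicit non_null_ptr(T& ref): p(&ref) {}  // only buildable from a real object
  T& operator*() const { return *p; }
  T* operator->() const { return p; }
private:
  T* p;  // invariant: never null
};

template <typename T>
class ptr {
public:
  ptr(T* raw = nullptr): raw(raw) {}
  non_null_ptr<T> require() const {          // the single point of verification
    if (raw == nullptr) throw std::runtime_error("null ptr");
    return non_null_ptr<T>(*raw);
  }
private:
  T* raw;
};

// Callees taking non_null_ptr never need to re-check:
inline void increment(non_null_ptr<int> p) { ++*p; }
```

Everything downstream of `require()` is statically known to be non-null.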
With language support around if() and others, C++ could have made it even more convenient. Even C could have introduced such a type modifier. Whenever I read about pointers being unsafe and how optionals and maybes are the solution, I roll my eyes, because a non-null pointer type does the exact same thing.
The funny thing is C++ has a non-null pointer (with no language-support guarantee, though): references. Unfortunately, the language made them not resettable, which makes them unusable in many scenarios where you'd want them to change value over time, like in most class members.
But the idea of a verified type can be extended by using the verified modifier on your own type. For example, you could have a verified matrix type, where the matrix is guaranteed to be valid, non-degenerate. You can apply it to:
- matrix
- vector
- input data of any sort
And if the compiler allowed the programmer to declare their own type modifiers, the world is your oyster: you could for example tag that one matrix is a world matrix while another is a local matrix and provide a function that converts from one to the other...
> But neither of these solve the problem either. Neither one forces the programmer to really confront the possibility of nullness and write the appropriate if/else block.
.value_or() actually does, and you can certainly add a lint check against dereferencing optionals or using .value() if you'd like. C++ does not yet provide match/case-style syntax for handling variants like Rust, outside of macros, and the standard library will certainly not define macros.
I think what you have done for your codebase makes sense based on your preferences, but I think the standard optional works pretty well given the variety of codebases and styles it's intended to support.
> It was already severely hazardous with pointers, that's the problem.
kj::Maybe has an `orDefault()` method that is like `.value_or()` but I find that it is almost never useful. You almost always want to execute different logic in the null case, rather than treat it as some default value.
> Dereferencing null optionals is UB for consistency with dereferencing pointers
The point of optional is to avoid being consistent with the bad parts of pointers. And making it undefined rather than a guaranteed crash is even crazier.
Ford doesn't sell cars that burst into flames for consistency with the Pinto.
And usage of the dereference operator isn’t intended for uses that would cause things to burst into flames. If you don’t know the state of your variables or you don’t trust your coworkers to know the state of their variables, you can enforce the use of value_or() in your own projects. You don’t get to force superfluous branch stalls on the general C++ user base.
I think your replies in this thread show a complete misunderstanding of what std::optional is for (or at least, what it should be for, in my opinion).
std::optional is for modeling a value that may be null. If the value may be null then you must check if it is null before you dereference it. There is no "forcing of branch stalls", because if used correctly (and designed correctly, which std::optional is not, sadly) it is merely a way for the programmer to use the type system to enforce the use of null checks that are necessary for the correctness of the program anyway.
If you and your coworkers find yourself in a situation in which you know that the value of a pointer cannot be null, then you should not model that value with an optional type, and then you will not only not be required to add a null check, it will be immediately obvious that you don't need one.
> If you and your coworkers find yourself in a situation in which you know that the value of a pointer cannot be null, then you should not model that value with an optional type, and then you will not only not be required to add a null check, it will be immediately obvious that you don't need one.
Hmm I think you’re suffering from a lack of imagination and real world experience with efficiently storing data in C++.
There are certainly cases where it makes the most sense to instantiate your value within an optional wrapper while at the same time there being instances within your codebase where that location is known to be non-null. I’m surprised I even have to say that.
An obvious case is when using optional as a global. Other cases are when you’ve checked the optional once at the top of a basic block to be used multiple times after.
> An obvious case is when using optional as a global.
Well, ok, although I think we were doing fine storing such values in unique_ptr. Now you're going to come back and say that you can't ever afford a double indirection when accessing globals, and if so, fine. But you still could have very easily written your own wrapper that suits your needs without demanding that std::optional be relaxed to the point where it cannot provide compile-time safety guarantees.
> Other cases are when you’ve checked the optional once at the top of a basic block to be used multiple times after.
Disagree. The way optional types are supposed to work (and the way I have used them in real code) is that you check it once, and in doing so, you obtain a reference to the stored value (assuming the check passes). Further accesses to the value thus do not require checks. The type system is thus used to model whether the check has been done or not, and helps you write code that does the minimal number of checks required.
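The check-once pattern even works with plain std::optional (a sketch; `greet` is an illustrative name):

```cpp
#include <optional>
#include <string>

// One branch, then direct access through a reference with no repeated checks.
std::string greet(const std::optional<std::string>& name) {
  if (name.has_value()) {
    const std::string& n = *name;  // safe: guarded by the check above
    return "hello, " + n;          // further uses of n need no checks
  }
  return "hello, stranger";
}
```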
You seem to think everyone else in this thread is an idiot, but I promise you I have written real code with very strict optional types (similar to kj::Maybe) without introducing unnecessary branches.
optional has a bunch of reasons why it is better than a pointer, too (it cannot be incremented, it's not a failure to initialize, it doesn't implicitly convert to things readily...). Unfortunately we don't have a monadic optional or an optional with a reference member. Both would be very useful. value_or suffers from having to evaluate the "or" part in all cases, but .transform/.and_then/.or_else members would be really nice. Optional of a reference would allow for a whole swath of try_ methods instead of the idiomatic at() interface for checking bounds and retrieving in one call. at() suffers in that it forces the choice of an exception, or checking the index for being in range outside the call, at which point operator[] is what you want.
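(C++23 did eventually add exactly these monadic members, and_then/transform/or_else, to std::optional.) In the meantime, a C++17-compatible free-function sketch of transform:

```cpp
#include <optional>

// Stand-in for the wished-for member: applies f only when a value is
// present, so the "or" side is never evaluated eagerly.
template <typename T, typename F>
auto transform(const std::optional<T>& o, F f)
    -> std::optional<decltype(f(*o))> {
  if (o) return f(*o);
  return std::nullopt;
}
```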
You have to add special handling for the case that T == std::reference_wrapper<T> so you can call .get() on it to expose the underlying value. In the case of std::optional vs a pointer type (raw or smart) you can consistently use operator* to get to the underlying value. I think this is what was meant.
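A sketch of that workaround (`registry` and `lookup` are illustrative names): std::optional<T&> is ill-formed, so a reference_wrapper stands in, at the cost of the extra .get() compared with a pointer's operator*.

```cpp
#include <functional>
#include <optional>

int registry = 0;

std::optional<std::reference_wrapper<int>> lookup(bool found) {
  if (found) return std::ref(registry);
  return std::nullopt;
}
```

Usage reads `lookup(true)->get() = 7;`, which writes through to `registry`.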
Wow thanks for the extra context here! My read of capnp was that you probably couldn't write capnp with std::unique_ptr and STL streams as-is (or relying on STL would make it really hard), and thus the capnp design necessitates the core of kj, and once you start there you have to add most of the other stuff in kj. I do very firmly agree that C++11 had holes in either features or support when capnp first rolled out, though C++ has caught up over the years.
I still think if you wanted to re-write a capnp library today, you'd still need kj, or at least most of it, simply for the memory control. The added benefit of kj is that you don't have to deal with C++ STL bugs and quirks. E.g. I believe C++ spec didn't require std::optional to use in-place new until recently ...
Also curious if you have any comments on this read of kj from a software management perspective. I imagine trying to sell the investment of writing something like kj at a company and it being a tough sell, even if capnp was approved. You clearly knew what you were doing from the outset and certainly nobody could have done it better. But I believe capnp sits close to the decision boundary of where many companies decide to invest in greatness or not, and reflection might shed light on why some managers make the wrong choice on something like this.
No, Cap'n Proto does not rely on memory layout of KJ types or anything like that. You could build it on the standard library approximately just as easily.
In 2013, the C++ standard library was sorely missing a string representation that allowed pointing into an existing buffer, which was important for Cap'n Proto to claim zero-copy. C++17 introduced std::string_view, which would fit the bill, but that wasn't there in 2013, so I wrote my own StringPtr. I added Maybe because I needed it all over the place (and std::optional didn't exist), and then for RPC I needed to create the Promise framework (std::future and such didn't exist yet). At that point I looked at these things and said "these are generally useful outside of Cap'n Proto, I should turn them into a library", and that's how KJ got started.
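The property described above, a view into an existing buffer with no copy, is what string_view later standardized (a sketch; `firstWord` is an illustrative name):

```cpp
#include <string_view>

// Returns a view into the caller's buffer; nothing is allocated or copied.
std::string_view firstWord(std::string_view text) {
  size_t space = text.find(' ');
  return space == std::string_view::npos ? text : text.substr(0, space);
}
```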
> I imagine trying to sell the investment of writing something like kj at a company and it being a tough sell
Well, many companies have things "like KJ". Google has Abseil, Facebook has Folly. But just like KJ, these things didn't start with someone saying "Hey I want to make a new C++ toolkit", they started with people writing utility code that they needed for other things, and then splitting that code out to make it reusable. Eventually the utilities accumulate into their own ecosystem. I generally don't add anything to KJ unless I explicitly need it for something else I'm working on. I actually would argue that it would be a bad business decision to spin up a project to create something like KJ or Abseil or Folly from scratch; such projects are likely to spend too much time solving the wrong problems. The best toolkits and platforms come from projects that were really trying to build something on top of that platform, and let their own real-world problems drive the design.
That said, arguably, Cap'n Proto itself is a bit of a counterpoint. I started Cap'n Proto after quitting Google. I did not have any real business purpose, I just wanted to play around with zero-copy, which I'd thought about a lot while working on Protobuf. That said, I did have the previous experience of maintaining Protobuf at Google for several years, which meant I already had a pretty good idea of what the real-world problems looked like, and I stuck pretty closely to Protobuf's design decisions in most ways. And then starting in 2014, I started working on Sandstorm, built on top of Cap'n Proto, and further development was driven by needs there. (And since 2017, Cloudflare Workers has been the main driver.)
I am not sure if the time I spent starting Cap'n Proto in 2013 would have made sense from a business perspective. If I'd wanted to start Sandstorm immediately, building on Protobuf would probably have been the right answer.
I would say that low-level developer tooling in general is pretty tough to make a business out of, because everyone expects it to be free and open source. It's also pretty tough to build as part of another business, because usually creating something new from scratch doesn't justify the cost, vs. using something off the shelf. I feel like the only people who can create new fundamental tools from scratch (especially things like programming languages) are giant companies like Google, and random hackers who are lucky enough to be able to mess around without funding.
Sorry, that probably isn't the answer you were looking for. I don't like it either. :/
Agree with you about StringPtr and string_view; also std::future; std::optional was not there and also not in-place new for a while at the start I think; lastly, I'm pretty sure unique_ptr would have been a headache over Own. I didn't really mean to suggest capnp relied on memory layout of kj types (and agree it doesn't), but rather I believe even today you'd be very hard pressed to get 100% zero-copy out of the STL.
Abseil and Folly are a lot, lot bigger than KJ (Folly is more of a playground), and I totally agree they are an amalgamation of utility code at team scale. KJ, though, had mainly only one author, and it seems I got it right that capnp wouldn't be possible with the STL (at least when it started).
Wasn't so much trying to poke at the question of "does the business say KJ/capnp is necessary?" -- I agree with you that posed that way it can be hard to get a good answer.
More like: how is it best to scope out something on the scale of capnp/kj in the context of a bigger company? Do you just give a team a year and let them run?
I'm excited about capnp in the long run as more and more storage moves to NVME. Zero copy and related tricks are already big parts of Apache Arrow / Parquet; it's an important area to explore.
> how is it best to scope out something on the scale of capnp/kj in the context of a bigger company? Do you just give a team a year and let them run?
No, frankly, I think that would be a recipe for disaster.
The team needs to instead be tasked with a specific use case, and they need to build the platform to solve the specific problems they face while working on that use case. If you tell people to develop a platform without a specific use case, they will almost certainly build something that solves the wrong problems. Programmers (like humans in general) are mostly really bad at guessing which features are truly needed, vs. what sounds neat but really won't ever be used.
So, sadly, I think that businesses should not directly engage in such projects. But, they should be on the lookout for useful tools that their developers have built in the service of other projects, and be willing to factor those out into a separate project later.
Unfortunately, this all makes it very hard to effect revolutionary change in infrastructure. When the infrastructure isn't your main project, you probably aren't going to make risky bets on new ideas there -- you're going to mostly stick to what is known to work.
So how do we get revolutionary changes? That's tough. I suppose that does require letting a team run wild, but you have to acknowledge that 90% of the time they will fail and produce something that is worthless. If the business is big enough that they can make such bets (Google), then great. But for most tech companies I don't think it's justifiable.
Tbh, I actually prefer C++ with your style guide to Rust. Now if C++ had a package manager and a crates.io equivalent, I wouldn't look back at Rust at all. Unfortunately, C++ is just too far behind.
(Btw, I gave a presentation about your style guide at my company two years ago, trying to convince people that we should be doing this stuff ;)
There are some differences in the details between KJ C++, and both Rust and my Rust-inspired C++ guidelines:
> Value types always have move constructors (and sometimes copy constructors). Resource types are not movable; if ownership transfer is needed, the resource must be allocated on the heap.
In Rust, all types (including resources) are movable.
> Value types almost always have implicit destructors. Resource types may have an explicit destructor.
What's an explicit destructor? Rust's File type closes upon destruction, and one criticism of the design is that it ignores all errors. The only way to know what errors occurred is to call sync_all() beforehand.
However, "Ownership" and "Reference Counting" (and "Exceptions" to an extent) feel very Rust-like.
> If a class's copy constructor would require memory allocation, consider providing a clone() method instead and deleting the copy constructor. Allocation in implicit copies is a common source of death-by-1000-cuts performance problems. kj::String, for example, is movable but not copyable.
When you include such a class in a larger structure, it breaks the ability for the outer class to derive a copy constructor automatically (even an explicit one, or a private one used by a clone() method). What's the best way to approach this?
> In Rust, all types (including resources) are movable.
Presumably not when pointers are pointing at them or their members.
In Rust, that is enforced by the compiler, but in C++ it is not. The rule that resource types are not movable is intended to provide some sanity here: this means a resource type can hand out pointers to itself or its members without worrying that it'll be moved at some point, invalidating those pointers.
> What's an explicit destructor? Rust's File type closes upon destruction, and one criticism of the design is that it ignores all errors. The only way to know what errors occurred is to call sync_all() beforehand.
I believe destructors should be allowed to throw, which solves that problem.
Obviously, this opinion is rather controversial. AFAICT, though, the main reason that people argue against throwing destructors is because throw-during-unwind leads to program termination. That, though, was an arbitrary decision that, in my opinion, the C++ committee got disastrously wrong. An exception thrown during the unwind of another exception is usually a side-effect of the first exception and could safely be thrown away, or perhaps merged into the main exception somehow. Terminating is the worst possible answer and I would argue is the single biggest design mistake in the whole language (which, with C++, is a high bar).
In KJ we let destructors throw, while making a best effort attempt to avoid throwing during unwind.
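That "best effort to avoid throwing during unwind" can be sketched with C++17's std::uncaught_exceptions(). This mirrors the idea only; it is not KJ's actual implementation, and Flusher is a made-up name.

```cpp
#include <exception>
#include <stdexcept>

class Flusher {
public:
  Flusher(): exceptionsAtEntry(std::uncaught_exceptions()) {}
  ~Flusher() noexcept(false) {
    bool unwinding = std::uncaught_exceptions() > exceptionsAtEntry;
    if (!flushSucceeded() && !unwinding) {
      throw std::runtime_error("flush failed");  // safe: not mid-unwind
    }
    // Mid-unwind, the secondary error is swallowed instead of terminating.
  }
private:
  bool flushSucceeded() { return false; }  // stand-in failure for the sketch
  int exceptionsAtEntry;
};
```

Outside an unwind the destructor reports the error by throwing; during an unwind it stays silent, so std::terminate is never reached.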
> When you include such a class in a larger structure, it breaks the ability for the outer class to derive a copy constructor automatically (even an explicit one, or a private one used by a clone() method). What's the best way to approach this?
In practice I find that this almost never comes up. Complex data structures rarely need to be copied/cloned. I have written very few clone() methods in practice.
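When it does come up, the pattern looks roughly like this (a sketch; a std::unique_ptr member stands in for a movable-but-not-copyable type like kj::String, and the names are illustrative):

```cpp
#include <memory>
#include <string>

struct Name {
  std::unique_ptr<std::string> text;
  Name clone() const { return Name{std::make_unique<std::string>(*text)}; }
};

struct Person {
  Name name;
  int age;
  // The defaulted copy constructor is implicitly deleted by the Name member,
  // so cloning is explicit and allocation never happens behind the caller's back.
  Person clone() const { return Person{name.clone(), age}; }
};
```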
I also absolutely love the kj style, but it's so different from basically everything else that I have a hard time incorporating anything into my daily work in the cesspit.
Some of the older glog code is pretty nice with regards to a very vanilla and portable treatment of macros https://github.com/google/glog/tree/master/src/glog