Abseil: C++ Common Libraries (github.com/abseil)
241 points by ingve on Sept 26, 2017 | 88 comments



A hidden perk of this project is the open-sourcing of "C++ Tips of the Week" (TotW internally). The plan, according to the page, is to post the revised and vetted TotW docs publicly.

https://abseil.io/tips/


I think one of the biggest issues for adoption will be the Bazel build system. The C++ world seems to be slowly standardizing on CMake. Having to adopt another build system just to incorporate this library will give people second thoughts.


Hi there, I work at Google on the Protocol Buffers team. The Abseil team will soon be adding CMake support (and automake, I hope). We need this for Protocol Buffers, which currently supports CMake and automake.


Why would you wish maintenance of either of those things on your coworkers?


As grandparent illustrates, this matters to users. So it matters to me too.


As one of your users, I'd really like to stop maintaining our own BUILD files for proto :-)


I don't follow what you are saying. Protobuf has a BUILD file: https://github.com/google/protobuf/blob/master/BUILD Do you mean something else?

A GitHub issue might be a better place for this discussion.


As another of your users, I really wish the proto_library BUILD rules were open-source :-)


If you feel like you are missing implicit dependencies on building Haskell in your project, you could always just add them.


I wasn't aware Protocol Buffers had any relation to Haskell; would you mind explaining what you're referring to here?


Googler-on-googler trolling. The internal version of the build rule they're discussing has some busted dependencies. Because of historical cruft, it brings in a lot of unrelated crap. For example, building a C++ binary might bring in Haskell dependencies because of a protocol buffer. This is being fixed, but it's been a thorn in our sides for years.


To expand a bit, the proto_library rules were "famously" broken in that they never knew which outputs were actually used (or so I believe; it's been a while).

Thus, the rule assumed the output was needed for every language specified in the proto library rule. So if your proto library rule offered a Haskell version, anything using that rule would do whatever was necessary (usually, building Haskell stuff) to make that output.

The more people offered more languages in their rules, the worse it got.

I believe this was eventually fixed (I stopped coding in the main repo years ago; these days I only hack on open-source projects).

I expect one reason the rule was not open sourced was to not expose this cruft to the rest of the world :)


Glad to hear it. This will make using it with our existing code base much easier.


It would be nice if Bazel could generate CMake projects.


Bazel generating CMake projects, which then generate make/Xcode etc. projects? If you have to write Bazel files anyway, why insert CMake in the middle?


So that other people can consume/build the library without having Bazel installed.

(disclaimer: I love Bazel; I wish more people would just not be afraid of its size)


I think the opposite: the descriptive language of Bazel (Buck, Pants, build.gn, and others too) is the key feature. I haven't seen anything else that defines your exact dependencies across multiple build platforms in such a precise and clean manner.

It is a foreign language for folks used to MSBuild, .vcxproj/.sln, etc.; things like debug/release or CRT options are not immediately visible (I just saw a commit 15-20 hours ago in Bazel to support dynamic CRT linking), but I'm sure this is coming.


My experience has been anything but positive with those systems. I spent ~3 hours trying to get build.gn to work for a "non-googler" config to build Chromium, then gave up on it. Ditto Buck: it hit some random dependency issue and would fail to run.

Honestly CMake gets a lot of hell but it's the only one I've seen work wide-scale. Short of a greenfield implementation (like say Rust's package manager) I don't think you'll do much better in C++.


CMake is the biggest reason I left C++ for Rust (there are others, of course), which is a shame because I actually came to like modern C++ (after years of using it). I just don't have time to maintain a complex build program (in a crappy, slow language to boot) when Cargo just works out of the box.


Cargo still doesn't work out of the box for linking against binary libraries.

That, alongside the lack of incremental compilation and linking, means my C++ work projects (native libraries for Java and .NET) actually compile much faster than my occasional Rust weekend projects.


Looks like incremental compilation hit beta in Feb: https://internals.rust-lang.org/t/incremental-compilation-be...

For me it's just fine that Cargo doesn't support every odd use case out of the box in exchange for supporting the majority of use cases with virtually no effort. CMake hardly supports anything without a bunch of scripting--this includes reproducible builds, unit testing, etc.


I freaking LOVE Bazel. I convert every library I use from CMake to Bazel on first use: fork it, add a BUILD file, and you're good to go.


Bazel is not just a build system though. It's also a dependency-management system, a parallel test runner, a cross-language build system, a cross-platform build system, and a bunch of other things, with a build-file syntax a 10-year-old could learn in an hour. Some of those things you could plausibly do with CMake, but with a lot more effort. Once you learn how to use it (which doesn’t take long) Bazel is pretty darn nice.
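
For a sense of that syntax, here's a minimal sketch of a BUILD file (cc_library/cc_test are standard Bazel rules; the target and file names are hypothetical):

    cc_library(
        name = "stringutil",
        srcs = ["stringutil.cc"],
        hdrs = ["stringutil.h"],
        deps = ["@com_google_absl//absl/strings"],
    )

    cc_test(
        name = "stringutil_test",
        srcs = ["stringutil_test.cc"],
        deps = [
            ":stringutil",
            "@com_google_googletest//:gtest_main",
        ],
    )

Each target declares exactly what it needs, and Bazel refuses to let a target use anything it didn't declare.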


Nonetheless, Bazel is very much set up for Google's mentality of building giant static binaries from head and shipping them off to a server somewhere, where they'll be replaced in two weeks with the next release, and then never thought of again. There's some software out here in the world that works like that, but most of it doesn't.

In any case, CMake is _also_ a dependency management system, but works with libraries in the way that they are shipped on a typical Linux distribution. It lets you find common stuff easily, and has multiple extension points in terms of adding support for new libraries. CMake itself can ship your find module, the library you're linking against can ship it, or you can ship it yourself.
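
For instance, linking against a distro-shipped zlib takes only a couple of lines; the ZLIB find module ships with CMake itself (the myapp target here is hypothetical):

    find_package(ZLIB REQUIRED)
    add_executable(myapp main.cc)
    target_link_libraries(myapp PRIVATE ZLIB::ZLIB)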


Trend-wise, applications are expanding more and more into controlling their whole life cycle from binary to deployment: repeatable builds, containers, VMs, unikernels, standardized libraries.

There are other means to share build artifacts or code across language boundary, not necessarily through availability from OS distros. For example, Docker registry solves this issue with container images and overlay FS.

One way or another, Google's model has been proven to work at large scale (on probably the largest C++ code base on earth). So I have full confidence in adopting Google's proven approach.

Disclaimer: I work at Google, but not on the C++ team.


> Trend-wise, applications are expanding more and more into controlling their whole life cycle from binary to deployment: repeatable builds, containers, VMs, unikernels, standardized libraries.

Maybe in the cloud; however, that's only a portion of where you see C++ deployed. For embedded systems (where C and Rust are the only other options), I don't see it going this way anytime soon.


Aren't embedded systems already there? What is a firmware blob if not a monolithic static build artifact? Bazel is perfect for this. I recently taught it how to build boot images for Xilinx FPGAs including a bitstream, first stage bootloader, kernel, and initial ramdisk, all hermetically built and bottled up into a boot.bin. I can reproduce my firmware from source control and get out the same bits. How neat is that?
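
A genrule sketch of the general shape (genrule is standard Bazel; the tool label and command below are placeholders, not the actual rules):

    genrule(
        name = "boot_bin",
        srcs = [":fsbl", ":bitstream", ":kernel", ":ramdisk"],
        outs = ["boot.bin"],
        # Placeholder command -- the real Xilinx tooling and flags differ.
        cmd = "$(location //tools:bootgen) $(SRCS) -o $@",
        tools = ["//tools:bootgen"],
    )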


Speaking of which, any possibility of open-sourcing this? We are looking to get into FPGAs and would rather not reinvent the wheel.


Yes.

It's a little complicated as I did it for a personal project, but I'm also pulling it into Google for my 20% project. It is by FAR easier/faster for me to release under Google's copyright, but there's a bit of "paperwork" I need to go through to get it staged and approved for the Google GitHub.

Since it seems there's interest I'll take a stab at that on Monday. It usually takes a week or two, but no promises. I don't anticipate any blocking issues, though, as the bazel bits don't expose anything about Google.

One caveat -- as released it won't quite be bitwise hermetic, as it depends on an external Xilinx toolchain, and Xilinx does non-hermetic things to their bits that are awkward to work around (you can, but my code to do that is a much less mature shell script full of dd and cat commands to rip apart bitstreams and glue them back together -- I've since found open source tooling for accomplishing the same task, but haven't adopted it yet).

Drop me an email (addr in profile) and I'll tell you when it's up.


Repeatable builds are pretty universally desirable; why wouldn't you want them for embedded systems?

It seems pretty important for security, etc.

Note: Rust and Cargo.lock files are another example of things going in this direction.


> where C & Rust are the only other options

Java, Ada, Pascal, Basic also happen to be viable options, with compiler vendors still doing quite well.


> Trend-wise, applications are expanding more and more into controlling their whole life cycle from binary to deployment: repeatable builds, containers, VMs, unikernels, standardized libraries.

Isn't this the reason behind the Equifax breach? From what I understand (I'm not a Java dev), Apache Struts is deployed alongside the application, so the patch couldn't be applied until someone manually updated the dependencies and deployed a new version of the app.

If this was a "traditional" shared library on a distro then it would have been patched by the system administrators.

When you bundle dependencies via any method you are also bundling the exploits.


Traditional shared libraries never have good enough interfaces not to break stuff. So traditionally, the shared library isn't updated anyway until the app is fixed and deployed for the new version of the shared lib.

Bundled dependencies are easy to handle from a security perspective.

Build stamping tells you which CLs are contained (or just go by a monotonically incrementing number if it's bundled deps). Someone commits the CL that fixes the dependency. In any sane bundled system, this increments that number. You say that any binary in prod must be past that CL (which can be verified using the build stamping). Done.

If it's running, it's fixed. Modify to have whatever grace period you want. The same works for shared dependencies, of course.

In practice, both shared dependencies and static dependencies have tradeoffs, but security is not one of them. Both can be and should be easy to make secure in the sense of "make sure it contains the fix".

Both may require app redeployment in practice, and expecting them not to is not a good strategy right now (for most programming languages).


> Traditional shared libraries never have good enough interfaces not to break stuff. So traditionally, the shared library isn't updated anyway until the app is fixed and deployed for the new version of the shared lib.

I agree and disagree. Traditionally a lot of shared libraries are in the OS and quite good at not breaking things: you can run Windows Update or apt-get upgrade and apps shouldn't break. Breakages generally happen in higher-level libraries, but that is a cultural issue more than a technical one. If more care is put into the interface and into managing breaking changes, then it is doable. But regardless of how libraries are being deployed, "move fast and break things" is not an attitude I want from the developers of them.

Most language-specific package managers I've used do a terrible job of telling developers which packages are obsolete and no longer supported, or what the ramifications of updating a package are. Most libraries in these repositories don't even support previous versions.

> Bundled dependencies are easy to handle from a security perspective.

But not from a user or corporate perspective. I shudder to think how many old and unpatched libraries are currently in use in the corporate world, where the attitude is "deploy, then don't touch until we request a feature". It's an uphill battle to get upgrades even when there are no compatibility issues; if they need development time to update, then they're likely to stick with an old version forever.

Users are unlikely to upgrade as well for various reasons, so if a shared library is bundled with an app it is unlikely to be patched.


However large and robust the deployment may be, it's still a narrow use case/environment compared to C++ usage at large. It may very well be outside the norm of C++ use cases, frankly. A lot of people still make Windows apps.


> Once you learn how to use it (which doesn’t take long) Bazel is pretty darn nice.

I've spent a few hundred hours integrating bits of CMake build scripts into various Bazel builds.

If you're doing something which is explicitly catered for by Bazel itself then it works well, so long as you avoid the bugs.

Otherwise, you'll have to confront the hermetic build abstractions which prevent otherwise simple operations. Bazel toolchain files are just as bad as CMake, if not worse.

I'll give Bazel another try in a few years. For now, I still prefer CMake+Ninja, warts and all.


Hermetic is good though. Hermetic means reproducible.


> Hermetic means reproducible.

Hermeticity is only one ingredient of reproducible builds (bit-identical results). You can have a hermetic build that calls getrandom() as a side effect, which ends up in the binary.

What Bazel prevents is your build rules doing something like that accidentally, without it being glaringly obvious (e.g. accessing $RANDOM, $$, /dev/urandom, the time, ... in a shell script).


Use Docker?


Bazel is on its way to being more like Google's internal implementation: distributed builds, an object-file cache, distributed tests. That means your C++ builds and test runs will take barely any time at all, same as they do at Google. That, in turn, enables a kind of velocity unachievable through other means.


I keep finding accounts of Googlers spending heaps of time waiting for test runs. Apparently they're required for just about everything you do with code at Google and are the butt of dozens of jokes.

All of which makes me think that maybe Google's internal building of code isn't that fast. Or perhaps it's fast, but only in relative terms (as in, you're still waiting a few minutes for hello world, but huge programs build "fast", as in a few dozen minutes).

I did test it on hello world, and um... it's not fast. The first build is horribly slow (double-digit minutes; wtf is it doing?). The second build is "fast", which should be understood as still double-digit seconds. Running the same program in VSCode and just compiling it with the system-installed compiler is so fast that you see the output before VSCode has finished drawing the terminal window.

Reading the site I also get the impression that Bazel's singular focus is correctness. They don't mind things being ridiculously slow if it increases the odds of a correct build.


I'm sorry, but: "bwahahahha". C++ build and test times at Google are astronomical. It's one of the reasons Go exists: they built it to compile and link faster.


If a minute to build and link half a million lines of C++ is “astronomical”, then sure. Try to build that on your local box, then get back to me. Go exists solely because Rob Pike had nothing better to do, and a lot of Googlers hate Java.


Yeah, I actually build locally. It takes a while up front (about half an hour) to build Hello World but incrementals are fast.


Bazel does use namespaces on Linux. It's how it can prevent you from accessing files you didn't declare as dependencies.


Some more information is available:

https://abseil.io/

https://opensource.googleblog.com/2017/09/introducing-abseil...

Almost live coverage of Titus Winters's CppCon keynote where this was unveiled:

https://twitter.com/CppCon/status/912742896905863168


His plenary talk at CppCon is well worth it; when it's posted online I highly suggest viewing it, regardless of whether one uses Abseil or not.


For distro packagers, is the idea that this should have SONAME libabseil.so.0 and remain at libabseil.so.0 forever?

(I know Google's preferred option is that we all realize that concepts like "distros" and "shared libraries" and "SONAMEs" are so last century, and tbh they are, but we'll have them for a long time yet, and it's helpful for upstream libraries to declare what their preferred SONAME is so that you can use the same dynamically-linked compiled binary on multiple distros.)
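
For contrast, a library that does commit to a stable ABI typically declares its SONAME at build time, e.g. in CMake (a generic sketch, not anything Abseil ships):

    add_library(abseil SHARED ${ABSEIL_SRCS})  # hypothetical target and sources
    set_target_properties(abseil PROPERTIES
        VERSION 0.1.0   # full library version
        SOVERSION 0)    # produces and links against libabseil.so.0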


Since abseil doesn't promise ABI compatibility over time, I think the notion is you don't ship libabseil.so.0 as part of a distro.

(I'm a Googler, but this is not my project.)


Ugh, I missed that they promise indefinite API compatibility but not ABI compatibility.

Distros aren't going to like "statically link it, and recompile all the reverse-dependencies when there's a CVE" at all, but I totally understand why that's a sensible approach for something that Google releases.


It’s hard to see the value here of dynamic linking vs static linking, especially when it seems like a lot of the value is compile-time selection of implementations and lightweight utility code—not exactly a huge attack vector you’d want to be able to upgrade independently.

And of course it introduces dependency hell, which seems like a poor fit for a library without a consistent ABI.


Can anyone say what's the significance/unique value prop of this library when compared to things like boost, bde, etc.?


A couple of my favorite types and functions in absl (a short sketch of a couple of them follows the list):

- Span

For when you want to take an array or a vector.

- string_view

Avoids string copies. Makes your program faster. Its creation was a reaction to situations like https://news.ycombinator.com/item?id=8704318. (It predates that revelation, but internally the plague of string copies has been known for some time.)

- Mutex

A better mutex with deadlock detection.

- Substitute

A string formatting function between 5 and 1000s of times faster than snprintf. Profile your code some time. If it's anything like mine, you might be surprised at how much this matters.

- base::Time and base::Duration

Excellent wall time utilities, similar to Go's wall clock time libraries.
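
A sketch using the open-source names (absl::Substitute and absl::Time/absl::Duration; header paths as in the GitHub repo):

    #include <iostream>
    #include <string>

    #include "absl/strings/substitute.h"
    #include "absl/time/clock.h"
    #include "absl/time/time.h"

    int main() {
      absl::Time start = absl::Now();
      // ... do some work ...
      absl::Duration elapsed = absl::Now() - start;
      // Substitute: positional, type-safe, and fast.
      std::string msg = absl::Substitute("did $0 items in $1", 42,
                                         absl::FormatDuration(elapsed));
      std::cout << msg << "\n";
    }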


"- Span For when you want to take an array or a vector."

How's this different from taking a pair of iterators?

"string_view"

After having read all those articles about C++17's string_view, I still don't really get the real differences. OK, when parsing (i.e. using many substrings) it makes sense (but then you'd usually work with char* anyway); and maybe some people interact with C libraries that only take char* a lot (and have to 'cross the boundary' often). So maybe in text-heavy applications (like I imagine C++ applications for the web would be) it makes some sense, but I fail to understand why it's causing this much excitement.

"Substitute"

There are so many fast formatting libraries already - what does this one offer over those? Because it's generally a trade off between completeness/type safety on the one hand, and speed on the other. I use boost::format which is quite slow in benchmarks, but again I don't do much string processing.

"base::Time and base::Duration"

In what way are these different from boost::date_time and std::chrono? Here again the documentation (for Abseil) seems to be missing.

I mean, I'm all for batteries-included libraries - but this just seems to be a few loosely-related classes slapped together and the only reason it's even on here is because it's from Google. There are dozens of similar libraries languishing on sourceforge and github. Compare it to e.g. POCO - this is not even in the same league.


> "- Span For when you want to take an array or a vector."

> How's this different from taking a pair of iterators?

Safety and (a little) convenience. If your function takes a pair of iterators, there's no way to ensure (at compile-time anyway) that both iterators are even pointing into the same array/vector. Also, spans can ensure, with bounds-checking, that only elements within a specified sub-range can be accessed.
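
To make that concrete, a small sketch (absl::Span lives in absl/types/span.h in current releases; the function names are made up):

    #include <numeric>
    #include <vector>

    #include "absl/types/span.h"

    // Iterator-pair version: nothing guarantees `first` and `last`
    // point into the same container.
    int SumIters(std::vector<int>::const_iterator first,
                 std::vector<int>::const_iterator last) {
      return std::accumulate(first, last, 0);
    }

    // Span version: one argument that carries its own bounds, and it
    // accepts vectors, arrays, and initializer lists alike.
    int SumSpan(absl::Span<const int> xs) {
      int total = 0;
      for (int x : xs) total += x;  // bounds travel with the data
      return total;
    }
    // Usage: SumSpan(my_vector); SumSpan({1, 2, 3});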

Such bounds-enforcement is particularly important in cases like, for example, SaferCPlusPlus' "TRASectionSplitter", which is a data type that allows you to partition an array/vector/whatever into subsections that can each be safely accessed/modified concurrently from different threads. ("TRASectionSplitter" is not yet documented, but for those interested, example code can be found here[1].)

[1] shameless plug: https://github.com/duneroadrunner/SaferCPlusPlus/blob/master...


> After having read all those articles about c++17's string_view, I still don't really get the real differences.

`string_view` is a much cleaner parameter type. `char*` and a `size_t` is two parameters. `string` is really a character buffer; accepting a character buffer as input is odd. `string_view` can be trivially constructed from most non-string data structures, including `vector<char>` and `array<char>`. And, now that it's standard, `string_view` is a very lightweight dependency to add to your interface compared to `boost` or hand-rolled alternatives.

To that last point, I'm not sure I'd use an abseil `string_view`, at least as an interface type, but I appreciate that they have migration to new standards as a design goal.
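
For example, a single signature covers all the callers (a sketch; HasPrefix is a made-up name):

    #include <string>

    #include "absl/strings/string_view.h"

    bool HasPrefix(absl::string_view s, absl::string_view prefix) {
      return s.size() >= prefix.size() &&
             s.substr(0, prefix.size()) == prefix;
    }

    // All of these work, with no copies and no overloads:
    //   HasPrefix("a literal", "a ");
    //   HasPrefix(some_std_string, "a ");
    //   HasPrefix(absl::string_view(buf.data(), buf.size()), "a ");  // buf: vector<char>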


> I fail to understand why it's causing this much excitement.

C++ guys just like zero-overhead abstractions. The more we can get, the better. Passing std::string around has always been a sore point, and occasionally prompted misguided optimizations (like refcounting copy-on-write in g++).


Well yes, I'm what one would call a 'C++ guy' myself. Don't pass std::string around, pass const std::string& around. What I see left and right is people saying 'you'll never pass a const std::string& any more!'. The way I see it: you'd pass a string_view in the cases where, in the past, you would have had to copy a part out of a string (which is rare, except in parsers), or when dealing with char* APIs. What's better about passing a string_view than passing a const string&?


Nothing. But I would disagree that having to copy a part of a string, or having to pass part of a char buffer that is not a string, is all that rare, even outside of parsers.


> - Mutex

From their "Devguide" (a short sketch follows the rules):

Clients of Mutex must obey these rules:

1. Each time a thread acquires a Mutex it must later release it.

2. A thread may not attempt to release a Mutex unless it holds it.

3. A thread may not attempt to acquire an exclusive lock on a Mutex it already holds.
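
In absl-using code, the first two rules are usually discharged mechanically with the RAII wrapper (a minimal sketch; absl::MutexLock is in the open-source release):

    #include "absl/synchronization/mutex.h"

    absl::Mutex mu;
    int balance = 0;  // guarded by mu

    void Deposit(int amount) {
      absl::MutexLock lock(&mu);  // acquired here, released at scope exit (rules 1 and 2)
      balance += amount;
      // Rule 3: don't call anything that re-acquires `mu` from here.
    }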

For basic sharing of resources between threads, using "access requesters"[1] can be safer and more convenient as they automatically take care of these rules for you.

And if you need to use the mutex directly, the SaferCPlusPlus library provides a "recursive_shared_timed_mutex"[2] (the one missing from the standard library), which allows a thread to hold multiple ("read" and/or "write") locks at the same time (relieving the "self-deadlock" issue). The mutex isn't documented, but it functions just as its name suggests.

[1] shameless plug: https://github.com/duneroadrunner/SaferCPlusPlus#asynchronou...

[2] https://github.com/duneroadrunner/SaferCPlusPlus/blob/master...


Recursive mutexes? SaferCPlusPlus isn't seriously recommending replacing standard/sane mutexes with their disfigured recursive cousins? Yuck.

I thought people agreed long ago that their only valid use case was papering over broken/non-existent resource access schemes in applications of yore, so you could try to speed them up by sprinkling magic multi-threading pixie dust.


SaferCPlusPlus does not recommend relying directly on mutexes at all. For most straightforward cases, you can use the "access requesters" to safely manage asynchronous access automatically.

Recursive mutexes are analogous to having multiple pointers (or iterators), some of which are "non-const", to an object in the same thread at the same time. Some suggest that this too is a bad idea. For example, the Rust language does not allow a "mutable" (i.e. "non-const") reference to an object to co-exist with any other reference to that object, even in the same thread.

SaferCPlusPlus does not necessarily disagree with this position, but it also does not require adherence to it, like Rust does. So if SaferCPlusPlus is going to allow multiple pointers (or iterators) to a shared object in the same thread, then it's going to need to lock the mutex protecting the object multiple times from the same thread. Giving each pointer/iterator its own separate lock, as opposed to having one lock encompass all the pointers/iterators, allows the locks to be managed automatically, which ensures against data races and ensures that resource locks are released as soon as it is safe to do so.

Again, there's rarely any reason you'd need to interact with the mutex directly. It's primarily there to support "access requester" functionality.

Also, note that this is a recursive shared mutex, not just a recursive mutex, which means that, for example, it provides a kind of "upgrade mutex" functionality. So if a thread has a "read" lock, it can obtain a "write" lock (blocking if necessary), without having to give up its read lock. Then when it's done with the write lock, it can release it without fear of losing its read lock (or blocking).


Thanks for the clarification. The analogy to multiple references (with ≥1 of them allowing for mutation) is an interesting one, though I don't quite understand why having that would imply a need for locking the same mutex multiple times. Will read up on the "access requester" mechanism though.


To my eyes, this code is much less clear than code that uses absl mutexes. Also, to my understanding, reentrant mutexes are much slower than ordinary mutexes, and are never necessary.

But, I have been using absl mutexes for more than ten years, so I'm a bit biased.


Yeah, I think it might be a familiarity bias.

Anyway, the point of access requesters is that they are much safer than manually protecting resources with mutexes. That is, using access requesters eliminates the possibility of data races[1]. Which is important, because data races can be particularly insidious bugs.

Just like how smart pointers can improve safety by automatically managing object lifetimes, access requesters can improve safety by automatically managing object lifetimes and asynchronous access.

[1] As long as you adhere to the rule that shared objects not have "unprotected" mutable or indirect (i.e. pointers/references) members.


The "Why Adopt Abseil" page does a pretty good job of laying out the value proposition:

http://abseil.io/about/philosophy


To borrow from JavaScript, this is a mix of Underscore (utilities) and, uh, Babel(?): essentially polyfilling certain runtime features slated for future (hypothetical) approval.


The text is so faint on this web page I hardly want to get through it...


I'm having trouble reading your post, too!


I wonder why? HN has very readable text.


I noticed that too, so I filed a bug in their repo: https://github.com/abseil/abseil.github.io/issues/46


Reader view to the rescue


It's a sad commentary that this is now the de facto correct answer.

Optics => default to _Designer view_. Superflat everything; no affordances.

Pretty! Clean! Broken.


Readable text is overrated in 2017.


Can't speak to Abseil, but I won't let Boost within 10,000 feet of any codebase I control.

I value my compile times more than anything it provides.


Boost contains over 120 libraries and counting, with over 70 authors and maintainers. There are many parts of Boost that I will never use (e.g. most of the pre c++11 libs), but the pros of having it available when you need it far outweigh the cons.

If a particular library is costing you more in compile time than it's worth in convenience, just rm -rf it from your Boost install -- better yet, write a script to prepend `static_assert(false, "this header is banned by decree of vvanders");` to the top of the banned headers.

There's plenty in Boost to love -- don't let a few rotten apples spoil the bunch.

Full disclosure: I'm a Boost developer


So I should hack apart my install just to get to some basic level of compiler performance?

Last time I looked at Boost, pulling in any header would pull in all of them; that's why it takes so long to compile. These days, with any modern C++11 compiler, I don't really see the need for Boost.


> Last time I looked at boost pulling in any header would pull in all of them

This has literally never been true.


Just include a header and see how it affects your compile time. I can't confirm what you're saying, for example.


I value my development time more than anything. Boost is one library that is developed and tested on almost every C++ target platform, providing stable, tested, and performant common utilities to speed up your development and run anywhere.

Reasons not to use Boost:

- My code is one monolithic pile of spaghetti and I do not use unit tests.

- I do not want to learn Boost.

- I don't trust template hippies.

Not using Boost is an irrational decision and you know it.



Is there any documentation?

This looks like it might be good, but I don't know how to tell whether it is. The "Codemap" is just links to repo directories. The "Abseil Introduction" link leads basically to another version of the README. It does have an "Abseil C++ Quickstart" link, which tells you how to get the package, but doesn't tell you what's in it.

I see things about design philosophy, compatibility, etc. But what about: what functions can I call, what do they do, and how do I call them?

Is there any way to find out what's in this thing, other than reading source files?

------------------

EDIT. Style guidelines, exceptions policy, leak checking, platforms guide, linking your code, feature check macros, yadda yadda yadda ....

WHAT FUNCTIONS CAN I CALL, WHAT DO THEY DO, AND HOW DO I CALL THEM?

(Sorry. I'll go calm down now.)


https://abseil.io/docs/cpp/guides/

This provides partial coverage, for some of the most useful bits. I believe additional docs are in the works.


Very partial, yes. But thanks.


It looks like most of the libraries are C++11 implementations of some of the C++14/17 standard-library facilities.


I wouldn't say "most", though this is definitely a slice of what is offered.

Some of this is also laying the groundwork for the release of further APIs, as well as making it easier for Google to release/support higher-level libraries (e.g. Protobuf, TF, etc.).


Whatever happened to the B-Tree library? Will it be included in Abseil's container lib some day?


It's still available in the Google Code archive: https://code.google.com/archive/p/cpp-btree/



