Demangling C++ Symbols in Rust (fitzgeraldnick.com)
245 points by andrew3726 on Feb 22, 2017 | 44 comments



Tom Tromey, a GNU hacker and buddy of mine, mentioned that historically, the canonical C++ demangler in libiberty (used by c++filt and gdb) has had tons of classic C bugs: use-after-free, out-of-bounds array accesses, etc, and that it falls over immediately when faced with a fuzzer. In fact, there were so many of these issues that gdb went so far as to install a signal handler to catch SIGSEGVs during demangling. It “recovered” from the segfaults by longjmping out of the signal handler and printing a warning message before moving along and pretending that nothing happened. My ears perked up. Those are the exact kinds of things Rust protects us from at compile time! A robust alternative might actually be a boon not just for the Rust community, but everybody who wants to demangle C++ symbols.

Then, later:

Additionally, I’ve been running American Fuzzy Lop (with afl.rs) on cpp_demangle overnight. It found a panic involving unhandled integer overflow, which I fixed. Since then, AFL hasn’t triggered any panics, and it's never been able to find a crash (thanks Rust!) so I think cpp_demangle is fairly solid and robust.

That's what I like to see. Targeted useful reimplementations in Rust that play well to its strengths. In this case, as a double benefit to both the Rust ecosystem and to anyone that wants a robust demangling library.
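For anyone wondering what "unhandled integer overflow" looks like in a demangler: mangled names embed decimal lengths (the `17` in `_Z17GetExecutablePathPKcb`), and a fuzzer will happily feed in absurdly long digit runs. Below is a minimal sketch of an overflow-checked length parser, assuming C++17. The name `parse_length` is made up for illustration and is not taken from cpp_demangle or libiberty.

```cpp
#include <cstddef>
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical helper, not from any real demangler.
// Parses the leading decimal digits of `input` as a length.
// Returns std::nullopt if there are no digits or the value would overflow.
std::optional<std::size_t> parse_length(std::string_view input) {
    constexpr std::size_t max = std::numeric_limits<std::size_t>::max();
    std::size_t value = 0;
    std::size_t digits = 0;
    for (char c : input) {
        if (c < '0' || c > '9') break;
        const std::size_t d = static_cast<std::size_t>(c - '0');
        // Reject before multiplying, so the arithmetic can never wrap.
        if (value > (max - d) / 10) return std::nullopt;
        value = value * 10 + d;
        ++digits;
    }
    if (digits == 0) return std::nullopt;
    return value;
}
```

In Rust a debug build would panic on the wrapping multiplication, which is presumably the kind of thing AFL tripped over; in C++ unsigned arithmetic wraps silently, so the check has to be written by hand.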


LLVM's libcxxabi has a demangler which doesn't have the worst looking code in the world[0], has no external dependencies (outside of the C++ standard library), has lately been fuzzed[1] and is tested[2].

Switching languages is cool, but the Rust code is actually longer and still uses a hand written parser, so how can you be sure it is any more correct or won't eat all your memory?

[0] http://llvm.org/viewvc/llvm-project/libcxxabi/trunk/src/cxa_...

[1] http://llvm.org/viewvc/llvm-project/libcxxabi/trunk/fuzz/cxa...

[2] http://llvm.org/viewvc/llvm-project/libcxxabi/trunk/test/tes...


The best way is to make sure it's correct on every test case you can find, i.e. include the tests you point to there along with the tests from libiberty etc. That'll at least let you say that it's as provably correct as the others. The next advantage is going to be that Rust helps prevent certain classes of bugs (in this case memory issues) as long as you're not doing unsafe{} things. And while that rests on Rust's ability to prove it, it's a step up from not having that. The handwritten parser doesn't mean much in the face of testing against a wide array of inputs: if it were parsing incorrectly, it wouldn't pass the tests. Sure, there might be an edge case where it just recurses, but that's also the case for libcxxabi and the others. The code being longer also means basically nothing. It just means that either the code is more verbose or could be shortened somehow; it says nothing about its correctness or viability.


It's true that one can't be sure the Rust version is "any more correct and won't eat all your memory".

However, an even more important question is how many exploitable bugs remain in the LLVM library vs the Rust library. The data suggests that the answer for Rust is zero but the answer for LLVM is greater than zero.


> The data suggests that the answer for Rust is zero but the answer for LLVM is greater than zero.

Well, it's been fuzzed, so there's a pretty decent chance that the LLVM code has zero exploitable flaws. Also, the C++ demangler isn't as security-sensitive as many other more important projects.

That said, as a general rule raw C-style string parsing, which that C++ code is an example of, is asking for trouble. It's painful to write, painful to maintain, painful to debug, and, when it fails, fails in the worst possible way. In my opinion there's little benefit to using C++ for these workloads.


I don't think "it's been fuzzed" is really a binary signal. Every fuzzer has blind spots, areas of the state space that are difficult to reach with a reasonable number of trials.


>That said, as a general rule raw C-style string parsing, which that C++ code is an example of, is asking for trouble. It's painful to write, painful to maintain, painful to debug, and, when it fails, fails in the worst possible way. In my opinion there's little benefit to using C++ for these workloads.

AFAIK Valgrind can catch many memory access related errors. And static analyzers can point out where such errors could happen.

From a different perspective: Why don't people use Pascal-style strings with C or C++ (and such)? I think even I could make a library that does all the string.h things but with length-prefixed strings. People have been complaining for years now about C-style strings (that have been widely used for decades now). But nobody ever did anything about it?

PS: I, myself, don't find it "painful" to use C-style strings. A few basic precautions (like using strncmp(bla,blaa,buffer_size)) and you're fine. Off-by-one is more of a problem, at least for me at night (Valgrind and the LLVM static analyzer point even those out).


> AFAIK Valgrind can catch many memory access related errors. And static analyzers can point out where such errors could happen.

True, but Valgrind doesn't exist on every OS that one might need to work with, especially embedded ones, and according to Herb's talk at CppCon 2015, only 1% of the audience was using any form of static analysis tooling.

The world of C and C++ programming, in typical 9-5 enterprise jobs is quite different from HN ideals about code quality.

> I think even I could make a library that does all the string.h things but with length-prefixed strings. People have been complaining for years now about C-style strings (that have been widely used for decades now). But nobody ever did anything about it?

Because it takes a big effort to make such a library part of ANSI C.

> PS: I, myself, don't find it "painful" to use C-style strings. A few basic precautions (like using strncmp(bla,blaa,buffer_size)) and you're fine.

You already introduced a possible security exploit in your example, as you need to guarantee 100% of the time that the sizes of bla and blaa are always less than or equal to buffer_size.

When working with a team it suffices that another person changes your code without accounting for this invariant.
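To make the invariant concrete: `strncmp(a, b, n)` may read up to `n` bytes from each argument unless it hits a NUL first, so a buffer that is shorter than `n` and not terminated gets read past its end. A rough sketch of a comparison where each buffer carries its own capacity follows; `bounded_length` and `bounded_equal` are names invented here, not a standard API.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: length of the string stored in buf, never looking
// past cap bytes. An unterminated buffer counts as cap characters long.
static std::size_t bounded_length(const char* buf, std::size_t cap) {
    const void* nul = std::memchr(buf, '\0', cap);
    return nul ? static_cast<std::size_t>(static_cast<const char*>(nul) - buf)
               : cap;
}

// Hypothetical helper: compares two buffers without assuming that they
// share a size or that either one is NUL-terminated.
bool bounded_equal(const char* a, std::size_t a_cap,
                   const char* b, std::size_t b_cap) {
    const std::size_t a_len = bounded_length(a, a_cap);
    const std::size_t b_len = bounded_length(b, b_cap);
    return a_len == b_len && std::memcmp(a, b, a_len) == 0;
}
```

The point is only that the capacity travels with each pointer, so a later edit that shrinks one buffer cannot silently break the other call sites.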


>You already introduced a possible security exploit in your example, as you need to guarantee 100% of the time that the sizes of bla and blaa are always less than or equal to buffer_size.

bla and blaa buffer sizes, yes. My strings are mostly file names/paths, which have NAME_MAX and PATH_MAX. So my excuse here is that all my buffers have the same size (not the best excuse, I know). PS: I always sanitize on input if the input can be anything (network/IPC sockets, for example).

>True, but Valgrind doesn't exist on every OS that one might need to work with, especially embedded ones, and according to Herb's talk at CppCon 2015, only 1% of the audience was using any form of static analysis tooling.

A quick Google search shows some tools for Windows. I don't know how good they are. Clang's static analyzer seems to work on Windows, for static analysis. But if the code compiles on Linux/BSD/OS X, then people should check it there before shipping.

This just reminded me of the latest Linus rant[0].

Then there are fuzzing and formal verification, which are too much for "normal" programs (these are not C-specific).

>The world of C and C++ programming, in typical 9-5 enterprise jobs is quite different from HN ideals about code quality.

As is the world of hobby C and C++ programming. And Python, and Ruby, and Haskell, and... It just shows more in C. But C is still the king when it comes to portable and efficient programs, and that won't change soon (although there is less need for such programs, as "embedded" today equals "has only 512MB RAM").

[0] http://lkml.iu.edu/hypermail/linux/kernel/1702.2/05171.html


> People have been complaining for years now about C-style strings (that have been widely used for decades now). But nobody ever did anything about it?

There have been many efforts (http://www.and.org/vstr/comparison), but ultimately they all failed to find widespread adoption because they're solving the wrong problem.

For any non-trivial data munging operation experienced C programmers quickly ditch C strings in favor of vectors. The simplest approach employs a simple pointer and length (or boundary) tuple; i.e. a vector. You can get fancy by wrapping them in a struct, but even that isn't necessary and, especially wrt API design, often needlessly forces interface users to create temporary "slice" objects with an annoying type peculiar to the interface. Newer languages offer nicer interfaces for slices, but simple pointers are hard to beat in terms of simplicity and usability. (The only problem with simple pointer derivation and manipulation a la C is that they're difficult for a compiler to both verify the correct use of _and_ to aggressively optimize. You must choose one or the other. Requiring the user to use specialized, compiler sanctioned primitive aggregate types in languages like Rust is a way to meet the compiler half-way.)

Also, for complex parsing tasks experienced C programmers will often code a straight-up state machine, or leverage a parser generator. In both cases C-style NUL terminated strings aren't even visible in the rearview mirror.

IME, I've found that parsing of data is one of the areas where C excels. And by parsing I don't mean attacking data with regular expressions. Likewise, for creating complex data structures like graphs C excels, especially when you want to employ intrusive data structure patterns for efficiency and clarity. Pointers are wonderful abstractions that way.

There are a lot of difficulties with C, particularly regarding memory management. But string processing is not one of them, except for programmers for whom at that moment parsing is synonymous with crude hacks using regular expressions or the limited interfaces for C-style strings. It's self-inflicted. The solution doesn't require a complex framework. Addressing the issue merely requires reframing the task. Fortunately, when reframing is too burdensome, for quick and dirty string hacking there are plenty of alternative languages.
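As a rough illustration of the pointer-and-length approach, here is a sketch that splits a nested name such as `N2js18CompartmentChecker5checkE` into its components without ever depending on a NUL terminator. It handles only a tiny, simplified subset of the Itanium grammar (`N`, then `<length><identifier>` pairs, then `E`); `Slice` and `split_nested_name` are hypothetical names made up for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A pointer-and-length pair; nothing below relies on a NUL terminator.
struct Slice {
    const char* ptr;
    std::size_t len;
};

// Hypothetical helper covering only: N <length><identifier>... E
// Returns false on malformed input instead of reading out of bounds.
// On failure, `parts` may hold components parsed before the error.
bool split_nested_name(Slice in, std::vector<std::string>& parts) {
    if (in.len == 0 || in.ptr[0] != 'N') return false;
    std::size_t pos = 1;
    while (pos < in.len && in.ptr[pos] != 'E') {
        std::size_t n = 0;
        std::size_t digits = 0;
        while (pos < in.len && in.ptr[pos] >= '0' && in.ptr[pos] <= '9') {
            // A length above in.len is invalid anyway, so stop before the
            // multiplication could ever grow past in.len + 9.
            if (n > in.len / 10) return false;
            n = n * 10 + static_cast<std::size_t>(in.ptr[pos] - '0');
            ++pos;
            ++digits;
        }
        // The identifier must be non-empty and fit in the remaining input.
        if (digits == 0 || n == 0 || n > in.len - pos) return false;
        parts.emplace_back(in.ptr + pos, n);
        pos += n;
    }
    // A well-formed name ends with 'E' and has at least one component.
    return pos < in.len && !parts.empty();
}
```

Every read is guarded by a comparison against `in.len`, which is the whole trick: the bounds live next to the pointer instead of being implied by a terminator somewhere further along.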


> The data suggests that the answer for Rust is zero

Let's not go overboard. Claiming the number of bugs is zero in any non-trivial piece of software is generally a losing proposition, no matter the language.

That said, I think I know what you're trying to express, and I would say it like this: The number of bugs in a program written in Rust compared to C or C++ should be statistically less, all other things being equal. Certain classes of bugs are impossible with Rust at best, and easier to locate or isolate at worst (assuming you haven't wrapped the entire program in one giant unsafe block).


For bugs in general sure, but I specifically said "exploitable bugs" which is a different story. I don't think it's going too far to say that the probability of a Rust compiler bug, or a bug in unsafe Rust library code, leading to an exploitable issue in the Rust demangler, is low.


The implication being that it's impossible to write exploitable code without using unsafe? First, that's not a claim I've heard made, second, I'm far too conservative (having been around long enough) to believe Rust code in general can't be exploited, given enough time and effort by smart people, even if that's the current belief. Whole new classes of exploits occasionally pop up, and you can't reliably protect from that which you know nothing about.


I don't know whether it has the worst-looking code in the world, but in my opinion it is far, far worse than what I can bear. It should die, the sooner the better, and we'd be happy for anyone to provide a replacement. If this wasn't written in Rust I'd adopt it in LLVM!


That brings up an interesting question, since it sounds like you might have some inside information: Is LLVM amenable to using a library for this functionality, or is it considered something which must be maintained in-project? I ask because at the point that something is compiled to a library, the source implementation shouldn't really matter (ignoring silly things like embedding a VM), so at that point this being implemented in Rust shouldn't matter too much (beyond it possibly being less familiar to the LLVM devs).

Not that I'm actually advocating for that at this point. This is a new library, and as stated in the blog still has some differences in output compared to other similar libraries.


You have to understand that one of LLVM's goals is to provide a full toolchain, through reusable library components. So we're seeking to provide our own solution for this, and we have actually been providing this feature as a library for a long time. If you're on macOS, it is in the default libc++abi, which you have available on any macOS system:

  $ nm /usr/lib/libc++abi.dylib  | grep cxa_demangle
  00000000000018e0 T ___cxa_demangle
Which means you can compile the following C++ file with clang++ and get demangling from the system:

  #include <iostream>
  extern "C" char *__cxa_demangle(const char *mangled_name, char *buf, size_t *n, int *status);
  int main() {
    std::cout << "Demangled: " << __cxa_demangle("_Z17GetExecutablePathPKcb", nullptr, nullptr, nullptr) << "\n";
  }
We need a great demangler in LLVM because we're using it in the LLDB debugger, for instance. The LLVM one is actually quite slow; if anyone is interested in writing a (clean) very performant C++ one within LLVM, I'm volunteering to help with the review and integration :)

(edit: formatting)
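A slightly more defensive variant of the snippet above, as a sketch: it pulls the declaration from `<cxxabi.h>` (shipped by both libstdc++ and libc++abi), checks the status code, and frees the returned buffer. The wrapper name `demangle_or_original` is made up for this example.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Hypothetical wrapper around abi::__cxa_demangle.
// Returns the input unchanged when demangling fails.
std::string demangle_or_original(const char* mangled) {
    int status = 0;
    // With a null output buffer, the function allocates the result with
    // malloc and the caller is responsible for freeing it.
    char* demangled = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || demangled == nullptr) {
        std::free(demangled);  // free(nullptr) is a no-op
        return mangled;
    }
    std::string result(demangled);
    std::free(demangled);
    return result;
}
```

Calling it with `"_Z17GetExecutablePathPKcb"` should give something like `GetExecutablePath(char const*, bool)`, while a string that is not a valid mangled name comes back as-is.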


> You have to understand that one of LLVM's goals is to provide a full toolchain, through reusable library components

Fair enough, that makes sense. If you are aiming to be a full toolchain, it's useful to have it in-project.

Edit: Whoops! s/fool/full/


> If this wasn't written in Rust I'd adopt it in LLVM!

Would it be possible to translate the Rust code into C++ and thus basically get the Rust safety in C++?


Revive the C backend to LLVM that was discontinued in 2012 for being too buggy and compile Rust straight to C.


LLVM is coded in C++, and C++ has higher-level constructs to which Rust constructs should be relatively easy to map safely and with a readable end result.


Coincidentally, I recently discovered that code I landed in the mozilla repo a couple of weeks ago crashes gdb when it tries to demangle (and c++filt on the command line). Looks to be infinite recursion. It's still in the current Nightly binary.

_ZN2js18CompartmentChecker5checkIN2JS8GCVectorI4jsidLm0ENS_15TempAllocPolicyEEEEEN7mozilla8EnableIfIXsrNS7_6IsSameIDTclptcvPT_LDn0E5beginEEDTclptcvSB_LDn0E3endEEEE5valueEvE4TypeERKSA_

coming from

    template <typename Container>
    typename mozilla::EnableIf<
        mozilla::IsSame<
            decltype(((Container*)nullptr)->begin()),
            decltype(((Container*)nullptr)->end())
        >::value
    >::Type
    check(const Container& container) { ... }


Out of curiosity, what version of c++filt are you using?

I tried with 2.27.51 (Debian), and it works:

    mozilla::EnableIf<mozilla::IsSame<decltype ((((JS::GCVector<jsid, 0ul, js::TempAllocPolicy>*)((decltype(nullptr))0))->begin)()), decltype ((((JS::GCVector<jsid, 0ul, js::TempAllocPolicy>*)((decltype(nullptr))0))->end)())>::value, void>::Type js::CompartmentChecker::check<JS::GCVector<jsid, 0ul, js::TempAllocPolicy> >(JS::GCVector<jsid, 0ul, js::TempAllocPolicy> const&)


2.26.1-1.fc25 from Fedora 25


> Linkers only support C identifiers for symbol names.

This is only true of UNIX system linkers. Before the wave of FOSS and UNIX clones, it was common for each compiler to have its own language-specific linker.


Kind of a blessing and a curse: it reduces linking to a common subset of functionality, but also makes it trivial to link together code from any languages that support the platform's C ABI.


Can it help to implement C++ bindings for Rust?


My understanding is that yes, this is part of that puzzle. The author of this post works on https://crates.io/crates/bindgen


How bad are the rules for MSVC compared to the Itanium ones and would you consider adding support for it too?

Not that I have a use case in mind or anything, just curious.


Pretty bad, actually. There is no official documentation of the name mangling, so everything that is known is reverse engineered. In particular, there are outright bugs in the name mangling scheme, and one of the rules appears to rely on hashing the function body to produce a result. I'd give you examples, but, I don't have any of them on hand (I'm mostly relying on recollection from conversations with David Majnemer).


> one of the rules appears to rely on hashing the function body to produce a result

Gross! I wonder what that's for. Somebody's hack to allow same-name/same-interface header inline functions to coexist?


The Microsoft name mangling scheme is explained in Agner Fog's PDF on calling conventions: http://www.agner.org/optimize/calling_conventions.pdf


That's really only a fraction of it. The only definitive documentation that I know of is clang's implementation of its mangling scheme: https://github.com/llvm-mirror/clang/blob/master/lib/AST/Mic...


> These days, almost every C++ compiler uses the Itanium C++ ABI’s name mangling rules. The notable exception is MSVC, which uses a completely different format.

You stay classy, Microsoft.

> Its not just the grammar that’s huge, the symbols themselves are too. Here is a pretty big mangled C++ symbol from SpiderMonkey [...] That’s 355 bytes!

Here's a >4kB symbol I encountered while liberating some ebooks from an abandoned DRM app:

    tetraphilia::transient_ptrs<tetraphilia::imaging_model::PixelProducer<T3AppTraits> >::ptr_type tetraphilia::imaging_model::MakeIdealPixelProducer<tetraphilia::imaging_model::XWalkerCluster<tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::IgnoredRasterXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::SpecializedRasterXWalker<unsigned char, 0ul, 0, 1ul, 1ul>, tetraphilia::imaging_model::SpecializedRasterXWalker<unsigned char, 2ul, -1, 3ul, 3ul> >, tetraphilia::imaging_model::GraphicXWalkerList3<tetraphilia::imaging_model::const_UnifiedGraphicXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits>, 0ul, 0, 1ul, 0ul, 0, 0ul, 0ul, 0, 0ul, 1ul>, tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::const_IgnoredRasterXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 0ul, 0, 1ul, 1ul>, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 2ul, -1, 3ul, 3ul> >, tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::OneXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::OneXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 2ul, -1, 3ul, 0ul> > > >, tetraphilia::TypeList<tetraphilia::imaging_model::XWalkerCluster<tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::IgnoredRasterXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::SpecializedRasterXWalker<unsigned char, 0ul, 0, 1ul, 1ul>, tetraphilia::imaging_model::SpecializedRasterXWalker<unsigned char, 2ul, -1, 3ul, 3ul> >, tetraphilia::imaging_model::GraphicXWalkerList3<tetraphilia::imaging_model::const_UnifiedGraphicXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits>, 0ul, 0, 1ul, 0ul, 
0, 0ul, 0ul, 0, 0ul, 0ul>, tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::const_IgnoredRasterXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 0ul, 0, 1ul, 1ul>, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 2ul, -1, 3ul, 3ul> >, tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::OneXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::OneXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 2ul, -1, 3ul, 0ul> > > >, tetraphilia::TypeList<tetraphilia::imaging_model::XWalkerCluster<tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::IgnoredRasterXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::SpecializedRasterXWalker<unsigned char, 0ul, 0, 1ul, 1ul>, tetraphilia::imaging_model::SpecializedRasterXWalker<unsigned char, 2ul, -1, 3ul, 3ul> >, tetraphilia::imaging_model::GraphicXWalkerList3<tetraphilia::imaging_model::const_UnifiedGraphicXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits>, 0ul, 0, 1ul, 0ul, 0, 0ul, 0ul, 0, 0ul, 1ul>, tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::const_IgnoredRasterXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 0ul, 0, 1ul, 1ul>, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 2ul, -1, 3ul, 3ul> >, tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::const_IgnoredRasterXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 0ul, 0, 1ul, 1ul>, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 2ul, 
-1, 3ul, 3ul> > > >, tetraphilia::Terminal> >, T3AppTraits, tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits>, tetraphilia::imaging_model::SeparableOperation<tetraphilia::imaging_model::ClipOperation<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> > > >(tetraphilia::ArgType<tetraphilia::TypeList<tetraphilia::imaging_model::XWalkerCluster<tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::IgnoredRasterXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::SpecializedRasterXWalker<unsigned char, 0ul, 0, 1ul, 1ul>, tetraphilia::imaging_model::SpecializedRasterXWalker<unsigned char, 2ul, -1, 3ul, 3ul> >, tetraphilia::imaging_model::GraphicXWalkerList3<tetraphilia::imaging_model::const_UnifiedGraphicXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits>, 0ul, 0, 1ul, 0ul, 0, 0ul, 0ul, 0, 0ul, 1ul>, tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::const_IgnoredRasterXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 0ul, 0, 1ul, 1ul>, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 2ul, -1, 3ul, 3ul> >, tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::OneXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::OneXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 2ul, -1, 3ul, 0ul> > > >, tetraphilia::TypeList<tetraphilia::imaging_model::XWalkerCluster<tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::IgnoredRasterXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::SpecializedRasterXWalker<unsigned char, 0ul, 0, 1ul, 1ul>, tetraphilia::imaging_model::SpecializedRasterXWalker<unsigned char, 2ul, -1, 3ul, 3ul> >, 
tetraphilia::imaging_model::GraphicXWalkerList3<tetraphilia::imaging_model::const_UnifiedGraphicXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits>, 0ul, 0, 1ul, 0ul, 0, 0ul, 0ul, 0, 0ul, 0ul>, tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::const_IgnoredRasterXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 0ul, 0, 1ul, 1ul>, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 2ul, -1, 3ul, 3ul> >, tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::OneXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::OneXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 2ul, -1, 3ul, 0ul> > > >, tetraphilia::TypeList<tetraphilia::imaging_model::XWalkerCluster<tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::IgnoredRasterXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::SpecializedRasterXWalker<unsigned char, 0ul, 0, 1ul, 1ul>, tetraphilia::imaging_model::SpecializedRasterXWalker<unsigned char, 2ul, -1, 3ul, 3ul> >, tetraphilia::imaging_model::GraphicXWalkerList3<tetraphilia::imaging_model::const_UnifiedGraphicXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits>, 0ul, 0, 1ul, 0ul, 0, 0ul, 0ul, 0, 0ul, 1ul>, tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::const_IgnoredRasterXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 0ul, 0, 1ul, 1ul>, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 2ul, -1, 3ul, 3ul> >, 
tetraphilia::imaging_model::GraphicXWalker<tetraphilia::imaging_model::const_IgnoredRasterXWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 0ul, 0, 1ul, 1ul>, tetraphilia::imaging_model::const_SpecializedRasterXWalker<unsigned char, 2ul, -1, 3ul, 3ul> > > >, tetraphilia::Terminal> > > >, T3AppTraits::context_type&, tetraphilia::imaging_model::Constraints<T3AppTraits> const&, tetraphilia::imaging_model::SeparableOperation<tetraphilia::imaging_model::ClipOperation<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> > >, tetraphilia::imaging_model::const_GraphicYWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> > const*, tetraphilia::imaging_model::const_GraphicYWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> > const*, tetraphilia::imaging_model::const_GraphicYWalker<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> > const*, tetraphilia::imaging_model::SegmentFactory<tetraphilia::imaging_model::ByteSignalTraits<T3AppTraits> >*)


> You stay classy, Microsoft.

I believe Microsoft's mangling scheme is actually older than gcc's current one. While gcc changed its name mangling everywhere to a new one based on Intel's IA-64 ABI, MSVC probably kept its own unchanged from compiler release to compiler release. I don't recall the reason for gcc changing its name mangling; perhaps better standard compliance?

IIRC, the MSVC scheme has an annoying property, in that "class foo" and "struct foo" have different mangling, while they're supposed to be completely interchangeable according to the C++ standard (other than the default access being "private" for class and "public" for struct).


If you want some huge symbols, introduce Boost Lambda into a project. Doing a single for_each with a Boost Lambda can introduce symbols of several 100k.


Nice! I'm not near a C++ environment atm, but I promise an upvote to anyone who digs up one of these beasts for us to gawk at!


This is one that I found from running nm on a binary:

    _ZN5boost6detail7variant15visitation_implIN4mpl_4int_ILi40EEENS1_20visitation_impl_stepINS_3mpl6v_iterINS7_6v_itemI25SelectedEntityChangedDataNS9_IN11InputAction17ServerCommandDataENS9_INSB_18SetupBlueprintDataENS9_INSB_13BuildRailDataENS9_I2IDI20CustomInputPrototypetENS9_IN10ActionData22TrainWaitConditionDataENS9_INSI_18TrainWaitConditionENS9_INSB_22BuildTerrainParametersENS9_I27DeciderCombinatorParametersNS9_I30ArithmeticCombinatorParametersNS9_INSB_18PlayerJoinGameDataENS9_INSB_7CrcDataENS9_INSB_20SetBlueprintIconDataENS9_I20AbilitySpecificationNS9_INSB_17TakeEquipmentDataENS9_INSB_18PlaceEquipmentDataENS9_I6VectorNS9_IdNS9_ISF_I9ItemGrouphENS9_INSB_15MarketOfferDataENS9_ISsNS9_INSI_33BehaviorModeOfOperationParametersENS9_INSI_17TrainScheduleDataENS9_INSB_18GuiTextChangedDataENS9_INSB_14GuiChangedDataENS9_INSB_12GuiClickDataENS9_INSB_20SelectItemParametersENS9_INSI_24LogisticFilterSignalDataENS9_INSI_22LogisticFilterItemDataENS9_ISF_I19TechnologyPrototypetENS9_INSI_10SignalDataENS9_INSI_26CircuitConditionParametersENS9_INSB_19SetFilterParametersENS9_INSB_19BuildItemParametersENS9_INSB_16CancelCraftOrderENS9_INSB_9CraftDataENS9_I13ShootingStateNS9_IhNS9_ItNS9_IjNS9_IbNS9_ISF_I13ItemPrototypetENS9_ISF_I15RecipePrototypetENS9_I28ItemStackTargetSpecificationNS9_I11RidingStateNS9_I9DirectionNS9_INSB_14SelectAreaDataENS9_I12RealPositionNS7_7vector0INS3_2naEEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELi0EEELl40EEENS8_IS32_Ll48EEEEENS1_14invoke_visitorINS1_11get_visitorIKS10_EEEEPKvNS_7variantINS1_13over_sequenceINS7_8vector48IS1N_S1M_S1L_S1K_S1J_S1I_S1G_bjthS1E_S1D_S1C_S1B_S1A_S19_S18_S17_S15_S14_S13_S12_S11_S10_SZ_SY_SsSX_SW_dSU_ST_SS_SR_SQ_SP_SO_SN_SM_SL_SK_SJ_SH_SE_SD_SC_SA_EEEEJEE18has_fallback_type_EEENT1_11result_typeEiiRS3K
_T2_NS3_5bool_ILb0EEET3_PT_PT0_
The c++filt output is 13,776 characters, so I can't easily copy it here.


Well, my memory was a bit wrong (it's been about 6 years since I last looked at this), and it's not as bad as I remember. It's not great, mind you, but here are the binary sizes for a simple program:

Boost Lambda (unstripped): 44176 bytes

Boost Lambda (stripped): 18808 bytes

C++11 Lambda (unstripped): 24896 bytes

C++11 Lambda (stripped): 14712 bytes

Test program is pretty simple:

  #include <algorithm>
  #include <iostream>
  #include <vector>
  #include <boost/lambda/lambda.hpp>

  using namespace boost::lambda;

  int main(int, const char**)
  {
    std::vector<int> v;

    for (int i = 0; i < 100; ++i)
      v.push_back(i);

    //std::for_each(std::begin(v), std::end(v), std::cout << _1 << constant('\n'));

    std::for_each(std::begin(v), std::end(v), [](int i) { std::cout << i << '\n'; });

    return 0;
  }
I'm using Boost 1.58.0 and GCC 5.4.0 with -std=c++1y flag only to get the numbers above.

(edit: formatting)


> Here's a >4kB symbol I encountered while liberating some ebooks from an abandoned DRM app:

Scrolling the quoted symbol on mobile was one of the most hilarious moments I had on HN. Thanks.


"Fun" game: find the opening parenthesis without using the find tool.


>> These days, almost every C++ compiler uses the Itanium C++ ABI’s name mangling rules. The notable exception is MSVC, which uses a completely different format.

> You stay classy, Microsoft.

This is only true in the HN universe of the clang, gcc and MSVC++ trio.

Out there in the real world, there are plenty of C++ compilers being used.

https://en.wikipedia.org/wiki/List_of_compilers#C.2B.2B_comp...


> You stay classy, Microsoft.

I'd argue that it has to do with backwards compatibility, but every version of MSVC breaks binary compatibility anyway.



I really enjoy reading these pieces where someone rewrites something in Rust and it turns out better than the old C version due to Rust's safety features. Usually those kinds of projects are just a rewrite in someone's favorite language of the month, for little tangible benefit other than their own satisfaction or education. These Rust ones seem to show tangible benefits of the language itself.



