Mold 1.0: the first stable and production-ready release of the high-speed linker (github.com/rui314)
236 points by matt_d on Dec 15, 2021 | 65 comments



FWIW, the author is also the original author of LLVM's "lld" linker, and has written multiple C compilers.

That makes them the author of both of the two fastest linkers in existence, AFAIK. This person has a very impressive resume.

https://github.com/rui314/mold/blob/main/docs/design.md

  > "Concretely speaking, I want to use the linker to link a Chromium executable (~1.8 GiB in size) just in 1 second. LLVM's lld, the fastest open-source linker which I originally created a few years ago, takes about 12 seconds to link Chromium on my machine. So the goal is 12x performance bump over lld. Compared to GNU gold, it's more than 50x."
There is also good discussion in this prior post, and in a Reddit thread from a while back that the author took part in:

https://news.ycombinator.com/item?id=26233244

https://www.reddit.com/r/cpp/comments/kxvw5c/mold_a_modern_l...


Rui Ueyama is one of the most interesting coders of our age. chibicc in particular is one of the loveliest projects I've seen. I found it so captivating that when it got posted to HN last year I dropped what I was doing for a month just to hack on it. It's rare to find codebases with enough clarity to be educational. I learned so much reading his chibicc code. For example, in a world where the orthodox solution to these kinds of things is to use Bison or Antlr, the thought never would have occurred to me before that writing a parser in C could be so easy. That's how I was able to write an assembler for it. Rui is also living proof that the will and motivation to simplify also generalizes to the ability to create production-quality tools with superior performance, as evidenced by Mold. Very exciting to see his work take off.


This is some high praise coming from Justine Tunney.


Yeah, but he refused to protect his unicode identifiers, and just went with the C committee solution to provide no solution to Unicode identifier security. I expected better.


Are you confusing me with someone else? I have no idea what you are talking about.


He's probably talking about https://justine.lol/dox/ansic-identifiers.txt

I'm not sure what he wants to see happen here, but I remember him asking me for a timing-safe memcmp function in Cosmopolitan a while back, and it's hard to refuse a small, reasonable demand.


I feel like this makes it even more confusing as to why this "drop-in replacement" is yet another new competing project rather than an upgrade/enhancement to an existing linker (such as lld)... has open source as a community concept failed so hard that people are unable to even contribute to and collectively improve projects they themselves started?


Why is Chromium so huge?


This might be a nerd-sniping question because there are lots of reasons and people will fight over which ones are most relevant. e.g. V8, rendering, myriad web protocols, security measures, etc. From my understanding at least.


Debug info is huge.


Are they using split dwarf? This avoids linking the debug info.


Back when I was using Gentoo, Chromium was the biggest dependency on my machine, by far. When there was an update I couldn't use my machine for like 30 minutes.


What would you do to shrink it?


That would require me knowing the answer to the question I asked.


It includes a mountain of stuff. Think of all the web APIs that exist.


I have been using mold every day for almost a year, and it has dramatically improved my quality of life by decreasing link times on the Rust project I work on from approximately 10 seconds to less than one second. This makes a big difference in keeping focus during the edit-compile-run loop. Thank you!


Maybe a dumb question but how would I go about using it with rust(c) and cargo? Would love to try it out, if it decreases total compile times.


RUSTFLAGS="-C linker=clang -C link-arg=-fuse-ld=/path/to/mold" cargo build

or, to make it persistent for your project:

.cargo/config.toml

[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=/PATH/TO/mold"]


-fuse-ld has been replaced by --ld-path in clang 12+


Good to know, thanks.

fuse-ld still works though.


I wonder -- "-fuse-ld" has some somewhat surprising behavior in how clang ends up discovering the linker. I think that even if clang has a sibling `lld` in the same distribution, "-fuse-ld=lld" will pick "ld.lld" from the $PATH if it's present in there before the directory where clang and lld are installed.

So maybe that "--ld-path" option helps resolve ambiguity by expecting an explicit path instead of a linker name.


Thank you. Will try it later


"mold -run cargo build" would also work.


Warning: don't do this if you use rust-analyzer or any other IDE that uses cargo check.

The flags are part of the cache hash, so your IDE and cli constantly invalidate the cache and compile from scratch.
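
To illustrate why differing flags defeat the cache, here is a toy model in Python. It is not Cargo's actual fingerprinting code; `cache_key` and the crate name are made up for the example. The only assumption taken from the comment above is that the flag list feeds into the hash.

```python
import hashlib


def cache_key(crate: str, rustflags: list[str]) -> str:
    """Toy fingerprint: hash of the crate name plus the exact flag list.

    Hypothetical helper for illustration only, not Cargo's real algorithm.
    """
    h = hashlib.sha256()
    h.update(crate.encode())
    for flag in rustflags:
        # Separate flags with a NUL byte so ["ab"] and ["a", "b"] differ.
        h.update(b"\0" + flag.encode())
    return h.hexdigest()[:16]


# The command line sets RUSTFLAGS; the IDE's cargo check does not.
cli_key = cache_key(
    "mycrate",
    ["-C", "linker=clang", "-C", "link-arg=-fuse-ld=/path/to/mold"],
)
ide_key = cache_key("mycrate", [])

# Different keys mean neither invocation can reuse the other's artifacts.
print(cli_key != ide_key)
```

Putting the flags in .cargo/config.toml instead of the environment means both the IDE and the command line see the same flag list, so in this model they would compute the same key.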


> decreasing link times on the Rust project I work on from approximately 10 seconds to less than one second.

But enough about writing “hello, world” programs in Rust.

I kid.


I think this is an interesting model: https://github.com/rui314/mold/blob/main/LICENSE

I'm glad he's doing this as (A)GPL. It's great work and it's Free for the people. If you want the option for it to be private, feel free to step up and do the right thing.


> Note: I'm looking for a sponsor who wants to purchase the copyright of this work and relicense it under a more liberal license such as the MIT license. For now, mold is released under the GNU AGPL v3.

https://github.com/rui314/mold/blob/main/LICENSE


Why Affero? Are there cloud linker tools? Or simply because it is one of the most restrictive open source licenses?


If it were GPL, it would have been just the same as the GNU linker. If it were MIT, no one would have an incentive to purchase the project. Some companies have a policy of not allowing AGPL software in their orgs at all, which should give them more incentive to relicense, whether or not the policy makes sense.


Given the goals stated I think it is because it is the most restrictive license.

In fact for many years this was the only value I saw in AGPL: to provide a best possible starting point for an upsell while claiming it was all about Free Software.


It's pretty clear we need AGPL when anything can be compiled to wasm soon.


Since WASM blobs would run in the end-user's browser, GPL is sufficient to allow the user to request source. AGPL is more useful for preventing e.g. cloud build services.


> most restrictive

It's the least restrictive for user freedom.


AGPL constrains the rights of the user. That's its job. The whole point is to be GPL, which prevents users from distributing modifications to the original program, with the addition of preventing users from offering their modifications as a service over the Internet.

MIT, Apache, and BSD have none of those restrictions on users. How is AGPL the "least restrictive"?


GPL does not prevent users from distributing modifications: it simply requires them to give _their_ users the same freedom if they do so. The AGPL further updates this for the cloud era.


> it simply requires them

This is the part where the (A)GPL constrains the rights of its users. Requiring user A to give user B more liberal licensing is constraining the rights of A for the benefit of B. Even if you think it's the "right" way, it's still a constraint.


There's certainly cloud IDEs.


I wonder about AGPL "infecting" proprietary CI pipelines.


Please read the AGPL very carefully before saying things like this. There is no reasonable interpretation where some cli program in part of a CI pipeline would infect the result of the pipeline (i.e. build artifacts) unless it actually put itself into the result. (E.g. gcc puts a small amount of code from libgcc into the executable, but that explicitly has a different license).

The "A" of the AGPL expands the distribution rules of the GPL to include network access but doesn't really expand the virality or combined-works parts. It "infects" your program just as much as any GPL program would.


I don't mean the build artifacts; I mean the CI build scripts and such.


Is that CI provided as a proprietary service? If it applied, it would only apply to users of the CI service. So, publish your custom internal CI code to your own developers.


Dumb question: Why is this a separate project, instead of being an improvement to the LLVM linker (given they have the same author)? Does this have any chance of being adopted by LLVM?


Perhaps because it's not just for LLVM?

Given the comments in the repo README about it being a drop-in replacement for the GNU linker, it looks like it could be useful for several other scenarios too (but I defer to those who can explain this more clearly/correctly than I!)


According to the rejected patch to GCC to support arbitrary linkers [0], mold or any other linker is probably not working very well with GCC. So unless they're wrong, this doesn't really seem like a generic drop-in solution anyway, despite trying to be one.

[0] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573833.h...


At the very least, lld is pretty much drop-in. Aside from some subtle differences about resolution order (IIRC lld implicitly has "--start-group"/"--end-group"), everything just works with lld in place of ld.bfd. My recollection was that I had just about the same experience w/gold -- it was just a drop-in replacement.

All of my experience was with using clang as the driver, so I can't speak to why gcc has trouble making a similar feature work as expected.


The key part:

"Note, all these extra linkers (lld, mold) will not really work properly, gcc during configuration detects various assembler and linker properties on which it then relies on and I'm sure neither lld nor mold supports those features."


It can in fact link gcc-ada specific things and others.


It does not use any LLVM libraries and has no feature specific to LLVM, so I think it simply doesn't have a reason to be a subproject of it (or some other large umbrella project).


That doesn't stop other subprojects currently in the monorepo.

Rui, regarding -- "a sponsor who wants to purchase the copyright of this work and relicense it under a more liberal license such as the MIT license." Would you accept sponsorship for someone who wants to relicense under LLVM(apache)? And do you have a ballpark asking price?


If you take a look at the CONTRIBUTING.md, Mold has taken an approach that is pretty novel to me. Instead of requiring a CLA to allow for relicensing, all patches are required to be released as dual-license AGPLv3 and MIT. I personally love this approach, but it could make it difficult to relicense to anything but MIT.


MIT is very permissive and compatible with lots of other open-source licenses. For example, you can sublicense MIT-licensed code under GPL. Not sure if MIT is compatible with the LLVM license, though.


I don't want to write an asking price here, and honestly I don't know how to valuate an open-source project. But I could have been working for Google as a staff engineer and enjoying a decent salary instead of doing this project, so, well...


Hmm, I thought you were at Google. Did you leave? Regardless of whether you're at Google, I think you deserve fair compensation. But IMO the value of mold isn't limited by your opportunity cost of a salary. It's a state-of-the-art linker, and someone like Apple/FB/etc. can and should be willing to compensate you.


I left Google last year. And you are right; mold (or anything that I create) should not be valued at my opportunity cost but instead at its intrinsic value.


Anyone know the state of link time optimisation on this? Last time I checked it was advised to use this for dev builds only


I initially had the same question but seeing the author’s comments on this feature (which are generally positive) I personally have come to the conclusion that it is a misfeature.

LTO is not necessary for development builds, or builds where the speed of the write-compile-test loop matters. This is exactly the use case that mold was designed for, and it is designed well. Adding LTO support would only add bloat to the code, especially since mold’s techniques to improve the efficiency of linking don’t really apply to LTO (where link time is dominated by whole-program analysis). I have no issue using the compiler’s native linker when doing production LTO builds, and conceptually that makes more sense to me anyway.

I would prefer it if the author avoided features when doing so benefits innovating on the core use case.


LTO is planned, but not a priority. I think the author will work on macOS support next, before anything else.


Pre-1.0 discussion: https://news.ycombinator.com/item?id=26233244 (122 comments)


In the previous discussion we discussed how fast mold is at linking ELF files. I was also wondering (unvoiced) if it would also be possible to make a ridiculously fast linker for PE/COFF (Windows).

Somebody else asked the same question in an issue and the answer is yes, in principle, but it'll take a lot of work. That's not unlike how solid ELF support took a lot of work. I see in the v1.0 release notes, Windows support is nominally planned for v3.0.


> As soon as the second process writes a result file to a filesystem, it notifies the first process, and the first process exits. The second process can take time to exit, because it is not an interactive process.

Looks like a trade-off which may favor speed over reliability. If a syscall (like msync) in the second process returns an error, there will be no way to abort the build if the first (main) process has already finished.


The second process doesn't technically report until it has closed references to the binary it is writing (last time I checked). If anything errors after that, it doesn't affect your created executable in any way.
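
For anyone who wants to see the shape of this handoff, here is a minimal sketch in Python rather than mold's actual C++. It assumes a pipe is used as the notification channel, which the quote above does not specify, and `link_in_background` is a made-up name. The child writes and closes the output file, then notifies the parent, and only then does any slow cleanup.

```python
import os


def link_in_background(output_path: str, payload: bytes) -> int:
    """Fork a worker, return once it reports that output_path is written.

    Hypothetical sketch of the two-process handoff, not mold's code.
    Returns the child's pid so a caller can reap it if it wants to.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child ("second process"): do the real work.
        status = 1
        try:
            os.close(read_fd)
            with open(output_path, "wb") as f:
                f.write(payload)
            # The file is closed here, so it is safe to notify the parent.
            os.write(write_fd, b"1")
            status = 0
            # Slow cleanup would go here. Errors from this point on
            # can no longer be reported to the parent.
        finally:
            os._exit(status)
    # Parent ("first process"): block until notified or until the pipe closes.
    os.close(write_fd)
    notified = os.read(read_fd, 1)
    os.close(read_fd)
    if notified != b"1":
        # The child died before writing the result, so this is still reportable.
        raise RuntimeError("worker exited before writing the output file")
    # In the real design the parent would exit now; here we just return.
    return pid
```

A failure before the notification closes the pipe without writing to it, so the parent sees end-of-file and can still fail the build. That matches the point above: only errors after the output file is closed go unreported.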



Lots of interesting things in the design document: https://github.com/rui314/mold/blob/main/docs/design.md


Does this also work for embedded targets?


It does not intend to support linker scripts, which would be a problem for embedded. It also does not support LTO, and I don't think that a non-LTO link of an embedded executable can take any noticeable amount of time with any linker.



