Google open-sources Rust crate audits (googleblog.com)
256 points by taintegral on May 23, 2023 | 48 comments



>Before a project starts using a new crate, members usually perform a thorough audit to measure it against their standards for security, correctness, testing, and more.

Do they? I mean really? Let's lay aside the fact that it's almost impossible to eyeball security. I just cannot imagine that Google works so differently to every company I've ever worked at that they actually carefully check the stuff they use. Every company I've worked at has had a process for using external code. Some have been stricter than others, none have meaningfully required engineers to make a judgement on the security of code. All of them boil down to speed-running a pointless process.

And that leaves apart the obvious question: I want to use a crate, I check it 'works' for what I need. Some middle manager type has mandated I have to now add it to crate audit (FYI, this is the point I dropped importing the library and just wrote it myself) so I add it to crate audit. Some other poor sap comes along and uses it because I audited it, but he's working on VR goggles and I was working on in-vitro fertilization of cats and he's using a whole set of functions that I didn't even realise were there. When his VR Goggles fertilize his beta testers eyes with cat sperm due to a buffer overflow, which of us get fired?


Some useful context here:

https://chromium.googlesource.com/chromiumos/third_party/rus...

Seems there are 3-4 folks who helped build this and spent a lot of time doing initial audits; they outsource crypto algorithm audits to specialists.


Before the layoffs I worked on a security checks team (“ISE Hardening”) at Google. Google requires for almost all projects that code is physically imported into the SCS; when this code touches anything at all, extremely stringent security checks run at build-time.

These checks often don’t attempt to detect actual exploit paths; instead they flag usage of APIs that may lead to a vulnerability. These checks can only be disabled per file or per symbol, and per check, by a member of the security team via an allowlist change that has to be in the same commit.

This is not perfect but is by far the most stringent third party policy I’ve seen or worked with. The cost of bringing 3p code into the fold is high.

The flipside of this is that Google tech ends up with an insular and conservative outlook. I’d describe the Google stack as ‘retro-futuristic’. It is still extremely mature and effective.


Like many here I haven't seen the Google sausage being made, but I've had many Googler coworkers and friends over the years. I've learned that they may really be in another universe (e.g. put every single line of code over all space and time in the same SCCS, oh and write a new kind of build system while you're at it because otherwise that...doesn't work). So possibly they just don't use external dependencies, and the small number they do use really are "properly" audited?

But meanwhile in the regular universe, yes it happens the way you say.


Google uses a fair number of external dependencies. But Google imposes a fairly heavy cost to add a new dependency. You (and usually your team) have to commit to supporting updates to the dependency in the future (only one version of a dependency is allowed at any given repo snapshot), and to fixing bugs. Often it is easier just to write code yourself for trivial dependencies (nobody is using left-pad!).

Adding a dependency also generates a change list (because dependencies are vendored), and so the normal code review guidelines apply. Both the person adding the dependency and the reviewer should read through the code to make sure that the code is in a good state to be submitted, like any other code (excluding style violations). Small bugs can be fixed with follow up CLs. If the author/reviewer doesn’t understand e.g. the security implications of adding the dependency, they should not submit the CL.


I've talked to many Googlers over the years, and your summary is consistent with what I've heard before, so I don't think you're lying. But this is still the most insane dependency management scheme I've ever heard of. Is Google truly so far up their own ass that they make it harder to pull in a third party library than write the code in-house? Why is Google so allergic to using a package manager like every other software project in open source?

You depend on any modern JS library like Babel or Webpack and it pulls in a dependency tree consisting of hundreds of packages. I cannot fathom that the expected and approved workflow is for someone to check in their node_modules directory and be expected to security-audit every single line, and "own" that source code for the entirety of Google. Sounds absolutely insane.

Not to mention needing to hand-audit that every transitive dependency of Babel and Webpack works with every other module in the repository, because of the one-version policy that exists for some "good" reason.


> But this is still the most insane dependency management scheme I've ever heard of. Is Google truly so far up their own ass that they make it harder to pull in a third party library than write the code in-house? Why is Google so allergic to using a package manager like every other software project in open source?

In the context of working in a highly sensitive business environment, I think the typical defaults of most package managers are way more insane than the practices being described (vendoring, auditing etc.) I think google is just being upfront about the costs of dependencies, which are often hidden by package managers. At the end of the day it's just code written by other people and using that code blindly has huge risks.

I think this is pretty context specific though. Do I care if my hobby project goes down for a day because a dependency auto-updated and broke something? Not really.


> Is Google truly so far up their own ass that they make it harder to pull in a third party library than write the code in-house?

From the descriptions in this thread, pulling in a third-party library is still far easier than writing the code in-house for them.

At least, it sounds to me like for adding the kind of example you gave, their process for adding the dependency is on the order of person weeks or in the worst case months, while writing the code themselves would be on the order of person years or decades.


I think it is interesting how both possible stories get criticized.

Option 1. Google has minor but uninteresting restrictions on pulling into //third_party: "well these audits are obviously useless because nobody reviews the code that closely."

Option 2. Google has very strong restrictions on pulling into //third_party: "this is so far up its own ass and completely unproductive."


> All of them boil down to speed-running a pointless process.

There's a pretty large gap between auditing every line of code and doing nothing. Google does a good job managing external dependencies within their monorepo. There's dedicated tooling, infrastructure, and processes for this.


Starting over a decade ago, I instituted auditing packages used from a Cargo-like network package manager, in an important system that handled sensitive data.

I set up the environment to disable the normal package repo access. Every third-party package we wanted to use had to be imported into a mirror in our code repo and audited. (The mirror also preserved multiple versions at once, like the package manager did.) New versions were also audited.

One effect of this was that I immediately incurred a cost when adding a new dependency on some random third party, which hinted at the risk. For example, if a small package had pulled in a dozen other dependencies, which I also would've had to maintain and audit, I would've gone "oh, heck, no!" and considered some other way.

At a later company, in which people had been writing code pulling on the order of a hundred packages from PyPI (and not tracking dep versions), yet it would have to run in production with very-very sensitive customer data... that was interesting. Fortunately, by then, software supply chain attacks were a thing, so at least I had something to point to, that my concern wasn't purely theoretical, but a real active threat.

Now that I have to use Python, JavaScript, and Rust, the cavalier attitudes towards pulling in whatever package some Stack Overflow answer used (and whatever indirect dependencies that package adds) are a source of concern and disappointment. Such are current incentives in many companies. But it's nice to know that some successful companies, like Google, take security and reliability very seriously.


Yes, some people review literally every line. Cargo-crev has a field for thoroughness. Many reviews are just "LGTM", but some reviewers really take time to check for bugs and have flagged dodgy code.


> When his VR Goggles fertilize his beta testers eyes with cat sperm due to a buffer overflow, which of us get fired?

The PM gets promoted for encouraging fast experimentation!


At least sometimes: https://cloud.google.com/assured-open-source-software

Only 1000 packages but certainly seems they do that for a subset.


> When his VR Goggles fertilize his beta testers eyes with cat sperm due to a buffer overflow

Ahh, classic undefined behavior.


Well just today I found unsoundness in a crate I was auditing. It turned out that the crate had since removed the entire module of functionality in question so I couldn't submit a bug, but it led me to take steps to remove use of the crate entirely.


Don't forget that you need to do this not only for the crate you depend on, but the whole dependency subtree that comes with it as well.


Can someone explain why cargo-vet doesn't include a cryptographic hash of the crate contents?

My understanding is that this repository, and similar ones from Mozilla and others, says: "I, person X from trustworthy organization Y, have reviewed version 1.0 of crate foo and deemed it legit" (for a definition of trustworthy and legit).

But now how does that help me if I want to be careful about what I depend on and supply-chain attacks? I ask for version 1.0 of crate foo but might get some malicious payload without knowing it.


That's already prevented by the checksum which is present for all crate versions in the registry index, which is set in stone on publish and verified by cargo on download. See e.g. https://github.com/rust-lang/crates.io-index/blob/74f1b1e064...
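
For readers unfamiliar with the mechanism, here is a minimal sketch of that kind of check. It assumes the index entry stores a SHA-256 hex digest of the published .crate file; `sha256_hex`, `verify_download`, and the sample bytes are made up for illustration, and this is not cargo's actual implementation.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_download(data: bytes, expected_cksum: str) -> bool:
    """Accept the download only if its digest matches the recorded checksum."""
    return sha256_hex(data) == expected_cksum.lower()


# Hypothetical stand-in for the bytes of a published crate archive.
published = b"pretend these are the bytes of foo-1.0.0.crate"

# The checksum is recorded once, at publish time (assumption: SHA-256).
index_cksum = sha256_hex(published)

# A later download that was altered in transit or on a mirror.
tampered = published + b" plus a malicious payload"

ok_original = verify_download(published, index_cksum)
ok_tampered = verify_download(tampered, index_cksum)
```

The check only proves that the bytes match what the index recorded; it says nothing about whether the recorded version was trustworthy in the first place, which is the part an audit is meant to cover.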


Hmm, but then you have to trust 1) github, 2) anyone with commit access to that repository.

It's not the worst thing I suppose: #1 is a problem anyway for trusting Google/Mozilla's repo of audits, and #2 can be noticed by others, so it would be hard to pull off a supply chain attack that way.

But I would still feel more confident if the audit log contained a copy of the checksum, and ideally itself was signed with author's keys.


https://lib.rs/cargo-crev does this, with the entire chain from the crate data to the reviewer's trusted identity. However, this adds a lot of complexity.

cargo-vet went for the other extreme of being super simple. To fill in their review report you don't even need any tooling.


Curious if any senior devs on HN can comment on the importance/effectiveness of audits for crates?

I’m a junior C++ dev that dabbles with rust in my free time, and I always feel a bit nervous when pulling huge dependency trees with tons of crates into projects.

I would assume most places would turn away from the “node.js” way of doing these things and would just write internal versions of things they need.

Again I am junior, so maybe my worries are way over blown.


I think in a lot of C++ and ex-C++ orgs you see this sentiment a lot, and sometimes for good reason. Sometimes that code has security or performance reasons to worry about this. On the other hand, it often doesn't.

On the other hand, Python folks and JavaScript users (which make up a lot of emigres to Rust) probably don't care enough about their supply chain. That's how you end up with misspelled packages causing viruses in production and other disasters.

The short answer to this is that it actually depends a lot on what you are doing.


> That's how you end up with misspelled packages causing viruses in production and other disasters.

For all the stories about malicious packages on PyPI and whatnot: I can't recall ever seeing a story about "misspelled packages caused us problems in production". Most of these packages have downloads in the low-hundreds at best, and I wouldn't be surprised if the vast majority are from the attackers testing it and bots automatically downloading packages for archiving, analysis, etc. I've come to think it's not as much of a big deal as it's sometimes made out to be.

The closest I've seen is the whole event-stream business where the maintainer transferred it to someone else who promptly inserted some crypto-wallet stealing code, but that's a markedly different scenario (and that also seems quite rare; it was over 4 years ago).


> For all the stories about malicious packages on PyPI and whatnot: I can't recall ever seeing a story about "misspelled packages caused us problems in production".

https://medium.com/@alex.birsan/dependency-confusion-4a5d60f...

Discussed at the time: https://news.ycombinator.com/item?id=26087064


That's a different thing; it would (ab)use some package tools' preference of public packages over private ones (at least in some configurations). It's not really a "supply chain issue" but more of a "footgun in some package tools"-issue.
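
A rough sketch of the footgun being described, assuming a deliberately naive resolver that merges every configured index and takes the highest version. `naive_resolve`, the index names, and the package name are all hypothetical; this is not the code of any real package tool.

```python
def naive_resolve(name, indexes):
    """Pick the highest version of `name` across all configured indexes.

    `indexes` maps an index name to {package name: [version tuples]}.
    Returns (version, index_name) for the winning candidate.
    """
    candidates = [
        (version, index_name)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"no index provides {name!r}")
    # Highest version wins, no matter which index it came from.
    return max(candidates)


indexes = {
    # The company's real internal package.
    "private": {"acme-internal-utils": [(1, 2, 0)]},
    # An attacker registers the same name publicly with a huge version.
    "public": {"acme-internal-utils": [(99, 0, 0)]},
}

version, source = naive_resolve("acme-internal-utils", indexes)
```

With these inputs the public copy wins purely on version number, which is why this reads more like a resolver configuration problem than a compromised dependency.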


Well, they've been subpoenaed, so probably something happened.


It surprises me that most people who use Rust come from Python and JavaScript. I would think the reason Rust is so popular is people moving from C and C++ and getting all the nice modern features to do systems work with.

Python and JavaScript people, I would imagine, find Rust annoying since it’s all the niceties they are used to but with a bunch of rules on top.


I see people coming to Rust from all angles. It’s a nice sweet spot. I came to it from Haskell and on my team of three the other two devs came to Rust from C++. I can opine the motivation looks something like this:

* From Haskell - looking for a strong type system (with sum types, typeclasses) but is “widely accepted”. No GC.

* From C++ - looking for low-level capabilities (pointers, references) with improved safety. Improved manual memory management.

* From Python/JS - looking for performance with a familiar feeling ecosystem and a welcoming community

I think the Python and JS folks will have the hardest go of it, but they also have the most to gain.


I went Python → Rust. Rust is high-level enough for me to remain competitively productive with Python, and the type-system just helps so much. I can't tell you how nice it has been to not have to see "Object of type NoneType has no attribute 'foo'" anymore. Also,

  def foo(obj):
    # ...
And me wondering "what is 'obj'?" and then pulling that root … and pulling … and pulling …. Also "data", another just fabulous name for a variable, really narrows the possibilities. Even once you know the "type" of the variable, oftentimes I'd find that the type definition would subtly shift and morph in different parts of the program: they'd all want a Duck, but have varying opinions on what a Duck actually is. You cannot commit such BS with an actual type-checker.

There is more up-front work with the compiler, but it pays off: the code that passes the compiler is of much better quality.

Also, Option<T> is very nice to have when you need an actual Option<T> (i.e., generically), which Python lacks. (No, `None` is not it: You run into problems when T == Option<U>, and you have None and Some(None) — Python cannot differentiate between the two.) Also sum types in general.
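
To make the None vs. Some(None) point concrete, a small sketch in Python. The `settings` dict, the `Some` wrapper, and `lookup` are hypothetical names for illustration; `Some` is a hand-rolled stand-in, not something Python ships.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")

# The key exists, but its value is explicitly None ("no timeout").
settings = {"timeout": None}

# dict.get collapses "key missing" and "key present, value None"
# into the same result, so the caller cannot tell them apart.
present_but_none = settings.get("timeout")
missing = settings.get("retries")


@dataclass(frozen=True)
class Some(Generic[T]):
    """Explicit wrapper playing the role of Rust's Some(value)."""
    value: T


def lookup(d: dict, key: str) -> Optional[Some]:
    """Return Some(value) if the key exists, otherwise None."""
    return Some(d[key]) if key in d else None


wrapped_present = lookup(settings, "timeout")  # Some(None): found, value is None
wrapped_missing = lookup(settings, "retries")  # None: not found at all
```

Only with the extra wrapper do the two cases stay distinct, which is what a nested Option<Option<U>> gives you for free.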


Right but you essentially just mentioned strong typing. There are a billion languages that don’t run into those issues.

Java, Go, C, Scala, Haskell, etc all fix those type issues.


Sure, I suppose, but sum types (which I mention) eliminate all but Scala and Haskell from that list.


Python and Javascript programmers so massively outnumber all other programmers that they are the majority of people converting to any language.


The "node.js" way of doing things, and its dysfunction, is nearly exclusive to Node because JavaScript lacks a standard library and because of npm's haphazard way of running things. Java, Ruby, Python, even my grandfather's Perl have had "modules" for years with none of the fear that is typically associated with Node.

Personally, I think C++'s aversion to sane dependency management is more about C++'s "I know better than you" culture and legacy cruft (packages are usually managed by the distro, not the language) than about any serious security implications.


This is slowly changing with the increasing adoption of Conan and vcpkg.

Still, most environments I've worked in have always had internal repos for packages; no CI/CD server talks to the outside world, and vendoring isn't allowed.


In a way, Rust's standard library is closer to Node's than Python's. You can't really do much without pulling some crates in.


> I would assume most places would turn away from the “node.js” way of doing these things and would just write internal versions of things they need.

Incorrect assumption, look up the left pad fiasco [1]. Its importance is really a personal opinion; convenience nearly always trumps security, so if the NPM way allows you to increase sales by ~10% you'll see people continuing to do it.

Google is fairly principled though, all of the 3p code is internally vendored and supposed to be audited by the people pulling in that code/update.

[1]: https://www.google.com/search?q=leftpad+broke+the+internet


Writing your own version of everything means it's probably more tuned to your needs. But unless it's a core part of your software it will also be worse because you can't justify putting many resources into it. It also means new hires will have to learn a lot more. It's one of the (many) reasons why it's so hard to onboard into C/C++ projects, because every standard building block is bespoke and somehow different than what everyone else does. Of course if you are really big you just have those resources, which is why Meta or Google can have bespoke everything.

On security it's a tradeoff. The open-source version is an easier target for attackers, but might be much more battle-tested and thus more bug-free. Audits are the attempt to have the best of both worlds here, and since they again can be crowd-sourced (with cargo-vet and cargo-crev both working on this) it scales even for companies that aren't Google-sized.


I've reviewed hundreds of Rust crates. It's tedious and boring. The results are boring too — their code is mostly good! Big dependency trees have a reputation for being hot garbage, but that's not my experience. In Rust the small focused crates tend to do one thing, and do it well.


> I would assume most places would turn away from the “node.js” way of doing these things and would just write internal versions of things they need.

I assume most places don't care.


Dependencies are dependencies in Rust as in C++. I've found it's extremely rare that homegrown libraries with functionality similar to (used) open-source libraries are better from a security standpoint.

At least in Rust, a large part of the security issues that would be VERY time consuming to audit at scale through your dependency tree (whether internal or public) are covered by the compiler/borrow checker/type system.

In that sense I would take on a larger number of dependencies in Rust than I would in C++ while sleeping better.


See also Google Cloud’s Assured Open Source Software service:

https://cloud.google.com/assured-open-source-software


Interesting stuff! I think everyone seems to come up with their own solutions. I think security in general is a matter of who you trust, and things only work when we build a network of trust.

Imagine if all companies and rust developers started sharing what crates they were confident in + what other organizations they trust as well. If you could then create your own set of such companies, and then choose a dependency depth you were willing to go down to, you might be able to quickly vet a number of crates this way, or at least see the weird crates that demand a bit more attention.

If this could be added to whackadep[1] then you'd be able to monitor your Rust repo pretty solidly!

[1]: https://www.cryptologie.net/article/550/supply-chain-attacks...
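
A minimal sketch of the idea above, assuming you already have a dependency graph and a table of who has audited what. The function name, the crate names, and the organization names are all hypothetical; real audit data would have to come from wherever those organizations publish it.

```python
from collections import deque


def unvetted_crates(root, deps, audits, trusted_orgs, max_depth):
    """Breadth-first walk of the dependency graph down to `max_depth`.

    Returns, in visiting order, every dependency that has no audit
    from any organization in `trusted_orgs`.
    """
    seen = {root}
    queue = deque([(root, 0)])
    flagged = []
    while queue:
        crate, depth = queue.popleft()
        # The root is our own code; only dependencies need an audit.
        if crate != root and not (audits.get(crate, set()) & trusted_orgs):
            flagged.append(crate)
        # Stop expanding once the chosen depth is reached.
        if depth < max_depth:
            for dep in deps.get(crate, []):
                if dep not in seen:
                    seen.add(dep)
                    queue.append((dep, depth + 1))
    return flagged


# Hypothetical dependency graph: crate -> direct dependencies.
deps = {
    "my-app": ["crate-a", "crate-b"],
    "crate-b": ["crate-c"],
}
# Hypothetical audit table: crate -> organizations that audited it.
audits = {
    "crate-a": {"org-x", "org-y"},
    "crate-b": {"org-z"},
}
trusted = {"org-x"}

flagged_depth_1 = unvetted_crates("my-app", deps, audits, trusted, max_depth=1)
flagged_depth_2 = unvetted_crates("my-app", deps, audits, trusted, max_depth=2)
```

Widening the trusted set or shrinking the depth shortens the list of crates that still demand attention, which is the knob the comment is asking for.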


What's the state of supply chain auditing in general these days?


SE Daily podcast episode "CAP Theorem 23 Years Later with Eric Brewer" touches on it a bit. On the whole an excellent episode.

https://softwareengineeringdaily.com/2023/05/12/cap-theorem-...


I wonder why they went with cargo vet instead of the more cross-language crev:

https://github.com/crev-dev/


Hmmm, is it really a good idea to de-duplicate audits? This is a situation where I want multiple parties to each separately do their own audits


Ah yes, security by another person who happens to work at Google saying "lgtm".



