The bad actor was a co-maintainer of the repo (and for quite some time even more active than the original maintainer) with full commit rights. This was committed straight to master, with no PR and no review required.
Edit: the payload was also heavily obfuscated inside binary files that were marked as test files ("good" and "bad" xz-compressed test files). There's no way to spot this if you don't know what you're looking for.
Assume your co-contributor was not always malicious. They passed all past vetting efforts. But their motives have changed for a hidden reason: they're being extorted by a criminal holding seriously damaging material over them and their family.
What other controls would you use to prevent them from contributing malicious commits, besides closely reading your co-contributor's commits and disallowing noisy commits that you don't fully comprehend and can't vouch for?
We assume that it would be unethical to surveil the contributor closely enough to detect the change in allegiance. That would violate their privacy.
Is it reasonable to say "game over, I lose" in that context? Along those lines, we might argue that an embedded mole will always find ways to fool our review, so this kind of compromise is fatal.
But let's assume it's not game over. You have an advanced persistent threat, and you've got a chance to defeat them. What, besides reviewing the code, do you do?
Yeah, no. Code review isn't going to catch all bugs, but it does catch a ton as long as it's done sincerely and well. You'd have an extremely hard time trying to sneak code with a syntax problem like this into Linux, for example. The community values and rewards nitpickery and fine-toothing, and for good reason.
I’m not sure why you’d say that we’re “moving towards” this sort of build system complexity.
This is 1990s autoconf bs that has not yet been excised from the Linux ecosystem. Every modern build system, even the really obtuse ones, is less insane than autoconf.
And the original purpose of this was not for efficiency, but to support a huge variety of target OSes/distros/architectures, most of which are no longer used in any real capacity.
I think the point is: in a code review, if you see a blob like that, you ask for more information. As lead developer, I go through all the commits on master every Monday, along with the PRs pushed in the last few days. I unfortunately can't review every single PR, so I delegate that to the team, but come Monday I still review the past week's commits. It's quite telling that this didn't raise any attention. One could say, "right, it's open source, people do it in their free time." Ok, fine, but that doesn't excuse the people working for SUSE, who allowed this code to reach their packages even though they have multiple review steps there.
I see; my point was more that this shouldn't be allowed. I think part of the problem with a lot of things is that we're allowing complexity for the sake of complexity.
No one runs simplicity-required checks. My previous post should have said "allows things like this".
But what are you suggesting, exactly? The code fragment you quoted was awk code. Awk is a general-purpose programming language, and any programming language can be written to be complex and unreadable.
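To make that concrete, here are two awk one-liners with identical behavior; the second is deliberately written to resist review. Both are plain awk (the data and scripts are invented for illustration):

```shell
# Sample input: two columns; the task is to sum the second one.
printf 'a 1\nb 2\nc 3\n' > data.txt

# Readable version.
awk '{ total += $2 } END { print total }' data.txt

# Same behavior, obfuscated: $(2*1) is just $2 in disguise.
awk 'BEGIN{t=0}{t+=$(2*1)}END{printf "%d\n",t}' data.txt
```

Both print 6. A reviewer can reject the second purely on style, which is exactly the kind of check being proposed.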
> Any programming language can be written to be complex and unreadable.
The question is: as a lead developer reviewing a commit with a complex and unreadable code snippet, what would you do?
You would reject it of course, which is exactly why this code never appeared in a commit. The stage 0 of the exploit was not checked in, but directly added to the autogenerated build script in the release tarball, where, even if someone did review the script, it looks plausibly like other autogenerated build gunk. The complex and unreadable scripts in the further stages were hidden inside binary test files, so no one reviewing the commit that added them (https://git.tukaani.org/?p=xz.git;a=commit;h=cf44e4b) would directly see that code.
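A mechanical defense against exactly that stage-0 trick is to diff the release tarball against the tagged commit, so anything added after the fact stands out. A minimal local simulation (directory names and file contents are invented; `build-to-host.m4` echoes the reported vector, but the payload line here is made up):

```shell
# "repo" stands in for a checkout of the release tag,
# "tarball" for the unpacked release tarball with one injected file.
mkdir -p repo tarball
echo 'AC_INIT([demo], [1.0])' > repo/configure.ac
cp repo/configure.ac tarball/

# The attacker's stage 0: present only in the tarball, never committed.
echo 'eval "$hidden_payload"' > tarball/build-to-host.m4

# diff -r exits non-zero when the trees differ, so this can gate a build.
diff -r repo tarball > tarball.diff || true
cat tarball.diff   # reports build-to-host.m4 as present only in the tarball
```

Distros could run this automatically against upstream tags; it would not catch the later stages hidden in committed test files, but it kills the "tarball differs from git" channel.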
> No way to spot this if you don't know what you're looking for.
I would expect most people to at least ask for more clarification on random changes to `head` offsets, honestly - or any other diff there.
If they had access to just merge whatever with no oversight, I guess the blame falls more on the people using this in other projects without vetting the basic security of projects they fully, implicitly trust. At that point it's as bad as pulling in "left-pad" in your password hashing lib.
The "random binaries in the repo" part is also egregious, but more understandable. Still not something that should have gotten past another pair of eyes, IMHO.
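Those `head`/`tail` offsets are the whole trick, and it's cheap to demonstrate. A purely hypothetical sketch (filename, offsets, and payload are all invented, nothing from the actual xz files):

```shell
# Hide a one-line script between junk bytes in a "test file".
head -c 64 /dev/urandom > good-large_compressed.bin            # 64 junk bytes
printf 'echo backdoor-stage1\n' >> good-large_compressed.bin   # 21-byte payload
head -c 64 /dev/urandom >> good-large_compressed.bin           # trailing junk

# Carve it back out: skip 64 bytes, keep the next 21. Unexplained offset
# arithmetic like this is exactly what a reviewer should flag.
tail -c +65 good-large_compressed.bin | head -c 21 > stage1.sh
cat stage1.sh   # prints: echo backdoor-stage1
```

The file looks like random compressed data to anyone who opens it; only the build script's magic offsets reveal there's a payload inside.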
> without vetting the basic security of projects they fully
The sort of vetting you're talking about is going to turn up nothing. Most vetting happens at the source code level anyway, not in the tests, and certainly not in the build files. It's almost like checkbox "cover your ass" work that a hired consultant would do.
Unless you're someone in gov't/military, in which case yes, you'd vet the code deeply. But that costs an arm and a leg. Would a datacenter/hosting company running ssh servers do that?
I meant more in the sense that if you're creating an open source project, especially one with serious security implications, you should be extremely aware that you have a dependency that a single individual can update with minimal oversight. Somewhat idealistic take, maybe, but not something you should just be able to ignore either.
The commit messages for the test files claim they were generated with an RNG. The guy making the release tarball then put the final line in the right place without checking it in.
There are potentially many things you would not want to commit to git. Binary files and generated files come to mind. There could also be transformations of the code for performance or portability reasons, or ones that require huge third-party dependencies that are only used by the build script.
There are many potential reasons to publish a release tarball where some of these steps are already done. It could be done in a reproducible way. Look at sqlite for an example of an extremely well maintained open source library that publishes not one but two source code tarballs for various stages of the build.
These calls to change source code distribution, just because it was a small part of the attack vector in this particular case, seem misguided to me. It may still be a good idea, but only as part of a much larger effort toward reproducible builds. In itself it would accomplish nothing, apart from leaving a wake of uncertainty that would only make future attacks easier. Especially in this case, where the maintainer could have acted in a number of other ways, and indeed did: the entirety of the backdoor was added in a regular git commit a long time ago.
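If reproducibility is the goal, the mechanical part is already cheap: `git archive` builds a tarball straight from a tag, so it can only contain committed files. A toy repo as illustration (all names are invented):

```shell
# Create a tiny repo with one tagged commit.
git init -q demo
cd demo
echo 'int main(void) { return 0; }' > main.c
git add main.c
git -c user.email=dev@example.com -c user.name=dev commit -q -m 'add main'
git tag v1.0

# The tarball is derived solely from the tagged tree; nothing can be
# hand-added to it after the fact.
git archive --format=tar.gz --prefix=demo-1.0/ -o ../demo-1.0.tar.gz v1.0
cd ..
tar -tzf demo-1.0.tar.gz   # lists only files under demo-1.0/ from the tag
```

Of course this only covers projects that don't pre-generate build files into their tarballs, which is precisely the autoconf habit the rest of the thread is complaining about.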
I think a lot of it is probably historical. When the Debian or Red Hat infrastructure came up there was no git; projects were often in source control during development, but tarballs were still the major distribution mechanism for normal people. Before git, they'd sometimes have packages based on an SVN or CVS snapshot, in the absence of releases.
I believe what happens in Debian is that they host their own mirror of source tarballs, since some random website or git repo could be taken down from under them. So I guess even if a package is built straight from a repo, they'd probably make a tarball of it anyway.
Code repositories are not necessarily git-based. Plus, you would need to put in the effort to monitor the repository's activity for changes.
Until last month, would you have refused a tarball from the official maintainer? I wouldn't have, especially when there was mention of a bugfix for something that might have been tripping up our build system.
For example, nginx uses Mercurial (with, admittedly, a GitHub mirror for convenience), a lot of OSS projects still use Subversion or CVS, and my guess is that some projects run on even less common source control software, most likely for historical reasons, or for a use case that happens to be that software's strong point.
Other than that, why wouldn't the user be the one to build their own software package?