
How on earth did any of this make it through a code review and get merged in? It seems absurdly careless, unless I am missing something.



The bad actor was a co-maintainer of the repo (and, for quite some time, even more active than the original maintainer) with full commit rights. This was committed straight to master; no PR and no review required.

edit: also, this was heavily obfuscated in binary files that were marked as test files ("good" and "bad" xz-compressed test files). No way to spot this if you don't know what you're looking for.


Not only were they a co-maintainer, but if you're relying on code review to ensure correctness and security, you've already lost the battle.

Code reviews are more about education and de-siloing.


Assume your co-contributor was not always malicious. They passed all past vetting efforts. But their motives have changed for a secret reason: they're being extorted by a criminal holding seriously damaging material over them and their family.

What other controls would you use to prevent them contributing malicious commits, besides closely reading your co-contributor's commits, and disallowing noisy commits that you don't fully comprehend and vouch for?

We assume that it'd be unethical to surveil the contributor closely enough to detect the change in allegiance. That would violate their privacy.

Is it reasonable to say, "game over, I lose" in that context? To that end, we might argue that an embedded mole will always think of ways to fool our review, so this kind of compromise is fatal.

But let's assume it's not game over. You have an advanced persistent threat, and you've got a chance to defeat them. What, besides reviewing the code, do you do?


Corporate espionage comes to mind...


In open source, code review is absolutely about correctness and security.


In some open source, this is true.

In an awful lot of open source, code review is a vanishingly rare commodity because there aren't enough committers left.


Yeah, no. Code review isn't going to catch all bugs, but it does catch a ton as long as it's done sincerely and well. You'd have an extremely hard time trying to sneak code with a syntax problem like this into Linux, for example. The community values and rewards nitpickery and fine-toothing, and for good reason.


In addition… if your build system has things like this as OK:

> xz -dc $top_srcdir/tests/files/$p | eval $i | LC_ALL=C sed "s/\(.\)/\1\n/g" | LC_ALL=C awk 'BEGIN{FS="\n";RS="\n";ORS="";m=256;for(i=0;i<m;i++){t[sprintf("x%c",i)]=i;c[i]=((i*7)+5)%m;}i=0;j=0;for(l=0;l<8192;l++){i=(i+1)%m;a=c[i];j=(j+a)%m;c[i]=c[j];c[j]=a;}}{v=t["x" (NF<1?RS:$1)];i=(i+1)%m;a=c[i];j=

You should probably expect the potential for abuse?
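For what it's worth, here is the visible part of that fragment restated with comments. This is a best-effort reading of the setup code only (the quote is truncated), not a verified deobfuscation: the table initialization and the 8192 swap rounds look like an RC4-style state scramble feeding a later payload-decryption stage.

    # Illustrative restatement of the quoted awk BEGIN block, with comments added.
    awk 'BEGIN {
      m = 256
      # map single characters back to byte values, and seed a 256-entry state array
      for (i = 0; i < m; i++) { t[sprintf("x%c", i)] = i; c[i] = ((i * 7) + 5) % m }
      # 8192 rounds of index-driven swaps: an RC4-like key/state schedule
      i = 0; j = 0
      for (l = 0; l < 8192; l++) { i = (i + 1) % m; a = c[i]; j = (j + a) % m; c[i] = c[j]; c[j] = a }
    }'

Readable or not, nothing about the surrounding pipeline explains why a "test" would need a hand-rolled stream cipher.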

We’re moving toward complexity that outpaces any one person’s ability to understand, explain, and thus check an entire artifact.

And for what? Build efficiency? Making a “trick” thing? When was the project ever going to go back and make things simpler? (Never)


I’m not sure why you’d say that we’re “moving towards” this sort of build system complexity.

This is 1990s autoconf bs that has not yet been excised from the Linux ecosystem. Every modern build system, even the really obtuse ones, is less insane than autoconf.

And the original purpose of this was not for efficiency, but to support a huge variety of target OSes/distros/architectures, most of which are no longer used in any real capacity.


This is not part of autotools output. This is part of the backdoor. Not arguing about autotools drawbacks though.


I think the point is: in a code review, if you see a blob like that, you ask for more information. As lead developer, I go through all commits on master, and the PRs pushed in the last few days, every Monday, because I unfortunately cannot review every single PR; I delegate that to the team. Nevertheless, on Monday I review the last week's commits. Quite funny that it didn't attract any attention. One can say, "right, it's open source, people do it in their free time"; ok, fine, but that doesn't apply to the people working for SUSE, who let this code reach their packages even though they have multiple review steps there.


> has not yet been excised from the Linux ecosystem

That is my point. I should have written "allows" rather than "has".


To be clear: the build system did not use the code fragment you quoted. This complex awk code is a later stage of the backdoor.


I see; my point was more that this shouldn’t be allowed. I think part of the problem with a lot of things is that we’re allowing complexity for the sake of complexity.

No one has simplicity-required checks. My previous post should say “allows things like this”.
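For what it's worth, even a crude simplicity check is possible: a CI step that flags eval, decode-and-pipe constructs, and similar obfuscation-prone patterns in build scripts and refuses them until a human explicitly signs off. A rough sketch only; the paths and patterns below are hypothetical and would need to be project-specific:

    #!/bin/sh
    # Hypothetical CI lint: flag obfuscation-prone constructs in build scripts.
    # Paths and patterns are illustrative, not a vetted policy.
    set -eu
    if grep -rnE '\beval\b|base64 -d|xz -dc' m4/ configure.ac Makefile.am 2>/dev/null; then
      echo "Suspicious construct in build scripts; needs explicit reviewer sign-off." >&2
      exit 1
    fi

It would not have caught this particular backdoor on its own (the worst parts never appeared in a tracked file), but it at least forces the "noisy commit you don't fully comprehend" onto someone's screen.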


Unless I’m misunderstanding, all this code was embedded and hidden inside the obfuscated test files.

None of this would have been visible in commits or diffs at all.


Your point is likely entirely valid, but the example you used is the wrong one.


Who “allowed things like this”? - this was obfuscated behind a binary posing as an actually corrupt “test” file.


But what are you suggesting exactly? The code fragment you quoted was awk code. Awk is a generic programming language. Any programming language can be written to be complex and unreadable.


> Any programming language can be written to be complex and unreadable.

The question is: as the lead developer reviewing a commit with a complex and unreadable code snippet, what would you do?


You would reject it of course, which is exactly why this code never appeared in a commit. The stage 0 of the exploit was not checked in, but directly added to the autogenerated build script in the release tarball, where, even if someone did review the script, it looks plausibly like other autogenerated build gunk. The complex and unreadable scripts in the further stages were hidden inside binary test files, so no one reviewing the commit that added them (https://git.tukaani.org/?p=xz.git;a=commit;h=cf44e4b) would directly see that code.
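One mitigation that would have surfaced a tarball-only injection like that stage 0 is mechanically diffing the published tarball against the git tag it claims to correspond to. A sketch only; the URLs, tag, and version below are placeholders, not the real xz artifacts:

    # Compare a release tarball against the tag it claims to be built from.
    git clone https://example.org/xz.git xz-git
    git -C xz-git checkout v9.9.9
    curl -LO https://example.org/releases/xz-9.9.9.tar.gz
    tar -xzf xz-9.9.9.tar.gz                     # unpacks to xz-9.9.9/
    # Tarball-only files and modifications (legitimate autotools output, but also
    # anything injected after the fact, such as a doctored m4 script) show up here:
    diff -qr -x .git xz-git xz-9.9.9

The catch is that expected differences (generated configure scripts, man pages, and so on) produce plenty of noise, which is exactly where a small change to an autogenerated-looking file can hide.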


But this awk code was not committed in the clear so it was not possible to review. It was hidden in a binary file, compressed and encrypted.


> No way to spot this if you don't know what you're looking for.

I would expect most people to at least ask for more clarification on random changes to `head` offsets, honestly - or any other diff there.

If they had access to just merge whatever with no oversight, I guess the blame is more on people pulling this into other projects without vetting the basic security of projects they fully, implicitly trust, though. As bad as pulling in "left-pad" in your password hashing lib at that point.

The "random binaries in the repo" part is also egregious, but more understandable. Still not something that should have gotten past another pair of eyes, IMHO.


> without vetting the basic security of projects they fully

This sort of vetting you're talking about is going to turn up nothing. Most vetting is at the source code level anyway, not in the tests or the build files, tbh. It's almost like checkbox, cover-your-ass work that a hired consultant would do.

Unless you're someone in gov't/military, in which case yes, you'd vet the code deeply. But that costs an arm and a leg. Would a datacenter/hosting company running ssh servers do that?


I meant more in the sense that if you're creating an open source project, especially one with serious security implications, you should be extremely aware that you have a dependency that a single individual can update with minimal oversight. Somewhat idealistic take, maybe, but not something you should just be able to ignore either.


This is the problem of projects that allow direct access and lack code review.


The commit messages for the test files claim they used an RNG to generate them. The guy making the release tarball then put the final line in the right place without checking it in.


What is the reason distros are still building from release tarballs rather than a git checkout that can be verified against a public git repo?


There could potentially be many things you would not want to commit to git. Binary files and generated files come to mind. There could also be transformations of code for performance or portability reasons, or ones that require huge third-party dependencies that are only used by the build script.

There are many potential reasons to publish a release tarball where some of these steps are already done. It could be done in a reproducible way. Look at sqlite for an example of an extremely well maintained open source library that publishes not one but two source code tarballs for various stages of the build.

These calls to change how source code is distributed, just because it was a small part of the attack vector in this particular case, seem misguided to me. It may still be a good idea, but only as part of a much larger effort toward reproducible builds. In itself it would accomplish nothing, apart from a wake of uncertainty that would only make future attacks easier. Especially in this case, where the maintainer could have acted in a number of other ways, and indeed did. The entirety of the backdoor was added in a regular git commit a long time ago.
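As a concrete illustration of doing it "in a reproducible way": a project can export its release tarball deterministically from the tag, so anyone can re-derive the archive from the public repository and compare checksums against what the maintainer uploaded. A sketch with a placeholder project name and tag:

    # Deterministic source export from a tag (placeholder name/tag).
    # git archive output is determined by the tagged tree, and gzip -n omits the
    # timestamp, so the checksum can be re-derived by any third party.
    git archive --format=tar --prefix=project-1.2.3/ v1.2.3 | gzip -n > project-1.2.3.tar.gz
    sha256sum project-1.2.3.tar.gz

Tarballs with extra generated content (the sqlite case above) need more machinery, but the principle is the same: the published artifact should be derivable from the public history by someone other than the maintainer.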


I think a lot of it is probably historical. When the Debian or Red Hat infrastructure came up there was no git; projects were often in source control during development, but tarballs were still the major distribution mechanism for normal people. Before git, distros would sometimes ship packages based on an SVN or CVS snapshot, in the absence of releases.

I believe what happens in Debian is that they host their own mirror of source tarballs, since going to some random website or git repo means it could be taken down from under them. So I guess if a package were built straight from a repo, they'd probably make a tarball of it anyway.


Code repositories are not necessarily git-based. Plus, you would need to put in the effort of monitoring the repository's activity for changes.

Until last month, would you have refused a tarball from the official maintainer? I wouldn't have, especially when it mentioned a bugfix for something that might have been tripping our build system.

For example, nginx uses Mercurial (with, admittedly, a GitHub mirror for convenience), a lot of OSS still uses Subversion and CVS, and my guess is that some projects run on even less free source control software (most likely for historical reasons, or for a use case that is that software's strong point).

Other than that, why wouldn't the user be the one to build their own software package?


There is no code review on packages with one active maintainer.



