The bad actor was a co-maintainer of the repo (and for quite some time even more active than the original maintainer) with full commit rights. This was committed straight to master, with no PR and no review required.
Edit: the payload was also heavily obfuscated inside binary files that were marked as test files ("good" and "bad" xz-compressed test files). There's no way to spot this if you don't know what you're looking for.
Assume your co-contributor was not always malicious. They passed all past vetting efforts. But their motives have changed for a hidden reason: they're being extorted by a criminal holding seriously damaging material over them and their family.
What other controls would you use to prevent them from contributing malicious commits, besides closely reading your co-contributor's commits and disallowing noisy commits that you don't fully comprehend and can't vouch for?
We assume that it would be unethical to surveil the contributor closely enough to detect the change in allegiance. That would violate their privacy.
Is it reasonable to say "game over, I lose" in that context? Along those lines, we might argue that an embedded mole will always find ways to fool our review, so this kind of compromise is fatal.
But let's assume it's not game over. You have an advanced persistent threat, and you've got a chance to defeat them. What, besides reviewing the code, do you do?
Yeah, no. Code review isn't going to catch all bugs, but it does catch a ton as long as it's done sincerely and well. You'd have an extremely hard time trying to sneak code with a syntax problem like this into Linux, for example. The community values and rewards nitpickery and fine-toothing, and for good reason.
I’m not sure why you’d say that we’re “moving towards” this sort of build system complexity.
This is 1990s autoconf bs that has not yet been excised from the Linux ecosystem. Every modern build system, even the really obtuse ones, is less insane than autoconf.
And the original purpose of this was not for efficiency, but to support a huge variety of target OSes/distros/architectures, most of which are no longer used in any real capacity.
I think the point is: in a code review, if you see a blob like that, you ask for more information. As lead developer, I go through all the commits on master every Monday, along with the PRs pushed in the last few days. I unfortunately can't review every single PR, so I delegate that to the team, but come Monday I still review the past week's commits. It's quite telling that this didn't raise any attention. One could say, "right, it's open source, people do it in their free time." Ok, fine, but that doesn't excuse the people working for SUSE, who allowed this code to reach their packages even though they have multiple review steps there.
I see; my point was more that this shouldn't be allowed. I think part of the problem with a lot of things is that we're allowing complexity for the sake of complexity.
No one runs simplicity-required checks. My previous post should have said "allows things like this".
But what are you suggesting, exactly? The code fragment you quoted was awk code. Awk is a general-purpose programming language, and any programming language can be written to be complex and unreadable.
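To make that concrete, here are two awk one-liners with identical behavior; the second is deliberately written to resist review. Both are plain awk (the data and scripts are invented for illustration):

```shell
# Sample input: two columns; the task is to sum the second one.
printf 'a 1\nb 2\nc 3\n' > data.txt

# Readable version.
awk '{ total += $2 } END { print total }' data.txt

# Same behavior, obfuscated: $(2*1) is just $2 in disguise.
awk 'BEGIN{t=0}{t+=$(2*1)}END{printf "%d\n",t}' data.txt
```

Both print 6. A reviewer can reject the second purely on style, which is exactly the kind of check being proposed.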
> Any programming language can be written to be complex and unreadable.
The question is: as a lead developer reviewing a commit with a complex and unreadable code snippet, what would you do?
You would reject it of course, which is exactly why this code never appeared in a commit. The stage 0 of the exploit was not checked in, but directly added to the autogenerated build script in the release tarball, where, even if someone did review the script, it looks plausibly like other autogenerated build gunk. The complex and unreadable scripts in the further stages were hidden inside binary test files, so no one reviewing the commit that added them (https://git.tukaani.org/?p=xz.git;a=commit;h=cf44e4b) would directly see that code.
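A mechanical defense against exactly that stage-0 trick is to diff the release tarball against the tagged commit, so anything added after the fact stands out. A minimal local simulation (directory names and file contents are invented; `build-to-host.m4` echoes the reported vector, but the payload line here is made up):

```shell
# "repo" stands in for a checkout of the release tag,
# "tarball" for the unpacked release tarball with one injected file.
mkdir -p repo tarball
echo 'AC_INIT([demo], [1.0])' > repo/configure.ac
cp repo/configure.ac tarball/

# The attacker's stage 0: present only in the tarball, never committed.
echo 'eval "$hidden_payload"' > tarball/build-to-host.m4

# diff -r exits non-zero when the trees differ, so this can gate a build.
diff -r repo tarball > tarball.diff || true
cat tarball.diff   # reports build-to-host.m4 as present only in the tarball
```

Distros could run this automatically against upstream tags; it would not catch the later stages hidden in committed test files, but it kills the "tarball differs from git" channel.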
> No way to spot this if you don't know what you're looking for.
I would expect most people to at least ask for more clarification on random changes to `head` offsets, honestly - or any other diff there.
If they had access to just merge whatever with no oversight, I guess the blame falls more on the people using this in other projects without vetting the basic security of projects they fully, implicitly trust. At that point it's as bad as pulling in "left-pad" in your password hashing lib.
The "random binaries in the repo" part is also egregious, but more understandable. Still not something that should have gotten past another pair of eyes, IMHO.
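Those `head`/`tail` offsets are the whole trick, and it's cheap to demonstrate. A purely hypothetical sketch (filename, offsets, and payload are all invented, nothing from the actual xz files):

```shell
# Hide a one-line script between junk bytes in a "test file".
head -c 64 /dev/urandom > good-large_compressed.bin            # 64 junk bytes
printf 'echo backdoor-stage1\n' >> good-large_compressed.bin   # 21-byte payload
head -c 64 /dev/urandom >> good-large_compressed.bin           # trailing junk

# Carve it back out: skip 64 bytes, keep the next 21. Unexplained offset
# arithmetic like this is exactly what a reviewer should flag.
tail -c +65 good-large_compressed.bin | head -c 21 > stage1.sh
cat stage1.sh   # prints: echo backdoor-stage1
```

The file looks like random compressed data to anyone who opens it; only the build script's magic offsets reveal there's a payload inside.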
> without vetting the basic security of projects they fully
The sort of vetting you're talking about is going to turn up nothing. Most vetting happens at the source code level anyway, not in the tests, and certainly not in the build files. It's almost like checkbox "cover your ass" work that a hired consultant would do.
Unless you're someone in gov't/military, in which case yes, you'd vet the code deeply. But that costs an arm and a leg. Would a datacenter/hosting company running ssh servers do that?
I meant more in the sense that if you're creating an open source project, especially one with serious security implications, you should be extremely aware that you have a dependency that a single individual can update with minimal oversight. Somewhat idealistic take, maybe, but not something you should just be able to ignore either.
The commit messages for the test files claim they were generated with an RNG. The guy making the release tarball then put the final line in the right place without checking it in.
There are potentially many things you would not want to commit to git. Binary files and generated files come to mind. There could also be transformations of the code for performance or portability reasons, or ones that require huge third-party dependencies that are only used by the build script.
There are many potential reasons to publish a release tarball where some of these steps are already done. It could be done in a reproducible way. Look at sqlite for an example of an extremely well maintained open source library that publishes not one but two source code tarballs for various stages of the build.
These calls to change source code distribution, just because it was a small part of the attack vector in this particular case, seem misguided to me. It may still be a good idea, but only as part of a much larger effort toward reproducible builds. In itself it would accomplish nothing, apart from leaving a wake of uncertainty that would only make future attacks easier. Especially in this case, where the maintainer could have acted in a number of other ways, and indeed did: the entirety of the backdoor was added in a regular git commit a long time ago.
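If reproducibility is the goal, the mechanical part is already cheap: `git archive` builds a tarball straight from a tag, so it can only contain committed files. A toy repo as illustration (all names are invented):

```shell
# Create a tiny repo with one tagged commit.
git init -q demo
cd demo
echo 'int main(void) { return 0; }' > main.c
git add main.c
git -c user.email=dev@example.com -c user.name=dev commit -q -m 'add main'
git tag v1.0

# The tarball is derived solely from the tagged tree; nothing can be
# hand-added to it after the fact.
git archive --format=tar.gz --prefix=demo-1.0/ -o ../demo-1.0.tar.gz v1.0
cd ..
tar -tzf demo-1.0.tar.gz   # lists only files under demo-1.0/ from the tag
```

Of course this only covers projects that don't pre-generate build files into their tarballs, which is precisely the autoconf habit the rest of the thread is complaining about.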
I think a lot of it is probably historical. When the Debian or Red Hat infrastructure came up there was no git; projects were often in source control during development, but tarballs were still the major distribution mechanism for normal people. Before git, they'd sometimes have packages based on an SVN or CVS snapshot, in the absence of releases.
I believe what happens in Debian is that they host their own mirror of source tarballs, since some random website or git repo could be taken down from under them. So I guess even if a package is built straight from a repo, they'd probably make a tarball of it anyway.
Code repositories are not necessarily git-based. Plus, you would need to put in the effort to monitor the repository's activity for changes.
Until last month, would you have refused a tarball from the official maintainer? I wouldn't have, especially when there was mention of a bugfix for something that might have been tripping up our build system.
For example, nginx uses Mercurial (with, admittedly, a GitHub mirror for convenience), a lot of OSS projects still use Subversion or CVS, and my guess is that some projects run on even less common source control software, most likely for historical reasons, or for a use case that happens to be that software's strong point.
Other than that, why wouldn't the user be the one to build their own software package?