XZ backdoor story – Initial analysis (securelist.com)
406 points by kryster 9 months ago | 250 comments



> Unlike other supply chain attacks we have seen in Node.js, PyPI, FDroid, and the Linux Kernel that mostly consisted of atomic malicious patches, fake packages and typosquatted package names, this incident was a multi-stage operation that almost succeeded in compromising SSH servers on a global scale.

I think this sentence actually understates what happened.

What I find more frightening than the technical aspects of the backdoor is the amount and degree of "social engineering" involved: The backdoor was the final product, but inserting it was only possible because at that point the entire xz project had long been taken over by malicious actors ("Jia Tan" and friends) - after a yearlong psychological operation against the maintainer. All that without the maintainer or anyone else noticing anything.

That's spy novel stuff - and if something like this is possible, it makes me wonder what else might be going on in other projects right now.

(You can see the same mentality reflected in the backdoor code itself - I find it remarkable how much of the code is devoted to active misdirection. It's not just trying to look harmless; the code actively tries to build a "narrative" - through commit messages, comments, variable names, choice of commands, etc. - of what it's ostensibly doing while really doing something completely different, leading anyone who'd inspect the code first to doubt themselves and their understanding of the code, then suspect a bug and then, maybe, suspect malicious intent.)


Yes, this is absolutely mind-boggling. Not insinuating or suggesting anything, but let me wear my tinfoil hat for a bit: I wonder if there was a significant psy-op conducted IRL so that the original author couldn't devote time and would eventually hand over ownership to bad actors.

I hope some three letter agencies are conducting a deeper investigation into this.


> it makes me wonder what else might be going on in other projects right now

I hope this can be a lesson learned. A lot is going on. There are lots and lots of agencies (both government-funded in all countries as well as black market ones) with a mission to own backdoors. They're all well-funded. That's what they do.

It is frustrating to watch every HN thread related to backdoors filled with rejections of this possibility as paranoia or silly tinfoil hatting. As if it wasn't happening. Here's one concrete case that was caught. Countless others, not yet caught.

And think about this one: this was on an open source project, which makes it easier to catch. Still we got lucky. Now consider all the closed source products, where backdooring just means infiltrating and/or arm-twisting a single organization. This happens often. Nobody wants to believe it but it's common. Anyone who has worked at tech infrastructure companies has some stories. They are difficult to tell due to NDAs and worse, but it happens.


>leading anyone who'd inspect the code first to doubt themselves and their understanding of the code

I completely agree. What seems incredible to me is how much manipulation, thoughtfulness, patience, and perseverance went into this. It's either the product of one person's obsession, or the work of a private security firm or state actor whose people do these things to multiple projects as part of a 9-5 job.


Quite rightly, most attention has been on the operational success of the backdoor and how it achieves its goals.

But I would love some deeper analysis of the mistakes and over-engineered parts. In the Bryan Cantrill interview [1], Andres Freund suggests there are a lot of dumb things done because it was a kind of off-the-shelf backdooring piece that didn't necessarily know how it would be deployed. E.g. the symbol table lookup that led him to investigate.

Similarly, why were they chopping off 48 bytes using RC4? [2]

So I'd love folks to talk about how (given more time or a better team) this could have been done better, or where they messed up more.

[1] https://youtu.be/jg5F9UupL6I?si=gvXsYFXgagkGOMd4 [2] https://twitter.com/matthew_d_green/status/17744729080201014...


So if I'm understanding correctly, a useful hardening step would be to ensure that each dynamically linked library had its own GOT and that the tables are marked as read-only once dynamic linking completes (i.e. you can't reach across dynamic boundaries to patch someone else's ifunc entries).

That would improve the supply chain security of code getting linked in somewhere but never executed.

EDIT: Or even better, perhaps ifunc should be implemented in a declarative fashion so that you can't just cause arbitrary code execution from each library you link against. That might be harder to implement at this point considering back compat, but probably is something that could be layered in over a longer time period (i.e. if you build any library with "declaratively linked ifunc" feature bit set, then the dynamic linker will force all linked libraries to have that feature flag or fail to launch).
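
For context, here's a minimal sketch of what a GNU ifunc looks like today with GCC/glibc on x86-64 Linux (the names are made up for illustration). The resolver is ordinary code that ld.so runs while processing relocations, before main() - exactly the hook the backdoor abused, and exactly what a "declarative" ifunc would replace with data the loader interprets:

    /* build: gcc -O2 ifunc_demo.c -o ifunc_demo  (x86-64 Linux, glibc) */
    #include <stdio.h>

    static int add_generic(int a, int b) { return a + b; }
    static int add_fancy(int a, int b)   { return a + b; }  /* stand-in for a SIMD variant */

    /* The resolver: arbitrary code, run by the dynamic loader during relocation. */
    static int (*resolve_add(void))(int, int) {
        __builtin_cpu_init();  /* required before CPU feature checks this early */
        return __builtin_cpu_supports("avx2") ? add_fancy : add_generic;
    }

    /* 'add' is an indirect function: calls land on whatever the resolver returned. */
    int add(int a, int b) __attribute__((ifunc("resolve_add")));

    int main(void) {
        printf("%d\n", add(2, 3));
        return 0;
    }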


From another angle: build systems.

Currently, building most libraries involves executing a series of highly complex, byzantine scripts requiring a Turing-complete environment. This gives attackers an endless attack surface, and once the build process is hijacked - well, the opportunities are there.

Moving to a declarative build process with only a limited state machine as an executor would help. Requiring all source blobs to be reproducible might also be something to think about.


This is how we got autoconf in the first place - an underspecified and underpowered build configuration language (make), which led the people who did need arbitrary logic to generate the build files at configuration time instead.

Don't limit the build system language. It makes builds more complex, not less.


Having a "simple" declarative build definition won't help if the thing that interprets and executes that definition is shipped with the package, as said interpreter is likely orders of magnitude more complex and harder to personally review. Which is what happened with the xz example - the "attack" was hidden in code that is autogenerated by autotools, not in the build or configuration definitions.

People put trust in distros and packagers to have something of a review chain - there's a 0% chance you're personally an expert in everything executing on your workstation right now (outside of maybe toy systems). I'm not an expert in m4 or bash, but I hope enough experts are in the chain leading to my distro's package library that such things are less likely. But that was all bypassed here.

I think this particular approach is a one-off, as I know of no other build environment where it's expected to have the generated executable of the build system "helpfully" packaged in the tarball as a packaging step.

If it is in some I'm not aware of, I hope that decision is being re-examined now.


No, the attack was not hidden in code that is autogenerated by autotools. If that was the case, rerunning autoconf (which most distros do) would have disabled the attack.

Instead it was hidden in plain sight in the source that is processed by autoconf.

> it's expected to have the generated executable of the build system "helpfully" packaged in the tarball as a packaging step.

While this is true, most distros as I mentioned above rerun autoconf to ensure consistency in the tests and possibly to include autoconf bug fixes.


The jumping-off point was modified autotools output. Re-running autogen did effectively disable the attack. The payload was stashed in some test files, but the build system needed to jump through quite a few hoops to actually get that into the compiled library. Apparently the target distributions didn't re-run autogen for this package.


This is what the early reporting said but the article has additional info.

The code to include the backdoor in the build was in an m4 script.

The initial reporting said that this code was not present in the github source, but the post-autogen code (including the attack) was included in the github releases.

The article says that this modified script was present in the source on tukaani.org, which was controlled by the attacker and used by the distros as their upstream.

If you downloaded from github and reran autogen you were OK. If you downloaded from tukaani and reran autogen, like the distros did, you lost.


Merely keeping build and test environments isolated would have entirely prevented this attack.

Better hygiene that way would also simplify both environments, since the tools for each are fairly different.


How are they different? Both use make and the C compiler.

xz is not different in this respect from any other C, C++, Go or Rust program.


the "build-to-host.m4" file seems to originally be from gnulib, and if that is installed on the system is not required by the build. So I see that as "part of the build system" myself.

I mean the github repository with exactly the same .ac and .am files works fine with local automake/autoconf generation without that file existing. And thus no backdoor (the test files are still there, but "harmless" without the initial kicking off point to actually de-obfuscate and include their contents)


Gnulib is not installable; it is meant to be copied (aka vendored) into the sources of the programs that use it.

> if that is installed on the system it is not required by the build

This specific file defines a macro that is used by autoconf, not by the build. If it is installed on the system it is not required by autoconf, but then gnulib is practically never installed.

Your original message blamed the backdoor on "the generated executable". This m4 file is not a generated file and not an executable. It is simply vendoring like you often see in other languages.


I think it was more "hiding" as vendored code rather than really being in that category. The git repo never contained that "vendoring": as the m4/gettext.m4 file doesn't exist, autoreconf just copies one from a system store (which on my machine never calls the tainted BUILD_TO_HOST macros in the first place - macros which also don't exist in the upstream xz git repo).

"Vendoring" by copying untracked files into the tarball seems like a stretch of the definition. It seems to rely on the "possibly odd" behavior of autoreconf allowing files that happen to have the same name to override system-installed versions? I guess on the belief that being able to override them with local definitions is useful? But that certainly bit them in the ass here. And to get a "completely" clean autoconf rebuild it looks like you have to delete matching files manually.


This doesn't work for anything actually complex.

For example if you've ever taken a look at the bluetooth specs you would not trust a single person in the world to implement it correctly and you probably wouldn't even trust an arbitrarily large team to implement it correctly.

Unless they had a long demonstrated and credible track record of shipping perfectly functional products and an effectively unlimited budget, i. e. Apple and maybe 1 or 2 other groups, at most.


> For example if you've ever taken a look at the bluetooth specs you would not trust a single person in the world to implement it correctly and you probably wouldn't even trust an arbitrarily large team to implement it correctly.

I messed around a tiny bit with Bluetooth on Linux recently. Going to rankly speculate that Bluetooth is such a special case of hell that it makes for a distracting example here.

I mean, as a joke suppose we wanted to design a 3.5 mm patch cord that pitch shifts down a 1/4 step for randomly chosen stretches. It turns out to be easy-- just remove the wire from the casing and replace it with cheap bluetooth chips at either end. You'll get that behavior for free! :)

Compare to, say, USB, where your point above would apply just as well. I wouldn't be distracted by that example because even cheapo, decades-old USB drives to this day let me read/write without interleaving zeros in my data.

Shit, now I'm distracted again. Does Bluetooth audio even have the notion of a buffer size that I can set from the sender? And I guess technically the receiver isn't interleaving the signal with zeros-- it's adjusting the rate at which it sends blocks of the received data to the audio subsystem.

Was Bluetooth audio basically designed just for the human voice under the assumption we're constantly pitch shifting?

Oops, I almost forgot-- the sample-rate shift is preceded by a dropout, so I do get interleaved zeros in the audio data! I actually get a smoother ramp in my Prius changing from battery to ICE than I do in my wireless audio system!!!

Anyhow, what were we talking about again?


It needs money and people.

Government funding for defense of the economy and computing in the "free world".

Certainly the defense department has a billion to spare, and so does the EU


Yes. Other than the vulnerability of developers/maintainers, the other big takeaway I get from this incident is that build systems have become dangerously unwieldy. There are just too many moving parts and too many places to hide bad stuff.


Build systems have always been like this. It is in fact more recent build systems that are limiting the amount of craziness you can do compared to the older ones.


Yes and no, but mostly no. This would prevent simple use of ifuncs in this way, but it's important to understand that the author of this could inject arbitrary code into the library that ends up in the address space of a sensitive process. At that point, all bets are off: it could remap the GOT as writable if it so chose, or (this is mostly here for the EDR people, who will certainly bring it up after reading this) if trying to do that is flagged as "suspicious" or the OS gains the ability to block such a transition, the injected code can subvert control flow in hundreds of other ways. It has arbitrary read/write, code execution, everything: there is no security mitigation that can stop it. If it so wishes it can leak private keys and send them to the attacker directly. It can spawn a shell. Trying to design protections at this stage of compromise is a fool's errand.


>it's important to understand that the author of this could inject arbitrary code into the library that ends up in the address space of a sensitive process. At that point, all bets are off: it could remap the GOT as writable if it so chose, or (this is mostly here for the EDR people, who will certainly bring it up after reading this) if trying to do that is flagged as "suspicious" or the OS gains the ability to block such a transition, the injected code can subvert control flow in hundreds of other ways.

I think it's important to push back on a very specific point here. It's true in general that if the attacker has added a backdoor in your library, and you're going to call that library's code, you've pretty much lost. Game over. Go home.

But this was a very different attack, in that the attackers couldn't directly target sshd. They couldn't manage to target any library that sshd calls directly either.

The library with the backdoor is code that was never actually called at runtime. This is important because it means it *doesn't* have a hundred ways to reach code execution. It only had a few select ways, mainly constructors and indirect function resolvers.

The amount of weirdness in glibc's runtime loader is not unbounded. There aren't actually a hundred places where it allows random libs to run code before main. And we should take a good look at the couple of places that are clearly juicy, high-value gadgets for attackers.

When glibc's runtime loader first loads a binary and first reaches a relocation for a STT_GNU_IFUNC, everything is still in a pristine state and no arbitrary code can run without being explicitly called. Attackers don't have magical powers that allow them to run code before you hand them the control flow. At this point in the runtime loader, an ifunc resolver cannot do anything without being caught. It cannot "just open /proc/self/mem" or "just call mprotect". It cannot "just disassemble the caller" or "just overwrite the GOT".

I really want to hammer home how different that is from letting the attacker run code after main. There's nothing you can do if you directly call a backdoored library. But we shouldn't let attackers get away with spending 400ms parsing and disassembling ELFs in memory in an ifunc resolver of all things. Overwriting the GOT like nobody's watching.

The backdoor isn't magic. For all the beautiful sophistication that went into it, it made many mistakes along the way that could have led to detection. From valgrind errors to an unacceptable amount of noise (400ms!) before main.


Well, they picked a target that doesn't get called directly, and found a way to sneak code into it without a static constructor. If that didn't work (and I don't fundamentally think it wouldn't have - people aren't checking these very closely; the ifunc stuff is just an obfuscatory bonus), they would have targeted something that was directly used.


I would be happy with that result. Targeting something that's directly used by sshd means a much smaller attack surface. It's much harder for the attackers.

The danger with supply-chain attacks is that it could come from practically anywhere. Attackers can choose to target an overworked maintainer in a third-party library, and it's much easier for them than going after OpenSSH itself.

About the OpenSSH maintainers, they're known for being the paranoid amongst the paranoid. No one's infallible, but if attackers are forced to go directly after them instead of bullying smaller libraries, I'll have a reason to feel safer about the reduced attack surface :)


Think R^X bit style. The kernel would participate in such a way that you can seal the GOT to be read only once the linker is done with it. Then it doesn't matter that a library came in later as there's nothing it can do to modify the GOT as there's no mechanism to make it writable. As you mention, the same protection would need to exist for the executable code itself to avoid overwriting code mapped from disk (I believe that's probably true but I'm not 100% certain).

Defense in depth is about adding more & more hurdles for attackers which raises the costs involved. Even state actors have a budget.


Making the GOT read-only is problematic as dynamic linking is lazy by default, and R^X'ing the GOT would make deferred dynamic symbol resolution or subsequent dlopen calls fail.

It would be simpler to statically link sshd (or, more generically, all sensitive system binaries) and launch them in the R^X mode via, say, specifying the required capability either in the binary descriptor or at the system configuration level (SELinux) – to force-launch the process in the R^X mode for its code pages. The problem, of course, is that not every Linux distribution comes with SELinux enabled as a default.


As someone else pointed out, sshd is set to be eagerly linked which sidesteps the problem of subsequent dlopen calls. But even subsequent dlopen calls should work in my scheme if every library gets its own protected GOT - a dlopen would create a new GOT for that library, populate it & seal it. It wouldn't have access to modify any other GOT because those would be sealed & subsequent dlopens of other libraries would use new GOTs.


> the tables are marked as read-only once dynamic linking completes

Alas, this won't work. Dynamic linking is lazy; there's no moment when it's "complete". The correct function pointers get loaded and inserted into the table (in place of stubs) when called the first time, which can be arbitrarily far into the future. In fact in most large library ecosystems (gtk apps, etc...) most of the linked functions are never called at all.


Best practice is to resolve them early now: https://www.redhat.com/en/blog/hardening-elf-binaries-using-...


In this case it takes advantage of the fact that sshd is compiled with early binding, not lazy. The ifunc resolver function is called early, right after the dynamic libraries are loaded. Eager binding (that's what LD_BIND_NOW=1 / -Wl,-z,now do) is a security feature: the GOT will be read-only early. Didn't help with security in this case lol


just to add a detail:

With lazy binding the function resolver might not even be called at all. The link tree is like sshd.elf -> systemd.so -> lzma.so. If systemd uses an ifunc symbol in lzma but sshd is not using that symbol: with lazy binding the resolver will never run, with eager binding the resolver will run. Something the backdoor also took advantage of.


If you build for the host architecture you can completely disable ifunc without losing anything. In Gentoo it is common to build with -march=native, and disabling ifunc is as simple as setting -multiarch in glibc's USE flags. I've seen no negative impact from it.


Goodie! More, and less-controlled, build environments to compromise. What could possibly go wrong?


Are there any programming languages that can sandbox library imports?


Yes/no. The Java security manager can, but it was deprecated ;-(

I know it was a pain to use correctly. But still.


Not really, except for a few researchy object capability based systems. (This is kind of the promise of ocap systems: functions you call can only use what they're given and nothing more.)

If you don't trust a library, you can run it in a separate sandboxed process (or app domain, or wasm container, or whatever your language provides).


And yet all of that is trivially defeated by writing to read-only memory using /proc/self/mem.

Even a Rust program with no unsafe code can do this.

Patching readonly libc code from Python without external modules is also trivial, resulting in arbitrary execution!
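
For the curious, here's a minimal C sketch of the /proc/self/mem trick on Linux (not anyone's actual exploit code; it writes back exactly the bytes it just read, so it changes nothing):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        unsigned char buf[16];
        off_t addr = (off_t)(uintptr_t)&main;  /* some code we know is mapped r-x */

        int fd = open("/proc/self/mem", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        if (pread(fd, buf, sizeof buf, addr) != (ssize_t)sizeof buf) { perror("pread"); return 1; }
        /* This write succeeds even though the page is mapped read-only + executable:
           writes through /proc/<pid>/mem bypass page protections. */
        if (pwrite(fd, buf, sizeof buf, addr) != (ssize_t)sizeof buf) { perror("pwrite"); return 1; }

        printf("rewrote %zu bytes of our own code without mprotect()\n", sizeof buf);
        close(fd);
        return 0;
    }

Which is the point: once untrusted code is running inside your process, page protections are not a meaningful barrier.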


As far as the first 3 stages are concerned, the article adds little to what has been known for 2 weeks. Just a nice write-up with flow charts.

But the analysis of the binary in that detail seems new to me.

So how was the source code shown there produced? Running a disassembler, understanding what the code does and renaming everything to descriptive names? That looks like quite an achievement for 2 weeks.


>Authors GReAT

>Global Research & Analysis Team, Kaspersky Lab

https://securelist.com/author/great/

The article is apparently authored by the malware analysis team at Kaspersky Lab, so they are probably quite good at reverse engineering binaries.


Sure. But that does not answer any questions about their tooling or methods. Considered a business secret? They present the readable C code of the binary payload like it appeared just out of the blue.


They're using IDA Pro, with names that they presumably came up with themselves by analyzing what the code does.


The tool used in those white screenshots is called IDA Pro, a decompiler.

https://hex-rays.com/ida-pro/


What I am really curious about is what caused the initial delay in ssh which was what triggered the research into xz in the first place. Has someone figured that out?


It's doing an extra ECC operation on every connection, but that shouldn't take 500ms on a modern CPU.

The people who have reverse engineered the code say it also requires the command message to be bound to the ssh host key as well, so if the host key is an RSA key then it might be doing an extra RSA decryption operation on every connection as well?

That would probably do it.


Another interesting thing regarding the ECC is they use Ed448, compared to something conventional like ECDSA with P-256 or Ed25519, which is way slower (30x-ish slower verification):

(OpenSSL benchmark numbers)

                                sign/s  verify/s
     256 bits ecdsa (nistp256)  34642.6  11700.3
     456 bits EdDSA (Ed448)      3209.5    409.5

There is basically no incentive to use Ed448 unless you think ECDSA with 256-bit curves is insecure or will become so in the near future.


At >400 operations per second, that doesn't explain most of the 2/second (500ms) operations the discoverer apparently observed. Were they running on an old Pi and you on a currently high-end CPU or so? Basically, what hardware is this benchmark on?


I mean it highly depends on the CPU so I only posted it to show the relative slowdown compared to ECDSA. I ran this on my free tier Google Cloud server so it is not some super CPU.

However, yes, even on this not-so-powerful CPU it doesn't take 500ms, so I don't think it explains it.


Thanks! That indeed sounds like it rules this out as the reason why it was found.

Curious that asymmetric crypto isn't even the slow part here. Feels like they just messed up somewhere (but I don't have the low-level asm/C skills to check that without considerable time investment)


I believe it's all the ELF parsing and disassembly in memory that happens on startup.

They really went crazy with the home-made x86 disassembler and everything they do with it. Must have been super fun to write though! Someone clearly had a great time coming up with clever ideas for a backdoor.


That only happens once at startup though?

IIRC the original detection was because sshd was using a lot more CPU coping with the torrent of spam connections any sshd on the internet gets than it usually would.


sshd forks a new clean process for each connection, so this whole machinery happens each time you connect. Each connection is a full fork/exec, and the backdoor has to set itself up from scratch again, including all the parsing and hooking.


I have looked in the sshd code at https://github.com/openssh/openssh-portable and I cannot find it forking and re-execing _itself_. It forks and execs other commands, of course, and it forks to handle new connections but does not re-exec in those paths that I can see.

If some inetd-like program were listening on port 22 and fork+exec'ing sshd to handle each incoming connection, that would explain it. But on my systemd-based Linux system I see a long-running sshd that appears to be taking care of port 22.

I do agree that it seems like the best explanation of the delay is that somehow sshd was being exec'ed per connection, but I haven't seen all the dots connected yet.


This is behavior introduced in OpenSSH 3.9 (grep for "re-exec" in the release notes https://www.openssh.com/txt/release-3.9)

You should also be able to see the error message "sshd re-exec requires execution with an absolute path" in the source. If you follow the `rexec_flag` tested above this message, you can see where it calls execv, later in the code.
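
To make the shape of that concrete, here's a heavily simplified sketch of the fork + re-exec pattern (not OpenSSH's actual code: the port, the flag name and the use of /proc/self/exe instead of an absolute path are all invented, and error handling is omitted):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        if (argc > 1 && strcmp(argv[1], "-R") == 0) {
            /* Child after re-exec: a freshly exec'ed process image handles one
               connection, so dynamic linking (and any backdoor setup hiding in
               it) is paid again for every single connection. */
            printf("handling one connection in pid %d\n", getpid());
            return 0;
        }

        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in sa = { .sin_family = AF_INET, .sin_port = htons(2222) };
        bind(lfd, (struct sockaddr *)&sa, sizeof sa);
        listen(lfd, 16);

        for (;;) {
            int cfd = accept(lfd, NULL, NULL);
            if (fork() == 0) {
                dup2(cfd, STDIN_FILENO);
                execv("/proc/self/exe", (char *[]){ argv[0], "-R", NULL });
                _exit(1);
            }
            close(cfd);
            while (waitpid(-1, NULL, WNOHANG) > 0) {}  /* reap finished children */
        }
    }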


I see it. Thanks very much! That could certainly be clearer...


Not ECC - the article mentioned all the effort put into hiding suspicious strings. All those string-obfuscating lookups took lots of extra cycles.


Somebody suggested: On purpose.

An easy way to find out from the outside whether a server is compromised, without trying to inject code first.


The authors show a great deal of knowledge of glibc internals, the sort of things you really have to be neck deep in the source code for. Lots of novel techniques in there.

The custom ELF parser and disassembler are so complex that I can't imagine they haven't used that code, or won't use it, for something else in the future.

I wonder if this gets the serious investigation it deserves, but I doubt it.


Has anyone done an analysis of the bug in the backdoor that caused valgrind errors and slowness in SSH which eventually exposed it?


Invalid memory write apparently: https://www.mail-archive.com/valgrind-users@lists.sourceforg... The valgrind "fix" was to disable ifunc which disabled the backdoor so the errors went away.

The slowness derived from all the symbol and instruction lookups it did afaik


Oxide and Friends interviewed Andres Freund for their show on Wednesday where he discussed the discovery, including the slowness from all the symbol translations

https://open.spotify.com/episode/0C04JlCpMFe07C1XCswbZ9?si=B...


Not everyone has Spotify; better to just link their own site, where you can play it directly: https://oxide.computer/podcasts/oxide-and-friends/1843393


Handier for sure, but not on my clipboard


Here’s a twist - with the amount of effort the attacker went through to evade detection in their scripts and code, this whole project could function as a diversion, or at least a fallback to one, while multiple efforts go on simultaneously.

How do you get one step ahead of that? Does the community focusing on SSHD impact some other part of the system as a whole? Some other technological aspect? Social?

Tin foil hats are fun!


I'm not optimistic. Nobody ever even audits the binary contents of flatpaks on flathub (were they actually built from the source? the author attests so!). You don't really even need a fancy delivery mechanism like this backdoor had.


> Nobody ever even audits the binary contents of flatpaks on flathub (were they actually built from the source? the author attests so!).

IME/IIRC There aren't (or shouldn't be) any binary contents on Flathub that are submitted by the author, at least for projects with source available? You're supposed to submit a short, plain-text recipe instead, which then gets automatically built from source outside the control of the author.

> The Flathub service then uses the manifest from your repository to continuously build and distribute your application on every commit.

https://docs.flathub.org/docs/for-app-authors/submission/#ho...

Usually the recipes should just list the appropriate Git-based URLs to get the source code, or, for proprietary applications, the official .DEBs. Kinda like AUR, but JSON/YAML. Easy to audit if you want:

https://github.com/orgs/flathub/repositories

Plus, if properly configured, Flatpak is supposed to securely sandbox every application anyway, although other systems it needs to be compatible with (e.g. X11) currently prevent that.


I agree there shouldn't be, and it would be fairly easy to check in the json file that it actually does what it "should" do in the build and nothing else, but have you ever actually checked? The only time I've ever even looked was when I wanted to change a build option for some reason, and that's not often.

And then the frequent use of git submodules makes it even worse: that's often a whole lot of code that even the author of the flatpak doesn't have control over.


Hm. You're right. I haven't. And I guess the entire problem is that we shouldn't just assume somebody else has.

The JSON manifest is a much smaller attack surface than uploading random binaries would be though. And the standardized build procedure should make it relatively easy to tell if something's out of the ordinary and should be raising eyebrows, or even automate much of it.

Maybe stick an `alias CheckFlatHub=` for an LLM prompt, or just some plain regexes, in `.bashrc`? Looking for fishy URLs and install commands sounds like a relatively simple problem, as far as security challenges go.


It seems like 75% of people are comfortable installing latest & greatest developer tools with `curl > bash`, so it's no wonder that flatpak and similar have momentum. Really, we desperately need a better sandboxing UX in linux.


"Sandboxing" is an understandable kneejerk reaction to the threat of untrustworthy software, but I think it's the wrong one - the classic "technical solution to a political problem". Software must compose. After all, how would sandboxing have helped here? You cannot sandbox sshd!

In fact we already have a very good solution, and it's simply the infrastructure of a typical Linux distro (replicated by package managers on other systems, such as F-Droid). It's quite an impressive feat really - a diaspora of independently trusted organizations, each with their own governance structures and PGP-enforced webs-of-trust, crafting public repositories of known-good software built from publicly published source code using publicly published build scripts, delivered to the user over strong encryption. On the whole this does an incredible job of keeping malicious or subversive software out, and we've all got so comfortable with just being able to "apt-get install <foo>" without worrying about malware that we've forgotten what a dangerous business running software from the internet used to be.

Which is why this xz attack is so unusual and surprising - it's like the police stopping an attempted murder in the nick of time in a sleepy crime-free town. As shocking as it is, we must ignore the cries of "something must be done" and resist the temptation to put up barbed wire fences all over our nice town.


I very much like distros, but unfortunately they are by nature slow to adopt new versions.

For example, Ubuntu 24.04 will ship without Deno. It will also ship with Cargo 1.75, which is too old to build Deno (there's some weird packaging issue). So, anybody who wants Deno on Ubuntu has to install it using either `curl | bash`, nix (which of course seems promising), or asdf (which I know nothing about). Most devs would choose curl|bash.

Why is Deno not packaged for Ubuntu/Debian? Because it's "hard", and it's hard in different ways for different distros (the debian ticket for this has sat without action for 2-3 years, and I think the Deno ticket a similar amount of time). Cross-distro solutions like flatpak (or nix) are 100% necessary if we are to get rid of curl|bash. But, the sandboxing story with flatpak/snap is currently pretty bad; apps just fail in mysterious ways if they try to access resources they're not allowed access to. Nobody wants Windows Vista, but it has been done reasonably well in various other OSes.


> I very much like distros, but unfortunately they are by nature slow to adopt new versions.

That's a feature. I don't want anything bleeding edge. Give me older code that's been battle-tested to hell and back. Nothing new.


Sandboxing always struck me as missing the main point, since in nearly every case the only actually valuable data is the data the application is supposed to handle. Everything else I can basically reimage for free at any point.


I think Flatpak being an open source effort for a distro-independent distribution channel is already pretty dope UX. Having this verification would sure be nice, but verified publishers now being a thing is also an advantage over some other formats being used.


That is better to me than installing a random binary. You can trivially audit the curl. Also, the curl > bash thing usually comes from the project maintainer (as in the case of Bun) vs some rando flatpacker.


Not if the bash part then turns around and installs a random binary, which is often the case. How do you know how thoroughly it's been audited by third parties? Does it have a security patch mechanism built into it?


In all fairness, Jia was indeed the co-maintainer of the project. He modified SECURITY.md as well[1].

[1] https://github.com/tukaani-project/xz/commit/780d2c236de0e47...


No, you cannot audit the curl at all. What a website serves you now is not necessarily what the website serves you one minute from now.


You could write all code yourself for absolutely everything with your own dedicated crew of vetted developers.

Or buy closed source from someone like Microsoft, hoping that they have the resources and actually use the resources to scrutinise code more vigorously.

And there is always the approach of having a great secops team to detect strange network activity and attempts to escalate.


My idea of getting one step ahead of similar backdoors is to wrap sshd traffic into a spiped tunnel. Spiped is compiled from source and linked statically (last stable version from 2021).


Jeez, why not just compile ssh statically?


I still want sshd updated by my OS package manager. Statically linked spiped, which is out of scope of my OS package manager, is just a second line of defense.


spiped is solid. Wireguard is another good option that is included in the Linux kernel.


My tinfoil hat idea is for people to be suspicious of open source contributors. Not based on their country, or even on the fact that no one has seemingly ever met them.

But of contributors who have no history and no footprint other than the project they're working on. That should be a red flag in the future.


We don't need any tinfoil hats. FOSS projects were being backdoored even before they became a thing; the difference is that this time it's state-sponsored actors doing it.

The old groups GOBBLES, ADM, ac1db1tch3z, ~el8 were doing it; private "security researchers" like isec.pl were doing it.

This time it's a problem because state actors are abusing the corporate capitalism that created this era of underpaid people working on foundational projects. The bad actors have unlimited resources for their objectives.

That's basically what created the demand for, and inception of, groups like NSO, Zerodium, etc.

Basically, before that, exploits and backdoors were worthless and hackers hoped to be sponsored or hired by companies like Qualys.


How did "corporate capitalism" create this situation? Some guy decided to volunteer his time to write some code. If anything, it might be said that corporate capitalism paid him well enough so he had the spare time?

And I don't see what money has to do with any of this. There could be some well-paid programmer slipping backdoors into proprietary code just as easily. The salary or openness didn't affect the outcome. It's just as easy for salaried programmers to betray their bosses and their users.


[flagged]


Conversely, do you have evidence against any of it?


1. Have multiple LLMs scrutinize every pull request that is created in the public.

2. Stop accepting shit tools like M4 and C for creating any software.

You can never make the work of black hats impossible but you can trivially make it a whole lot harder.


I am quite happy with the quality of C. M4 not so much. In terms of supply chain attacks, cargo is the worst I have ever seen.


> 1. Have multiple LLMs scrutinize every pull request that is created in the public.

---

> You:

> Below is a hexdump of a test file added to our code repository. Is it any cause for concern?

    ```
    0000000 375   7   z   X   Z  \0  \0 004 346 326 264   F 002  \0   ! 001
    0000010  \b  \0  \0  \0 330 017   # 023 001  \0  \f   #   #   #   #   H
    0000020   e   l   l   o   #   #   #   #  \0  \0  \0  \0 022 210 337 004
    0000030   Y   r 201   B  \0 001   %  \r   q 031 304 266 037 266 363   }
    0000040 001  \0  \0  \0  \0 004   Y   Z 375   7   z   X   Z  \0  \0 004
    0000050 346 326 264   F 002  \0   ! 001  \b  \0  \0  \0 330 017   # 023
    0000060 340 005 026 001   _   ]  \0 005     245   -   U 273 230 004   |
    0000070 306 231 274 240   f   M 315   1 255  \v   )     270   (  \f   .
    0000080 274   3 263   ~   * 323 312   n 324   C 034   Q   Y 314 217 341
    0000090 350 343 357   @ 227 271 247   w 026 255 237   U   *   H 367 240
    00000a0   K 260   p   9   5 370   j 017 203   ? 307   ] 005 326 220 240
    00000b0 352   I 236   `   ? 304 306 365 246 375 226   [ 257   a 177   -
    00000c0 261 033 204 001 203 251 346 252 243 244   < 016   h 256 273 356
    00000d0 320 206   ! 355 367   P   ) 247 035 226 357 026 224 324   u   K
    00000e0 255   8 277 326 350 347 205  \n   : 220 334 342   u 216   M 357
    00000f0  \f   =   c 316 244   %  \0 233   E 354   x   " 376 250 266 247
    0000100   ^ 303   S 273 220 271   0   | 232 222 252 033 251  \r 372   g
    0000110   n 221 177   ) 270   l  \v   X 315   w   ,   (  \t 254   *   $
    0000120 023     031   !   | 003   J   0 262 002 030 230 321           4
    0000130 340 233 216 337 360 261   \   ] 216 016 020 206 001 357 346 332
    0000140 337   z 306 363 374   ] 354 323 335   [ 337 214   z 330 226 027
    0000150   j 255 037 260   ' 027 356 333   F   W   V 200 030   H   ; 337
    0000160  \n 245   q   7 372 360   I 263   ; 326 316 321 306 030   q   K
    0000170 337 264 030   1 345   z 032   @ 357 301   ] 302   U   < 037 262
    0000180 363 272 241 231 346   s  \0 337 216 235 375 236   [ 037 215 037
    0000190 274 341 033 200  \0  \0  \0  \0   u 216 336   U 336   r   1   u
    00001a0  \0 001 311 002 227  \n  \0  \0  \n 224   r 242 261 304   g 373
    00001b0 002  \0  \0  \0  \0 004   Y   Z 375   7   z   X   Z  \0  \0 004
    00001c0 346 326 264   F 002  \0   ! 001  \b  \0  \0  \0 330 017   # 023
    00001d0 001  \0  \r   #   #   #   #   W   o   r   l   d   #   #   #   #
    00001e0  \n  \0  \0  \0   x 360  \v   ) 314   g 337 330  \0 001   & 016
    00001f0  \b 033 340 004 037 266 363   } 001  \0  \0  \0  \0 004   Y   Z
    0000200
    ```
> Copilot:

> The hexdump you've provided appears to be a binary file. It's difficult to determine if it's a cause for concern just by looking at the hexdump. Here are a few things you could consider:

> 1. *File Origin*: If this file was added by a trusted team member and it's expected to be part of your code repository, it's probably safe. If the file's origin is unknown or it was added unexpectedly, you might want to investigate further.

> 2. *File Purpose*: Understanding what this file is used for can help determine if it's a cause for concern. If it's a binary used for testing or a necessary part of your application, it might be fine. If its purpose is unclear, it's worth checking.

> 3. *File Analysis*: You could use file analysis tools to examine the binary file. This could provide information about what the file does, which can help you decide if it's a cause for concern.

> 4. *Security Scans*: Consider running security scans on the file. There are many tools available that can scan binary files for known security vulnerabilities.

> 5. *Code Review*: If this file was added as part of a commit, make sure the commit was reviewed thoroughly. Unexpected binary files can sometimes be a sign of an insecure direct object reference (IDOR) vulnerability.

> Remember, when in doubt, it's always better to err on the side of caution. If you're unsure, consider reaching out to a security professional. They can help you analyze the file and determine if it poses any risk.

Lol.

---

> 2. Stop accepting shit tools like M4 and C for creating any software.

> You can never make the work of black hats impossible but you can trivially make it a whole lot harder.

What could be more trivially done than simply replacing C for all code?


I've seen some Google zero-day vulnerability hack analyses, and those looked out of this world, but this hack must be one of the all-time greats.


I saw the xz repo is back on GitHub; Lasse and a new contributor were cleaning it up. They removed ifunc support and committed code to generate the test files into the repo, so that they can have test files without blobs. It looks like they're doing good stuff.


> a new contributor

Jia Tan, is that you?



oh no, gentoo has been compromised


I know part of the fun of OSS is that anyone can contribute behind an anonymous GitHub username, but is this a wake-up call for package owners?

Do we need at least some form of basic identity verification (even in the form of an employer link, LinkedIn, or similar)?


I've argued in a blog post [1] that we need to delineate between "open source developer" and "supplier". If we don't do that, calling thankless unpaid volunteers and hobbyists a "supply chain" is kind of insulting [2].

I don't believe that "identity verification" for F/OSS developers is a good idea. Suppliers? Sure. That can be a contract negotiation when you decide how much you pay for it.

Also, I don't think identity verification helps when your adversary is a nation state, which can just falsify government identification if it suits them.

[1] https://scottarc.blog/2024/04/04/open-source-supply-chains-a...

[2] https://crankysec.com/blog/supply/


Just because it can be beaten doesn't mean making it harder isn't useful. This person/team used a VPN. Masking your location is a big red flag for just dev work like this. These things could be exposed in the UI.


> Just because it can be beaten doesn't mean making it harder isn't useful.

Fair.

> This person/team used a VPN. Masking your location is a big red flag for just dev work like this. These things could be exposed in the UI.

I disagree strongly, and am surprised to hear this argument on Hacker News of all places.


People are so used to seeing artificial bureaucratic structures as more real than their real counterparts that they constantly invent such naive solutions. "Just make the gub'ment provide an official paper (with a stamp) that Joe Random Dude is a real developer, a father of two, not a fan of satanic metal music, and the project will be safe".


People already do this in a general sense for authenticating a person and deciding if they’re trustworthy.


Not “trustworthy”, but “able to play their role in certain scenes”. When you're outside of that structure, those decorated clothes lose any meaning.


The VPN is just part of the picture (sock puppet accounts complaining about speed of dev, no meaningful history of other contributions from the dev, no trusted "personal network" for the dev, etc) that in hindsight should have raised red flags.


If they are constantly on a VPN and unwilling to disclose a real location or IP, then I fail to see why they should be trusted when they don't provide anything trustworthy themselves.


new project idea: OpenBackgroundCheck

volunteer osint researchers attempt to dox any identity you submit, so you can know whether they're the Right Kind of OSS contributor or not.

/s


Most people you interact with electronically, you don't even bother trying to see if they're using a VPN or make any attempt to geolocate them.


Sure but GitHub could have that as a badge to provide useful info that can help with vetting someone who wants to be a maintainer.


Not every dev that uses a VPN or something like Tor is doing so for malicious reasons. Some people face challenges regarding privacy.


No. I love contributing as my fursona sometimes. I don't want to dox myself to do so.

We shouldn't need to know who you are to verify that your contribution is trustworthy. Code should stand on its own merits. Ideas like this tend to dovetail into "real name policies", which suck https://www.eff.org/deeplinks/2014/09/facebooks-real-name-po...


Identity is a red herring. Backdoors are intentional vulnerabilities and are discoverable through vulnerability analysis.

You can't solve identity, and real, verified people can still betray you. Even trustworthy people can betray you, either unwillingly or unwittingly.

You can't solve vulnerability analysis either (it's the halting problem), but you also can't shirk it. You might as well go all in on it rather than undermining the basis of open source collaboration and many eyes making bugs shallow.

But it certainly is a wake up call of some kind.


> Do we need at least some form of basic identity verification

For dedicated attackers (as was the case here) that will move the goal posts ever so slightly farther away, but not much else. I see how it's tempting to focus on identity, but I don't see how it's productive.


> Do we need at least some form of basic identity verification (even in the form of an employer link, LinkedIn, or similar)?

Not all free software contributors are employed. They might be too young (it's not unusual to start contributing to free software while still in school), too old and already retired, be part of a family where another family member is the working one, do freelance or informal work, or be searching for work at the moment.

And even those who are employed do not necessarily have it visible on their employer's website; for instance, I don't think my current employer has any publicly available page listing all employees, and I never had (and don't have any desire for) a LinkedIn account.


For a state level attack, that wouldn't even help. They'd just use state level forged documents.


State actors (probably the case here) can handle all that stuff.


If an identity (known or pseudonymous) was signed by a government, there would at least be some accountability. Like if Jia Tan's identity is signed by government X, then there is a good hint that they are a state actor for X or otherwise there is a burden for X to come up with convincing evidence that this is not the case.

Obviously, every system is going to have weaknesses, but it would at least introduce some accountability.

The main issues are all related to privacy. What if your government is not nice and you don't want them to track all your contributions?


Let's say we had definite proof that this was a Chinese op. Then what? No other government is gonna complain too hard, because they know that their own agencies are doing the very same thing too.

Additionally, creating fake identities in other states is base level spying.


> No other government is gonna complain

Maybe not. But to the open source community it does provide more information. Right now it's completely unclear who compromised xz; if the account had had a government-level signature of some sort, more would be known. Also, the open source community could have different levels of trust in authors with different signing authorities [1].

> Additionally, creating fake identities in other states is base level spying.

We are not talking about a fake identity here (as in a fake passport or whatever), but getting a signature from a device that is presumably in a facility that is disconnected from the network. The user's identities would also live on a hardware device (e.g. chip embedded in a passport/id).

You could steal the hardware device to steal someone's developer identity. But the identity can be revoked once the device is reported lost or stolen. So, it is hard to play the long game with a stolen identity, outside blackmailing or otherwise compelling a developer to make their identity available (but even in that case it'd give some leads in understanding who the actor is).

Apple Developer signing keys are similar. You can verify where an app comes from. If a developer goes rogue or their key is compromised, Apple can revoke it. Apple could do evil stuff by creating sock puppet accounts, but that would devalue the trust in their signing key.

---

[1] I should note that I am not in favor of this. It's a privacy nightmare and it would give governments a lot of control over developers. I just think the argument upthread that a state actor could fake anything is not true. If e.g. the US had such an identity signing system, a foreign power could not forge the identities as long as the US private key is not compromised.


https://www.linkedin.com/pub/dir/Jia/Tan

Take your pick! 1600+ to choose from.


Some projects have been asking for physical ID and a signed contract for decades.


> The only variable inside its own table is yolAbejyiejuvnup=Evjtgvsh5okmkAvj, which in this context serves as a kill switch.

This seems false? The trie also contained strings like `WAYLAND_DISPLAY=`, so IIUC the backdoor would remain dormant if someone tried to run sshd from something that looks like an interactive session.


I hope there aren’t any like this which went unnoticed :-(


I'm curious about the technical skills needed to pull this off. I'm not a maintainer and have never built a Debian package. It appears to me that it required a fairly deep knowledge of the package build pipeline to accomplish this. Is that well enough documented somewhere that someone with a technical background and desire to learn could pick it up? What's the possibility that this person (or team) has been an active participant in the Debian community? Likewise with the RH technology and team, if that differs.


The exploit obviously took someone with lots of black hat experience to write. The trie-based string comparison trick is particularly clever, though even that is fairly simple compared to the stuff that comes out of the likes of the NSO group (check out the Wikipedia article for "Weird Machines", that's the sort of stuff they write). The build process of the exploit is also somewhat interesting, but what kicks it off is just an extra m4 file that autogen picks up and dutifully runs with its out of the box settings. That extra file was just added by hand to the dist tarballs, since the server was also owned by the attacker. It doesn't appear in the repo anywhere. Five minutes of reading autoconf docs and you too can create an autoconf trojan with ease.

The various Jia Tan sockpuppets were actively pushing both Debian and Redhat to include the backdoored version of xz, so the answer to your last question is yes, they are in fact active participants in the community.


How can we know that all the other blobs in Linux are not spyware?

I think there is a small de-blobbed kernel, but it is unpopular.


Imagine all the back doors people haven’t discovered…


Especially in closed source software where nobody has an incentive to look, any performance hits will be ignored, code analysis tools can't be run (why bother - can't exactly sell it as a new feature)


Why is "hooking" allowed at all in non-debug code? The legit applications all involve tracing or profiling.


What does "non-debug" code mean in this context? Most distros compile everything with -g and then separate out the debug symbols into a different package, because you almost always want to be able to attach gdb to any process when something goes wrong (and no, recompiling at that point is a much worse idea, because it means installing compilers on your production system as well as not being able to debug what was actually running).

It doesn't matter anyway, because once you're running inside a process you can do most anything you like. This was a convenient mechanism, but there are scores of ways to scan over the symbols and modify code inside the same process.


"You almost always want to be able to attach gdb to any process when something goes wrong".

Now, this is why we can't have secure systems.

Somehow, I've never needed to attach GDB to a running process. Start up programs under GDB, yes. Read crash dumps, yes. If you're using a debugger in a production system, something has gone horribly wrong.


> If you're using a debugger in a production system, something has gone horribly wrong.

That's right, hence the debugger.


Or you want to capture accurate core dumps for a tool like ABRT. We (Red Hat) work very closely with customers to find out what is and isn't acceptable to be running on their production systems, including customers who store very sensitive data.


How do you debug rare hangs?


Hooking of what?


Live-patching code so that a call to something becomes a call to something else.


How are you going to prevent this in general? At some point you’re going to make an indirect call.


It's a helpful collection of information on this topic. But Kaspersky... really? No one with any sense has that voluntary backdoor in their network.


Great read - the lengths the attacker went to, to not only hide it but also build the trust to be able to pull this off, are incredible. So calculated. One can only imagine they felt a great deal of panic and despair as the security community lit up the issue, regardless of a near-zero chance of repercussions. Very clever - those involved in catching this are, quietly, heroes.


The use of trie structure to compare strings without hardcoding raw, searchable string values is very inspired.

But yes, some black hat had a really, really bad day: being so close to owning any ssh-enabled, recently-updated box, and just when they could finally see the finish line, it was poof... gone in a day. It was two years of effort from at least one guy, and more likely a team.
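
As a toy illustration of the idea (nothing like the backdoor's actual tables): store the strings as a transition table, so the bytes are scattered through data and never appear as contiguous, grep-able literals:

    #include <stdio.h>

    /* Tiny trie over the words "ssh" and "scp": node 0 is the root, nodes 4
       and 5 are accepting states. The individual bytes still exist, but
       scattered in a table rather than as string literals. */
    struct edge { unsigned char from, ch, to; };

    static const struct edge edges[] = {
        {0, 's', 1},
        {1, 's', 2}, {1, 'c', 3},
        {2, 'h', 4},                 /* node 4 = matched "ssh" */
        {3, 'p', 5},                 /* node 5 = matched "scp" */
    };

    static int step(int node, unsigned char c) {
        for (size_t i = 0; i < sizeof edges / sizeof edges[0]; i++)
            if (edges[i].from == node && edges[i].ch == c)
                return edges[i].to;
        return -1;                   /* no transition: dead end */
    }

    static int lookup(const char *s) {
        int node = 0;
        for (; *s && node >= 0; s++)
            node = step(node, (unsigned char)*s);
        return node;
    }

    int main(void) {
        /* The demo uses literals here, but the matcher itself embeds none. */
        printf("%d %d %d\n", lookup("ssh"), lookup("scp"), lookup("sshd"));  /* 4 5 -1 */
        return 0;
    }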


> The use of trie structure to compare strings without hardcoding raw, searchable string values is very inspired.

Aside, this is one of my grumpy-old-dude opinions when it comes to regular non-malicious code: Source "greppability" is a virtue, one that must be balanced against "don't repeat yourself" and "compile time checking", etc.

Some examples off the top of my head:

1. One import/alias per line, no wildcards.

2. If you have to concatenate a literal string across multiple lines, try to ensure the break isn't inside an interesting substring someone would search for.

3. If you write a compile-time constant like foo=4*6*9, the comments around it should also contain the result. Sure, someone might screw up and forget to keep the comment in-sync, but it can also be invaluable when someone else is tearing their hair out trying to find out why the production logs are saying nothing but "Illegal frump: 216".

Don't get me wrong, I love a good IDE, and many will offer "Find Usages" or "Find Literal"... but it doesn't always work all the time, for all languages in the project at once, for everyone on a team who might use other tools, or during a PR review through a web-browser, etc.
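
A tiny illustration of points 2 and 3 above, with made-up names just to keep it concrete:

    # Point 2: don't split the string inside the part someone will grep for.
    MSG = ("Illegal frump: "            # the searchable prefix stays intact
           "try reducing the frump count")

    # Point 3: 4 * 6 * 9 = 216 -- the value that actually shows up in the logs.
    MAX_FRUMP = 4 * 6 * 9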


Along the lines of source greppability, at some point someone will try to replace successive a.foo_x, a.foo_y, and a.foo_z accesses with a loop over foo_{suffix}, thus breaking any attempt to grep for foo_y. That someone may be me! This should almost always be struck down in code review wherever possible. Being a bit more DRY isn’t worth it!


> If you have to concatenate a literal string across multiple lines, try to ensure the break isn't inside an interesting substring someone would search for.

This is called semantic line breaks, and it also helps keep (admittedly more naive) diffs sensible.

https://sembr.org/


4. Avoid complex regexes. E.g. prefer '>>>>>' over '>{5}'


On the other hand, the shorter version communicates "five" without a human needing to count them up. Either way, some grep-ability would come from hits on samples within unit tests.


I think it goes beyond just this particular attempt. I bet they were looking for other targets, targets that are widely-used but maintained by resource-deficient projects. They wanted to do maximum damage with the fewest possible supply chain attacks.

The fact that many projects are now searching deep and wide through their commit histories, looking at just who they're taking code from, beginning to develop frameworks for attack mitigation and remediation... an entire type of previously very-promising attack is completely burned. This has been a massive defeat. And it happened all by chance.


>an entire type of previously very-promising attack is completely burned.

I fear it's not just the attack that is burned. If new contributors have to be distrusted and/or go through some sort of vetting that isn't based on the merit of their contributions, that is a terrible blow to the entire open source movement.

The threshold for young coders without a great deal of history and without a network of contacts to become contributors to important open source projects has just gone up massively.


Nah. Hardly going to change anything for the worse, only for the better.

The bar is still trivial. The worst part is that it will be difficult to be anonymous. There are a lot of valid, even life & death reasons for anonymity, and that will now be much harder.

So the loss is all the contributors who don't dare let their government or employer see them helping the wrong projects, or in some cases don't want to disclose that they are any good at coding or particularly interested in security etc.


I bet a bunch of employers don’t want any unauthorized contributions to open source. For governments it seems much more niche: only a few specific projects would raise red flags.


The average business still isn’t going to have any clue who GitHub user coderdude57 is in real life. But coderdude57 may be forced to hop on a video meeting to prove who he is to a project lead before his code is accepted.


“GitHub Verified”.


The main vector of attack was patching the code as part of running tests.

I would expect to see more projects try to protect against this in general by attempting to separate the building and testing procedures.


> The main vector of attack was patching the code as part of running tests.

No, it was not.

The test files were used as carriers for the bulk of the malicious code, but running the tests did nothing malicious; the supposedly corrupted test file did act as a corrupted file in the test. What extracted and executed the code hidden in these test files was a bit of code injected in the configuration steps, which in turn modified the compilation steps to extract and execute more code hidden in these test files, which in turn extracted and injected the backdoor code. This all happened even if the tests were never run.

> I would expect to see more projects try to protect against this in general by attempting to separate the building and testing procedures.

The suggestions I've seen on this line of thinking were to not only separate the building and testing steps (so that the testing steps cannot affect the output of the build step), but also to remove all test files before doing the build, and adding them back for the testing step. The important part being to remove all binary test files before doing the build, and having two separate steps only as a side effect of not being able to test without putting the test files back.

I don't think this approach is going to be adopted, as it's too much work for little gain; people are focusing more on making sure all files in the tarball used for the build either come from the corresponding tag in the repository, or can be proven to have been deterministically generated from files found in that tag. That would have either caught the key file added to the tarball which started the extraction and injection steps, or excluded it from the build entirely.
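
A rough sketch of that idea (a hypothetical script, not any project's actual tooling): hash every file in the release tarball and compare it against a fresh checkout of the corresponding tag, flagging anything that exists only in the tarball or differs from the tag. Legitimately generated files (configure, gnulib m4 macros and the like) will show up as tarball-only, and that is reportedly exactly the pile the malicious build-to-host.m4 hid in, so those would still need to be regenerated and compared rather than waved through.

    # Hypothetical sketch: compare a release tarball against a git tag checkout.
    import hashlib, os, sys, tarfile

    def sha256(data):
        return hashlib.sha256(data).hexdigest()

    def tarball_hashes(path):
        hashes = {}
        with tarfile.open(path) as tar:          # handles .tar.gz / .tar.xz transparently
            for member in tar:
                if member.isfile():
                    name = member.name.split("/", 1)[-1]   # drop the "project-x.y.z/" prefix
                    hashes[name] = sha256(tar.extractfile(member).read())
        return hashes

    def checkout_hashes(root):
        hashes = {}
        for dirpath, _, files in os.walk(root):
            if ".git" in dirpath.split(os.sep):
                continue
            for f in files:
                full = os.path.join(dirpath, f)
                with open(full, "rb") as fh:
                    hashes[os.path.relpath(full, root)] = sha256(fh.read())
        return hashes

    if __name__ == "__main__":
        tar_h = tarball_hashes(sys.argv[1])      # the release tarball
        git_h = checkout_hashes(sys.argv[2])     # a fresh checkout of the tag
        for name in sorted(set(tar_h) - set(git_h)):
            print("only in tarball:", name)      # generated files land here; review them
        for name in sorted(set(tar_h) & set(git_h)):
            if tar_h[name] != git_h[name]:
                print("differs from tag:", name)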


It's injected as part of configuring the build of the tests.

Just make them two separate projects, where the test project depends on the installed binaries.


How much effort was it really? Yes, they worked on it over 2 years, but I guess it wasn't more effort than a few hours every other week (apart from engineering the actual exploit). After all putting in a full time effort as an unpaid contributor would be suspicious in itself.

Assuming they work 40 hours a week and are doing this in a team (presumably every major player has such a team or is scrambling to get one now), one must expect many potentially infiltrated projects out there.


“Apart from engineering” is doing some seriously heavy lifting here. Writing the actual code is likely just an afterthought by comparison. The engineering - identification of a target weakness, the design of the exploit chain, etc. - is overwhelmingly going to have been the lion’s share of the effort.


It was most likely done by a government. Not just one hacker.


Governments are made up of people. There is likely at least 1 real person (if not more) for whom this attack has been the entirety of their professional life for years.


If so, then they were paid for those hours and are perfectly whole right now. Maybe their career takes a dent but I do not weep for whoever this was no matter which government they worked for, unless it was so bad they were actually a slave and forced to do this against their will.

The government who paid them just lost the investment, but who cares about them? It's only a good thing if this does not get a reputation for being a good investment.

If it was a criminal, then the same as the government they lost the investment but again that's only a good thing.


> > and just when they could finally see the finish line it all went poof... gone in a day. It was two years of effort from at least one guy and more likely a team.

> If so, then they were paid for those hours and are perfectly whole right now.

People are not ideal money-fed automatons. Even if you're fully paid for your software development, it feels bad to see all your work thrown away.


If only I had also said something like "maybe their career takes a dent but I do not weep for anyone who worked on this" to indicate that I understood that obviosity.


> > People are not ideal money-fed automatons. Even if you're fully paid for your software development, it feels bad to see all your work thrown away.

> maybe their career takes a dent but I do not weep for anyone who worked on this

Even if they were fully paid, and even if their career is not affected at all (or even improves, "I wrote the xz backdoor" would be an impressive line on a CV if you disregard the moral aspects), it can still feel bad to see your work thrown away so close to the finish line. People are not career-driven automatons.

But I agree with you, I do not feel bad for the author of this backdoor; whoever did that does deserve to see this work unceremoniously dumped into the trash. But I can understand why the ones who worked on that code would feel bad about it.


> Maybe their career takes a dent

I don't see why; news coverage has pretty uniformly taken the view that "this guy was an evil genius, and we escaped his incredibly well-executed plot by the skin of our teeth".


Are you being argumentative for the sake of it?

The statement was that someone had a really bad day.

The implication that people are driven by money and nothing else, or even that they have no right to feel like they had a bad day if they were paid, is absurd.

Nobody is saying that you should be sympathetic. It’s just an interesting comment: An interesting thing to think about. A worthwhile contribution to the conversation. This was someone’s very bad ending to a very long project.


Ok, true. It was a bad day.


> It was two years of effort from at least one guy and more likely a team.

You think these guys have standups?


Gonna be one hell of a retro...


I wonder if they do blameless post-mortems..


With the likely culprit they won't get blamed; just a bullet in the back of the head and a bill for it sent to the family.


Blame it on outside party.


The project manager who said performance issues would be fixed in v2 is going to have a hard time.


Yes.


I agree, I had never heard of using a trie as a means of hiding strings.

I'm not familiar with any programming language that provides tries in a standardized way. They're not so hard to code, so I wonder if this will become a trend in future malware.
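
For the curious, the basic trick is easy to sketch (Python here purely for brevity; the real thing was C with the table baked into the binary): the interesting strings never appear as contiguous literals, only as a transition table, so running strings on the binary or grepping the source turns up nothing obvious.

    # Minimal trie-based matcher: map each interesting string to a numeric tag.
    def build_trie(words):
        root = {}
        for tag, word in enumerate(words, start=1):
            node = root
            for ch in word:
                node = node.setdefault(ch, {})
            node["#"] = tag                  # '#' marks end-of-word
        return root

    def lookup(trie, s):
        node = trie
        for ch in s:
            if ch not in node:
                return 0
            node = node[ch]
        return node.get("#", 0)

    # In a real deployment only the prebuilt table would ship; the plaintext
    # list below exists only at "build time". RSA_public_decrypt is one of the
    # symbols the analyses say was matched this way; the other entry is made up.
    TRIE = build_trie(["RSA_public_decrypt", "totally_innocent_marker"])

    print(lookup(TRIE, "RSA_public_decrypt"))    # -> 1
    print(lookup(TRIE, "strcmp"))                # -> 0

The point is that the full literal only exists implicitly, scattered across table transitions, which is why simple signature scans miss it.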


Honestly? Good. Anyone doing this sort of work willingly deserves worse.


That they were harassing and manipulating a lone, unthanked maintainer who had already told them he was dealing with mental issues makes them evil, IMO.

The honorable thing for "Jia" to do after this epic failure is seppuku, or whatever his or her local equivalent is.


Nobody sees themselves as the bad guy, and that’s not the same as “some people are just fundamentally selfish”. There are definitely loads of people that’d feel like the end justifies the means. There are plenty of people for whom a workday involves doing far worse things for the world than cyberbullying one person, and who will look you in the eye and justify it. Plenty of that stuff is socially acceptable in many, many mainstream circles. “Being mean to a maintainer” is just one that this community is especially sensitive to, because it involves a highly personified victim that they can relate to.


These maintainers add vast amounts of value to the modern world, though most of the people that benefit indirectly from their work can't really conceive of what it is they do.

People like "Jia" are pure parasites. It's one of the best cases of "why we can't have nice things" I've ever seen.


Yeah, they add vast amounts of "value", including (accidentally) reinforcing the status-quo.

It would definitely be interesting to see what would have happened if the attack hadn't been noticed, but instead people focus their interest on attacking Jia Tan because "wow, that guy is one hell of an asshole, it sure is a great thing that he failed!".

Whether or not this attack was the rare one that failed out of many similar ones is largely irrelevant to people. Quick, discuss this one particular case where we noticed it, news flash and all.

> People like "Jia" are pure parasites

They are "parasites" because they don't do what they are "supposed to"? That's pretty crazy. I guess every person that's doing what matches their interests is somehow a bad person/parasite/etc. Or is that only if what they do is forbidden by some rule book? Do you see what I'm getting at here?

"pls sir dont create linux backdoar"


>I guess every person that's doing what matches their interests is somehow a bad person/parasite/etc

If they inflict harm on their fellow humans in order to do so, then literally yes.

There's a reason why a society will collapse if you have too many of this type of person.


HFT is also harmful, and so are the majority of startups that don't do anything actually useful and just take VC money. Those are just a few examples off the top of my head.

What I'm saying is that there are a lot more technically legal ways to profit that harm society, some of them more nefarious than what Jia Tan did.

Doing things that are bad for the society in a fucked up society seems justifiable. It doesn't necessarily make you a bad person.

People just have a more averse reaction to things that are obviously bad, even if in practice there are way worse things that initially seem innocuous and are actually legal to do. That's just the textbook example of hypocrisy.


The fundamental question remains: considering the in-depth knowledge of low-level programming (a small-scale disassembler), systems programming, and the practical application of data structures (i.e. using a trie for all string operations to avoid the use of string constants), was it a state-sponsored attack, or was it a one-man show done for personal gain? Attempting to backdoor nearly every single Linux instance of sshd goes beyond the definition of just being brazen.


> The fundamental question remains: considering the in-depth knowledge of low-level programming (a small-scale disassembler), systems programming, and the practical application of data structures (i.e. using a trie for all string operations to avoid the use of string constants), was it a state-sponsored attack

I think you're missing where the difficulty is. I'd argue the technical knowledge isn't the hard part here. You can find more than enough folks who would know how to do everything you listed outside of any state-sponsored context. The hard part is the methodical execution, including the opsec, backup plans, plausible deniability, false identities, knowing how to play the long game, being able to precisely target what you want, understanding human psychological tendencies, identifying unforeseen security vulnerabilities, etc... these are the things that I imagine are likely to distinguish talented individuals from state-sponsored actors, not something like "knows tries and x86 assembly".

And taking all those into account, this absolutely smells state-sponsored. I just see no way that a private entity had enough of an incentive, foresight, experience, and patience to play such a long game with such careful maneuvers.


I do not think I am missing anything: the supply chain attack, which included playing the long game and subverting the trust of the development community, is the real issue that the open source community has no defences against. The thwarted attack surpassed the scale of all previous supply chain attacks on the Node.js, Python and similar ecosystems, and went deep into the low-level technical layers as well.

The assault was comprehensive, holistic and systematic in its approach. This article does not mention it, but other reports have indicated that the person behind it also managed to compromise the PKI layer at the edge between OpenSSL and sshd, which brings an extra level of complexity to the backdoor.


A lot of the filter is just limited time and energy. A college kid has plenty of that, but not the right skills. There are more than a small number of people with the right skills, but they have day jobs. That's what makes me think whoever was working on this was being paid upfront to do so. That said, an obsessive with an easy day job and not a ton of family responsibilities is always a possibility.


> There are more than a small number of people with the right skills, but they have day jobs. That's what makes me think whoever was working on this was being paid upfront to do so.

Doesn't this apply to many OSS contributors? people with skills and free time?


I’d guess that the vast majority of new open source code last year (weighted by loc and usage) was written by people being paid to do so.


That nobody is talking about the person's IP addresses (the xz project was hosted on a personal server of the maintainer), or any details about their actions indicates to me it was a state actor and the original xz maintainer is cooperating with law enforcement to uncover their actions.


Based on working hours, an Israeli actor is likely.

https://rheaeve.substack.com/p/xz-backdoor-times-damned-time...

There are other countries in the same time zone, but only Israel has a history of using and selling hacks of this level.


Or someone further west who did this after work during the evenings.

Or someone further east with a classical hacker "let's get up late in the afternoon, and start the serious hacking late at night."


From that article, "To further investigate, we can try to see if he worked on weekends or weekdays: was this a hobbyist or was he paid to do this? The most common working days for Jia were Tue (86), Wed (85), Thu (89), and Fri (79)." That makes it more likely this work was done during working hours; someone doing things outside of work hours would be more likely to produce the same amount (or more) on weekends and holidays.


Yes, that would point towards regular office hours and not hacking hours.


Moscow is in the same time zone as Israel.


Whilst the ip addresses and email headers etc should be examined meticulously, in the distant hope that they lead somewhere, the chances are that they won't. Very basic opsec.


I thought it was on GitHub pages?


Before GitHub Pages, the project was hosted on the original maintainer's server.


CISA is investigating it.


I think it is a professional attempt. It does not have to be a government. There are other cyber criminals.


> It does not have to be a government. There are other cyber criminals.

Criminals with the resources to maintain such a complex operation for two years?


Hobbyists around the world maintain projects for years and decades.


Yes, crypto-boys for example. If the reward is 1B USD in crypto stolen from wallets, why not spend a year adding a backdoor?


Difficult but not impossible. Some of the malware and ransomware stuff is also created by experts.


Or they are totally nonplussed because they are not even a single individual but a state actor that has been doing this to 10-20 other projects for the past 5-10 years which haven't got the same attention. This is their job; being compromised is always a risk.


It doesn't matter if it's a team working for a state actor. You'd still be pissed your 2 year project failed so close to the finish line.


Not if this is one of a few dozen or few hundred similar ongoing operations. The risk is always there, they have to expect some amount of failure. Open source software is constantly being probed for vulnerabilities in every way possible, from random commits to high level members of committees for standards. Every operation is one in a grab bag of disposable and deniable efforts.

I also am a little biased and assume being burned in state-sponsored acts is similar to the no-blame culture of breaking infrastructure in tech :) because by all accounts this compromise was extremely well done, until it wasn't.

Also, we can't be sure the compromise wasn't intentionally telegraphed to cause some other action (using a different library) on purpose.


>Not if this is one of a few dozen or few hundred similar ongoing operations. The risk is always there, they have to expect some amount of failure.

That actually makes me think it's not happening at a larger scale, since we'd likely have heard of at least a few similarly elaborate cases being uncovered by now. If not during the attempt itself, then at least at some later point in time.

Either almost all of these operations remain undetected, because they are even more sophisticated and much of the world's software ecosystem has been secretly compromised for years, or there aren't actually that many such operations.


They might even get a kick out of people discussing how clever their hack was. A rare bit of public acknowledgement, in a way.


So what? The nature of the project matters. Seriously, F all of the people who worked on it, however many or few there were.

It's one thing to attack a target; it's quite another to try to give yourself a master key to every rpm or deb Linux box in the world.


Unrelated note: I had always thought "nonplussed" was basically a synonym for something like "bewildering confusion." But the way you used it in this context suggested the exact opposite. It turns out that "nonplussed" has also come to mean "unperturbed": https://en.wiktionary.org/wiki/nonplussed

Quite confusing, because the two different meanings are nearly opposite to one another.

See also: https://www.merriam-webster.com/grammar/nonplussed


> Quite confusing, because the two different meanings are nearly opposite to one another.

It's pretty easy to see where the innovative sense came from: "plussed" doesn't mean anything, but "non" is clearly negative. So when you encounter the word, you can tell that it describes (a) a reaction in which (b) something doesn't happen. So everyone independently guesses that it means failing to have much of a reaction, and when everyone thinks a word means something, then it does mean that thing.

You see the same thing happen with "inflammable", where everyone is aware that "in" means "not" and "flame" means "fire". (Except that in the original sense, in is an intensifying prefix rather than a negative prefix. This doesn't occur in many other English words, although "inflammation" and "inflamed" aren't rare. Maybe "infatuate".)


That's pretty much what my second link is about. :-)


Wow, now I need to go figure what source material I misinterpreted to thoughtlessly use this word incorrectly. Thanks!


Here’s what I don’t get: why the many layers of obfuscation in the build phase? (I understand why in the binary linked into ssh.)

Once the first stage, extracting a shell script from one of the "test" data blobs, had been found, it was clear to everybody that something fishy was going on.

It’s inconceivable that I would have found the first stage and just given up, but then it was "only" a matter of tedious shell reversing…

They could easily have done without the "striping" or the "awk RC4", but that must have complicated their internal testing and development quite a bit.


> It’s inconceivable that I would have found the first stage and just given up

But what you were looking at might not be the first stage.

You might be looking at the modified Makefile. You might be looking at the object files generated during the build. You might be looking at the build logs. You might be investigating a linking failure. The reason for so many layers of obfuscation is that the attacker had no idea at which layer the good guys would start looking; at each point, they tried to hide in the noise of the corresponding build system step.

In the end, this was caught not at the build steps, but at the runtime injection steps; in a bit of poetic justice, all this obfuscation work caused so much slowdown that the obfuscation itself made it more visible. As tvtropes would say, this was a "Revealing Cover-Up" (https://tvtropes.org/pmwiki/pmwiki.php/Main/RevealingCoverup) (warning: tvtropes can be addictive)


Reduces the attack area through which this could be found, I expect. Without all the obfuscation someone might spot suspicious data in the test data or at some other stage, but this basically forces them to find the single line of suspicious shell script and follow the trail to find the rest of the stuff added to the build process.


> Here’s what I don’t get: why the many layers of obfuscation in the build phase?

For a one-of-its-kind deployment it would probably not matter. However, deploying to multiple targets using the same basic approach would allow all of them to be found once one was discovered. With some mildly confusing but different scripting for each target, systematic detection of the others becomes more difficult.


Those who caught it were indeed very clever. But the attacker did make mistakes, particularly the Valgrind issues and large performance regression. That ultimately is what raised suspicions.


The large performance regression was a feature of the string obfuscation.


Interesting! So perhaps not really effective obfuscation after all!


"Large" is doing quite a bit of work here: the engineer who found it had disabled turbo boost and was, by at least a little serendipity, quite performance focused at the time rather than being busy on other elements of their release target.


A 0.5s additional delay added to SSH is HUGE. No way this would not have been noticed sooner or later.


This is why I think it all came falling down due to the teknoraver patch to libsystemd to replace lzma with libarchive.

Because anyone has the resources to test this locally and find the issues, but they just didn't have the time anymore.


I wonder then, whether the attacker has already started over with a new open source project, or found something else to do.

Or was xz only one of multiple targets where the overall plan didn't actually fail and an updated backdoor, without the performance issues, will soon be injected again through some other project.


Started over? They’ve certainly been cultivating other unrelated sock puppet accounts that they’ve switched over to.


I thought I read several years ago, like 10 or 15 even, that they could fingerprint people by communication patterns. Like how you can sometimes recognize an author just by the way they write, just by the text itself.

Surely that would be something the current ais could do 100x better?

So I wonder if there isn't a business there in scanning anonymous posts to associate them with others to deanonymize them all?

You could obviously use an ai to rewrite your text, but I bet there is something there that can still be correlated. Simply avoiding using favorite words and phrases doesn't change the essence of whatever you're saying.

I think the concept of sockpuppet is about to get complicated. Really it probably already is. If I can imagine something, someone else has probably already been doing it and countering it, and countering that for a decade.


That's a big part of how they caught the Unabomber. Unusual spellings of words and ways of saying things. His brother apparently recognised some of these tics.

https://theconversation.com/how-the-unabombers-unique-lingui....


You can see what projects they are following on GitHub (notably rust compiler, some embedded platforms, etc)


>One can only imagine they felt a great deal of panic and despair as the security community lit up the issue,

Jia Tan mook: "Sir, they found us out."

Jia Tan officer: "What's the sitrep, are they coming for us? Are they getting our operatives in the field?"

Jia Tan mook: "No sir, they're just reverting and blacklisting our accounts and releases. Muttering something about 'uptime'."

Jia Tan officer: "Shit, we're dealing with sysadmins."





If a government agency tasked with national security started caring about a critical home project by Nebraskaman only long after Nebraskaman collapsed, that country needs a new government agency.


I imagine that government security agencies are mainly thinking "shit, why didn't we think to do that?"


[flagged]


This one contained information I didn't encounter in the earlier ones linked from HN. The earlier ones mostly focused on the multi-step build and extraction process, this one has some detail on the inner workings of the library that gets extracted.


This is one of the most significant computer security events of all time. A genuine backdoor-- not an accidental vulnerability, a backdoor-- created by a very sophisticated, very-likely state-backed, actor who used a combination of social engineering and multiple clever points of indirection to get a vulnerability into a common library. There will be many more articles about it in the months and years to come, and I don't see what's wrong with that as long as they are providing new information.


Did you create a throwaway account specifically to write a snarky comment here? Looks like this practice is becoming more common.


It also seems to specifically violate HN guidelines:

Throwaway accounts are ok for sensitive information, but please don't create accounts routinely. HN is a community—users should have an identity that others can relate to.

<https://news.ycombinator.com/newsguidelines.html>

Snarky shallow dismissal ain't sensitive information. It's precisely the sort of behaviour which should be tied to, and reflect reputation of, a primary account.

(Which can of course be pseudonymous, example myself.)


Their account seems too old for that.


The analysis post was very good and did a great job of explaining how the attacker (hacker) obfuscated the backdoor.


Sorry if I trigger anyone's paranoia, but: 1. who told you that it's not a state-backed attack? 2. and if 1 is true, who told you it's not Russia behind it?

And if 2 is true, then the KGB-university alumnus Kaspersky is definitely related to it.


And it's relevant to the article because?


I suppose they say that because the article has been written by Kaspersky, and it's hosted on a server under their control?


Even if we go all the way down and say the backdoor was directly written by Kaspersky themselves, does it change the value of the content (which is purely technical analysis) in any way?


I suppose the parent comment isn't discussing the quality of the article but the theoretical danger of connecting to a server under Kaspersky's control.

https://en.wikipedia.org/wiki/Kaspersky_bans_and_allegations...


What proves to us that tukaani and jiaT75 aren't the same person? Creating a new nickname and handing write access to it sounds like the perfect way to shift blame and claim that his trust was betrayed once the backdoor was found out.


First please prove that you are not working with the people behind the compromise and are just trying to blame the original maintainer in order to misdirect. Oh you can't? Too bad, off to gitmo with you.


I was rather amused to see Chris Titus announce that "XZ Backdoor is NOT that bad!" He seems to have done next to no research into this https://www.youtube.com/watch?v=s_WkMtEPsuE


Skimming through it, it sounds like the video spends about 8 minutes saying that the vulnerability only affected test versions of Debian and Fedora.

Which is true, and the number of actual running systems that the backdoor already made its way into is likely not massive -- especially in terms of production or otherwise critical systems.

But that doesn't mean it wasn't bad. It's rather a disaster that was only avoided pretty much by accident.


Never heard of him before but after watching 10 seconds he seems like an idiot.


As an armchair specialist, I don't support the “owning each ssh server on the planet” reactions.

Public networks potentially have too many eyes (not just the admins responsible for specific systems). Traffic to an ssh server that corresponds to suspicious activity but is also absent from the logs would raise alarms quite soon.

Hooking the build process also means that the main targets are places that are important enough to have their own siloed software repositories. Getting into two LTS systems was a bonus. I suppose that simpler exploits are available for popular systems (if they run regular complex applications instead of being a minimal empty box that securely does nothing).

The number of modification/decryption layers is probably not arbitrary. Most likely, each was added to evade detection by one or another protection system you and I don't know about, but with which the attacker is familiar.


Can you point to a URL where a system has a documented or implemented configuration that correlates traffic to log entries in a manner that can be alarmed on without just going off all the time?

Junk packets to port 22 arrive all the time.


I'm not talking about random packets. There is global and local traffic collection (after all, each pimple on the face of humanity wants to have its own little NSA after a wonderful advertisement campaign, and lots of “engineers” are OK with getting paid for that). So when someone capable enough matches obvious abnormal ssh traffic to some intermediate victim with later unwanted behavior of that system, and ties it to the general activities of some known long-term actor, that system is going to be studied most thoroughly. Using this backdoor over the internet would mean risking getting busted after each single use. On the contrary, inside high-profile “secure” networks such a covert management channel would be worth the price even if it's single use (as a last-resort shutdown switch, for example).

I suppose “smart” readers can be as easily manipulated into looking in any wrong direction as “dumb” ones. All the stories mention what each decryption and indirection step does, but they don't explain why exactly those steps were chosen: that this one hides from intrusion detection system A, and that one prevents collection of runtime activity artifacts by tracing system B, and so on. On the one hand, it would be impolite to announce that you study, and know well, how to break your colleague's product; on the other hand, some of those systems might be secret if we are talking about the state level.


It's obviously not something you just blast into a botnet or something, that's too high-profile. But it would give you one hell of a tool for targeted attacks, and it would be very hard to detect: I don't think there's any automated system that would identify it. Probably the best chance would be on a system which was later identified as compromised where the operator is sufficiently paranoid to be logging all ssh traffic and doing some inspection of that afterwards (which would be very tedious). Otherwise, the main form of detection would be inspection of the binary itself, which is likely where the various layers come from. It seemed designed to be stealthy from inspection of the code to inspection of the resulting binary, but the fatal flaw there was that it didn't have debug info covering it and it lit up on a performance analysis.

All this adds up to a group which wanted an easy way into a relatively small but diverse set of high-value targets. It's likely they would have waited for the targeted distro releases to reach fairly widespread adoption before exploiting it, and they would have tried to keep it low-profile enough that it would not be discovered for some time.



