Very annoying - the apparent author of the backdoor was in communication with me over several weeks trying to get xz 5.6.x added to Fedora 40 & 41 because of its "great new features". We even worked with him to fix the valgrind issue (which it turns out now was caused by the backdoor he had added). We had to race last night to fix the problem after an inadvertent break of the embargo.
He has been part of the xz project for 2 years, adding all sorts of binary test files, and to be honest, with this level of sophistication I would be suspicious of even older versions of xz until proven otherwise.
EDIT: Lasse Collin's account @Larhzu has also been suspended.
EDIT: Github has disabled all Tukaani repositories, including downloads from the releases page.
--
EDIT: Just did a bit of poking. xz-embedded was touched by Jia as well and it appears to be used in the linux kernel. I did a quick look and it doesn't appear Jia touched anything of interest in there. I also checked the previous mirror at the tukaani project website, and nothing was out of place other than lagging a few commits behind:
EDIT: Just sent an email to the last committer to bring it to their attention.
EDIT: It's been removed.
--
jiatan's Libera info indicates they registered on Dec 12 13:43:12 2022 with no timezone information.
-NickServ- Information on jiatan (account jiatan):
-NickServ- Registered : Dec 12 13:43:12 2022 +0000 (1y 15w 3d ago)
-NickServ- Last seen : (less than two weeks ago)
-NickServ- User seen : (less than two weeks ago)
-NickServ- Flags : HideMail, Private
-NickServ- jiatan has enabled nick protection
-NickServ- *** End of Info ***
/whowas expired not too long ago, unfortunately. If anyone has it I'd love to know.
They are not registered on freenode.
EDIT: Libera has stated they have not received any requests for information from any agencies as of yet (Saturday, 30th March 2024 00:39:31 UTC).
EDIT: Jia Tan was using a VPN to connect; that's all I'll be sharing here.
Just for posterity since I can no longer edit: Libera staff has been firm and unrelenting in their position not to disclose anything whatsoever about the account. I obtained the last point on my own. Libera has made it clear they will not budge on this topic, which I applaud and respect. They were not involved whatsoever in ascertaining a VPN was used, and since that fact makes anything else about the connection information moot, there's nothing else to say about it.
I am not LE nor a government official. I did not present a warrant of any kind. I asked in a channel about it. Libera refused to provide information. Libera respecting the privacy of users is of course something I applaud and respect. Why wouldn't I?
Respect not giving out identifying information on individuals whenever someone asks, no matter what company they work for and what job they do? Yes. I respect this.
Good question, though I can imagine they took this action for two reasons:
1. They don't have the ability to freeze repos (i.e. would require some engineering effort to implement it), as I've never seen them do that before.
2. Many distros (and I assume many enterprises) were still linking to the GitHub releases to source the infected tarballs for building. Disabling the repo prevents that.
The infected tarballs and repo are still available elsewhere for researchers to find, too.
They could always archive it. Theoretically (and I mean theoretically only), there's another reason for Microsoft to prevent access to the repo: if a nation state was involved, and there have been back-channel conversations to obfuscate the trail.
Archiving the repo doesn't stop the downloads. They would need to rename it in order to prevent distro CI/CD from continuing to download untrustworthy stuff.
The latest commit is interesting (f9cf4c05edd14, "Fix sabotaged Landlock sandbox check").
It looks like one of Jia Tan's commits (328c52da8a2) added a stray "." character to a piece of C code that was part of a check for sandboxing support, which I guess would cause the code to fail to compile, causing the check to fail, causing the sandboxing to be disabled.
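For anyone who hasn't poked at autoconf/CMake feature probes: the build system compiles a tiny throwaway program and only enables the feature if that program compiles. A rough sketch of what such a Landlock probe looks like (hedged; this is the general shape, not the literal xz check, and it assumes Linux >= 5.13 headers):

    /* Hypothetical shape of a configure-time Landlock probe. The build
     * system compiles this tiny program; if and only if it compiles,
     * something like HAVE_LINUX_LANDLOCK gets defined and the sandbox
     * code is built. */
    #include <linux/landlock.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
        /* Referencing the constants is enough to prove the headers and
         * syscall numbers exist on this system. */
        (void)prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
        (void)SYS_landlock_create_ruleset;
        (void)SYS_landlock_add_rule;
        (void)SYS_landlock_restrict_self;
        (void)LANDLOCK_CREATE_RULESET_VERSION;
        return 0;
    }

Prepend a single stray character to a snippet like that and the probe fails to compile, the check reports "no Landlock", and the sandbox is silently left out, with nothing in the test suite noticing.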
If your project becomes complex enough, eventually you need tests for the configure step. Even without malicious actors it's easy to miss that a compiler or system change broke some check.
The Alpine patch includes gettext-dev, which is likely also exploited, as the same authors have been pushing gettext to projects where their changes have been questioned.
It's still the wrong way to go about things. Tests are there for a reason, meaning if they fail you should try to understand them to the point where you can fix the problem (broken test or actual bug) instead of just wantonly disabling tests until you get a green light.
Asking this here too: why isn't there an automated A/B or diff check matching the tarball contents against the repo, with an auto-flagged warning if they differ? Am I missing something here?
The tarballs mismatching from the git tree is a feature, not a bug. Projects that use submodules may want to include these and projects using autoconf may want to generate and include the configure script.
> The release tarballs upstream publishes don't have the same code that GitHub has. This is common in C projects so that downstream consumers don't need to remember how to run autotools and autoconf. The version of build-to-host.m4 in the release tarballs differs wildly from the upstream on GitHub.
Multiple suggestions on that thread on how that's a legacy practice that might be outdated, especially in the current climate of cyber threats.
> Those days are pretty much behind us. Sure, you can compile code and tweak software configurations if you want to--but most of the time, users don't want to. Organizations generally don't want to, they want to rely on certified products that they can vet for their environment and get support for. This is why enterprise open source exists. Users and organizations count on vendors to turn upstreams into coherent downstream products that meet their needs.
> In turn, vendors like Red Hat learn from customer requests and feedback about what features they need and want. That, then, benefits the upstream project in the form of new features and bugfixes, etc., and ultimately finds its way into products and the cycle continues.
"and when the upstream is tainted, everyone drinks poisoned water downstream, simple as that!"
I think this has been in the making for almost a year. The whole ifunc infrastructure was added in June 2023 by Hans Jansen and Jia Tan. The initial patch is "authored by" Lasse Collin in the git metadata, but the code actually came from Hans Jansen: https://github.com/tukaani-project/xz/commit/ee44863ae88e377...
There were a ton of patches by these two subsequently because the ifunc code was breaking with all sorts of build options and obviously caused many problems with various sanitizers. Subsequently the configure script was modified multiple times to detect the use of sanitizers and abort the build unless either the sanitizer was disabled or the use of ifuncs was disabled. That would've masked the payload in many testing and debugging environments.
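For reference, detecting a sanitizer from a configure-style compile probe is trivial; a hedged sketch (the preprocessor macros are the real GCC/Clang ones, but the surrounding check is illustrative, not the literal xz configure code):

    /* Illustrative compile-time probe: reject the configuration when
     * AddressSanitizer is enabled. __SANITIZE_ADDRESS__ is GCC's macro;
     * Clang exposes __has_feature(address_sanitizer). */
    #if defined(__SANITIZE_ADDRESS__)
    #  error "ifunc is incompatible with AddressSanitizer"
    #elif defined(__has_feature)
    #  if __has_feature(address_sanitizer)
    #    error "ifunc is incompatible with AddressSanitizer"
    #  endif
    #endif

    int main(void) { return 0; }

Aborting the build when a probe like that trips is enough to keep the ifunc path, and therefore the payload's hook, out of sanitizer-instrumented test environments.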
The hansjans162 Github account was created in 2023 and the only thing it did was add this code to liblzma. The same name later applied to do an NMU at Debian for the vulnerable version. Another "<name><number>" account (which only appears here, once) then pops up and asks for the vulnerable version to be imported: https://www.mail-archive.com/search?l=debian-bugs-dist@lists...
That looks exactly like what you'd want to see to disguise the actual request you want, a number of pointless upstream updates in things that are mostly ignored, and then the one you want.
Jia Tan getting maintainer access looks like it is almost certainly part of the operation. Lasse Collin mentioned multiple times how Jia has helped off-list, and to me it seems like Jia befriended Lasse as well (see how Lasse talks about them in 2023).
Also the pattern of astroturfing dates back to 2022. See for example this thread where Jia, who has helped at this point for a few weeks, posts a patch, and a <name><number>@protonmail (jigarkumar17) user pops up and then bumps the thread three times(!) lamenting the slowness of the project and pushing for Jia to get commit access: https://www.mail-archive.com/xz-devel@tukaani.org/msg00553.h...
Naturally, like in the other instances of this happening, this user only appears once on the internet.
Also seeing this bug. Extra valgrind output causes some failed tests for me. Looks like the new version will resolve it. Would like this new version so I can continue work.
Wow, what a big pile of infrastructure for a non-optimization.
An internal call via ifunc is not magic — it’s just a call via the GOT or PLT, which boils down to function pointers. An internal call through a hidden visibility function pointer (the right way to do this) is also a function pointer.
The even better solution is a plain old if statement, which implements the very very fancy “devirtualization” optimization, and the result will be effectively predicted on most CPUs and is not subject to the whole pile of issues that retpolines are needed to work around.
Right, IFUNCs make sense for library functions where you have the function pointer indirection anyway. They make much less sense for internal functions - the only argument for them over a regular function pointer would be the pointer being marked RO after it is resolved (if the library was linked with -z relro -z now), but an if avoids even that issue.
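To make that concrete, a minimal sketch of the three approaches being compared (made-up names, assumes GCC/glibc on Linux; not xz's actual code):

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Two stand-in CRC implementations. */
    static uint32_t crc32_generic(const void *buf, size_t len) { (void)buf; return (uint32_t)len; }
    static uint32_t crc32_clmul(const void *buf, size_t len)   { (void)buf; return (uint32_t)len * 2u; }

    /* Stand-in for a CPUID check. */
    static int have_clmul(void) { return 0; }

    /* 1. GNU ifunc: ld.so calls the resolver during relocation and patches
     *    the GOT/PLT slot for crc32_ifunc. */
    static uint32_t (*resolve_crc32(void))(const void *, size_t)
    {
        return have_clmul() ? crc32_clmul : crc32_generic;
    }
    uint32_t crc32_ifunc(const void *buf, size_t len)
        __attribute__((ifunc("resolve_crc32")));

    /* 2. Function pointer that is not exported (hidden visibility in a real
     *    library; plain static here), set once at startup. */
    static uint32_t (*crc32_ptr)(const void *, size_t) = crc32_generic;
    __attribute__((constructor)) static void init_crc32(void)
    {
        if (have_clmul())
            crc32_ptr = crc32_clmul;
    }

    /* 3. Plain old if: a direct, well-predicted branch, no indirection. */
    static uint32_t crc32_if(const void *buf, size_t len)
    {
        return have_clmul() ? crc32_clmul(buf, len) : crc32_generic(buf, len);
    }

    int main(void)
    {
        static const char msg[] = "hello";
        printf("%u %u %u\n",
               crc32_ifunc(msg, sizeof msg),
               crc32_ptr(msg, sizeof msg),
               crc32_if(msg, sizeof msg));
        return 0;
    }

The relevant difference for this incident is that the ifunc resolver runs very early, during dynamic linking, and rewrites a GOT/PLT slot, which is exactly the kind of hook the backdoor latched onto; the other two variants never hand control to the dynamic linker like that.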
It’s certainly a pseudonym just like all the other personas we’ve seen popping up on the mailing list supporting this “Jia Tan” in these couple of years. For all intents and purposes they can be of any nationality until we know more.
PSA: I just noticed homebrew installed the compromised version on my Mac as a dependency of some other package. You may want to check this to see what version you get:
xz --version
Homebrew has already taken action, a `brew upgrade` will downgrade back to the last known good version.
GitHub disabled the xz repo, making it a bit more difficult for nix to revert to an older version. They've made a fix, but it will take several more days for the build systems to finish rebuilding the ~220,000 packages that depend on the bootstrap utils.
What should they be relying on instead? Maybe rsync everything to an FTP server? Or Torrents? From your other comments, you seem to think no one should ever use GitHub for anything.
"great" for whom? I've seen enough of the industry to immediately feel suspicious when someone uses that sort of phrasing in an attempt to persuade me. It's no different from claiming a "better experience" or similar.
I don't know how you can be missing the essence of the problem here or that comment's point.
Vague claims are meaningless and valueless and are now even worse than that, they are a red flag.
Please don't tell me that you would accept a pr that didn't explain what it did, and why it did it, and how it did it, with code that actually matched up with the claim, and was all actually something you wanted or agreed was a good change to your project.
Updating to the next version of a library is completely unrelated. When you update a library, you don't know what all the changes were to the library, _but the library's maintainers do_, and you essentially trust that library's maintainers to be doing their job, not accepting random patches that might do anything.
Updating a dependency and trusting a project to be sane is entirely a different prospect from accepting a pr and just trusting that the submitter only did things that are both well intentioned and well executed.
If you don't get this then I for sure will not be using or trusting your library.
// The "-2" is included because the for-loop will
// always increment by 2. In this case, we want to
// skip an extra 2 bytes since we used 4 bytes
// of input.
i += 4 - 2;
I’ve long thought that those “this new version fixes bugs and improves user experience” patch notes that Meta et al copy and paste on every release shouldn’t be permitted.
Tell me about it. I look at all these random updates that get pushed to my mobile phone and they all pretty much have that kind of fluff in the description. Apple/Android should take some steps to improve this or outright ban this practice. In terms of importance to them though I imagine this is pretty low on the list.
I have dreamed about an automated LLM system that can "diff" the changes out of the binary and provide some insight. You know give back a tiny bit of power to the user. I'll keep dreaming.
It's worse: as someone who does try to provide release notes, I'm often cut off by the max length of the field. And even then, Play only shows you the notes for the latest version of the app.
Ugh. That's especially annoying because they're trying to be hip with slang and use a metaphor that requires cultural knowledge that you can't really assume everyone has.
Interesting that one of the commits updating a test file claimed it was generated from a fixed random seed for better reproducibility (although how goes unmentioned). For the future, random test data had better be generated as part of the build, rather than being committed as opaque blobs...
I agree in principle, but sometimes programmatically generating test data is not so easy.
E.g.: I have a specific JPEG committed into a repository because it triggers a specific issue when reading its metadata. It's not just _random_ data, but specific bogus data.
But yeah, if the test blob is purely random, then you can just commit a seed and generate it during tests.
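A minimal sketch of the "commit the seed, generate the blob at test time" idea (file name, seed and generator are all made up). Using an explicit PRNG such as xorshift instead of rand() keeps the bytes identical on every platform:

    /* gen_testdata.c: regenerate a test input from a committed seed instead
     * of committing an opaque binary blob. Hypothetical example; xorshift64
     * is spelled out so the byte stream doesn't depend on the libc. */
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t xorshift64(uint64_t *state)
    {
        uint64_t x = *state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        return *state = x;
    }

    int main(void)
    {
        uint64_t seed = 0x5EED5EED5EEDULL;   /* the only thing kept in the repo */
        const size_t n = 64 * 1024;          /* size of the generated test file */

        FILE *f = fopen("random.bin", "wb");
        if (!f)
            return 1;
        for (size_t i = 0; i < n; i += 8) {
            uint64_t v = xorshift64(&seed);
            /* Emit bytes explicitly so the output is endianness-independent. */
            for (int b = 0; b < 8; b++)
                fputc((int)((v >> (8 * b)) & 0xff), f);
        }
        return fclose(f) != 0;
    }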
Debian have reverted xz-utils (in unstable) to 5.4.5 – actual version string is “5.6.1+really5.4.5-1”. So presumably that version's safe; we shall see…
Is that version truly vetted? "Jia Tan" has been the official maintainer since 5.4.3, could have pushed code under any other pseudonym, and controls the signing keys. I would have felt better about reverting farther back, xz hasn't had any breaking changes for a long time.
After reading the original post by Andres Freund, https://www.openwall.com/lists/oss-security/2024/03/29/4, his analysis indicates that the RSA_public_decrypt function is being redirected to the malware code. Since RSA_public_decrypt is only used in the context of RSA public key - private key authentication, can we reasonably conclude that the backdoor does not affect username-password authentication?
Yeah. Had a weird problem last week where GitHub was serving old source code from the raw url when using curl, but showing the latest source when coming from a browser.
Super frustrating when trying to develop automation. :(
These shouldn't be suspended, and neither should their repositories. People might want to dig through the source code. It's okay if they add a warning on the repository, but suspending _everything_ is a stupid thing to do.
This can also be handled relatively easily. They can disable the old links and a new one can be added specifically for the disabled repository. Or even just let the repository be browsable through the interface at least.
Simply showing one giant page saying "This repository is disabled" is not helpful in any way.
1. You don't actually know what has been done by whom or why. You don't know if the author intended all of this, or if their account was compromised. You don't know if someone is pretending to be someone else. You don't know if this person was being blackmailed, forced against their will, etc. You don't really know much of anything, except a backdoor was introduced by somebody.
2. Assuming the author did do something maliciously, relying on personal reputation is bad security practice. The majority of successful security attacks come from insiders. You have to trust insiders, because someone has to get work done, and you don't know who's an insider attacker until they are found out. It's therefore a best security practice to limit access, provide audit logs, sign artifacts, etc, so you can trace back where an incursion happened, identify poisoned artifacts, remove them, etc. Just saying "let's ostracize Phil and hope this never happens again" doesn't work.
3. A lot of today's famous and important security researchers were, at one time or another, absolute dirtbags who did bad things. Human beings are fallible. But human beings can also grow and change. Nobody wants to listen to reason or compassion when their blood is up, so nobody wants to hear this right now. But that's why it needs to be said now. If someone is found guilty beyond a reasonable doubt (that's really the important part...), then name and shame, sure, shame can work wonders. But at some point people need to be given another chance.
100% fair -- we don't know if their account was compromised or if they meant to do this intentionally.
If it were me I'd be doing damage control to clear my name if my account was hacked and abused in this manner.
Otherwise if I was doing this knowing full well what would happen then full, complete defederation of me and my ability to contribute to anything ever again should commence -- the open source world is too open to such attacks where things are developed by people who assume good faith actors.
I think I went through all the stages of grief. Now at the stage of acceptance, here’s what I hope: I hope justice is done. Whoever is doing this, be they a misguided current black hat (hopefully a future white hat) hacker, or someone or someones who just want to see the world burn, or something in between, I hope we see justice. And then forgiveness and acceptance and all that can happen later.
Mitnick reformed after he was convicted (whether you think that was warranted or not). Here, if these folks are Mitnicks or bad actors etc., let’s get all the facts on the table and figure this out.
What’s clear is that we all need to be ever vigilant: that seemingly innocent patch could be part of a more nefarious thing.
We’ve seen it before with that university sending patches to the kernel to “test” how well the core team was at security and how well that went over.
Anyways. Yeah. Glad you all allowed me to grow. And I learned that I have an emotional connection to open source for better or worse: so much of my life professional and otherwise is enabled by it and so threats to it I guess I take personally.
It is reasonable to consider all commits introduced by the backdoor author untrustworthy. This doesn't mean all of it is backdoored, but if they were capable of introducing this backdoor, their code needs scrutiny. I don't care why they did it, whether it's a state-sponsored attack, a long game that was supposed to end with selling a backdoor for all Linux machines out there for bazillions of dollars, or blackmail — this is a serious incident that should eliminate them from open-source contributions and the xz project.
There is no requirement to use your real name when contributing to open source projects. The name of the backdoor author ("Jia Tan") might be fake. If it isn't, and if somehow they are found to be innocent (which I doubt, looking at the evidence throughout the thread), they can create a new account with a new fake identity.
They might have burnt the reputation built for this particular pseudonym but what is stopping them from doing it again? They were clearly in it for the long run.
I literally said "they", I know, I know, in English that can also be interpreted as a gender unspecific singular.
Anyways, yes it is an interesting question whether he/she is alone or they are a group. Conway's law probably applies here as well. And my hunch in general is that these criminal mad minds operate individually / alone. Maybe they are hired by an agency but I don't count that as a group effort.
Every Linux box inside AWS, Azure, and GCP and other cloud providers that retains the default admin sudo-able user (e.g., “ec2”) and is running ssh on port 22.
I bet they intended for their back door to eventually be merged into the base Amazon Linux image.
You don't need a "ec2" user. A backdoor can just allow root login even when that is disabled for people not using the backdoor.
It just requires the SSH port to be reachable unless there is also a callout function (which is risky as people might see the traffic).
And with Debian and Fedora covered and the change eventually making its way into Ubuntu and RHEL pretty much everything would have this backdoor.
my understanding is that any Debian/RPM-based Linux running sshd would become vulnerable in a year or two. The best equivalent of this exploit is the One Ring.
So the really strange thing is why they put so little effort into making this undetectable. All they needed was to make it use less time to check each login attempt.
On the other hand, it was very hard to detect. The slow login time was the only thing that gave it away. It seems more like they were very close to being highly successful. In retrospect, improving the performance would have been the smart play. But that is one part that went wrong compared to the very many that went right.
Distro build hosts and distro package maintainers might not be a bad guess. Depends on whether getting this shipped was the final goal. It might have been just the beginning, part of some bootstrapping.
Not sure why people are downvoting you... it's pretty unlikely that various Chinese IoT companies would just decide it's cool to add a backdoor, which clearly implies that no matter how good their intentions are, they simply might have no other choice.
There are roughly speaking two possibilities here:
1. His machine was compromised, and he wasn't at fault past having less than ideal security (a sin we are all guilty of). His country of origin/residence is of no importance and doxing him isn't fair to him.
2. This account was malicious. There's no reason we should believe that the identity behind it wasn't fabricated. The country of origin/residence is likely falsified.
In neither case is trying to investigate who he is on a public forum likely to be productive. In both cases there's risk of aiming an internet mob at some innocent person who was 'set up'.
The back door is in the upstream GitHub tarball. The most obvious way to get stuff there is by compromising an old style GitHub token.
The new style GitHub tokens are much better, but it’s somewhat opaque which options you need. Most people also don’t use expiring tokens.
The author seems to have a lot of oss contributions, so probably an easy target to choose.
I think the letters+numbers naming scheme for both the main account and the sockpuppets used to get him access to xz and the versions into distros is a strong hint at (2). Taking over xz maintainership without any history of open source contributions is also suspicious.
But my point is that people living in China might be "forced" to do such things, so we unfortunately can't ignore the country. Of course, practically this is problematic since the country can be faked.
Fascinating. Just yesterday the author added a `SECURITY.md` file to the `xz-java` project.
> If you discover a security vulnerability in this project please report it privately. *Do not disclose it as a public issue.* This gives us time to work with you to fix the issue before public exposure, reducing the chance that the exploit will be used before a patch is released.
Reading that in a different light, it says give me time to adjust my exploits and capitalize on any targets. Makes me wonder what other vulns might exist in the author's other projects.
In this particular case, there is a strong reason to expect exploitation in the wild to already be occurring (because it's an intentional backdoor) and this would change the risk calculus around disclosure timelines.
But in the general case, it's normal for 90 days to be given for the coordinated patching of even very severe vulnerabilities -- you are giving time not just to the project maintainers, but to the users of the software to finish updating their systems to a new fixed release, before enough detail to easily weaponize the vulnerability is shared. Google Project Zero is an example of a team with many critical impact findings using a 90-day timeline.
As someone in security who doesn't work at a major place that gets invited to the nice pre-disclosure notifications, I hate this practice.
My customers and business are not any less important or valuable than anyone else's, and I should not be left being potentially exploited, and my customers harmed, for 90 more days while the big guys get to patch their systems (thinking of e.g. Log4J, where Amazon, Meta, Google, and others were told privately how to fix their systems, before others were even though the fix was simple).
Likewise, as a customer I should get to know as soon as someone's software is found vulnerable, so I can then make the choice whether to continue to subject myself to the risk of continuing to use it until it gets patched.
> My ... business are not any less ... valuable than anyone else's,
Plainly untrue. The reason they keep distribution minimal is to maximise the chance of keeping the vuln secret. Your business is plainly less valuable than google, than walmart, than godaddy, than BoA. Maybe you're some big cheese with a big reputation to keep, but seeing as you're feeling excluded, I guess these orgs have no more reason to trust you than they have to trust me, or hundreds of thousands of others who want to know. If they let you in, they'd let all the others in, and odds are greatly increased that now your customers are at risk from something one of these others has worked out, and either blabbed about or has themselves a reason to exploit it.
Similarly plainly, by disclosing to 100 major companies, they protect a vast breadth of consumers/customer-businesses of these major companies at a risk of 10,000,000/100 (or even less, given they may have more valuable reputation to keep). Changing that risk to 12,000,000/10,000 is, well, a risk they don't feel is worth taking.
> Your business is plainly less valuable than google, than walmart, than godaddy, than BoA.
The company I work for has a market cap roughly 5x that of goDaddy and we're responsible for network connected security systems that potentially control whether a person can physically access your home, school, or business. We were never notified of this until this HN thread.
If your BofA account gets hacked you lose money. If your GoDaddy account gets hacked you lose your domain. If Walmart gets hacked they lose... what, money, and have logistics issues for a while?
Thankfully my company's products have additional safeguards and this isn't a breach for us. But what if it was? Our customers can literally lose their lives if someone cracks the security and finds a way to remotely open all the locks in their home or business.
Don't tell me that some search engine profits or someone's emails history is "more valuable" than 2000 schoolchildren's lives.
How about you give copies of the keys to your apartment and a card containing your address to 50 random people on the streets and see if you still feel that having your Gmail account hacked is more valuable.
I think from an exposure point of view, I'm less likely to worry about the software side of my physical security being exploited than the actual hardware side.
None of the points you make are relevant, since I have yet to see any software-based entry product whose software security can be considered more than lackluster at best. Maybe your company is better; since you didn't mention a name, I can't say otherwise.
What I'm saying is your customers are more likely to have their doors physically broken than remotely opened by software and you are here on about life and death because of a vuln in xz?
If your company's market cap is as high as you say and they are as security-aware as you say, why aren't they employing security researchers and actively at the forefront of finding vulns and reporting them? That would get them an invite to the party.
Sorry, but that's not a serious risk analysis. The average person would be hurt a lot more by a godaddy breach by a state actor than by a breach of your service by a state actor.
But I don't want anyone else to get notified immediately, because the odds that somebody will start exploiting people before a patch is available are pretty high. Since I can't have both, I will choose the 90 days for the project to get patches done and all the packagers to include them and make them available, so that by the time it's public knowledge I'm already patched.
I think this is a Tragedy of the Commons type of problem.
Caveat: This assume the vuln is found by a white hat. If it's being exploited already or is known to others, then I fully agree the disclosure time should be eliminated and it's BS for the big companies to get more time than us.
OpenSSL's "notification of an upcoming critical release" is public, not private.
You do get to know that the vulnerability exists quickly, and you could choose to stop using OpenSSL altogether (among other mitigations) once that email goes out.
Yeah I worked in FAANG when we got the advance notice of a number of CVEs. Personally I think it's shady, I don't care how big Amazon or Google is, they shouldn't get special privileges because they are a large corporation.
I don't think the rationale is that they are a large corporation or have lots of money. It's that they have many, many, many more users that would be affected than most companies have.
I imagine they also have significant resources to contribute to dealing with breaches - e.g., analysing past activity by the bad actor, designing mitigations, etc.
If OP is managing something that is critical to life - think fire suppression controllers, or computers that are connected to medical equipment, I think it becomes very difficult to compare that against financial assets.
At a certain scale, "economic" systems become critical to life. Someone who has sufficiently compromised a systemically-important bank can do things that would result in riots breaking out on the street all over a country.
You could use the EPA dollar to life conversion ratio.
Though anything actually potentially lethal shouldn't really have a standard Internet connection. E.g. nuclear power plants, trains, planes controls, heavy industrial equipment, nuclear weapons...
In that case OP should not design systems where an sshd compromise can have a life-threatening impact. Just because it's easier for everything to be controlled from the cloud doesn't mean that others need to feel sympathy when that turns out to be as bad of an idea as everyone else has said.
a. Use commercial OS vendors who will push out fixes.
b. Set up a Continuous Integration process where everything is open source and is built from the ground up, with some reliance on open source platforms such as distros.
One needs different types of competence and IT Operational readiness in each approach.
> b. Set up a Continuous Integration process where everything is open source and is built from the ground up, with some reliance on open source platforms such as distros.
Whether it's reasonable is debatable, but that type of time frame is pretty normal for things that aren't being actively exploited.
This situation is perhaps a little different, as it's not an accidental bug waiting to be discovered but an intentionally placed exploit. We know that a malicious person already knows about it.
Detecting a security issue is one thing. Detecting a malicious payload is something completely different. The latter has intent to exploit and must be addressed immediately. The former has at least some chance of no one knowing about it.
I think you have to take the credibility of the maintainer into account.
If it's a large company, made of people with names and faces, with a lot to lose by hacking its users, they're unlikely to abuse private disclosure. If it's some tiny library, the maintainers might be in on it.
Also, if there's evidence of exploitation in the wild, the embargo is a gift to the attacker. The existence of a vulnerability in that case should be announced, even if the specifics have to be kept under embargo.
In this case the maintainer is the one who deliberately introduced the backdoor. As Andres Freund puts it deadpan, "Given the apparent upstream involvement I have not reported an upstream bug."
> imho it depends on the vuln. I've given a vendor over a year, because it was a very low risk vuln.
But why? A year is a ridiculous time for fixing a vulnerability, even a minor one. If a vendor is taking that long, it's because they don't prioritize security at all and are just dragging their feet.
I've always laughed my ass off at the idea of a disclosure window. It takes less than a day to find RCE that grants root privileges on devices that I've bothered to look at. Why on earth would I bother spending months of my time trying to convince someone to fix something?
If this question had a reliable (and public) answer then the world would be a very different place!
That said, this is an important question. We, particularly those of us who work on critical infrastructure or software, should be asking ourselves this regularly to help prevent this type of thing.
Note that it's also easy (and similarly catastrophic) to swing too far the other way and approach all unknowns with automatic paranoia. We live in a world where we have to trust strangers every day, and if we lose that option completely then our civilization grinds to a halt.
But-- vigilance is warranted. I applaud these engineers who followed their instincts and dug into this. They all did us a huge service!
Yeah thanks for saying this; I agree. And as cliche as it is to look for a technical solution to a social problem, I also think better tools could help a lot here.
The current situation is ridiculous - if I pull in a compression library from npm, cargo or Python, why can that package interact with my network, make syscalls (as me) and read and write files on my computer? Leftpad shouldn’t be able to install crypto ransomware on my computer.
To solve that, package managers should include capability based security. I want to say “use this package from cargo, but refuse to compile or link into my binary any function which makes any syscall except for read and write. No open - if I want to compress or decompress a file, I’ll open the file myself and pass it in.” No messing with my filesystem. No network access. No raw asm, no trusted build scripts and no exec. What I allow is all you get.
The capability should be transitive. All dependencies of the package should be brought in under the same restriction.
In dynamic languages like (server side) JavaScript, I think this would have to be handled at runtime. We could add a capability parameter to all functions which issue syscalls (or do anything else that’s security sensitive). When the program starts, it gets an “everything” capability. That capability can be cloned and reduced to just the capabilities needed. (Think, pledge). If I want to talk to redis using a 3rd party library, I pass the redis package a capability which only allows it to open network connections. And only to this specific host on this specific port.
It wouldn’t stop all security problems. It might not even stop this one. But it would dramatically reduce the attack surface of badly behaving libraries.
The problem we have right now is that any linked code can do anything, both at build time and at runtime. A good capability system should be able to stop xz from issuing network requests even if other parts of the process do interact with the network. It certainly shouldn't have permission to replace crc32_resolve() and crc64_resolve() via ifunc.
Another way of thinking about the problem is that right now every line of code within a process runs with the same permissions. If we could restrict what 3rd party libraries can do - via checks either at build time or runtime - then supply chain attacks like this would be much harder to pull off.
I'm not convinced this is such a cure-all as any library must necessarily have the ability to "taint" its output. Like consider this library. It's a compression library. You would presumably trust it to decompress things right? Like programs? And then you run those programs with full permission? Oops..
It’s not a cure-all. I mean, we’re talking about infosec - so nothing is. But that said, barely any programs need the ability to execute arbitrary binaries. I can’t remember the last time I used eval() in JavaScript.
I agree that it wouldn’t stop this library from injecting backdoors into decompressed executables. But I still think it would be a big help anyway. It would stop this attack from working.
At the big picture, we need to acknowledge that we can’t implicitly trust opensource libraries on the internet. They are written by strangers, and if you wouldn’t invite them into your home you shouldn’t give them permission to execute arbitrary code with user level permissions on your computer.
I don’t think there are any one size fits all answers here. And I can’t see a way to make your “tainted output” idea work. But even so, cutting down the trusted surface area from “leftpad can cryptolocker your computer” to “Leftpad could return bad output” sounds like it would move us in the right direction.
Of course we need to trust people to some degree. There's an old Jewish saying - put your trust in god, but your money in the bank. I think its like that. I'm all for trusting people - but I still like how my web browser sandboxes every website I visit. That is a good idea.
We (obviously) put too much trust in little libraries like xz. I don't see a world in which people start using fewer dependencies in their projects. So given that, I think anything which makes 3rd party dependencies safer than they are now is a good thing. Hence the proposal.
The downside is it adds more complexity. Is that complexity worth it? Hard to say. Thats still worth talking about.
I guess the big open-source community should put a little bit more trust in statistics, or integrate statistical evaluation into their decision making on which specific products to use in their supply chains.
This approach could work for dynamic libraries, but a lot of modern ecosystems (Go, Rust, Swift) prefer to distribute packages as source code that gets compiled with the including executable or library.
The goal is to restrict what included libraries can do. As you say, in languages like Rust, Go or Swift, the mechanism to do this would also need to work with statically linked code. And that's quite tricky, because there are no isolation boundaries between functions in executables.
It should still be possible to build something like this. It would just be inconvenient. In rust, swift and go you'd probably want to implement something like this at compile time.
In rust, I'd start by banning unsafe in dependencies. (Or whitelisting which projects are allowed to use unsafe code.) Then add special annotations on all the methods in the standard library which need special permissions to run. For example, File::open, fork, exec, networking, and so on. In cargo.toml, add a way to specify which permissions your child libraries get. "Import serde, but give it no OS permissions". When you compile your program, the compiler can look at the call tree of each function to see what actually gets called, and make sure the permissions match up. If you call a function in serde which in turn calls File::open (directly or indirectly), and you didn't explicitly allow that, the program should fail to compile.
It should be fine for serde to contain some utility function that calls the banned File::open, so long as the utility function isn't called.
Permissions should be in a tree. As you get further out in the dependency tree, libraries get fewer permissions. If I pass permissions {X,Y} to serde, serde can pass permission {X} to one of its dependencies in turn. But serde can't pass permission {Q} to its dependency - since it doesn't have that capability itself.
Any libraries which use unsafe are sort of trusted to do everything. You might need to insist that any package which calls unsafe code is actively whitelisted by the cargo.toml file in the project root.
>It should still be possible to build something like this. It would just be inconvenient.
Inconvenient is quite the understatement. Designing and implementing something like this for each and every language compiler/runtime requires hugely more effort than doing it on the OS level. The likelihood of mistakes is also far greater.
Perhaps it's worth exploring whether it can be done on the LLVM level so that at least some languages can share an implementation.
A process can do little to defend itself from a library it's using which has full access to its same memory. There is no security boundary there. This kind of backdoor doesn't hinge on IFUNC's existence.
Honestly, I don't have a lot of hope that we can fix this problem for C on linux. There's just so much historical cruft present, spread between autotools, configure, make, glibc, gcc and C itself, that would need to be modified to support capabilities.
The rule we need is "If I pull in library X with some capability set, then X can't do anything not explicitly allowed by the passed set of capabilities". The problem in C is that there is currently no straightforward way to firewall off different parts of a linux process from each other. And dynamic linking on linux is done by gluing together compiled artifacts - with no way to check or understand what assembly instructions any of those parts contain.
I see two ways to solve this generally:
- Statically - ie at compile time, the compiler annotates every method with a set of permissions it (recursively) requires. The program fails to compile if a method is called which requires permissions that the caller does not pass it. In rust for example, I could imagine cargo enforcing this for rust programs. But I think it would require some changes to the C language itself if we want to add capabilities there. Maybe some compiler extensions would be enough - but probably not given a C program could obfuscate which functions call which other functions.
- Dynamically. In this case, every linux system call is replaced with a new version which takes a capability object as a parameter. When the program starts, it is given a capability by the OS and it can then use that to make child capabilities passed to different libraries. I could imagine this working in python or javascript. But for this to work in C, we need to stop libraries from just scanning the process's memory and stealing capabilities from elsewhere in the program.
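A toy sketch of that second, dynamic idea in C (entirely hypothetical API, nothing like this exists today). It also shows exactly the weakness mentioned above: in plain C nothing stops a library from forging or ignoring the capability, so real enforcement would have to come from the OS or runtime:

    /* Toy capability-passing sketch (hypothetical API, not a real kernel
     * feature). Wrapped syscalls take an explicit capability; a caller hands
     * a library only the rights it needs. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    typedef struct {
        unsigned rights;          /* bitmask of allowed operations */
    } cap_t;

    enum { CAP_READ = 1u << 0, CAP_WRITE = 1u << 1,
           CAP_OPEN = 1u << 2, CAP_NET = 1u << 3 };

    /* Derive a weaker capability: a child can only ever get a subset. */
    static cap_t cap_restrict(cap_t parent, unsigned rights)
    {
        cap_t c = { parent.rights & rights };
        return c;
    }

    /* Wrapped syscall: refuses to open files unless the capability allows it. */
    static int cap_open(cap_t cap, const char *path, int flags)
    {
        if (!(cap.rights & CAP_OPEN))
            return -1;            /* library was never granted open() */
        return open(path, flags);
    }

    /* A hypothetical third-party decompressor: it gets read/write on an fd we
     * opened ourselves, but no CAP_OPEN and no CAP_NET. */
    static void decompress(cap_t cap, int in_fd)
    {
        if (cap_open(cap, "/etc/shadow", O_RDONLY) < 0)
            fprintf(stderr, "decompress: open denied, as intended\n");
        (void)in_fd;              /* ...do the actual work on in_fd here */
    }

    int main(void)
    {
        cap_t all = { CAP_READ | CAP_WRITE | CAP_OPEN | CAP_NET };
        int fd = open("/etc/hostname", O_RDONLY);

        /* Hand the library only read/write, nothing else. */
        decompress(cap_restrict(all, CAP_READ | CAP_WRITE), fd);

        if (fd >= 0)
            close(fd);
        return 0;
    }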
Or take the Chrome / original Go approach: load that code in a different process, use some kind of RPC. With all the context switch penalty... sigh, I think it is the only way, as the MMU permissions work at a page level.
Firefox also has its solution of compiling dependencies to wasm, then compiling the wasm back into C code and linking that. It’s super weird, but the effect is that each dependency ends up isolated in bounds checked memory. No context switch penalty, but instead the code runs significantly slower.
> We, particularly those of us who work on critical infrastructure or software
We should also be asking ourselves if we are working on critical infrastructure. Lasse Collin probably did not consider liblzma being loaded by sshd when vetting the new maintainer. Did the xz project ever agree to this responsibility?
We should also be asking ourselves if each dependency of critical infrastructure is worth the risk. sshd linking libsystemd just to write a few bytes into an open fd is absurd. libsystemd pulling in liblzma because hey, it also does compressed logging, is absurd. Yet this kind of absurd dependency bloat is everywhere.
We live in a time of populous, wealthy dictatorships that have computer-science expertise and are openly hostile to the US and Canada.
North America is only about 5% of the world's population. [1] (We can assume that malicious actors are in North America, too, but this helps to adjust our perspective.)
The percentage of maliciousness on the Internet is much higher.
Huh? The empirical evidence we have - thanks to Snowden leaks - paints a different picture. NSA is the biggest malicious actor with nearly unlimited resources at hand. They even insert hardware backdoors and intercept shipment to do that.
Honestly it seems like a state-based actor hoping to get whatever high value target compromised before it's made public. Reporting privately buys them more time, and allows them to let handlers know when the jig is up.
I've long since said that if you want to hide something nefarious you'd do that in the GNU autoconf soup (and not in "curl | sh" scripts).
Would be interesting to see what's going on here; the person who did the releases has done previous releases too (are they affected?) and has commits going back to 2022 – relatively recent, but not that recent. Many are real commits with real changes, and they have commits on some related projects like libarchive. Seems like a lot of effort just to insert a backdoor.
Edit: anyone with access can add files to existing releases and it won't show that someone else added it (I just tested). However, the timestamp of the file will be set to when you uploaded it, not that of the release. On xz all the timestamps of the files match the timestamp of the release (usually the .tar.gz is a few minutes earlier, which makes sense). So it looks like they were done by the same person who did the release. I suspected someone else might have added/altered the files briefly after the release before anyone noticed, but that doesn't seem to be the case.
Well, it is much easier said than done. Philosophically I agree, but in the real world where you have later commits that might break, and downstream projects, etc., it isn't very practical. It strikes me as in a similar vein to high school students and beauty pageant contestants calling for world peace. Really great goal, not super easy to implement.
I would definitely be looking at every single commit though and if it isn't obviously safe I'd be drilling in.
Some of those commits might fix genuine vulnerabilities. So you might trade a new backdoor for an old vulnerability that thousands of criminal orgs have bots for exploiting.
Damage wise, most orgs aren't going to be hurt much by NSA or the Chinese equivalent getting access, but a Nigerian criminal gang? They're far more likely to encrypt all your files and demand a ransom.
Still.. At this point the default assumption should be every commit is a vulnerability or facilitating a potential vulnerability.
For example, change from safe_fprintf to fprintf. It would be appropriate that every commit should be reviewed and either tweaked or re-written to ensure the task is being done in the safest way and doesn't have anything that is "off" or introducing a deviation from the way that codebase standardly goes about tasks within functions.
randomly reverting two years of things across dozens of repositories will break them, almost definitely make them unbuildable, but also make them unreleasable in case any other change needs to happen soon.
all of their code needs to be audited to prove it shouldn't be deleted, of course, but that can't happen in the next ten minutes.
I swear that HN has the least-thought-through hot takes of any media in the world.
Not really. xz worked fine 2 years ago. Roll back to 5.3.1 and apply a fix for the 1 security hole that was fixed since that old version. (ZDI-CAN-16587)
How will you do that practically though? That's probably thousands of commits, upon which tens or hundreds of thousands of commits from others were built. You can’t just roll back everything two years and expect it not to break or bring back older vulnerabilities that were patched in those commits.
Couldn't the autoconf soup be generated from simpler inputs by the CI/CD system to avoid this kind of problem? Incomprehensible soup as a build artifact (e.g. executables) is perfectly normal, but it seems to me that such things don't belong in the source code.
(This means you too, gradle-wrapper! And your generated wrapper for your generated wrapper. That junk is not source code and doesn't belong in the repo.)
Yes, it's usually regenerated already. However even the source is often pretty gnarly.
And in general, the build system of a large project is doing a lot of work and is considered pretty uninteresting and obscure. Random CMake macros or shell scripts would be just as likely to host bad code.
This is also why I like meson, because it's much more constrained than the others and the build system tends to be more modular and the complex parts split across multiple smaller, mostly independent scripts (written in Python or bash, 20-30 lines max). It's still complex, but I find it easier to organize.
> And in general, the build system of a large project is doing a lot of work and is considered pretty uninteresting and obscure. Random CMake macros or shell scripts would be just as likely to host bad code.
Build systems can even have undefined behaviour in the C++ sense. For example Conan 2 has a whole page on that.
The other thing besides the autoconf soup is that the xz project contains incomprehensible binaries as "test data"; the "bad-3-corrupt_lzma2.xz" part of the backdoor was even put in the repo.
It's entirely possible they could have got that injection through review even if they had that framework and had instead put it in the source files used to generate the autoconf soup.
gradle-wrapper is just a convenience, you can always just build the project with an installed version of gradle.
Although I get your point, it’s a great place to hide nefarious code.
Pure speculation but my guess is a specific state actor ahem is looking for developers innocently working with open source to then strongarm them into doing stuff like this.
Many people are patriots of their countries. If a state agency approached them proposing paid OSS work that also helps their country fight terrorism/dictatorships/capitalists/whatever-they-believe, they would feel like they're killing two birds with one job.
While this seems plausible, it is notable that this person seems to be anonymous from the get go. Most open source maintainers are proud of their work and maintain publicly available personas.
While I don't doubt there are people who would gladly do this work for money/patriotism/whatever, adding a backdoor to your own project isn't really reconcilable with the motivations behind wanting to do OSS work.
One thing that is annoying is that many open source projects have been getting "garbage commits" apparently from people looking to "build cred" for resumes or such.
Easier and easier to hide this junk in amongst them.
I mean, a backdoor at this scale (particularly if it wasn't noticed for a while and got into stable distros) could be worth millions. Maybe hundreds of millions (think of the insider trading possibilities alone, not to mention espionage). 2 years doesn't seem like that much work relative to the potential pay off.
This is the sort of case where America's over-the-top hacking laws make sense.
Maybe I'm misunderstanding things, but it seems like anyone can publish an exploit on the internet without it being a crime, in the same way encryption is free speech.
It would seem unlikely this guy would also be logging into people's boxes after this.
It seems a much tougher job to link something like this to an intentional unauthorized access.
At this point, we have no confirmed access via compromise.
Do you know of a specific case where the existence of a backdoor has been prosecuted without a compromise?
Who would have standing to bring this case? Anyone with a vulnerable machine? Someone with a known unauthorized access. Other maintainers of the repo?
IANAL but it is unclear that a provable crime has been committed here
It's not worth your time or the reader's time trying to come up with a technicality to make it perfectly legal to do something we know little about, other than it's extremely dangerous.
Law isn't code, you gotta violate some pretty bedrock principles to pull off something like this and get away with it.
Yes, if you were just a security researcher experimenting on GitHub, it's common sense you should get away with it*, and yes, it's hard to define a logical proof that ensnares this person, and not the researcher.
* and yes, we can come up with another hypothetical where the security researcher shouldn't get away with it. Hypotheticals all the way down.
1. It should be legal to develop or host pen-testing/cracking/fuzzing/security software that can break other software or break into systems. It should be illegal to _use_ the software to gain _unauthorised_ access to others' systems. (e.g. it's legal to create or own lockpicks and use them on your own locks, or locks you've been given permission to pick. It's not legal to gain unauthorised access _using_ lockpicks)
2. It should be illegal to develop malware that _automatically_ gains unauthorised access to systems (trojans, viruses, etc.). However, it should be legal to maintain an archive of malware, limiting access to vetted researchers, so that it can be studied, reverse-engineered and combatted. (e.g. it's illegal to develop or spread a bioweapon, but it's ok for authorised people to maintain samples of a bioweapon in order to provide antidotes or discover what properties it has)
3. What happened today: It should be illegal to intentionally undermine the security of a project by making bad-faith contributions to it that misrepresent what they do... even if you're a security researcher. It could only possibly be allowed if an agreement was reached in advance with the project leaders to permit such intentional weakness-probing, with a plan to reveal the deception and treachery.
Remember when university researchers tried to find if LKML submissions could be gamed? They didn't tell the Linux kernel maintainers they were doing that. When the Linux kernel maintainers found out, they banned the entire university from making contributions and removed everything they'd done.
No, people being polite and avoiding the more direct answer that'd make people feel bad.
The rest of us understand that intuitively, and that it is already the case, so pretending there was some need to work through it, at best, validates a misconception for one individual.
Less important, as it's mere annoyance rather than infohazard: it's wildly off-topic. Legal hypotheticals where a security researcher released "rm -rf *" on GitHub and ended up in legal trouble is 5 steps downfield even in this situation, and it is a completely different situation. Doubly so when everyone has to "IANAL" through the hypotheticals.
> but it seems like anyone can publish an exploit on the internet without being a crime
Of course. The mere publishing of the exploit is not the criminal part. It's the manner & intent in which it was published that is the problem.
> At this point, we have no confirmed access via compromise.
While I don't know the specifics for this particular law, generally it doesn't matter what you actually did. What is relevant is what you tried to do. Lack of success doesn't make you innocent.
> Who would have standing to bring this case?
The state obviously. This is a criminal matter not a civil one. You don't even need the victim's consent to bring a case.
By this logic you could say that leaving a poisoned can of food in a public pantry is not a crime because poison is legal for academic purposes, and whoever ate it took it willingly.
Also, I think getting malicious code into a repo counts as a compromise in and of itself.
Yeah this was my first thought too. Though I think the case against autoconf is already so overwhelming I think anyone still using it is just irredeemable; this isn't going to persuade them.
For those panicking, here are some key things to look for, based on the writeup:
- A very recent version of liblzma5 - 5.6.0 or 5.6.1. This was added in the last month or so. If you're not on a rolling release distro, your version is probably older.
- A debian or RPM based distro of Linux on x86_64. In an apparent attempt to make reverse engineering harder, it does not seem to apply when built outside of deb or rpm packaging. It is also specific to Linux.
- Running OpenSSH sshd from systemd. OpenSSH as patched by some distros only pulls in libsystemd for logging functionality, which pulls in the compromised liblzma5.
Debian testing already has a version called '5.6.1+really5.4.5-1' that is really an older version 5.4, repackaged with a newer version to convince apt that it is in fact an upgrade.
It is possible there are other flaws or backdoors in liblzma5, though.
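If you want to check the first and last points yourself, here's a minimal sketch (assuming a Linux box with Python, and that your sshd lives at /usr/sbin/sshd; adjust the path for your distro). It only reports the liblzma version and whether sshd links it at all; it's not a substitute for the signature-detection script mentioned elsewhere in the thread.

    import ctypes
    import ctypes.util
    import subprocess

    # Ask the installed liblzma for its version string.
    # lzma_version_string() is part of the public liblzma API.
    libname = ctypes.util.find_library("lzma")
    if libname:
        liblzma = ctypes.CDLL(libname)
        liblzma.lzma_version_string.restype = ctypes.c_char_p
        print("liblzma version:", liblzma.lzma_version_string().decode())
    else:
        print("liblzma not found")

    # Check whether sshd links liblzma at all (path is an assumption).
    try:
        out = subprocess.run(["ldd", "/usr/sbin/sshd"], capture_output=True,
                             text=True, check=True).stdout
        print("sshd links liblzma:", "liblzma" in out)
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("could not inspect /usr/sbin/sshd")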
Focusing on sshd is the wrong approach. The backdoor was in liblzma5. It was discovered to attack sshd, but it very likely had other targets as well. The payload hasn't been analyzed yet, but _almost everything_ links to liblzma5. Firefox and Chromium do. Keepassxc does. And it might have made arbitrary changes to your system, so installing the security update might not remove the backdoor.
From what I'm understanding it's trying to patch itself into the symbol resolution step of ld.so specifically for libcrypto under systemd on x86_64. Am I misreading the report?
That's a strong indication it's targeting sshd specifically.
Lots of software links both liblzma and libcrypto. As I read Andres Freund's report, there is still a lot of uncertainty:
"There's lots of stuff I have not analyzed and most of what I observed is purely from observation rather than exhaustively analyzing the backdoor code."
I did a quick diff of the source (.orig file from packages.ubuntu.com) and the content mostly matched the 5.4.5 github tag except for Changelog and some translation files. It does match the tarball content, though.
So for 5.4.5 the tagged release and download on github differ.
It does change format strings, e.g.
+#: src/xz/args.c:735
+#, fuzzy
+#| msgid "%s: With --format=raw, --suffix=.SUF is required unless writing to stdout"
+msgid "With --format=raw, --suffix=.SUF is required unless writing to stdout"
+msgstr "%s: amb --format=raw, --suffix=.SUF és necessari si no s'escriu a la sortida estàndard"
The new msgid has no %s, but the translated msgstr still does, so there is no second argument for that printf to consume. I think there is at least a format string injection in the older tarballs.
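For what it's worth, here's a rough sketch of the kind of check I mean: scan a .po file and flag entries whose msgstr contains printf-style directives that the msgid no longer has. This is just an illustration of the mismatch, not the tooling anyone actually used, and it ignores plural entries and other .po subtleties.

    import re
    import sys

    # printf-style conversion specifiers (ignoring "%%" literals)
    directive = re.compile(r"%[#0\- +]*[0-9.*]*(?:hh|h|ll|l|L|q|j|z|t)?"
                           r"[diouxXeEfFgGaAcspn]")

    def directives(s):
        return [d for d in directive.findall(s) if d != "%%"]

    def check(msgid, msgstr):
        extra = set(directives(msgstr)) - set(directives(msgid))
        if msgstr and extra:
            print("suspicious:", msgid[:60], "->", sorted(extra))

    msgid, msgstr, current = "", "", None
    for raw in open(sys.argv[1], encoding="utf-8", errors="replace"):
        line = raw.strip()
        if line.startswith("msgid "):
            check(msgid, msgstr)
            msgid, msgstr, current = line[6:].strip('"'), "", "id"
        elif line.startswith("msgstr") and '"' in line:
            msgstr, current = line.split('"', 1)[1].rstrip('"'), "str"
        elif line.startswith('"') and current:
            value = line.strip('"')
            if current == "id":
                msgid += value
            else:
                msgstr += value
    check(msgid, msgstr)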
FYI, your formatting is broken. Hacker News doesn't support backtick code blocks, you have to indent code.
Anyway, so... the xz project has been compromised for a long time, at least since 5.4.5. I see that this JiaT75 guy has been the primary guy in charge of at least the GitHub releases for years. Should we view all releases after he got involved as probably compromised?
My TLDR is that I would regard all commits by JiaT75 as potentially compromised.
Given the ability to manipulate git history I am not sure if a simple time based revert is enough.
It would be great to compare old copies of the repo with the current state. There is no guarantee that the history wasn't tampered with.
Overall the only safe action would IMHO be to establish a new upstream from an assumed good state, then fully audit it. At that point we should probably just abandon it and use zstd instead.
Zstd belongs to the class of speed-optimized compressors providing “tolerable” compression ratios. Their intended use case is wrapping some easily compressible data with negligible (in the grand scale) performance impact. So when you have a server which sends gigabits of text per second, or caches gigabytes of text, or processes a queue with millions of text protocol messages, you can add compression on one side and decompression on the other to shrink them without worrying too much about CPU usage.
Xz is an implant of 7zip's LZMA(2) compression into traditional Unix archiver skeleton. It trades long compression times and giant dictionaries (that need lots of memory) for better (“much-better-than-deflate”) compression ratios. Therefore, zstd, no matter how fashionable that name might be in some circles, is not a replacement for xz.
It should also be noted that those LZMA-based archive formats might not be considered state-of-the-art today. If you worry about data density, there are options for both faster compression at the same size, and better compression in the same amount of time (provided that data is generally compressible). 7zip and xz are widespread and well tested, though, and allow decompression to be fast, which might be important in some cases. Alternatives often decompress much more slowly. This is also a trade-off between total time spent on X nodes compressing data, and Y nodes decompressing data. When X is 1, and Y is in the millions (say, software distribution), you can spend A LOT of time compressing even for relatively minuscule gains without affecting the scales.
It should also be noted that many (or most) decoders of top compressing archivers are implemented as virtual machines executing chains of transform and unpack operations defined in archive file over pieces of data also saved there. Or, looking from a different angle, complex state machines initializing their state using complex data in the archive. Compressor tries to find most suitable combination of basic steps based on input data, and stores the result in the archive. (This is logically completed in neural network compression tools which learn what to do with data from data itself.) As some people may know, implementing all that byte juggling safely and effectively is a herculean task, and compression tools had exploits in the past because of that. Switching to a better solution might introduce a lot more potentially exploited bugs.
Arch Linux switched from xz to zstd, with a negligible increase in size (<1%) but a massive speedup on decompression. This is exactly the use case of many people downloading ($$$) and decompressing: the software distribution case. Other distributions are following that lead.
You should use ultra settings and >=19 as the compression level. E.g. Arch used 20; higher compression levels do exist, but they were already at a <1% size increase.
It does beat xz for these tasks. It's just not the default settings as those are indeed optimized for the lzo to gzip/bzip2 range.
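If you want to see the trade-off on your own data, a quick sketch (assuming the third-party `zstandard` Python package is installed; sizes and timings will obviously vary by input):

    import lzma
    import sys
    import time
    import zstandard  # third-party: pip install zstandard

    data = open(sys.argv[1], "rb").read()

    t0 = time.perf_counter()
    xz_out = lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)
    t1 = time.perf_counter()
    zstd_out = zstandard.ZstdCompressor(level=19).compress(data)
    t2 = time.perf_counter()

    print(f"input    {len(data):>12} bytes")
    print(f"xz -9e   {len(xz_out):>12} bytes  {t1 - t0:6.2f}s")
    print(f"zstd -19 {len(zstd_out):>12} bytes  {t2 - t1:6.2f}s")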
My bad, I was too focused on that class in general, imagining “lz4 and friends”.
Zstd does reach LZMA compression ratios at high levels, but compression speed also drops to LZMA levels. Which, obviously, was clearly planned in advance to cover both high-speed online applications and slow offline compression (unlike, say, brotli). The official cap on levels can also be explained by an absence of gains on most inputs in development tests.
Distribution packages contain binary and mixed data, which might be less compressible. For text and mostly text, I suppose that some old style LZ-based tools can still produce an archive roughly 5% smaller (and still unpack fast); other compression algorithms can certainly squeeze it much better, but have symmetric time requirements. I was worried about the latter kind being introduced as a replacement solution.
bzip2 is a pig that has no place being in the same sentence as lzo and gzip. Its niche was maximum compression no matter the speed, but it hasn't been relevant even there for a long time.
Yet tools still need to support bzip2 because bzip2 archives are still out there and are still being produced. So we can't get rid of libbz2 anytime soon - same for liblzma.
Note that the xz CLI does not expose all available compression options of the library. E.g. rust release tarballs are xz'd with custom compression settings.
But yeah, zstd is good enough for many uses.
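As an illustration of custom settings done through the library rather than the CLI defaults, Python's stdlib lzma module accepts explicit LZMA2 filter chains (the dict_size below is just an example value, not what rust actually uses):

    import lzma

    # Preset 9e as a base, with an explicit (example) dictionary size.
    filters = [
        {"id": lzma.FILTER_LZMA2,
         "preset": 9 | lzma.PRESET_EXTREME,
         "dict_size": 64 * 1024 * 1024},
    ]

    data = b"example payload " * 4096
    blob = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
    print(len(data), "->", len(blob), "bytes")

    # The filter chain is stored in the .xz container, so decompression
    # needs no special knowledge of it.
    assert lzma.decompress(blob) == data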
> Given the ability to manipulate git history I am not sure if a simple time based revert is enough.
Rewritten history is not a real concern because it would have been immediately noticed by anyone updating an existing checkout.
> Overall the only safe action would IMHO be to establish a new upstream from an assumed good state, then fully audit it. At that point we should probably just abandon it and use zstd instead.
This is absurd and also impossible without breaking backwards compatibility all over the place.
> Debian testing already has a version called '5.6.1+really5.4.5-1' that is really an older version 5.4, repackaged with a newer version to convince apt that it is in fact an upgrade.
Debian has epochs, but it's a bad idea to use them for this purpose.
Two reasons:
1. Once you bump the epoch, you have to use it forever.
2. The deb filename often doesn't contain the epoch (we use a colon which isn't valid on many filesystems), so an epoch-revert will give the same file name as pre-epoch, which breaks your repository.
So, the current best practice is the '+really' thing.
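To see why '+really' works where a plain downgrade wouldn't, you can ask dpkg directly; a small sketch using the real `dpkg --compare-versions` interface (exit status 0 means the comparison holds):

    import subprocess

    def dpkg_gt(a, b):
        """True if dpkg considers version a greater than version b."""
        return subprocess.run(
            ["dpkg", "--compare-versions", a, "gt", b]).returncode == 0

    # A plain downgrade is not treated as an upgrade:
    print(dpkg_gt("5.4.5-1", "5.6.1-1"))                # False
    # The '+really' trick sorts above the bad version:
    print(dpkg_gt("5.6.1+really5.4.5-1", "5.6.1-1"))    # True
    # An epoch beats any non-epoch version, forever:
    print(dpkg_gt("1:5.4.5-1", "5.6.1+really5.4.5-1"))  # True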
Honestly, the Gentoo-style global blacklist (package.mask) to force a downgrade is probably a better approach for cases like this. Epochs only make sense if your upstream is insane and does not follow a consistent numbering system.
Gentoo also considers the repository (+overlays) to be the entire set of possible versions so simply removing the bad version will cause a downgrade, unlike debian and RPM systems where installing packages outside a repository is supported.
Stop the cap your honor. There is not a single filesystem that prevents you from using colons in filenames except exfat, I went ahead and checked and ext4, xfs, btrfs, zfs, and even reiserfs let you use any characters you want except \0 and /.
And I fail to see why bumping the epoch would ever be a problem. Having to keep using the epoch is not a reason why it's bad.
.deb has epochs too, but I think Debian developers avoid it where possible because 1:5.4.5 is interpreted as newer than anything without a colon, so it would break e.g. packages that depend on liblzma >= 5.0, < 6. There may be more common cases that aren't coming to mind now.
Seems like Debian is mixing too many things into the package version: the version used for deciding on upgrades and the ABI version for dependencies should be decoupled, like they are in modern RPM distros.
If a binary library ABI is backwards-incompatible, they change the package name. I was just guessing at the reason epoch is avoided and that <6 is probably an awful example.
So now I actually bothered to look it up, and it turns out the actual reason is that the epoch changes what version is considered "greater", but it's not part of the .deb filename, so you still can't reuse version numbers used in the past. If you release 5.0, then 5.1, then you want to rollback and release 1:5.0, it's going to break things in the Debian archives. https://www.debian.org/doc/debian-policy/ch-binary.html#uniq...
I really like the XBPS way of the reverts keyword in the package template that forces a downgrade from said software version. It's simple but works without any of the troubles RPM epochs have with resolving dependencies as it's just literally a way to tell xbps-install that "yeah, this is a lower version number in the repository but you should update anyway".
> If you're not on a rolling release distro, your version is probably older.
Ironic considering security is often advertised as a feature of rolling release distros. I suppose in most instances it does provide better security, but there are some advantages to Debian's approach (stable Debian, that is).
The article gives a link to a simple shell script that detects the signature of the compromised function.
> Running OpenSSH sshd from systemd
I think this is irrelevant.
From the article: "Initially starting sshd outside of systemd did not show the slowdown, despite the backdoor briefly getting invoked." If I understand correctly the whole section, the behavior of OpenSSH may have differed when launched from systemd, but the backdoor was there in both cases.
Maybe some distributions that don't use systemd strip the libxz code from the upstream OpenSSH release, but I wouldn't bet on it if a fix is available.
> From the article: "Initially starting sshd outside of systemd did not show the slowdown, despite the backdoor briefly getting invoked." If I understand correctly the whole section, the behavior of OpenSSH may have differed when launched from systemd, but the backdoor was there in both cases.
It looks like the backdoor "deactivates" itself when it detects being started interactively, as a security researcher might. I was eventually able to circumvent that, but unless you do so, it'll not be active when started interactively.
However, the backdoor would also be active if you started it with a shell script (as the traditional sys-v rc scripts did) outside the context of an interactive shell, as TERM wouldn't be set either in that context.
> Maybe some distributions that don't use systemd strip the libxz code from the upstream OpenSSH release, but I wouldn't bet on it if a fix is available.
OpenSSH is developed by the OpenBSD project, and systemd is not compatible with OpenBSD. The upstream project has no systemd or liblzma code to strip. If your sshd binary links to liblzma, it's because the package maintainers for your distro have gone out of their way to add systemd's patch to your sshd binary.
> From the article: "Initially starting sshd outside of systemd did not show the slowdown, despite the backdoor briefly getting invoked." If I understand correctly the whole section, the behavior of OpenSSH may have differed when launched from systemd, but the backdoor was there in both cases.
From what I understand, the backdoor detects if it's in any of a handful of different debug environments. If it's in a debug environment or not launched by systemd, it won't hook itself up. ("nothing to see here folks...") But if sshd isn't linked to liblzma to begin with, none of the backdoor's code even exists in the processes' page maps.
I'm still downgrading to an unaffected version, of course, but it's nice to know I was never vulnerable just by typing 'ldd `which sshd`' and not seeing liblzma.so.
I think the distributions that do use systemd are the ones that add the libsystemd code, which in turn brings in the liblzma5 code. So, it may not be entirely relevant how it is run, but it needs to be a patched version of OpenSSH.
I did notice that my debian-based system got noticeably slower and unresponsive at times the last two weeks, without obvious reasons. Could it be related?
I read through the report, but what wasn't directly clear to me was: what does the exploit actually do?
My normal internet connection has such an appalling upload that I don't think anything relevant could be uploaded. But I will change my ssh keys asap.
> I did notice that my debian-based system got noticeably slower and unresponsive at times the last two weeks, without obvious reasons. Could it be related?
Possible but unlikely.
> I read through the report, but what wasn't directly clear to me was: what does the exploit actually do?
It injects code that runs early during sshd connection establishment. Likely allowing remote code execution if you know the right magic to send to the server.
I hope Lasse Collin is doing OK! Here is an older message from him [1]
"I haven't lost interest but my ability to care has been fairly limited
mostly due to longterm mental health issues but also due to some other
things. Recently I've worked off-list a bit with Jia Tan on XZ Utils and
perhaps he will have a bigger role in the future, we'll see.
It's also good to keep in mind that this is an unpaid hobby project.
"
Github (Microsoft) are in a unique position to figure out if his account is hacked or not, and find a way to reach him. I hope they reach out and offer him some proper support! Economic support (if that's needed), or just help clearing his name.
This is another tale of how we are building multi-trillion-dollar industries on the back of unpaid volunteers. It's not Github's 'job', and many other organisations have benefited even more from Lasse's work, but they are in a unique position, and it would be literally pocket change for them.
In a movie his mental health issues would likely have been caused intentionally by the attacker, setting the stage for the mole to offer to step in just at the right time. Seems a bit far fetched in this case though for what looks like a tangential attack.
Does it? I expect that finding someone vulnerable was the more likely approach rather than messing with the life of a stable maintainer, but it does seem very much like the attacker was acting with malicious intent from the start of his interaction with the xz project.
I would like to see more attention given to this. I'm capable of compartmentalization and not over-guilting myself, but holy hell, I really hope he's doing alright. This would kind of destroy me.
I was actually telling my dad about this. I have a project, 500+ users, not quite root access, but enough to cause serious damage. I can think of at least one covert way to backdoor the binary artifacts from it.
About two years ago, someone showed up, started making good commits. In this case, they have some other community rep that goes back a bit further but... man it's an unsettling feeling.
teach me how.
help me learn how, please.
any resources with practical utility you can share?
or any class of therapists that are good at teaching this with right frameworks offered?
thank you
A couple of years ago I wrote a Go library that wraps the xz C code and allows you to do xz compression in Go: https://github.com/jamespfennell/xz
About a week ago I received the first PR on that repo, to upgrade to 5.6.1. I thought it was odd to get such a random PR...it's not the same GitHub account as upstream though.
As a bit of an aside, I would never accept a PR like this, and would always update $large_vendored_dependency myself. This is unreviewable, and it would be trivial to insert any backdoor (unless you go through the motions of updating it yourself and diffing, at which point the PR becomes superfluous). I'd be wary even of a well-known author unless I knew them personally on some level (real-life or via internet). Not that I wouldn't trust them, but people's machines or accounts can get compromised, people can have psychotic episodes, things like that. At the very least I'd like to have some out-of-band "is this really you?" signal.
This is how I once inserted a joke in one of our (private) repos that would randomly send cryptic messages to our chat channel. This was pretty harmless and just a joke (there's some context that made it funny), but it took them years to find it – and that was only because I told them after I quit.
That said, looking at the GitHub account I'd be surprised if there's anything nefarious going on here. Probably just someone using your repo, seeing it's outdated, and updating it.
The (most?) popular SQLite driver for Go often gets PRs to update the SQLite C amalgamation, which the owner politely declines (and I appreciate him for that stance, and for taking on the maintenance burden it brings).
In this case, the project is using Git submodules for its vendored dependencies, so you can trivially cryptographically verify that they have vendored the correct dependency just by checking the commit hash. It looks really crazy on Github but in most git clients it will just display the commit hash change.
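If you want to do that check explicitly rather than eyeballing the PR, something like the following works; the submodule path, upstream URL and tag here are placeholders, not the actual ones from that repository:

    import subprocess

    def git(*args):
        return subprocess.run(["git", *args], capture_output=True,
                              text=True, check=True).stdout.strip()

    # Commit the superproject pins for the submodule (gitlink entry).
    pinned = git("ls-tree", "HEAD", "third_party/xz").split()[2]

    # Commit the upstream tag resolves to, without cloning anything.
    lines = git("ls-remote", "--tags",
                "https://example.org/upstream/xz.git", "v5.4.5").splitlines()
    # Annotated tags also get a peeled "^{}" line pointing at the commit.
    peeled = [l for l in lines if l.endswith("^{}")]
    upstream = (peeled or lines)[-1].split()[0]

    print("pinned:  ", pinned)
    print("upstream:", upstream)
    print("match:   ", pinned == upstream)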
The dopamine hits from updating stuff should come to an end; updating should be thought of as potentially adding new bugs or exploits, unless the update fixes a CVE. Also, GitHub needs to remove the green colors and checkmarks in PRs to prevent these dopamine traps from overriding any critical thinking.
Counterpoint: if you wait to keep things up to date until there's a CVE, there's a higher likelihood that things will break doing such a massive upgrade, and this may slow down a very time-sensitive CVE response. Allowing people to feel rewarded for keeping things up to date is not inherently a bad thing. As with all things, the balance point will vary from project to project!
Exactly. You don’t want to be bleeding edge (churn, bugs) but in general you usually don’t want to be on the oldest supported version either (let alone unsupported).
Risk/reward depends on the usecase of course. For a startup I’d be on the .1 version of the newest major version (never .0) if there are new features I want. For enterprise, probably the oldest LTS I can get away with.
I strongly disagree. If you don’t update your dependencies then it’s easy to lose the institutional knowledge of how to update them, and who actually owns that obscure area of your code base that depends on them. Then you get a real CVE and have to work out everything in a hurry.
If you have a large code base and organisation then keep doing those upgrades so it won’t be a problem when it really matters. If it’s painful, or touches too many areas of the code you’ll be forced to refactor things so that ceases to be a problem, and you might even manage to contain things so well that you can swap implementations relatively easily when needed.
To be honest, I probably wouldn't have noticed the comments on the PR if it wasn't for that since my Github notifications are an absolute mess. Thankfully, my employer has been super supportive throughout this :D
I don't want to read too much into it, but the person (supposedly) submitting the PR seems to work at 1Password since December last year, as per his Linkedin. (And his Linkedin page has a link to the Github profile that made the PR).
They're definitely a real person. I know cause that "1Password employee since December" is a person I know IRL and worked with for years at their prior employer. They're not a no-name person or a fake identity just FYI. Please don't be witch hunting; this genuinely looks like an unfortunate case where Jared was merely proactively doing their job by trying to get an externally maintained golang bindings of XZ to the latest version of XZ. Jared's pretty fantastic to work with and is definitely the type of person to be filing PRs on external tools to get them to update dependencies. I think the timing is comically bad, but I can vouch for Jared.
No, this account made a PR and their commits were signed [1]. Take a look at their other repositories, e.g. they did AoC 2023 in Rust and published it, the commits in that repository are signed by the same key. So this is not (just) a GitHub account compromise.
I find this aspect to be an outlier, the other attacker accounts were cutouts. So this doesn't quite make sense to me.
> I am *not* a security researcher, nor a reverse engineer. There's lots of stuff I have not analyzed and most of what I observed is purely from observation rather than exhaustively analyzing the backdoor code.
I love this sort of technical writing from contributors outside the mainstream debugging world who might be averse to sharing. What an excellently summarized report of his findings that should be seen as a template.
FWIW, it felt intimidating as hell. And I'm fairly established professionally. Not sure what I'd have done earlier in my career (although I'd probably not have found it in the first place).
> Not sure what I'd have done earlier in my career
To anybody in this sorta situation, you should absolutely share whatever you have. It doesn’t need to be perfect, good, or 100% accurate, but if there’s a risk you could help a lot of people
This story is an incredible testament to how open-source software can self-regulate against threats, and more broadly, it reminds us that we all stand on the shoulders of contributors like you. Thank you!
Honestly, you only get this kind of humility when you're working with absolute wizards on a consistent basis. That's how I read that whole analysis. Absolutely fascinating.
This guy's interactions seem weird, but it might just be because of non-native English or a strange attitude, or he's very good at covering his tracks. E.g. I found a cpython issue where he got reprimanded for serially opening issues: https://github.com/python/cpython/issues/115195#issuecomment...
If I saw that on a $dayjob project I'd peg him as an innocuous pain in the ass (overly excited, noisy, dickriding).
Here's a PR from 2020 where he recommends / requests the addition of SCRAM to an SMTP client: https://github.com/marlam/msmtp/issues/36 which is basically the same thing as the PR you found. The linked documents seem genuine, and SCRAM is an actual challenge/response authentication method for a variety of protocols (in this case mostly SMTP, IMAP, and XMPP): https://en.wikipedia.org/wiki/Salted_Challenge_Response_Auth...
Although, and that's a bit creepy, he shows up in the edit history for the SCRAM page; the edits mostly seem innocent, though he does plug his "state of play" github repository.
What? They're just asking for some features there?
Y'all need to calm down; this is getting silly. Half the GitHub accounts look "suspicious" if you start scrutinizing everything down to the microscopic detail.
Hey, I remember this guy! Buddy of someone who tried to get a bunch of low quality stuff into ifupdown-ng, including copying code with an incompatible license and removing the notice. He's in every PR, complaining that the "project is dead". He even pushes for the account to be made a "team member".
The PR + angry user pushing for the PR author to gain commit access spiel is definitely suspiciously similar to what happened with xz-utils. Possible coincidence but worth investigating further.
Imagine a more competent backdoor attempt on xz(1)—one that wouldn't have been noticed this quickly. xz is everywhere. They could pull off a "reflections on trusting trust": an xz which selectively modifies a tiny subset of the files it sees, like .tar.xz software tarballs underlying certain build processes. Not source code tarballs (someone might notice)—tarballs distributing pre-compiled binaries.
edit to add: Arch Linux' entire package system used to run on .tar.xz binaries (they switched to Zstd a few years ago [0]).
ClickHouse has a pretty good github_events dataset on its playground that folks can use to do some research - some info on the dataset: https://ghe.clickhouse.tech/
Yeah. It would be interesting to see who adopted the compromised versions and how quickly, compared to how quickly they normally adopt new versions (not bots pulling upgrades, but how quickly maintainers approve and merge them).
If there were a bunch of people who adopted it abnormally fast compared to usual, might point to there being more "bad actors" in this operation (said at the risk of sounding paranoid if this turns out to be a state run thing)
> If the code is complex, there could be many more exploits hiding.
Then the code should not be complex. Low-level hacks and tricks (like pointer juggling) should be not allowed and simplicity and readability should be preferred.
Yes, but my point was that at the level of performance tools like this are expected to operate at, it’s highly probable that you’ll need to get into incredibly esoteric code. Look at ffmpeg – tons of hand-written Assembly, because they need it.
To be clear, I have no idea how to solve this problem; I just don’t think saying that all code must be non-hacky is the right approach.
Performance can be bought with better hardware. It gets cheaper and cheaper every year. Trustworthiness cannot be purchased in the same way. I do not understand why performance would ever trump clean code, especially for code that processes user provided input.
This attitude is how we get streaming music players that consume in excess of 1 GiB of RAM.
Performant code needn’t be unclean; it’s just often using deeper parts of the language.
I have a small project that became absolute spaghetti. I rewrote it to be modular, using lots of classes, inheritance, etc. It was then slower, but eminently more maintainable and extensible. I’m addressing that by using more advanced features of the language (Python), like MemoryView for IPC between the C libraries it calls. I don’t consider this unclean, but it’s certainly not something you’re likely to find on a Medium article or Twitter take.
I value performant code above nearly everything else. I’m doing this for me, there are no other maintainers, and it’s what I enjoy. You’re welcome to prioritize something else in your projects, but it doesn’t make other viewpoints objectively worse.
Performant code does not need to be unclean, exactly! My original point was just to not put performance on a pedestal. Sure, prioritize it, but correct and clean should come first - at least for foundational libraries that others are supposed to build upon.
I maintain ML libraries that run on microcontrollers with kilobytes of memory, performance is a friend of mine ;)
I suggest you run your browser's JavaScript engine in interpreter mode to understand how crippling the simple and straightforward solution is to performance.
I guess because at server-farm level, performance/efficiency translates to real savings in the millions of USD. In general, at both ends of the scale (the cloud and the embedded world) this matters a lot. In resource-limited environments like the Raspberry Pi, this design philosophy wins over many users, from DIY to private industry.
I hate this argument. If current hardware promises you a theoretical throughput of 100 MB/s for an operation, someone will try to hit that limit. Your program that has no hard to understand code but gives me 5 MB/s will lose in the face of a faster one, even if that means writing harder to understand code.
No, but often it is far worse than 95%. A good example is random.randint() vs math.ceil(random.random() * N) in Python. The former is approximately 5x slower than the latter, but they produce effectively the same result with large enough values of N. This isn’t immediately apparent from using them or reading docs, and it’s only really an issue in hot loops.
Another favorite of mine is bitshifting / bitwise operators. Clear and obvious? Depends on your background. Fast as hell? Yes, always. It isn’t always needed, but when it is, it will blow anything else out of the water.
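If you want to sanity-check that first claim yourself, here's a quick (unscientific) timing sketch; the exact ratio will depend on your Python version and machine:

    import math
    import random
    import timeit

    N = 1_000_000
    reps = 200_000

    t_randint = timeit.timeit(lambda: random.randint(1, N), number=reps)
    t_ceil = timeit.timeit(lambda: math.ceil(random.random() * N),
                           number=reps)

    print(f"random.randint(1, N):      {t_randint:.3f}s")
    print(f"ceil(random.random() * N): {t_ceil:.3f}s")
    print(f"ratio: {t_randint / t_ceil:.1f}x")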
Bitwise is highly context dependent. There are simple usages like shifts to divide/multiply by 2. Idiomatic patterns that are clean when wrapped in good reusable and restricted macros, like for common registers manipulation in microcontrollers. And other uses that are anything from involuntary obfuscation to competition grade obfuscation.
> There are simple usages like shifts to divide/multiply by 2.
Clean code should not do that as the compiler will do that.
Clean code should just say what it wants to do, not replace that with low-level performance optimizations. (Also wasn't performance to be obtained from newer hardware?)
Faster and more complex hardware can also have bugs or back doors, as can cheaper hardware. That said, I'm not happy with buggy and untrustworthy code either.
If this is a conspiracy or a state-sponsored attack, they might have gone specifically for embedded devices and the linux kernel. Here archived from tukaani.org:
> XZ Embedded is a relatively small decompressor for the XZ format. It was developed with the Linux kernel in mind, but is easily usable in other projects too.
> *Features*
> * Compiled code 8-20 KiB
> [...]
> * All the required memory is allocated at initialization time.
This is targeted at embedded and real-time stuff. It could even be part of boot loaders in things like buildroot or RTEMS. And this means potentially millions of devices, from smart toasters or toothbrushes to satellites and missiles, most of which can't be updated with security fixes.
One scenario for malicious code in embedded devices would be a kind of killswitch which listens to a specific byte sequence and crashes when encountering it. For a state actor, having such an exploit would be gold.
One of my complaints about so many SciFi stories is the use of seemingly conventional weapons. I always thought that with so much advanced technology that weapons would be much more sophisticated. However if the next "great war" is won not by the side with the most destructive weapons but by the side with the best kill switch, subsequent conflicts might be fought with weapons that did not rely on any kind of computer assistance.
This is eerily similar to Einstein's (purported) statement that if World War III was fought with nuclear weapons, World War IV would be fought with sticks and stones. Similar, but for entirely different reasons.
I'm trying to understand why the characters in Dune fought with swords, pikes and knives.
> I'm trying to understand why the characters in Dune fought with swords, pikes and knives.
At least part of the reason is that the interaction between a lasgun and a shield would cause a powerful explosion that would kill the shooter too. No one wants that and no one will give up their shield, so they had to go back to melee weapons.
No, there is an in-world reason at least for no drones. Wikipedia:
> However, a great reaction against computers has resulted in a ban on any "thinking machine", with the creation or possession of such punishable by immediate death.
tl;dr - Machine intelligences existed in Dune history, were discovered to be secretly controlling humanity (through abortion under false pretenses, forced sterilization, emotional/social control, and other ways), then were purged and replaced with a religious commandment: "Thou shalt not make a machine in the likeness of a human mind"
No, and there is a (piloted) drone attack in the first book -- Paul is attacked by a hunter-seeker.
The reason nobody tries to use the lasgun-shield interaction as a weapon is because the resulting explosion is indistinguishable from a nuclear weapon, and the Great Convention prohibits the use of nukes on human targets.
Just the perception of having used a nuclear device would result in the House which did so becoming public enemy #1 and being eradicated by the Landsraad and Sardaukar combined.
@Potro: If you liked the movie, read the books. I don't read a lot anymore, but during sick leave I started with the first book. Didn't stop until I finished the main story, including the sequels by Frank Herbert's son about a month later. That's like... uh... nine books?
In the book Paul is attacked by an insect drone while in his room. The drone was controlled by a Harkonnen agent placed weeks in advance inside a structure of the palace, so it was also a suicide mission, as the agent had no chance to escape and would die of hunger/thirst if not found.
It's related to excessive coupling between modules and low cohesion.
There is a way for programs to implement the systemd readiness notification protocol without using libsystemd, and thus without pulling in liblzma, which is coupled to libsystemd even though the readiness notification protocol does not require any form of compression. libsystemd provides a wide range of things which have only weak relationships to each other.
There are in fact two ways, as two people independently wrote their own client code for the systemd readiness notification protocol, which really does not require the whole of libsystemd and its dependencies to achieve. (It might be more than 2 people nowadays.)
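For the curious, the readiness part of that protocol really is tiny: send a datagram containing "READY=1" to the AF_UNIX socket named in $NOTIFY_SOCKET. A minimal sketch (ignoring the protocol's other message types such as STATUS= and WATCHDOG=1):

    import os
    import socket

    def sd_notify(message: bytes = b"READY=1") -> bool:
        """Tell the service manager we're ready, without libsystemd."""
        path = os.environ.get("NOTIFY_SOCKET")
        if not path:
            return False            # not started by a service manager
        if path.startswith("@"):    # abstract-namespace socket
            path = "\0" + path[1:]
        with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
            sock.sendto(message, path)
        return True

    if __name__ == "__main__":
        print("notified:", sd_notify())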
This is only evidence that libsystemd is popular. If you want to 0wn a bunch of systems, or even one particular system but make it non-obvious, you choose a popular package to mess with.
BeOS isn't getting a lot of CVEs attached to it, these days. That doesn't mean it's good or secure, though.
It's easy to have your existing biases validated if you already dislike systemd. The reality is that systemd is much more coherently designed than its predecessors from a 'end user interface' point of view, hence why its units are largely portable etc. which was not the case for sysvinit.
The reality is that it is not systemd specifically but our modern approach to software design where we tend to rely on too much third party code and delight in designing extremely flexible, yet ultimately extremely complex pieces of software.
I mean, this is even true of the various CPU attack vectors shown in recent years: yes, speculative execution is a neat and 'clever' optimization and we rely on it for speed, but maybe that was just too clever a path to go down, and we should've stuck with simpler designs that would maybe have led to smaller speedups but a more solid foundation to build future CPU generations on.
Let's be real, sshd loading random libraries it doesn't actually need because distros patched in a kitchen sink library is inexcusable. That kitchen sink library is libsystemd and it follows the same kitchen sink design principle that systemd opponents have been criticising all along. But it's easier to accuse them of being biased rather than consider that maybe they have a point.
People hate systemd from an ethical, philosophical, and ideological standpoint.
People love systemd for the efficiency, economics, etc.
It's like ideal vs production.
That is just technical disagreements and sour grapes by someone involved in a competing format (Lzip).
There’s no evidence Lasse did anything “wrong” beyond looking for / accepting co-maintainers, something package authors are taken to task for not doing every time they have life catching up or get fed up and can’t / won’t spend as much time on the thing.
Yes, nothing points to the inventor of the format, and its maintainer for decades, having done anything with the format to make it suspect. If so, the recent backdoor wouldn't be needed.
It's good to be skeptical, but don't drag people through the mud without anything to back it up.
If a project targets a high-profile, very security sensitive project like the linux kernel from the start, as the archived tukaani web site linked above shows, it is justified to ask questions.
Also, the exploit shows a high effort, and a high level of competence, and a very obvious willingness to play a long game. These are not circumstances for applying Hanlon's razor.
Are you raising the same concerns and targeting individuals behind all other sensitive projects? No, because that would be insane.
It's weird to have one set of standards to a maintainer since 2009 or so, and different standards for others. This witch hunt is just post-hoc smartassery.
Yes, I think if a project has backdoors and its old maintainers are unable to review them, I am more critical than with normal projects. As said, compression is used everywhere and in embedded systems, it touches a lot of critical stuff. And the project went straight for that since the beginning.
And this is in part because I can not even tell for sure that he even exists. If I had met him a few times in a bar, I would be more inclined to believe he is not involved.
> As said, compression is used everywhere and in embedded systems, it touches a lot of critical stuff. And the project went straight for that since the beginning.
> You appeal to trust people and give them the benefit of doubt which is normally a good thing. But is this appropiate here?
Yes.
Without evidence to the contrary there is no reason to believe Lasse has been anything other than genuine so all you're doing is insulting and slandering them out of personal satisfaction.
And conspiratorial witch hunts are actively counter-productive, through that mode of thinking it doesn't take much imagination to figure out you are part of the conspiracy for instance.
1. An important project has an overburdened / burnt out maintainer, and that project is taken over by a persona who appears to help kindly, but is part of a campaign of a state actor.
2. A state actor is involved in setting up such a project from the start.
The first possibility is not only being an asshole to the original maintainer, but it is also more risky - that original maintainer surely feels responsible for his creation and could ring alarm bells. This is not unlikely because he knows the code. And alarm bells is something that state actors do not like.
The second possibility has the risk of the project not being successful, which would mean a serious investment in resources to fail. But that could be countered by having competent people working on that. And in that case, you don't have any real persons, just account names.
I don't think state actors would care one bit about being assholes. Organized crime black hats probably wouldn't either.
The original maintainer has said in the past, before Jia Tan's increased involvement and stepping up as a maintainer, that he couldn't put as much into the project due to mental health and other reasons [1]. Seems to fit possibility number one rather well.
If you suspect that Lasse Collin was somehow in it from the start, that'd mean the actor orchestrated the whole thing about mental health and not being able to keep up with sole maintainership. Why would they even do that if they had the project under their control already?
Of course we don't know what's really been happening with the project recently, or who's behind the backdoor and how. But IMO creating suspicions about the original maintainer's motives based entirely on speculation is also a bit assholey.
More layers of obfuscation. For example in order to be able to attribute the backdoor to a different party.
It is of course also possible that Lasse Collin is a nice real person who just has not been able to review this. Maybe he is too ill, or has to care for an ill spouse, or perhaps he is not even alive any more. Who knows him as a person (not just an account name) and knows how he is doing?
That is kinda crazy - state actors don't need to care about that level of obfuscation. From a state's perspective the situation here would be simple - hire a smart & patriotic programmer to spend ~1+ years maintaining an important package, then they slip a backdoor in. There isn't any point in making it more complicated than that.
They don't even need plausible deniability; groups like the NSA have been caught spying on everyone and it doesn't hurt them all that much. The publicity isn't ideal. But it only confirms what we already knew - turns out the spies are spying on people! Who knew.
There are probably dozens if not hundreds of this sort of attempt going on right now. I'd assume most don't get caught, or go undetected for many years, which is good enough. If you have government money in the budget, it makes sense to go with large-volume low-effort attempts rather than try some sort of complex good-cop-bad-cop routine.
You're correct about a great many things.
State actors do things in broad-daylight, get exposed, and it's no fuss to them at all.
But that depends on which "sphere of influence" you live in.
Russia and China have made major changes to key parts of their critical infrastructure based on revelations that might only result in a sub-committee in US Congress.
But to establish a significant contributor to a key piece of software, not unlike xz, is an ideal position for a state actor.
The developer doesn't even need to know who/why, but they could be financially/ideologically aligned.
This is what intelligence officers do. They manage real human assets who exist naturally.
But to have someone long-established as an author of a project is the exact type of asset they want. Even if they push the code, people immediately start considering how it could have been done by someone else.
Yes, it's conspiratorial/paranoid thinking but there's nothing more paranoid than state intelligence trade craft.
It makes me wonder. Is it possible to develop a robust Open Source ecosystem without destroying the mental health of the contributors? Reading his posting really made me feel for him. There are exceedingly few people who are willing to dedicate themselves to developing critical systems in the first place. Now there is the burden of extensively vetting every volunteer contributor who helps out. This does not seem sustainable. Perhaps users of open source need to contribute more resources/money to the software that makes their products possible.
False dichotomy much? It doesn't have to be a motivated state actor pulling the strings from the beginning. It could also just be some guy, who decided he didn't care anymore and either wanted to burn something or got paid by someone (possibly a state actor) to do this.
Recall that the original maintainer had mental health issues and other things that likely led to the perceived need to bring on someone to help maintain xz.
This brings up some integrity questions about you and other people bringing forth accusations in order to make the original maintainer feel pressure to bring on someone else to replace the one that inserted a backdoor after several years of ostensibly legitimate commits.
Hopefully this helps you see that these sorts of accusations are a slippery slope and unproductive. Heck, you could then turn around and accuse me of doing something nefarious by accusing you.
I don’t stalk all of your social media posts, so from my perspective I don’t see any of the solutions you’ve posted elsewhere — which brings up a good point to keep in mind: none of us see the complete picture (or can read minds to know what someone else really thinks).
The possibility can be kept in mind and considered even if it isn’t being actively discussed. I think in this case, most people think he is not malicious — and feel that unless new compelling evidence to show otherwise appears, potentially starting a harmful rumor based on speculation is counterproductive.
You might not be trying to start a rumor, but other people could when they try to answer the questions from a place of ignorance — if you take a look at the comments on a gist summarizing the backdoor, there are quite a few comments by z-nonymous that seem to be insinuating that other specific GitHub users are complicit in things by looking at their commits in various non-xz repositories.
No one is running cover, just that most information so far points to the original maintainer not knowing that the person brought on to help out had ulterior motives, and likely wasn't even who they purported to be. If you were running an open source project and facing burnout as the sole maintainer, I'd imagine you'd exercise perfect judgement and do a full background check on the person offering to help? I think many of us would like to believe we'd do better, but the reality is, most of us would have fallen for the same trick. So now imagine having to deal with the fallout not just on the technical side, but also the never-ending questions surrounding your professional reputation that people just keep bringing up; it sounds like a recipe for depression, possibly even suicidal thoughts.
I am running an open source project. Yes if someone was eager to help and was making changes to things that involved security, I would make them doxx themselves and submit to a background check
Well, good for you being one of the few exceptions who would make everyone submit themselves to a proper background check (presumably also covering the cost) before giving any write/commit access to the repo. That’s more than even most large open source projects do before giving access.
Thanks, but you assume too much. I outlined the circumstances under which I would require a background check, so you might want to reread. Any other questions?
As I understand it Jia was contributing things like tests, not making changes that involve "security". They just turned the commit access, and eventual ability to make releases on the xz GitHub after "earning" more trust (plus access to the GitHub pages hosted under the tukaani domain), into something they could use to insert a backdoor.
No questions. Anyone can become a victim to social engineering — I believe the short answer to your question about all the downvotes is that a lot of people recognize how they could have fallen for something similar, and empathize that Lasse is likely now going through a rather difficult time.
I have no question about the downvotes, bud. You're very verbose. Still not sure why you revived an account you haven't commented with in 6 years just to run cover. I find you to be a highly suspicious individual and I really have nothing more to say to you.
I suppose I think verbose-ness will help people see the other side of things. I think I was also trying to convince myself that you aren’t just into conspiracy theories, but given that you’re now accusing me of being suspicious… :shrug: it did come full circle where in my first comment I said you would start accusing me. I guess neither of us have anything more to say to each other because we are both too locked into our own beliefs.
It's possible that he was intentionally pressured and his mental health made bad or worse by the adversary to increase stress. The adversary would then propose to help them reduce the stress.
It argues the topic pretty well: xz is unsuitable for long-term archival. The arguments are in-depth and well worded. Do you have any argument to the contrary beyond "sour grapes"?
I can understand wanting your project to succeed, it's pretty natural and human, but it's flagrant that Antonio had a lot of feels about the uptake of xz compared to lzip, as both are container formats around raw lzma data streams and lzip predates xz by 6 months. His complaint article about xz is literally one of the "Introductory links" of lzip.
Neither is lzip since it doesn't contain error correction codes. You can add those with an additional file (to any archive) e.g. via par2 but then most of the points in the linked rant become irrelevant.
Collateral damage yes, but it seems like he is currently away from the internet for an extended time. So it could be that Github needed to suspend his account in order to bypass things that he would otherwise have to do/approve? Or to preempt the possibility that his account was also compromised and we don't know yet.
No. I mean that the link you shared is an opinion piece about the xz file format, and those opinions are fully unrelated to today's news and only serve to further discredit Lasse Collin, who for all we know has been duped and tricked by a nation state, banned by github and is having a generally shitty time.
There may be some suboptimal things about security of the XZ file format, I don't know.
I bet you there are less than optimal security choices in your most cherished piece of software as well.
This thread is about an exploit that does not rely on any potential security problems in the DESIGN of the xz FORMAT. Therefore your point, even if valid as a general one, is not really relevant to the exploit we're discussing.
Further, there's some proof needed that any potential suboptimal aspects of the security design of the xz FORMAT were designed so that they could be exploited later, rather than arising simply because no programmer is an expert on every aspect of security ever. I mean you could be the most security-conscious programmer and your chain could still be compromised.
Security today is such a vast field and it takes so little to get you compromised that proclaiming anything 'secure design' these days is practically impossible.
I bet you an audit of lzip would find plenty of security issues, would those be intentional?
1) Are there no legit code reviews from contributors like this? How did this get accepted into main repos while flying under the radar? When I do a code review, I try to understand the actual code I'm reviewing. Call me crazy I guess!
2) Is there no legal recourse to this? We're talking about someone who managed to root any linux server that stays up-to-date.
> 2) Is there no legal recourse to this? We're talking about someone who managed to root any linux server that stays up-to-date.
Any government which uses GNU/Linux in their infrastructure can pitch this as an attempt to backdoor their servers.
The real question is: will we ever even know who was behind this? If it was some mercenary hacker intending to resell the backdoor, maybe. But if it was someone working with an intelligence agency in US/China/Israel/Russia/etc, I doubt they'll ever be exposed.
Reflecting on the idea of introducing a validation structure for software contributions, akin to what RPKI does for BGP routing, I see significant potential to enhance security and accountability in software development.
Such a system could theoretically bring greater transparency and responsibility, particularly in an ecosystem where contributions come from all corners.
Implementing verifiable identity proofs for contributors might be challenging, but it also presents an opportunity to bolster security without compromising privacy and the freedom to contribute under pseudonyms.
The accountability of those accepting pull requests would also become clearer, potentially reducing the risk of malicious code being incorporated.
Of course, establishing a robust validation chain for software would require the commitment of everyone in the development ecosystem, including platforms like GitHub. However, I view this not as a barrier but as an essential step towards evolving our approach to security and collaboration in software development.
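Some of the building blocks already exist in stock git; for example, commit signatures can be checked mechanically (a sketch only; a valid signature proves possession of a key, not the identity or intent behind it):

    import subprocess
    import sys

    def signature_ok(commit: str) -> bool:
        """True if git can verify a GPG/SSH signature on the commit."""
        result = subprocess.run(["git", "verify-commit", commit],
                                capture_output=True, text=True)
        return result.returncode == 0

    if __name__ == "__main__":
        rev = sys.argv[1] if len(sys.argv) > 1 else "HEAD"
        status = "good signature" if signature_ok(rev) else "no/invalid signature"
        print(f"{rev}: {status}")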
The actual injection code was never in the repo. The blobs were hidden as lzma test files.
So your review would need to guess, from 2 new test files, that those are, decompressed, a backdoor, and that they could be injected by build code which was never in the git history.
Ok, go ahead and scrutinize those files without looking at the injection code that was never in the repo? Can you find anything malicious? Probably not - it looks like random garbage which is what it was claimed to be.
"Jia Tan" was not a contributor, but a maintainer of the project. The key point here is that this is a multi-year project of infiltrating the xz project and gaining commit access.
In a large tech company (including ones I have worked at), sometimes you have policy where every change has to be code reviewed by another person. This kind of stuff isn't possible when the whole project only has 1-2 maintainers. Who's going to review your change other than yourself? This is the whole problem of OSS right now that a lot of people are bringing up.
I maintain a widely used open source project myself. I would love it if I could get high quality code review for my commits similar to my last workplace lol, but very very few people are willing to roll up their sleeves like that and work for free. Most people would just go to the Releases page and download your software instead.
>How did this get accepted into main repos while flying under the radar? When I do a code review, I try to understand the actual code I'm reviewing. Call me crazy I guess!
And? You never make any mistakes? Google "underhanded C contest".
It's hardly surprising given that parsing is generally considered to be a tricky problem. Plus, it's a 15-year-old project that's widely used. 750 commits is nothing to sneer at. No wonder the original maintainer got burned out.
For the duration of a major release, up until ~x.4, pretty much everything from upstream gets backported with a delay of 6-12 months, depending on how conservative about change the RHEL engineer maintaining this part of the kernel is.
After ~x.4 things slow down and only "important" fixes get backported but no new features.
After ~x.7 or so different processes and approvals come into play and virtually nothing except high severity bugs or something that "important customer" needs will be backported.
Sadly, the 8.6 and 9.2 kernels are the exception to this, mainly because of OpenShift Container Platform and FedRAMP requirements.
The goal is that 8.6, 9.2 and 9.4 will have releases at least every two weeks.
Maybe soon all Z streams will have a similar release cadence to keep up with the security expectations, but they will keep expectations very similar to those you outlined above.
I imagine it might be easier to just compromise a weakly protected account than to actually put in a two-year-long effort with real contributions. If we mandated MFA for all contributors who contribute to these really important projects, then we could know with greater certainty whether it was really a long con vs. a recently compromised account.
For some random server, sure. For a state sponsored attack? Having an embedded exploit you can use when convenient, or better yet an unknown exploit affecting every linux-based system connected to the internet that you can use when war breaks out - that's invaluable.
Having one or two people on payroll to occasionally add commits to a project isn't exactly that expensive if it pays off. There are ~29,000,000 US government employees (federal, state and local). Other countries like China and India have tens of millions of government employees.
Even if they contract it out, at $350/hr (which is not a price that would raise any flags), that is less than $750k. Even with a fancy office, a couple of laptops and 5' monitors, this is less than a day at the bombing range or a few minutes keeping an aircraft carrier operational.
Even a team of 10 people working on this - the code and social aspect - would be a drop in the bucket for any nation-state.
I find it funny how MFA is treated as if it would make account takeover suddenly impossible. It's just a bit more work, isn't it? And a big loss in convenience.
I'd much rather see passwords entirely replaced by key-based authentication. That would improve security. Adding 2FA to my password is just patching a fundamentally broken system.
Customer service at one of my banks has an official policy of sending me a verification code via email that I then read to them over the phone, and that's not even close to the most "wrong" 2FA implementation I've ever seen. Somehow that institution knows what a YubiKey is, but several major banks don't.
I'm a security consultant in the financial industry. I've literally been involved in the decision making on this at a bank. Banks are very conservative, and behave like insecure teenagers. They won't do anything bold, they all just copy each other.
I pushed YubiKey as a solution and explained in detail why SMS was an awful choice, but they went with SMS anyway.
It mostly came down to cost. SMS was the cheapest option. YubiKey would involve buying and sending the keys to customers, and then having the pain/cost of supporting them. There was also the feeling that YubiKeys were too confusing for customers. The nail in the coffin was "SMS is the standard solution in the industry" plus "If it's good enough for VISA it's good enough for us".
Interesting. I assumed a lot of client software for small banks was vendored - I know that's the case for brokerages. Makes it all the weirder that they all imitate each other.
Here's the thing about SMS: your great aunt who doesn't know what a JPEG is, knows what a text is. Ok, she might not fully "get it" but she knows where to find a text message in her phone. My tech-literate fiancée struggles to get her YubiKey to work with her phone, and I've tried it with no more luck than she's had. YubiKeys should be supported but they're miles away from being usable enough to totally supplant other 2FA flows.
I'd guess part of the reason is that customers would blame the bank when their YubiKey doesn't work, which would become a nuisance for them as much as the YubiKey's usability issues are a nuisance for the customer.
I mean your employer wasn't wrong. YubiKeys ARE way too confusing for the average user, way too easy to lose, etc. Maybe have it as an option for power users, but they were right that it would be a disastrous default.
Financial institutions are very slow to adopt new tech. Especially tech that will inevitably cost $$$ in support hours when users start locking themselves out of their accounts. There is little to no advantage to being the first bank to implement YubiKey 2FA. To a risk-averse org, the non-zero chance of a botched rollout or displeased customers outweighs any potential benefit.
A friend's bank, hopefully not the one I use, only allows a password of 6 digits. Yes, you read that right, 6 fucking digits to log in. I gave him the advice to run away from that shitty bank.
Did this bank start out as a "telephone bank"? One of the largest German consumer banks still does this because they were the first "direct bank" without locations and typing in digits on the telephone pad was the most secure way of authenticating without telling the "bank teller" your password. So it was actually a good security measure but it is apparently too complicated to update their backend to modern standards.
Nope, I read The Register (UK based) and they've had scandals from celebrities having their confidential SMS messages leaked; SMS spoofing; I think they even have SIM cloning going on every now and then in UK and some European countries. (since The Register is a tech site, my recollection is some carriers took technical measures to prevent these issues while quite a few didn't.)
I don't think it's a thing that happens that often in the UK etc.; but then, it doesn't happen that frequently in the US either. It's just a thing that can potentially happen.
It's also been a problem in Australia: Optus (the 2nd biggest telco) used to allow number porting or activating a SIM against an existing account with a bare minimum of detail, like a name, address and date of birth. If you had those details for a target you could clone their SIM and crack any SMS-based MFA.
I don’t know about other parts, but here in France SMS is a shitshow. I regularly fail to receive them even though I know I have good reception.
This happened the other day while I was on a conference call with perfect audio and video using my phone’s mobile data.
A few weeks back, a shop that sends out an SMS to inform you the job's done told me, when I complained about not hearing from them, that this is usually hit and miss.
Many single radio phones can either receive sms/calls, or transmit data.
My relative owns such a device and cannot use internet during calls or receive/make calls during streaming like YT video playback.
In my case this is an iPhone 14 pro. I'm pretty sure I can receive calls while using data, since I often look things up on the internet while talking to my parents.
And, by the way, the SMS in question never arrived. I don't know if there's some kind of timeout happening, and the network gives up after a while. Some 15 years ago I remember getting texts after an hour or two if I only had spotty reception. This may of course have changed in the meantime, plus this is a different provider.
SMS is not E2E encrypted, so for all intents and purposes it's just a plain text message that can be (and has been) snooped. Might as well just send plaintext emails as well.
I recently had an issue with a SIM card and went to a phone store that gave me a new one and disabled the old. They're supposed to ask for ID, but often don't bother. This is true for pretty much every country. Phone 2FA is simply completely insecure.
Banks are in a tough spot. Remember, banks have you as a customer, they also have a 100 year old person who still wants to come to the branch in person as a customer. Not everyone can grapple with the idea of a Yubikey, or why their bank shouldn't be protecting their money like it did in the past.
The problem is that the bank will automatically enable online access and SMS-confirmed transfers for that 100 year old person who doesn't even know how to use Internet.
Not actually. Even if you enabled a passkey, you can still log in to their phone app via SMS. So it is not more secure. People who know how to do SMS attacks certainly know how to install a mobile app. And BofA gave their customers a false assurance.
yeah someone replied to one of my comments about adding MFA that an attacker can get around all that simply by buying the account from the author. I was way too narrowly focused on the technical aspects and was completely blind to other avenues like social engineering, etc.
>I'd much rather see passwords entirely replaced by key-based authentication
I've never understood how key-based systems are considered better. I understand the encryption angle; nobody is compromising that. But now I have a key I need to personally shepherd? Where do I keep it, and my backups, and what is the protection on those places? How many local copies, how many offsite? And I still need a password to access/use it, but with no recourse should I lose or forget it. How am I supposed to remember that? It's all just kicking the same cans down the same roads.
Passkeys are being introduced right now in browsers and popular sites as an MFA option, but I think the intention is that they will grow and become the main factor in the future.
I liked the username, password and TOTP combination. I could choose my own password manager, and TOTP generator app, based on my preferences.
I have a feeling this won't hold true forever. Microsoft has their own authenticator now, Steam has another one, Google has their "was this you?" built into the OS.
Monetization comes next? "View this ad before you login! Pay 50c to stay logged in for longer?"
MS Entra ID's (formerly Azure AD) FIDO2 implementation only allows a select list of vendors. You need a certification from FIDO ($,$$$), you need to have an account that can upload to the MDS metadata service, and you need to talk to MS to see if they'll consider adding you to the list.
It's not completely closed, but in practice no one on that list is a small independent open source project, those are all the kind of entrenched corporate security companies you'd expect
But the way it is designed, you can require a certain provider, and you can bet at least some sites will start requiring attestation from Google and or Apple.
Do they do attestation by default? I thought for Apple at least that was only a feature for enterprise managed devices (MDM). Attestation is also a registration-time check, so doesn’t necessarily constrain where the passkey is synced to later on.
I couldn’t imagine trying to train the general public to use mTLS and deploy that system.
I’m not even sure it is difficult. Most people I’ve talked to in tech don’t even realize it is a possibility. Certificates are “complicated” as they put it.
> Google has their "was this you?" built into the OS.
Not only that, but it's completely impossible to disable or remove that functionality or even make TOTP the primary option. Every single time I try to sign in, Google prompts my phone first, giving me a useless notification for later, and I have to manually click a couple of buttons to say "no I am not getting up to grab my phone and unlock it for this bullshit, let me enter my TOTP code". Every single time.
Don't passkeys give the service a signature to prove what type of hardware device you're using? e.g. don't they provide a way for the server to check whether you are using a software implementation? It's not really open if it essentially has a type of DRM built in.
You're thinking of hardware-backed attestation, which provides a hardware root of trust. I believe passkeys are just challenge-response (using public key cryptography). You could probably add some sort of root of trust (for example, have the public key signed by the HSM that generated it) but that would be entirely additional to the passkey itself.
Passkeys do have the option of attestation, but the way Apple at least do them means Apple users won't have attestation, so most services won't require attestation.
KeepassXC is working on supporting them natively in software, so you would not need to trust big tech companies, unless you are logging into a service that requires attestation to be enabled.
Password managers are adding support (as in they control the keys) and I've used my yubikeys as "passkeys" (with the difference that I can't autofill the username).
It's a good spec. I wish more people who spread FUD about it being a "tech-giant" only thing would instead focus on the productive things like demanding proper import/export between providers.
You realise that the second your password manager has it, it's no longer MFA but just one-factor authentication with extra steps, right?
A password manager turns something you know into something you own. If the something you own is also in the password manager itself… it's the same as requiring extra-long passwords.
This is a state sponsored event.
Pretty poorly executed though, as they were tweaking and modifying things in their own and other tools after the fact.
As a state sponsored project. What makes you think this is their only project and that this is a big setback?
I am paranoid enough myself to think yesterday's meeting went like:
"team #25 has failed/been found out. Reallocate resources to the other 49 teams."
As I said recently in a talk I gave, 2FA as implemented by PyPI or GitHub is meaningless when, in fact, all actions are performed via tokens that never expire and are saved inside a .txt file on disk.
In pypi to obtain a token that is limited in scope you must first generate an unlimited token.
True story.
In GH you can generate a limited one, but it's not really clear what the permissions actually mean, so it's trial and error… which means most people will get tired and grant random stuff just to get things working.
I didn’t know that about pypi but github has seemed ok to me. I’ve also implemented my own scoped authentication systems so even if they’re not perfect I know it can be done
They might not have been playing the long con; maybe they were approached by actors willing to pay them a lot of money to try and slip in a back door. I'm sure a deep dive into their code contributions would clear that up for anyone familiar with the code base and with some free time.
They did fuck up quite a bit though.
They injected their payload before they checked if oss-fuzz or valgrind or ... would notice something wrong.
That is sloppy and should have been anticipated and addressed BEFORE activating the code.
Anyway. This team got caught. What are the odds that this was the only project / team / library that the state actor behind this decided to attack?
Doesn't it mandate it for everyone? I don't use it anymore and haven't logged in since forever, but I think I got a series of e-mails that it was being made mandatory.
It will soon. I think I have to sort it out before April 4. My passwords are already >20 random characters, so I wasn't going to do it until they told me to.
If you are using pass to store those, check out pass-otp and browserpass, since GitHub still allows TOTP for MFA. pass-otp is based on oathtool, so you can do it more manually too if you don't use pass.
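For anyone who hasn't used those, a minimal sketch of the manual route (the seed and entry name below are made-up examples, and pass-otp's subcommands may differ slightly between versions):

    # current 6-digit code from a base32 TOTP seed (dummy seed)
    oathtool --totp -b JBSWY3DPEHPK3PXP

    # with pass-otp: store the otpauth:// URI once, then ask for codes
    pass otp insert github    # prompts for the otpauth://totp/... URI
    pass otp github           # prints the current code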
Most people will have to sync their passwords (generally strong and unique, given that it's for github) to the same device where their MFA token is stored, rendering it (almost) completely moot, but at a significantly higher risk of permanent access loss (depending on what they do with the reset codes, which, if compromised, would also make MFA moot.) (a cookie theft makes it all moot as well.)
The worst part is that people think they're more protected, when they're really not.
Bringing everyone up to the level of "strong and unique password" sounds like a huge benefit. Even if your "generally" is true, which I doubt, that leaves a lot of gaps.
Doesn't help that a lot of companies still just allow anyone with access to the phone number to gain access to the account (via customer support or automated SMS-based account recovery).
SMS 2FA is the devil. It’s the reason I haven’t swapped phone numbers even though I get 10-15 spam texts a day. The spam blocking apps don’t help and intrude on my privacy.
Browsers don't save the TOTP seed and auto fill it for you for one, making it much less user friendly than a password in practice.
The main problem I have with MFA is that it gets used too frequently for things that don't need that much protection, which from my perspective is basically anything other than making a transfer or trade in my bank/brokerage. Just user-hostile requiring of manual action, including finding my phone that I don't always keep on me.
It's also often used as a way to justify collecting a phone number, which I wouldn't even have if not for MFA.
You know Google Authenticator doesn't matter, right? You know you could always copy your TOTP seeds since day one, regardless of which auth app or its features or limits, right? You know that a broken device does not matter at all, because you have other copies of your seeds just like your passwords, right?
When I said they are just another password, I was neither lying nor in error. I presume you can think of all the infinite ways that you would keep copies of a password so that when your phone or laptop with keepassxc on it breaks, you still have other copies you can use. Well when I say just like a password, that's what I mean. It's just another secret you can keep anywhere, copy 50 times in different password managers or encrypted files, print on paper and stick in a safe, whatever.
Even if some particular auth app does not provide any sort of manual export function (I think google auth did have an export function even before the recent cloud backup, but let's assume it didn't), you can still just save the original number the first time you get it from a qr code or a link. You just had to know that that's what those qr codes are doing. They aren't single-use, they are nothing more than a random secret which you can keep and copy and re-use forever, exactly the same as a password. You can copy that number into any password manager or plain file or whatever you want just like a password, and then use it to set up the same totp on 20 different apps on 20 different devices, all working at the same time, all generating valid totp codes at the same time, destroy them all, buy a new phone, retrieve any one of your backup keepass files or printouts, and enter them into a fresh app on a fresh phone and get all your totp fully working again. You are no more locked out than by having to reinstall a password manager app and access some copy of your password db to regain the ordinary passwords.
The only difference from a password is, the secret is not sent over the wire when you use it, something derived from it is.
Google Authenticator's particular built-in cloud copy, or lack of one, doesn't matter at all, and frankly I would not actually use that particular feature or that particular app. There are lots of totp apps on all platforms and they all work the same way: you enter the secret, give it a name like your bank or whatever, select which algorithm (it's always the default, you never have to select anything), and instantly the app starts generating valid totp codes for that account the same as your lost device.
Aside from saving the actual seed, let's say you don't have the original qr code any more (you didn't print it or screenshot it or right-click save image?). There is yet another emergency recovery, which is the 10 or 12 recovery passwords that every site gives you when you first set up totp. You were told to keep those. They are special single-use passwords that get you in without totp, but each one can only be used once. So, even if you are a complete space case and somehow don't have any other copies of your seeds in any form, including not even simple printouts or screenshots of the original qr code, STILL no problem. You just burn one of your 12 single-use emergency codes, log in, disable and re-enable totp on that site, get a new qr code and a new set of emergency codes. Your old totp seed and old emergency codes no longer work, so throw those out. This time, not only keep the emergency codes, also keep the qr code, or more practically, just keep the seed value in the qr code. It's right there in the url in the qr code. Sometimes they even display the seed value itself in plain text so that you can cut & paste it somewhere, like into a field in keepass etc.
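As a concrete illustration of that last point (the filename and URI below are invented), decoding any enrollment QR code with, say, zbarimg from the zbar package shows it's just an otpauth:// URI with the seed in plain sight:

    zbarimg --raw enrollment-qr.png
    # prints something along the lines of:
    #   otpauth://totp/ExampleSite:alice?secret=JBSWY3DPEHPK3PXP&issuer=ExampleSite
    # the secret= parameter is the seed; paste it into any TOTP app or password manager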
In fact keepass apps on all platforms will not only store the seed value but also display the current totp for it, just like a totp app does. But a totp app is more convenient.
And for proper security, you technically shouldn't store both the password and the totp seed for an account in the same place, so that if someone gains access to one, they don't gain access to both. That's inconvenient but has to be said just for full correctness.
I think most sites do a completely terrible job of conveying just what totp is when you're setting it up. They tell you to scan a qr code but they kind of hide what that actually is. They DO all explain about the emergency codes, but really those emergency codes are kind of stupid. If you can preserve a copy of the emergency codes, then you can just as easily preserve a copy of the seed value itself exactly the same way, and then what's the point of a handful of single-use emergency passwords when you can just have your normal fully functional totp seed?
Maybe one use for the emergency passwords is you could give them to different loved ones instead of your actual seed value?
Anyway if they just explained how totp basically works, and told you to keep your seed value instead of some weird emergency passwords, you wouldn't be screwed when a device breaks, and you would know it and not be worried about it.
Now, if, because of that crappy way sites obscure the process, you currently don't have your seeds in any re-usable form, and also don't have your emergency codes, well then you will be F'd when your phone breaks.
But that is fixable. Right now while it works you can log in to each totp-enabled account, and disable & reenable totp to generate new seeds, and take copies of them this time. Set them up on some other device just to see that they work. Then you will no longer have to worry about that.
My original, correct, message was perfectly short.
You don't like long, full, detailed explanations, and you ignore short explanations. Pick a lane!
A friend of mine a long time ago used to have a humorous classification system, that people fell into 3 groups: The clued, The clue-able, The clue-proof.
Some people already understand a thing. Some people do not understand a thing, but CAN understand it. Some people exist in a force bubble of their own intention that actively repels understanding.
I see that in your classification system an important entry is missing. The ones who disagree.
In your quest to convince me you forgot to even stop to ponder if you're right at all. And in my view, you aren't.
Perhaps the problem isn't that I don't understand you. Perhaps I understand you perfectly well but I understand even more, to realise that you're wrong :)
This is a silly thing to argue about but hey I'm silly so let's unpack your critique of the classification system
There is no 4th classification. It only speaks of understanding not agreeing.
Things that are matters of opinion may still be understood or not understood.
Whether a thing is a matter of opinion or a matter of fact, both sides of a disagreement still slot into one of those classes.
If a thing is a matter of opinion, then one of the possible states is simply that both sides of a disagreement understand the thing.
In this case, it is not a matter of opinion, and if you want to claim that I am the one who does not understand, that is certainly possible, so by all means, show how. What fact did I state that was not true?
Keep trying soldier. You never know. (I mean _I_ know, but you don't. As far as you know, until you go find out, I might be wrong.)
Whatever you do, don't actually go find out how it works.
Instead, continue avoiding finding out how it works, because holy cow after you've gone this far... it's one thing to just be wrong about something, everyone always has to start out not understanding something, that's no failing, but to have no idea what you're talking about yet try to argue about it, in error the whole time..., I mean they (me) were such an insufferable ass already trying to lecture YOU, but for them (me) to turn out to have been simply correct in every single fact they spoke, without even some technicality or anything to save a little face on? Absolutely unthinkable.
Definitely better to save yourself from that by just never investigating.
My original statement was only that this is not a 2fa problem, which was and still is true.
The fact that you did not know this does not change this fact.
I acknowledged that web sites don't explain this well, even actively hide it. So it's understandable not to know this.
But I also reminded that this doesn't actually matter because you WERE also given emergency recovery passwords, and told to keep them, and told why, and how important they were.
You were never at risk of being locked out from a broken device EVEN THOUGH you didn't know about saving the seed values, UNLESS you also discarded the emergency codes, which is not a 2fa problem, it's an "I didn't follow directions" problem.
And even if all of that happened, you can still, right now, go retroactively fix it all, and get all new seed values and save them this time, as long long as your one special device happens to be working right now. It doesn't matter what features google authenticator has today, or had a year ago. It's completely and utterly irrelevant.
My premise remains correct and applicable. Your statement that 2fa places you at risk was incorrect. You may possibly be at risk, but if so you did that to yourself; 2fa did not do that to you.
> But I also reminded that this doesn't actually matter because you WERE also given emergency recovery passwords, and told to keep them, and told why, and how important they were.
Ah yes those… the codes I must go to a public library to print, on a public computer, public network and public printer. I can't really see any problem with the security of this.
And then I must never forget where I put that very important piece of paper. Not in 10 years and after moving 3 times…
You can save a few bits of text any way you want. You can write them in pencil if you want just as a backup against google killing your google drive or something. Or just keep them in a few copies of a password manager db in a few different places. It's trivial.
What in the world is this library drama?
No one is this obtuse, so your arguments are most likely disingenuous.
But if they are sincere, then find a nephew or someone to teach you how your computer works.
Libraries? Remembering something for 10 years? Moving? Oh the humanity!
Yes, you can keep them on the same device if you choose to.
Or not. You decide how much effort you want and where you want to place the convenience vs security slider.
Yes, if you keep both factors not only on the same device but in the same password manager, then both factors essentially combine into nothing but a longer password.
I did say from the very first, that the seeds are nothing other than another password.
Except there is still at least one difference which I will say for at least the 3rd time... the totp secret is not transmitted over the wire when it is used, the password is. That is actually a significant improvement all by itself even if you do everything else the easy less secure way.
And you do not have to store the seeds the convenient, less secure way. You can have them in a different password app with a different master password on the same device, or on separate devices, or in separate physical forms. You can store them any way you want, securely, or less securely.
The point is that even while opting to do things all the very secure way, you are still not locked out of anything when a single special device breaks, because you are not limited to only keeping a single copy of the seeds or the emergency passwords in a single place like on a single device or a single piece of paper.
You are free to address any "but what about" questions you decide you care about in any way you feel like.
The only way you were ever screwed is by the fact that the first time you set up 2fa for any site, most sites don't explain the actual mechanics but just walk you through a sequence of actions to perform without telling you what they actually did, and so at the end of following those directions you ARE left with the seeds only stored in a single place. And in the particular case of Google Authenticator, stored in a more or less inaccessible place in some android sqlite file you can't even manually get to without rooting your phone, probably. And you were never even told about the seed value at all. You were given those emergency passwords instead.
That does leave you with a single precious device that must not break or be lost. But the problem is only a combination of those bad directions given by websites, and the limitations of one particular totp app when that app didn't happen to display or export or cloud-backup the seeds until recently.
Even now Google's answer is a crap answer, because Google can see the codes unencrypted on their server, and Google can kill your entire Google account at any time and you lose everything: email, drive, everything, instantly, with no human to argue with. That is why I said even today I still would not use Google Authenticator for totp.
Except even in that one worst case, you still had the emergency passwords, which you were always free to keep in whatever way works for you. There is no single thing you must or must not do, there is only what kinds of problems are the worst problems for you.
Example: if what you are most concerned about is someone else getting ahold of a copy of those emergency passwords, then you want to have very few copies of them and they should be off-line and inconvenient to access. IE a printed hard copy in a safe deposit box in switzerland.
If what you are most concerned about is accidentally destroying your life savings by losing the password, and the investment site has no further way to let you prove your ownership, then keep 10 copies in 10 different physical forms and places so that no matter what happens, you will always be able to access at least one of them. One on Google Drive, one on someone else's Google Drive in case yours is killed, one on OneDrive, one on paper at home, one on paper in your wallet, one on your previous phone that you don't use but still works, etc etc.
You pick whichever is your biggest priority, and address that need however you want, from pure convenience to pure security and all possible points in between. The convenient way has security downsides. The secure way has convenience downsides. But you are not forced to live with the downsides of either the convenient way or the secure way.
> Why opposed to MFA? Source code is one of the most important assets in our realm.
Because if you don't use weak passwords MFA doesn't add value. I do recommend MFA for most people because for most people their password is the name of their dog (which I can look up on social media) followed by "1!" to satisfy the silly number and special character rules. So yes please use MFA.
But if your (like my) passwords are 128+bits out of /dev/random, MFA isn't adding value.
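For illustration only, generating that kind of password is a one-liner:

    # ~128 bits of randomness, base64-encoded so it can live in a password manager
    head -c 16 /dev/random | base64
    # or equivalently
    openssl rand -base64 16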
If you have a keylogger, they can also just take your session cookie/auth tokens or run arbitrary commands while you're logged in. MFA does nothing if you're logging into a service on a compromised device.
Keyloggers can be physically attached to your keyboard. There could also be a vulnerability in the encryption of wireless keyboards. Certificate-based MFA is also phishing resistant, unlike long, random, unique passwords.
There are plenty of scenarios where MFA is more secure than just a strong password.
These scenarios are getting into some Mission Impossible level threats.
Most people use their phones most of the time now, meaning the MFA device is the same device they're using.
Of the people who aren't using a phone, how many are using a laptop with a built in keyboard? It's pretty obvious if you have a USB dongle hanging off your laptop.
If you're using a desktop, it's going to be in a relatively secure environment. Bluetooth probably doesn't even reach outside. No one's breaking into my house to plant a keylogger. And a wireless keyboard seems kind of niche for a desktop. It's not going to move, so you're just introducing latency, dropouts, and batteries into a place where they're not needed.
Long, random, unique passwords are phishing resistant. I don't know my passwords to most sites. My web browser generates and stores them, and only uses them if it's on the right site. This has been built in functionality for years, and ironically it's sites like banks that are most likely to disable auto fill and require weak, manual passwords.
I mean, both can be true at the same time. I have to admit that I only use MFA when I'm forced to, because I also believe my strong passwords are good enough. Yet I can still acknowledge that MFA improves security further and in particular I can see why certain services make it a requirement, because they don't control how their users choose and use their passwords and any user compromise is associated with a real cost, either for them like in the case of credit card companies or banks, or a cost for society, like PyPI, Github, etc.
I don't think phishing is such an obscure scenario.
The point is also that you as an individual can make choices and assess risk. As a large service provider, you will always have people who reuse passwords, store them unencrypted, fall for phishing, etc. There is a percentage of users that will get their account compromised because of bad password handling which will cost you, and by enforcing MFA you can decrease that percentage, and if you mandate yubikeys or something similar the percentage will go to zero.
> I don't think phishing is such an obscure scenario.
For a typical person, maybe, but for a tech-minded individual who understands security, data entropy and what /dev/random is?
And I don't see how MFA stops phishing - it can get you to enter a token like it can get you to enter a password.
I'm also looking at this from the perspective of an individual, not a service provider, so the activities of the greater percentage of users is of little interest to me.
> That's why I qualified it with "certificate-based". The private key never leaves the device
Except that phishing doesn't require the private key - it just needs to echo back the generated token. And even if that isn't possible, what stops it obtaining the session token that's sent back?
From my understanding, FIDO isn't MFA though (the authenticator may present its own local challenge, but I don't think the remote party can mandate it).
There's also the issue of how many sites actually use it, as well as how it handles the loss of or inability to access private keys etc. I generally see stuff like 'recovery keys' being a solution, but now you're just back to a password, just with extra steps.
The phisher can just pass on whatever you sign, and capture the token the server sends back.
Sure, you can probably come up with some non-HTTPS scheme that can address this, but I don't see any site actually doing this, so you're back to the unrealistic scenario.
No, because the phisher will get a token that is designated for, say, mircos0ft.com which microsoft.com will not accept. It is signed with the user's private key and the attacker cannot forge a signature without it.
A password manager is also not going to fill in the password on mircos0ft.com so is perfectly safe in this scenario. You need a MitM-style attack or a full on client compromise in both cases, which are vulnerable to session cookie exfiltration or just remote control of your session no matter the authentication method.
If I were trying to phish someone, I wouldn't attack the public key crypto part, so how domains come into play during authentication doesn't matter. I'd just grab the "unencrypted" session token at the end of the exchange.
Even if you somehow protected the session token (sounds dubious), there's still plenty a phisher could do, since it has full MITM capability.
Session keys expire and can be scoped to do anything except reset password, export data, etc…that’s why you’ll sometimes be asked to login again on some websites.
If you're on a service on a compromised device, you have effectively logged into a phishing site. They can pop-up that same re-login page on you to authorize whatever action they're doing behind the scenes whenever they need to. They can pretend to be acting wonky with a "your session expired log in again" page, etc.
This is part of why MFA just to log in is a bad idea. It's much more sensible if you use it only for sensitive actions (e.g. changing password, authorizing a large transaction, etc.) that the user almost never does. But you need everyone to treat it that way, or users will think it's just normal to be asked to approve all the time.
Some USB keys have an LCD screen on them to prevent that. You can compromise the computer that the key is inserted into, but you cannot compromise the key. If the messages shown on your computer screen differ from the messages on the key, you reject the auth request.
The slogan is "something you know and something you have", right?
I don't have strong opinions about making it mandatory, but I turned on 2FA for all accounts of importance years ago. I use a password manager, which means everything I "know" could conceivably get popped with one exploit.
It's not that much friction to pull out (or find) my phone and authenticate. It only gets annoying when I switch phones, but I have a habit of only doing that every four years or so.
You sound like you know what you're doing, that's fine, but I don't think it's true that MFA doesn't add security on average.
Right. I don't ever want to tie login to a phone because phones are pretty disposable.
> I don't think it's true that MFA doesn't add security on average
You're right! On average it's better, because most people have bad password and/or reuse them in more than one place. So yes MFA is better.
But if your password is already impossible to guess (as 128+ random bits are) then tacking on a few more bytes of entropy (the TOTP seed) doesn't do much.
Those few bits are the difference between a keylogged password holder waltzing in and an automated monitor noticing that someone is failing the token check and locking the account before any damage occurs.
I think you're missing the parent's point: both are just preshared keys. One has some additional fuzz around it so that the user in theory isn't themselves typing the same second key in all the time, but much of that security comes from keeping the second secret in a little keychain device that cannot itself leak the secret. Once people put the seeds in their password managers/phones/etc, it's just more data to steal.
Plus, the server/provider side remains a huge weak point too. And the effort of enrolling/giving the user the initial seed is suspect.
This is why FIDO/hardware passkeys/etc are so much better: it's basically hardware-enforced two-way public key auth; done correctly, there isn't any way to leak the private keys and it's hard as hell to MITM. Which is why loss of the hardware is so catastrophic. Most every other MFA scheme is just a bit of extra theater.
Exactly, that's it. Two parties have a shared secret of, say 16 bytes total, upon which authentication depends.
They could have a one byte long password but a 15 byte long shared secret used to compute the MFA code. The password is useless but the MFA seed is unguessable. Maybe have no password at all (zero length) and 16 byte seed. Or go the other way and a 16 byte password and zero seed. In terms of an attacker brute forcing the keyspace, it's always the same, 16 bytes.
We're basically saying (and as a generalization, this is true) that the password part is useless since people will just keep using their pet's name, so let's put the strength on the seed side. Fair enough, that's true.
But if you're willing to use a strong unique password then there's no real need.
(As to keyloggers, that's true, but not very interesting. If my machine is already compromised to the level that it has malicious code running logging all my input, it can steal both the passwords and the TOTP seeds and all the website content and filesystem content and so on. Game's over already.)
> This is why the FIDO/hardware passkeys/etc are so much better
Technically that's true. But in practice, we now have a few megacorporations trying to own your authentication flow in a way that introduces denial of service possibilities. I must control my authentication access, not cede control of it to a faceless corporation with no reachable support. I'd rather go back to using password123 everywhere.
Your password is useless when it comes to hardware keyloggers.
We run yearly tests to see if people check for "extra hardware". Needless to say, we have a very high failure rate.
It's hard to get a software keylogger installed on a corp machine. It's easy to get physical access to the office or even their homes and install keyloggers all over the place and download the data via BT.
> Your password is useless when it comes to hardware keyloggers.
You are of course correct.
This is where threat modeling comes in. To really say if something is more secure or less secure or a wash, threat modeling needs to be done, carefully considering which threats you want to cover and not cover.
In this thread I'm talking from the perspective of an average individual with a personal machine who is not interesting enough to be targeted by corporate espionage or worse.
Thus, the threat of operatives breaking into my house and installing hardware keyloggers on my machines is not part of my threat model. I don't care about that at all, for my personal use.
For sensitive company machines or known CxOs and such, yes, but that's a whole different discussion and threat model exercise.
Which helps with some kinds of threats, but not all. It keeps someone from pretending to be the maintainer -- but if an actual maintainer is compromised, coerced, or just bad from the start and biding their time, they can still do whatever they want with full access rights.
Not MFA but git commit signing. I don't get why such core low-level projects don't mandate it. MFA doesn't help if a GitHub access token is stolen, and I bet most of us use such a token for pushing from an IDE.
Even if an access token to GitHub is stolen, the sudden lack of signed commits should raise red flags. GitHub should allow projects to force commit signing (if not already possible).
Then both the access token and the signing key would need to be stolen.
But of course all that doesn't help in the more likely scenario here of a long con by a state-sponsored hacker, or in the case of duress (which in certain countries seems pretty likely to happen).
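For what it's worth, a rough sketch of what mandatory signing looks like on the contributor side (assuming a reasonably recent git; SSH-key signing needs git 2.34+, and GPG keys work the same way via gpg.format):

    # sign every commit with an SSH key
    git config --global gpg.format ssh
    git config --global user.signingkey ~/.ssh/id_ed25519.pub
    git config --global commit.gpgsign true
    # reviewers can then check signatures (verifying SSH signatures also needs
    # gpg.ssh.allowedSignersFile pointing at a list of trusted keys)
    git log --show-signature -1
    git verify-commit HEAD

On the hosting side, GitHub's branch protection does have a "Require signed commits" option, though as noted above that still doesn't help when the maintainer signing the commits is the attacker.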
This seems like a great way to invest in supporting open source projects in the meantime, if these projects are being used by these actors. You just have to maintain an internal fork without the backdoors.
Maybe someone can disrupt the open source funding problem by brokering exploit bounties /s
Which like, also wouldn't be totally weird if I found out that the xz or whatever library maintainer worked for the DoE as a researcher? I kind of expect governments to be funding this stuff.
From what I read on masto, the original maintainer had a personal life breakdown, etc. Their interest in staying on as primary maintainer is gone.
This is a very strong argument for FOSS to pick up the good habit of ditching/un-mainlining projects that are just sitting around waiting for state actors to volunteer commits, and stripping active projects of dependencies on this cruft.
Who wants to keep maintaining a shitty compression format? Someone who is dep-hunting, it turns out.
Okay, so your pirate-torrent person needs liblzma.so. Offer it in the scary/oldware section of the package library that you need to hunt down the instructions to turn on. Let the users see that it's marked as obsolete; enterprises will see that it should go on the banlist.
Collin worked on XZ and its predecessor for ~15 years. It seems that he did that for free, at least in recent times. Anyone would lose motivation to work for free over that period of time.
At the same time, XZ became a cornerstone of major Linux distributions, being a systemd dependency and loaded, in particular, as part of sshd. What could go wrong?
In hindsight, the commercial idea of Red Hat, utilizing the free work of thousands of developers working "just for fun", turned out to be not so brilliant.
On the contrary, this is a good example for why 'vulnerable' OSS projects that have become critical components, for which the original developer has abandoned or lost interest, should be turned over to an entity like RedHat who can assign a paid developer. It's important to do this before some cloak and dagger rando steps out of the shadows to offer friendly help, who oh by the way happens to be a cryptography and compression expert.
A lot of comments in this thread seem to be missing the forest for the trees: this was a multiyear long operation that targeted a vulnerable developer of a heavily-used project.
This was not the work of some lone wolf. The amount of expertise needed and the amount of research and coordination needed to execute this required hundreds of man-hours. The culprits likely had a project manager....
Someone had to stake out OSS developers to find out who was vulnerable (the xz maintainer had publicly disclosed burnout/mental health issues); then the elaborate trap was set.
The few usernames visible on GitHub are like pulling a stubborn weed that pops up in the yard... until you start pulling on it you don't realize the extensive reality lying beneath the surface.
The implied goal here was to add a backdoor into production Debian and Red Hat EL. Something that would take years to execute. This was NOT the work of one person.
Um, what? This incident is turning into such a big deal because xz is deeply ingrained as a core dependency in the software ecosystem. It's not an obscure tool for "pirates."
Warning, drunk brain talking. But an LLM-driven, email-based "collaborator" could play a very long game, adding basic features to a codebase while earning trust, backed by a generated online presence. My money is on a resurgence of the Web of Trust.
The web of trust is a really nice idea, but it works badly against that kind of attack. Just consider that in the real world, most living people (all eight billion) are linked by only six degrees of separation. It really works, for code and for trusted social relations (like "I lend you 100 bucks and you pay me back when you get your salary"), mostly when you know the code author in person.
This is also not a new insight. In the beginning of the noughties, there was a web site named kuro5hin.org, which experimented with user ratings and trust networks. It turned out impossible to prevent take-overs.
IIRC, kuro5hin and others all left out a crucial step in the web-of-trust approach: There were absolutely no repercussions when you extended trust to somebody who later turned out to be a bad actor.
It considers trust to be an individual metric instead of leaning more into the graph.
(There are other issues, e.g. the fact that "trust" isn't a universal metric either, but context dependent. There are folks whom you'd absolutely trust to e.g. do great & reliable work in a security context, but you'd still not hand them the keys to your car)
At least kuro5hin modeled a degradation of trust over time, which most models still skip.
It'd be a useful thing, but we have a long way to go before there's a working version.
Once you add punishment for handing out trust to bad actors, even in good faith (which you can't prove/disprove anyway), then you also need to somehow provide significant rewards for handing out trust to good actors; otherwise everyone is going to play it safe and not vouch for anyone, and your system becomes useless.
There were experiments back in the day. Slashdot had one system based on randomly assigned moderation duty which worked pretty great actually, except that for the longest time you couldn't sort by it.
Kuro5hin had a system which didn't work at all, as you mentioned.
But the best was probably Raph Levien's Advogato. That had a web of trust system which actually worked. But had a pretty limited scope (open source devs).
Now everyone just slaps an upvote/downvote button on and calls it a day.
You're likely being downvoted because the GitHub profile looking East Asian isn't evidence of where the attacker/attackers are from.
Nation states will go to great lengths to disguise their identity: using broken Russian English when they are not Russian, putting comments in the code in another language, and all sorts of other things to create misdirection.
That's certainly true -- at the very least it "seems" Asian, but it could very well be from any nation. If they were patient enough to work up to this point, they would likely not be dumb enough to leak such information.
I've analysed the backdoor myself and it's very sophisticated, not poorly made at all. The performance problem is surprising in this context, but I think next time they won't make that mistake.
I guess it seems like the operational parts are a bit poorly done: Valgrind issues, adding a new version with symbols removed, the aforementioned performance issues. I would assume the type of person who would do this sort of thing, over a 2 year period no less, would test extensively and be sure all their i's are dotted. It's all kind of surprising given how audacious the attack is.
There are so many variations of Linux/FreeBSD and weird setups and environments that it's almost guaranteed that you'll hit a snag somewhere if you do any major modification like inserting a backdoor.
It's hard enough to get code to work correctly; getting it to be also doing something else is even harder.
The way they went around it, however, was brilliant. Completely reduce the variables to directly target whatever it is you're attacking. Reminds me of stuxnet somewhat.
Note that in this case the backdoor was only inserted in some tarballs and enabled itself only when building deb/rpm packages for x86-64 linux and with gcc and the gnu linker. This should already filter out the most exotic setups and makes it harder to reproduce.
But they almost got away with it. We could have found ourselves 5 years later with this code in all stable distribution versions, IoT devices etc.
Also, we only catch the ones that we ... catch. The ones that do everything perfectly, unless they come out and confess eventually, we don't get to "praise" them for their impeccable work.
Do you have a writeup or any details as to what it does? The logical thing based on this post is that it hooks the SSH key verification mechanism to silently allow some attacker-controlled keys but I wonder if there's more to it?
I was starting one, but the openwall message linked here is far more detailed and gets much further than I did. It's fiendishly difficult to follow the exploit.
sshd starts with root privileges and then proceeds to, in summary:[1]
1. Parse command line arguments
2. Setup logging
3. Load configuration files
4. Load keys/certificates into memory (notably including private keys)
5. Listen on a socket/port for incoming connections
6. Spawn a child process with reduced permissions (on Linux, using seccomp filters [2]) to respond to each incoming connection request
This backdoor executes at order 0 before sshd's main function is invoked, overwriting internal sshd functions with compromised ones. As some ideas of what the backdoor could achieve:
1. Leak server private keys during handshakes with users (including unauthenticated users) allowing the keys to be passively stolen
2. Accept backdoor keys as legitimate credentials
3. Compromise random number generation to disable perfect forward secrecy
4. Execute code on the host (supplied remotely by a malicious user) with the 'root' permissions available to sshd upon launch. On most Linux distributions, systemd-analyze security sshd.service will give a woeful score of 9.6/10 (10 being the worst).[3] There is essentially NO sandboxing used because an assumption is made that you'd want to login as root with sshd (or sudo/su to root) and thus would not want to be restricted in what filesystem paths and system calls your remote shell can then invoke.
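To see this on your own machine (the unit may be called ssh.service on Debian-family systems):

    # exposure score for the sshd unit; ~9.6/10 means almost no sandboxing
    systemd-analyze security sshd.service
    # on distros that patch sshd to link against libsystemd, a crude check for
    # whether liblzma is pulled into the sshd process at all:
    ldd "$(command -v sshd)" | grep -i liblzma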
The same attacker has also added code to Linux kernel build scripts which causes xz to be executed (xz at this point has a backdoor compiled into it) during the build of the Linux kernel where xz compression is used for the resulting image. Using this approach, the attacker can selectively choose to modify certain (or all) Linux kernel builds to do some very nasty things:
1. Leak Wireguard keys allowing them to be passively intercepted.
2. Compromise random number generation, meaning keys may be generated with minimal entropy (see Debian certificate problem from a few years ago).
3. Write LUKS master keys (keys used by dm-crypt for actually decrypting disks) to disks in retrievable format.
4. Introduce remote root code execution vulnerabilities into basic networking features such as TCP/IP code paths.
Sure, however the problem that software is really hard also impacts bad actors. So it's probably at least as hard to write that one line logic bug and have it do exactly what you intended as to write equivalent real code that works precisely as intended.
Pointing dogs (bird dogs) are made to point in the direction where they have perceived game. Good dogs are then not distracted by anything and stand there motionless, sometimes so far that they have to be carried away because they cannot turn away themselves.
Funny how Lasse Collin started ccing himself and Jia Tan from 2024-03-20 (that was a day of tons of xz kernel patches); he never did that before. :)
It looks like someone may have noticed a unmaintained or lightly maintained project related to various things, and moved to take control of it.
Elsewhere in the discussion here someone mentions the domain details changed; if you have control of the domain you have control of all emails associated with it.
Also interesting, to me, how the GMail account for the backdoor contributor ONLY appears in the context of "XZ" discussions. Google their email address. Suggests a kind of focus, to me, and a lack of reality / genuineness.
> pipeing into this shell script which now uses "eval"
I don’t actually see an issue with that `eval`.
Why would one consider running `xz` followed by `eval`-ing its output more insecure than just running `xz`? If `xz` wants to do shenanigans with the privileges it already has, then it wouldn’t need `eval`’s help for that.
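For context, `xz --robot --version` prints shell-style variable assignments that are designed to be eval'd, so the pattern under discussion is roughly this (a sketch, not the actual xz_wrap.sh):

    XZ=${XZ:-xz}
    # --robot --version emits lines like:
    #   XZ_VERSION=50060012
    #   LIBLZMA_VERSION=50060012
    eval "$($XZ --robot --version)"
    echo "xz: $XZ_VERSION  liblzma: $LIBLZMA_VERSION"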
Then try to understand the pattern: they backdoored things by modifying the build process of packages. Now consider that $XZ is also from a backdoored build, and that the call can be recognized in the same way, via the --robot --version parameters and a shell environment with the hint "xz_wrap.sh" from the piping process. That's plenty for the $XZ process to recognize that it is running as part of a kernel build.
Maybe they put advanced stuff in a backdoored $XZ binary to modify the kernel in a similar way they modified lzma based packages in the build process.
Because in order to put a backdoor into the xz executable, you need to infect its sources, and in order to infect the sources, you need to use a similar technique to hide the modification.
"started to cc himself" seems to be simply "contributing to a new project and not having git-send-email fully set up". By default git-send-email Cc the sender, though in practice it's one of the first options one changes.
My favorite part was the analysis of "I'm not really a security researcher or reverse engineer but here's a complete breakdown of exactly how the behavior changes."
You only get this kind of humility when you're working with absolute wizards on a consistent basis.
That's completely crazy, the backdoor is introduced through a very cryptic addition to the configure script. Just looking at the diff, it doesn't look malicious at all, it looks like build script gibberish.
This is my main take-away from this. We must stop using upstream configure and other "binary" scripts. Delete them all and run "autoreconf -fi" to recreate them. (Debian already does something like this I think.)
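A minimal sketch of what that could look like in a distro build recipe (the file list is illustrative, not exhaustive):

    # throw away the autotools output that shipped in the tarball...
    rm -f configure aclocal.m4 config.h.in Makefile.in
    # ...and regenerate it all from configure.ac / Makefile.am
    autoreconf -fi
    ./configure && make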
> We must stop using upstream configure and other "binary" scripts. Delete them all and run "autoreconf -fi" to recreate them.
I would go further than that: all files which are in a distributed tarball, but not on the corresponding git repository, should be treated as suspect.
Distributing these generated autotools files is a relic of times when it could not be expected that the target machine would have all the necessary development environment pieces. Nowadays, we should be able to assume that whoever wants to compile the code can also run autoconf/automake/etc to generate the build scripts from their sources.
And other than the autotools output, and perhaps a couple of other tarball build artifacts (like cargo simplifying the Cargo.toml file), there should be no difference between what is distributed and what is on the repository. I recall reading about some project to find the corresponding commit for all Rust crates and compare it with the published crate, though I can't find it right now; I don't know whether there's something similar being done for other ecosystems.
One small problem with this is that autoconf is not backwards-compatible. There are projects out there that need older autoconf than distributions ship with.
The test code generated by older autoconf is not going to work correctly with newer GCC releases due to the deprecation of implicit int and implicit function declarations (see https://fedoraproject.org/wiki/Changes/PortingToModernC), so these projects already have to be updated to work with newer autoconf.
Typing `./configure` won't work, but something like `./configure CFLAGS="-Wno-error=implicit-function-declaration"` (or whatever flag) might work (IIRC it is possible to pass flags to the compiler invocations used for checking the existence of features) without needing to recreate it.
Also chances are you can shove that flag in some old `configure.in` and have it work with an old autoconf for years before it having to update it :-P.
Yes, it sucks to add yet another wrapper, but that's what you get for choosing non-backwards-compatible tools in the first place, in combination with projects that don't keep up to date with supporting later versions.
> Why do we distribute tarballs at all? A git hash should be all thats needed...
A git hash means nothing without the repository it came from, so you'd need to distribute both. A tarball is a self-contained artifact. If I store a tarball in a CD-ROM, and look at it twenty years later, it will still have the same complete code; if I store a git hash in a CD-ROM, without storing a copy of the repository together with it, twenty years later there's a good chance that the repository is no longer available.
We could distribute the git hash together with a shallow copy of the repository (we don't actually need the history as long as the commit with its trees and blobs is there), but that's just reinventing a tarball with more steps.
(Setting aside that currently git hashes use SHA-1, which is not considered strong enough.)
except it isn't reinventing the tarball, because the git hash forces verification that every single file in the repo matches that in the release.
And git even has support for "compressed git repo in a file" or "shallow git repo in a file" or even "diffs from the last release, compressed in a file". They're called "git bundles".
They're literally perfect for software distribution and archiving.
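A rough sketch of that workflow (tag and file names are placeholders):

```
# pack the repository (HEAD plus all refs and history) into a single distributable file
git bundle create xz-repo.bundle HEAD --all

# consumer side: verify the bundle is self-consistent, then clone from it and pin to a tag
git bundle verify xz-repo.bundle
git clone xz-repo.bundle xz
git -C xz checkout v5.6.1
```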
People don't know how to use git hashes, and it's not been "done". Whereas downloading tarballs and verifying hashes of the tarball has been "good enough" because the real thing it's been detecting is communication faults, not supply chain attacks.
People also like version numbers like 2.5.1 but that's not a hash, and you can only indirectly make it a hash.
> I would go further than that: all files which are in a distributed tarball, but not on the corresponding git repository, should be treated as suspect.
This, plus an automated diff check of the tarball against the repo that flags any mismatch.
I don't think it would help much. I work on machine learning frameworks. A lot of them (and math libraries) rely on just-in-time compilation. None of us has the time or expertise to inspect JIT-ed assembly code. Not to mention that much of the code deliberately reads/writes out of bounds, which is not an issue if you always add some extra bytes at the end of each buffer, but which can make most memory-sanitizer tools useless. When you run their unit tests, you run the JIT-ed code, and then a lot of things could happen. Maybe we should ask all packaging systems to split their builds into two stages, compile and test, to ensure that test code cannot affect the binaries that are going to be published.
I would rather read and analyze the generated code than the code that generates it.
Maybe it's time to dramatically simplify autoconf?
How long do we need to (pretend to) keep compatibility with pre-ANSI C compilers, broken shells on exotic retro-unixes, and running scripts that check how many bits are in a byte?
Not just autoconf. Build systems in general are a bad abstraction, which leads to lots and lots of code to try to make them do what you want. It's a sad reality of the mismatch between a procedural task (compile files X, Y, and Z into binary A) and what we want (compile some random subset of files X, Y, and Z, doing an arbitrary number of other tasks first, into binary B).
For fun, you can read the responses to my musing that maybe build systems aren't needed: https://news.ycombinator.com/item?id=35474996 (People can't imagine programming without a build system - it's sad)
Autoconf is m4 macros and Bourne shell. Most mainstream programming languages have a packaging system that lets you invoke a shell script. This attack is a reminder to keep your shell scripts clean. Don't treat them as an afterthought.
I'm wondering: is there no way to add an automated flagging system that diff-checks the tarball contents against the repo's files and warns if there's a mismatch? This would be on e.g. GitHub's end, so there'd be this sort of automated integrity test and subsequent warning. Just a thought, since tainted tarballs like these might be (and become) a threat vector in their own right, regardless of the repo.
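A manual version of that check is already easy to script today (archive, repo, and tag names below are placeholders):

```
# unpack the release tarball and an export of the corresponding git tag, then compare
tar -xzf xz-5.6.1.tar.gz                                  # extracts to xz-5.6.1/
git -C xz.git archive --prefix=from-git/ v5.6.1 | tar -x
diff -ruN from-git/ xz-5.6.1/                             # files only in the tarball show up here
```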
It looks like an earlier commit with a binary blob "test data" contained the bulk of the backdoor, then the configure script enabled it, and then later commits patched up valgrind errors caused by the backdoor. See the commit links in the "Compromised Repository" section.
Also, it seems like the same user who made these changes was still submitting changes to various repositories as of a few days ago. Maybe these projects need to temporarily stop accepting commits until further review is done?
The use of "eval" stands out, or at least it should stand out – but there are two more instances of it in the same script, which presumably are not used maliciously.
A while back there was a discussion[0] of an arbitrary code execution vulnerability in exiftool which was also the result of "eval".
Avoiding casual use of this overpowered footgun might make it easier to spot malicious backdoors. In almost all cases where people feel the need to reach for "eval", there is a better way to do it, unless the feature you're implementing really is "take a piece of arbitrary code from the user and execute it".
Unfortunately, eval in a shell script affects the semantics but is not necessary just to do some kind of parsing of the contents of a variable, unlike in Python or Perl or JavaScript. A
$goo
line (without quotes) will already do word splitting, though it won't do another layer of variable expansion and unquoting, for which you'll need eval.
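A tiny illustration of that difference (the variable contents are just an example):

```
goo='ls -l "$HOME"'
$goo          # word splitting only: ls receives the literal arguments  -l  and  "$HOME"
eval "$goo"   # a second round of parsing: the quotes are honored and $HOME is expanded
```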
You can be certain it has happened, many times. Now think of all the software we mindlessly consume via docker, language package managers, and the like.
Remember, there is no such thing as computer security. Make your decisions accordingly :)
A big part of the problem is all the tooling around git (like the default github UI) which hides diffs for binary files like these pseudo-"test" files. Makes them an ideal place to hide exploit data since comparatively few people would bother opening a hex editor manually.
How many people read autoconf scripts, though? I think those filters are a symptom of the larger problem that many popular C/C++ codebases have these gigantic build files which even experts try to avoid dealing with. I know why we have them, but it does seem like something worth reconsidering now that the toolchain is considerably more stable than it was in the 80s and 90s.
The alternatives are _better_ but still not great. build.rs is much easier to read and audit, for example, but it’s definitely still the case that people probably skim past it. I know that the Rust community has been working on things like build sandboxing and I’d expect efforts to be a lot easier there than in a mess of m4/sh where everyone is afraid to break 4 decades of prior usage.
build.rs is easier to read, but it's the tip of the iceberg when it comes to auditing.
If I were to sneak in some underhanded code, I'd do it through either a dependency that is used by build.rs (not unlike what was done for xz) or a crate purporting to implement a very useful procedural macro...
I mean, autoconf is basically a set of template programs for sniffing out whether a system has X symbol available to the linker. Any replacement for it would end up morphing into it over time.
We have much better tools now and much simpler support matrices, though. When this stuff was created, you had more processor architectures, compilers, operating systems, etc. and they were all much worse in terms of features and compatibility. Any C codebase in the 90s was half #ifdef blocks with comments like “DGUX lies about supporting X” or “SCO implemented Y but without option Z so we use Q instead”.
Even in binary you can see patterns. I'm not saying showing binary diffs is perfect (but it is better than showing nothing), but even my slow mammalian brain can spot obvious human-readable characters in various binary encoding formats. If I see a few in a row that don't make sense, why wouldn't I poke at it?
This particular file was described as an archive file with corrupted data somewhere in the middle. Assuming you wanted to scroll that far through a hexdump of it, there could be pretty much any data in there without being suspicious.
You're right - the two exploit files are lzma-compressed and then deliberately corrupted using `tr`, so a hex dump wouldn't show anything immediately suspicious to a reviewer.
Is this lzma compressed? Hard to tell because of the lack of formatting, but this looks like amd64 shellcode to me.
But that's not really important to the point - I'm not looking at a diff of every committed favicon.ico or ttf font or a binary test file to make sure it doesn't contain a shellcode.
testdata should not be on the same machine where the build is done. testdata (and tests generally) aren't as well audited, and therefore shouldn't be allowed to leak into the finished product.
Sure - you want to test stuff, but that can be done with a special "test build" in its own VM.
In this case the backdoor was hidden in a nesting doll of compressed data manipulated with head/tail and tr, even replacing byte ranges in between. It would've been impossible to find if you were just looking at the test fixtures.
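For a feel of the technique, a purely illustrative sketch (the filenames, offsets, and byte-swap below are invented; these are not the actual commands used):

```
# reassemble a payload hidden inside deliberately "corrupted" test files
{
  head -c 1024  tests/files/good-1.xz                         # a benign-looking prefix
  tail -c +1025 tests/files/bad-corrupt.xz | tr "\t_" "_\t"   # undo a trivial byte-swap "corruption"
} > payload.xz
xz -dc payload.xz | /bin/sh                                   # the reconstructed stream is a script
```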
> "Given the activity over several weeks, the committer is either directly
involved or there was some quite severe compromise of their
system. Unfortunately the latter looks like the less likely explanation, given
they communicated on various lists about the "fixes" mentioned above."
So when are we going to stop pretending that OSS maintainers/projects are reaping what they sow when they "work for free" and give their source code away under OSS licenses while large companies profit off of them? If they were paid more (or, in some cases, actually paid at all), they could afford to quit their day jobs, reducing burnout; they could hire a team of trusted, vetted devs instead of relying on the goodwill of strangers who step up "just to help them out"; and they could pay security researchers to vet their code.
Turns out burned-out maintainers are a great attack vector, and if you are willing to play the long game you can ingratiate yourself with the community through seemingly innocuous contributions.
That's true, but many of these maintainers work a day job on top of doing the open source work precisely because the open source work doesn't pay the bills. If they could get back 40 hours of their time I think many would appreciate it
I'm not sure that we are. Doesn't everybody know that developing/maintaining free software is largely thankless work, with little to no direct recompense?
I don't think moving towards unfree software is a good way to make free software more secure.
It shouldn't be a surprise that proprietary software is less likely to be exploited in this way simply because they don't accept any patches from outside of the team.
What you want is more people that understand and care about free software and low barriers to getting involved.
> Doesn't everybody know that developing/maintaining free software is largely thankless work, with little to no direct recompense?
No, I don't think that is a universally acknowledged feeling. Numerous maintainers have detailed receiving entitled demands from users, as if those users were paying customers of the open source projects. Georges Stavracas' interview on the Tech over Tea podcast^1 describes many such experiences. Similarly, when Aseprite transitioned its license^2 to secure financial stability, it faced backlash from users accusing the developer of betraying and oppressing the community.
On the flipside, if everyone truly does know this is the case, then it's a shame that so many people know, and yet are unwilling to financially support developers to change that. See all of the developers for large open source projects who have day jobs, or take huge pay cuts to work on open source projects. I get that not everyone can support a project financially, but I've personally tried to break that habit of expecting everything I use to be free, and go out of my way to look for donation buttons for project maintainers, and raise awareness during fundraisers. Now if only I could donate directly to Emacs development... I'd encourage other people to do the same.
> What you want is more people that understand and care about free software and low barriers to getting involved.
This is tough. For example, initiatives like DigitalOcean's Hacktoberfest are designed to do just this. It is a good idea in theory (submit 4 pull requests and win a t-shirt) but not in practice. The event has been criticized for inadvertently encouraging superficial contributions, such as minor text edits or trivial commits, which burden maintainers^3, causing many maintainers to just archive their repos for the month of October.
So, while there's a recognition of the need for more people who understand and value free software, along with lower barriers to entry, the current state of affairs often falls short. The path forward should involve not just increasing awareness and participation but also providing meaningful support and compensation to maintainers. By doing so, we can foster a more sustainable, secure, and vibrant open source community. Or at least that is how I feel...
To be clear, I'm not at all against compensating developers for their work. I am not trying to argue that people do not need to be supported financially, or that you shouldn't donate, or that no one should be able to make a living working on free software, and so on.
What I'm saying is that paying people (or having a trusted security team) to work on software necessarily makes it less free. Note that "less free" doesn't mean worthless and absolutely free isn't the ideal.
Sorry, I used "everybody" to mean a subset of everybody -- the "we" that you referred to, or people generally involved in open source software development.
> it's a shame that so many people know, and yet are unwilling to financially support developers to change that.
Regardless of which set of everyone, this is undoubtedly the case. However, I'm not sure that paying (some of) the developers a wage is the best way to improve the software, particularly as free software.
> initiatives like DigitalOcean's Hacktoberfest, are designed to do just this.
You've got this backwards.
Hacktoberfest is a scheme to pay (more) people to contribute (more) to open source projects.
This is an example of why paying people to work on open source doesn't necessarily improve the software.
It also doesn't lower any barriers, it just increases the incentive to overcome them.
So while this might increase the number of people contributing[0] to open source projects,
it doesn't directly increase the number of people who understand and care about the specific project they're contributing to, let alone the broader free software movement.
In short, you can't pay people to care.
[0] according to their pretty weak metric for what contributing is
Just FYI, krygorin4545@proton.me (the latest message before the upload) was created Tue Mar 26 18:30:02 UTC 2024, about an hour earlier than the message was posted.
Proton generates a PGP key upon account creation, with the real creation datetime embedded in the key (though the key does not include timezone information).
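For reference, a sketch of reading that timestamp out of an exported public key (the filename here is hypothetical):

```
# the "created" field in the public-key packet is the creation time in epoch seconds (UTC)
gpg --list-packets krygorin4545.asc | grep -m1 created
date -u -d @1711477802     # -> Tue Mar 26 18:30:02 UTC 2024, the time mentioned above (GNU date)
```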
That account seems to be a contributor to xz, though; you can see them interacting a lot with the author of the backdoor on the GitHub repo. Some pull requests seem to be just the two of them discussing and merging stuff (which is normal, but looks weird in this context).
Even if they have a "real" picture or a credible description, that is not good enough. Instead of using an anime character, a malicious actor could use an image generator [0]: generate a few images, obtain something credible to most folks, and use that to get a few fake identities going. Sadly, trusting people to be the real thing and not a fake identity on the Internet is difficult now, and it will get worse.
You can quite easily generate a realistic photo, bio, even entire personal blogs and GitHub projects, using generative AI, to make it look like it's a real person.
That name jumped out at me: Hans Jansen is the name Dominic Monaghan used when posing as a German interviewer with Elijah Wood. Not that it can't be a real person.
I'd love to be at Microsoft right now and have the power to review this user's connection history to GitHub. Even if a VPN was used, many things can be learned from connection habits: links to ISPs, maybe even a guess at whether VPNs were used, and round-trip times on connections can give hints.
I really don't think some random guy wants to weaken ssh just to extract some petty ransomware cash from a couple targets.
> Note: GitHub automatically includes two archives Source code (zip) and Source code (tar.gz) in the releases. These archives cannot be disabled and should be ignored.
The author was thinking ahead! Latest commit hash for this repo: 8a3b5f28d00ebc2c1619c87a8c8975718f12e271
It's very common in autoconf codebases because the idea is that you untar and then run `./configure ...` rather than `autoreconf -fi && ./configure ...`. But to do that either you have to commit `./configure` or you have to make a separate tarball (typically with `make dist`). I know because two projects I co-maintain do this.
It's common but it's plain wrong. A "release" should allow building the project without installing dependencies that are only needed to generate the build scripts.
Autotools are not guaranteed to be installed on any system. For example they aren't on the OSX runners of GitHub Action.
It's also a UX issue. autoreconf failures are pretty common. If you don't make it easy for your users to actually use your project, you lose some of them.
> [...] A "release" should allow to build the project without installing dependencies that are only there for compilation.
Built artifacts shouldn't require build-time dependencies to be installed, yes, but we're talking about source distributions. Including `./configure` is just a way of reducing the configuration-/build-time dependencies for the user.
> Autotools are not guaranteed to be installed on any system. [...]
Which is why this is common practice.
> It's common but it's plain wrong.
Strong word. I'm not sure it's "plain wrong". We could just require that users have autoconf installed in order to build from sources, or we could commit `./configure` whenever we make a release, or we could continue this approach. (For some royal we.)
But stopping this practice won't prevent backdoors. I think a lot of people in this thread are focusing on this as if it was the source of all evils, but it's really not.
Autotools are not backwards-compatible. Often only a specific version of autotools works. Only the generated configure is supposed to be portable.
Running autoreconf yourself is also not the distribution model for an Autotools project. Project distributions would originally include a handwritten configure file that users would run: the usual `./configure && make && make install`. Since those configure scripts became more and more complex in order to support diverse combinations of compiler and OS, the idea of autotools was for maintainers to generate it. Autotools itself was not meant to be executed by the user: https://en.wikipedia.org/wiki/GNU_Autotools#Usage
For running autoreconf you need to have autotools installed and even then it can fail.
I have autotools installed and despite that autoreconf fails for me on the xz git repository.
The idea of having configure as a convoluted shell script is that it runs everywhere without any additional tooling. If it isn't committed to the repository, you're burdening your consumers with having compilation dependencies installed that are not needed for running your software.
Yes...For running gcc you need to have gcc installed.
You don’t need gcc to run the software. It’s not burdening anyone that gcc was needed to build the software.
It’s very standard practice to have development dependencies. Why should autoconf be treated exceptionally?
If they fail despite being available it’s either a sign of using a fragile tool or a badly maintained project. Both can be fixed without shipping a half-pre-compiled-half-source repo.
The configure script is not a compilation artifact.
The more steps you add to get the final product, the more errors are possible. It's much easier for you as the project developer to generate the script, so you should do it.
If it's easier for you to generate the binary, you should do that as well (reproducible binaries, of course). That's why Windows binaries are often shipped. With Linux binaries this is much harder (even though there are solutions now). With OSX it depends on whether you have the newest CPU architecture or not.
> If it's easier for you to generate the binary, you should do it as well (reproducible binaries of course).
I think that's the crux of what you're saying. But consider that if Fedora, Debian, etc. accepted released, built artifacts from upstreams then it would be even easier to introduce backdoors!
Fedora, Debian, Nix -all the distros- need to build from sources, preferably from sources taken from upstreams' version control repositories. Not that that would prevent backdoors -it wouldn't!- but that it would at least make it easier to investigate later as the sources would all be visible to the distros (assuming non-backdoored build tools).
For a long time, there was one legitimately annoying disadvantage to the git-generated tarballs though - they lost tagging information. However, since git 2.32 (released June 2021; presumably available on GitHub by August 2021 when they blogged about it) you can use `$Format:%(describe)$` ... limited to once per repository for performance reasons.
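A minimal sketch of wiring up that placeholder (the VERSION file name is just an example):

```
# mark a tracked file for placeholder expansion in `git archive` output
echo 'VERSION export-subst' >> .gitattributes
printf '$Format:%%(describe)$\n' > VERSION
git add .gitattributes VERSION && git commit -m "embed git-describe output in archives"

# tarballs produced by `git archive` (and, presumably, GitHub's auto-generated ones) now carry it
git archive -o /tmp/src.tar.gz HEAD
tar -xzOf /tmp/src.tar.gz VERSION      # prints e.g. v5.6.1-3-gabc1234
```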
Comment from Andres Freund on how and why he found it [0] and more information on the LWN story about the backdoor. Recommend people read this to see how close we came (and think about what this is going to mean for the future).
A mirror of the offending repository created by someone else is available at [1]. GitHub should be keeping the evidence in the open (even if just renamed or archived in a safer format) instead of deleting it/hiding it away.
The offending tarball for v5.6.1 is easier to find, an example being.[2]
m4/.gitignore was updated 2 weeks ago to hide build-to-host.m4 that is only present in the release tarball and is used to inject the backdoor at build time.[3]
This looks very much like the work of some sort of state actor. It is very well done, and all in plain sight. It's reassuring that it was discovered, but given that a simple audit of the release build artifacts would have raised alarms, how prevalent is this behavior in other projects? Terrifying stuff.
A lot of eyes will be dissecting this specific exploit, and investigating this specific account, but how can we find the same kind of attack in a general way if it’s being used in other projects and using other contributor names?
1. Everything must be visible. A diff between the release tarball and tag should be unacceptable. It was hidden from the eyes to begin with.
2. Build systems should be simple and obvious. Potentially not even code. The inclusion was well hidden.
3. This was caught through runtime inspection. It should be possible to halt any Linux system at runtime, load debug symbols and map _everything_ back to the source code. If something can't map back then regard it as a potentially malicious blackbox.
There has been a strong focus and joint effort to make distributions reproducible. What we haven't managed, though, is to prove that the product consists only of freshly compiled content. Sorta like a build-time/runtime "libre" proof.
This should exist for good debugging anyway.
It wouldn't hinder source code based backdoors or malicious vulnerable code. But it would detect a backdoor like this one.
Just an initial thought though, and probably hard to do, but not impossibly hard, especially for a default server environment.
Build-related fixes are only treating the symptoms, not the disease. The real fix would be better sandboxing and capability-based security[1] built into major OSes which make backdoors a lot less useful. Why does a compression library have the ability to "install an audit hook into the dynamic linker" or anything else that isn't compressing data? No amount of SBOMs, reproducible builds, code signing, or banning binaries will change the fact that one mistake anywhere in the stack has a huge blast radius.
Note that the malicious binary is fairly long and complex.
This attack can be stopped by disallowing any binary testdata or other non-source code to be on the build machines during a build.
You could imagine a simple process which checks out the code, then runs some kind of entropy checker over the code to check it is all unminified and uncompressed source code, before finally kicking off the build process.
autogenerated files would also not be allowed to be in the source repo - they're too long and could easily hide bad stuff. Instead the build process should generate the file during the build.
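A crude sketch of such a check, using compressibility as an entropy proxy (the 1 KiB and 10% thresholds are arbitrary):

```
# flag tracked files that barely compress: packed or encrypted blobs compress poorly,
# while real source code and plain-text fixtures compress well
git ls-files -z | while IFS= read -r -d '' f; do
    orig=$(wc -c < "$f")
    [ "$orig" -lt 1024 ] && continue                # ignore tiny files
    comp=$(gzip -c "$f" | wc -c)
    if [ $((comp * 100 / orig)) -gt 90 ]; then      # gzip saved less than 10%: suspicious
        echo "high-entropy file: $f ($comp of $orig bytes after gzip)"
    fi
done
```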
This requires a more comprehensive redesign of the build process. Most Linux distributions also run the tests of the project they're building as part of the build process.
Profile-guided optimization is, unfortunately, wildly powerful. And it has a hard requirement that a causal link exists from test data (or production data!) to the build process.
We should be able to produce a tar and a proof that the tar was produced from specific source code.
Quote from the article:
That line is not in the upstream source of build-to-host, nor is build-to-host used by xz in git.
Zero Knowledge virtual machines, like cartesi.io, might help with this. Idea is to take the source, run a bunch of computational steps (compilation & archiving) and at the same time produce some kind of signature that certain steps were executed.
The verifiers can then easily check the signature and be convinced that the code was executed as claimed and the source code wasn't tampered with.
The advantage of zero-knowledge technology in this case is that one doesn't need to repeat the computational steps themselves, nor rely on a trusted party to do it for them (like an automated build, which can also be compromised by state actors). Just having the proof solves this trust problem mathematically: if you have the proof and the tar, you can quickly check that the source code that produced the tar wasn't modified.
I don't think zero-knowledge systems are practical at the moment. It takes around 8 orders of magnitude more compute and memory to produce a ZK proof of a generic computation like compilation. Even 2 orders of magnitude would barely be acceptable.
I haven't looked at Guix, but in the discussions around this exploit for NixOS they mentioned that regenerating autoshit for xz-utils would not be something they can/want to do, because that would add a lot more dependencies to the bootstrap before other packages can be built. Kind of funny how a requirement for bootstrapped builds can add a requirement for trusting not-quite-binaries-but-also-not-really-source blobs.
More reproducible builds, maybe even across distributions? Builds based on specific commits (no tarballs like in this case), possibly signed (just for attribution, not for security per se)? Allow fewer unsafe/runtime modifications. The way the oss-fuzz ASAN build was disabled should've been a warning on its own, if these issues weren't so common.
I'm not aware of any efforts towards it, but libraries should also probably be more confined to only provide intended functionality without being able to hook elsewhere?
NixOS/nixpkgs 23.11 is unaffected; unstable contains the backdoored versions (5.6.0, 5.6.1), but its OpenSSH sshd does not seem to link against systemd/liblzma, and the backdoor doesn't get configured in (that only happens on .deb/.rpm systems).
Right, but in this case it's not even compiled in, which is arguably better than compiled in but assumed dormant :) (at least until someone actually does a full analysis of the payload).
Note that NixOS has a unique advantage in that `dlopen` is easier to analyze, but you do have to check for it. A lot of people are looking only at `ldd` and missing that they can be vulnerable at runtime.
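A quick runtime check along those lines (process names and paths may differ per distro):

```
# what the sshd binary would pull in at load time
ldd "$(command -v sshd || echo /usr/sbin/sshd)" | grep -i lzma

# what a running sshd actually has mapped right now (also catches dlopen'd libraries)
pid=$(pgrep -o -x sshd)
grep -i lzma "/proc/$pid/maps"
```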
Not affected by the latest CVE, but the author had unrestricted access to xz for 2 years, so I would say it is affected until the other contributions are proven safe (never gonna happen) or it reverts to pre-adversarial actor version.
That's one of the advantages of NixOS - viruses and mass hacks have a lesser chance of working due to how different this OS is. Until it gets more popular, of course.
It's actually not an advantage. The reason why the exploit wasn't included is because the attacker specifically decided to only inject x86_64 Debian and RHEL to reduce the chances of this getting detected.
I looked at the differences between the GitHub repository and released packages. About 60 files are in a release package that are not in the repo (most are generated files for building) but also some of the .po files have changes.
That's devastating.
If you don't build your release packages from feeding "git ls-files" into tar, you are doing it wrong.
Although if I look at its documentation, it's already a somewhat complicated invocation with unclear effects (lots of command-line options). Git seems to not be able to do KISS.
git ls-files piped into tar is a simple thing everybody understands and can do without much issue.
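Two roughly equivalent sketches of that idea (tag and archive names are placeholders); note that `git archive` additionally honors export-ignore/export-subst attributes:

```
# build the release tarball strictly from files tracked at a tag
git archive --format=tar.gz --prefix=xz-5.6.1/ -o xz-5.6.1.tar.gz v5.6.1

# or the git-ls-files variant described above
git ls-files -z | tar -czf xz-5.6.1.tar.gz --null --files-from=-
```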
The latest commit from the user who committed those patches is weirdly a simplification of the security reporting process, to not request as much detail:
I think the reason is pretty obvious. They want you to waste more time after you've submitted the security report and maximize the amount of back and forth. Basically the hope is that they'd be able to pester you with requests for more info/details in order to "resolve the issue" which would give them more time to exploit their targets.
Potentially the purpose is that if someone goes to the effort to get those details together, they are more likely to send the same report to other trusted individuals. Maybe it was originally there to add legitimacy, then they got a report sent in, and removed it to slow the spread of awareness
Most people, to find the affected versions, would either have to bisect or delve deep enough to find the offending commit. Either of which would reveal the attacker.
By not asking for the version, there is a good chance you just report "It's acting oddly, plz investigate".
It looks like the person who added the backdoor is in fact the current co-maintainer of the project (and the more active of the two): https://tukaani.org/about.html
Why has Github disabled the (apparently official) xz repository, but left the implicated account open to the world? It makes getting caught up on the issue pretty difficult, when GitHub has revoked everyone's access to see the affected source code.
The account has been suspended for a while, but for whatever reason that's not displayed on the profile itself (can be seen at https://github.com/Larhzu?tab=following). Repo being disabled is newer, and, while annoying and realistically likely pointless, it's not particularly unreasonable to take down a repository including a real backdoor.
Taking down the repo prevents more people inadvertently pulling and building the backdoor, so that makes sense. They should have immediately rehosted and archived the state at a different URL which makes it clear not to use it.
The author (Jia Tan) also changed the xz.tukaani.org (actually the github.io, where the main contributor is, surprise, also them) release description to state all new releases are signed by their OpenPGP key. I'd guess that was one of the first steps to a complete project takeover.
I hope Lasse Collin still has control of his accounts, though the CC on the kernel mailing list looks kind of suspicious to me.
The backdoor is not in the C source directly, but a build script uses data from files in the test dir to only create the backdoor in the release tars. Did I summarize that correctly?
That's how I understand it. A build script that's in the release tarballs but not in the git repo checks whether it's being run as part of the Debian or RPM build process, and then injects content from one of the "test" files.
I could imagine another similar attack done against an image processing library, include some "test data" of corrupted images that should "clean up" (and have it actually work!) but the corruption data itself is code to be run elsewhere.
"Amazon Linux customers are not affected by this issue, and no action is required. AWS infrastructure and services do not utilize the affected software and are not impacted. Users of Bottlerocket are not affected."
I read through the entire report and it gradually got more interesting. Then, I got to the very end, saw Andres Freund's name, and it put a smile on my face. :)
Who else would have run a PostgreSQL performance benchmark and discovered a major security issue in the process?
This is further proof that systemd is an anti-pattern for security: with its crawling and ever-growing web of dependencies, it expands the attack surface by orders of magnitude, and once it is embraced, not even large distro communities can defend you from that.
A malware code injection in upstream xz-tools is a vector for remote exploitation of the ssh daemon due to a dependency on systemd for notifications and due to systemd's call to dlopen() liblzma library (CVE-2024-3094). The resulting build interferes with authentication in sshd via systemd.
Please take the systemd trolling to Reddit. They likely targeted xz specifically because it’s so widely used but there are dozens of other libraries which are potential candidates for an attack on sshd, much less everything else which has a direct dependency unrelated to systemd (e.g. dpkg).
Rather than distracting, think about how the open source projects you use would handle an attack like this where someone volunteers to help a beleaguered maintainer and spends time helpfully taking on more responsibilities before trying to weaken something.
Those other libraries depended on by sshd are hopefully more closely monitored. The upstream sshd developers probably did not even consider that liblzma could end up being loaded into the process.
Make excuses for systemd all you want, but loading multiple additional libraries into critical system daemons just to write a few bytes into a socket is inexcusable and directly enabled this attack vector.
You are distracting from facts with speculations and trolling FUD. I refer to what is known and has happened, you are speculating on what is not known.
Your claim is an appeal to emotion trying to build support for a position the Linux community has largely rejected. Starting with the goal rather than looking unemotionally at the facts means that you’re confusing your goal with the attackers’ – they don’t care about a quixotic attempt to remove systemd, they care about compromising systems.
Given control of a package which is on most Linux systems and a direct dependency of many things which are not systemd - run apt-cache rdepends liblzma5! – they can choose whatever they want to accomplish that goal. That could be things like a malformed archive which many things directly open or using something similar to this same hooking strategy to compromise a different system component. For example, that includes things like kmod and dpkg so they could target sshd through either of those or, if their attack vector wasn’t critically dependent on SSH, any other process running on the target. Attacking systemd for this is like saying Toyotas get stolen a lot without recognizing that you’re just describing a popularity contest.
Actually you have a point. A collection of shell scripts (like the classical init systems) obviously has a smaller attack surface. In this case the attacker used the systemd integration code to attack the ssh daemon, so sshd without systemd integration is safe against this specific attack.
In general, I'm not convinced that systemd makes things less secure. I suspect the attacker would just have used a different vector if there were no systemd integration. After all, it looks like the attacker was also trying to integrate exploits into other libraries, like zstd.
Still, I would appreciate it if the systemd developers would find better protection against supply-chain attacks.
It's also tricky to reason about risk: for example, ShellShock caused a bunch of vulnerabilities in things that used shell scripts, and the classic SysV init system was a factor in a ton of vulnerabilities over the years because, without a standard way to drop privileges, namespace things, manage processes, or handle chroot, you had a bunch of people implementing code that ran with elevated privileges (because it needed to do things like bind to a low network port) and they either had vulnerabilities in the privileged part or messed up some detail. I think in general it's been much better in the systemd era, where so much of that is built in, but I have been happy to see them starting to trim some of the things like the compression-format bindings, and I expect this will spur more.
I really appreciate your tone and dialectic reasoning, thanks for your reply. And yes, as simple as it sounds, I believe that shell scripts help a lot to maintain mission critical tools. One hands-on example is https://dyne.org/software/tomb where I took this approach to replace whole disk encryption which is nowadays also dependent on systemd-cryptsetup.
3. Aren't =, != etc. used to compare strings and -eq, -ne, -gt etc. used to compare numbers? I see a lot of numbers compared as strings, e.g.:
[ $? = 0 ]
[ $? != 0 ]
[ $exitcode = 0 ]
4. There are a lot of "cat <<EOF" blocks without indentation. I understand this is done because the shell expects "EOF" at the start of the line, but there is a special syntax designed for exactly this use case: simply put a dash between << and the token, e.g. "cat <<-EOF" (note that <<- strips leading tab characters only, not spaces).
In this case:
tomb_init() {
system="`uname -s`"
case "$system" in
FreeBSD)
cat <<-EOF
create=posix_create
format=posix_format
map=posix_map
mount=freebsd_mount
close=freebsd_close
EOF
;;
Linux)
Thanks for your review! Though you are referring to tomb-portable, an unfinished experiment which is about to be dismissed, since cross-platform experiments with veracrypt show very bad performance.
You are welcome to share a review of the tomb script, but be warned that we use a lot of zsh-specific features. It is a script that has worked for 15+ years, so it has a fair amount of patchwork to avoid regressions.
I have been experiencing such ad-hominem attacks for 10 years and more.
You are so quickly labeling an identifiable professional as troll, while hiding behind your throwaway identity, that I am confident readers will be able to discern.
Our community is swamped by people like you, so I will refrain from answering further provocations, believing I have provided enough details to back my assertion.
For bad-3-corrupt_lzma2.xz, the claim was that "the original files were generated with random local to my machine. To better reproduce these files in the future, a constant seed was used to recreate these files." with no indication of what the seed was.
I got curious and decided to run 'ent' https://www.fourmilab.ch/random/ to see how likely the data in the bad stream was to be random. I used some python to split the data into 3 streams, since it's supposed to be the middle one that's "bad":
I used this regex to split in python, and wrote to "tmp":
re.split(b'\xfd7zXZ', x)
I manually used dd and truncate to strip out the remaining header and footer according to the specification, which left 48 bytes:
$ ent tmp2 # bad file payload
Entropy = 4.157806 bits per byte.
Optimum compression would reduce the size
of this 48 byte file by 48 percent.
Chi square distribution for 48 samples is 1114.67, and randomly
would exceed this value less than 0.01 percent of the times.
Arithmetic mean value of data bytes is 51.4167 (127.5 = random).
Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
Serial correlation coefficient is 0.258711 (totally uncorrelated = 0.0).
$ ent tmp3 # urandom
Entropy = 5.376629 bits per byte.
Optimum compression would reduce the size
of this 48 byte file by 32 percent.
Chi square distribution for 48 samples is 261.33, and randomly
would exceed this value 37.92 percent of the times.
Arithmetic mean value of data bytes is 127.8125 (127.5 = random).
Monte Carlo value for Pi is 3.500000000 (error 11.41 percent).
Serial correlation coefficient is -0.067038 (totally uncorrelated = 0.0).
The data does not look random. From https://www.fourmilab.ch/random/ for the Chi-square Test, "We interpret the percentage as the degree to which the sequence tested is suspected of being non-random. If the percentage is greater than 99% or less than 1%, the sequence is almost certainly not random. If the percentage is between 99% and 95% or between 1% and 5%, the sequence is suspect. Percentages between 90% and 95% and 5% and 10% indicate the sequence is “almost suspect”."
Here's a handy bash script I threw together to audit any Docker containers you might be running on your machine. It's hacky, but it will quickly let you know what version of xz, if any, is present in your Docker containers.
```
#!/bin/bash

# Get list of all running Docker containers
containers=$(docker ps --format "{{.Names}}")

# Loop through each container
for container in $containers; do
    # Get container image
    image=$(docker inspect --format='{{.Config.Image}}' "$container")

    # Execute xz --version inside the container
    version=$(docker exec "$container" xz --version)

    # Write container name, image, and command output to a text file
    echo "Container: $container" >> docker_container_versions.txt
    echo "Image: $image" >> docker_container_versions.txt
    echo "xz Version:" >> docker_container_versions.txt
    echo "$version" >> docker_container_versions.txt
    echo "" >> docker_container_versions.txt
done

echo "Output written to docker_container_versions.txt"
```
Sadly this is exactly one of the cases where open source is much more vulnerable to a state-sponsored attack than proprietary software. (It is also easier to find such backdoors in open source software, but that's beside the point.)
Why? Well, consider this: to "contribute" to a proprietary project you need to get hired by a company and go through their hiring process. They also have to be hiring for the right team, etc. Your operative has to be in a different country, needs a CV that checks out, passports/IDs are checked, etc.
But to contribute to an open source project? You just need an email address. Your operative sends good contributions until they build trust, then they start introducing backdoors in the part of the code "no one but them understands".
The cost of such an attack is a lot lower for a state actor, so we have to assume that every single open source project with the potential to be backdoored has seen many attempts. (Proprietary software too, but as mentioned, that is much more expensive.)
So what is the solution? IDK, but enforcing certain "understandability" requirements can be a part of it.
Is that true? Large companies producing software usually have bespoke infra, which barely anyone monitors. See: the SolarWinds hack. Similarly to the xz compromise, they added a Trojan to the binary artifacts by hijacking the build infrastructure. According to Wikipedia "around 18,000 government and private users downloaded compromised versions", and it took almost a year for somebody to detect the trojan.
Thanks to the tiered updates of Linux distros, the backdoor was caught in testing releases, and not in stable versions. So only a very low percentage of people were impacted. Also the whole situation happened because distros used the tarball with a "closed source" generated script, instead of generating it themselves from the git repo. Again proving that it's easier to hide stuff in closed source software that nobody inspects.
Same with getting hired. Don't companies hire cheap contractors from Asia? There it would be easy to sneak in some crooked or even fake person to do some dirty work. Personally, I was even emailed by a guy from China who asked me if I was willing to "lend" him my identity so he could work at western companies, and he would share the money with me. Of course I didn't agree, but I'm not sure everybody whose email he found on GitHub said no.
> Well, consider this: to "contribute" to a proprietary project you need to get hired by a company and go through their hiring process.
Or work for a third-party company that gets access to critical systems without any checks. See for example the incident from 2022 here: https://en.wikipedia.org/wiki/Okta,_Inc.
Or a third-party that rents critical infrastructure to the company (Cloud, SaaS solutions).
Or exactly this kind of backdoor in open source but target proprietary software. I don't know of any survey but I'd be surprised if less than half of proprietary software used open source software one way or another and not surprised if it was quite a bit more than that.
It's wild that this could have lain dormant for far longer if the exploit had been better written -- if it hadn't slowed down logins or upset valgrind.
So many security companies publishing daily generic blog posts about "serious supply chain compromises" in various distros on packages with 0 downloads, and yet it takes a developer debugging performance issues to find an actual compromise.
I worked in the software supply chain field and cannot resist the feeling that the entire point of that industry is to make companies pay for a security certificate so you can shift the blame onto someone else when things go wrong.
> cannot resist feeling the entire point of that industry is to make companies pay for a security certificate so you can shift the blame onto someone else when things go wrong.
That's the entire point. You did everything you could by getting someone else to look at it and say it's fine.
This needs a Rust joke. You know, the problem with the whole certification charade is that it slows down jobs and prevents __actual_problems from getting evaluated. But is it safe?
That's basically the whole point, actually... A company pays for insurance for the business. The insurance company says sure, we will insure you, but you need to go through audits A, B, and C, and you need certifications X and Y to be insured by us. Those audits are often industry-dependent, mostly for topics like HIPAA, PCI, SOC, etc.
Insurance company hears about supply chain attacks. Declares that insured must have supply chain validation. Company goes and gets a shiny cert.
Now when things go wrong, the company can point to the cert and go "it wasn't us, see, we have the cert you told us to get and it's up to date". And the company gets to wash its hands of liability (most of the time).
What you describe is a normal process in order to minimise damage from attacks. The damage of hacking is ultimately property damage. The procedures you've described allow you to minimise it.
If you installed xz on macOS using brew, then you have
xz (XZ Utils) 5.6.1
liblzma 5.6.1
which are within the affected release range for the vuln. As noted elsewhere in these comments, the effect on macOS is uncertain. If concerned, you can revert to 5.4.6 with
Yeah it was when I posted the comment too. That's why you could type brew upgrade xz and it went back to 5.4.6 I guess? But it might have been around that time, cutting it fine, not out for everybody. I don't know. Comment race condition haha! :)
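If you want to double-check what you currently have from Homebrew (a quick sketch; assumes the formula is simply `xz`):

```
xz --version              # anything other than 5.6.0/5.6.1 is outside the affected range
brew list --versions xz   # what Homebrew has installed locally
brew upgrade xz           # pulls whatever Homebrew currently ships for the formula
```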
> the entire point of that industry is to make companies pay for a security certificate so you can shift the blame onto someone else when things go wrong.
That is actually a major point of a lot of corporate security measures (shifting risk)
That's the entire point of certification, of any certification at all. Certification does not guarantee performance. Actually, I would always cast a suspicious glance at anyone who is FOCUSED on getting certification after certification without any side project.
When I search for "digital masquerade" on Google, the first result is a book with this title from the author Jia Tan. I assume that is how the attackers got their fake name. Or they think using this author's name is a joke.
A lot of software (including https://gitlab.com/openconnect/openconnect of which I'm a maintainer) uses libxml2, which in turn transitively links to liblzma, using it to load and store compressed XML.
I'm not *too* worried about OpenConnect given that we use `libxml2` only to read and parse uncompressed XML…
But I am wondering if there has been any statement from libxml2 devs (they're under the GNOME umbrella) about potential risks to libxml2 and its users.
This doesn't matter: if libxml2 loads the .so and the library is malicious, you are already potentially compromised, as it is possible to run code on library load.
But there are indeed other code paths where it attempts to auto-detect compression, although as I understand it from the docs only zlib compression is autodetected… though I suspect those docs may be out of date and it may autodetect any/all compiled-in compression algorithms.
Regardless, the fact that it links with liblzma is cause for concern, given the mechanism of operation of the liblzma/xz backdoor.
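A quick way to see what your libxml2 build actually links against (glibc-based systems; the lookup via ldconfig is just one approach):

```
# locate the installed libxml2 shared object and list any lzma/zstd dependencies
lib=$(ldconfig -p | awk '/libxml2\.so\.2 / {print $NF; exit}')
ldd "$lib" | grep -Ei 'lzma|zstd'
```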
Since GitHub disabled the repos..
I uploaded all GitHub Events from the two suspected users and from their shared project repo as easy to consume CSV files:
Very strange behavior from the upstream developers. Possible government involvement? I have a feeling LANG is checked to target servers from particular countries.
One thing to note is that the person who added the commits only started contributing around late 2022 and appears to have a Chinese name. They might be required by law to plant the backdoor.
This does make me wonder how much they made a deliberate effort to build an open source portfolio so they’d look more legitimate when time came to mount an attack. It seems expensive but it’s probably not really much at the scale of an intelligence agency.
What's the salary for a software engineer in urban China? 60-80k/yr USD? Two years of that salary is cheaper than a good single shoulder fired missile. Seems like a pretty cheap attack vector to me. A Javelin is a quarter million per pop and they can only hit one target.
They are paid much less than that. However, American weapons are also far overpriced due to high labor costs, among other things. The Chinese probably have cheaper weapons.
Unless the payment is made by a foreign entity (which would mean a US employer hiring a Chinese hacker), it's not a wise choice to convert currencies when measuring salary, because it erases other factors affecting salary, like CPI or housing prices.
Apart from (both visible and invisible) taxes, I expect a senior programmer would earn ~500-700k CNY per year. Game programmers may reach up to 200k. For a team able to perform such an attack, 1M/yr on average might be reasonable.
But if this is not a state-sponsored attack, I can't see enough incentive. And if this is state-backed... contractors or some dishonest officials would take a huge cut, so the real cost might be >2M/yr. Considering you get nothing during two years of lurking, I doubt it's feasible enough.
This kind of shallow dismissal is really unhelpful to those of us trying to follow the argument. You take a tone of authoritative expert, without giving any illuminating information to help outsiders judge the merit of your assertion. Why is it a very bad reading of the current situation? What is a better reading?
I am not sure I agree that every low quality post needs a detailed rebuttal? HN couldn't function under such rules.
as to the specific comment:
> Seems like the complexity of XZ has backfired severely, as expected.
to summarise: someone found a project with a vulnerable maintenance situation, spent years getting involved in the project, got commit rights, then committed a backdoor in some binaries and the build system, and then got sock puppets to agitate for OSes to adopt the backdoored code.
the comment I replied to made a "shallow" claim of complexity without any details, so let's look at some possible interpretations:
- code complexity - doesn't seem super relevant - the attacker hid a highly obfuscated backdoor in a binary test file and committed it - approximately no one is ever going to catch such things without a process step requiring that binaries be generatable in a reasonable-looking and hopefully-hard-to-backdoor kind of way. cryptographers are good at this: https://en.wikipedia.org/wiki/Nothing-up-my-sleeve_number
- build complexity - sure, but it's auto*, that's very common.
- organisational complexity - the opposite is the case. it had one guy maintaining it, who asked for help.
- data/file format complexity - doesn't seem relevant unless it turns out the obfuscation method used was particularly easy for this format, but even in that case, you'd think others would be vulnerable to something equivalent
perhaps OP had some other thing in mind, but then they could have said that, instead of making a crappy comment.
To summarize the article, the back door is introduced through build scripts and binaries distributed as “test” data. Very little to do with the complexity or simplicity of xz; more that it was a dependency of critical system binaries (systemd) and ripe for hostile takeover of the maintainer role.
Totally agree. This sudden aggressive stance toward xz is not helpful to anyone. xz has been and will always be my preferred compression algorithm for times to come, despite this episode of really insane levels of social engineering.
I feel for the author dealing with burnout and such, but in all fairness, xz is one of the best compression formats of our time and still going strong.
This could potentially be a fully automated rootkit-type breach, right? Great - is any system with 5.6.1 possibly vulnerable?
Also super weird a contributor thought they could slip this in and not have it be noticed at some point. It may point to burning that person (aka, they go to jail) for whatever they achieved with this. (And whoever they are…)
This was only a matter of time. Open source projects are under-staffed, maintainers are overworked and burned out, and everyone relies on the goodwill of all actors.
Obviously a bad actor will make use of these conditions and the assumption of good will.
We need automated tooling to vet for stuff like this. And maybe migrate away from C/C++ while we are at it because they don't make such scanning easy at all.
Wouldn't be surprised if the ssh auth being made slower was deliberate - that makes it fairly easy to index all open ssh servers on the internet, then see which ones get slower to fail preauth as they install the backdoor.
it wasn't the apparently newly-created identity "Hans Jansen" just asking for a new version to be uploaded, it was "Hans Jansen" providing a new version to be uploaded as a non-maintainer-upload - Debian-speak for "the maintainer is AWOL, someone else is uploading their package". if "Hans Jansen" is another attacker then they did this cleverly, providing the new - compromised - upstream tarballs in an innocent-looking way and avoiding anyone examining the upstream diff.
Looking at how many requests to update to the backdoored version have been made, I wonder if the fact that many people (including developers) have been conditioned to essentially accept updates as "always-good" is a huge contributing factor in how easy it is to spread something like this.
The known unknowns can be better than the unknown unknowns.
Totally agree. With things like Dependabot encouraged by GitHub, people now get automated pull requests for dependency updates, increasing the speed of propagation of such vulnerabilities.
Looks like GitHub has suspended access to the repository, which, while it protects against people accidentally compiling and using the code, certainly complicates forensic analysis for anyone who doesn't have a clone or access to the history (which is what I think a lot of people will be seeking now to understand their exposure).
It looks like git clone https://git.tukaani.org/xz.git still works for now (note: you will obviously be cloning malware if you do this) - that is, however, trusting the project infrastructure that compromised maintainers could have had access to, so I'm not sure if it is unmodified.
HEAD (git rev-parse HEAD) on my result of doing that is currently 0b99783d63f27606936bb79a16c52d0d70c0b56f, and it does have commits people have referenced as being part of the backdoor in it.
That was me. I'm part of ArchiveTeam and Software Heritage, and I'm one of the Debian sysadmins; the latter got some advance notice. I figured archives of xz related stuff would be important once the news broke, so I saved the xz website and the GitHub repos. I regret that I didn't think to join the upstream IRC channel and archive the rest of the tukaani.org domain, nor archive the git.tukaani.org repos. Been archiving links from these threads ever since the news broke.
It seems like based on the (very well written) analysis that this is a way to bypass ssh auth, not something that phones out which would've been even scarier.
My server runs Arch with an LTS kernel (which sounds dumb on the surface, but was by far the easiest way to do ZFS on Linux that wasn't Ubuntu). Since I don't have SSH exposed to the outside internet, for good reason, and my understanding is that Arch never patched sshd to begin with, it seems that I, and most people in similar situations, are unaffected.
Still insane that this happened to begin with, and I feel bad for the Archlinux maintainers who are now going to feel more pressure to try to catch things like this.
Being included via libsystemd isn't the only way ssh can load liblzma; it can also come in as an indirect dependency of SELinux (and its PAM stack), IIUC. Which makes it even a bit more funny (?) since Arch also doesn't officially support any SELinux stuff.
There might be other ways sshd might pull in lzma, but those are the 2 ways I saw commonly mentioned.
On a different note, pacman/makepkg got the ability to checksum source repository checkouts in 6.1.
They just signed each other's keys around that time, and one needs to redistribute the public keys for that; nothing suspicious about it I think. The key fingerprint 22D465F2B4C173803B20C6DE59FCF207FEA7F445 remained the same.
before:
pub rsa4096/0x59FCF207FEA7F445 2022-12-28 [SC] [expires: 2027-12-27]
22D465F2B4C173803B20C6DE59FCF207FEA7F445
uid Jia Tan <jiat0218@gmail.com>
sig 0x59FCF207FEA7F445 2022-12-28 [selfsig]
sub rsa4096/0x63CCE556C94DDA4F 2022-12-28 [E] [expires: 2027-12-27]
sig 0x59FCF207FEA7F445 2022-12-28 [keybind]
after:
pub rsa4096/0x59FCF207FEA7F445 2022-12-28 [SC] [expires: 2027-12-27]
22D465F2B4C173803B20C6DE59FCF207FEA7F445
uid Jia Tan <jiat0218@gmail.com>
sig 0x59FCF207FEA7F445 2022-12-28 [selfsig]
sig 0x38EE757D69184620 2024-01-12 Lasse Collin <lasse.collin@tukaani.org>
sub rsa4096/0x63CCE556C94DDA4F 2022-12-28 [E] [expires: 2027-12-27]
sig 0x59FCF207FEA7F445 2022-12-28 [keybind]
Lasse's key for reference:
pub rsa4096/0x38EE757D69184620 2010-10-24 [SC] [expires: 2025-02-07]
3690C240CE51B4670D30AD1C38EE757D69184620
uid Lasse Collin <lasse.collin@tukaani.org>
sig 0x38EE757D69184620 2024-01-08 [selfsig]
sig 0x59FCF207FEA7F445 2024-01-12 Jia Tan <jiat0218@gmail.com>
sub rsa4096/0x5923A9D358ADF744 2010-10-24 [E] [expires: 2025-02-07]
sig 0x38EE757D69184620 2024-01-08 [keybind]
Going forward this will require more than a citizens' investigation. Law enforcement will surely be granted access. Also, tarballs are still available in package managers if you really want to dig into the code.
Nice idea, but then you just hide the attack in logo.png that gets embedded in the binary. Less useful for libraries, works plenty good for web/desktop/mobile.
The problem with the parent's suggestion is you end up banning lots of useful techniques while not actually stopping hackers from installing back doors or adding security exploits. The basic problem is once an attacker can submit changes to a project, the attacker can do a lot of damage. The only real solution is to do very careful code reviews. Basically, having a malicious person get code into a project is always going to be a disaster. If they can get control of a project, it is going to be even worse.
5.6.1-2 is not an attempted fix, it's just some tweaks to Arch's own build script to improve reproducibility. Arch's build script ultimately delegates to the compromised build script unfortunately, but it also appears the payload itself is specifically targeting deb/RPM based distros, so a narrow miss for Arch here.
(EDIT: as others have pointed out, part of the exploit is in the artifact from libxz, which Arch is now avoiding by switching to building from a git checkout)
Are you sure about that? The diff moves away from using the compromised tarballs to the not-compromised (by this) git source. The commit message says it's about reproducibility, but especially combined with the timing it looks to me like that was just to avoid breaking an embargo.
So, you suggest that Frederik Schwan had prior knowledge of the security issues but hid the real purpose of the commit under "improve reproducibility"?
And, if you break the embargo too many times, then you just find out with the rest of us, and that's not a great way to run a distro. I believe OpenBSD is or was in that position around the time of the Intel speculative execution bugs.
xz was masked in the Gentoo repositories earlier today with the stated reason of "Investigating serious bug". No mention of security. It's pretty likely.
I upgraded Arch Linux on my server a few hours ago. Arch Linux does not fetch one of the compromised tarballs but builds from source and sshd does not link against liblzma on Arch.
[root@archlinux ~]# pacman -Qi xz | head -n2
Name : xz
Version : 5.6.1-2
[root@archlinux ~]# pacman -Qi openssh | head -n2
Name : openssh
Version : 9.7p1-1
[root@archlinux ~]# ldd $(which sshd) | grep liblzma
[root@archlinux ~]#
Interesting, they just switched from tarballs to source 19 hours ago. It seems to me that Frederik Schwan had prior knowledge of the security issue, or it is just a rare coincidence.
The writeup indicates that the backdoor only gets applied when building for rpm or deb, so Arch probably would have been okay either way? Same with Nix, Homebrew, etc.
On arch, `ldd $(which sshd)` doesn't list lzma or xz, so I think it's unaffected? Obviously still not great to be shipping malicious code that just happens to not trigger.
This is what the `detect_sh.bin` attached to the email does. I can only assume that the person who reported the vulnerability checked that this succeeds in detecting it.
Note that I'm not looking for the vulnerable symbols, I'm looking for the library that does the patching in the first place.
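Right - and on Linux that precondition can be checked crudely from /proc: list running sshd processes and see whether any liblzma is mapped into their address space at all. The following is just a sketch of that idea (it is not the detect script from the advisory, and it says nothing about whether the mapped liblzma is a backdoored build).

#!/usr/bin/env python3
# Crude check: report which running sshd processes have liblzma mapped.
# Not the advisory's detect script; only shows presence of the library,
# not whether it is a compromised build. Requires Linux /proc and enough
# privileges to read other users' /proc/<pid>/maps.
import os
import re

def cmdline(pid):
    try:
        with open("/proc/%s/cmdline" % pid, "rb") as f:
            return f.read().replace(b"\0", b" ").decode(errors="replace")
    except OSError:
        return ""

def mapped_lzma(pid):
    libs = set()
    try:
        with open("/proc/%s/maps" % pid) as f:
            for line in f:
                m = re.search(r"(\S*liblzma\.so\S*)", line)
                if m:
                    libs.add(m.group(1))
    except OSError:
        pass
    return libs

if __name__ == "__main__":
    for pid in filter(str.isdigit, os.listdir("/proc")):
        cmd = cmdline(pid)
        if "sshd" in cmd:
            libs = mapped_lzma(pid)
            status = ", ".join(sorted(libs)) if libs else "no liblzma mapped"
            print("pid %s (%s): %s" % (pid, cmd.strip(), status))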
Incredible. It's like discovering your colleague for 2 years at the secret nuclear weapon facility is a spy for another country, covering his tracks until the very last minute. Feels like a Hollywood movie is coming up.
Should we start doing background checks on all committers to such critical IT infrastructure?
But how? Let's say you're one of 10 maintainers of an open source project. A new user wants to contribute. What do you do? Do you ask them to send you some form of ID? Assuming this is legal and assuming you could ensure the new user is the actual owner of an actual, non counterfeit ID, what do you do? Do you vet people based on their nationality? If so, what nationality should be blackballed? Maybe 3 maintainers are American, 5 are European and 2 are Chinese. Who gets to decide? Or do you decide based on the company they work for?
Open source is, by definition, open. The PR/merge request process is generally meant to accept or refuse commits based on the content (which is why you have a diff), not on the owner.
Building consensus on which commits are actually valid, even in the face of malicious actors, is a notoriously difficult problem. Byzantine fault tolerance can be achieved with a 2/3 + 1 majority, but if anyone can create new identities and have them join the system (Sybil attack) you're going to have to do things differently.
@people who write github scanners for updates and security issues (dependabot and the like)
Can we start including a blacklist of emails and names of contributors (with reasons/links to discussions)?
I can't track them and I don't want them in my projects.
Might not be very helpful as it is easy to create new identities, but I see no reason to make it easier for them.
Also, I might approach differently someone with lots of contributions to known projects than a new account, so it still helps.
It takes a minute to create a new email address. And you can change or fake an email address on a git commit trivially. You, too, can write code as anyone you want by just doing "git commit --author='Joe Biden <icecream@whitehouse.gov>'". On the internet nobody knows you're Joe Biden.
You can write a rather simple GitHub action that would do that: look at a PR and reject / close it if you don't like it for some reason. AFAIK open-source projects have a free quota of actions.
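As a sketch of what the check step of such an action could run (assuming a hand-maintained BLOCKLIST file in the repo, a workflow with pull-requests: write permission, and the GITHUB_TOKEN / GITHUB_REPOSITORY / GITHUB_EVENT_PATH variables that Actions normally provides; none of this is an existing tool, just the idea):

#!/usr/bin/env python3
# Sketch of a PR-author blocklist check for a pull_request workflow job.
# Assumes a BLOCKLIST file (one GitHub login per line, '#' for comments)
# and the standard environment GitHub Actions provides.
import json
import os
import urllib.request

def api(method, path, body=None):
    req = urllib.request.Request(
        "https://api.github.com" + path,
        method=method,
        data=json.dumps(body).encode() if body is not None else None,
        headers={
            "Authorization": "Bearer " + os.environ["GITHUB_TOKEN"],
            "Accept": "application/vnd.github+json",
        },
    )
    urllib.request.urlopen(req).read()

if __name__ == "__main__":
    with open(os.environ["GITHUB_EVENT_PATH"]) as f:
        event = json.load(f)
    author = event["pull_request"]["user"]["login"].lower()
    number = event["pull_request"]["number"]
    repo = os.environ["GITHUB_REPOSITORY"]

    with open("BLOCKLIST") as f:
        blocked = {line.strip().lower() for line in f
                   if line.strip() and not line.startswith("#")}

    if author in blocked:
        api("POST", "/repos/%s/issues/%d/comments" % (repo, number),
            {"body": "Author is on this project's blocklist; closing."})
        api("PATCH", "/repos/%s/pulls/%d" % (repo, number), {"state": "closed"})

Trivially evaded with a fresh account, as noted above, but it does raise the cost a little.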
OTOH sticking to the same email for more than one exploit might be not as wise for a malicious agent.
They should also remove the emojis, there is no need to have people feel good about upvotes. I've long felt uncomfortable with emojis on Slack as well. Responding to a coding or infrastructure issue should not be a social activity, I respond because it's my job and if the issue is worth it, not because a human being should feel appreciated (either them or me).
> openssh does not directly use liblzma. However debian and several other distributions patch openssh to support systemd notification, and libsystemd does depend on lzma.
It looks to be limited to Linux systems that are running certain patches. macOS and BSD seem unaffected?
FreeBSD is not affected as the payloads in question were stripped out, however we are looking into improvements to our workflow to further improve the import process.
The lesson here seems to be: don't depend on tools written in languages that have complex, obscure build systems that no one is either able or interested enough to read. Using tools rewritten in Rust, Go, or any other language that resolves dependencies within the project seems like the only way to do hardening here.
I agree there's safer languages than C, but nobody reads the 50,000 lines changed when you update the vendoring in a random golang project. It would be easy to introduce something there that nobody notices too.
It is generally harder to introduce vulnerabilities in a readable language, even more so when it is memory safe. Sure, life is not perfect and bad actors would have found ways to inject vulnerabilities into a Rust or Go codebase as well. The benefit of modern languages is that there is one way to build things and the source code is the only thing that needs to be audited.
You don't need a complex, obscure build system for most C code. There's a lot of historical baggage here, but many projects (including xz, I suspect) can get away with a fairly straightforward Makefile. Doubly so when using some GNU make extensions.
Thanks for that post. I wish people stopped pushing ever more complicated build systems, opaque and non-backward-compatible between their own versions, when a two-page Makefile would work just fine and still work in 20 years' time.
Rust is the worst in terms of build system transparency. Ever heard of build.rs? You can hide backdoors in any crate, or in any crate's build.rs, or the same recursively.
Most build systems are Turing-complete. Rust, at least, drastically reduces the need for custom build scripts (most of my projects have empty build.rs files or lack one entirely), and build.rs being in the same language as the rest of the codebase aids transparency immensely.
That doesn't make build.rs any less of a juicy target for a supply chain attack.
Arbitrary code downloaded from the internet and run at build time? That's a nightmare scenario for auditing, much worse than anything Autotools or CMake can offer.
You're not wrong about arbitrary code execution. It's just that your statement applies to most of the packages on any linux distribution, Autotools and Cmake included, regardless of language. Many moreso than Rust due to the aforementioned features of Cargo and build.rs not requiring me to be an expert in a second language just to audit it.
Packages in a Linux distro are not built on my machine, they are built by the distro in a sandbox. Every time I type "cargo build" I am potentially running arbitrary code downloaded from the internet. Every time I type "make" in an Autotools program only my code runs.
> not requiring me to be an expert in another language just to audit it.
Do you do that every time your Cargo.lock changes?
> Every time I type "make" in an Autotools program only my code runs.
Says who? Make is just as good at calling arbitrary code as Cargo. Including code that reaches out over the network. Have you audited every single makefile to ensure that isn't the case?
So... you're complaining about what could happen in a Rust build if you include a library without examining that library first? How do you think that is different from doing the same in any other language?
The difference is that in another language the build step is delegated to someone else who has packaged the code, and every version has presumably gone through some kind of audit. With Rust I have no idea what new transitive dependencies could be included any time I update one of my dependencies, and what code could be triggered just by building my program without even running it.
Again, we're not talking about the dependencies that I choose, but the whole transitive closure of dependencies, including the most low-level. Did you examine serde the first time you used a dependency that used it? serde did have in the past a slightly sketchy case of using a pre-built binary. Or the whole dependency tree of Bevy?
I mean, Rust has many advantages but the cargo supply chain story is an absolute disaster---not that it's alone, pypi or nodejs or Ruby gems are the same.
> Nothing prevents you from using the packaged libraries if you prefer them
Nothing except, in no particular order: 1) only having one version of crates 2) mismatched features 3) new transitive dependencies that can be introduced at any time without any warning 4) only supporting one version of rust 5) packages being noarch and basically glorified distro-wide vendoring—so their build.rs code is still run on your machine at cargo build time
Same as any other library provided by the distribution in any other language.
> 2) mismatched features
Same as any other library provided by the distribution in any other language.
> 3) new transitive dependencies that can be introduced at any time without any warning
Not in packaged Rust libraries in Fedora, at least. Please read the aforementioned link.
> 4) only supporting one version of rust
Same as any other library provided by the distribution in any other language.
> 5) packages being noarch and basically glorified distro-wide vendoring
Packages containing only source is a consequence of the Rust ABI still stabilizing, see: https://github.com/rust-lang/rust/pull/105586 After ABI stabilization, Rust libraries will be first class like any other language.
Exactly. And at least Cargo will refuse to download a crate which has been yanked. So any crate which has been discovered to be compromised can be yanked, preventing further damage even when someone has already downloaded something which depends on it.
Building packages with up-to-date dependencies is also vastly preferable to building against ancient copies of libraries vendored into a codebase at some point in the past, a situation I see far too often in C/C++ codebases.
Wouldn't a supply chain attack like this be much worse with Rust and Cargo because of the fact it's not just a single dynamic library that needs to be reinstalled system-wise, but, instead, every binary would require a new release?
It would mean rebuilding more packages. I don't think that's meaningfully "much worse"; package managers are perfectly capable of rebuilding the world, and the end-user fix is the same "pacman -Syu"/"apt-get update && apt-get upgrade"/...
On the flip side the elegant/readable build system means that the place this exploit was hidden wouldn't exist. Though I wouldn't confidently say that 'no hiding places exist' (especially with the parts of the ecosystem that wrap dependencies in other languages).
It's much worse because it requires repackaging every affected system package instead of a single library. Knowing which packages are affected is difficult because that information isn't exposed to the larger system package manager. After all, it's all managed by the build system.
Those CI and build infrastructures rely on the Debian and RedHat being able to build system packages.
How would an automated CI or build infrastructure stop this attack? It was stopped because the competent package maintainer noticed a performance regression.
In this case, this imagined build system would have to track every rust library used in every package to know which packages to perform an emergency release for.
I... don't see your point. Tracking the dependencies a static binary is built with is already a feature for build systems, just maybe not the ones Debian and RH are using now, but I imagine they would if they were shipping static binaries.
Rust isn't really the point here, it's the age-old static vs dynamic linking argument. Rust (or rather, Cargo) already tracks which version of a dependency a library depends on (or a pattern to resolve one), but that's beside the point.
Rust is the issue here because it doesn't give you much of an option. And that option is the wrong one if you need to do an emergency upgrade of a particular library system-wide.
It's really not, it's not hard to do a reverse search of [broken lib] <= depends on <= [rust application] and then rebuild everything that matches. You might have to rebuild more, but that's not really hard with modern build infrastructure.
Not to mention if you have a Rust application that depends on C libraries, it already dynamically links on most platforms. You only need to rebuild if a Rust crate needs to be updated.
So, I know that librustxz has been compromised. I'm Debian. I must dive into each rust binary I distribute as part of my system and inspect their Cargo.toml files. Then what? Do I fork each one, bump the version, hope it doesn't break everything, and then push an emergency release!??!
> I must dive into each rust binary I distribute as part of my system and inspect their Cargo.toml
A few things:
1. It'd be Cargo.lock
2. Debian, in particular, processes Cargo's output here and makes individual debs. So they've taken advantage of this to already know via their regular package manager tooling.
3. You wouldn't dive into and look through these by hand, you'd have it as a first-class concept. "Which packages use this package" should be table stakes for a package manager.
> Then what? Do I fork each one, bump the version, hope it doesn't break everything, and then push an emergency release!??!
The exact same thing you do in this current situation? It depends on what the issue is. Cargo isn't magic.
The point is just that "which libraries does the binary depend on" isn't a problem with actual tooling.
People already run tools like cargo-vet in CI to catch versions of packages that may have issues they care about.
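To make the reverse search concrete, here's a minimal sketch of the idea (not how any distro actually does it): walk a tree of package checkouts and report which ones have a given crate pinned in their Cargo.lock, i.e. which ones would need a rebuild. The directory layout and crate name are whatever you pass in; needs Python 3.11+ for tomllib.

#!/usr/bin/env python3
# Minimal reverse-dependency search over Cargo.lock files: given a crate
# name (and optionally a version), list every package checkout under a
# root directory that has it pinned. Purely illustrative.
import sys
import tomllib
from pathlib import Path

def locked_packages(lockfile):
    with lockfile.open("rb") as f:
        data = tomllib.load(f)
    return [(p["name"], p.get("version", "?")) for p in data.get("package", [])]

if __name__ == "__main__":
    root, crate = Path(sys.argv[1]), sys.argv[2]
    bad_version = sys.argv[3] if len(sys.argv) > 3 else None
    for lockfile in root.rglob("Cargo.lock"):
        for name, version in locked_packages(lockfile):
            if name == crate and (bad_version is None or version == bad_version):
                print("%s: locks %s %s" % (lockfile.parent, name, version))

Something like `cargo tree -i <crate>` answers the same question for a single project; the point is just that the information is sitting right there in the lockfiles.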
Except it is. The system package maintainers release a new build of the package in question and then you install it. There's not really anything else to do here. There's nothing special about Rust in this context, it would be exactly the same scenario on, for example, Musl libc based distros with any C application.
Fundamentally there is no difference. In practice Rust makes things a lot worse. It encourages the use of dependencies from random (i.e. published with cargo) sources without much quality control. It is really a supply chain disaster waiting to happen. A problem like this would propagate much faster. Here the threat actor had to work hard to get his library updated in distributions, and at each step there was a chance that it would be detected. Now think about a Rust package automatically pulling in transitively 100s of crates. Sure, a distribution can later figure out what was affected and push upgrades to all the packages. But fundamentally, we should minimize dependencies and we should have quality control at each level (and ideally we should not run code at build time). Cargo goes in the full opposite direction. Rust got this wrong.
Whether a hypothetical alternate world in which Rust didn't have a package manager or didn't make sharing code easy would be better or worse than the world we live in isn't an interesting question, because in that world nobody would use Rust to begin with. Developers have expected to be able to share code with package managers ever since Perl 5 and CPAN took off. Like it or not, supply chain attacks are things we have to confront and take steps to solve. Telling developers to avoid dependencies just isn't realistic.
> It encourages the use of dependencies from random (i.e. published with cargo) sources without much quality control. It is really a supply chain disaster waiting to happen.
Oh I 100% agree with this, but that's not what was being talked about. That being said, I don't think the distribution model is perfect either: it just has a different set of tradeoffs. Not all software has the same risk profile, not all software is a security boundary between a system and the internet. I 100% agree that the sheer number of crates that the average Rust program pulls in is... not good, but it's also not the only language/platform that does this (npm, pypi, pick-your-favorite-text-editor, etc.), so singling out Rust in that context doesn't make sense either; it only makes sense when comparing it to the C/C++ "ecosystem".
I'm also somewhat surprised that the conclusion people come to here is that dynamic linking is a solution to the problem at hand or even a strong source of mitigation: it's really, really not. The ability to, at almost any time, swap out what version of a dependency something is running is what allowed this exploit to happen in the first place. The fact that there was dynamic linking at all dramatically increased the blast radius of what was affected by this, not decreased it. It only provides a benefit once discovered, and that benefit is mostly in terms of fewer packages needing to be rebuilt and updated by distro maintainers and users. Ultimately, supply-chain security is an incredibly tough problem that is far more nuanced than valueless "dynamic linking is better than static linking" statements can even come close to communicating.
> A problem like this would propagate much faster. Here the threat actor had to work hard to get his library updated in distributions and at each step there was a chance that this is detected.
It wouldn't though, because programs would have had to be rebuilt with the backdoored versions. The bookkeeping would be harder, but the blast radius would probably have been smaller with static linking, except in the case where the package is meticulously maintained by someone who bumps their dependencies constantly, or if the exploit goes unnoticed for a long period of time. That's trouble no matter what.
> Now think about a Rust package automatically pulling in transitively 100s of crates.
Yup, but it only happens at build time. The blast radius has different time-domain properties than with shared libraries. See above. 100s of crates is ridiculous, and IMO the community could (and should) do a lot more to establish which crates are maintained appropriately and are actually being monitored.
> Sure, a distribution can later figure out what was affected and push upgrades to all the packages.
This is trivial to do with build system automation and a small modicum of effort. It's also what already happens, no?
> But fundamentally, we should minimize dependencies and we should have quality control at each level
Agreed, the Rust ecosystem has its own tooling for quality control. Just because it's not maintained by the distro maintainers doesn't mean it's not there. There is a lot of room for improvement though.
> (and ideally we should not run code at build time). Cargo goes into the full opposite direction. Rust got this wrong.
Hard, hard, hard disagree. Nearly every language requires executing arbitrary code at compile time, yes, even a good chunk of C/C++. A strong and consistent build system is a positive in this regard: it would be much harder to obfuscate an attack like this in a Rust build.rs because there's not multiple stages of abstraction with an arbitrary number of ways to do it. As it stands, part of the reason the xz exploit was even possible was because of the disaster that is autotools. I would argue the Rust build story is significantly better than the average C/C++ build story. Look at all the comments here describing the "autotools gunk" that is used to obfuscate what is actually going on. Sure, you could do something similar for Rust, but it would look weird, not "huh, I don't understand this, but that's autotools for ya, eh?"
To be clear, I agree with you that the state of Rust and its packaging is not ideal, but I don't think it necessarily made wrong decisions; it's just immature as a platform, which is something that can and will be addressed.
I am not completely sure about this exploit, but it seems a binary needed to be modified for the exploit to work[1], which was later picked up by the build system.
This seems to be an orthogonal issue. Rust could build the same dynamic library with cargo, which could then be distributed. The difference is that there would be a single way to build things.
Most Rust libraries are not dynamically linked; instead, versions are pinned and included statically during the build process. This is touted as a feature.
Only a few projects are built as system-wide libraries that expose a C-compatible ABI; rsvg comes to mind.
Once somebody actually does this people are gonna complain the same as always: "The sole purpose of your project is to rewrite perfectly fine stuff in Rust for the sake of it" or something along these lines.
Is this really the lesson here? We are talking about a maintainer here, who had access to signing keys and full access to the repository. The deb packages which were distributed are also different from the source code. Do you honestly believe that the (arguably awful) autotools syntax is the single root cause of this mess, that Rust will save us from everything, and that this is what we should take away from this situation?
The fundamental problem here was a violation of chain of trust.
Open source is only about the source being open. But if users are just downloading blobs with prebuilt binaries or even _pre-generated scripts_ that aren't in the original source, there is nothing a less-obscure build system will save you from as you are putting your entire security on the chain of trust being maintained.
> One portion of the backdoor is solely in the distributed tarballs. For easier reference, here's a link to debian's import of the tarball, but it is also present in the tarballs for 5.6.0 and 5.6.1:
He was a consultant/sysadmin for Intel, and he did 3 things which he thought his employer would support, and was astonished to find that not only did his employer not support them, but it actively had him prosecuted for doing them. Ouch.
1. He ran a reverse-proxy on two machines so he could check in on them from home.
2. He used the crack program to find weak passwords.
3. He found a weak password, and used it to log into a system, which he copied the /etc/shadow file from to look for additional weak passwords.
He didn't try and hide his activities, and didn't do anything else untoward, it was literally just these things which most people wouldn't bat an eyelid at. These days, it is completely normal for a company to provide VPNs for their employees, and completely normal to continually scan for unexpected user accounts or weak passwords. But... because he didn't explain this to higher-ups and get their buy-in, they prosecuted him instead of thanking him.
I find it useful to compare the reactions of O'Reilly and Intel. Schwartz worked for both (he wrote Learning Perl and co-authored Programming Perl for O'Reilly and made them plenty of money). He cracked the passwords of both companies without first getting permission.
O'Reilly's sysadmin told him off for not getting permission, and told him not to do it again, but used his results to let people with weak passwords know to change them.
Intel's sysadmin started collecting a dossier on Schwartz and ultimately Intel pushed for state criminal charges against him.
O'Reilly's sysadmin testified in Schwartz's defense that he was an overly eager guy with no nefarious intent. So - kinda-sus or not - Intel could have resolved this with a dressing down, or even termination if they were really unhappy. Intel _chose_ to go nuclear, and invoke the Oregon computer crime laws, and demand the state prosecute him.
Lots of things are crimes even though they're just offering something to a victim who willingly accepts it, e.g. phishing attacks, fraudulent investment schemes, contaminated food products.
Sure. I'm wondering if there is a specific law that was broken here. It seems to me that it might be beneficial if there were some legal protection against this sort of act.
It is a little different but a thing that you might have missed in the quick read is that one of the things he was accused of was installing and using a backdoor.
Kinda relevant, as I saw a few comments about how safer languages are the solution.
Here[0] is a very simple example that shows how easy such supply chain attacks are in Rust; and let's not forget that there was a very large Python attack just a few days ago[1].
Rust’s “decision” to have a very slim standard library has advantages, but it severely amplifies some other issues. In Go, I have to pull in zero dependencies to make an HTTP request. In Rust, pulling reqwest pulls in at least 30 distinct packages (https://lib.rs/crates/reqwest). Date/time, “basic” base64, common hashing or checksums, etc, they all become supply chain vectors.
The Rust ecosystem’s collective refusal to land stable major versions is one of the amplifying issues. “Upgrade fatigue” hits me, at least. “Sure, upgrade ring to 0.17” (which is effectively the 16th major version). And because v0.X versions are usually incompatible, it’s not really possible to opt not to upgrade, because it only takes a short while before some other transitive dependency breaks because you are slow to upgrade. I recently spent a while writing my code to support running multiple versions of the `http` library, for example (which, to be fair, did just land version 1.0). My NATS library (https://lib.rs/crates/async-nats) is at version 34. My transitive base64 dependency is at version 22 (https://lib.rs/crates/base64).
This makes it nearly impossible for me to review these libraries and pin them, because if I pin foo@0.41.7, and bar needs foo@0.42.1, I just get both. bar can't do >=0.41, because the point of the 0.X series is that it is not backwards compatible. It makes this process so time-consuming that I expect people will either just stop (as if they ever did) reviewing their dependencies, or accept that they might have to reinvent everything from URL parsing to constructing http headers or doing CRC checks.
Combine this with a build- and compile-time system that allows completely arbitrary code execution, which is routinely just a wrapper for stuff like in the xz attack (look at a lot of the low-level libs you inevitably pull in). Sure, the build scripts and the macro system enable stuff like the amazing sqlx library, but said build and macro code is already so hard to read, it really takes proper wizardry to properly understand.
> In Rust, pulling reqwest pulls in at least 30 distinct packages
This would be less of a problem if each dependency (and in turn, their dependencies) were individually sandboxed, and only allowed to access specific inputs/files at runtime in the capability security (https://en.wikipedia.org/wiki/Capability-based_security) fashion.
This way the attack surface would be hollowed out as much as possible, and exploits limited to the (sub)program output or specific accessible (writable) files.
You don't automatically download anything at build or install time, you just update your local source copies when you want to. Which to be clear I know means rarely.
Vendoring is nice, and I usually prefer it, but you don't always have the time or people for it.
Vendoring + custom build system (Bazel?) for everything is basically googles approach, if what I have read is correct. Definitely better than everything we have, but the resources for it are not something most can afford.
P.S also what mrcus said, if we trust the upstream build process, we may as well trust their binaries.
Makes one wonder how many similar backdoors are out there in the wild. What is the best way to execute such a move? This is sophisticated enough, but not good enough to stay unnoticed for a long while. If I were a state actor I'd plan for at least 6-12 months.
Interesting. Is there also a pattern in the times of day? (I don't so much mean the times in commits done by the developer because they can be fake. I'd be more interested in authentic times recorded by GitHub, if any such times are publicly accessible.)
Another thing would be to examine everything ever written by the user for linguistic clues. This might point towards particular native languages or a particular variant of English or towards there being several different authors.
That timezone information is provided by whoever created the commits so cannot be trusted to be correct. Considering the chosen alias it's not unexpected that the timezone information was also made to look Chinese.
So could be the holiday inactivity. I'd expect multiple layers of country obfuscation as well as conflicting information to confuse you. Or none. Impossible to know for sure.
It's not ironic, this change is really sinister IMO. They want you to waste more time after you've submitted the security report and maximize the amount of back and forth. Basically the hope is that they'd be able to pester you with requests for more info/details in order to "resolve the issue" which would give them more time to exploit their targets.
This is exactly why I fight the windmills so hard when it comes to automatic updates in Linux software.
So much damage is caused just by adding a single maintainer to a project - imagine how much power you would have to wield the remote execution systems put in place by naive developers for "automatic updates".
All it takes is a single malicious maintainer given access to the new version update of some popular user software, and they have a new botnet of thousands of devices at their disposal. Better yet, after the backdoor installation, they can just release the real update and cover their tracks forever.
Automatic updates are like running web applications, but without any sandboxing or protection usually implemented by the browser.
I hope mainstream news covers this so the general population can understand the issue with our software ecosystem's reliance on unpaid open-source maintainers
Note that Fedora 40 isn't even released yet, it's in beta, Fedora 41 / rawhide is basically a development branch used only by a small number of people.
i knew there was an advantage to being 8-10 years out of date at all times...
and when they finally do backport this bug in 2026, they will probably implement the systemd integration with openssl (pbthththt...) via 600 patch files in some nonstandard divergent manner that thwarts the payload anyhow. see? i knew they were super duper secure.
Given the recent (and not so recent) attacks/"bugs", I feel there is a need to do more than the already hard task of investigating and detecting attacks, and also to bring IRL consequences to these people.
My understanding is that right now it's pretty much a name-and-shame of people who most of the time aren't even real "people" but hostile agents either working for governments or criminal groups (or both).
Getting punched in the face is actually a necessary human condition for a healthy civilization.
In the article it says CISA was notified - that sounds like it's going to be a federal investigation if nothing else. If I were this person, I'd be getting out of the USA (or any US-friendly nation) ASAP.
It's also very possible that the account was compromised and taken over. A two years long con with real useful work is a lot of patience and effort vs. just stealing a weakly protected account. I wonder if MFA shouldn't be a requirement for accounts that contribute to important OSS projects.
If you really step back and think about it, this type of behavior is perfectly aligned with any number of well resourced criminal groups and state actors. Two years of contributing in less visible software with the goal of gaining trust and then slowly pushing your broken fix in.
To me that's way more plausible than someone losing control of their account and whoever compromised it then spending a long time inserting a backdoor that took a long time to develop and then obfuscate.
Likely someone at GH is talking to some government agencies right now about the behavior of the private repos of that user and their associated users.
This would be the smarter attack vector, but I've noticed over time that these people are just assholes. They aren't patient. They are in for the smash/grab.
I would not be surprised if there was a group using this approach, but I doubt most of them are/would. If they were that dedicated, they'd just have a fucking job, instead of being dicks on the internet for a living.
However at this point: every developed nation has a professional offensive security group that have varying degrees of potency. All are more resourced than 99.9% of organizations defending and enjoy legal autonomy in their country and allied countries for their work.
If you're getting salaried comfortably, and you have near infinite resources, a two year timeline is trivial. As an American, I always like to point to things we know our own services have done first[0].
Each actor group has their own motivations and tactics[1]. As someone who spent a lot of time dealing with a few state actors, you learn your adversaries' tricks of the trade, and they are patient for the long con because they can afford to be.
I think you are confusing non-state e.g. ransomware groups, which are usually not part of a government (although some exceptions like North Korea likely exist) with state-sponsored hackers who are often directly working under military command. Soldiers are not "dicks on the internet".
This is not that costly. Growing bonsai trees also takes a lot of patience, decades, but you don't have to grow only one at a time; the pros are growing them in large numbers, with minimal work on each individual tree once in a while.
It might not even be a long time. He might have just been approached exactly because of his history to insert the back door, and either offered money, blackmailed, or threatened.
Oh man. That was a scenario that didn't cross my mind. I was too narrowly focused on the technical aspects rather than the social aspects of security. Great point.
What if this contributor was a member of a state actor/persistent threat group and, like some totally legit software dev houses, they encourage their people to contribute to OSS projects for the whole personal pursuit/enjoyment/fulfillment angle?
With the added bonus that sometimes they get to pull off a longcon like this.
2 years of one engineer's time is very cheap, compared to e.g. the NSA's CryptoAG scam. I'd say most likely a Chinese intelligence plant, kindly offering to relieve the burden of the original author of xz.
I got the same idea. On the XZ dev mailing list there were a few discussions about "is there a maintainer?" 2-3 years ago. It's not hard to find these types of discussions and then dedicate a few years of effort to start "helping out" and eventually be the one signing releases for the project. That's peanuts for a state actor.
Yeah I saw that - I wouldn't bet on them being in the US but who knows. Maybe they just really love CRC32 ;) And introducing backdoors (if that was them and not an account takeover).
Names can be faked, and even real names are not a great indicator.
Unless you have some very specific cultural knowledge you could not make even vaguely useful deductions about my location, nationality, culture, ethnicity etc. from my name. I get a lot of wrong guesses though!
Remember that agencies like NSA, GCHQ etc will always use false flags in their code, even when it doesn’t have as high risk of exposure as a backdoor in public has.
Looking at the times of commits shouldn’t be given much value at all. A pretty pointless endeavour.
State actors are actually known for not doing that; after all, there's no need to hide when what you're doing is legal. They also tend to work 9-5 in their own timezones.
It might be legal but would (or at least should) be seen as an attack by all other countries using the software, even allies, and in a saner world wouldl receive a strong political response.
As some of the Tweet replies mentioned, they shipped releases that contained the backdoor, and committed other questionable changes at the "usual" times. For sure we're almost certainly not dealing with a compromised workstation, so I don't think that would explain the different times for the worst offending changes.
Maybe he has some technical experts/handlers/managers that had to oversee when they introduced the actual malicious changes, and thus this reflects when he got the go-ahead signal from these other people (and thus that reflects their working hours?)
Or maybe they were just travelling at that time? (maybe travelling to visit the aforementioned handlers? Or travel to visit family... even criminals have a mom and dad)
Also, keep in mind that my Clickhouse query includes all of the GitHub interactions (for example, timestamps of issue comments)... and unlike a Git commit timestamp, it's hard to fake those (because you'd need to schedule the posting of such comments, probably via the API. Not impossible, but easier to think that JiaT75 just used the GitHub UI to write comments), while the Tweet mentions just "commit history"
Usually the simpler explanation has less chance of being wrong... thinking of some possibilities:
- Chinese/Taiwanese state actor, who employs people 9-5 (but somehow, their guy worked 20.00 - 02.00 local time)
- Chinese/Taiwanese rogue group/lone wolf... moonlighting on this exploit after their day job (given that to interact with Lasse they'd be forced to work late, this is not outside of the realm of possibilities)
- Non-Chinese state actor, employing someone 9-5 (consistent with most of the Github interactions), wanting to pin responsibility on China/Taiwan (+0800 timezone for commits), which for some unexplained reason pushed the worst offending changes at really weird times.
- Chinese/Taiwanese state actor, that wanted to pin the blame on western state actors (by making all of the changes at times compatible with someone working in Europe), and somehow they slipped up when pushing the worst offending changes.
- Chinese/Taiwanese state actor, employing someone in Europe (if they need to get approval of changes/gain the trust of the previous maintainer Lasse, it might make sense to have better/more timezone overlap)... which for some weird (yet "innocent") reason, kept the device that they worked on, configured with a +0800 timezone
- Non-Chinese state actor, pretending to be a Chinese entity that wanted to pin the blame on a western entity and slip up by making the worst offending changes at 3am (i.e. it was not a slip up, but it's part of the misdirection efforts.)
Some of these hypotheses are a bit farfetched, but reality is stranger than fiction
My git commits are sometimes in UTC, depending on which computer I make them from. Sometimes my laptop just switches timezones depending on whether I'm using wifi or LTE. I wouldn't put much weight on the timezone.
The time stamp of a git commit depends on the system clock of the computer the commit was checked in. This cannot be checked by github & co (except that they could reject commits which have time stamps in the future).
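For anyone who wants to see how weak that signal is: git takes the author and committer dates straight from the environment, so a commit can claim any time and timezone you like. A tiny sketch (run it in a scratch repo with a configured git identity; the date is made up):

#!/usr/bin/env python3
# Demonstration that commit timestamps/timezones are whatever the committer's
# environment says. Run inside a throwaway git repository.
import os
import subprocess

env = dict(os.environ,
           GIT_AUTHOR_DATE="2024-03-09 03:00:00 +0800",
           GIT_COMMITTER_DATE="2024-03-09 03:00:00 +0800")

subprocess.run(["git", "commit", "--allow-empty",
                "-m", "innocuous-looking change"],
               env=env, check=True)
subprocess.run(["git", "log", "-1", "--format=%an %ad %cd"], check=True)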
I assume you mean UTC+8... that covers about 20% of the earth's population, besides China it includes parts of Russia, a bunch of SEA and Western Australia.
We shouldn't rule it out, but it seems unlikely to me.
This is more reckless than any backdoor by a US agency that I can think of. The NSA backdoored Dual EC DRBG, which was extremely reckless, but this makes that look careful, and that was the zenith of NSA recklessness. The attackers here straight up just cowboy'd the joint. I can't think of any instance in which US intelligence used sock puppets on public forums and mailing lists to encourage deployment of the backdoored software, and I maintain a list of NSA backdoors: https://www.ethanheilman.com/x/12/index.html
The CIA had plans to commit terrorist acts against American civilians to start a war against Cuba in the 60s. This is quite literally their style. For example, perhaps they were planning to blame the hack of a power plant or critical infrastructure on this exploit, then use the "evidence" that was leaked to prove it was China, and from there carry out an offensive operation against Chinese infrastructure. There are lots of subversive reasons they would want to do this.
You are referring to Operation Northwoods [0], a set of plans from the 1960s, all of which were rejected.
Operation Northwoods came about because Brig. Gen. Edward Lansdale, asked the CIA to come up with a list of pretexts that might be used to justify an invasion of Cuba. This request had a number of planners at the CIA enumerate possible false flags that could be used as a pretext. One of those plans was a terror attack against US citizens. Operation Northwoods was rejected and never implemented.
The US has plans for nearly everything, but there is a massive difference between a plan that some CIA analyst is pitching and something the US is likely or even able to do. The US had all sorts of plans for how to handle a pandemic, but then when one actually happened, the plans couldn't be implemented because the US didn't actually have the capabilities the plans called for.
> For example, perhaps they were planning to blame the hack of a power plant or critical infrastructure on this exploit, then use the "evidence" that was leaked to prove it was China, and from there carry out an offensive operation against Chinese infrastructure.
Backdooring OpenSSH would in no way function as a pretext for attacks on Chinese infrastructure. No one outside the tech companies cares about this. The US also doesn't need to invent hacking pretexts, you could just point to one of many exposed Chinese hacking incidents.
Just so I understand, you're alleging that a U.S. agency was, among other things, submitting patches for a mainland Chinese home-grown CPU architecture (Loongson)?
No, they're not. They are saying that due to the extraordinary circumstances with this case US agencies cannot be excluded from suspicion. At this time no actor seems to be a more likely perpetrator than the next. (Keep in mind that false-flag operations are a very common occurrence in cyber warfare and this cannot be ruled out yet.)
And if someone wanted to attack a target running on Loongson, they would certainly have to make sure the code can actually run there in the first place.
It doesn't seem out of the question that the U.S. or allied nations might want to be involved in the development effort around these CPUs. Even if initially it's just to build some credibility for this account so future adversarial patches are accepted with less suspicion? If you think that's implausible, I'm interested why?
Note that it says "Fedora 41" in the CISA page link to Red Hat, but Red Hat changed the blog title to "Fedora 40" and left the HTML page title as "Fedora 41".
> knowingly causes the transmission of a program, information, code, or command, and as a result of such conduct, intentionally causes damage without authorization, to a protected computer;
No, freedom of speech (as far as I know) protects even exploit code. The statutes being linked would cover using the backdoor to gain unauthorized entry to a system. I think the question of whether anything illegal has occurred from the public facts is unclear, at least to me, and interesting.
The first amendment might overrule the cited law if that law didn't already include a requirement for intentional harm. But since the law does already have that requirement, there's not really an opportunity for a freedom of speech justification to be what protects a non-malicious publication of a proof of concept. The law isn't trying to infringe on freedom of speech.
But my argument isn't that freedom of speech could be used as an excuse for something that would otherwise be illegal -- my argument is that publishing and discussing exploit code is a constitutionally-protected activity. The CFAA statutes can be violated by gaining unauthorized access to a protected computer system, but that did not happen in the process of authoring and publishing the exploit code. The attacker was authorized to release new versions of the software, and they did. Their choice of what to make their software actually do is not regulated by the government, any more than a musician's choice of which lyrics to include in their song.
If an attacker then actually uses the backdoor created by someone else's decision to deploy the new release into their own environment, to gain unauthorized access to a protected computer system, then obviously there's a CFAA violation there. The public facts don't contain documented examples of this having happened (yet), though it will be unsurprising if that changes.
So it is still not obvious, at least to me, that any crime under US law has occurred so far. I am not a lawyer, though I'm aware of how badly the government has lost the previous court cases that attempted to restrict what humans can put in source code.
> Getting punched in the face is actually a necessary human condition for a healthy civilization.
Aside from signed commits, we need to bring back GPG key parties and web of trust. When using a project you would know how many punches away from the committers you are.
PGP is more famous for "web of trust" topologies, not chains of trust.
For all of their nerd cred, key parties didn't accomplish very much (as evidenced by the fact that nothing on the Internet really broke when the WoT imploded a few years ago[1]). The "real" solution here is mostly cultural: treating third-party software like the risky thing it actually is, rather than a free source of pre-screened labor.
Yes, but there was also little pressure to really build the WOT. People, like myself, did it because it was fun, but no one really relied on it. This could change, but it is still far from certain if it'd work given enough pressure.
Nowadays I achieve this with LinkedIn[1] connections. Less nerd cred, but achieves roughly the same purpose (most of the people I care about in my niche are at most a 3rd degree connection - a friend of a friend of a friend).
> Getting punched in the face is actually a necessary human condition for a healthy civilization.
This is factually false - in fact, it's literally the direct opposite of the truth. "Getting punched in the face" is base violence that is incompatible with a healthy civilization. A good government with a robust justice system is what is actually needed for a healthy civilization.
> openssh does not directly use liblzma. However debian and several other distributions patch openssh to support systemd notification, and libsystemd does depend on lzma.
The systemd notification protocol could have been as simple as just writing a newline to a pipe, but instead you have to link to the libsystemd C library, so now security-critical daemons like openssh have additional dependencies like liblzma loaded into their address space (even if you don't use systemd as PID 1), increasing the risks of supply chain attacks. Thanks, systemd.
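For reference, the core of that protocol really is small: send a datagram such as READY=1 to the AF_UNIX socket named in $NOTIFY_SOCKET, where a leading '@' denotes an abstract socket. A minimal sketch of just that part, deliberately ignoring vsock addresses, credential passing and the protocol's other details:

#!/usr/bin/env python3
# Minimal READY=1 notification without libsystemd: send a datagram to the
# AF_UNIX socket named in $NOTIFY_SOCKET. A leading '@' means an abstract
# socket (replace it with a NUL byte). Ignores vsock:, credentials, etc.
import os
import socket

def notify_ready():
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return  # not started with Type=notify, nothing to do
    if addr.startswith("@"):
        addr = "\0" + addr[1:]  # abstract socket namespace
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(b"READY=1", addr)

if __name__ == "__main__":
    notify_ready()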
This is what I did for a daemon I'm maintaining. Type=notify support was requested but I'm really allergic to adding new libs to a project until they really do some heavy lifting and add enough value. I was pleasantly surprised the protocol was that simple and implemented it myself. I think systemd should just provide a simple standalone reference implementation and encourage people to copy it into their project directly. (But maybe they already do, I did that almost a decade ago IIRC when the feature was relatively new.)
Whoops, you forgot `vsock:`, `@`, `SO_PASSCRED` (I think)... oh and where is that example provided? But yep that's all the protocol is for sure (and forever)!
Is that protocol documented/stable? For whatever reason, daemons are choosing to link to libsystemd instead of implementing it themselves.
It doesn't matter that libsystemd links to liblzma for other reasons. It's still in the address space of any daemon that is using libsystemd for the notification protocol.
I know Golang has their own implementation of sd_notify().
For Slurm, I looked at what a PITA pulling libsystemd into our autoconf tooling would be, stumbled on the Golang implementation, and realized it's trivial to implement directly.
> I suppose that you could execute systemd-notify but that's a solution that I would not like.
What I did was to use JNA to call sd_notify() in libsystemd.so.0 (when that library exists), which works but obviously does not avoid using libsystemd. I suppose I could have done all the socket calls into glibc by hand, but doing that single call into libsystemd directly was simpler (and it can be expected to exist whenever systemd is being used).
Caveat is that golang is not a good enough actor to be a reliable indicator of whether this interface is supported, though. They’ll go to the metal because they can, not because it’s stable.
The funny thing is that libsystemd _used_ to be split into several different libraries. I certainly remember libsystemd-journal (which is presumably the part of libsystemd that pulls in liblzma) being separate to libsystemd-daemon (which is the part that implements sd_notify, as used by OpenSSH [after patching by distros]).
If that split had never happened, then liblzma wouldn't have ended up being linked into sshd...
Services may be in a different mount namespace from systemd for sandboxing or other reasons (also means you have to worry about filesystem permissions I suppose). Passing an fd from the parent (systemd) is a nice direct channel between the processes
FWIW, I did a quick check on a Devuan system.
The sshd in Devuan does link to a libsystemd stub - this is to cut down on their maintenance of upstream packages.
However that stub does not link to lzma.
Huh. That's rather surprising. Do you know how MX Linux handles systemd? Devuan does that shimming of upstream. Do they perhaps just try to leave out certain packages?
Anyway. I did not see lzma in the results on Devuan running a process check (just in case).
I did see it on a Debian.
It turns out MX uses a package called systemd-shim that seems to be the Debian one:
$aptitude show systemd-shim
Package: systemd-shim
Version: 10-6
State: installed
Automatically installed: no
Priority: extra
Section: admin
Maintainer: Debian QA Group <packages@qa.debian.org>
Architecture: amd64
Uncompressed Size: 82.9 k
Depends: libc6 (>= 2.34), libglib2.0-0 (>= 2.39.4), cgmanager (>= 0.32)
Suggests: pm-utils
Conflicts: systemd-shim:i386
Breaks: systemd (< 209), systemd:i386 (< 209)
Description: shim for systemd
This package emulates the systemd function that are required to run the systemd helpers without using the init service
> so now security-critical daemons like openssh have additional dependencies like liblzma
Systemd itself seems security-critical to me. Would removing other dependencies on libsystemd really make a secure system where systemd was compromised through its library?
1. systemd (at least the PID 1 part) does not talk to the network, so a remotely-accessible backdoor would need to be more complex (and thus more likely to be detected) than a backdoor that can be loaded into a listening daemon like openssh.
2. You can run Debian systems without systemd as PID 1, but you're still stuck with libsystemd because so many daemons now link with it.
One of the objections that many people do not understand, is that systemd adds complexity. Unnecessary complexity. Boats full, loads full, mountains full of complexity.
Yes, there are things delivered with that complexity. However, as an example, sysvinit is maybe, oh, 20k lines of code including binaries, heck including all core init scripts.
What's systemd? 2M lines? It was >1M lines 4+ years ago.
For an init system, a thing that is to be the core of stability, security, and most importantly glacial, stable change -- that is absurdly complex. It's exceedingly over-engineered.
And so you get cases like this. And cases like that, and that over there, and that case over there too. All of which could not exist if systemd didn't try to overengineer and overcomplicate everything.
Ah well. I'm still waiting for someone to basically fork systemd, remove all the fluff (udev, ntp, dns, timers, restart code, specialized logging, on and on and on), and just end up with systemd compatible service files.
This is a bit like complaining that the Linux kernel has 30 million lines of code, while ignoring that 3/4 of that is in hardware support (drivers) or filesystems that nobody is actually required to use at any given time.
systemd is a collection of tools, one of which is an init system. Nobody accused GNU yes of being bloated just because it's in a repository alongside 50 other tools.
> that nobody is actually required to use at any given time
But that's the very problem with systemd! As time goes on you're required, whether by systemd itself or by the ecosystem around it, to use more and more of it, until it's doing not only service management but also timezones, RTC, DNS resolution, providing getpwent/getgrent, inetd, VMs and containers, bootloader, udev (without adding literally any benefit over the existing implementations), ... oh and you also have to add significant complexity in other things (like the kernel!) to use it, like namespaces (which have been a frequent source of vulnerabilities)...
> timezones, RTC, DNS resolution, providing getpwent/getgrent, inetd, VMs and containers, bootloader
How many of those are you actually required to use systemd for? At least for DNS, inetd, containers and bootloader I'm pretty sure I run a few different alternatives across my systems. I think major distros (running systemd) still ship with different DNS and inetd implementations; for containers it's a lot more common to use a Docker-like (probably Docker or Podman) than it is to use systemd-nspawn.
> oh and you also have to add significant complexity in other things (like the kernel!) to use it, like namespaces (which have been a frequent source of vulnerabilities)
Namespaces were implemented before systemd, have been used before systemd in widely used systems (for example LXC and many others). Namespaces and similar kernel features are not tied to systemd.
> How many of those are you actually required to use systemd for?
That depends on what other software you want to run, because systemd's design heavily encourages other things (distros, libraries, applications) to take dependencies on various bits. See also: every mainstream distro.
> Namespaces were implemented before systemd, have been used before systemd in widely used systems (for example LXC and many others). Namespaces and similar kernel features are not tied to systemd.
Didn't say they were. But I don't have to use LXC or many others in order to use the most popular distros and applications.
>namespaces (which have been a frequent source of vulnerabilities)...
Unprivileged user namespaces sure, but I don't think that applies to namespaces in general (which without unprivileged user namespaces can only be created by root, and LPE is the concern with unprivileged userns due to increased attack surface). systemd doesn't need unprivileged userns to run.
yes(1) is the standard unix way of generating repeated data.
It's good to do this as quickly as possible.
I really don't understand why so many get annoyed with this code.
130 lines isn't that complicated in the scheme of things.
> Ah well. I'm still waiting for someone to basically fork systemd, remove all the fluff (udev, ntp, dns, timers, restart code, specialized logging, on and on and on)
Most of the things you named there are modular and can be easily disabled.
Furthermore, udev precedes systemd and systemd has in fact its own replacement for it (though the name escapes me).
Kind of a classic: people love harping on systemd without properly understanding it.
Systemd is actually pretty damn good and it's GPL licensed free software.
I understand that people don't like the way it seems to work itself into the rest of Linux user space as a dependency, but that's actually our own fault for not investing the manpower that Red Hat invests. We have better things to do than make our own Linux user space and so they have occupied that niche. It's free software though; we always have the freedom to do whatever we want.
By the way, all the stuff you mentioned is not really part of the actual init system, namely PID 1. There's an actual service manager for example and it's entirely separate from init. It manages services really well too, it's measurably better than all that "portable" nonsense just by virtue of using cgroups to manage processes which means it can actually supervise poorly written double forking daemons.
> By the way, all the stuff you mentioned is not really part of the actual init system, namely PID 1
Except it literally is. I once had a systemd system suddenly refuse to boot (kernel panic because PID1 crashed or so) after a Debian upgrade, which I was able to resolve by... wait for it... making /etc/localtime not be a symlink.
Why does a failure doing something with the timezone make you unable to boot your system? What is it even doing with the timezone? What is failing about it? Who knows, good luck strace'ing PID1!
Turns out you're right and my knowledge was outdated. I seriously believed the systemd service manager was separate from its PID 1 but at some point they even changed the manuals to say that's not supported.
I was also corrected further down in the thread, with citations from the maintainers even:
As it stands I really have no idea why the service manager has not been split off from PID 1. Maintainer said that PID 1 was "different" but didn't really elaborate. Can't find much reliable information about said differences either. Do you know?
I have no idea, lol. Maybe the signal handling behavior? You can't signal PID1 (unless the process has installed its own signal handler for that signal). Even SIGKILL won't usually work.
That's my entire problem with systemd though: despite the averred modularity, it combines far too many concerns for anyone to understand how or why it works the way it does.
Yeah, the signal handling thing is true: PID 1 is the only process that SIGKILL (and maybe even SIGSTOP) simply doesn't touch. The systemd manual documents its handling of a ton of signals but there's nothing in there about either of those otherwise unmaskable signals. So I don't really see how systemd is "relying" on anything. It's not handling SIGKILL, is it?
The other difference is PID 1 can't exit because Linux panics if it does. That's actually an argument for moving functionality out of PID 1.
There are other service managers out there which work outside PID 1. Systemd itself literally spawns non-PID 1 instances of itself to handle the user services. I suppose only the maintainers can tell us why they did it that way.
Maybe they are relying on the fact PID 1 traditionally reaps zombies even though Linux has a prctl for that:
What if the issue is just that nobody's bothered to write the code to move the zombie process reaping to a separate process yet? Would they accept patches in that case?
Ludicrously, that manual page straight up says systemd uses this system call to set itself up as the reaper of zombie processes:
> Some init(1) frameworks (e.g., systemd(1)) employ a subreaper process
If that's true then I really have no idea what the hell it is about PID 1 that they're relying on.
Edit: just checked the source code and it's actually true.
So they're not relying on the special signals handling and they even have special support for non-PID 1 child subreapers. Makes no sense to me. Why can't they just drop those PID == 1 checks and make a simpler PID 1 program that just spawns the real systemd service manager?
Edit: they already have a simple PID 1 in the code base!
> The other difference is PID 1 can't exit because Linux panics if it does. That's actually an argument for moving functionality out of PID 1.
I actually kinda think that can be an advantage for a service manager. If your service manager crashes an automatic reboot is nice, in a way. I doubt that's why they did it though.
> If your service manager crashes an automatic reboot is nice, in a way.
I don't think it's gonna do that! I saw it in the source code: when it's running as PID 1, systemd installs a crash handler that freezes itself in a desperate attempt to avoid the kernel panic! It's pretty amazing. They could have written it so that PID 1 watches over the service manager and just restarts it if it ever crashes. I mean, systemd already supports soft-rebooting the entire user space which is pretty much exactly what would happen if PID 1 restarted a separate service manager.
Know what else I found in the source code? Various references to /proc/1. I'm starting to think that's the true reason why they want to be PID 1...
People are complaining that it's too big, labyrinthine, and arcane to audit, not that it doesn't work. They would prefer other things that work, but don't share those characteristics.
Also, the more extensive the remit (of this init), the more complexly interconnected the interactions between the components; the fewer people understand the architecture, the fewer people understand the code, the fewer people read the code. This creates a situation where the codebase is getting larger and larger at a rate faster than the growth of the number of man-hours being put into reading it.
This has to make it easier for people who are systemd specialists to put in (intentionally or unintentionally) backdoors and exploitable bugs that will last for years.
People keep defending systemd by talking about its UI and its features, but that completely misses the point. If systemd were replaced by something comprehensible and less internally codependent, even if the systemd UI and features were preserved, most systemd complainers would be over the moon with happiness. Red Hat invests too much into completely replacing linux subsystems, they should take a break. Maybe fix the bugs in MATE.
A shell script with a few defined arguments is not a complexly interconnected set of components. It's literally the simplest, most core, least-strongly-dependent interconnection that exists in a *nix system.
Tell us you never bothered to understand how init worked before drawing a conclusion on it without telling us.
> Red Hat invests too much into completely replacing linux subsystems, they should take a break.
They should do whatever they feel is best for them, as should we. They're releasing free as in freedom GPL Linux software, high quality software at that. Thus I have no moral objections to their activities.
You have to realize that this is really a symptom of others not putting in the required time and effort to produce a better alternative. I know because I reinvent things regularly just because I enjoy it. People underestimate by many orders of magnitude the effort required to make something like this.
So I'm really thankful that I got systemd, despite many valid criticisms. It's a pretty good system, and it's not proprietary nonsense. I've learned to appreciate it.
Let’s not get started on how large the kernel is. Large code bases increase attack surface, period. The only sensible solution is to microservice out the pieces and only install the bare essentials. Why does an x86 server come with Bluetooth drivers baked in?
The kernel devs are wasting time writing one-offs for every vendor known to man, and it ships to desktops too.
Init is just a more or less normal program that Linux starts by default and by convention. You can make it boot straight into bash if you want. I created a little programming language with the ultimate goal of booting Linux directly into it and bringing up the entire system from inside it.
It's just a normal process really. Two special cases that I can think of: no default signal handling, and it can't ever exit. Init will not get interrupted by signals unless it explicitly configures the signal dispositions, even SIGKILL will not kill it. Linux will panic if PID 1 ever exits so it can't do that.
Traditionally, it's also the orphaned child process reaper. Process descriptors and their IDs hang around in memory until something calls wait on them. Parent processes are supposed to do that but if they don't it's up to init to do it. Well, that's the way it works traditionally on Unix. On Linux though that's customizable with prctl and PR_SET_CHILD_SUBREAPER so you actually can factor that out to a separate process. As far as I know, systemd does just that, making it more modular and straight up better than traditional Unix, simply because this separate process won't make Linux panic if it ever crashes.
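A minimal sketch of that in C (illustrative only; a real service manager obviously does far more than reap children):

    #include <stdio.h>
    #include <sys/prctl.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* Opt in to subreaper behaviour: orphaned descendants get
         * reparented to this process instead of to PID 1.
         * A real PID 1 gets this implicitly and can skip the call. */
        if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) < 0) {
            perror("prctl");
            return 1;
        }

        /* Launch one child so there is something to supervise. */
        pid_t child = fork();
        if (child == 0) {
            execl("/bin/sleep", "sleep", "5", (char *)NULL);
            _exit(127);
        }

        /* Reap anything that ends up parented to us. */
        for (;;) {
            int status;
            pid_t pid = waitpid(-1, &status, 0);
            if (pid < 0)
                break;                 /* ECHILD: no children left */
            fprintf(stderr, "reaped pid %ld\n", (long)pid);
        }
        return 0;
    }

If this crashes, only this process dies; a process doing the same work as PID 1 takes the kernel down with it.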
As for the service manager, this page explains process and service management extremely well:
Systemd does it right. It does everything that's described in there, does it correctly, uses powerful Linux features like cgroups for even better process management and also solves the double forking problem described in there. It's essentially a solved problem with systemd. Even the people who hate it love the unit files it uses and for good reason.
There is a secondary feature to run per-user managers, though I'm unsure whether it does or doesn't run without systemd as PID 1. It might only rely on logind.
Wow, I remember reading that PID != 1 line years ago. Had no idea they changed it. I stand corrected then. Given the existence of user service managers as well as flags like --system and --user, I inferred that they were all entirely separate processes.
Makes no sense to me why the service manager part would require running as PID 1. The maintainer just says this:
> PID 1 is very different from other processes, and we rely on that.
He doesn't really elaborate on the matter though.
Every time this topic comes up I end up searching for those so called PID 1 differences. I come up short every time aside from the two things I mentioned above. Is this information buried deep somewhere?
Just asked ChatGPT about PID 1 differences. It gave me the aforementioned two differences, completely dismissed Linux's prctl child subreaper feature "because PID 1 often assumes this role in practice", as well as some total bullshit about process group leaders and regular processes not being special enough to interact with the kernel, which is just absolute nonsense.
So I really have no idea what it is about PID 1 that systemd is supposedly relying on that makes it impossible to split off the service manager from it. Everything I have read up until now suggests that it is not required, especially on Linux where you have even more control and it's not like systemd is shy about using Linux exclusive features.
> One of the objections that many people do not understand is that systemd adds complexity. Unnecessary complexity. Boats full, loads full, mountains full of complexity.
Complexity that would otherwise be distributed to a sea of ad-hoc shell scripts? systemd is a win
We removed tens of thousands of lines of code for fixes for those "simple" init scripts when migrating to systemd.
They are never "simple"; there is always some fucking edge case. For example, we had Java apps writing their own PID file a few seconds after start, so anything that ran start and immediately checked status (like Pacemaker) threw errors.
Or how once in a blue moon MySQL didn't start after reboot because it so happened that:
* the PID file from the previous boot wasn't cleared
* some other process was running with the same PID as the one in the file
* the script did not care; it saw the PID file existing and didn't start MySQL.
> One of the objections that many people do not understand is that systemd adds complexity. Unnecessary complexity. Boats full, loads full, mountains full of complexity.
This is, and always has been, such a dumb take.
If you'd like to implement an init (and friends) system that doesn't have "unnecessary complexity" and still provides all the functionality that people currently want, then go and do so and show us? Otherwise it's just whinging about things not being like the terrible old days of init being a mass of buggy and racy shell scripts.
There were plenty of those that existed even before systemd. Systemd's adoption was not a result of providing the functionality that people want but rather was a result of providing functionality that a few important people wanted and promptly took hard dependencies on.
> about things not being like the terrible old days of init being a mass of buggy and racey shell scripts.
Zero of the major distros used System V init by default. Probably only distros like Slackware or Linux From Scratch even suggested it.
It's unfortunate that so many folks uncritically swallowed the Systemd Cabal's claims about how they were the first to do this, that, or the other.
(It's also darkly amusing to note that every service that has nontrivial pre-start or post-start configuration and/or verification requirements ends up using systemd to run at least one shell script... which is what would have often been inlined into their init script in other init systems.)
> Zero of the major distros used System V init by default. Probably only distros like Slackware or Linux From Scratch even suggested it.
I have absolutely no idea what you're trying to claim.
Are you suggesting that Debian's "sysvinit" package wasn't a System V init system? That the years I spent editing shell scripts in /etc/init.d/ wasn't System V init?
or are you making some pointless distinction about it not actually being pre-lawsuit AT&T files so it doesn't count or something?
or did you not use Linux before 2010?
if you have some important point to make, please make it more clearly.
> It's unfortunate that so many folks uncritically swallowed the Systemd Cabal's claims about how they were the first to do this, that, or the other.
I feel like you have very strong emotions about init systems that have nothing to do with the comment you're replying to.
I've been using Linux regularly since 2002. I've never regularly used a Linux that used sysvinit.
In other words, over the past ~22 years (goddamn, where did the time go?) every Linux I've regularly used has had an init system that allows you to specify service dependencies to determine their start order.
> ...Debian...
Ah. That explains it. Debian's fine to build on top of but a bad distro to actually use. (Unless you really like using five-to-ten (and in some cases 25-35) year old software that's been superseded by much-improved versions.)
You should also consider that packages named "sysvinit" sometimes aren't actually what people think of when they hear "sysvinit": <https://wiki.gentoo.org/wiki/Sysvinit>
What's the point of your implementation? systemd is totally modular; you can use just the init system without networkd, timesyncd, resolved, nspawn, or whatever else I forgot about.
If you want you can just use systemd as PID 1 for service management and enjoy a sane way to define and manage services, and still do everything else in archaic ways like 20 years ago.
* Choice. If I have a separate implementation, my users do not have to be subject to systemd's choices. And I do not either.
* The same implementation will have the same bugs, so in the same way that redundant software has multiple independent implementations, having an independent implementation will avoid the same bugs. It may have different bugs, sure, but my goal would be to test like SQLite and achieve DO-178C certification. Or as close as I could, anyway.
I'd assume chances of monetizing this are incredibly low. There already is an init system that understands systemd unit files; the name escapes me unfortunately. DO-178C might be a selling point, literally, but whether there are enough potential customers for ROI is questionable.
Well, some distros might force more components upon you, but that's hardly systemd's fault. Same if some software decides to make use of another component of systemd; then that's their choice, but also there are alternatives. The only thing that comes to mind right now would be something like GNOME, which requires logind, but all other "typical" software only wants systemd-the-init-system, if anything. You can run Debian just fine with just systemd as an init system and nothing else.
What about sponsors? Actually, now I have the idea of a platform similar to Kickstarter but for software development, and with just sponsors. It wouldn't work, sure... Except in some cases. Like when things like this happen...
Right, the systemd notification framework is very simple and I've used it in my projects. I didn't even know that libsystemd provided an implementation.
My Arch system was not vulnerable because openssh was not linked to xz.
IMO every single commit from JiaT75 should be reviewed and maybe even rolled back, as they have obliterated their trust.
If they hadn't been modifying SSH their users would never have been hit by this backdoor. Of course if it is actually intended to target SSH on Debian systems, the attacker would likely have picked a different dependency. But adding dependencies like Debian did here means that those dependencies aren't getting reviewed by the original authors. For security-critical software like OpenSSH such unaudited dependencies are prime targets for attacks like this.
My point was, this is not "Debian did a thing". Lots of other distros do the same thing. In this particular case, it was in fact fortunate for users of all these other distros that Debian did it, lest this vulnerability might have never been found!
Also, only users on sid (unstable) and maybe testing seem to have been affected. I doubt there are many Debian servers out there running sid.
I would phrase it as "It's good we have a heterogeneous open-source community".
Monocrops are more vulnerable to disease because the same (biological) exploit works on the entire population. In our Linux biosphere where there are dozens of major, varied configurations sharing parts but not all of their code (and hundreds or thousands of minor variations), a given exploit is likely to fail somewhere, and that failure is likely to create a bug that someone can notice.
It's not foolproof, but it helps keep the ecosystem healthy.
> The script was not present in the git tree, only in the released archives.
I confess I couldn't quite figure out the branching and tagging strategy on that repo. Very weird stuff. That script seems to have been added by Sebastian Andrzej Siewior just ahead of the 5.6.0 release. It's definitely present in the Debian git tree, and probably in many other distros since others seem to be affected.
> I confess I couldn't quite figure out the branching and tagging strategy on that repo.
It's just a regular Debian packaging repository, which includes imports of upstream tarballs - nothing out of the ordinary there. Debian packaging is based on tarballs, not on git repos (although in the absence of upstream tarballs, the Debian maintainer may create a tarball out of the VCS repo themselves).
The linked repo just happens to include some tags from the upstream repo, but those tags are irrelevant to the packaging. Only "debian/*" and "upstream/*" tags are relevant. Upstream VCS history is only imported for the convenience of the packager; it doesn't have to be there.
Debian's git repositories don't have any forced layout (they don't even have to exist or be up-to-date, the Debian Archive is the only source of truth - note how this repo doesn't contain the latest version of the package), but in practice most of them follow the conventions of DEP-14 implemented by gbp (in this particular case, it looks like `gbp import-orig --upstream-vcs-tag`: https://wiki.debian.org/PackagingWithGit#Upstream_import_met...).
Uh. systemd documents the protocol in various places and the protocol is trivial: a single text datagram sent to an AF_UNIX socket whose path you get via the NOTIFY_SOCKET environment variable. That's trivial to implement for anyone with some basic Unix programming knowledge. And I tell pretty much anyone who wants to listen that they should just implement the protocol on their own if that's the only reason for a libsystemd dep. In particular, non-C environments really should do their own native implementation and not bother wrapping libsystemd just for this.
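To make that concrete, here is a minimal sketch of such a native implementation in C (illustrative only, with abbreviated error handling; the function name is made up):

    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/un.h>
    #include <unistd.h>

    /* Send "READY=1" to the service manager, per the NOTIFY_SOCKET convention. */
    static int notify_ready(void)
    {
        const char *path = getenv("NOTIFY_SOCKET");
        if (!path || !*path)
            return 0;                        /* not running under a service manager */

        struct sockaddr_un sa = { .sun_family = AF_UNIX };
        size_t len = strlen(path);
        if (len >= sizeof(sa.sun_path))
            return -1;
        memcpy(sa.sun_path, path, len);
        if (sa.sun_path[0] == '@')           /* abstract-namespace socket */
            sa.sun_path[0] = '\0';

        int fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
        if (fd < 0)
            return -1;

        const char msg[] = "READY=1";
        ssize_t n = sendto(fd, msg, sizeof(msg) - 1, 0,
                           (struct sockaddr *)&sa,
                           offsetof(struct sockaddr_un, sun_path) + len);
        close(fd);
        return n < 0 ? -1 : 0;
    }

A daemon would call that once it has finished initializing; other state changes ("RELOADING=1", "STOPPING=1") are just different datagrams on the same socket.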
But let me stress two other things:
Libselinux pulls in liblzma too and gets linked into tons more programs than libsystemd. And it will end up in sshd too (at the very least via libpam/pam_selinux). And most of the really big distros tend to support SELinux at least to some level. Hence, systemd or not, sshd remains vulnerable to this specific attack.
With that in mind, libsystemd git has actually dropped the dep on liblzma; all compressors are now dlopen deps and thus only pulled in when needed.
> And I tell pretty much anyone who wants to listen that they should just implement the protocol on their own if that's the only reason for a libsystemd dep
Deferring the load of the library often just makes things harder to analyze, not necessarily more secure. I imagine many of the comments quoting `ldd` are wrongly forgetting about `dlopen`.
(I really wish there were a way to link such that the library isn't actually loaded but it still shows in the metadata, so you can get the performance benefits of doing less work but can still analyze the dependency DAG easily)
It would make things more secure in this specific backdooring case, since sshd only calls a single function of libsystemd (sd_notify) and that one would not trigger the dlopen of liblzma, hence the specific path chosen by the backdoor would not work (unless libselinux fucks it up after all, see other comments).
Dlopen has drawbacks but also major benefits. We decided the benefits relatively clearly outweigh the drawbacks, but of course people may disagree.
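Roughly, the dlopen-on-first-use pattern being discussed looks like this (an illustrative sketch, not the actual libsystemd code; error handling is minimal):

    #include <dlfcn.h>

    static void *lzma_handle;

    /* Resolve a symbol from liblzma only when a caller actually needs
     * compression; until then the library never enters the process. */
    static void *resolve_lzma_symbol(const char *name)
    {
        if (!lzma_handle) {
            lzma_handle = dlopen("liblzma.so.5", RTLD_NOW | RTLD_LOCAL);
            if (!lzma_handle)
                return NULL;             /* feature degrades gracefully */
        }
        return dlsym(lzma_handle, name);
    }

The trade-off mentioned above is exactly this: tools that only look at DT_NEEDED entries (ldd and friends) no longer see the dependency.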
I have proposed a mechanism before that would expose the list of libs we potentially load via dlopen in an ELF section or ELF note. This could be consumed by things such as package managers (for auto-dep generation) and ldd. However, there was no interest in getting this landed from anyone else, so I dropped it.
Note that there are various cases where people use dlopen not on hardcoded lib names, but dynamically configured ones, where this would not help. I.e. things like glibc nss or pam or anything else plugin based. But in particular pam kinda matters, since that tends to be loaded into almost any kind of security-relevant software, including sshd.
The plugin-based case can be covered by the notion of multiple "entry points": every library that is intended to be `dlopen`ed is tagged with the name of the interface it provides, and every library that does such `dlopen`ing mentions the names of such interfaces rather than the names of libraries directly. Of course your `ldd` tool has to scan all the libraries on the system to know what might be loaded, but `ldconfig` already does that for libraries not in a private directory.
This might sound like a lot of work for a package-manager-less language ecosystem at first, but if you consider "tag" as "exports symbol with name", it is in fact already how most C plugin systems work (a few use an incompatible per-library computed name though, or rely entirely on global constructors). So really only the loading programs need to be modified, just like the fixed-name `dlopen`.
> And I tell pretty much anyone who wants to listen that they should just implement the protocol on their own if that's the only reason for a libsystemd dep.
That's what I think too. Do the relevant docs point this out too? Ages ago they didn't. I think we should try to avoid a situation where people just google "implement systemd notify daemon" and end up on a page that says "link to libsystemd and call sd_notify()".
The correct thing to do would be to put different unrelated APIs into their own library, instead of everything into libsystemd0. This has always been one of my biggest issues with it. It makes it hard to replace just one API from that library, because on a binary distribution, only one package can provide it. And as a nice side effect, surprises like this one could then be avoided.
systemd developers have already rejected that approach, so I guess we will end up with lots of reimplementations, both in individual projects and third-party libsystemd-notify style libraries.
I see that different clients implemented in different languages will need different client libraries, and maintaining all of that is not something a core project is going to do. But if using the raw protocol instead of the convenience of libsystemd is a (commonly ignored) recommendation that makes a lot of sense in terms of segmentation, providing at least one reference implementation would point all systemd users in the right direction.
Recommending that each client should just implement the (trivial) protocol access itself does not make so much sense to me.
You also need to have sshd configured to use PAM, and your sshd PAM stack should include pam_selinux. Then it will be dynamically loaded only when sshd starts a PAM session.
The notify protocol isn't much more complicated than that. From memory you send a string to a unix socket. I have written both systemd notify and listenfd in a few languages for little experiments and it is hard to imagine how the protocols could be simpler.
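For comparison, the socket-activation ("listenfd") side is similarly small. A hedged sketch of the documented convention (fds handed over starting at fd 3, described by the LISTEN_PID and LISTEN_FDS environment variables); it skips LISTEN_FDNAMES and FD_CLOEXEC housekeeping:

    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define LISTEN_FDS_START 3

    /* Return how many listening fds were passed in (0 if none).
     * They occupy fds 3 .. 3 + n - 1. */
    static int inherited_listen_fds(void)
    {
        const char *pid_s = getenv("LISTEN_PID");
        const char *fds_s = getenv("LISTEN_FDS");
        if (!pid_s || !fds_s)
            return 0;
        if ((pid_t)strtol(pid_s, NULL, 10) != getpid())
            return 0;                    /* the fds were meant for another process */
        return (int)strtol(fds_s, NULL, 10);
    }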
Looking at most popular projects these days they are a mass of dependencies and I think very few of them can be properly audited and verified by the projects that use them. Rust and Go might be more memory safe than C but look at the number of cargo or go modules in most projects. I have mostly stopped using node/npm on my systems.
Not a programmer, but couldn't the distribution's sshd patches for systemd (and all other distro patches for privileged daemons) use static includes? Wouldn't that have only pulled in the simple client-side communication API? Would that have defeated this vector? Would it be doable?
It's unfortunate that the anti-systemd party lost the war... years ago. But I don't blame systemd, Lennart Poettering or the fanboys (though it would have been so much better if the guy had never worked in open source or wasn't such a prolific programmer). I blame Debian and its community for succumbing to this assault on Unix philosophy (again, years ago).
Sometimes things evolve in ways that make us feel a little obsolete.
I've been learning NixOS for a few years now, and it would have been impossible without systemd. It's one heck of a learning curve, but when you get to the other side, you know something of great power and value. Certain kinds of complexity adds 'land' (eg. systemd) that can become 'real estate' (eg. NixOS), which in turn hopes to become 'land' for the next innovation, and so forth.
Whether this happens or not (whether it's the right kind of complexity) is really hard to assess up-front, and probably impossible without knowing the complex new technology in question very well. (And by then you have the bias of depending, in part, yourself on the success of the new tech, as you've committed significant resources to mastering it, so good luck on convincing skeptical newcomers!)
It's almost like a sort of event horizon -- once you know a complex new technology well enough to see whether or not it's useful, the conflict-of-interest makes your opinion unreliable to outsiders!
Nevertheless, the assessment process itself, while difficult to get right, is worth getting better at.
It's easy for impatience and the sensation of what I've taken to calling 'daunt' -- that intrinsic recoil that the mind has from absorbing a large amount of information whose use case is not immediately relevant -- to dissuade one from exploring. But then, one never discovers new 'land', and one never builds new real estate!
[ Aside:
This is why I'm a little skeptical of the current rebellion against frontend frameworks. Certainly some of them, like tailwind, are clearly adding fetters to an otherwise powerful browser stack. But others, like Svelte, and to some extent, even React, bring significant benefits.
The rebellion has this vibe like, well, users _should_ prefer more simply-built interfaces, and if they don't, well, they just have bad taste. What would be more humble would be to let the marketplace (e.g. consumers) decide what is preferable, and then build that.
]
What? I don't get it? Isn't it on Debian if they modified the package to do something like this? Why would you blame systemd for maintainers doing something that upstream has never required or recommended?
xz is so pervasive, I just discovered on my Mac that the (affected?) version 5.6.1 made it into homebrew. The post in the linked article says that only Linux x86-64 systems are affected, but now I'm left scratching my head whether my Mac is also in trouble, just that we don't know it yet.
The contents of the page when translated seems to be about jiat0218 auctioning a pair of spiritual chopsticks as a prank.
The blog entry is basically a Q&A between jiat0218 and various other people about these chopsticks.
If Jia Tan does turn out to be a compromised maintainer working for a state actor then some of the content on the blog page can be viewed in a more sinister way (i.e. spycraft / hacks for sale etc.).
Example question 38:
Question 38
accounta066 (3): Are these chopsticks really that good? I kind of want to buy
them! But I recently sent money for online shopping but didn’t receive anything.
It’s very risky; currently jiat0218 you don’t have any reviews, you can
interview me. Do you want to hand it over?! … A sincere buyer will keep it.
Reply to
jiat0218 (4): First of all, I would like to express my condolences to you for
your unfortunate experience! What can I say about this kind of thing...My little
sister has always been trustworthy. What’s more, this is a pair of spiritual
chopsticks, so I hope to have a good one. It’s the beginning! As you can see,
my little sister is very careful and takes her time when answering your
questions. Except for the two messages that were accidentally deleted by her,
she always answers your questions. If this still doesn’t reassure you, then I
can only say that I still have room to work hard. You are still welcome
to bid... ^_^
Note however, it could all just be what it purports to be which is a prank auction of spiritual chopsticks.
This is likely just a coincidence. 0218 looks like a birthday and jiat is probably the name + initial. 18 years is also too long of a time horizon for this.
Crazy to think that the time horizon for these kinds of attacks span decades. This absolutely does not read like a coincidence. Chopsticks, little sister, "room to work hard", all sound like codewords.
Something about this I found surprising is that Linux distros are pulling and packaging pre-built binaries from upstream projects. I'd have expected them to build from source.
# Remember: we cannot leverage autotools in this ebuild in order
# to avoid circular deps with autotools
Namely, to unpack autoconf-2.72e.tar.xz from gnu.org you need the xz tools. And this is just the shortest cycle. It is not very common, but xz-utils was one of the few rare cases where regeneration of the autohell files was considered an unnecessary complication (it backfired).
Unfortunately, those GitHub links are no longer valid, so we randos can't use them to learn what went wrong here. Hopefully GH will reverse this decision once the dust settles.
GitHub should not just reverse and make repo public and archived "as is", because there are many rolling distributions (from Gentoo to LFS), submodule pullers, CI systems, unaware users, which may pull and install the latest backdoored commit of archived project.
However, if you want to access exact copies of the backdoored tarballs, they are still available on every mirror, e.g. in http://gentoo.mirror.root.lu/distfiles/9f/ . For a project of this level, artifacts are checksummed and mirrored across the world by many people, and there's nothing wrong with that.
The gist of it is: the "good" one is the auto-generated "Source code" archive that GitHub makes for releases. The "bad" one is a manually generated and uploaded source code release, which can have whatever you want in it.
Sorry, by "they" I mean "Debian and Fedora", which (when including derivatives) include most Linux systems which use a Linux distro in the standard sense.
Just because macs don't use systemd, doesn't mean the backdoor won't work. The oss-sec post talks about liblzma having backdoors in crc32_resolve() and crc64_resolve() and that it has not been fully reversed. This could perhaps affect more than just sshd on x86-64 linux?
> Just because macs don't use systemd, doesn't mean the backdoor won't work.
Practically speaking it can't - for one, the script injected into the build process tests that you're running on x86-64 Linux; for another, the injected code is ELF code, which wouldn't link on a Mac. It also needs to manipulate dynamic linker data structures, which would also not work the same on a Mac.
> This could perhaps affect more than just sshd on x86-64 linux?
This however is true - /usr/sbin/sshd was the only argv[0] value that I found to "work", but it's possible there are others. "/usr/sbin/sshd" isn't a string directly visible in the injected code, so it's hard to tell.
The article explains numerous concurrent conditions that have to be met for the backdoor to even be activated (at build time, not runtime), which combined make it extremely unlikely this will affect SSH on macOS:
- linux
- x86-64
- building with gcc & the GNU linker
- part of a .deb or .rpm build
Add to that, as the article explains: openssh does not directly use liblzma. The only reason SSH is affected at all is because some Linux distros patch openssh to link it against libsystemd, which does depend on liblzma.
Could it affect things other than SSH on a Mac? Unlikely. The compromise was introduced in 5.6.0, but macOS Sonoma has 5.4.4 (from August last year).
Well isn't this an interesting commit. He finished his inject macro to compose the payload at build, so now he can start clearing up the repo so none of that shit gets seen when cruising through it.
Makes it a bit harder for sure. What actually happens if you git add something that's ignored? I assumed it would still let you do it, but never tried.
Everybody here is jumping on the pure-malice bandwagon; I have a better hypothesis.
Abandonment and inaction: the actual developers of these tools are elsewhere, oblivious to this drama, trying to make a living, because most of the time you are not compensated, nor does any corporation care about making things sustainable at all. This is the default status of everything your fancy cloud depends on underneath.
An attacker took over the project slowly and stayed dormant until recently.
Someone has worked on xz for several years. Are you saying that this somewhat active contributor was likely actively contributing, then all of a sudden stopped, also stopped paying attention, and also allowed their account to be compromised or otherwise handed it over to a nefarious party?
See, people drop out of OSS projects pretty frequently, usually because they take on other life responsibilities and there is no cushion or guard against the bus factor.
Then it is very easy to get credentials compromised or have your project taken over by someone else.
Well, yeah. The attacker, operating largely under the name Jia Tan, successfully manipulated the original author (Lasse Collin) into making him a maintainer.
The attacker indeed lay dormant for two years, pretending to just be maintaining xz.
I really don't see any way how this wasn't malice on Jia's part. But I do think your hypothesis applies to Lasse, who was just happy someone could help him maintain xz.
The funding model of OSS work is obviously a problem, but these problems are deeper than that. Even a very well-compensated OSS developer can get a knock on the door from a government agency (or anyone with a "$5 wrench")[1] and they might feel "compelled" to give up their maintainer creds.
Probably depends on the criminal code of a country. Mine does (EU country):
> Section 231 Obtaining and Possession of Access Device and Computer System Passwords and other such Data
> (1) Whoever with the intent to commit a criminal offence of Breach of secrecy of correspondence [...] or a criminal offence of Unauthorised access to computer systems and information media [...] produces, puts into circulation, imports, exports, transits, offers, provides, sells, or otherwise makes available, obtains for him/herself or for another, or handles
> a) a device or its component, process, instrument or any other means, including a computer programme designed or adapted for unauthorised access to electronic communications networks, computer system or a part thereof, or
> b) a computer password, access code, data, process or any other similar means by which it is possible to gain access to a computer system or a part thereof,
> shall be sentenced ... (1 year as an individual, 3 years as a member of an organized group)
No obvious need to reinstall if you didn't use ssh and expose it publicly and are not a politically important person. All signs suggest that it was a nation state attack, and you are likely not a target.
We'll see... given that sshd is just one of many possible argv[0] values it may have chosen to act on, I'm going to be a little paranoid until it's been fully analyzed. It just takes half an hour to reinstall, and I have some shows to catch up on anyway :)
The `pack`[0] compression utility that reached the HN front page the other day[1] is setting off my alarm bells right now. (It was at the time too, but now doubly so)
It's written in Pascal, and the only (semi-)documented way to build it yourself is to use a graphical IDE, and pull in pre-compiled library binaries (stored in the git repo of a dependency which afaict Pack is the only dependent of - appears to be maintained by the same pseudonymous author but from a different account).
I've opened an issue[2] outlining my concerns. I'm certainly not accusing them of having backdoored binaries, but if I was setting up a project to be deliberately backdoorable, it'd look a lot like this.
I'm not trying to troll, but I'm wondering if a distro like Gentoo is less susceptible to such attacks, since the source code feels more transparent with their approach. But then again, it seems that upstream was infected in this case, so I'm not sure if a culture of compiling from source locally would help.
It is not going to make a difference. If you run malicious code, you will get hacked. Compiling the code yourself does not prevent the code from being malicious.
The one way it might help is that it might make it easier to find the backdoor once you know there is one.
I am not embarrassed to say... is there anything in there that someone who runs a server with ssh needs to know?
I literally can't make heads or tails of the risk here. All I see is the very alarming and scary words "backdoor" and "ssh server" in the same sentence.
If I am keeping stuff up to date, is there anything at all to worry about?
You should probably not be running your own publicly-accessible ssh servers if this email is not sufficient to at least start figuring out what your next actions are.
The email itself comes with an evaluation script to figure out if anything is currently vulnerable to specifically this discovery. For affected distributions, openssh servers may have been backdoored for at least the past month.
Yet here I am, getting up every morning and getting dressed and tying my shoes all by myself, and then maintaining a small number of servers that have openssh on them!
Thanks, though, for pointing out the little script at the very end of that technical gauntlet of an email intended for specialists. I had gotten through the first 3 or 4 paragraphs and had given up.
What I should have done is just googled CVE-2024-3094, whatever, still glad I asked.
> You should probably not be running your own publicly-accessible ssh servers if this email is not sufficient to at least start figuring out what your next actions are.
Not at all. For instance, I don't know what the next steps are, but I run SSH servers behind Wireguard, exactly to prevent them being accessible in the case of such events. Wireguard is simple to setup, even if I lack the expertise to understand exactly how to go forward.
> I literally can't make heads or tails of the risk here. All I see is the very alarming and scary words "backdoor" and "ssh server" in the same sentence.
From what I've read, there are still lots of unknowns about the scope of the problem. What has been uncovered so far indicates it involves bypassing authentication in SSH.
> If this payload is loaded in openssh sshd, the RSA_public_decrypt function will be redirected into a malicious implementation. We have observed that this malicious implementation can be used to bypass authentication. Further research is being done to explain why.
Thus, an attacker maybe could use this to connect to vulnerable servers without needing to authenticate at all.
Is it time to deprecate the ability for code to implement linker symbols in other libraries? Shouldn't there be a strict namespace separation between binaries/libraries? liblzma being able to implement openssh symbols seems like a symptom of a much larger problem.
Interesting, I used https://ossinsight.io/analyze/JiaT75 to identify contributions from the account used by author of the backdoor. It looks like the account made other potentially problematic contributions to other projects.
The disabling of ifunc in this PR against Google's oss-fuzz project may be one way they tried to prevent this particular backdoor being flagged by that tool? https://github.com/google/oss-fuzz/pull/10667
ifunc is a GNU method of interposing function calls with platform-optimized versions of the function. It is used to detect CPU features at runtime and insert, for example, AVX2-optimized versions of memcmp. It is seen in crypto a lot, because CPUs have many crypto-specific instructions.
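For those who haven't seen one, a stripped-down sketch of the mechanism (GCC on x86-64/ELF; the function names here are made up for illustration, though the thread above notes liblzma's real resolvers are crc32_resolve()/crc64_resolve()):

    #include <stddef.h>

    static int crc_generic(const void *buf, size_t len) { (void)buf; (void)len; return 0; } /* portable fallback    */
    static int crc_clmul(const void *buf, size_t len)   { (void)buf; (void)len; return 0; } /* CPU-accelerated path */

    /* The resolver is called by the dynamic loader at load time, not by
     * user code, and picks one implementation based on CPU features. */
    static int (*resolve_crc(void))(const void *, size_t)
    {
        __builtin_cpu_init();
        return __builtin_cpu_supports("pclmul") ? crc_clmul : crc_generic;
    }

    /* Callers just call crc(); which body runs was decided at load time. */
    int crc(const void *buf, size_t len) __attribute__((ifunc("resolve_crc")));

The relevance to the backdoor is that resolvers run very early during dynamic linking, before the program's own code, which makes them a convenient hook point.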
However, I don't like it much and I think software should be compiled for the target machine in the first place. My one hardened system that is reachable from the public network is based on musl, built mostly with LLVM, and with ifunc disabled.
> However, I don't like it much and I think software should be compiled for the target machine in the first place.
That means you either have to compile software locally on each machine, or you have a combinatorial explosion of possible features.
Compiling locally has several drawbacks. It needs the full compilation environment installed on every machine, which uses a lot of disk space, and some security people dislike it (because then attackers can also compile software locally on that machine); compiling needs a lot of memory and disk space, and uses a lot of processor time and electric power. It also means that signature schemes which only allow signed code cannot be used (or you need to have the signing key available on the target machine, making it somewhat pointless).
The combinatorial explosion of features has been somewhat tamed lately, by bundling sets of feature into feature levels (x86_64-v1, etc), but that still quadruples the amount of compiled code to be distributed, and newer features still have to be selected at runtime.
Compiled _on_ and compiled _for_ are not the same. There must be a way to go to the target machine, get some complete dump of CPU features, copy that to the compile-box, do the build, and copy the resulting binaries back.
I don't think you can really say it is "combinatorial" because there's not a mainstream machine with AES-NI but not, say, SSSE3. In any case if there were such a machine you don't need to support it. The one guy with that box can do scratch builds.
Obviously compiling for the target architecture is best, but for most software (things like crypto libraries excluded) 95% of the benefit of AVX2 is going to come from things like vectorized memcpy/memcmp. Building glibc using ifuncs to provide optimized implementations of these routines gives most users most of the benefit of AVX2 (or whatever other ISA extension) while still distributing binaries that work on older CPU microarchitectures.
ifunc memcpy also makes short copies suck ass on those platforms, since the dispatch cost dominates regardless of the vectorization. It's an open question whether ifunc helps or harms the performance of general use cases.
By "open question" I meant that there is compelling research indicating that GNU memcpy/memcmp is counterproductive, but the general Linux-using public did not get the memo.
On the other hand, it also means that your distro can supply a microarchitecture-specific libc and every program automatically gets the memcpy improvements. (Well, except for the golang/rust people.)
Wasn't this the point of Gentoo, back in the day? It was more about instruction scheduling and register allocation differences, but your system would be built with everything optimized for your uarch.
Too inflexibly ideological. There are infinite things that most properly belong in a release file and not in the source, that can't be generated from that source by GitHub Actions, and separately no one should be compelled to use GitHub Actions.
Because then for autoconf codebases you have to commit `./configure` or you have to require that users have autoconf installed and run `autoreconf -fi` first.
Maybe autoconf-using projects should really just require that users have autoconf installed.
If committing configure is objectionable, perhaps there could be "service" repositories that are not directly writable and are guaranteed to be nothing more than the base repo + autoconf cruft used to generate the releases.
Well, for example in jq we do commit bison/flex outputs because for users ensuring that they have the right version of those can be tricky. We could do the same w.r.t. autoconf and its outputs, though again, that won't preclude backdoors.
Committing built artifacts presents similar problems: how do you know that the committed artifacts are in fact derived from their sources? Or from non-backdoored versions of build tools for that matter? Hello Ken Thompson attacks.
I don't believe there's a nice easy answer to these questions.
What we do in jq is rely on GitHub Actions to run the build and `make dist`. In fact, we could now stop committing the bison/flex outputs, too, since we can make sure that the tarball includes them.
We do also publish the git repo snapshots that GitHub auto-generates for releases, though we do that because GitHub doesn't give one a choice.
Thinking about this more: maybe there would be some benefit to GitHub taking control of "release" repositories that may only be written to by GitHub Actions. They'd write everything -- maybe as a docker image -- so anyone could pull down the image and compare shas, or whatever. And maybe this could also be done by their competitors. The ultimate goal would be to have multiple trusted parties performing the build on the same code producing the same output, and allowing any randos to do the same.
If the source is included in those images, we could conceivably prove that the target was based on the source.
Really disappointed in the number of posters here who are playing this down, cautioning against rushing to judgement and suggesting perhaps a legitimate developer was compromised, when it's very clear this is sophisticated and not the work of a single person.
I'm recalling bad memories of the Juniper backdoor years ago.
Whoever did this, was playing the long game. As the top post pointed out, there was an effort to get this into Fedora.... which eventually makes its way into RHEL (read: high value targets). This was not for short term payoffs by some rogue developer trying to mine crypto or other such nonsense. What you are seeing here is the planting of seeds for something months or a year down the road.
So, it's been almost 24 hours since I read this yesterday. Is it confirmed that Jia Tan is the perpetrator? Do we know who he/she really is? Or are we going to live for the rest of our lives only knowing the pseudonym? Just like Satoshi Nakamoto did to us. ;)
Python for Windows bundles liblzma from this project, but it appears to be version 5.2.5 [0] vendored into the Python project's repo on 2022-04-18 [1], so that should be fine, right?
Which nation state (if any) is most likely behind this? China based on name, or is this a red herring?
The perpetrator did most of their GitHub activity between 10:00 and 18:00 UTC, which sort of rules out someone US-based, unless the messages were scheduled. It is consistent with Europe through Asia.
It's something always in the back of our minds as developers using public libraries, but when something like this happens, non-developers that hear about it start to associate it with the rest of the open-source community.
It's essentially a terrorist attack on developer experience. Thankfully, management doesn't follow the same approach as the TSA.
Is there any news concerning the payload analysis? Just curious to see if it can be correlated with something I have in my sshd logs (e.g. login attempt with specific RSA keys).
Now consider that your average Linux distribution pulls in tens of thousands of packages, each of which can be similarly compromised. Pretty scary to think about.
The terrible desktop software security model of weak/essentially non-existent security boundaries at run and compile time makes this all the more spicy.
Computer security for billions runs on the simultaneous goodwill of many thousand contributors. Optimistically said it's actually a giant compliment to the programming community.
And this is not even talking about hardware backdoors that are a million times worse and basically undetectable when done well. The myriad ways to betray user trust at any level of computation make me dizzy...
Also, the attacker included in the 5.6.0 release support for the long-awaited multi-threaded decompression (and a broken sandbox), making it very attractive to upgrade to...
It was probably a tactic to give people a reason to upgrade. It's not necessarily the fault of those who did upgrade, or tried to.
Anyone keeping current with OpenSUSE Tumbleweed got an update... downgrade. Prior to `zypper dup --no-allow-vendor-change` I had 5.6.0; now I'm at 5.4.6.
Everything on HN which is not engaged in licking Big Tech balls (open source or not) gets served with severe downvoting most of the time (probably by real trash human beings, or AI trolls with headless blink/gecko/webkit).
State actor or not, let's not ignore that the backdoor was discovered thanks to the open nature of the projects involved, which allowed digging into the code. Just another example like the infamous Borland InterBase backdoor in the early 2000s that remained dormant for years and was discovered months after the source code had been released.
If the xz malware authors worked for any corp that produced closed source drivers or blobs that can't be properly audited, we would be fucked; I just hope this is not already happening, because the attack surface in all those devices and appliances out there running closed code is huge.
Why are projects like xz and sshd still active? Just freeze it, it works fine. Only changes should be fixes for vulnerabilities. None of this complicated new functionality. If you want something like that make a new project. If it is truly better people will use it.
Yes, Arch Linux’s OpenSSH binary doesn’t even link to liblzma, which means your installation is not affected by this particular backdoor.
The authors of the `detect_sh` script didn’t have that scenario in mind, so the `ldd` invocation never finds a link and the script bails early without a message.
Interestingly, one of the accounts that the GitHub account who introduced the backdoor follows was suspended very recently [1]; that account is also part of the org that runs XZ.
The JiaT75 account is also suspended; if you check https://github.com/Larhzu?tab=following you'll see that they're suspended as well. It's pretty weird that it's that hard to find out whether a user is suspended.
It seems that to counter this type of supply chain attack, the best practices for managing software dependencies are to pin the version numbers of dependencies instead of using `latest`, and to use static linking instead of dynamic linking.
For the future: would it be feasible to implement an automated `diff` or other A/B check to verify that a release tarball matches the source repo, auto-flagging it with a mismatch warning if not?
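A crude version of that check can be scripted today; a minimal sketch, with placeholder URLs and tag names rather than the real compromised artifacts:

$ curl -LO https://example.org/releases/foo-1.2.3.tar.gz   # release tarball (hypothetical URL)
$ tar xf foo-1.2.3.tar.gz
$ git clone --depth 1 --branch v1.2.3 https://example.org/foo.git foo-git
$ diff -r --exclude=.git foo-1.2.3 foo-git

The catch, and the reason such a check wouldn't flag problems loudly on its own, is that autotools-based tarballs legitimately ship generated files (configure, m4 macros) that aren't in git, so the diff needs an allowlist or a human to separate expected noise from something smuggled in, like the modified build-to-host.m4 in this case.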
that's... creative. and patient. 11/10 concerning - now I'm wondering how many other projects could have shit like this in them or added right as I'm writing this shudder
Brain fart: would it be possible to attach passwords to a crypto based micro transaction such that every time you attempted a password entry your crypto account was charged a small fee for the login attempt?
This would thwart brute force attacks, but not be a significant cost for users. If you could attach your login to the crypto account it would mean the account would have to be funded to allow the attempt. The token wouldn't store passwords it would just be a gatekeeper to the login attempt.
The fees would be paid to the service providers as mining fees.
E.g. foo@bar.com needs a password and a token provided from a designated crypto address to gain access to the service.
xz is just a horribly designed format, and always has been. If you use it, please switch to Lzip. Same compression level, but designed by someone competent.
Someone competent? More like a drama queen butthurt that his pet project did not win the popularity contest. Not the kind of person I want to rely on for important tools.
Seems the backdoor relied on Debian and others patching their copies of openssh to support systemd notifications, and this would obviously not be the case on OpenBSD.
Build from source AND run an AI agent that reviews every single line of code you compile (while hoping that any potential exploit doesn't also fool / exploit your AI agent).
You’re not wrong. However, building from source wouldn’t have protected you against this specific backdoor. The upstream source tarball itself was compromised in a cleverly sneaky way.
"However, building from source wouldn’t have protected you against this specific backdoor."
Depends on how exactly you build from source. A generic build was not the target. Andres Freund showed that the attack was targeted against a specific type of build system.
Maybe it's finally time to start sunsetting LZMA and xz altogether in favor of newer algorithms like Zstandard, which offer better performance with compression ratios on par with LZMA.
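For most archive workloads the switch is little more than a flag change; a rough comparison, with a placeholder file name:

$ xz -9 -T0 -k data.tar    # highest xz preset, all cores, keep the input
$ zstd -19 data.tar        # near-xz ratios with much faster decompression

Ratios and speed depend heavily on the data, so it's worth benchmarking your own corpus before migrating anything.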
Please note: the changes were made after GitHub enforced 2FA (certainly not for "better security", but to promote FIDO2 and the Windows Hello biometric implementation of FIDO2; see https://codeberg.org/KOLANICH/Fuck-GuanTEEnomo for more info). Until recently it was even possible to push into any repo one has access to using just a single-factor SSH key, without enabling 2FA in the account (for now, access via the git protocol is blocked for my account, I guess based on the lack of a 2FA setup). As I have warned, nothing will protect you when a backdoor is introduced by a malicious maintainer, or a "smart entrepreneur" who sold his project to an ad company, or a loyal "patriot" living and earning money within reach of some state, or just a powerless man who got an offer he can't refuse. In general, supply chain attacks by "legitimate" maintainers cannot be prevented. "Jia Tan" is just a sockpuppet to mitigate consequences for the maintainers, to make it look like they are not involved. They surely are: at least according to the current info, it was they who gave the malicious account permission to publish releases on behalf of the project and access to the repo.
IMHO all maintainers of the backdoored projects who were in any way involved in accepting the malicious changes should be considered accomplices and boycotted. We don't need evidence of their liability; it is they who need to maintain their reputation, and we are free to make our decisions based on it. Even if they were hacked themselves, that is not our problem, it is theirs. Our problem is to keep ourselves safe. It may feel "unjust" to ruin a person's reputation over the fact that he may have been cheated or hacked... but if a person can be cheated or hacked, why should he have as good a reputation as everyone else?! So it makes a lot of sense to exclude and replace everyone for whom there exists evidence of compromise, whether due to negligence or malice. But FOSS is a do-ocracy serving products at dumpling prices ($0, free of charge), and for the majority, backdoored software is completely acceptable given that they get it free of charge. And powerful actors who can afford to pay for software will just hire devs to develop their private versions, while letting the public pay $0 for the free versions and using the backdoors placed into them themselves. In other words, a complete market failure.
I think that
1. The xz project must be shut down completely. I mean projects should stop using it as a dependency, distros should exclude it, and it should be boycotted. The LZMA algorithm was developed by Igor Pavlov in the 7z project, but somehow it happened that liblzma was developed and maintained by unrelated folks. liblzma should be developed as part of the 7z project, taking no code from xz other than a trivial API-compatibility adapter.
2. Projects created by compromised authors should be boycotted.
3. Other projects touched by the compromised devs/maintainers should be audited.
4. All projects using autotools should be audited and should replace autotools with CMake/Meson. Autotools is a piece of shit, completely incomprehensible. It is no surprise it was used to hide a backdoor - in my experience in FOSS, no one likes to touch its scripts at all.
5. No project should be built from release tarballs; projects should be built directly from git. Implementing full SHA-256 support in git and in git forges (GitHub, GitLab, Codeberg, sr.ht) should be accelerated to mitigate attacks that use collisions to replace approved commits (I guess the randomness can be concealed from a reviewer's eye in binary resource files, like pictures).
TLDR: Some people have been throwing around "China," but it also seems quite possible that Jia is from somewhere in Eastern Europe pretending to be from China. In addition, Lasse Collin and Hans Jansen are in the same EET time zone.
These are my notes on time stamps/zones. There are a few interesting bits that I haven't fully fleshed out.
Here is the data on Jia’s time zone and the number of times he was recorded in that time zone:
3: +0200 (in winter: February and November)
6: +0300 (in summer: June, July, early October)
440: +0800
1. The +0800 is most likely CST, i.e. China (or Indonesia or the Philippines), given that Australia does daylight saving time and almost no one lives in Siberia or the Gobi desert.
2. The +0200/+0300, if we assume this is one location, is likely EET (Finland, Estonia, Latvia, Lithuania, Ukraine, Moldova, Romania, Bulgaria, Greece, Turkey). This is because we see a switch to +0200 in the winter (past the last Sunday of October) and to +0300 in the summer (past the last Sunday of March).
Incidentally, this seems to be the same time zone as Lasse Collin and Hans Jansen…
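For anyone who wants to reproduce this tally from a clone of the repo, the offsets can be pulled straight out of the commit metadata; a minimal sketch, using the author name as it appears in the xz history:

$ git log --author="Jia Tan" --format='%ai' | awk '{print $3}' | sort | uniq -c | sort -rn

The same counts can be cross-checked against committer dates with %ci, since the two can differ.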
Observation 2: Time zone inconsistencies
Let's analyze the few times Jia was recorded in a non-+0800 time zone. There are situations where Jia switches between +0800 and +0300/+0200 within an implausibly short span of time, indicating that perhaps he is not actually in +0800 CST, as his profile would like us to believe.
Jia Tan Tue, 27 Jun 2023 23:38:32 +0800 —> 15:38 UTC
Jia Tan Tue, 27 Jun 2023 17:27:09 +0300 —> 14:27 UTC
—> barely more than an hour apart in real time, yet the two offsets imply locations that are at minimum a 10-hour flight apart
Jia Tan Thu, 5 May 2022 20:53:42 +0800
Jia Tan Sat, 19 Nov 2022 23:18:04 +0800
Jia Tan Mon, 7 Nov 2022 16:24:14 +0200
Jia Tan Sun, 23 Oct 2022 21:01:08 +0800
Jia Tan Thu, 6 Oct 2022 21:53:09 +0300 —> 18:53 UTC
Jia Tan Thu, 6 Oct 2022 17:00:38 +0800 —> 09:00 UTC
Jia Tan Wed, 5 Oct 2022 23:54:12 +0800
Jia Tan Wed, 5 Oct 2022 20:57:16 +0800
—> again, under 10 hours between commits stamped from zones that are at least a 10-hour flight apart
Jia Tan Fri, 2 Sep 2022 20:18:55 +0800
Jia Tan Thu, 8 Sep 2022 15:07:00 +0300
Jia Tan Mon, 25 Jul 2022 18:30:05 +0300
Jia Tan Mon, 25 Jul 2022 18:20:01 +0300
Jia Tan Fri, 1 Jul 2022 21:19:26 +0800
Jia Tan Thu, 16 Jun 2022 17:32:19 +0300
Jia Tan Mon, 13 Jun 2022 20:27:03 +0800
—> the ordering of these time stamps, and the switching back and forth looks strange.
Jia Tan Thu, 15 Feb 2024 22:26:43 +0800
Jia Tan Thu, 15 Feb 2024 01:53:40 +0800
Jia Tan Mon, 12 Feb 2024 17:09:10 +0200
Jia Tan Mon, 12 Feb 2024 17:09:10 +0200
Jia Tan Tue, 13 Feb 2024 22:38:58 +0800
—> this travel time is possible, but the duration of stay is unlikely
Observation 3: Strange record of time stamps
It seems that from the commits, often the time stamps are out of order. I am not sure what would cause this other than some tampering.
Observation 4: Bank holiday inconsistencies
We notice that Jia's work schedule and holidays seem to align much better with an Eastern European than with a Chinese person.
Chinese bank holidays (just looking at 2023):
- Working on 2023, 29 September: Mid Autumn Festival
- Working on 2023, 05 April: Tomb Sweeping Day
- Working on 2023, 26, 22, 23, 24, 26, 27 Jan: Lunar New Year
Eastern European holidays:
- Never working on Dec 25: Christmas (for many EET countries)
- Never working Dec 31 or Jan 1: New Year's
Observation 5: No weekend work —> salary job?
The most common working days for Jia were Tue (86), Wed (85), Thu (89), and Fri (79). If we adjust his time zone to EET, that means he is usually working 9 am to 6 pm, which makes much more sense than someone working at midnight and 1 am on a Tuesday night.
These times also line up well with Hans Jansen and Lasse Collin.
I think it is more likely that Jia does this as part of his work… somewhere in Eastern Europe. Likely working with, or in fact being one and the same as, Hans Jansen and Lasse Collin.
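The weekday counts above can be reproduced the same way; a minimal sketch, relying on git's default date format, whose first field is the weekday in the author's own recorded timezone:

$ git log --author="Jia Tan" --format='%ad' | awk '{print $1}' | sort | uniq -c | sort -rn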
You say yourself that the time data could be tampered. It's trivial to change commit dates in git. So this analysis means nothing by itself, unfortunately.
I wouldn't say that. This guy seems to have tried hard to appear Chinese (and possibly tampered with the time stamps to that end) - but based on that analysis, it seems plausible they did a bad job and were actually based out of Eastern Europe.
I asked ChatGPT 4 about Jia's GitHub avatar image.
The time zones ChatGPT thinks the avatar comes from align with +2 and +3; see below how it ranked them and, at the end, its description of Jia's avatar:
---
Rank, Score, Country, City, Timezone, Criteria
1, 10, Saudi Arabia, Mecca, AST (UTC+3), Heartland of Islam, deeply rooted calligraphic traditions.
2, 9.5, Iran, Tehran, IRST (UTC+3:30), Integral Persian calligraphy with a distinct style and history.
3, 9, Turkey, Istanbul, TRT (UTC+3), Historical significance of Ottoman calligraphy, actively preserved.
4, 8.5, Egypt, Cairo, EET (UTC+2), Home to Al-Azhar University, with calligraphy in the curriculum.
5, 8, Morocco, Marrakech, WET (UTC+0), Calligraphy integrated into architecture and crafts.
6, 7.5, United Arab Emirates, Abu Dhabi, GST (UTC+4), Promotes Islamic arts through festivals and museums.
7, 7, Syria, Damascus, EET (UTC+2), Historical center of Arabic calligraphy, despite recent conflicts.
8, 6.5, Pakistan, Islamabad, PKT (UTC+5), Rich tradition, hosts several institutions and events dedicated to calligraphy.
9, 6, Indonesia, Jakarta, WIB (UTC+7), Largest Muslim-majority country with calligraphy in art and monuments.
10, 5.5, Spain, Cordoba, CET (UTC+1), Legacy of Islamic culture and appreciation for calligraphy, particularly in Andalusia.
--
GPT4: This image appears to be a stylized representation of the letter 'J' within an intricate border, possibly inspired by the art style of Islamic calligraphy. The ornate background is typical of arabesque patterns, which are characteristic of Islamic art and consist of repeating geometric forms that often echo the shapes of plants, flowers, and sometimes calligraphic writing. The letter 'J' stands out in a vibrant yellow, contrasting with the dark green of the surrounding design.
Interesting :). However, I think that EET is the only time zone that works. (This is mostly because it seems that the area follows DST, which most non-Western countries in the world do not.)
This 2011 addition to the XZ Utils Wikipedia page is interesting because a) why is this relevant, b) who is Mike Kezner since he's not mentioned on the Tukaani project page (https://tukaani.org/about.html) under "Historical acknowledgments".
Arch Linux played an important role in making this compression software trusted and depended upon. Perhaps not a coincidence, but at the very least, such a big project should more carefully consider the software they distribute and rely on, whether it's worth the risk.
I am taking the initiative to gather more information regarding the possible precursors and perpetrators of the backdoor.
The purpose of this commentary is focused on open source information (OSINT).
I am not here to judge anyone or any action that may have occurred; the objective of this comment is to provide accurate and timely information to help the core developers of the affected packages, and consequently the Linux kernel (which may have been directly or indirectly affected), take the necessary action in relation to what happened.
NOTE: This comment will keep being edited, so check back for updated information.
Information I have so far.
Summary:
1. GitHub Account Suspension:
- The accounts of @JiaT75 and @Larhzu were suspended by GitHub.
- All Tukaani repositories, including downloads, were disabled.
- Investigate the cause of the account suspensions and whether there is any correlation with suspicious activities.
2. Possible Backdoor in xz/liblzma:
- There are concerns about the presence of a backdoor in xz/liblzma.
- Investigate whether there is evidence of compromise in the source code and recent updates.
- Examine potential impacts, especially if the software is used in critical systems.
3. Updates and Patches in Packages:
- Note recent updates in packages such as MinGW w64, pacman-static, Alpine, and OpenSUSE.
- Review changelogs to understand if these updates are related to security fixes.
4. Jia's Activities on Platforms and Projects:
- Investigate Jia's contributions to different projects and platforms, such as Arch Linux, Alpine Linux, and OpenSUSE.
- Check for correlations between Jia's activities and reported security issues.
5. Libera Registration Information:
- Analyze Jia's registration details on Libera to determine the timeline of their online activities.
- Consider correlating this information with other online activities of Jia.
6. VPN Usage:
- Confirm Jia's use of VPN and assess its impact on security investigations.
- Explore possible reasons for using a VPN and how it may affect the identification and tracking of online activities.
The stable releases don't have this particular backdoor, but they're still using older versions of the library that were released by the same bad actor.
True, that is suspicious as well. A person who hasn't even filed any bugs or issues suddenly has a big problem with the speed of development? Especially the way this was phrased: "You ignore the many patches bit rotting away on this mailing list. Right now you choke your repo. Why wait until 5.4.0 to change maintainer? Why delay what your repo needs?"
"Why delay what your repo needs?" This sounds like scammer lingo
Wow, people suck. I almost hope it's fake profiles urging the maintainer to take on a new member as a long con. Because I sincerely hope Jigar Kumar is not a real person behaving like that towards volunteers working for free.
I would put money on government hackers. They're the sort of people that have the time to pull something like this off. Frankly I'm really surprised it isn't more common, though maybe it is and these guys were just super blatant. I would have expected more plausible deniability.
Been saying this the whole day now: GitHub really needs an automated diff / A/B check of tarballs against the actual repo, flagging everything with at least a warning (+[insert additional scrutiny steps here]) when the tarball doesn't match the repo.
I think it's much more likely this was not a bad actor, given their long history of commits.
It's a known fact that China will "recruit" people and run them as assets. A quote:
> They talk to them, say my friend, I see you like our special menu. Are you from China? Are you here on a VISA? Do you have family back there? Would you like your family to stay alive? Is your loyalty to this temporary employer or is your loyalty to your motherland? You know, a whole bunch of stuff like that. That’s how Chinese intelligence operations acts...
This feels less like a "compromised account" and more like "your account is now our account".
For the purposes of security discussions, I would say yes. You often don't know their real identity let alone their motivations and tribulations.
However if we were critiquing characters in a book-- especially ones where narrative voice tells us exactly their true motivations--then maybe not, and they get framed as a "dupe" or "manipulated" etc.
I believe your parent is trying to make a distinction that the handle's history may not be suspect, only recent activity, positing a rubber-hose type compromise.
I think we should seriously consider something like a ts clearance as mandatory for work on core technologies. Many other projects, both open and closed, are probably compromised by foreign agents.
> I think we should seriously consider something like a ts clearance as mandatory for work on core technologies.
Was xz/lzma a core technology when it was created? Is my tiny "constant time equality" Rust crate a core technology? Even though it's used by the BLAKE3 crate? By the way, is the BLAKE3 crate a core technology? Will it ever become a core technology?
With free software in general, things do not start as a "core technology"; they become a "core technology" over time due to usage. At which point would a maintainer have to get a TS clearance? Would the equivalent of a TS clearance from my Latin American country be acceptable? And how would I obtain it? Is it even available to people outside the military and government (legit question, I never looked)?
Who is "we"? Are you from the US by any chance? Do you mean that the US government should rewrite every piece of core infrastructure (including Linux, SSH, Nginx...) from scratch? Because they are all "contaminated" and were actually created by non-Americans.
If that's the case, you do you. Do you also think that all other countries should do the same, and rewrite everything from scratch for their government use (without foreign, for example American, influence)? And what about companies? Should they be forced to switch to their government's "safe" software, or can they keep using Linux and ssh? What about multi-national companies? And what even counts as a "core" software?
The Linux kernel is complaining about a lack of funding for CI - one of the highest-visibility projects out there. Where will the money come from for this?
Corps? Aside from Intel, most of them barely pay to upstream their drivers.
The govt? The US federal government has cut so much of its support since the 70s and 80s.
You're right, but accepting code from random Gmail accounts can't be the solution. Honestly the Linux kernel is a bloated mess, and will probably never be secured.
Accepting code from any source without properly reviewing it is surely the actual problem, no? This person only infiltrated this project because there was no proper oversight.
Maintainers need to be more stringent and vigilant of the code they ship, and core projects that many other projects depend upon should receive better support, financial and otherwise, from users, open source funds and companies alike. This is a fragile ecosystem that this person managed to exploit, and they likely weren't the only one.
Maintainers can't fully review all code that comes in. They don't have the resources. Even if they could give it a good review, a good programmer could probably still sneak stuff in. That's assuming a maintainer wasn't compromised, like in this case. We need a certain level of trust that the contributors are not malicious.
I’ve been a package maintainer for a decade. I make it a habit to spot check the source code of every update of every upstream package, hoping that if many others do the same, it might make a difference.
But this backdoor? I wouldn’t have been able to spot it to save my life.
This wasn't caused by not reviewing the code of a dependency. This was a core maintainer of xz, who gradually gained trust and control of the project, and was then able to merge changes with little oversight. The failure was in the maintenance of xz, which would of course be much more difficult to catch in dependent projects. Which is why it's so impressive that it was spotted by an OpenSSH user. Not even OpenSSH maintainers noticed this, which points to a failure in their processes as well, to a lesser degree.
I do agree that it's unreasonable to review the code of the entire dependency tree, but reviewing own code thoroughly and direct dependencies casually should be the bare minimum we should expect maintainers to do.
> Not even OpenSSH maintainers noticed this, which points to a failure in their processes as well, to a lesser degree.
The OpenSSH project has nothing to do with xz.
The transitive dependency on liblzma was introduced by a patch written by a third party. [1] You can't hold OpenSSH project members accountable for something like this.
Alright, that's fair. But I mentioned them as an example. Surely liblzma is a dependency in many projects, and _none_ of them noticed anything strange, until an end user did?
This is a tragedy of the commons, and we can't place blame on any single project besides xz itself, yet we can all share part of the blame and collectively do better in the future.
One of the primary responsibilities of a maintainer is to ensure the security of the software. If they can't keep up with the pace of development in order to ensure this for their users, then this should be made clear to the community, and a decision should be made about how to proceed. Open source maintenance is an often stressful and thankless role, but this is part of the problem that allowed this to happen. Sure, a sophisticated attacker would be able to fool the eyes of a single tired maintainer, but the chances of that happening are much smaller if there's a stringent high bar of minimum quality, and at least one maintainer understands the code that is being merged in. Change proposals should never be blindly approved, regardless of who they come from.
At the end of the day we have to be able to answer why this happened, and how we can prevent it from happening again. It's not about pointing fingers, but about improving the process.
BTW, there have been several attempts at introducing backdoors in the Linux kernel. Some manage to go through, and perhaps we don't know about others, but many were thwarted due to the extreme vigilance of maintainers. Thankfully so, as everyone is well aware of how critical the project is. I'm not saying that all projects have the resources and visibility of Linux, but clearly vigilance is a requirement for lowering the chances of this happening.
> That's assuming a maintainer wasn't compromised, like in this case.
What makes you say that? Everything I've read about this (e.g. [1]) suggests that this was done by someone who also made valid contributions and gained gradual control of the project, where they were allowed to bypass any checks, if they existed at all. The misplaced trust in external contributions, and the lack of a proper peer review process are precisely what allowed this to happen.
My understanding is that the attacker was the only maintainer of xz, and was trusted by the maintainers of the projects that depend on it. They couldn't realistically check his work. The defence against this can't be "do better, volunteer maintainers". Maybe we could have better automated testing and analysis, but OSS is allergic to those.
Sure, I'm not saying this is the only solution, or that it's foolproof. But this should be a wake up call for everyone in the OSS community to do better.
Projects that end up with a single maintainer should raise some flags, and depending on their importance, help and resources should be made available. We've all seen that xkcd, and found it more amusing than scary.
One idea to raise awareness: a service that scans projects on GitHub and elsewhere, and assigns maintenance scores, depending on various factors. The bus factor should be a primary one. Make a scoreboard, badges, integrate it into package managers and IDEs, etc. GitHub itself would be the ideal company to implement this, if they cared about OSS as much as they claim to do.
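Even without a dedicated service, the core signal is cheap to compute on any clone; a minimal bus-factor sketch (the one-year window is an arbitrary choice):

$ git shortlog -sn --since="1 year ago" | head -5

If a single name dominates that list for a package half the internet links against, that's the xkcd picture in numbers.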
That's a very US centric view and would practically split the open source community along the atlantic at best and fracture it globally at worst. Be careful what you wish for.
That's hard to do when the development of these libraries is so international. Not to mention that it's already so hard to find maintainers for some of these projects. Given that getting a TS clearance is such a long and difficult process, it would almost guarantee more difficulty in finding people to do this thankless job.
It doesn't need to be TS for open source (but for closed source, I'm leaning yes). But all code for these core technologies needs to be tied to a real person who can be charged in Western nations. Yes, it will make it harder to get people, but with how important these technologies are, we really should not be using some random guy's code in the kernel.
Don't forget that the NSA bribed RSA (the company) to insert a backdoor into their RNG. Being in western jurisdiction doesn't mean you won't insert backdoors into code. It just changes whom you will target with these backdoors. But they all equally make our technology less trustworthy so they are all equally despicable.
That just means the bad actors will all have clearance while putting in a bunch of hurdles for amateur contributors. The only answer is the hard one, constant improvement in methods to detect and mitigate bugs.
"Constant improvement" sounds like "constantly playing catch-up". Besides that, someone with TS can be arrested and charged, and I don't want amateur contributors.
And you're free to not accept amateur contributions to the OS projects you maintain. Hell, you can require security clearance for your contributors right now, if you want.
This only ensures the backdoors are coming from governments that issued the clearances, nothing more. I prefer more competition, at least there is incentive to detect those issues.
It will ensure that my OS doesn't have code from random Gmail accounts. If someone with U.S. clearance submits a backdoor, they should either be charged in the U.S. or extradited to somewhere that will charge them. We have no idea who this person is, and even if we did, we probably could not hold them accountable.
Killing your pipeline for innovation and talent development doesn't make you secure, it makes you fall behind. The Soviet Union found this out the hard way when they made a policy decision to steal chip technology instead of investing in their own people. They were outpaced and the world came to use chips, networks, and software designed by Americans.
Who's we? Americans? Sure, that's fine for you, but Americans aren't exactly trustworthy outside of the US either, and I say that as someone who's usually pro-US. This sort of mentality just shows a lack of understanding of how most of the world sees the US. Even in places like, say, France, the US is seen as an ally, but a very untrustworthy one. Especially since, of all the confirmed backdoors up until now, most were actually US-made.
If this backdoor turns out to be linked to the US, what would your proposal even solve?
"We" doesn't have to be the U.S. This is a false dichotomy that I see people in this thread keep pushing. I suspect in bad faith, by the people that want to insert backdoors. As a baseline, we could keep the contributors to NATO and friends. If a programmer is caught backdooring, they can be charged and extradited to and from whatever country.
If it's just an extradition issue, the US has extradition treaties with 116 countries. You'd still have to 1) ensure that user is who they say they are (an ID?) and 2) they are reliable and 3) no one has compromised their accounts.
1) and 3) (and, to an extent, 2)) are routinely done, to some degree, by your average security-conscious employer. Your employer knows who you are and probably put some thought into how to avoid your accounts getting hacked.
But what is reliability? Could be anything from "this dude has no outstanding warrants" to "this dude has been extensively investigated by a law enforcement agency with enough resources to dig into their life, finances, friends and family, habits, and so on".
I might be willing to go through these hoops for an actual, "real world" job, but submitting myself to months of investigation just to be able to commit into a Github repository seems excessive.
Also, people change, so you'd have to keep track of everyone all the time, in case someone gets blackmailed or otherwise persuaded to do bad things. And what happens if you find out someone is a double agent? Rolling back years of commits can be incredibly hard.
Getting a TS equivalent is exactly what helps minimize the chances that someone is compromised. Ideally, such an investigation would be transferable between jobs/projects, like a normal TS clearance is. If someone is caught, yes, rolling back years isn't practical, but we probably ought to look very closely at what they've done, as is probably being done with xz right now.
If the ultimate goal is to avoid backdoors in critical infrastructures (think government systems, financial sector, transportation,...) you could force those organizations to use forks managed by an entity like CISA, NIST or whatever.
If the ultimate goal is to avoid backdoors in random systems (i.e. for "opportunistic attacks"), you have to keep in mind random people and non-critical companies can and will install unknown OSS projects as well as unknown proprietary stuff, known but unmaintained proprietary stuff (think Windows XP), self-maintained code, and so on. Enforcing TS clearances on OSS projects would not significantly mitigate that risk, IMHO.
Not to mention that, as we now know, allies spy and backdoor allies (or at least they try)... so an international alliance doesn't mean intelligence agencies won't try to backdoor systems owned by other countries, even if they are "allies".
The core systems of Linux should be secured, regardless of who is using it. We don't need every single open source project to be secured. It's not okay to me that SSH is potentially vulnerable, just because it's my personal machine. As for allies spying on each other, that certainly happens, but is a lot harder to do without significant consequences. It will be even harder if we make sure that every commit is tied to a real person that can face real consequences.
The "core systems of Linux" include the Linux kernel, openssh, xz and similar libraries, coreutils, openssl, systemd, dns and ntp clients, possibly curl and wget (what if a GET on a remote system leaks data?),... which are usually separate projects.
The most practical way to establish some uniform governance over how people use those tools would involve a new OS distribution, kinda like Debian, Fedora, Slackware,... but managed by NIST or equivalent, which takes whatever they want from upstream and enrich it with other features.
But it doesn't stop here. What about browsers (think about how browsers protect us from XSS)? What about glibc, major interpreters and compilers? How do you deal with random Chrome or VS Code extensions? Not to mention "smart devices"...
Cybersecurity is not just about backdoors, it is also about patching software, avoiding data leaks or misconfigurations, proper password management, network security and much more.
Relying on trusted, TS-cleared personnel for OS development doesn't prevent companies from using 5-year-old distros, choosing predictable passwords, or exposing critical servers to the Internet.
As the saying goes, security is not a product, it's a mindset.
We wouldn't have to change the structure of the project to ensure that everyone is trustworthy.
As for applications beyond the core system, that would fall on the individual organizations to weigh the risks. Most places already have a fairly limited stack and do not let you install whatever you want. But given that the core system isn't optional in most cases, it needs extra care. That's putting aside the fact that most projects are worked on by big corps that do go after rogue employees. Still, I would prefer if some of the bigger projects were more secure as well.
Your "mindset" is basically allowing bad code into the Kernel and hoping that it gets caught.
>Your "mindset" is basically allowing bad code into the Kernel and hoping that it gets caught.
Not at all. I'm talking about running more and more rigorous security tests because you have to catch vulnerabilities, 99% of which are probably introduced accidentally by an otherwise good, reliable developer.
This can be done in multiple ways. A downstream distribution which adds its own layers of security tests and doesn't blindly accept upstream commits. An informal standard on open source projects, kinda like all those Github projects with coverage tests shown on the main repo page. A more formal standard, forcing some critical companies to only adopt projects with a standardized set of security tests and with a sufficiently high score. All these approaches focus on the content, not on the authors, since you can have a totally good-willing developer introducing critical vulnerabilities (not the case here, apparently, but it happens all the time).
On top of that, however, you should also invest in training, awareness, and other "soft" issues that are actually crucial in order to actually improve cybersecurity. Using the most battle-tested operating systems and kernels is not enough if someone puts sensitive data in an open S3 bucket, or only patches their systems once a decade, or uses admin/admin on an Internet-facing website.
I would presume it's a state actor. Generally in the blackhat world, attackers have very precise targets. They want to attack this company or this group of individuals. But someone who backdoors such a core piece of open source infrastructure wants to cast a wide net to attack as many as possible. So that fits the profile of a government intelligence agency who is interested in surveilling, well, everything.
Or it could in theory be malware authors (ransomware, etc). However these guys tend to aim at the low hanging fruits. They want to make a buck quickly. I don't think they have the patience and persistence to infiltrate an open source project for 2 long years to finally gain enough trust and access to backdoor it. On the other hand, a state actor is in for the long term, so they would spend that much time (and more) to accomplish that.
So that's my guess: Jia Tan is an employee of some intelligence agency. He chose to present an Asian persona, but that's not necessarily who he truly represents. Could be anyone, really: Russia, China, Israel, or even the US, etc.
Edit: given that Lasse Collin was the only maintainer of xz utils in 2022 before Jia Tan, I wouldn't be surprised if the state actor interfered with Lasse somehow. They could have done anything to distract him from the project: introduce a mistress in his life, give him a high-paying job, make his spouse sick so he has to care for her, etc. With Lasse not having as many hours to spend on the project, he would have been more likely to give access to a developer who shows up around the same time and who is highly motivated to contribute code. I would be interested to talk to Lasse to understand his circumstances around 2022.
> I haven't lost interest but my ability to care has been fairly limited mostly due to longterm mental health issues but also due to some other things. Recently I've worked off-list a bit with Jia Tan on XZ Utils and perhaps he will have a bigger role in the future, we'll see.
That "Jigar Kumar" is like fake and one-time throw-off account, probably from the same state actor to orchestrate the painstakingly prepared supply chain attack (under the sun).
At first glance I thought it was a far-fetched conclusion but then I read in a subsequent reply he wrote:
> With your current rate, I very doubt to see 5.4.0 release this year. The only progress since april has been small changes to test code. You ignore the many patches bit rotting away on this mailing list. Right now you choke your repo. Why wait until 5.4.0 to change maintainer? Why delay what your repo needs?
Oh wow, all his posts are trying to pressure Lasse, or guilt him into getting Jia on board. They're definitely conspiring.
"Your efforts are good but based on the slow release schedule it will unfortunatly be years until the community actually gets this quality of life feature."
"Patches spend years on this mailing list. 5.2.0 release was 7 years ago. There is no reason to think anything is coming soon."
"With your current rate, I very doubt to see 5.4.0 release this year. The only progress since april has been small changes to test code. You ignore the many patches bit rotting away on this mailing list. Right now you choke your repo. Why wait until 5.4.0 to change maintainer? Why delay what your repo needs?"
"Progress will not happen until there is new maintainer. XZ for C has sparse commit log too. Dennis you are better off waiting until new maintainer happens or fork yourself. Submitting patches here has no purpose these days. The current maintainer lost interest or doesn't care to maintain anymore. It is sad to see for a repo like this."
"Is there any progress on this? Jia I see you have recent commits. Why can't you commit this yourself?"
"Over 1 month and no closer to being merged. Not a suprise."
Given the details from another comment [1], it sounds like both maintainers are suspicious. Lasse's behavior has changed recently, and he's been pushing to get Jia Tan's changes into the Linux kernel. It's possible both accounts aren't even run by the original Lasse Collin and Jia Tan anymore.
Edit: Also, Github has suspended both accounts. Perhaps they know something we don't.
Whoops, I linked the wrong comment. I meant to link this one [1]. Anyway, seems like there's potentially a whole trail of compromised and fake accounts [2]. Someone in a government agency somewhere is pretty disappointed right now.
According to the Web Archive, https://tukaani.org/contact.html changed very recently (between 11/02/2024 and 29/02/2024) to add Lasse Collin's PGP key fingerprint. That timing is weird, considering his git activity at that time was almost non-existent. Although, I checked, this key existed back in 2012.
> I wouldn't be surprised if the state actor interfered with Lasse somehow
People could also just get tired after years of active maintainership or become busier with life. Being the sole maintainer of an active open source project on top of work and perhaps family takes either a lot of enthusiasm or a lot of commitment. It's not really a given that people want to (or can) keep doing that forever at the same pace.
Someone then spots the opportunity.
I have no idea what the story is here but it might be something rather mundane.
Or they have just one or a small number of targets, but don’t want the target(s) to know that they were the only target(s), so they backdoor a large number of victims to “hide in the crowd”.
I agree that this is likely a state actor, or at least a very large & wealthy private actor who can play the long game…
> Generally in the blackhat world, attackers have very precise targets
Lol, what
> wants to cast a wide net to attack as many as possible. So that fits the profile of a government intelligence agency
That's quite backwards. Governments are far more likely to deploy a complex attack against a single target (see also: Stuxnet); other attackers (motivated primarily by money) are far more likely to cast a wide net.
> That's quite backwards. Governments are far more likely to deploy a complex attack against a single target (see also: Stuxnet); other attackers (motivated primarily by money) are far more likely to cast a wide net.
Governments are well known to keep vulnerabilities hidden (see EternalBlue). Intentionally introducing a vulnerability doesn’t seem that backwards tbh
Oh for sure. I'm not suggesting that this wasn't a government actor, although I'd only give you 50/50 odds on it myself. It coulda just been someone with a bunch of time, like phreakers of old.
Don't forget that you could have state actors who are otherwise interested in open source code, and working to actually improve it.
In fact, that'd be the best form of deep cover. It'll be interesting to watch as people more knowledgeable than I pore over every single commit and change.
If you have a backdoor in a specific piece of software already, what is the purpose of trying to introduce another backdoor (and risk it getting caught)?
There are two general attack targets I'd use if I had access to a library/binary like xz:
(1) A backdoor like this one, which isn't really about its core functions, but about the fact that it's a library linked into critical code, so that you can use it to backdoor _other things_. Those are complex and tricky because you have to manipulate the linking/GOT specifically for a target.
(2) Insert an exploitable flaw such as a buffer overflow so that you can craft malicious .xz files that result in a target executing code if they process your file. This is a slightly more generic attack vector but that requires a click/download/action.
Not every machine or person you want to compromise has an exposed service like ssh, and not every target will download/decompress a file you send to them. These are decently orthogonal attack vectors even though they both involve a library.
(Note that there's as yet no evidence for #2 - I'm just noting how I'd try to leverage this to maximum effect if I wanted to.)
xz is a data compression tool, so it's natural to have compressed files for (de)compression tests.
These files are also useful to check that the library we just built works correctly, but they aren't necessary for installation.
We could have more sophisticated procedures that allow some parts of the distribution to be used only for tests. This could significantly reduce the attack surface - many projects have huge, sophisticated testing infrastructure where you could hide the entire Wikipedia.
> They want to attack this company or this group of individuals. But someone who backdoors such a core piece of open source infrastructure wants to cast a wide net to attack as many as possible.
The stuxnet malware, which compromised Siemens industrial controls to attack specific centrifuges in uranium enrichment plants in Iran, is a counterexample to that.
Stuxnet wasn't similar to this xz backdoor. The Stuxnet creators researched (or acquired) four Windows zero-days, a relatively short-term endeavor, whereas the xz backdoor was a long-term, 2.5-year operation to slowly gain trust from Lasse Collin.
But, anyway, I'm sure we can find other counter-examples.
If a government wants to cast a wide net and catch what they can, they'll just throw a tap in some IXP.
If a government went to this much effort to plant this vulnerability, they absolutely have targets in mind - just like they did when they went to the effort of researching (or acquiring) four separate Windows zero-days, combining them, and delivering them...
Adding some unreadable binary to the source code is a really dangerous thing to do.
We also need tools to quickly detect the addition of indentation symbols that can be easily overlooked.
BTW, I had a classmate who used to play DotA 1 (on War3) under this name at the University of Science and Technology of China a long time ago, and it was his first girlfriend's name (maybe). His father was a high-ranking official. Then he joined the parent department of the Internal Security Detachment, a secret service that has gained a lot of power in the last few years. I hope I'm not awake. lol.
It's ridiculous to think it's the US as it would be an attack on Red Hat a US company and an attack on Americans. It's a good way to be dragged in front of Congress.
You say that as if members of US government agencies didn't plot terror attacks on Americans (Operation Northwoods), steal the medical records of American whistleblowers (Ellsberg), had to be prevented from assassinating American journalists (Gordon Liddy, on Jack Anderson), collude to assassinate American political activists (Fred Hampton), spy on presidential candidates (Watergate), sell weapons to countries who'd allegedly supported groups who'd launched suicide bombing attacks on American soldiers (Iran-Contra), allow drug smugglers to flood the USA with cocaine so that they could supply illegal guns to terrorists abroad on their return trip (Iran-Contra again) and get caught conducting illegal mass-surveillance on American people as a whole (Snowden). Among others.
It's super-naive to suggest that government agencies wouldn't act against the interest of American citizens and companies because there might be consequences if they were caught. Most of the instances above actually were instances where the perpetrators did get caught, which is why we know about them.
You don’t even have to be this conspiratorially minded to believe the NSA is a legitimate suspect here. (For the record, I think literally every intelligence agency on Earth is plausible here.)
You kind of lost the thread when you say, “act against the interests of American citizens and companies”. Bro, literally anyone could be using xz, and anyone could be using Red Hat. You’re only “acting against Americans” if you use it against Americans. I don’t know who was behind this, but a perfectly plausible scenario would be the NSA putting the backdoor in with an ostensibly Chinese login and then activating on machines hosted and controlled by people outside of the US.
Focusing on a specific distro is myopic. Red Hat is popular.
> but a perfectly plausible scenario would be the NSA putting the backdoor in with an ostensibly Chinese login and then activating on machines hosted and controlled by people outside of the US.
There's a term for that: NOBUS (https://en.wikipedia.org/wiki/NOBUS). It won't surprise me at all if this backdoor can only be exploited if the attacker has the private key corresponding to a public key contained in the injected code. It also won't surprise me if this private key ends up being stolen by someone else, and used against its original owner.
The HN crowd has come a long way from practically hero-worshipping Snowden to automatically assuming that 'state actor' must mean the countries marked evil by the US.
The US has backdoored RSA's RNG and thus endangered the security of American companies. It is naive to think that US intelligence agencies will act in the best interest of US citizens or companies.
Notably that was a "no-one-but-us" backdoor, that requires a specific secret key to exploit. We'll see when someone analyzes the payload further, but presumably this backdoor also triggers on a specific private key. If not there are ways to do it that would look far more like an innocent mistake, like a logic bug or failed bounds check.
I can see some arguments that might persuade the NSA to run an attack like this
- gathers real world data on detection of supply attacks
- serves as a wake-up call for a software community that has grown complacent on the security impact of dependencies
- in the worst case, if no one finds it then hey, free backdoor
There's an implicit "always" in their second sentence, if you're confused by the wording. They aren't positing the equivalent of the guard that only lies.
It's an interesting story for those who haven't heard about it and think the NSA could only be up to evil. You may not have read it as the guard only ever lies, but that doesn't stop people from thinking that anyway.
No I think that's it. "What about it?" kinda set me off, and then "if you're confused by the wording" was unnecessarily condescending.
You coulda just pointed out that just because they did right in the case of DSA, doesn't mean we should actually ever trust them, which I would agree is the correct stance.
Mostly I think that story is neat and wanted people to know about it, so I asked a question as a performative writing technique.
"What about it?" is a very real question that I still want to know the answer to. What did you want as a response when you asked that?
"If you're confused by the wording" was definitely condescending, but I think interpreting guinea-unicorn's post that way doesn't make sense. Even in your reply you didn't say you think it's the right interpretation, just that someone might believe the NSA could "only be up to evil". That followup gives the impression you were giving an FYI for readers. Which is nice to do, but then the "what about" doesn't fit.
So all of that is to say the words "what about" felt like you were deciding to read their post in an unfair way.
I'm happy to listen to an alternate explanation! But you ignored my request for why you said that, and I'm honestly kind of confused as to why that's what set you off.
So overall, I think my first post can come across as fighty, but I don't think the follow-ups should suggest I'm making things fighty. I think my response to 2OEH8eoCRo0 was fine, given the way they were ignoring half of the four sentences I had typed.
You are understating the level of evidence that points to the NSA being fully aware of what it was doing.
To be clear, the method of attack was something that had been described in a paper years earlier, the NSA literally had a program (BULLRUN) around compromising and attacking encryption, and there were security researchers at NIST and other places that raised concerns even before it was implemented as a standard. Oh, and the NSA paid the RSA $10 million to implement it.
Heck, even the chairman of the RSA implies they got used by the NSA:
In an impassioned speech, Coviello said RSA, like many in industry, has worked with the NSA on projects. But in the case of the NSA-developed algorithm which he didn't directly name, Coviello told conference attendees that RSA feels NSA exploited its position of trust. In its job, NSA plays two roles, he pointed out. In the information assurance directorate (IAD) arm of NSA, it decides on security technologies that might find use in the government, especially the military. The other side of the NSA is tasked with vacuuming up data for cyber-espionage purposes and now is prepared to take an offensive role in cyber-attacks and cyberwar.
“We can’t be sure which part of the NSA we’re working with,” said Coviello with a tone of anguish. He implied that if the NSA induced RSA to include a secret backdoor in any RSA product, it happened without RSA’s consent or awareness.
What type of confirmation do you want? The documents aren't going to be declassified in the next couple of decades, if ever.
I've never heard anyone claim that Dual_EC_DRBG is most likely not intentionally backdoored, but there's literally no way to confirm it because of how it's written. If we can't infer intention from the code, we can look at the broader context for clues. The NSA spent an unusual amount of effort trying to push an algorithm that kept getting shot down because it was slower than similar algorithms with no additional benefits (the $10 million deal specified it as a requirement [1]). If you give the NSA the benefit of the doubt, they spent a lot of time and money to... intentionally slow down random number generation?!
As an American, I'd prefer a competent NSA than an incompetent NSA that spends my tax dollars to make technology worse for literally no benefit...
I'd say that CCTV is quite different to wiretapping. You (generally) wouldn't have the expectation of privacy in a public place, most people would expect that phone calls, messages, etc do remain private.
Now, GCHQ is no better than the NSA for that either, but I don't think CCTV is a good comparison.
While his leaks exposed surveillance, he was a useful idiot (https://en.wikipedia.org/wiki/Useful_idiot) in the hands of the Assange club. It may even be that the episode of saving him was a trigger for Putin to start the war. So no, I'd rather see the whole camaraderie in court and sentenced, regardless of 'heroism'.
And yes, most modern supporters of WikiLeaks / Assange / Snowden / etc., chanting 'release Assange' and 'pardon Snowden', are useful idiots in the hands of tyrannies like the BRICS club.
Yeah as we know, intelligence agencies are very often held accountable in the US. As witnessed by all the individuals that got charged or punished for uh... nevermind.
I'm not very inclined to think this is the US govt; however, you should better acquaint yourself with the morals of some members of Congress.
I think the best reason to doubt USG involvement is the ease with which somebody discovered this issue, which is only a month or two old. I feel like NSA etc. knows not to get caught doing this so easily.
Seems to be a perfect project to hijack. Not too much happening, widely used, long history, single maintainer who no longer has time to manage the project and wants to pass it over.
Yikes indeed. This fix is being rolled out very fast, but what about the entire rest of the codebase? And scripts? I mean, years of access? I'd trust no aspect of this code until a full audit is done, at least of every patch this author contributed.
(note: not referring to Fedora here; an immediate fix is required. But just generally. As in, everyone is rolling out this fix, but... I mean, this codebase is poison in my eyes without a solid audit)
edit: I have to be missing something, or I'm confused. The above author seems to be the primary contact for xz? Have they just taken over?? Or did the bad commit come from another source, and a legit person applied it?
It's important to focus on people, not just code, when suspecting an adversary. Now, I have no idea if this is the right account, and if it has recently been compromised/sold/lost, or if it has always been under the ownership of the person who committed the backdoor. But IF this is indeed the right account, then it's important to block any further commit from it to any project, no matter how innocuous it seems, and to review thoroughly any past commit. For the most security-conscious projects, it would be a good idea to even consider reverting and re-implementing any work coming from this account if it's not fully understood.
An account that has introduced a backdoor is not the same thing as an account that committed a bug.
They appear to have moved carefully over the course of weeks, setting up the framework to perform this attack.
I would now presume this person to be a hostile actor, and their contributions anywhere and everywhere must be audited. I would not wait for them to cry 'but my brother did it', because an actual malicious actor would say the same thing. The 'mob' should be poring over everything they've touched.
My above post shows the primary domain for xz moving from tukaani.org to xz.tukaani.org, which is now hosted on GitHub:
$ host xz.tukaani.org
xz.tukaani.org is an alias for tukaani-project.github.io.
And originally it was not:
$ host tukaani.org
tukaani.org has address 5.44.245.25
(seemingly in Finland)
It was moved there in January of this year, as per the commit listed in my prior post, by this same person/account. This means that instead of Lasse Collin's more restrictive hosting, the webpage is now directly under the control of the untrusted account, which can edit it without anyone else's involvement.
For example, to make subtle changes in where to report security issues to, and so on.
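For anyone who wants to re-run that check themselves, here's a minimal sketch (assuming the third-party dnspython package is installed; the domain names come from the output above, everything else is illustrative):
import dns.resolver  # pip install dnspython

for name in ("xz.tukaani.org", "tukaani.org"):
    try:
        # A CNAME answer (e.g. tukaani-project.github.io.) means the name is
        # delegated to GitHub Pages rather than served from its own host.
        for rdata in dns.resolver.resolve(name, "CNAME"):
            print(f"{name} is an alias for {rdata.target}")
    except dns.resolver.NoAnswer:
        # No CNAME: fall back to the A record (the original 5.44.245.25 case).
        for rdata in dns.resolver.resolve(name, "A"):
            print(f"{name} has address {rdata.address}")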
So far I don't see anything nefarious, but at the same time, isn't this the domain/page hosting bad tarballs too?
This account changed the instructions for reporting security issues in the xz github as their very last commit:
commit af071ef7702debef4f1d324616a0137a5001c14c (HEAD -> master, origin/master, origin/HEAD)
Author: Jia Tan <jiat0218@gmail.com>
Date: Tue Mar 26 01:50:02 2024 +0800
Docs: Simplify SECURITY.md.
diff --git a/.github/SECURITY.md b/.github/SECURITY.md
index e9b3458a..9ddfe8e9 100644
--- a/.github/SECURITY.md
+++ b/.github/SECURITY.md
@@ -16,13 +16,7 @@ the chance that the exploit will be used before a patch is released.
You may submit a report by emailing us at
[xz@tukaani.org](mailto:xz@tukaani.org), or through
[Security Advisories](https://github.com/tukaani-project/xz/security/advisories/new).
-While both options are available, we prefer email. In any case, please
-provide a clear description of the vulnerability including:
-
-- Affected versions of XZ Utils
-- Estimated severity (low, moderate, high, critical)
-- Steps to recreate the vulnerability
-- All relevant files (core dumps, build logs, input files, etc.)
+While both options are available, we prefer email.
This project is maintained by a team of volunteers on a reasonable-effort
basis. As such, please give us 90 days to work on a fix before
Seems innocuous, but maybe they were planning further changes.
For what it's worth, tukaani is how you spell toucan (the bird) in Finnish, and Lasse is a common Finnish name; the site being previously hosted in Finland is very plausible.
Yeah, according to their website[0] it looks like the majority of the past contributors were Finnish, so nothing odd about the hosting provider. On the same page it says that Jia Tan became co-maintainer of xz in 2022.
Zoner is a Finnish web hosting company, which has a history of providing hosting for Finnish open source projects, and the original maintainer (and most of the original crew) is Finnish as well. Nothing weird here.
If the owner of the account is innocent and their account was compromised, it's on them to come out and say that. All signs currently point to the person being a malicious actor, so I'll proceed on that assumption.
Probably not. I did some pattern of life analysis on their email/other identifiers. It looks exactly like when I set up a burner online identity- just enough to get past platform registration, but they didn't care enough to make it look real.
For example, their email is only registered to GitHub and Twitter. They haven't even logged into their Google account for almost a year. There's also no history of it being in any data breaches (because they never use it).
It would be interesting to hear the whole arc of social engineering behind getting access to the repo. Although, as a maintainer of a large-ish OSS project myself, I know that under a lot of burden any help will be welcomed with open arms, and I've never really talked about private matters with any of my contributors.
I tried to understand the significance of this (parent maybe implied that they reused a completely fictitious identity generated by some test code), and I think this is benign.
That project just includes some metadata about a bunch of sample projects, and it links directly to a mirror of the xz project itself:
I assume it downloads the project, examines the git history, and the test then ensures that the correct author name and email addresses are recognized.
(that said, I haven't checked the rest of the project, so I don't know if the code from xz is then subsequently built, and/or if this other project could use that in an unsafe manner)
Additionally, even though the commit messages they've made are mostly plain, there may be features of them that could provide leads, such as their use of what looks like a very obscure racist joke: referring to a gitignore file as a 'gitnigore'.
There's barely a handful of people on the whole planet making this 'joke'.
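If anyone wants to hunt for that string across other projects they have checked out, a rough sketch (the repo paths are placeholders, not a claim about where else it appears):
import subprocess

REPOS = ["./xz", "./libarchive"]   # hypothetical local clones; adjust to taste
NEEDLE = "gitnigore"

for repo in REPOS:
    out = subprocess.run(
        ["git", "-C", repo, "log", "--all", "-i", f"--grep={NEEDLE}",
         "--format=%h %an <%ae> %ad %s"],
        capture_output=True, text=True, check=True,
    ).stdout
    if out.strip():
        print(f"=== matches in {repo} ===")
        print(out)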
If I wanted to go rogue and insert a backdoor in a project of mine, I'd probably create a new sockpuppet account and hand over management of the project to them. The above is worryingly compatible with this hypothesis.
OTOH, JiaT75 did not reuse the existing hosting provider, but rather switched the site to github.io and uploaded the old tarballs there:
If JiaT75 is an old-timer in the project, wouldn't they have kept using the same hosting infra?
There are also some other grim possibilities: someone forced Lasse to hand over the project (violence or blackmailing? as far-fetched as that sounds)... or maybe stole Lasse's devices (and identity?) and now Lasse is incapacitated?
Or maybe it's just some other fellow Scandinavian who pretended to be Chinese and got Lasse's trust. In which case I wish Lasse all the best, and hope they'll be able to clear their name.
Is the same person sockpuppeting Hans Jansen? It's amusing (but unsurprising) that they are using both German-sounding and Chinese-sounding identities.
That said, I don't think it's unreasonable to think that Lasse genuinely trusted JiaT75, genuinely believed that the ifunc stuff was reasonable (it probably isn't: https://news.ycombinator.com/item?id=39869538 ) and handed over the project to them.
And at the end of the day, the only thing linking JiaT75 to a Nordic identity is a Nordic racist joke which could well be a typo. People have already checked the timezone of the commits, but I wonder if anyone has checked the time-of-day of those commits... does it actually match the working hours that a person genuinely living (and sleeping) in China would follow? (Of course, that's also easy to manipulate, but maybe they could've slipped up.)
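To make the time-of-day idea concrete, a minimal sketch that buckets the account's commits by hour in Asia/Shanghai time; the repo path and author string are assumptions, and the recorded author dates are trivially forgeable, so treat the result as a hint at best:
import subprocess
from collections import Counter
from datetime import datetime
from zoneinfo import ZoneInfo

REPO = "./xz"          # hypothetical local clone
AUTHOR = "Jia Tan"     # matched against the git author field

lines = subprocess.run(
    ["git", "-C", REPO, "log", "--all", f"--author={AUTHOR}", "--format=%ai"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

hours = Counter()
for line in lines:
    # %ai looks like "2024-03-26 01:50:02 +0800"
    dt = datetime.strptime(line.strip(), "%Y-%m-%d %H:%M:%S %z")
    hours[dt.astimezone(ZoneInfo("Asia/Shanghai")).hour] += 1

for hour in range(24):
    print(f"{hour:02d}:00  {'#' * hours[hour]}")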
Anyhow, I guess that security folks at Microsoft and Google (because of JiaT75's email account) are probably going to cooperate with authorities on trying to pin down the identity of JiaT75 (which might not be very useful, depending on where they live).
I'd say at this point all major tech companies, ISPs and authorities should have more than enough information, and disabling and freezing their accounts would be the first step.
This can happen if you delete your old gmail account. Source: I deleted a gmail account I shouldn't have years ago. It will say taken if it previously existed, and was deleted.
That seems to be fine. safe_fprintf() takes care of non-printable characters. It's used for archive_entry_pathname, which can contain them, while "unsafe" fprintf is used to print out archive_error_string, which is a library-provided error string, and strerror(errno) from libc.
We know there's long-cons in action here, though. This PR needn't be the exploit. It needn't be anywhere _temporally_ close to the exploit. It could just be laying groundwork for later pull requests by potentially different accounts.
Exactly. If we assume the backdoor via liblzma as a template, this could be a ploy to hook/detour both fprintf and strerror in a similar way. Get it to diffuse into systems that rely on libarchive in their package managers.
When the trap is in place, deploy a crafted package file that appears invalid at the surface level but triggers this trap. In that moment, fetch the payload from the (already opened) archive file descriptor, execute it, but also patch the internal state of libarchive so that it processes the rest of the archive file as if nothing happened, with the desired outcome also appearing on the system.
Assuming there isn't another commit somewhere modifying a library-provided error string or anything returned by libc. There is all kinds of mischief to be had there, which may or may not have already happened, e.g. now you do some i18n and introduce Unicode shenanigans.
No. There's no good reason HTTP response decoding would ever be implemented in terms of libarchive; using libz directly is simpler and supports some use cases (like streaming reads) which libarchive doesn't.
Unlike the GNU tar I'm used to, it's actually a "full fat" command line archiving tool, compressing & decompressing zip, xz, bz2 on the command-line - really handy :-O
EDIT: Ahh, I was wrong and missed the addition of "strerror"
The PR is pretty devious.
JiaT75's claim is "Added the error text when printing out warning and errors in bsdtar when untaring. Previously, there were cryptic error messages", and they cite this as fixing a previous issue.
The PR literally removes a newline between 2 arguments on the first `safe_fprintf()` call, and converts the `safe_fprintf()` calls to unsafe direct calls to `fprintf()`. In all cases, the arguments to these functions are exactly the same! So it doesn't actually make the error messages any different, and it doesn't actually solve the issue it references. And the maintainer accepted it with no comments!
It does remove the safe_ prefixes... But it also adds strerror() to one print statement, which could plausibly give a better explanation of the error code...
The only suspicious things here are the lack of the safe_ prefix (and the potential for the strerror() function to already be backdoored elsewhere in another commit).
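For readers wondering why the safe_ variants matter at all, a tiny sketch (not libarchive code, just the general class of problem): archive entry names are attacker-controlled, and printing them raw lets them smuggle terminal escape sequences onto your terminal.
# A filename an attacker could place inside an archive:
evil_name = "innocent.txt\x1b]0;owned\x07\x1b[2K\rtotally_fine.txt"

def sanitized(s: str) -> str:
    # Roughly what safe_fprintf-style escaping achieves: non-printable bytes
    # are rendered as visible \xNN sequences instead of being interpreted.
    return "".join(c if c.isprintable() else f"\\x{ord(c):02x}" for c in s)

# print(evil_name)            # raw: the emulator interprets the escapes
                              # (retitle window, erase line, overwrite name, ...)
print(sanitized(evil_name))   # safe: the escapes are shown, not executed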
I don't know whether this is a formerly-legitimate open source contributor who went rogue, or a deep-cover persona spreading innocuous-looking documentation changes around to other projects as a smokescreen.
Minor documentation-change PRs are a well-known tactic used to make your GitHub profile look better (especially to potential employers).
He could be doing the same thing for other reasons; nobody really digs into anything very deep, so I could see someone handing over co-maintenance of a project based on a decent-looking GitHub graph and a generally reasonable impression.
Consider the possibility those type of submissions were part of the adversary's strategy in order to make their account appear more legitimate rather than appearing out of nowhere wanting to become the maintainer of some project.
There is also a variety of new, parallelized implementations of compression algorithms which it would be good to take a close look at. Bugs causing undefined behaviour in parallel code are notoriously hard to see, and the parallel versions (which are actually much faster) could take the place of well-established programs which have earned a lot of trust.
> Versions 5.2.12, 5.4.3 and later have been signed with Jia Tan's OpenPGP key. The older releases have been signed with Lasse Collin's OpenPGP key.
It must be assumed that before acquiring that privilege, they also contributed code to the project. Probably most of it was to establish a respectable record, but there could still be malicious code going back some way.
I get why people are focusing on this bad actor. But the question that interests me more: how many other apparent individuals fit the profile that this person presented before being caught?
Are you referencing the '-unsafe' suffix in the second link? That is not something to worry about.
This is from Gnulib, which is used by Gettext and other GNU projects. Using 'setlocale (0, NULL)' is not thread-safe on all platforms. Gnulib has modules to work around this, but not all projects want the extra locking. Hence the name '-unsafe'. :)
I think it's not a coincidence: Hans Jansen (hansjansen162@outlook.com) has a matching account on Proton mail too (hansjansen162@proton.me).
Furthermore, the Outlook account is configured as recovery e-mail for the Proton account.
Keep in mind that having a "false identity" does not make you a malicious actor. I have a serious project I work on under another pseudonym, but it has more to do with the fact that I do not want my real name to be associated with that project AND having a serious case of impostor syndrome. :/
That, and I used to contribute to various games (forks of ioquake3) when I was a teen and I wanted to keep my real name private.
I am more interested in his git commits https://github.com/JiaT75?tab=overview&from=2021-12-01&to=20...
If JiaT75 is Chinese, then his work log should follow Chinese holidays, especially Spring Festival and the National Day holiday.
Chinese workers usually do not work on the first 3 days of Spring Festival and the National Day holiday:
2021: 2/11 - 2/13 (a few commits), 10/1 - 10/3 (nothing)
2022: 1/31 - 2/2 (a huge number of commits on 1/31, suspicious), 10/1 - 10/3 (nothing)
2023 and 2024: not very many commits.
So the huge number of commits on 2022 1/31 is evidence that he does not follow Chinese holidays.
But wait: 2021 was his most active year, yet he missed almost all of August. Was he on holiday? Who can take such a long holiday? What I can think of is a soldier, who can get long home-visit leave (探亲假, family-visit leave).
So let's guess he is a soldier; then it makes sense that he worked over the Spring Festival, because soldiers need to stay on duty.
Let's double-check: if he is a soldier, then he would have a holiday every Aug. 1, because it's People's Liberation Army Day. I checked, and there are no commits on Aug. 1 in any of the 4 years.
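For what it's worth, the date check above is easy to reproduce; a quick sketch (repo path, author string and the handful of dates are just examples pulled from this comment):
import subprocess

REPO = "./xz"
AUTHOR = "Jia Tan"
DATES = ["2021-08-01", "2022-01-31", "2022-02-01", "2022-02-02",
         "2022-08-01", "2022-10-01", "2023-08-01"]

# %as is the author date in YYYY-MM-DD form, as recorded (nominally +0800).
days = subprocess.run(
    ["git", "-C", REPO, "log", "--all", f"--author={AUTHOR}", "--format=%as"],
    capture_output=True, text=True, check=True,
).stdout.split()

for day in DATES:
    print(day, days.count(day), "commits")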
I mention Chinese social media specifically because I know it's not indexed so well by western search engines. You can't conclude someone has no social footprint until you've actually checked.
Regardless of how likely you think it is, finding a social media footprint would be useful information. Seek information first, reach conclusions second.
He understood the software architecture quite early on while working on the following repository. He connected the dots from his other projects and went rogue (probably to profit from crypto?). Take a look at his other repositories, code style, and recent likes on GitHub. Is he our Jia Tan?
An Indian with the name Jigar (meaning "heart") would never address himself as Jigar, as seen in the citation. This would be culturally a bit weird, unless he is being sarcastic or writing on some comic note.
Secondly, the use of English is not consistent with what you would expect from a typical Indian; he would have to be from a foreign background or a very reputable English-medium school.
The language, though seemingly simple to a native English speaker, suggests in this case a person whose first language is likely not English.
It is possible that Grammarly or autocorrect was used to write these, but I can't be certain of anything stated above.
I do think there is a 60% chance this is a sabotage account, unless Mr. Kumar comes out clean publicly. He is likely a state-sponsored actor.
Not a developer, but the changelogs and commit history from this person seem interesting, as they show what appears to be an effort to consolidate control and push things in the direction of supporting wider dissemination of their backdoor code:
Discussing commits that the other author has since reverted, IFUNC change with Project Zero tests, a focus on embedded, etc.:
"crazytan" is the LinkedIn profile of a security software engineer named Jia Tan in Sunnyvale working at Snowflake, who attended Shanghai Jiao Tong University from 2011 to 2015 and Georgia Institute of Technology from 2015 to 2017. However, this Jia Tan on LinkedIn might not be the same Jia Tan who worked on XZ Utils. Also, the person who inserted the malicious code might be someone else who hijacked the account of the Jia Tan who worked on XZ Utils.
My assumption would be that he knows the jig is up, and is probably going to do everything he can to jettison the JiaTan account, lest any IPs he uses be turned over to authorities.
Tukaani website states "jiatan" as the nickname of the malicious code committer on Libera Chat.
WHOWAS jiatan provided me the following information:
jiatan ~jiatan 185.128.24.163 * :Jia Tan
jiatan 185.128.24.163 :actually using host
jiatan jiatan :was logged in as
jiatan tungsten.libera.chat :Fri Mar 14:47:40 2024
WHOIS yields nothing, the user is not present on the network at the moment.
Given that 185.128.24.163 is covered with a range-block on the English Wikipedia, it appears this is a proxy.
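For the record, checking whether an address falls inside a published block range is trivial; a sketch with a made-up /22, since I don't have the actual Wikipedia range block in front of me:
import ipaddress

addr = ipaddress.ip_address("185.128.24.163")
blocked = ipaddress.ip_network("185.128.24.0/22")   # hypothetical CIDR, not the real block

print(addr in blocked)   # True if the address sits inside the blocked range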
egos of people who just like to say cool words they don't understand
lol
this comment will probably get deleted, but if it is, let that action stand as proof that in 2024 we're all allowed to use big words with no definition of what they mean -> bad
state actor? who? what motive? what country? all comments involving "state actor" are very broad and strange... i would like people to stop using words that have no meaning, as it really takes away from the overall conversation of what is going on.
i mean you're seriously going to say "state actor playing the long game" to what end? the issue was resolved in 2 hours... this is stupid
For starters, the backdoor was technically really sophisticated.
For example, the malicious code circumvents a hardening technique (RELRO) in a clever way, which would otherwise have blocked it from manipulating the sshd code in the same process space at runtime. This is not something that script kiddies usually cook up in an afternoon to make a quick buck. You need experts and a lot of time to pull off feats like that.
This points to an organization with excellent funding. I’m not surprised at all that people are attributing this to some unknown nation-level group.
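As an aside on the RELRO point: that's not how the bypass itself worked (the backdoor reportedly ran from an IFUNC resolver before relocations were finalized), but if you want to see whether a given binary even carries the hardening, a rough sketch using readelf from binutils (the path is an example):
import subprocess

TARGET = "/usr/lib/x86_64-linux-gnu/liblzma.so.5"   # example path; adjust

headers = subprocess.run(["readelf", "-lW", TARGET],
                         capture_output=True, text=True).stdout
dynamic = subprocess.run(["readelf", "-dW", TARGET],
                         capture_output=True, text=True).stdout

has_relro = "GNU_RELRO" in headers     # read-only-after-relocation segment present?
has_bind_now = "BIND_NOW" in dynamic   # rough heuristic; some toolchains only set FLAGS_1 NOW

print("full RELRO" if has_relro and has_bind_now
      else "partial RELRO" if has_relro
      else "no RELRO")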
This feels like the exact opposite of the takeaway you should have. Old software isn't inherently more secure; you're missing thousands of security and bug fixes. Yes, this was bad, but look how quickly the community came together to catch it and fix it.
Still better than TwoMinuteToiletPapers and other AI-bamboozled channels hyping over proprietary OpenAI crap (text/photo/video), what a time to be alive!
I guess that rewriting liblzma in Rust would not have prevented this backdoor, but it would likely have increased confidence in its safety.
Using the build system (and potentially the compiler) to insert malicious backdoors is far from a new idea, and I don't see why this example would be the only case.
It would have made it worse, because there would be 300 crates with 250 different maintainers, all pulled in by several trivial/baseline dependencies. More dependencies mean a higher probability that a malicious actor has obtained maintainer rights for one of them, especially because many original authors/maintainers of Rust-style microdependency crates move on with their lives and eventually seek to exit their maintainer role. At least for classic C/C++ software, by virtue of it being very inconvenient to casually pull 300 dependencies for something trivial, there are fewer dependencies, i.e. separate projects/repos, and these tend to be more self-contained.

There are also "unserious" distributions like Fedora, and things like the stable/testing/unstable pipeline in Debian, which help with catching the most egregious attempts. Crates.io and npm are unserious by their very design, which is focused on maximizing growth by eliminating as many "hindrances" as possible.
Rust specifically chose a minimal standard library to not get stuck with the Python "dead batteries" problem. There's a strong culture as well of minimizing a project's dependencies in Rust.
Don’t know all the details and rust isn’t immune to a build attack, but stuff like that tends to stand out a lot more I think in a build.rs than it would in some m4 automake soup.
The backdoor hinged on hiding things in large shell scripts, obscure C "optimizations", and sanitizer disabling. I'd expect all of those would be a much bigger red flag in the Rust world.
This hack exploited a fairly unique quirk in the Linux C ecosystem/culture: that packages are built from "tarballs" that are not exact copies of the git HEAD, as they also contain generated scripts with arbitrary code.
It would not have happened in any modern language. It probably wouldn't have even happened in a Visual Studio C project for Windows either.
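That quirk is also checkable after the fact. A hedged sketch (tarball name, repo path and tag are examples): list files that exist in a release tarball but are not tracked at the corresponding git tag, which is where the tampered m4 machinery hid in the xz release tarballs. Autotools tarballs legitimately add generated files too, so the output needs a human eye.
import subprocess
import tarfile

TARBALL = "xz-5.6.1.tar.gz"   # downloaded release tarball (example name)
REPO = "./xz"                 # local clone
TAG = "v5.6.1"                # tag the tarball claims to correspond to

with tarfile.open(TARBALL) as tar:
    # Strip the leading "xz-5.6.1/" component from each member path.
    tar_files = {m.name.split("/", 1)[1] for m in tar.getmembers()
                 if m.isfile() and "/" in m.name}

git_files = set(subprocess.run(
    ["git", "-C", REPO, "ls-tree", "-r", "--name-only", TAG],
    capture_output=True, text=True, check=True,
).stdout.splitlines())

# configure, Makefile.in etc. will show up here legitimately -- but so would
# any tampered build script that never existed in git.
for extra in sorted(tar_files - git_files):
    print("only in tarball:", extra)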
`pip install` does do exactly the same thing: it downloads and executes code from a tarball uploaded to PyPI by its maintainer. There's no verification process that ensures that tarball matches what's in the git repository.
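To illustrate that point (purely a toy, not anyone's real package): the setup.py inside an sdist is ordinary code that runs when pip builds/installs the tarball, before anything from the package is ever imported.
from setuptools import setup

# Anything at module level here executes with the installing user's
# privileges the moment pip builds/installs this sdist.
print("arbitrary code running at install time")

setup(
    name="example-package",   # hypothetical name
    version="0.0.1",
    py_modules=[],
)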
"Lasse Collin," as other posters here have found, does not seem to exist as an experienced coder. Oddly, there is a Swedish jazz musician named Lasse Collin, which would otherwise be one of those names, especially the last name, that would stick out. Instead it is buried under a lot of mentions of a musician.
Lasse Collin the contributor is findable, especially if you add "tukaani" to the search. But not in any other context, unless that's what old jazz musicians do in their retirement.
I don't think that's what they meant. The idea is to find information about their personal life, not OSS contributions. Something that proves they're a real person.