Reflections on Distrusting xz (joeyh.name)



One thing that comes to mind is that “Jia Tan” might be more accurately seen as a “sleeper” of some sort: a foot soldier who infiltrates a juicy open source project and waits for further instructions; backdooring sshd might not have been part of the original plan.

Which raises the concerning question of how many more sleeper maintainers there are.


>Which raises the concerning question of how many more sleeper maintainers there are.

Given how easy the infiltration is and how extremely hard it is to detect, likely a lot.

For an intelligence operation it is also extremely cheap, you just need a few knowledgeable developers spending some time each week on the project. The upside being a backdoor into a significant portion of infrastructure, the downside being wasted time.

I do not think it is unlikely that in many important open source software projects there are one or two people assigned to keep an eye on things. They don't even need to be malicious; just being somewhat trusted contributors is enough. I would be extremely surprised if the NSA doesn't have a couple of guys keeping watch on the Linux kernel.


The irony is that this would be an oddly effective way of having paid open source devs who for the most part just honestly improve projects, with the massive downside that they undermine it at a critical moment.


Yeah, then maybe thanks to them we'll finally have The Year of the Air-Gapped Linux Desktop.

I dunno what to do if e.g. Debian gets compromised, as in, I can't trust the collective of maintainers.

I assume any Windows machine is backdoored. Trivially proven by forced auto updates.

Maybe air gapping some home computer for sensitive data might be a good idea.


The ideal situation is having multiple intelligence agencies all working on one project and spotting each others' backdoors, so at the end of the day we just have a really secure and well-maintained project.


> Given how easy the infiltration is and how extremely hard it is to detect, likely a lot.

I read it exactly the other way around: the infiltration took years and detecting it was the default with a fuzzer, which had to be disabled for the exploit to succeed.

It speaks to the value of hardening, and that hardening should be required more. And of course to the precarious roles of maintainers, which have been discussed elsewhere.


The exploit could be detected with a fuzzer perhaps. The infiltration seems like it was something that would be very easy for a funded intelligence agency to do, and would be nigh undetectable if they had been subtly introducing bugs rather than shipping a sophisticated backdoor to every Linux distro.

You have to assume if this was an intelligence agency they didn’t burn their only agent like this.


> they didn’t burn their only agent like this.

I think I'd characterize that as "their only identity." What's the chance that some number (>1) of actual agents were sharing this identity? If that's the case, I'd extrapolate that to multiple identities. In fact the social engineering to gain the trust of the original maintainer likely involved several identities.

I suspect that detectability of intentionally injected bugs would be very low.


You are right, it took quite some time. On the other hand, it looks like the legitimate part of contributing to xz was only a part-time job for the attacker. The rest of the time, they either worked on the exploits or on other things, like infiltrating other projects using a different handle.

Basically I can imagine the attackers being a well-organized group, using work sharing and pipelining. Some members of the group would be preparing exploits, some would infiltrate projects, and some would make sure not to get caught. And since infiltrating takes time, they would make sure to have multiple projects in the pipeline: some in the early contributor stage, some in the social pressure stage, and some in the exploiting stage.


>a fuzzer, which had to be disabled for the exploit to succeed.

According to this comment, the fuzzer wouldn't have detected it. It wasn't necessary to disable the fuzzer:

>https://news.ycombinator.com/item?id=39911249


At this point I wouldn’t be surprised if the NSA also has a couple of Microsoft employees on their payroll.


> Given how easy the infiltration is and how extremely hard it is to detect, likely a lot.

This particular case seems to be an example of the exact opposite. It took "Jia Tan" two years to conduct all of the social engineering necessary to get into a position to introduce the backdoor, then upon doing so, was caught almost immediately, with the initial discovery not even coming from a dedicated security researcher, but from a sysadmin who kept digging when he saw unusual performance issues.

And the threat actor here deliberately went after the weakest link in the chain. Assuming that the goal was to compromise sshd, the antagonist evidently found it too much of a challenge to attempt to infiltrate OpenSSH itself, so targeted a small-team compression project that only some deployments even link to, and still got caught extremely rapidly.


I don't think you are looking at it the right way. Two years matter if you are a hobbyist or your goal is to compromise the system for some individual gain.

For a nation state actor two years is nothing. Likely the entire attack didn't cost more than a couple of thousand hours of developer time. I would guess it was easily cheaper than $100k in financial terms; that is extremely efficient for an intelligence operation, with the upside being access to a large number of servers.

The method by which he got caught also depended largely upon random chance: some major "ifs" needed to happen. The performance reduction was an unfortunate side effect from the perspective of the attacker; really, a minor mistake exposed him. Even if he had been exposed a couple of months later, the damage would have been enormous: if that version had become a stable part of any major distro for the next few years, hundreds of thousands of machines would have been vulnerable.

There is absolutely no reason to assume that if another attack of this quality happens it will not find its way into some stable distro.


> For a nation state actor two years is nothing.

Perhaps it is, perhaps not. If they are playing a long game, two years might be a straightforward investment. But the point is that it still took two years either way -- the fact that something is only feasible for big institutional actors implies that it is not in itself "easy".

And the kinds of organizations that have the motivation and resources to engage in this kind of long-game infiltration by definition have the resources to influence, infiltrate, and manipulate in a variety of ways. It is absolutely not clear here whether having formal organizations involved in directing the development of xz would have made it easier or harder for whomever was behind this attack.

> The method by which he got caught also depended largely upon random chance,

Why do you presume it's random chance, and that it wouldn't be better represented by more complex probability models? On the surface, it seems like the chance of overall detection would be reasonably represented as an equivalent of an MTBF calculation, with the number of skilled users capable of detecting the vulnerability through normal use as an input. (Not to say that we have enough data to calculate this probability, or that this necessarily even happens enough to have a statistically significant sample -- it might all be "black swans" -- but there's certainly non-random stochastic inputs into this.)

> There is absolutely no reason to assume that if another attack of this quality happens it will not find its way into some stable distro.

There's no guarantee for any of this. All rules can be gamed, all incentives can be manipulated, all organizations can be compromised. There is no perfect solution to security, and no guarantee that any prescriptive solutions based on generalizing from the specifics of one incident would not create more opportunities for attackers rather than fewer.


I don't disagree, I am not even sure what exactly you are arguing against.

I completely agree with your comment about the detection being the result of iterative low-probability chances of detection. My point was that, even if that is the case, there was a major chance that it would have gone into many stable distros or would have been exploited.


> I don't disagree, I am not even sure what exactly you are arguing against.

I am arguing that this incident demonstrated the resilience of the FOSS model to even extremely strategic, long-term compromise attempts. Some antagonist invests years' worth of time, effort, and money to introduce this backdoor, and the openness of the entire process allowed a single engineer to unravel the whole thing almost immediately after the backdoor was introduced, and every distro immediately sprang into action and neutralized it entirely.

All of the people complaining about the vulnerability of small projects, the pseudonymity of contributors, the need for more institutional involvement, etc. are getting things exactly backwards.

Imagine if this were a closed-source project funded and managed by an opaque institution, and the attacker used a different set of social engineering tactics to backdoor the code from within the org. Suppose then that Andreas Freund noticed the exact same behavior and set out to investigate it. How far would he have gotten?

This incident validates the "many eyes" concept and is a point in favor of the FOSS model.


fwiw, characterizing Andres as a sysadmin isn't really the whole picture; he's a postgres developer that conducts benchmarking operations with some frequency (and he's quite good at what he does)... he's perhaps naturally a bit more sensitive to things like the cumulative effect of 500ms or so over a number of sshd invocations.


You're right -- I went back and changed "sysadmin" to "engineer". Either way, though, he was not a dedicated security researcher, and managed to unravel this entire thing upon noticing an anomaly in the course of his regular work.


I believe LLMs could be useful here as a component of a pre-commit hook.


I once got a (probably scam) offer for adding a cryptominer to a library that I maintained at that time. And a more serious offer to add trackers to a popular >1M installs app.

In both cases I obviously ignored it. But it made me aware of a nasty attack vector: someone who's thanklessly building a WordPress plugin, pip or npm package, or whatever software, dealing with issues, PRs, support, and maintenance, often for no pay, suddenly gets offered three figure sums to add a few lines of "affiliate stuff" or such. There are many places in the world, or people in situations where this amount of money really makes a compelling case.


“Given enough underfunded maintainers, all security is shallow.”[0]

0. https://en.wikipedia.org/wiki/Linus%27s_law


I wouldn't say "funding" is necessarily the problem.

Most maintainers do it because they like doing it. Their main limiting factor is time. I can drop a million dollars an hour into a maintainer's lap; that doesn't mean they can dedicate every waking moment to a project. They still have human needs that money can't buy like sleep, family obligations, and health concerns. And that's making the assumption that the maintainer uses that million/hr to quit their job.

No, the problem is a lack of trustworthy candidates for maintainership and a lack of time. There are components of a GNU userland that are now too complex for a single human to both maintain and enhance at the same time. We now need to target multiple distros (really, more than are necessary, strictly speaking) and ISAs. Most are written in systems programming languages like C that are more complex than the average software engineer in 2024 works with.

We need consolidation, simplification, maintainer redundancy, and a trust/governance framework for packages.


We need to utilise a specialised AI to scan through the code looking for bugs and security holes. Imagine if OpenAI donated server time to this.


I believe the problem of thankless maintenance is best solved with two things: the thanks (yes, we are all human and want recognition and appreciation from fellow humans)[0], and a stable employment (work for a good large business while open-sourcing what’s possible)[1].

If you do OSS for profit, then it can become a question of where is more money; but if you work a reliable job with insurance, relationships and other implications then the stakes may be a bit different.

Many of the biggest OSS projects today were started by people who had no money in mind whatsoever. Some had other jobs, others were students, etc. If we feel relatively secure, we are driven by our innate desire to tinker, create cool things and show it off.

[0] Undermined by LLMs that are used to gobble up your code and suggest it to others commercially and without attribution.

[1] Undermined by low employment protections (if you can expect to be fired at any time, you would be less loyal), and by LLMs (whatever you open-source now more directly benefits Microsoft or whatever).


> and a stable employment (work for a good large business while open-sourcing what’s possible)

Even if it was hypothetically possible to open-source basically everything that the team in which I work produces:

The software that I work on is very specialized software that is used by the company's employees and customers for specialized purposes. Imagine some nice LoB application that is actually somewhat comfortable to use. It basically does "what the users need" and is thus deeply ingrained in some parts of the company's workflows. The only use someone outside the industry might have for it is "cosplaying being employed in this industry".

A lot of software that is developed (in particular in companies that don't sell or rent software) is of this kind.

Thus: in my opinion, the open-source scene does not have any use for a huge amount of the software that is actually developed and actively used.


> The software that I work on is very specialized software that is used by the company's employees and customers for specialized purposes

Oh really? Welcome to the club. Our very specialized software for very specialized purposes used Django with a certain auth provider. So I refactored that into a standalone Django app that painlessly handles this specific OAuth provider, configurable via settings with sane defaults, and open-sourced it. (The refactoring was very beneficial to myself, that part of the project got instantly nicer to work with.)

Of course, it is small beans compared to something algorithmically hardcore (I was a junior myself back then), but it’s just an example.

Any software, no matter how specialized and bespoke, can be expressed as many self-contained isolated components that individually know nothing about that specialization and bespokeness. In fact, such factoring is generally a sign of good design: you may have heard of the loose coupling & high cohesion principle—once you follow it, open-sourcing a particular component is very straightforward.

Note, though, that if your contract has certain licensing provisions in certain countries you may not be allowed to unilaterally open-source anything during the full term of employment (even if it is unrelated to your dayjob). You may need to get approval first. However, many good tech companies are reasonable when it comes to open-sourcing non-core components.


3. More maintainers

Days of 100+ notifications aren't easy. Things will slip through


Agreed, but especially in light of recent events it’d be important to know who they are, and that’s not always easy.


This is what worries me more.

It's easy to point a finger at a specific Bad Guy® and shout "He did it!" It's much harder to face the reality that any maintainer of any open-source project can slowly burn out to a point where they become accomplices in an attack, or at least turn a blind eye.

The pool of open-source developers does not split cleanly into honest contributors and evil agents. The boundary is quite fluid -- more so in some circles than in others -- and there are always temptations to move from one side to the other and back again.


> suddenly gets offered three figure sums to add a few lines of "affiliate stuff" or such

Back of the envelope calculation: you're looking at 2 orders of magnitude more money from "affiliate stuff" than you would be from generous user donations


Well, yes. But it's also something you can do once. When (not if) it comes out, all credibility is lost.

Whereas donations, regardless of how puny, are recurring and potentially forever.


That's definitely true. And a lot of times it could be a random open source project that is under the radar and rarely thought about, e.g. The Great Suspender Chrome extension, which was sold to an unknown buyer who later turned it into malware: https://www.bleepingcomputer.com/news/security/the-great-sus...


It's why I actually always encourage app devs to charge for their apps, even open source ones. It creates an exchange of value, helps the author feel valued, and detracts from these attack vectors.


> There are many places in the world, or people in situations where this amount of money really makes a compelling case.

It’s especially easy to imagine that using the classic intelligence agency playbook: monitor high-impact maintainers and look for leverage before making the approach (“hey, saw your post about the divorce settlement and that $%#@ cleaning you out. My affiliate marketing pays in bitcoin…”) just as they’ve done for ages.


I believe this is a nation state actor and there is a fleet of 'Jia Tans' working on other OSS projects to backdoor operating systems.

And some have probably succeeded.


I wouldn't limit that to OSS projects. How many of them managed to get hired and are working for Microsoft, Apple, Google, Oracle or Amazon?

In some cases they don't even need to introduce backdoors themselves, but just review code and spot bugs that they don't correct or raise issues for, but instead communicate to the mothership. They could even work as a team, with one building the backdoor and the other approving the code.

Most companies have more thorough processes to avoid this, but that doesn't mean those processes are applied correctly every time, especially if more than one malicious engineer is involved.


I'm now imagining a department where every single worker is a spy for a different government and they play an endless game of "add exploit, close off the other people's exploits". And all of them think they are the 10x developer because, in their view, all the other people do is push shoddy code.


I worked on a project just like this once... a mobile phone network built in the Middle East before the Arab Spring. 10/10 would not repeat the experience.


Most of the time they don't need infiltrators. Governments can just pressure companies with export controls or warrantless surveillance to get backdoors into commercial systems. OSS projects require different methods because the more direct method would be discarded by the community and forked.


> Most of the time they don't need infiltrators. Governments can just pressure companies with export controls or warrantless surveillance to get backdoors into commercial systems.

Or they simply pay companies with a “support contract” in return for embedding spyware that sells out customers. Seen that first hand (private key exfiltration), resigned the same day.

Lots of comments saying we need to do something about the OSS supply chain but in my estimation the problem is much worse with closed source commercial software.


> I wouldn't limit that to OSS projects. How many of them managed to get hired and are working for Microsoft, Apple, Google, Oracle or Amazon?

They don't have to sneak into those companies tho, they just hand over something like a national security letter and do whatever they want while making it clear to the heads of the company that anyone who talks or pushes back will rot in gitmo. Why wouldn't there be at least an equivalent to Room 641A (https://en.wikipedia.org/wiki/Room_641A) in every major US corporation that deals with massive amounts of people's sensitive data and communication?


> Most companies have more thorough processes to avoid this, but that doesn't mean those processes are applied correctly every time, especially if more than one malicious engineer is involved.

I’d also bet that you could exploit the tiers at many companies: how many places have more robust review for the staff engineers but then assume that some lowly “ops monkey” will take care of the build environment, etc.? I’d hope that wouldn’t work at Google, Microsoft, etc. but have heard enough stories about disparities between which jobs are contracted out and which have the coveted FAANG benefits that I wouldn’t exactly be shocked if it turned out otherwise.


Deleted.


The thing is they only need one member sometimes, to observe what is in use.

Example scenario: "malicious engineer in say Microsoft, finds out that office365 is using xz internally and the library is pulled directly without code review. Same engineer or another member of same group would be that Jia Tan doing the necessary backdooring in xz to target office365. And bam all worlwide Office365 accounts would be backdoored."

I am not saying Office365 is using xz, I have no idea really, but this would be a possible scenario. I know MsTeams is using ffmpeg for example.

So I think having this discussion while only scoping Linux distributions is a big mistake. The xz project was particularly interesting as a target as it is distributed under the BSD Zero-Clause license, which is pretty much a public domain license. You don't have the attribution part of the BSD license, so there are probably myriads of proprietary software packages using it too without acknowledging it.


I believe the term "nation state actor" is a term that means "country" but with the bonus of connoting that the writer is an armchair infosec wizard. This speculation is not valuable without adding information, otherwise it's just McCarthyist bluster.


Nation state actor is a standard info sec term. Using it does not imply any kind of wizardry.

Edit: most threat actors do not have the patience or the motive to behave in this way. It is reasonable to suppose that this is a nation state actor.


There are organised crime networks which buy 0-days to run ransomware, and actively target companies to do it.

Why couldn't this be an attempt at finding or selling exploit access on the black market?

The problem here is no one is looking properly at the scope. It's more than trivial, so everyone is leaping to "nation-state" as though that's the only threat actor with motivation and patience.


Granted, other threat actors remain a possibility; there is no proof.

It looks like the sort of thing nation states would do and develop. It could be some other group hoping to make money as you say.

Whoever they are, they seem to have good opsec, over multiple years.


It's also a term that implies the competent, official hacking departments or espionage agencies of that country, rather than government-supported amateurs (e.g. an untrained policeman) or generic people from that country.


Wales is a country but it is not a nation-state and probably does not have its own APT.


I'll bite. Wales is a country only in a "traditional" sense, since it does not currently hold the type of sovereignty we require of "countries" in the usual sense. As Voltaire said, "This body which called itself and which still calls itself the Holy Roman Empire was in no way holy, nor Roman, nor an empire.".

Wales, when independent, could reasonably be described as a nation state, being roughly associated with the Welsh people and their culture, language, history etc. But most countries are not nation states! The USA, Russia, and China for instance are explicitly plurinational.

If you think "nation" and "state" are synonyms (along with "country") then it's redundant to use both. If you think that "state" alone might lead someone to think of Alabama or Minas Gerais then say "nation". If you think "nation" will make people think of Cherokee, then say "country" or "sovereign state".

"Nation state" has a specific and well-established meaning, misusing it is like misusing any other jargon and just comes across as a failed attempt to seem part of the ingroup and hence authoritative.

(yes, apparently I will die on this hill)


Are there any actual nation states? Even small countries, like Greece, contain other nationalities.


Greece is certainly a nation state, being primarily occupied by "Greeks" (referring to the nebulous concept of a national identity). It doesn't matter if some French or Turks live there.

A country that strictly limited residency by ethnic identity might be called an "ethnostate" and indeed it's hard to find a pure example of one of those.


At this point, considering the apparent ease with which a project that is used pretty much everywhere was taken over, that seems like a reasonable position.


I can walk out on the street and stab someone to death if I wanted to. This is surprisingly easy.

Just because something is relatively easy to pull off doesn't mean it happens a lot.

It's also not that easy to pull off because you need to have a project with relatively few eyes and a place to hide it. In this case: binary tests. But most projects don't have those.

There is no evidence for any of this, including that it's a nation-state actor. There's also a case to be made that it's NOT a nation-state actor as nation states use Linux and want a secure Linux. The NSA and such have somewhat conflicting interests here. We just don't know. It's likely we will never know.

All of this is starting to resemble the spy paranoia of the first world war. A few spies got caught and suddenly everyone was now a suspected German spy (including a general, if I recall correctly, who was detained for a while because he couldn't answer a question about baseball or some such).

I suspect that very soon people will start demanding maintainers put some of their blood in a Petri dish to be tested with a hot needle. Just in case.


> There's also a case to be made that it's NOT a nation-state actor as nation states use Linux and want a secure Linux. The NSA and such have somewhat conflicting interests here. We just don't know.

I agree that we do not know that it’s a nation-state but this point seems to work in the opposite direction: this attack was very carefully constructed so only someone with a particular key pair could exploit it. That’s reminiscent of what the NSA did with the Dual EC constants, and they were confident enough about that to push it into the FIPS requirements for federal IT.


Motive, opportunity, means - and consequences: it is primarily the absence of a motive, and secondarily the likelihood of consequences, that keeps the prevalence of street stabbings way lower than if means and opportunity were the only factors.

The argument against nation-states being involved has some problems: a state can avoid becoming victim to its own work, while its own restraint would not prevent developments elsewhere.


You're commenting under a link where commits to the xz-decoder are discussed. Some level of paranoia is warranted.

The binary files look like a sideshow in comparison. Maybe we're lucky the attacker was tempted to hide something in there.


> I believe this is a nation state actor

It is certainly possible, but we don't really have a good indication for that. This whole thing would be definitely doable by a single individual.


Doable, yes, but what seems to me like a strong indication is the duration (multiple years) and the effort to set up a quasi patch infrastructure for the backdoor, which I can't remember ever having seen in an amateur or ransomware hack.


My assumption is that this was a state-sponsored mass surveillance campaign of some kind, but God knows what exactly they were looking for.

I think if the backdoor had been discovered 2 or 3 months later, we could maybe understand better what they wanted to do. My speculation is that they wanted to build a massive botnet and then snoop on machines' processes and traffic looking for something. It's hard to speculate because luckily they were caught soon enough.


I find it intriguing that out of all the speculative comment threads I've read so far, none of them have suggested it was Microsoft attempting to make FOSS look bad/vulnerable.


How would that benefit Microsoft, who owns GitHub, the home of OSS? It's not a secret that oss is vulnerable, the opportunity for MS is to sell the solution to a captive audience.


Microsoft making the decision to own GitHub in the first place also speaks to my suspicion. Embrace, Extend, Extinguish.


I've never been concerned about spies infiltrating open source projects compared to legitimate maintainers being hacked, even now after this whole xz incident.

I'll put it this way. Let's say a bad guy had a decent budget to spend on paying agents/criminals to break into maintainers' homes on their behalf with a rubber ducky, etc. I'd expect a pretty high success rate compromising their hardware...


You’re ignoring scale.

A single Jia Tan can be infiltrating 10s or more OSS projects each week without needing to travel around the world physically stealing hardware from various maintainers whom they then need to impersonate.

They can just impersonate some anons with no real lives or connections and just get the keys to OSS projects given time.


> A single Jia Tan can be infiltrating 10s or more OSS projects each week

Single? 10s or more per week?! I can't help but think you are underestimating the cost of developer time. How many hours of work did it take JT to infiltrate to the point of finally implementing a backdoor? How much does that time cost?

> just get the keys to OSS projects given time.

This is not what JT did though, and for good reason. Trust of anons in open source is generally built through contributions of real developer work over time. That does not scale.

> without needing to travel around the world physically stealing hardware from various maintainers

I wasn't suggesting stealing hardware to impersonate someone. I'm talking about hiring petty criminals or using field agents to break into a house and use physical hardware access to install a backdoor, etc. into the legit maintainer's hardware. The field guy's goal is not to get caught, so the maintainer is unaware they are compromised.

I suppose the limitation with both approaches (maintainer plant vs compromising maintainers) is cost. My educated guess is that the cost of hiring skilled developers from a very limited pool for multiple years is more than it would cost to hire criminals that are already breaking into houses for low risk jobs where they don't even need to steal anything.


When you find one cockroach, you can be sure there are thousands more you haven’t found.


We could all be Jia Tan.

Someone could be bought, killed and replaced, or simply shadowed when they die or go to jail. Anonymity makes this even easier.


> killed [...] die or go to jail

All of my commits are signed with a PGP key that is on hardware security tokens and password-protected. In the event of my death, my digital identity could not be stolen without backdoors in my hardware security tokens.

That being said, $5 wrenches and large sums of money are still possible attack vectors.


Also don’t forget that not everyone expects perfection and a canny attacker can exploit that. It’s really easy to focus on how you’d avoid trojans, keyloggers, etc. but I’d also ask how likely it is that if someone sent a message from your email address claiming you’d lost your token in a minor accident, etc. that they’d believe it - or simply accept it if commits started showing up with a new key (maybe with an upgraded crypto system) since 99% of Git users never check those.


A cool tax-free, no-questions-asked $500k can convince a lot of people


One thing I’ve learned, not from direct experience but from observation. These things are way cheaper than the more ethical and optimistic of us in society think. Your point is totally valid but the number is probably more like $5k-10k.


Tax free $500k? I don't want the IRS to come after me. Please mark all your bribes as regular income thanks


Everyone working on important open source code should have a real identity associated with them. The fact that "Jia Tan" was able to become a maintainer without anyone ever trying to figure out their real identity shows a huge weakness in our trust model in OSS (everyone real would have something like a LinkedIn page, Facebook, Twitter, Instagram, or better, their own website with stuff that could be used to ensure they're a real person - that could be faked as well, but the amount of effort would be high, and checking this would be much better than just allowing effectively anonymous users to be maintainers - there's just no need for anonymity in this scenario!).


> everyone real would have something like a LinkedIn page, Facebook, Twitter, Instagram, or better, their own website with stuff that could be used to ensure they're a real person

Oof, I guess I’m not real then, as I have none of those things.


On top of what you mentioned, I also dislike the TSA-like response the OSS community is taking to this happenstance.

I have anonymously contributed to many projects because I enjoy my privacy. All of the projects I've founded have also been done anonymously.

Just because someone wants their anonymity and privacy does not mean they're nefarious, and I find it funny that the group that holds these principles most dearly is the one now turning on those ideas.


Personally, I do find it hard to trust an open source project maintained by an anonymous person. (I'm talking about maintainership, not regular contributions that need to be code reviewed by another maintainer.) I may toy around with them, but I will probably not use them in a manner where I need to trust them continuously.

It's totally cool for you to do whatever you want, since it's a free world after all, but if you want other people to use your code, then it's a two-way street, no? Your code has a direct effect on their computers, and so they are placing their trust in you. You may value your privacy, but you need to balance that with other people valuing their own security, and it's likely that whatever project you maintain may have an alternative as well.

If you just want to commit some code and not have people use them then that's another issue altogether.

I guess what I'm saying is: it's a two-way street. You can do things anonymously, but big companies / projects also don't have an obligation to use your code.


> You can do things anonymously, but big companies / projects also don't have an obligation to use your code.

You're not wrong here, but I'm not forcing anyone to use my code bases or contributions.

Also, think about how many systems you blindly trust on a daily basis.

When you drive over a bridge, did you research the maintenance procedures and check that compliance was up to date?

When you got a house or apartment, did you look into the engineering sign-offs and construction companies? And that maintenance has been done up to snuff? Even down to hoping the inspector knows what they're doing?

When you step into an elevator do you check the recent inspection plaque?

When you get on an airplane, are you aware of its maintenance history? And to further my point by referring back to the house example, did the company even QA the plane before they shipped it?

And most importantly, did you check into whether the people actually did these things versus just saying they did them?

What kind of trust does having a person's name attached to the project actually provide? I would argue it's a pseudo-facade trust basis that gives a false sense of security.

The truth of the matter is that you blindly trust millions of things on a daily basis, including the very system you type from, which I guarantee you has more than one anonymous maintainer attached to its underlying software.

I totally get where you're coming from, but the same problems exist in every industry, supply chain, and political system; every facet of your life is based on many blind-trust principles.

The one difference here with anonymous open source contributors is that they give you the code to read through yourself (and hope that you help ;) )

Very much unlike the proprietary software you're running beside it.


I have a Facebook account, and what's on it is no one's -ing business in a professional context.


I wouldn't post anything on Facebook (or on social media generally) that could be professionally embarrassing but I also don't generally accept invites from people who are solely professional acquaintances or use it in a purely professional context at all.


> I wouldn't post anything on Facebook (or on social media generally) that could be professionally embarrassing

The age of self censorship :)

I don't post anything on my FB. I'd still reject any employer who wanted to take a look.


Self-censorship is probably a good thing in many cases. And there are certainly things I don't care to share in writing on any public or semi-public medium.

But I agree that even if I can't keep an employer from sleuthing generally, I don't consider Facebook part of my professional record even if there's nothing on there I'd have a problem with a co-worker or potential co-worker seeing.


>> what's on it is no one's -ing business in a professional context.

> The age of self censorship :)

I find this amusing.

I would also like to know what you hoped to achieve by self censoring the word fucking in your message above.

- HN doesn't block posts with any kind of "profanity" filter

- You didn't spare anyone from the profanity, since we all knew exactly what you were saying/thinking

So I'm really curious what that actually achieved.


That one amuses me because of a character in a (iirc) fantasy book that swore all the time but used just -ing everywhere. Sadly I don’t remember what character of which book…


It's Mr Tulip, of Terry Pratchett's The Truth.


Thank you! I was pretty sure it was Pratchett (but not which book and character) but I self-censored in case I was wrong :)


Also this can all be faked


No-one working on open source code on their own time owes anything to anyone using the code. If you want an important open source project to be maintained by a non-anonymous person, surprise-surprise, hire that person and pay them.

Besides, some of the best open source contributors I know are almost-anonymous people behind nicknames and anime girls avatars.


At the same time no one is obligated to use your source code. I think the point here is from now on people (companies and large projects) may be more paranoid about anonymous contributors and refuse to sign off on using code maintained exclusively by them. It's fine for people to stay anonymous, but they just run the risk of not having the credibility for adoption and need to accept that.

But yes, I do trust certain figures like that, e.g. Asahi Lina. It's a fine ambiguous line. But at least in Asahi Linux there are real known human figures and they know who Asahi Lina is.


It is not about owing someone... it's about having provenance of code.

If you're just an anonymous guy doing stuff for free and want to remain anonymous, that's fine, but then your software shouldn't be used by anyone who cares about toolchain attacks as there's just no way to trust you, and no way to verify every single commit you make on new releases.

For software that gets used by many, which is a goal of OSS (otherwise just don't even bother to publish stuff, what's the point?), there needs to be a face behind it.

I do agree with others that identity is a hard problem, but people here are pretending there's no solution to that (or misinterpreting what I wrote to mean people should have a Facebook or Twitter account, which is absolutely not what I was trying to say - I just mentioned the most popular websites real people are likely to be found on, as those could be used to prove their identity... for example, I have a Keybase account where my proof of identity, which is tied to my public keys, can be found on my GitHub profile - but they let you choose Facebook or Twitter for that purpose as well) when obviously there is. I should know, I work in this space.


> If you're just an anonymous guy doing stuff for free and want to remain anonymous, that's fine, but then your software shouldn't be used by anyone who cares about toolchain attacks as there's just no way to trust you, and no way to verify every single commit you make on new releases.

What is, from security point of view, the difference between a toolchain attack performed by an anonymous contributor and by an identifiable real person?

> For software that gets used by many, which is a goal of OSS

It's not. The goal of OSS is to give users the possibility to study, change and improve the software. And that includes giving you ability to independently audit the code. All of that does not need any person behind it.


> everyone real would have something like a LinkedIn page, Facebook, Twitter, Instagram, or better, their own website with stuff that could be used to ensure they're a real person

Have you seen the campaigns people have run building fake LinkedIn profiles and slowly adding "connections"? There was one a few years ago which roped in a lot of infosec people who should have known better, and it's gotten much worse with AI generators. Even before LLMs, what you described would have been a godsend for intelligence agencies - who has more time for it, an open source developer writing actual code or the dedicated social media team at the IRA? – and now that's increasingly worse.


I believe the solution to identity on the Internet needs to be tied to governments, that's unfortunate but in real life, that's always been the case and I can see no alternative here. Blockchain is a pipedream and no serious work is going to associate a person's identity to a key which cannot be revoked, cannot be recovered in case of "loss", can be tracked on a public ledger etc. etc...

But there's actual good work going on in the identity industry, like Verifiable Credentials, so this will become a reality soon: you will be able to verify someone's identity as long as you trust the issuer of their "credential" (which in the case here would mean basically a username and a public key, or a reference to a JWKS which can be used to verify the signature of the person, very much like the digital version of an identity card which can be used to check the signature on some piece of paper, but actually cryptographically safe)... so you would need to add a few governments to your list of "approved issuers", or something more indirect like universities (which themselves would rely on the government-issued identity) or traffic authorities (if you rely on driving licenses). Sure, governments can lie, and people go to great lengths to steal others' identities in real life, but in the current world we're still able to get bank accounts, passports etc. based on this model... just because the system is not perfect doesn't mean it's not good enough, especially when there's no better alternative at all.


What is a real identity? Anything online can be faked. A state-issued ID? How does that protect against a nation state?


Nothing protects you if you're up against a state. That doesn't mean we should give up completely.

Do you have a passport? That's a real identity in most places. Soon, it may be possible to use that to link your identity to a set of public keys which you can then use to identify yourself.

There's a lot of work to be done to make this a reality, but work is surely going on right now and this is going to be possible one day.

Check this out, as a starting point: https://curity.io/resources/learn/verifiable-credentials/


Why would I trust an "important open source project" with my identity?

It goes both ways.

Besides, the 'state actor' the security theater people keep mentioning would have no trouble creating such real identities.


If you don't trust the project, you wouldn't contribute to it.

The state actor may be able to fake identities, but that would still allow tracking the identity to a particular state... and if caught multiple times, that state would start losing credibility, and projects may choose to stop trusting people of that nationality, unfortunately, or at least require stronger evidence that the person is real and trustworthy if they come from known rogue nations.


> If you don't trust the project, you wouldn't contribute to it.

Trust them to merge a bugfix is different from trusting them with my identity isn't it?

There are degrees of trust. For example I have a gmail address in my profile because the spam filter on there is better than what I have on my personal domain. People I've known for longer, business or otherwise, get the other (that I read more often).


News just in: NSA et al. defeated after having to create a LinkedIn and Instagram profile for their agents.


If you want security, pay for independent code audits (not compliance bullshit). Repeatedly. Don't offload your desires onto one-man-shows that the world decided are useful tools.


None of those things prove identity. A well funded or just patient attacker can spoof all of those. Sure, it raises the bar a tiny bit, but it's no proof of identity.


Yeah, it's not enforced (and certainly not with LinkedIn and Facebook), but it's really not uncommon to require the use of real names for contributions.

Linux doesn't allow anonymous contributions:

https://www.kernel.org/doc/html/latest/process/submitting-pa...

and this guide has been adopted by a lot of GPL-licensed projects (at least openwrt, glibc and gcc).
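
For context, the mechanism behind that rule is the Signed-off-by trailer that every kernel patch carries (git adds it with "git commit -s"), certifying the Developer's Certificate of Origin under the submitter's name; a placeholder example (name and address are made up):

  Signed-off-by: Jane Developer <jane.developer@example.org>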


Weren't there some controversies around this before? I remember there was some talk of why Asahi Lina (anonymous VTuber working on Asahi Linux) can contribute code to Linux. From a casual search: https://www.spinics.net/lists/kernel/msg4888830.html

FWIW I like Asahi Lina, just trying to understand the discrepencies


Interesting. My understanding is that these projects don't allow anonymous contributions to make their copyright situation clear, so in theory if marcan42 sent a letter to the linux project saying that contributions from Asahi Lina are actually theirs, they might reasonably be fine with that.

It seems like this is how the ASF runs: you can be anonymous publicly, but you have to sign their CLA (or whatever they call it) properly.

To me, the people trying to unmask Asahi Lina are being simultaneously mean and silly. If it's so obvious that it's marcan42 doing a voice, do you really need to point it out? That's kind of the joke.


I'm not sure why the downvotes. That seems to be a statement of fact.

You can do a certain amount of identity obfuscation online but for anyone with a real professional profile you're generally not really anonymous if anyone really cares to find out your true name.


Me neither, I even provided a source, and it's easy to find other examples. There certainly are projects that allow anonymous contributions, but I doubt it's the majority of projects that one would consider important.

For these kinds of projects you could make up an identity relatively easily and nobody would know, but you're screwing over the project (as they may need to remove your contributions if they find out), so it's not something to be doing if you actually want to contribute (instead of inserting backdoors).

The original idea (not being able to contribute without a verified identity) is still wrong, but it's wrong because it's impractical to prove identity in a way that people find acceptable (and works), not because people will not give up anonymity, as many of the replies state.


There are people who downvote things that present facts that aren't in accordance with how they think the world should be.

I do think it's difficult to verify identity in any reasonably acceptable lightweight way. That said, for the larger projects I'm most familiar with, a lot of people work for companies, attend conferences, etc. They may go by nicknames day to day, but they have known real identities and their professional existence wouldn't be possible without one.


Who are you to propose requirements onto people who work for free?


I am not imposing anything on anyone. I am only saying that an OSS project that aims to be used as part of important infrastructure should impose at least some sort of identity vetting and not just make random anonymous users maintainers of anything.

If your project is not important and you don't care about any of this security stuff, feel free to continue publishing your untrustable projects.


I took a look at the diff linked in the article with code that "we are all running". The top of the diff certainly looks interesting. They remove the bounds check in dict_put() and add a safe version dict_put_safe().

This kind of change is difficult to make without mistakes because it silently changes the assumptions made when code calling dict_put() was originally written. ALL call sites would need to be audited to ensure they are not overflowing the dictionary size.

The diff I am referring to is here:

https://git.tukaani.org/?p=xz.git;a=commitdiff;h=de5c5e41764...
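
To make the risk concrete, here is a minimal sketch of the pattern being described (hypothetical names and struct layout, reconstructed from the description above rather than taken from the actual liblzma source):

  #include <stdint.h>
  #include <stddef.h>
  #include <stdbool.h>

  typedef struct {
      uint8_t *buf;   /* dictionary buffer */
      size_t pos;     /* next write position */
      size_t limit;   /* writes must stay below this */
  } dict;

  /* Before (sketch): the bounds check lives inside dict_put,
     so no caller can overflow the buffer. */
  static bool dict_put_old(dict *d, uint8_t b) {
      if (d->pos >= d->limit)
          return false;          /* dictionary full */
      d->buf[d->pos++] = b;
      return true;
  }

  /* After (sketch): the check is gone, so every call site now
     silently carries the obligation to prove pos < limit. */
  static void dict_put(dict *d, uint8_t b) {
      d->buf[d->pos++] = b;
  }

Any single call site that miscalculates the remaining space then becomes a silent out-of-bounds write, which is why all callers would need to be re-audited.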


Also because the 'safe' version only checks

  dict->pos == dict->limit
and not

  dict->pos >= dict->limit
if you can get one call of dict_put somewhere to go past the limit, all later calls of dict_put_safe will happily overwrite memory and not actually be safe.
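
A simplified sketch of that failure mode, reusing the hypothetical dict struct from the sketch above (whether the real code can actually get pos past limit is a separate question): an equality guard fails open once pos has already skipped past limit, while a >= guard still stops any overshoot.

  /* Guard as described above: only catches an exact hit. */
  static void dict_put_safe_eq(dict *d, uint8_t b) {
      if (d->pos == d->limit)
          return;                 /* pos > limit slips through */
      d->buf[d->pos++] = b;
  }

  /* A >= guard would still stop any overshoot. */
  static void dict_put_safe_ge(dict *d, uint8_t b) {
      if (d->pos >= d->limit)
          return;
      d->buf[d->pos++] = b;
  }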


No, because dict_put will update the limit value if the new pos exceeds it.


I don't see anything like what you are describing. What line exactly are you talking about?


Wow, that is 1000% obviously malicious


Agree, nice catch. Also, there are many other opportunities in this patch to hide memory safety bugs.

This is the kind of optimization I might have done in C 10 years ago. But coming back from Rust, I wouldn't consider it any more. Rust, despite its focus on performance, will simply not allow it (without major acrobatics). And you can usually find a way to make the compiler optimize it out of the critical path.


I agree, this looks extremely sketchy. Especially because the code is just writing a fully controlled byte in the buffer and incrementing its index.

This would give you a controlled relative write primitive if you can repeatedly call this function in a loop and go OOB.


I think at this point it is clear that everybody has to assume that xz is completely rotten and can no longer be trusted. Is xz easy to replace with some other compression tool? Or has it been so widely adopted that it is going to take a huge effort to move off it?


There is no reason to assume that. Even if you assume every commit since Jia became a maintainer is malicious, the version from 3 years ago is perfectly fine.

Zstd has a number of benefits over Xz that may warrant its use as a replacement of the latter, and this will likely be a motivating factor to do so. But calling it entirely rotten is going way too far IMO


There is an interesting argument to be made that pre-JT xz code is probably pretty secure due to the fact that the threat actors would have already audited the code for existing exploits prior to exerting effort to subvert it.


I always use "zstd --long=31 -T0 -19" to compress disk images, since that is a usecase where it generally offers vastly superior compression to xz, deduplicating across bigger distances.

XZ offers slightly better compression on average, but decompression is far slower than Zstd.


IIRC memory consumption is generally worse for Zstd at comparable levels of compression. Which, these days, is generally fine, but my point is you can't thoughtlessly substitute the two.


What keeps ringing in my head is the "." that was found that invalidates compilation. I personally don't buy it (but that is my opinion).


What do you mean "don't buy it"?


My bad. I thought that the person who made that commit was someone other than JT. Can't delete the comment nor self-downvote it.


Huge effort, because it is the default .deb compressor in Debian for example


Arch Linux already replaced it with zstd back in 2020. It's doable for the next major release of Debian.


Certainly, but we need an xz decompressor to read the current debian repo versions for the next decades, when they are oldstable or archived.


Decoding is easy.


This is either 100% malicious or a novice coder. And we surely know it's not the latter.

If you need an unsafe call, you add a dict_put_unsafe(). That again should of course be rejected in a code review.


I think Joey's right that we should all go back to the "pre-Jia-Tan" xz, and I've raised this with Red Hat too. It's actually not a big deal as xz and liblzma is relatively stable and the version from 2 years ago is fine, although I understand that Debian's dpkg uses some new API(s) from liblzma which makes this a problem albeit a minor one.

(Unfortunately the Debian bug report that Joey filed got derailed with a lot of useless comments early on.)


How do you know what 'pre' means, given that pseudonymous identities are free and Tan is already suspected of having some (e.g. Hans Jansen and Jigar Kumar: https://research.swtch.com/xz-timeline)?


I mean we go back before all possible sockpuppets. We do have a reasonably good idea of when the attempt started.


The point that the person you replied to is trying to make is how do you know when the repo is clean? How can you ever be sure that someone hasn't introduced a backdoor at some point? It's bigger than just what has been discovered.


How do you know anything has not been compromised? You go and look at the commits and the code. It's hard work with no easy answers despite what many think.


By not letting the perfect be the enemy of the good-enough-for-now


The greater concern should be how many other sleeper contributors are out there. Anonymous contributions are accepted every day, and we know of cases with malicious intent, such as by "James Bond" (https://lore.kernel.org/lkml/20200809221453.10235-1-jameslou...).

I am not specifically worried about other contributions by "Jia Tan", those are being extensively looked at right now. They and other sleepers may just as well have contributed to any project with a different name and therefore "Jia Tan" does not pose more danger than any other contribution whose submitter cannot be held responsible.


What's malicious about that patch? From reading the thread it looks like an attempt to fix a FP from some tooling.


This is one of the patches for which the University of Minnesota was banned from contributing to the Linux kernel. They were trying to introduce a use-after-free (Fig. 9 in their paper).

https://news.ycombinator.com/item?id=26887670


I just had to think about how ironic it would be if "Jia Tan" turned out to be a Post-Doc from the University of Minnesota continuing that research on hypocrite commits.


Consider that “Jia Tan” started working on xz because they already found a critical vulnerability and wanted to maintain it, or, more tinfoil, they burned xz to get upstreams to use another compression library that is also already backdoored. When dealing with state actors there’s really no limit to how complex the situation can be.


This is something I also wondered about but haven't seen discussed anywhere. This could all be a smokescreen to get distros to switch to the next best compression library, which already contains malicious code. Hopefully maintainers of any upstream compression libraries are all looking hard at their code bases right now.


Seems like a sensible thing to do, assuming this is a state-level threat actor there’s really no easy way to prove that their contributions are free of back doors. Seems not worthwhile risking the security of a large part of the Internet over a few thousand lines of code.


But why would the entire operation behind this submit all their attacks through the same single identity? Removing all this code could just be removing 1% of their harmful code. How do you deal with the rest? How do you discover the other identities?


You start with what you know about, and you investigate other projects carefully at the same time. There's no easy answer here, you do what you can.


Full on tinfoil hat here. But warranted and practical.

I'm wondering what fallout we'll see from this backdoor in the coming weeks, months or years. Was the backdoor used on obscure build servers or obscure pieces of build infrastructure somewhere? Lying dormant for a moment in the future to start injecting code into built packages, maybe? Are distros going to go full-on tinfoil-hat and lock down their distribution, halting progress for a long time? Are software developers (finally?) going to remove dependencies (now proven to be liabilities!), causing months of refactoring and rewriting, without any other progress?


>Are software developers (finally?) going to remove dependencies (now proven to be liabilities!), causing months of refactoring and rewriting, without any other progress?

How is that even a possibility? xz was very useful software, which can only exist if people with significant knowledge put effort into it. Not every OSS project has the ability or resources to duplicate that. The same goes for many, many other dependencies.

I believe that there is essentially nothing you can do to prevent these attacks with the current software creation model. The problem here is that it is relatively simple for a committed actor to make significant contributions to a publicly developed project, but this is also the greatest asset of that development model. It is extremely hard to judge the motivation of such an individual; for most benign contributors it is interest in the project, which they project onto their co-contributors.


Agreed. More than that, there's not much of a way to prevent these kinds of attacks, period, whether in software or otherwise, if the perpetrator is some intelligence agency or such.

For threats lesser than a black op, the standard way of mitigating supply chain attacks in the civilized world is through contracts, courts, and law enforcement. I could, in theory, get a job at a local food manufacturer, and over the course of a year or two, reach the point where I could start adding poison to the products. But you're relatively confident that this won't happen, because should it ever happen, the manufacturer will be smeared and sued and they'll be very quick to find me and hand me over to the police. That's how it works for pretty much everything; that's how trust is established at scale.

Two key components of that: having responsibility over quality/fitness for use of your product, and being able to pass the blame up to your suppliers, should you be the victim too. In other words: warranty and being an easily identifiable legal entity. Exactly the two components that Open Source development does away with. Software made by a random mix of potentially pseudoanonymous people, offered with zero warranties. This is, of course, also the reason OSS is so successful. Rapid, unstructured evolution. Can't have one without the other.

Or in short: the only way I see to properly mitigate these kinds of threats is to ditch OSS and make all software commercial again (and legally force vendors to stop with the "no warranty" clause in licensing). That doesn't seem like a worthwhile trade-off to me, though.


>the only way I see to properly mitigate these kinds of threats is to ditch OSS and make all software commercial again (and legally force vendors to stop with the "no warranty" clause in licensing).

Which just pushes the problem to commercial companies getting a 'friendly' national security letter they can't talk about to anyone, stating they should add REDACTED to the library they provide.


Correct. Hence the disclaimer in my first paragraph, which could also be stated as the threat duality principle, per James Mickens[0]:

"Basically, you’re either dealing with Mossad or not-Mossad. If your adversary is not-Mossad, then you’ll probably be fine if you pick a good password and don’t respond to emails from ChEaPestPAiNPi11s@virus-basket.biz.ru. If your adversary is the Mossad, YOU’RE GONNA DIE AND THERE’S NOTHING THAT YOU CAN DO ABOUT IT."

--

[0] - https://www.usenix.org/system/files/1401_08-12_mickens.pdf


Code is law. As such, the "standard way" you mention is appropriate for people with zero strategic foresight. There is no absolute need to depend on third parties to solve your problems and the possibility to limit and disperse trust to mostly yourself is real. Sure, glowies can always get to you but they can't get to you everywhere nor all the time. Security/Assurance models and both proprietary and free software architecture are already adapting to such facts.


Not all dependencies are of "xz" complexity.

Minimizing dependencies probably means keeping a few libs for things like crypto or compression and such.

But do you need a library to color console output? Even if colored console output is a business-critical feature, you don't need a (tree of) dependencies for that. I see so much rather trivial software that comes with hundreds or thousands of dependencies; it's mind-boggling, really. Why have 124 million people downloaded a rubygem that loads and parses a .env file, something I do in a bash one-liner? Why do 21k public npm packages depend on a library that does "rm -f"?
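
For illustration, a minimal sketch of the kind of one-liner I mean (assuming a plain KEY=value .env file with shell-compatible quoting, no library involved):

    set -a; . ./.env; set +a   # auto-export every variable the file assigns, then turn auto-export off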

The answer, I'm afraid, is mostly that people don't realize this isn't just some value added to their project but rather a liability.

Some liabilities are certainly worth it. XZ is probably one of them. But a library that does "rm -f" certainly isn't.


It's impossible to insure against in any practical terms.

The way forward is to invest heavily in a much more security-oriented kernel(s) and make sure that each program has the bare minimum to achieve what it offers as a value-add.

The human aspect of vetting seems like an impossibly difficult game of whack-a-mole. Though realistically I doubt that the bad actors have infinite agents everywhere, this also has to be said. So maybe a "sweep" could eliminate 90% of them, though I'd be skeptical.


Agreed, as a developer: minimize your dependencies while providing your core function. Don't grant dependencies permissions they don't need. Be granular about it. Austral lets you select what filesystem, network, etc. access each library gets.

Also, in big organizations, risk assessment is more about making sure there is someone to point the finger at, than actual security. Treating libfubar as golden because it ships with something you paid another company money for makes sense in that light. But not from an actual security mindset.


"reduce the attack surface" is Security 101. Noting again that sshd doesn't natively use xz/liblzma (just libz) or systemd, so I don't think I need to point out where the billowing attack surface is ;)

Apache (by way of mod_systemd) is similarly afflicted, as is rsyslogd; I guess most contemporary daemons that need systemd to play fair are (try: "fuser -v /lib64/liblzma.so.?" and maybe "ldd /lib64/libsystemd.so.?" too).

Like a Luddite I still use Slackware and prefer to avoid creeping dependencies, ever since libkrb5 started getting its tentacles into things more than a decade ago.


Yeah, it's almost like "do one thing and do it well" had security benefits...

SELinux has the desired sort of granular permissions at the OS level, but if everything is dynamically linked to everything else, that doesn't help, as the tiniest lib is now part of every process and can hence pick and choose permissions.

But even if we go full monolith OS, where systemd takes over the job of the kernel and the browser, that just changes where we need those permissions implemented. We can't practice zero trust when there is no mechanism for distrust in the system.


> Agreed, as a developer: minimize your dependencies while providing your core function. Don't grant dependencies permissions they don't need. Be granular about it. Austral lets you select what filesystem, network, etc. access each library gets.

Still wouldn't help for this particular exploit.


If systemd could deny liblzma any syscall or filesystem access, that would have prevented it. It is only used to compress a data stream; it only needs read access to one buffer and write access to another. I realize there is no current mechanism for these granular permissions; that is what I was proposing be addressed.


We don't have the way to apply any restrictions on a per-library basis. This is generally quite difficult to do.


I know. That's what's missing from our current technology. I am honestly tired of everyone collectively pretending this is not a problem. Periodically we get very grim reminders that it's in fact a problem, everyone pretends to care for a month then it's all back to where it was.

It's depressing. (And no, this comment does not imply you are such. I am responding + ranting.)


I suspect most of your frustration comes from reading "we don't think this is a good place to spend our effort" as "there are no problems here".


Likely. Though people not seeing the problem is quite frustrating by itself.


In a way it would.

If a software project has hundreds of dependencies, finding the one that was compromised is hard, impossible even. But if it has three dependencies (that aid in the core functionality), keeping a keen eye on them is much easier.

When I look at a typical `node_modules` or `pipenv` directory, I see there's absolutely no way I can vet that all is safe in there. When I look at my typical cargo tree, it's doable to just go over the four or five dependencies (of dependencies) every so often.

Automation helps. But it doesn't give me the same confidence that just opening the project pages of the stuff I use, once every few months, does.


Since I didn't keep as current as I wanted to be (work and life happen a lot lately), what could have prevented it?


> The way forward is to invest heavily in a much more security-oriented kernel(s)

While I don't disagree that kernels should be secure, I also don't see how that would have helped in this case, given that (AFAICT) this attack didn't rely on any kernel vulnerabilities..


True, I wasn't specific enough. The attack exploited that nobody thinks security is a serious enough problem. It's a failure of us (the technical community) as a whole, and a very shameful one at that.


IIRC Debian has wiped and is rebuilding all their build hosts, so yes.

But while I understand what you mean, I would not call improving the security of a piece of software “halting progress”. Security improvements are progress, too. Plus, revisiting processes and assumptions can also give opportunities to improve efficiency elsewhere. Maintenance can be an opportunity if you approach it properly.


What I meant with "halting progress" is what commonly happens when a piece of software is "rewritten from scratch". Users (or clients or customers) see no improvements for years or weeks, while the business is burning money like mad.

The main reason why I am firmly opposed to "rewrite from scratch" or "we'll need some weeks to refactor¹".

Removing upstream dependencies and replacing them with other deps, with no-code, or with self-written code² is a task that takes a long time, during which stakeholders see no value added other than "we reduced the risk"; in the case of e.g. SaaS that's not even a risk these stakeholders are exposed to, so they then see "nothing improving". I'm certain a lot of managers, CTOs and developers suddenly realize that, wow, dependencies really are a liability.

¹ I am not against refactoring, just very much against refactoring as standalone, large task. Sometimes it's unavoidable because of poor/hard choices made in the past, but it's always a bad option. The good option would be "refactor on touch" - refactoring as part of our daily jobs of writing software.

² Too often do I see dependencies that are ridiculously simple, left-pad being the poster child. Or dependencies that bring everything and the kitchen sink, but all we use is this one tiny bit that would've cost us less than 100 lines to write. Or dependencies that solve -- nothing, really? Just that no one took the time to go through it and remove it. And so forth and so on.


Not fallout, but increased vigilance is the expected most significant outcome. Some 5+ years ago I listened to a Debian guy's talk about reproducible builds and security; he was urging the audience, in a very detailed manner, to be aware of exactly the kind of thing that just happened. One of the details he mentioned was glowies having moved their focal point to individual developers and their tooling & build systems. At least some people who matter have been working on these threats for many years already; maybe more people will start to listen to them, in which case this entire debacle could have a net positive effect in the long run.


> Was the backdoor used on obscure build servers or obscure pieces of build infrastructure somewhere?

And developer machines. The backdoor was live for ~1 month on testing releases of Debian and Fedora, which are likely to be used by developers. Their computers can be scraped for passwords, access keys and API credentials for the next attack.


> Are software developers (finally?) going to remove dependencies (now proven to be liabilities!), causing months of refactoring and rewriting, without any other progress?

We've been here before, with e.g. event-stream and colors on npm. So I don't think it will change much. Except maybe people will stop blaming it on JS devs being script kiddies in their mind, when they realise that even the traditional world of C codebases and distro packages is not immune.


You can't really remove dependencies in open source. It is so intertwined at this point that doing it would be too expensive for most companies.

I think the solution is to containerize, containerize and then containerize some more, and make it all with zero trust in mind.


Containerizing is entirely the worst response here. Containers, as deployed in the real world, are basically massive binary blobs of completely uncertain origin, usually hard to reproduce, that easily permit the addition of unaudited invisible changes.

(Yes yes, I know there are some systems which try to mitigate this, but I say as deployed in the real world.)


Your application is already most likely a big binary blob of uncertain origin that's hard to reproduce. Containers allow these big binary blobs of uncertainty to at least be protected from each other.


Pretty much; updating, say, libssl in a "traditional" system running an app, or maybe 2-3 dependent apps, fixes the bug.

Put all of them in containers and now every single one needs to be rebuilt with the dep fixed, and instead of having one team (ops) responsible, you now need to coordinate half of the company to do so. It's not impossible, but in general much more complex, despite containers promising "simpler" operations.

...that being said, I don't miss playing the whack-a-mole game with developers who do not know what their apps need to be deployed in production and for some inexplicable reason tested their app on unstable Ubuntu while all of the servers run some flavour of stable Linux with somewhat older libs...


Docker containers are not really a security measure.


It is a security measure. Sure it doesn't secure anything in the container itself. But it secures the container from other containers. Code can (as proven) not be trusted, but the area of effect can be reduced.


Only with additional hardening between the container and the kernel and hardware itself.


> What if xz contains a hidden buffer overflow or other vulnerability, that can be exploited by the xz file it's decompressing?

If you generalize this problem further, to all packages, then the only reliable solution is security through compartmentalization. On Qubes OS, any file I open, including .jpg and .avi, can't have the access to my private data or attack the admin account for the whole computer. This is ensured by hardware-assisted virtualization.


> the only reliable solution is security through compartmentalization

I hope we get there eventually. Not just for standalone processes, but for individual libraries. A decompression library could run inside a WebAssembly sandbox, with the compressed file as input, the uncompressed file as output, and no other capabilities.


What does this have to do with WebAssembly? That is another runtime that adds complexity. Apple has been sandboxing codecs for a long time. They run in a sandboxed process that is only communicating through stdin and stdout or something similar, if I remember correctly. You can run native code directly. Adding a runtime with a JIT compiler makes it harder to understand what is going on.


WebAssembly can be run in-process rather than requiring a process switch, and it can be easier to port library code to run inside a WebAssembly sandbox than a completely separate process. Also, sandbox mechanisms for separate processes are not always as robust, since they have to give access to any direct syscalls the process makes, whereas WebAssembly completely insulates a library from any native surface area.


There's AOT wasm too. Firefox uses it to sandbox some stuff. https://hacks.mozilla.org/2021/12/webassembly-and-back-again...


WebAssembly is the new Rust, I think

Hey, no one proposed to rewrite xz in Rust yet! I'm sure that would automatically protect any project from social engineering attacks!


Running xz in a sandbox would not prevent an attack that causes it to modify source code in a .tar.xz that is being streamed through it.


No, it wouldn't, but that wasn't the attack here. And code outside the sandbox could check a checksum of the uncompressed data, to ensure that the decompression can't misbehave.
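
For example, trusted code outside the sandbox could compare the decompressed output against an independently published digest; a minimal sketch (filename from the example upthread, the expected digest is a placeholder that must come from a source independent of the .xz artifact):

    xz --decompress --stdout gcc.tar.xz > gcc.tar
    echo "<expected-sha256>  gcc.tar" | sha256sum -c -   # fails loudly if the decompressor altered anything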


It's a bit ironic that after a trust attack this person ends the article saying

> I do have a xz-unscathed fork which I've carefully constructed to avoid all "Jia Tan" involved commits.

He may be fully legitimate, and perhaps a famous person in OSS (which I was unfamiliar with), but still ironic :)


There seems to be a fundamental misunderstanding with a lot of these writeups. Are they 100% sure history was not rewritten at any point? Going back in time on the repo prior to listed involvement doesn't do anything as the attacker had full control. Starting from the last signed release prior to their involvement is the only way to actually move this forward (history may be fully lost at this point), the rest is posturing.


Even history rewrites would be visible with GitHub's new Activity tab; e.g., see the two force-pushes in llama.cpp: https://github.com/ggerganov/llama.cpp/activity So, while, yes, git history can be rewritten, commits pushed to GitHub can effectively never be deleted. Personally, I find this to be a downside. Think personal information, etc. But, in this case, it is helpful. Of course, the repository is suspended right now, so the Activity cannot be checked.


While it's certainly possible to rewrite git history, it's tricky to do it without other maintainers or contributors noticing, since anyone trying to pull into an existing local repo (rather than cloning fresh) would be hit with an unexpected non-fast-forward merge.

It seems likely to me that Lasse Collin would have one or more long-standing local working copies.

So IMHO injecting malicious changes back in time in the git history seems unlikely to me. But not strictly impossible.


Based on how this has gone (remember xz has effectively been orphaned for years, and the majority of long-standing setups were using the release archives), unless Lasse has never run any code from Jia (unlikely), I'd consider the entire machine untrusted (keys, etc). Provided the tarballs from before that date are still signed and available from another immutable source, that's really the only starting point here for rebuilding.


In any case Debian has its own archive of every xz-utils version they've used in the past.


The attacker had access to the GH mirror of the repo. The original repo remained at https://git.tukaani.org/


> Are they 100% sure history was not rewritten at any point?

With git, one way to check is if other people still have clones of the xz repository from a time when it was trusted.

If you suspect the repo history has been tampered with, you can check against those copies.

I believe it would be hard to introduce such a history rewrite, since people pulling from the xz repo would start getting git error messages when things don't match up?

I don't know to what degree intentional SHA-1 hash collisions could be used to work around that?


You can create pairs of SHA-1 hash collisions, but not a collision for a particular existing SHA-1 hash (the git one).


People think git is immutable. It is not.


Yes and no.

A local GIT repo can be changed (including its history) however you please. But once you have shared it with others you can't take that back. If you try to, then others will notice that the hashes mismatch and that their HEAD diffs uncleanly.

I know the term is infamous here, but GIT is essentially a blockchain. Each commit has a hash, which is based on the hashes of previous commits, forming a linked list (+ some DAG branching).
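
You can see that structure directly: each commit object embeds the hash of its parent(s), so rewriting anything upstream changes every hash downstream. For example:

    git cat-file -p HEAD   # prints the tree hash, the parent commit hash(es), author, committer and message
    git rev-parse HEAD     # the commit's own hash, computed over all of the above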


> If you try to, then others will notice that the hashes mismatch and that their HEAD diffs uncleanly.

So it relies on a human noticing and acting upon it. People not noticing backdoors being merged into the project is kinda the source of this problem.


You can automate checks for whether a large part of the previous git history suddenly changed.

You can't automate checks for malicious code.


That relies on some heuristics which can be worked around, unless you disallow rewriting history.

But the bigger issue is that this is some theoretical system which is not present in most git repositories.


The heuristic would be "sound the alarm if the main branch is rewritten". And maybe also "if a release tag that we have used for our distro is moved".

Wouldn't that catch most problems, and not generate too many false alarms?
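
A minimal sketch of that heuristic (bash; the tag name, branch and remote are illustrative, and the "known" files would be recorded by the distro at packaging time):

    # recorded at packaging time
    git ls-remote origin refs/tags/v5.4.6 > tag.known      # what the release tag pointed to
    git rev-parse origin/master           > branch.known   # the branch tip we last saw

    # later, run periodically
    git fetch origin
    diff <(git ls-remote origin refs/tags/v5.4.6) tag.known >/dev/null \
        || echo "ALARM: release tag moved"
    git merge-base --is-ancestor "$(cat branch.known)" origin/master \
        || echo "ALARM: branch history rewritten (not a fast-forward)"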


You can rename/switch branches. You can change what branch is considered main/master. You can find valid reasons why you'd want to do stuff which raises the alarms so that other people become deaf to them, and only then execute the rewriting attack. Relying on people noticing (even with alarms) is just super fragile.


> You can rename/switch branches. You can change what branch is considered main/master.

Sure, in the project repo the branches are just simple text files that contain the hashes of the commits they point to.

So they are trivial to change in the project repo. But it is also trivial for the distro project to keep copies of the branch/tag info and check against those. I guess what you mainly care about are the previous release tags. They should never change after a release.

> Relying on people noticing (even with alarms) is just super fragile.

I'd say there's plenty of motivation now for the major distros to put infrastructure in place to automate this (keeping track of previous releases) and to actually keep looking at the alarms.

> You can find valid reasons why you'd want to do stuff which raises the alarms so that other people become deaf to them

I'm sure the attackers would try things like that.

But let's say you have an open source application/library that is part of Debian.

How common has it been in the past that the app/lib project had a bunch of tagged releases, and then wanted to rewrite the history so that the tagged releases now point to different commits? I assume it has been very uncommon, but maybe I'm wrong?

And even if that is the case, new infrastructure tools can keep local copies of the source code for previous releases, and check against that.

Repo checking is not trivial, perfect, or sufficient. But I'd say it's a necessary component in guarding against attacks.

The big challenge is still that there is so much code added/changed for each new release of apps/libs that it is very difficult to check against attacks. The obfuscated C contest has proven again and again how hard it is.


It's a Merkle tree. They were invented 3 years before blockchains: https://en.wikipedia.org/wiki/Merkle_tree


It also uses a Merkle tree to compress the snapshot versions associated with commits. But the actual commit structure builds on top of that. A pure Merkle tree or forest would only give you a set of overlapping snapshots, without any directionality. So, I think it is fair to call it a blockchain as well.


Blockchains were invented in 1982?


In short, yes: https://en.wikipedia.org/wiki/Blockchain#History

People conflate blockchains, distributed networks and cryptocurrencies.


Well, it is and it isn't: It has mutable pointers (branches and tags) to immutable nodes in a graph (commits).


Can you elaborate? Are you thinking of intentional SHA-1 hash collisions? Would that work in practice?


The history. Every time something like this attack happens people think they can read the complete git history in the repo.


If some commits are signed by people you trust, can the chain before that still be compromised?


Concerning history rewrite, it makes sense to point to Fossil and its major difference to Git:

https://fossil-scm.org/home/doc/trunk/www/fossil-v-git.wiki#...

There is also a link to "Is Fossil a Blockchain?", an interesting read because the term was mentioned elsewhere in this thread.


Trusting anything from that actor is full on ignorant, let alone "a new decoder". It's insane.


Trusting people in general is inadvisable. I haven't trusted anyone for years and I am richer than ever.


You probably still rely on trusting others a lot more than you realize.

If you really, really, didn’t rely on trusting anyone, I don’t even see how it would be possible to exist on earth.


You're trusting millions of people just to be able to write this comment.


Those two topic don't have much in common, trusting a state level hacker actor vs. trusting people in general.


> Hopefully, Lasse Collin will consider these possibilities and address them in his response to the attack.

Here's the thing: Lasse Collin was overloaded back in 2021. I've no particular reason to believe that isn't still the case. He needs help. Dealing with this solo is an incredible amount of work. Also, he needs help from a verifiably trustworthy source, verifiable in a way that doesn't require a lot of effort. In practice, that almost certainly means help from a major open source company.

I seriously doubt that's going to happen, because the people who really need to learn this lesson won't, because it's probably not in their financial interest to realize that supply chain problems start with them doing things on the cheap. While we keep on running the world's infrastructure like XKCD 2347, every so often everything's going to topple over.


> Also, he needs help from a verifiably trustworthy source, verifiable in a way that doesn't require a lot of effort.

PGP signing parties?


I remember when they were big. People signed anyone’s key, didn’t need to know them. Yes, sensible people thought this was an issue. Still happened.


... but `xz` is pretty much feature complete to me.

Lasse Collin was doing bug-fix-only releases just fine.


The start of the attack was a few fake accounts trying to shame the maintainer for not "developing" it constantly, so that he would hand maintainer rights to someone else.

And there wasn't really anyone to say "nope, it's fine, fuck off"


> And there wasn't really anyone to say "nope, it's fine, fuck off"

That's because, for people for whom the project is doing fine and who haven't experienced any bug, why would they go to the mailing list, forum or whatever other communication channel the project has?

One has to understand and keep in mind that places that can gather feedback will invariably attract more of the negative kind than the positive kind because people who are happy are not motivated to say they're happy. Those people wouldn't even know people were complaining about xz.

There's thousands of libraries/independent software projects installed on any computer. No one has the time to check the place of all those software projects and go there just to say "hey, I'm happy, no need to change anything, thanks?", right?

People who are discontent with something on the other hand are sure to be vocal about it. But just because they're the most vocal doesn't mean they are the majority of your users.


I can't help but feel open source's responses ultimately don't address the root of the problem.

Yeah okay, reverting to 5.4.6 or some version from over 2 years ago might "solve" the immediate problem that is the backdoor, but it's not going to solve anything else.

More specifically, I've not heard so much as a rumor that any of the dependents will contribute time and manpower to the project they rely so heavily on. I find it amusing that it was someone from Microsoft, a company reviled by a lot of the open source (and particularly FOSS) community, who brought this problem to light.

Producing something needs time and manpower, and time and manpower ultimately are not free (both beer and libre).


>Yeah okay, reverting to 5.4.6 or some version from over 2 years ago might "solve" the immediate problem that is the backdoor, but it's not going to solve anything else.

The author suggested going back to 5.3.x, and I tend to agree with this. From what I read, "Jia Tan" had a hand in 5.4.x; if true, I would revert to a version earlier than 5.4.x. "Jia Tan" proved to be quite skilled at obfuscation.


I am wondering if the person who sent the patch to disable systemd reliance on lzma shortly after the release of the backdoored xz knew about the plan. Maybe an agent of a competing entity?


My gut instinct is that xz needs to be rolled back to its pre-attack state, but obviously that would probably also reintroduce some bugs and likely break some things. Still, I'm very curious to see some analysis of the impact of doing so, because as this article points out, xz is in the critical path for lots of system-level processes.


There were symbol changes in recent releases, and things like apt link to liblzma. If liblzma were downgraded without also updating apt at the same time you could be left with a non-functioning apt.


That’s a good start. In the long run probably three things are necessary:

1) Writing critical software in a language that protects better against such exploits. Might be Rust, Go, perhaps also C# and Nim.

2) Making reproducible builds the norm, builds that start from the original source code repositories (e.g., pinned to a Git hash); see the sketch after this list.

3) Making maintainers more resilient against social attacks. This means more appreciation, fewer demands, and zero tolerance for abuse. If the maintainer can be pressured, I am at risk.

The last one is probably the most difficult.
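
For point 2, a rough sketch of what starting from the repository rather than a maintainer-made tarball could look like (the pinned hash is a placeholder, the paths are illustrative, and the digest only means something when compared with someone else's independent rebuild of the same commit):

    git clone https://git.tukaani.org/xz.git
    git -C xz checkout <pinned-commit-hash>          # pin the exact source state instead of trusting a tarball
    (cd xz && ./autogen.sh && ./configure && make)   # regenerate the build scripts yourself rather than using shipped ones
    sha256sum xz/src/liblzma/.libs/liblzma.so*       # compare against an independent rebuild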


> It feels good to not need to worry about dpkg and tar. I only plan to maintain this fork minimally, eg security fixes.

This is exactly the problem in the first place: lack of support for maintainers.

OP themselves say "I will only minimally maintain this fork". Okay, but it's so easy in hindsight to criticize what has happened.

> Hopefully Lasse Collin will consider these possibilities and address them in his response to the attack.

I can't even imagine how he's feeling these days.


> I can't even imagine how he's feeling these days.

None of this is his problem or fault. I see no reason he should feel anything about this. Keeping everyone safe was never his job. He wrote code and gave it away for free. That should be enough.


Anyone who has ever been pickpocketed or robbed or worse will know that reason and feelings are different things.


> I see no reason he should feel anything about this.

Absolutely agree, but from the sounds of the emails at least, he was going through a bad time then, and nobody feels good when they realise they were taken advantage of.


I'm here wondering why big tech companies that have Too Much to Lose didn't already massively fund a project that freaking sshd depended on (through systemd).

Like, how does it hurt Google to assign 100 people to review and investigate commits of some project as basic and fundamental as a compression tool?


Poke at any large company at all, and you'll find that their in-house critical fundamental infrastructure thing is chronically underfunded, understaffed, bug-ridden, everyone is worried but no budget is ever approved.


Within a given part of an organization, just about everyone thinks they're underresourced and understaffed.



Google also does code reviews for some commonly used projects (or maybe that's part of the same thing? I don't know). I went through that last year with one of my Go libraries.

The idea is good, but the entire process is so bureaucracy-heavy and time-consuming that I found it both frustrating and entertaining in equal parts; like something out of Brazil (the film, not the country). So many emails, so many video meetings, so many people involved, so much talking. And all for looking at a 4,500 line Go library.

"Here's the code; just clone and look at it, and let me know if you find something"... It's not like you need my permission to do any of this *shrug*.


Brazil the country is also known for onerous bureaucracy


MS funded people and pipelines to analyze it. Jia Tan convinced them, using social engineering, to disable the fuzzing that was designed specifically to find malicious behavior, and one MS engineer did find it.


What were the MS analysis projects? The fuzzing was google-sponsored.


You are correct. The article I read led me to conflate the Azure engineer's valgrind-triggered activity with oss-fuzz, which is, as you say, a Google effort.



sshd didn't depend on it, to be fair. Not officially at least.


Is XZ embedded affected by any of this?


Good question. Embedded systems are harder, sometimes impossible, to upgrade, and there's a chance that a backdoor in a small board inside an appliance that doesn't offer easy physical access could take years to be found and removed.


It's like security 101. If a system has been infiltrated, you can't trust any part of it. So it's better to discard any part that has been reached or possibly affected.

Perhaps it's the correct action to distrust xz/lzma or any source code this team has control over and switch to alternatives. If there are no alternatives, to start new ones.


Hiding more backdoors in the library would only increase the risk of getting discovered. Care is certainly advised on the source level, but I'd leave the paranoia to the state of systems where the code has run.

From the attackers' perspective, what they'd want to do is use their project infiltration success as little as possible, only enough to squeeze in other backdoors completely unrelated to xz. But that's all operations, not development.


What are your definitions of 'system' and 'any part'? Any big company has been breached at some point. They don't throw away all their hardware every time, even though it was connected. You have to draw the line somewhere.

You're assuming a world with separate hardware and software. That's not the case any more. We have closed sourced firmware running anywhere and no way to verify what's running.


Sure, that's a problem for the threat assessment process. And I totally agree that in today's world software/hardware and wetware are too interconnected. And that's another avenue for these kinds of attacks.

In this case, is the whole git repo a threat? Or just the manually created distribution files? The threat actors' reach defines that. As time passes, we see that reach was not too limited: they even reached other software with patches. So that assessment should be done.


> If a system has been infiltrated, you can't trust any part of it. So it's better to discard any part that has been reached or possibly affected.

systemd! let's discard systemd!


They just added an example to the documentation[0] of how to implement the sd_notify protocol without linking to libsystemd, so a little bit of discarding systemd (or at least parts of it) does seem to be part of the solution.

[0] https://github.com/systemd/systemd/pull/32030/files
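
The protocol itself is tiny: readiness is just a datagram containing "READY=1" sent to the unix socket named by $NOTIFY_SOCKET. A minimal sketch (not the example from the systemd docs; it assumes a filesystem socket path, since abstract sockets, whose name starts with '@', need slightly different handling):

    printf 'READY=1' | socat - UNIX-SENDTO:"$NOTIFY_SOCKET"   # one datagram, no libsystemd needed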


People have been bandying about "10 lines of C", but I'm curious if you know why the protocol is not "2 characters" of shell, namely ":>PATH" (ok, ok, PATH is probably something like /run/serviceName/I-B-ready). At the user (i.e. service daemon) -level this seems much simpler. (EDIT: and systemd would unlink the file as soon as it "gets the message", of course.)

There's just a 40-year culture of using some "official" lib to implement socket protocols - even if the docs suggest you roll your own. I feel like file creation escapes that "reach-for-the-official-lib TCP/UDP/datagram" culture.

It's probably not harder for systemd either if they just use/require the Linux inotify and incorporate that into its select or poll or whatever. I mean, if they wanted to be portable to non-inotify kernels some timeouts/stat-loop would be an ok fallback that would probably be rarely-to-never needed.

It sounds like it's not even hard to add this simpler channel in after the fact just as an alternative option for `whateverd` and then deprecate the datagram one for 10 years (if they even care to).
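
To make that concrete, a sketch of what both sides could look like under such a scheme (paths from the example above; inotifywait from inotify-tools stands in for whatever systemd would do internally):

    # service side: "I'm ready" is just creating a file
    : > /run/serviceName/I-B-ready

    # manager side: block until something gets created in the service's runtime directory
    inotifywait -e create /run/serviceName/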


But this suggests reimplementing xz/lzma. Which would cost money. Hence, won't be done.


> But this suggests reimplementing xz/lzma.

If there is a known good copy of the repo from before the attacker had sufficient access to alter history, then that is an acceptable starting point.

From there you look at each update since and assess what they do to decide if you want to keep (as they are valid improvements/fixes) or discard them. If some are discarded, then later ones that are valid may need further work to integrate them into the now changed codebase. Similar to Debian assessing upstream security patches to the latest version to possibly back-port them to the version they have in stable, when there is significant disparity (due to a project being much faster moving than Debian:Stable).

As xz/xzutils is a relatively stable package, with very few recent changes, this should be quite practical. A full rewrite shouldn't be needed at all here.


> If there is a known good copy of the repo from before the attacker had sufficient access to alter history, then that is an acceptable starting point.

I heard someone calling themselves “Honest Ivan” has just the thing, totally trustworthy.


Given how widespread the copies could be, and that we know when the bad actor gained the level of control needed to tamper with history, or, if we want to go further back, when that user started making contributions, it is likely that by comparing many claims we can prove to a reasonable level of assurance[1] that a given version is untouched in that regard.

Furthermore the original main maintainer seems to have a repository with an untouched lineage. While true paranoia says they can't be trusted without verification (he could be under external influence, as could anyone) I think we can safely give their claims more credence than those of Honest Ivan.

--

[1] to the level where a clean-room implementation is not significantly less likely to be compromised by external influence with bad motives.


It should be easy to go back to https://snapshot.debian.org/ and one more repository and verify old untainted releases between the two archives.


But there are alternatives, most notably zstd.


It's a different algorithm made for a different purpose.


sadly, the zstd cli tool links to lzma right now (as installed by some distros) :/


a half-arsed search resulted in this half-baked rust library: https://github.com/gendx/lzma-rs


It seems detecting holes in Jia's code could be extremely difficult. Given the stakes, as a precaution, would it be viable to simply wipe and rewrite (from scratch) the last ~2 years of commits to xz?


Why not add another layer to the tinfoil hat?

How do we know "Jia Tan" is not a Facebook op to "nudge" people to switch to Zstandard?


Yeah, it's left me a little disappointed in Arch in particular that they didn't follow the lead of Debian and Fedora and revert to a much older version, instead just building 5.6.1 from the git repo and basically defending it with "the hacked build script checked for dpkg/rpm anyway".


Is this what you're referring to?

> Regarding sshd authentication bypass/code execution

> Arch does not directly link openssh to liblzma, and thus this attack vector is not possible. You can confirm this by issuing the following command:

> However, out of an abundance of caution, we advise users to remove the malicious code from their system by upgrading either way. This is because other yet-to-be discovered methods to exploit the backdoor could exist.

https://archlinux.org/news/the-xz-package-has-been-backdoore...

I'm not finding anything from Arch mentioning dpkg/rpm, the linked article above is the latest article about the xz compromise from the Arch homepage.


- https://bbs.archlinux.org/viewtopic.php?pid=2160841#p2160841

- https://gitlab.archlinux.org/archlinux/packaging/packages/xz...

- There were some comments in the middle of the giant openwall mailing list thread I can't find now because they're in the middle of 30,000 replies


Arch Linux is not vulnerable to this specific attack, which requires sshd to be linked to liblzma. This link is introduced by patches from outside upstream sshd, which Arch does not apply to their build.


The point here is that there is uncertainty in all commits by Jia Tan. Arch's focus is on this specific hack, but are there other vulnerabilities in the hundreds of commits to the git repo from the same author?


But as this article points out, liblzma is used in other crucial processes, and is generally trusted, often probably being run as root. The known bad actor contributed lots of code to xz that isn’t involved in the SSH backdoor. To assume it’s all innocuous would be truly foolish.


Arch tries to always be as current as possible, for better or for worse. So this definitely makes sense for arch


Wow. So for the xz package it looks like they changed the upstream to this git repo (edit: the original maintainer Lasse Collin's personal repo), which still contains Jia Tan's commits: https://git.tukaani.org/?p=xz.git

tl;dr they re-enabled the sandboxing previously disabled by Jia Tan.


What if over 80% of all open source projects are secretly sleeper agents for various malicious actors, states, terrorists and whatnot, and they pretend to give us precious software updates for free so they can attack later?

What if proprietary software is running the same way, except they don't even give you free updates and you can't audit the source code and have to trust them when they push updates?

What if your mother gave birth to you just so she can slap you in the face when you're 30?

Yeah, we can go very far, but at this moment, xz is under so much scrutiny that in 2-4 weeks, I'd trust it with my life unless the big orgs looking at it issue more reverts (hence, the delay). So if there are issues, they're everywhere else.


Nit-picking but, eh, png does not use lzma at all.

> PNG compression method 0 (the only compression method presently defined for PNG) specifies deflate/inflate compression with a sliding window of at most 32768 bytes. Deflate compression is an LZ77 derivative used in zip, gzip, pkzip, and related programs.


I don't think joeyh wanted to imply that PNG uses liblzma. PNG is just a convenient place to put opaque binary stuff that'd trigger an xz compression bug.


I still don't understand how would that work. The post said:

> Let's say they want to target gcc. Well, gcc contains a lot of documentation, which includes png images. So they spend a while getting accepted as a documentation contributor on that project, and get added to it a png file that is specially constructed, it has additional binary data appended that exploits the buffer overflow. And instructs xz to modify the source code that comes later when decompressing gcc.tar.xz.

It says "when decompressing", and I would imagine that such a bug needs specifically constructed lzma stream to trigger. If you want to do it by changing a source file (a png here) you need to make "second-order" bugs: i.e. the compressor needs to output a broken lzma stream which when later decompressed would exploit (not simply cause) a memory corruption bug. This is too brittle [1] and are very likely to be detected.

[1] Disclaimer: I'm not an expert in writing backdoors. I consider myself reasonably competent for writing exploits, and I've written deliberately buggy programs (for CTFs) before.


Consider this backdoor:

- if the decompressed stream contains a magic keyword, run the rest of the file as an x86-64 binary.

Now you just need any opaque binary file to host the payload. A PNG works fine, because most decoders don't care about extra bytes at the end.


When decompressing gcc.tar.xz, which contains foo.png followed by main.c, the decompressor is instructed by the hidden data in the png how to alter the code.


The build script decoded precompiled backdoor code from a binary test file that wasn't really an archive, but was encrypted with a Caesar cipher. Any blob can be used like this as a trivial steganographic container.


Any code that extracts data from such a blob would look very suspicious.


I'm disappointed, to put it very mildly, in how Arch Linux handled the matter. They still use version 5.6.1 and assume that switching from GitHub to the repo hosted by Lasse fixes the issue. They say "our sshd isn't compromised", but as the author of this article wrote, who knows what else might be affected. There's a post on the Arch Linux forum which was closed by ewaller, an administrator account, with the odd justification that the thread was only meant to inform people; when people started calling out the malpractice of the Arch Linux maintainers, the thread got locked.

To me, this is very suspicious.


Why not get rid of xz completely? How about using a simpler piece of software, which could be maintained by more people?


> Why not get rid of xz completely?

... you mean the compression package used by most big distros to make their packages? Do you really need to ask?

> Hoe about using a simpler piece of software, which could be maintained by more people?

It's not a complex piece of software. The lib itself is ~15k lines of code.

It does its job well and it needs little work. It didn't have any outstanding bugs lingering unfixed for years.

Complexity has nothing to do with the problem; it's just... uninteresting enough that there is no reason to contribute.


Complexity is one of the root causes of the problem. The script was added via messy autoconf scripts inscrutable to most people. That is decades-old tech which has alternatives.


I've seen for example here a lot of issues raised:

https://www.nongnu.org/lzip/xz_inadequate.html


> ... you mean compression package used by most big distros to make their packages? Do you really need to ask

Because of the existence of gzip, zlib, bzip2, and many others, it's trivial to drop xz.

So yeah, why not get rid of xz completely?


From reading comments here alone, I can predict the following (unhealthy) effects on software at large:

first, huge scaremongering like this writeup. It all hinges on a notion that who knows who is writing walls of who knows what code, and the code and its purpose is absolutely impenetrable, inscrutable by anyone else at all. It's not true, as evidenced by analyses of this hack alone and by the reverse engineering community at large. Of course, it requires doing what Andres has done, meaning rolling up your sleeves and actually reading the source and trying to understand it, and not just hoping someone else will do it. Whatever one person tangles, another can always untangle.

Then, there is going to be a witch hunt and people jumping on any innocent change with pitchforks (already happening; the satanic panic over ibus is in a nearby discussion). The code review theater will be in full swing. Previously, people were conserving their brain energy (or masking their incompetence) by skimming walls of changes and stamping LGTM on them based on how well-formed they were, how long they had known the author, and whether the builds were not broken. Now it will become extremely hard to get any changes done at all, because the new mental shortcuts will be: too long, didn't approve; explanations too complex; just plain Reviewer Says No; and so on. Any sloppy PR denial will be followed by patting oneself on the back: look at me, I have just thwarted a KGB agent.

Some people thinking they know a lot, without basing such assertions in reality, will try to become overnight "wonder experts in security", barking at every shell script they don't understand, or every piece of generated text, like generated Makefiles.

Vulnerabilities akin to CVE-2022-3786 and CVE-2022-3602 — which got introduced by writing a whole new email address parser from scratch (which IIRC also got checked in as a whole wall of code at once, and I read someone blaming exactly this as the culprit) — will lead to questioning by police at least once.

Automated codebase scans with bogus reports like Daniel Stenberg wrote about in [0] will become even more abundant. Everyone will just jump on any "unsafe" function call without actually understanding its context, and will keep pestering authors to "make a fix" because potentially something (gasp!) may happen.

Later everyone will get tired of this charade, and everything will come back to "normal", probably with added processes and red tape to make FOSS maintainership even more of a liability than it is today. Nothing will be done at all to make it less of a burden.

[0] https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-stands-f...


The US spends $30-60 billion a year on agriculture and subsidies [1]. This is controversial for all the obvious reasons. Without subsidies we'd waste less food, but overproduction of food is an intentional objective of these programs. Why? Because if there's a major drought or crops are lost to ice or snow or flooding, Americans won't starve. It's why we have things like the US government having a reserve of over a billion pounds of cheese [2].

People not starving is a national security interest.

It's getting to the point where the software we rely on is also a national security interest. The US government should be paying to maintain and improve this software. Security risks in Linux and core packages threaten to shut down key infrastructure.

Paying developers to maintain open source projects that are actually used could be an incredibly effective use of tax dollars.

[1]: https://usafacts.org/topics/agriculture/#581290c9-a960-49aa-...

[2]: https://www.deseret.com/2022/2/14/22933326/1-4-billion-pound...


It's not valid to regard the potential of negative consequences in any sphere of human life as a "national security risk".

It's not valid to presume that the only way to mitigate risks is through top-down political intervention.

And it's absolutely not valid to presume that making FOSS communities dependent on political subsidies would not have much worse and much longer term consequences than the problems we are trying to mitigate.

In fact, it's entirely possible that the political intermediaries who control the purse-strings would have an even greater capacity to introduce their own backdoors or otherwise compromise security in pursuit of their own ambitions.

What you're proposing here might well represent trying to keep out one set of threat actors by handing the keys to the castle over to another set of threat actors. And this would be on top all of the other problems it would cause: convergence toward homogeneous monocultures, project priorities being distorted by political incentives, vested interests using political influence to suppress competition from FOSS projects, etc.

It's worth pointing out that the swift detection and remediation of the xz backdoor by the community almost immediately after the threat actor pulled the trigger on their two-year long con represents a resounding success of the FOSS "many eyes" model, and it's not clear what politicians throwing money around would add to the equation.


Your argument falls apart when you remember that the U.S. federal, state, and local governments are critically dependent on open source software – not just directly in things like Linux servers or Chrome/Edge/Firefox but also open source components used in appliances or compiled into commercial software. It is quite reasonable to argue that even a narrow approach of improving only components they run would be justifiable on those grounds and it’d be a tiny part of, say, NIST’s budget to fund developers directly or to pay some group like the Linux or Apache foundations to support an open source maintainership team.


Organizations of all types -- government and otherwise -- are dependent on a wide variety of externally-sourced solutions for mission-critical operations. They can and do develop their own processes for testing and vetting potential solutions against their own criteria for performance, reliability, maintainability, and security.

Government orgs can and do contribute the results of the work they do in this regard upstream to FOSS projects. This has never not been the case, and when government-employed developers release the work they do to meet their own security requirements to the broader community, everyone benefits.

But this is drastically different from the scenario that the preceding poster was proposing, in which government officials would assume effective responsibility for the entire project, not just act as participants in the FOSS community.

That proposal would invert the situation, and change it from government devs adhering to the norms and conventions of the community to the community adhering to the rules and priorities defined by the government, which is where the negatives I outlined above would come into play.


I fail to see how you addressed the previous comment here. You ignored 80% of the points made.

Donations are fine, but they need to be "no strings attached"; otherwise I agree with the GP that the risk of weaponising FOSS may become even greater.

I do agree that the US government is critically dependent on FOSS by now though. But “why throw money at it if it ain’t broken?” is the prevalent mentality, especially when everyone can have their own definition of “broken”…


The first sentence was a category error. It is a national security issue as long as systems which national security depends on are running the code in question. I totally agree on the FOSS side, but consider that the proposal could be as simple as having, say, NIST give the Linux Foundation money annually to pay a supporting team of developers rather than the government taking over maintenance of anything.


> It is a national security issue as long as systems which national security depend on are running the code in question.

This is itself a category error. By this standard anything and everything upstream of certain government agencies implicates "national security", and you might as well say that the supply chain for staplers is a national security issue because government officials staple documents together, or that the manufacture of socks is a national security issue because government agents wear socks.

Naturally, it's up to organizations themselves to make sure that their particular usage of any specific resource meets their exceptional security needs, not to expand the definition of "national security" to encompass the entire upstream supply chain independently of their use cases.

> NIST give the Linux foundation money annually to pay a supporting team of developers rather than the government taking over maintenance on anything.

This in itself would create a nexus of influence that could ultimately function as a vector of social engineering attacks, including from factions within our own government. It's just not cut-and-dried enough to presume that government money is some sort of magic solution and wouldn't itself actually make things worse.


> you might as well say that the supply chain for staplers is a national security issue because government officials staple documents together, or that the manufacture of socks is a national security issue because government agents wear socks.

You’re leaving out the key part: this only works if there’s a way for a flaw in those staples or socks to impact national security functions. Once you correctly make the analogy you can easily see why that’s true for a server operating system but not either of your examples.


What if a threat actor embedded secret listening devices into staplers? What if wool socks specifically designed to maximize ESD discharge were used to disable sensitive equipment?

Organizations that are concerned with outlandish risks like these implement their own policies and procedures to safeguard against them. They might x-ray office supplies before allowing their use in secure facilities; they might maintain a short list of approved fabrics in an ESD-sensitive environment. The point is that organizations that have exceptional security requirements apply their own policies and procedures to mitigate risk, and don't expect parties upstream of them to do so for them.


Socks and staplers are not part of the security infrastructure. Computers are, and that means it’s in the interests of everyone to keep them secure.


"Computers" is too broad to be meaningful. Compression tools are no more a part of the security infrastructure per se than socks and staplers are.


It's worth pointing out that we got lucky with a savvy tester stumbling on the backdoor in the dark. It wasn't anyone's job to find this backdoor, and arguably if it had been designed just a little better, no one would have noticed.

I wouldn't have gov't be maintainers of the main repos; rather, I'd have them either assigned to vetting critical repos, or mirroring them and co-maintaining copies endorsed for security (if I agreed gov't should be involved).

The ultimate question is: are our critical systems (kernel, systemd, core userland) safe *enough* against future similar attacks that we are okay trusting the global economy to another lucky valgrind test?

You incorrectly implied that a national security risk is "the potential of negative consequences in any sphere of human life". If we hadn't gotten lucky, this could have been an economic and human catastrophe. That's not an inconvenience.
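
For context on the "lucky valgrind test": the discovery reportedly began with a developer noticing that failed sshd logins were burning noticeably more CPU than expected, alongside valgrind errors pointing into liblzma. Below is a minimal sketch of that kind of crude latency check, assuming a local sshd is running; the user and host names are placeholders for illustration, not the actual setup involved.

    import subprocess
    import time

    SAMPLES = 5
    durations = []
    for _ in range(SAMPLES):
        start = time.monotonic()
        # A login attempt expected to fail; BatchMode avoids an interactive
        # password prompt. "nosuchuser" and "localhost" are placeholders.
        subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "-o", "StrictHostKeyChecking=no",
             "nosuchuser@localhost", "true"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.monotonic() - start)

    print(f"mean failed-login wall time: {sum(durations) / len(durations):.3f}s")

This only measures wall-clock latency from the client side; the actual investigation went further, profiling sshd's CPU usage and running it under valgrind. But a jump in a number like this is often the first hint that something upstream changed.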


> It's worth pointing out that we got lucky with a savvy tester stumbling on the backdoor in the dark.

I believe in Bayesian probability a lot more than I believe in luck. The fact that a random sysadmin investigating performance issues was able to rapidly unravel this whole thing is at least a minimal indicator that either (a) this particular attack was largely incompetent, and any of the many "savvy testers" in the community would have uncovered it within short order, but more sophisticated attacks might remain undetected, or (b) this was a sophisticated attack, and the community is generally resilient to this form of infiltration.


I think we’re lucky that the compromise was ham-fisted enough to be trivially detectable. I’m sure other, better attacks exist, implemented by smarter actors.

Software is a national security concern because everything is increasingly dependent on computing. Plenty of civilian tech is being used in Ukraine to good effect.

Smart adversaries are going to watch where stuff is used and target the supply chain. It may be Russians, Mideast actors, or domestic extremists.

Software is to 2024 what a truck bomb was in 1994.


I'm not sure why people keep misidentifying the problem as "lack of funding". Lasse Collin was doing fine as a maintainer up until Jia Tan showed up. He was psy-op'd into believing there was a crowd of angry people eagerly awaiting a new release when there wasn't. No real person was unhappy with the way he'd been maintaining xz.


Funding aside, single individuals being responsible for software is not a good thing; see bus factor.


In fact, there was no issue with Lasse Collin maintaining xz as a single individual, and creating a false impression to the contrary was the primary tactic used by the antagonist to gain access to the project.


Fortunately there was Jia Tan to help him. /s


I'm not saying money would absolutely fix the issue, but I could also see it helping. If Collin was approached by a government that said "Hey, the thing you're maintaining is important, if you want, we'll fund 2 additional full-time maintainers that can contribute based on your guidance", maybe Collin would be in a better position to ensure the Jia Tan contributions were genuine and proper.


There's probably a number of things that could improve the situation. Mindlessly throwing money and government at a problem almost never improves things.

Which government bureaucracy decides how much Lasse Collin should be paid? Based on what metrics? This is a giant can of worms.


If I were the maintainer and was approached by the government telling me, "hey, here are two folks who're going to be two new full-time maintainers and we're funding them," I certainly would be worried.


Similarly, if the government approached me and said "Here, embed this black-box binary into your build process", I'd be worried too. But luckily, no one suggested this, nor what you wrote about :)


Getting help for mental health issues is a whole lot easier if you have the money.


IIRC the xz package was maintained by an individual in a place with at least some socialized healthcare, but correct me if I'm wrong (I'm not trying to be snarky here, please do).


Perhaps socialized medicine is the solution.


Socialized or politicized?


But if he had been paid, he might not have given up control.


Okay, sure, but rather than trying to solve a problem by throwing money at it (or worse, trying to solve it with government intervention), maybe it's better to think of other mitigations.

For example, maybe developers need to be made aware of potential psyops by attackers (the publicity surrounding this issue probably made some progress on that front).


He never mentioned any financial problems; it was more about his mental health.


And would mental health issues have kept him from working on xz if that had been part of his day job?


I’ve been in similar burnout situations, and the difference between work and side projects did not matter to my mental health. The money was not an issue, it was headspace and fatigue.


Agricultural subsidies are a terrible example because they're so corrupt. The cheese reserve doesn't exist because it's gonna protect Americans from starvation. It exists because the dairy industry massively overproduces. Only a small amount gets converted into cheese. Millions of gallons just get dumped[1]. The dairy industry is entirely unsustainable, especially given that the government keeps the price of milk low because it's considered vital for children's development, a claim that is dairy lobby propaganda with no basis in fact. And these massive subsidies don't even help small farmers. The only way to keep up with the absurd artificial demand is massive factory farming of genetically engineered super cows bred in a lab to produce as much milk as possible, with a life expectancy a third that of a normal cow.

[1]: https://www.wsj.com/articles/americas-dairy-farmers-dump-43-...

(I guess this isn't really relevant to the OP, but I recently read a fantastic book about the dairy industry and now I can't shut up).


Why are you advocating for the government to take control of open source projects? Is anyone here naïve enough to believe that, after being persuaded of the national security interests of these projects, they're just going to hand over money to the random people who maintain them to keep doing what they're doing? The U.S. Govt isn't Santa Claus. Look at what they did to the farming industry. Most people used to be farmers and now there aren't many farms at all, since most of it's being done by big companies. Applying that idea to open source means the government would use regulation to prevent community developers from having their software used in production, and all future work on open source code would have to be done by engineers at big tech companies. In many ways that's already the de facto system we have today. So if you get the government involved, it'll just become law, and the lone wolves in open source who big tech doesn't want to hire will be fined, sent to jail, etc. Read "Everything I Want To Do Is Illegal" by Joel Salatin.


It seems very stupid to me.

How long will it be before the government starts pressuring these maintainers to do the things the government wants?


For example, introducing their own backdoors.


The EU realised this like a decade ago, and has had a couple of programs around it that have been small steps in the right direction, but not enough - such as EU-funded bug bounty programs, grants, and mandates that the EU should use specific open source tooling for specific needs (e.g. VLC).


Doesn't .gov do a bit of that already?

Part of the problem is, no one knows who's really working on what. If you asked some of the most knowledgeable people in the GNU/Linux ecosystem who maintained xz before last week, there's a real chance they couldn't have told you, not without some investigating first. And that would only have gotten them a name, not the maintainer's personality, resource situation, etc.

There needs to be a census of sorts over the stuff that goes into the GNU/Linux ecosystem to see who needs what.


It may mean paying for the maintenance of minimal (including the SDK) but good-enough, ultra-stable-over-time reference software, network protocols, and file formats.

That excludes nearly all software out there (even open source; closed source is de facto excluded), because most "developers" are just a bunch of scammers heavy on planned obsolescence.

That includes software "maintained" by the academic sector, like what you have with the MIT Media Lab and the nice "donations" from Bill Gates (probably to steer it the way he wanted; I would not be surprised if this is not alien to C++ in GCC... one of the biggest mistakes in open source software), as revealed in the Epstein files.

To say the least, it is far from ez. If you are an _honest_ dev, you know it is excruciatingly hard to justify a permanent income.



