I took a look at the diff linked in the article with code that "we are all running". The top of the diff certainly looks interesting. They remove the bounds check in dict_put() and add a safe version dict_put_safe().
This kind of change is difficult to make without mistakes because it silently changes the assumptions made when code calling dict_put() was originally written. ALL call sites would need to be audited to ensure they are not overflowing the dictionary size.
if you can get one call of dict_put somewhere to pass the limit, all later calls of dict_put_safe will happily overwrite memory and not actually be safe.
Agree, nice catch. Also, there are many other opportunities in this patch to hide memory safety bugs.
This is the kind of optimization I might have done in C 10 years ago. But coming back from Rust, I wouldn't consider it any more. Rust, despite its focus on performance, will simply not allow it (without major acrobatics). And you can usually find a way to make the compiler optimize it out of the critical path.
I think at this point is clear that everybody has to assume that XZ is completely rotten and can no longer be trusted. Is it XZ easy to replace with some other compression tool? Or has it been so widely adopted that is going to take huge effort moving out of it?
There is no reason to assume that. Even if you assume every commit since Jia became a maintainer is malicious, the version from 3 years ago is perfectly fine.
Zstd has a number of benefits over Xz that may warrant its use as a replacement of the latter, and this will likely be a motivating factor to do so. But calling it entirely rotten is going way too far IMO
There is an interesting argument to be made that pre-JT xz code is probably pretty secure due to the fact that the threat actors would have already audited the code for existing exploits prior to exerting effort to subvert it.
I always use "zstd --long=31 -T0 -19" to compress disk images, since that is a usecase where it generally offers vastly superior compression to xz, deduplicating across bigger distances.
XZ offers slightly better compression on average, but decompression is far slower than Zstd.
IIRC memory consumption is generally worse for Zstd at comparable levels of compression. Which, these days, is generally fine, but my point is you can't thoughtlessly substitute the two.
This kind of change is difficult to make without mistakes because it silently changes the assumptions made when code calling dict_put() was originally written. ALL call sites would need to be audited to ensure they are not overflowing the dictionary size.
The diff I am referring to is here:
https://git.tukaani.org/?p=xz.git;a=commitdiff;h=de5c5e41764...