Somewhat interesting for those who missed it - another rust/brotli project was a target for afl-rust. IIRC it didn't find any memory safety issues, but it definitely exposed a few panic crashes:
Relevant comment from ende76 on why brotli-rs is relatively slow and what the goals of the project are/were:
"Yes, performance has not been a focus point at all so far.
I've started this project as a a way to get familiar with Rust, and also with the Brotli spec.
For the implementation, I have made it a point to work only from the spec, while avoiding looking at the reference implementation, because another goal was helping to improve the Brotli spec itself.
I do believe that there are a number of places where copies could be avoided, and other places where I used clone() simply because at the time I was unable otherwise to appease the borrow checker.
For a performance-oriented implementation, I would probably take the test cases and experiences so far, and start over with a new implementation."
Have you run AFL against git2-rs? I'd love to see that interface hammered on, to make sure it doesn't expose any unsafe behavior from the underlying library.
I haven't, though I've thought about it. Most of the logic behind git2-rs (as far as I know) is written in C. While it's possible to run afl.rs on a Rust project that uses C code behind the scenes, I haven't ever attempted to get AFL instrumentation working on the underlying C code. I don't think it should be that difficult, I just haven't gotten around to it yet.
EDIT: I forgot to mention: It's possible to run AFL on uninstrumented code, it just won't be that smart about finding new code paths.
> Most of the logic behind git2-rs (as far as I know) is written in C.
True, but the Rust bindings necessarily contain tons of unsafe FFI code, and those bindings enforces many required safety properties. Even without checking the underlying C code, running AFL to check for any unsafe holes in the bindings would help.
That said, yes, for best results you'd want to check the combination of C and Rust to find new paths and full coverage on both.
On a purely compression algorithm front, the recently format-stabilized [1] zstd [2] beats the pants down Brotli, in terms of compression speed, decompression speed, and compression ratio. Zstd isn't some random effort either, but a more flexible (in terms of compression time-for-ratio) effort by the creator of the popular and insanely fast lz4 compressor[3].
Brotli's fastest compression is slightly faster than zstd's. Zstd decompresses faster, but neither is slow. Zstd can use sliding window size longer than 16 MB, in brotli this is limited to 16 MB to have guarantees of the maximum resource use at decoding time. Zstd's longer sliding window helps with the longest files (16 MB+), and often benchmarking is done with 100 MB or even 1 GB files.
Brotli compresses usually more, and quite a lot more on shorter files. Try with cp.html, sparc sum or xargs.1. Brotli compresses these 9-19 % more densely on the following benchmark:
Note, that on this benchmark brotli is always limited to the 4 MB sliding window. Other algorithms are run with wider windows, too. This will make brotli seem worse on large files (4+ MB).
> Brotli's fastest compression is slightly faster than zstd's.
Come on, this is not serious.
Brotli's fastest compression algorithm is still significantly slower than zstd. And more importantly, it compresses _much worse_.
For a 3rd party evaluation, one can try [TurboBench](https://github.com/powturbo/TurboBench) or even [lzbench](https://github.com/inikep/lzbench) which are open-sourced. Squash introduces a wrapper layer with distortions which makes it less reliable, and more complex to use and install, quite a pity given the graphical presentation is very good.
I'm interested in speed, and in this area, all benchmarks point in the same direction : for a given speed budget, Zstandard offers better ratio (and decompresses much faster).
zstd 0.7.1 -22 compresses in 4.01 MB/s to 28363 bytes
Of course it is an unfair example because of the static dictionary that brotli uses, but it is not a pathological example: Thai is not part of the static dictionary. The numbers are on a i7-4790K@4.00 GHz.
Brotli's fastest compression is faster than that of zstd, at least as shown with lzbench and this file. Also brotli wins in compression density. In this file the win is 10.5 % less bytes for brotli -11 than for zstd -22.
> Of course it is an unfair example because of the static dictionary that brotli uses
It is certainly a favorable ground for Brotli.
Brotli claims an advantage in html compression, thanks to its integrated specialized dictionary.
The real pb though is the suggested conclusion that these favorable results are broadly applicable everywhere else.
That's a terrible suggestion.
We need more examples, not just "html files" which happen to be Brotli's best case.
> brotli 0.4.0 -11 compresses to 25413 bytes
> zstd 0.7.1 -22 compresses in 4.01 MB/s to 28363 bytes
Why you don't disclose the compression time of brotli ?
Of course it does matter : everyone understand that an algorithm that spend 10x more cpu has the budget to compress more.
Here, you don't disclose the compression ratio of both algorithm, implying they are equal.
By such standard, LZ4 is probably the best : it's so much faster !
Of course, they do not compress the same...
I was initially thrilled at your detailed answer,
but now, quite frankly, I feel cheated. Grossly so.
This is really disappointing.
I was so much vexed that I decided to run the tests myself.
Downloading and using __the same html file__,
the same lzbench, same library versions, just a different computer and compiler,
here is what it produced :
__Conclusion__ :
brotli -0 is indeed fast, faster than in my previous tests.
It seems to be tuned to reach this objective, but throw away a lot of compression ratio to get there.
Consequently, brotli -0 is not comparable to zstd -1,
it takes brotli -2 to produce an equivalent compressed size .
By that time though, zstd is much, much faster.
Which is exactly the question I'm trying to get answers to :
which algorithm compresses better for a given speed budget ?
That's what matters, at least in our datacenter.
I'm not interested in ultra slow mode, but while at it,
I wanted to complete the picture with the missing compression speed of brotli - 11.
It produced :
zstd -22 : 2.95 MB/s
brotli -11 : 0.53 MB/s
So that's > 5x difference. It surely helps to reach better compression ratios.
I also wanted an answer to "by how much the dictionary helps ?".
Fortunately, TurboBench can help, thanks to a special mode which turns off dictionary compression.
By using it on the very same sample, brotli -11 compressed size increases
from 25413 to 26639 bytes. 5% larger, clearly not negligible.
Still good, but it cuts the advertised size difference in half.
Anyway, clearly I feel disappointed to have to redo the tests myself,
because some inconvenient results were intentionally undisclosed (or not produced).
This really undermines my trust in future publications.
That learned me something : trust only benchmark done by yourself.
And now, I should probably benchmark even more ...
If you read your own results with care you will notice that your results also show that fastest brotli is faster than fastest zstd. Your results also show that brotli compresses more than zstd at higher settings, even if you turn off the static dictionary of brotli.
If you are interested at zstd 0.7.1 -22, you can reach the same compression density with brotli 0.4.0 at quality setting -7 (at least for this file with the static dictionary). Then you are comparing a brotli's compression speed of 57 MB/s to zstd's 4.01 MB/s. Brotli is 14x faster at this compression density.
For decompress-once roundtrip at this density, brotli achieves 53 MB/s, and zstd 4 MB/s. Brotli's roundtrip is 13x faster.
In decompression brotli is 2x slower on Intel, but decompression times at 800+ MB/s are going to be negligible in most use (think < 1 % of cycles in your datacenter), if the data is parsed/processed somehow afterwards.
Brotli's entropy encoding is simpler (no 64-bit operations), and because of this on 32 bit arm the decompression speeds of zstd and brotli are about the same.
I acknowledge that there can be use cases where zstd 0.7.1 can be favorable to brotli 0.4.0 -- particularly those where a 32+ MB file is compressed at once and the 150-500 MB/s compression speed range, but even this simple compression test shows that brotli can compress significantly (5-10+ %) more with the higher quality settings.
> Why you don't disclose the compression time of brotli ?
I left it out because zstd didn't produce comparable output size.
I showed the compression and decompression speed at brotli at quality 7 that already compressed more densely than zstd at maximum setting. Of course it is a somewhat technically flawed comparison, because brotli gets an advantage for this kind of data from its dictionary, but I invite you to test using another set of files. (The three small compression benchmark files discussed earlier in this thread show similar trend.)
The two benchmarks you mention aren't very useful for comparing brotli and zstd, because they deal with large datasets. Per-call overhead matters a lot for small files, and I imagine (particularly brotli) is aimed at small files. Zstd (usefully!) calls out dictionary compression, which hints that small files matter for it too, but I'm not positive there's any specific use case for zstd in mind.
In any case, neither the compression ratios nor the speed of large-file compression necessarily say much about small file performance. There's just much more context to search in a 100MB file than there is in a 10KB file.
Having said that, there's no reason to assume brotli is better for small files; there's just no way to tell given the links you provide.
I'm not affiliated with nor use neither zstd nor brotli.
While I'm around, I've been looking for a good compressor for an embedded scenario, where compression time and memory is near-irrelevant (it's done offline), but streaming decompression time and memory use (both in code and scratch memory) is primordial. By memory use, I'm talking a system with single-digit megabyte RAM, and decompressor scratch memory usage in the dozens of kilobytes.
Right now, we're using LZSS because the decompressor code is tiny (hundreds of bytes), and scratch memory is in the single-digit kilobytes.
Any suggestions? All benchmarks I can find online are focused on desktop usage (where comp/decomp memory use is either not mentioned or in the megabyte range) or web streaming usage (where total comp+decomp time is important. I don't care (within reason) about comp time)
I've had good results with LZJB[1] (designed by Jeff Bonwick for ZFS) in embedded scenarios, perhaps at least worth testing for you?
I even have some public code for it, there's Python module [2] that will adequately address your wish for mediocre offline-friendly performance (heh) and a streaming C decompression library [3].
The runtime memory usage for the decompressor are tiny (which is good, since from my perspective your system sounds huge!) so that should be fine at least.
Very interested in any feedback you might have, of course.
Now I have a huge hankering to whip up a frame-based allocator, as used to be popular in game and graphics programming. For those unfamiliar, the idea is that you segregate allocation of memory objects of different sizes into their own fixed blocks. This provides a means to control memory fragmentation, and avoids hitting the typically slow system allocator with a lot of small allocations/deallocations.
Fancy frame allocators would actually use a fairly standard malloc-style interface, but transparently place allocations of different sizes in their own regions. The allocator would internally use the slow system malloc to create and expand the fixed blocks it uses for its own allocations. During development, the frame allocator would be used to profile and optimize the actual object allocation sizes being made. E.g. sometimes it might make sense to pad some objects to consolidate the number of allocation-size regions. As the project progresses, the initial block allocations for each allocation size would be adjusted so that the system malloc was called very infrequently, usually only at application startup.
It might be possible that jemalloc already does this? Rust uses it by default, and while I don't recall the specifics jemalloc does a bunch of things in userspace wrapped around malloc that make it better than malloc.
Rust makes it easy to swap out the allocator too, so I'd love to see an allocator lib specifically focused on size management :)
Does jemalloc allow you to report telemetry and feed that back into the initial pool sizes? That's what makes the gamedev oriented allocators interesting.
FWIW a "Frame allocator" was always a block of memory where malloc returned pointer into the block and advanced the pointer. At the end of rendering a frame you just reset the pointer to the start of the block. The downside is you need to guarantee that nothing in has a lifetime longer than a frame, something Rust should be good at :).
I think what the GP is referring to is a pooling allocator that pools similar sized allocations.
> Does jemalloc allow you to report telemetry and feed that back into the initial pool sizes? That's what makes the gamedev oriented allocators interesting.
Not AFAIK. There are a lot of optimizations it's quite difficult to do with dynamic pool sizes, which aren't worthwhile in the general case.
I'm having a hard time understanding the motivations for porting to rust...
If they were going to run the whole thing in a SECCOMP container anyway, there is little damage a compromised C library could do.
If reasoning about uninitialized memory would take a review of the entire brotli code base, didn't the rust port require that anyway? (speaking as someone who has done a couple of cross-language rewrites)
> If they were going to run the whole thing in a SECCOMP container anyway, there is little damage a compromised C library could do.
They can't block everything with SECCOMP, the code still has to be useful. It also must still do it's job correctly, which it won't if it's compromised.
> If reasoning about uninitialized memory would take a review of the entire brotli code base, didn't the rust port require that anyway? (speaking as someone who has done a couple of cross-language rewrites)
But even if you go through the whole original codebase (which I assume was in C or C++), and kept it in that same memory unsafe language, you don't have any guarantees that you caught even a subset of possible classes of memory safety bugs.
Preventing compromise and ensuring determinism using checksums and SECCOMP is more reliable and easier to implement than reimplementing in rust an already de-facto secure library used millions of times every day.
I'm sorry but the Rust rationale just isn't there. Porting to Rust is a strictly less efficient way of accomplishing the same exact goal.
Now, writing a new library in rust, that's a different story and actually is reasonable.
People were finding buffer overflows in zlib when it was more than twice as old as Brotli, and they're still finding bugs in libpng. I don't see any reason to think that Brotli is "de-facto secure" or even mature. If history is an indication it takes decades to shake the buffer overflows out of even a very popular C library.
Your argument is certainly reasonable. One case where it might matter is if SECCOMP weren't available on all platforms that needed to decompress data. Also: a vulnerability could decide to only strike after a certain clock date. In that way it could sneak past certain checksum checks and still cause issues later.
On the client side there is already a vast amount of C-code that is interpreting file data, the brotli portion is a minimal part of that security risk. Porting brotli to rust for the purposes of increased security is like optimizing the part of your program that takes the least time. Lots of effort for minimal gain.
> If they were going to run the whole thing in a SECCOMP container anyway, there is little damage a compromised C library could do.
Not really. For instance, a compromised library can still rewrite the file you're storing in DropBox to contain different contents, which could ultimately result in remote code execution when you redownloaded it. Seccomp only makes sure the decompression server is safe from a rogue process.
The file's sha256sum can be verified before the file is sent to any users, so there's no chance of RCE there, even with a hypothetical C brotli-- but a reproducible decompression is key.
Additionally, if you want to do the decompression on client-side, facilities like SECCOMP simply may not be available on that platform. And in that case, having a language like Rust to guard against RCE is an excellent idea. Also it is easiest to maintain the same code running on all platforms rather than C code where SECCOMP is available and Rust code where it is not.
Is the brotli decompression step really the most dangerous vector on the client-side? What about all the non-verified client-side native code that actually interprets file data? From a practical perspective using the C code doesn't deteriorate existing conditions and using the Rust code doesn't improve them.
> If reasoning about uninitialized memory would take a review of the entire brotli code base, didn't the rust port require that anyway? (speaking as someone who has done a couple of cross-language rewrites)
Yes, and now they can rely on the Rust compiler to help keep them safe whenever they make a change. If they had stayed in C, any changes would require more manual reasoning about uninitialized memory.
That's assuming this library will change often which I doubt since it's not upstream. Cost-benefit falls short here.
This "future changes" logic would apply better if this were an upstream library that were expected to change often enough to offset the upfront cost of porting an entire library.
> If they were going to run the whole thing in a SECCOMP container anyway, there is little damage a compromised C library could do.
True, especially since they appear to be using a traditional seccomp sandbox, which means the process really can't do much.
Based on the architecture there is no broker process involved, you simply feed in data via read() and it writes the data out via write(). It has access to two other system calls by virtue of seccomp, so your kernel attack surface is quite small - but it is there.
The "no hostile process could escape the sandbox" is not really true since it assumes a perfect implementation of each system call available.
That said, kernel attack surface is drastically lowered, as are the capabilities of the process. But it can still read and write arbitrary files - unless, of course, they filter on fd, which I suppose they likely are.
As for determinism and robustness, those are still valid. To quote your other comment:
> an already de-facto secure library used millions of times every day.
The fact that it is used often does not mean it is "de-facto" secure. How could it? You can not prove it is secure, and doing so would be a huge pain. Whereas, given safe rust, you have strong guarantees about what can and can't happen in the code.
Binaries in a SECCOMP cannot by any means read/write arbitrary files. They would need access to open(), socket(), etc. there is only read(), write(), exit() and sigreturn(). Only the intentionally inherited file descriptors are accessible.
As for your "kernel attack surface is lowered" argument, it's incredibly silly. If read() and write() ever had a vulnerability we have much bigger problems than binaries in a SECCOMP. It's like saying SHA256 isn't secure because there is technically a risk of collision.
Determinism and robustness aren't valid, they can use checksums to verify data coming out of the SECCOMP container. Same effect, much less effort.
And for your argument against de facto security: security reasoning isn't binary. It's a risk based assessment. Nothing is 100% secure and context matters. Binary thinking and dogma are incredibly dangerous. Spouting "Rust is secure" is dogma, it's not true. Rust is "99%" secure, unknown unknowns will always be out there.
In this case, context is important. My point isn't that c-brotli is as secure as rust-brotli. My point is that using c-brotli doesn't introduce any significant new security risk on the client-side given that there are lots of other c-based libraries already interpreting file data. Given that context, reimplementing brotli in rust for the purposes of increased security only marginally accomplishes its goal. It's like optimizing the part of your program that takes the least time, lots of effort for almost no gain.
On the server side, there's no gain at all to porting c-brotli to rust since you can already use SECCOMP and checksums. It's just wasted effort.
Am I saying there is no valid reason to port things to Rust? No. I'm saying it's invalid to port a library to rust for security reasons if you already have effective means of mitigating binary attacks, I.e. SECCOMP and checksums.
Since most of Brotli's improvements over its ancestor LZ77 are due to its large, hardcoded, text-corpus dictionary [1], most of the algorithm's strengths would be wasted on binary data like images.
Zopfli, from the same people, is a DEFLATE encoder, so it can be used in PNG [2] and this has already been added to some optimizers, e.g. AdvanceCOMP [3]
About 25 % of compression improvements for short files (such as web pages) come from the static dictionary. The rest are format improvements. The relatively small static dictionary does not improve the compression of long files.
Interestingly, there are also 121 different transformations [1] you can apply to each dictionary word, from adding various prefixes and suffixes, trimming letters, and some more complex ones [2].
If the plain-text dictionary linked earlier [3] is accurate, it'd appear that the dictionary contains a lot of redundant forms.
Brotli is not as good as an image compression algorithm because it is strongly tuned to the "typical" workload for Web (have you looked at the preset dictionary? :-), not images.
I would instead suggest FLIF [1], which roughly consists of an image-specific context model, adaptive entropy coding and general interlacing (that makes any sufficiently long prefix of the file a valid approximation to the original image). Still in development, but seems very promising.
https://github.com/frewsxcv/afl.rs (discussion https://news.ycombinator.com/item?id=11936983)
I don't think they mentioned fuzzing their Brotli decompressor in this article, but I hope they try afl-rust.