Writing a JPEG Decoder in Rust – Part 2: Implementation I (mht.technology)
158 points by adamnemecek on Aug 19, 2016 | 80 comments



If you scan through the source (on github), the most notable thing is the lack of the "unsafe" keyword. I've seen too many people basically transliterate from C to Rust, along with all the unsafe operations. This one is pretty much unoptimized, but still seems to be performant enough, and doesn't do anything "unsafe".

That's not to say you could just throw this into the back-end of a public-facing website. It's still going to panic (abort) if something unexpected happens, and it's still going to chew indeterminate amounts of CPU, memory or storage unless sandboxed (maybe even just ulimit). And there's still the danger that one of the Rust standard library functions has a flaw in it. But this is the kind of starting point you wouldn't get with the plain C equivalent.


Given the history of vulnerabilities in tools like ImageMagick, that panic might still be a better position than an exploitable memory bug.


If you run it in a spawned thread, the thread will actually just unwind and disappear if it panics (by default).
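A minimal sketch of that behavior (my own example, not from the article):

    use std::thread;

    fn main() {
        // The panic unwinds only the spawned thread...
        let handle = thread::spawn(|| {
            panic!("unexpected byte");
        });
        // ...and the parent just sees an Err from join() instead of aborting.
        assert!(handle.join().is_err());
    }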

But to detect panics, you can use `afl.rs` to fuzz a parser: https://github.com/frewsxcv/afl.rs#user-content-trophy-case


Given the fact you can avoid panic and exit safely, all panicking is just laziness ;)


Isn't it the case that at least some unsafe features in C actually prevent compiler optimizations? Being safe should not necessarily mean being slower. Sometimes restrictions are enablers for better compilers.
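For example (my own sketch): a `&mut` slice and a `&` slice can never alias in Rust, a guarantee C only gets with `restrict`, so the optimizer is in principle free to reorder or vectorize this loop:

    // `dst` and `src` are statically known not to overlap.
    fn add_into(dst: &mut [f32], src: &[f32]) {
        for (d, s) in dst.iter_mut().zip(src) {
            *d += *s;
        }
    }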


I don't know Rust, but what happens if you get a 0xff as the last byte here? Wouldn't i+1 go out of array bounds?

    while i < vec.len() {
        encoded_data.push(vec[i]);
        if vec[i] == 0xff && vec[i + 1] == 0x00 {
            // Skip the 0x00 part here.


IIRC, Rust will panic and halt your program. This is one of a handful of errors that Rust can't catch at compile time, and so it inserts run-time checks.

The syntax "vec[i+1]" means "I expect this index to be in bounds, and if it's not, my program is so hopelessly broken it should be shut down immediately."

If you're really confident in your code and you need to wring out the last bit of performance, you could also use:

    unsafe { vec.get_unchecked(i+1) }
...but in this case, your code would produce undefined behavior. If you want to detect this case and recover from it, you could also write:

    if let Some(value) = vec.get(i+1) {
      // Use value.
    } else {
      // Fail with an error.
    }
...or you could immediately exit the current routine with an error if no value is available:

    let value = try!(vec.get(i+1).ok_or(make_an_error()));
If you're writing a lot of parser code, your best bet might be to define a custom macro that looks something like "try_get!(vec, i+1)", wrapping the final example above.
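A sketch of what such a macro could look like (`make_an_error` is the same placeholder as above):

    macro_rules! try_get {
        ($vec:expr, $idx:expr) => {
            match $vec.get($idx) {
                Some(v) => v,
                // Assumes the surrounding function returns a Result.
                None => return Err(make_an_error()),
            }
        };
    }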


  > and you need to wring out the last bit of performance,
... and if you've already demonstrated that LLVM hasn't removed the bounds checks itself already.
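For instance, a pattern like this (my own sketch) usually lets the optimizer drop the implicit checks without any unsafe:

    fn peek_pair(vec: &[u8], i: usize) -> Option<(u8, u8)> {
        if i + 1 < vec.len() {
            // The explicit comparison above typically lets LLVM elide the
            // bounds checks on both indexing operations in optimized builds.
            Some((vec[i], vec[i + 1]))
        } else {
            None
        }
    }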


Since Rust people are reading this.

I find myself wanting to walk an iterator pairwise regularly for stuff like this and I'm using:

    for (&cur, &next) in vec.iter().zip(vec.iter().skip(1)) {
        encoded_data.push(cur);
        if cur == 0xff && next == 0x00 {
            // ...
        }
    }
Is there a better way to write this? What I'd really like is the equivalent of Clojure's partition[1]. I know that the stdlib has windows `(partition n (dec n) seq)` and chunks `(partition n n seq)` implemented on slices, but I usually want this on iterators for the lookahead-like behavior shown here.

[1] https://clojuredocs.org/clojure.core/partition


I'm not 100% sure, but https://crates.io/crates/itertools might have what you're looking for.
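If I'm reading the docs right, its tuple_windows adaptor does roughly this (a sketch, assuming a recent itertools version):

    extern crate itertools; // assumes itertools as a dependency
    use itertools::Itertools;

    fn find_markers(vec: &[u8]) {
        // tuple_windows yields overlapping pairs, like the zip/skip trick above.
        for (cur, next) in vec.iter().cloned().tuple_windows() {
            if cur == 0xff && next == 0x00 {
                // ...
            }
        }
    }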



The `.windows(usize)` function almost does this, but it uses a runtime width and so has to yield slices rather than tuples.
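For comparison, the slice version would look like this (sketch):

    fn find_markers(vec: &[u8]) {
        // windows(2) yields overlapping &[u8] slices of length 2.
        for w in vec.windows(2) {
            if w[0] == 0xff && w[1] == 0x00 {
                // ...
            }
        }
    }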


If you can afford the allocation at every iteration, the iterslide crate might work for you.

https://github.com/slapresta/rust-iterslide


Ooops :( One of many minor bugs, I suppose.


This is a really elegant way to take parallel arrays and put them into a table:

  let codes: Vec<HuffmanCode> = data_table.iter()
            .zip(code_lengths.iter())
            .zip(code_table.iter())
            .map(|((&value, &length), &code)| {
                HuffmanCode {
                    length: length,
                    code: code,
                    value: value,
                }
            })
            .collect();


Nice!


It seems like a lot of work? In scala this would just be:

    val codes = (data_table, code_lengths, code_table).zipped map HuffmanCode


Rust is not Scala


It's not, but they're both ML-family languages. I had hoped that Rust would be able to offer a similar level of elegance to Scala.


The biggest difference between Scala and Rust here is ".iter()", which is necessary mainly because you need to specify whether the iterator iterates over references (".iter()") or passes results by value (".into_iter()"). Because the JVM doesn't have value types and has no concept of move semantics, Scala has no need for the distinction, while Rust does.

In effect, the "elegance" of Scala here is really the product of its heavy GC dependence. So the comparison isn't as meaningful as it might seem.
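To make the distinction concrete (a trivial example of my own):

    fn main() {
        let v = vec![1u8, 2, 3];

        // .iter() borrows: yields &u8, and `v` is still usable afterwards.
        let refs: Vec<&u8> = v.iter().collect();
        println!("{} refs, original still around: {:?}", refs.len(), v);

        // .into_iter() moves: yields u8 by value, consuming `v`.
        let owned: Vec<u8> = v.into_iter().collect();
        println!("{:?}", owned);
    }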


The example can be written like this though:

  let codes: Vec<HuffmanCode> = data_table.iter()
            .zip(&code_lengths)
            .zip(&code_table)
            .map(|((&value, &length), &code)| {
                HuffmanCode {
                    length: length,
                    code: code,
                    value: value,
                }
            })
            .collect();


In fact `code_lengths` and `code_table` are never used again so should just be passed by value, which saves four more `&`s.


Wouldn't the type guarantees provided by Rust's type system allow for different implementations based on whether it was called on value types vs reference types? I thought there was an auto specialization feature created a while ago that did something like that.


It allows changing implementation depending on what you pass it, but it cannot change how you pass things. The issue is that if you just call zip(code_lengths), then ownership of code_lengths is passed into the function and it can no longer be used in its parent scope.
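Roughly (my own toy example):

    fn main() {
        let a = vec![1u8, 2, 3];
        let b = vec![4u8, 5, 6];

        // zip() takes its argument by value, so `b` is moved into the iterator...
        let pairs: Vec<(&u8, u8)> = a.iter().zip(b).collect();
        println!("{:?}", pairs);

        // ...while `a` was only borrowed and is still usable here.
        println!("{}", a.len());
        // println!("{}", b.len()); // would not compile: `b` was moved
    }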


I think the IntoIterator version borrows the references (did I say that right?)


Looking at the example, I can't find a thing in there that is not required (except the obvious `:Vec<_>` type that is not required). We need `.iter` to say that the iterator is immutable (there is a mutable version), and we need `.collect` to actually run the iteration. I also don't think objects should have default constructor functions.

It may be possible to implement `.zipped` on tuples though.


Yep, but it's still an unfortunately big difference in expressiveness between the two languages, where one might think that they would be more or less on par, given the languages' similarities (both are more or less contemporary in their design, statically typed, with similarly powerful type systems, etc).

My question is then: is there something fundamentally preventing Rust from achieving similar levels of expressiveness to this Scala example without incurring unnecessary runtime overhead? Given that both languages are statically typed, my inclination would be to say no: the information is there, the compiler should be able to figure it out. But, alas, I know very little about these things, hence my question :)

Update: pcwalton gave some nice insight into why Rust needs some of these constructs in an uncle comment.


Actually Rust already has izip and IntoIterator, which cover the early differences. The type annotation `Vec<_>` is also optional since it can be inferred from the context.

The `map` is uglier in Rust because

* the comparison was against a constructor with positional, rather than named, arguments

* the Scala code didn't need to dereference any arguments, and

* Scala's `Zipped` is a special type with a special `map` function that takes three arguments, unlike a normal iterator.

The first and last points could be easily copied in Rust: you'd build a constructor for HuffmanCode and augment iterators of tuples with a starmap method (that can be done in a library). The middle point can be done before the zipping. The result would be

    let codes =
        izip!(data_table.iter().cloned(), code_lengths, code_table)
            .starmap(HuffmanCode::new)
            .collect();
Rust's collect is never implicit the way Scala's CanBuildFrom is. This prevents accidental collects, which helps with writing fast code, but in principle I don't see why it couldn't be implicit - it would just require the whole standard library to be overhauled.


Actually the last two `.iter()` aren't needed at all because of IntoIterator. All three can be removed if you use Itertools' izip!

    let codes: Vec<_> =
        izip!(&data_table, code_lengths, code_table)
            .map(|(&value, length, code)| {
                HuffmanCode {
                    length: length,
                    code: code,
                    value: value,
                }
            })
            .collect();
(If HuffmanCode was a tuple type, this could even be

    #![feature(fn_traits)]

    let codes: Vec<_> =
        izip!(data_table.iter().cloned(), code_lengths, code_table)
            .map(|args| (&HuffmanCode).call(args))
            .collect();
but now I'm just playing around.)


  > (except the obvious `:Vec<_>` type that is not required)
It is required; otherwise `collect()` doesn't know what type of collection to collect into.
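A quick illustration of the ambiguity:

    fn main() {
        // The same iterator API can build different containers,
        // so the target type has to come from somewhere.
        let v: Vec<u8> = (0u8..4).collect();
        let s: String = vec!['a', 'b'].into_iter().collect();
        println!("{:?} {}", v, s);
        // let w = (0u8..4).collect(); // error: type annotations needed
    }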

  > It may be possible to implement `.zipped` on tuples though.
This is impossible without varargs, no?


> This is impossible without varargs, no?

Nah, you'd just implement it as an impl over and over on tuples of reasonable size (up to 16 or so).

It wouldn't be elegant, but it'd work in practice.
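Something along these lines (a hypothetical sketch; the names are made up, and the items come out nested as ((&a, &b), &c) rather than as a flat triple):

    use std::iter::Zip;
    use std::slice::Iter;

    // One impl per tuple arity, each just chaining the existing zips.
    trait Zipped3<'a, A, B, C> {
        fn zipped(self) -> Zip<Zip<Iter<'a, A>, Iter<'a, B>>, Iter<'a, C>>;
    }

    impl<'a, A, B, C> Zipped3<'a, A, B, C> for (&'a [A], &'a [B], &'a [C]) {
        fn zipped(self) -> Zip<Zip<Iter<'a, A>, Iter<'a, B>>, Iter<'a, C>> {
            self.0.iter().zip(self.1.iter()).zip(self.2.iter())
        }
    }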



Oh right, I see now.



> `collect()` doesn't know what type of collection to collect into.

Yes, unless there is some other mention of Vec, e.g. when it is returned from the function.

> This is impossible without varargs, no?

Varargs would definitely help, and would allow Rust to borrow more ideas without workarounds. But one can do it manually for each tuple variant.


Haskell just brute-force implements zipWith3, zipWith4, etc.

Not elegant, but it works.


The non-brute-force way is to use the Applicative instance of ZipList.


Scala and Rust are not trying to solve problems in the same way though. Rust is a zero-overhead language. Scala has never tried to make that claim.


It's a bit of a stretch to call Scala and Rust "ML-family languages".


Not at all -- there's a lot of commonality in the language designs.

Core language features: both have algebraic datatypes like ML ("case classes" in Scala, enums in Rust), and a 'match' statement that makes working with them easy. Both have pervasive destructuring that works with match arms, 'let' bindings, etc.

Data structures and mutation: both encourage a pragmatic version of immutability -- use values and provide functional interfaces primarily, but support mutation where needed. In Rust there are 'Cell' and 'RefCell' types, just like ML's cell-based interior mutability.

Library idioms: all three have the standard set of functional-style higher-order transformation functions ('map', 'filter', etc. -- Rust in particular has a very rich Iterator trait API), and it's idiomatic to use these.

FWIW, Rust's first/bootstrap compiler was written in OCaml and there's a lot of obvious influence.
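A tiny made-up example of what those pieces look like in Rust, for anyone who hasn't seen them:

    enum Marker {
        StartOfFrame(u16),
        Huffman { class: u8, id: u8 },
        EndOfImage,
    }

    fn describe(m: &Marker) -> String {
        // ML-style destructuring in the match arms.
        match *m {
            Marker::StartOfFrame(len) => format!("SOF, {} bytes", len),
            Marker::Huffman { class, id } => format!("DHT class {}, table {}", class, id),
            Marker::EndOfImage => "EOI".to_string(),
        }
    }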


I'm aware that Rust's bootstrap compiler was written in Ocaml. And yes, there are lots of ML inspirations in the language -- but not enough to make an ML language. Where are the module functors, for example? Many would consider that an essential element of an ML language, and it is a stretch to call a language an ML family member without them.

I don't want this to devolve into a No True Scotsman debate, I just feel that the parent post -- which decried a lack of similarity between Rust and Scala because they were both in the ML family -- was founded on a weak premise.


A module functor is nothing more than a higher order module...one that takes as a parameter some data structure that conforms to an interface (signature). Rust may not have constructs called module functors, but it definitely has constructs that are capably equivalent, as well as some that are even more powerful (a claim that applies to Scala as well).

For me, type inference, strong typing, and ADTs with pattern matching puts a language into the ML family quite easily.


All right, I concede. :) Thanks for sharing your views.


Module functors were an SML feature - not really in the original 'ML' language. I would put Haskell in the ML family, and it doesn't have module functors. Rust has type inference, and a core type syntax and semantics that is heavily inspired by it.


I mean, Haskell doesn't have module functors, and I'd certainly claim Haskell is in the ML family. If Haskell isn't, what would be?


I think ML-family is fine. Scala is much closer to ML than what people sort into the Algol-family or C-family groups.


Yeah, though there's something a bit sad about the lack of variable-arity zip/map as you'd find in dynamically typed languages, e.g. in Clojure:

    (mapv HuffmanCode. code_lengths code_table data_table)
I don't know if even dependent types would allow for that, given the (variable number of) arguments are all different types.


It's straightforward with dependent types, unless the language treats multiple arguments (or functions of multiple arguments) specially. In my Scala example in the sibling thread the ().zipped is sort-of-language-level, but one could easily implement the same thing using a HList instead of a tuple; turning the function into a function that accepts a HList requires language-level support or a (standardized) macro but it wouldn't in a Haskell-like language where a function of multiple arguments is curried by default.


I feel like I'm missing something.

In the "Why" section he claims that Rust is a performant language with a low-level feel.

In this part, he claims that decoding a tiny JPG image took 2 seconds with his code.

How is that "performant" by any definition?


Somebody on reddit found the bottleneck: https://www.reddit.com/r/rust/comments/4yinbt/writing_a_jpeg...

After fixing it, it apparently runs in 100 ms.


It's worth mentioning that those numbers are for my test program, which only reads a single file, using my implementation and his. In addition, the file read was 20 MB; lena.jpeg is 90K.


A language allowing high performance code doesn't mean it automatically guarantees high performance code. The latter is essentially impossible.

One common pitfall is not turning the optimiser on (with --release for cargo), i.e. benchmarking a debug build rather than a release one. Also, from a quick glance, this code looks like it may be doing a system call to read each and every byte of the file, due to the use of .bytes() on an unbuffered file; it would be better to use, say, read_to_end to read many bytes of the file at once.
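For the file-reading part, the fix is roughly this shape (a sketch, not the author's actual code):

    use std::fs::File;
    use std::io::Read;

    fn read_all(path: &str) -> std::io::Result<Vec<u8>> {
        let mut file = try!(File::open(path));
        let mut data = Vec::new();
        // One buffered slurp of the whole file instead of a syscall per byte.
        try!(file.read_to_end(&mut data));
        Ok(data)
    }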


Author here.

I am having some trouble reproducing the 2 seconds number, but with optimizations I'm getting around 1.5 seconds (without, I'm up to 5!), and as balducien mentions, /u/DroidLogician pointed out a _huge_ performance flaw in the file reading code.

Writing bad code is easy in any language (as I just showed :) - writing _super_ performant code is only possible in some.

EDIT: I made a pretty bad mistake while writing this comment, claiming that fixing the file reading code improved the speed of the decoder _a lot_, which was not true at all. In fact, the difference is barely (if at all) noticeable when decoding lena.jpeg. However, when reading a larger file, the improvement is definitely there. So sorry! Replying to HN comments while drinking and watching a movie turns out not to be a great idea after all :)


Ah, ok. From the Reddit thread it sounds like you were using a 20MB file for the 2s result?

From the article it sounds like the 2s result was for the 90 KB file, which would have been rather odd.


Many language implementations generate code with good performance. That does not mean that all programs written in those languages will exhibit good performance.


The code reads the file into a big buffer and then does array operations to index into it, which seems like the opposite of what you want for a good decoder. Using iterators etc. will let you read the file incrementally and will avoid array bounds checking.
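Something like this, for instance (my own sketch):

    use std::fs::File;
    use std::io::{BufReader, Read};

    fn decode(path: &str) -> std::io::Result<()> {
        // BufReader makes per-byte reads cheap, and iterating avoids
        // manual indexing (and its bounds checks) into one big buffer.
        let file = BufReader::new(try!(File::open(path)));
        for byte in file.bytes() {
            let _byte = try!(byte);
            // feed `_byte` into the decoder state machine here
        }
        Ok(())
    }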


Make it work, then optimize.


Rust code looks very similar to ES6 or Typescript. The great CS language convergence has begun!


There's a huge difference between concrete syntax and static/dynamic semantics. I wish more developers would stop judging languages by their covers, but perhaps that is too much to ask...


As far as I know languages have broken down into families that look pretty similar to each other for a long time.


I think there is some convergence. E.g. there seems to be broad consensus that at least some level of standardisation of type information is valuable (see Python adding/improving type annotation syntax), and at the same time that having to explicitly specify the type of absolutely everything probably isn't worth it.


Yes: now we all forget how to correctly manage memory. Rust: where resources are unlimited and aborting when you run out of them is okay.


"Rust: where resources are unlimited and aborting when you run out of them is okay"

I'd change "Rust" to "safe programming" here, but OK.

There have been lots of fun security vulnerabilities coming from programs trying to handle OOM gracefully. e.g. SpiderMonkey: https://bugzilla.mozilla.org/show_bug.cgi?id=982957, https://bugzilla.mozilla.org/show_bug.cgi?id=730415


Rust has provided ways to not abort on OOM for a while now.


If by "a while" you mean several months. But, yes, the recent addition of catch_unwind answered one of my biggest gripes with Rust.

  https://doc.rust-lang.org/std/panic/fn.catch_unwind.html
Does anybody know if the standard library allocator aborts the process or throws a catchable exception? When they added catch_unwind they officially split panics into two types: abort panics and unwind panics. Code can choose one or the other when panicking. Also, programs can be built in such a way that all panics become aborts. That kind of sucks but it's probably one of the compromises needed to assuage those Rust developers who don't believe it's practical to handle OOM conditions.


To be clear, _every_ panic has to have the same implementation; you can't choose unwind vs abort for some panics vs others.

The default allocator crate always does an abort, both before and after this change.


When building an app, does changing the default allocator affect all imported libraries? And is that allocator used for boxing, when boxing requires dynamic allocation?


Currently, "the allocator" is global in Rust. That is, the standard library uses liballoc to do all of its heap allocation, and it's liballoc that aborts on OOM.

You can of course write code that does anything, including use some other mechanism than liballoc.

There's motion on several fronts here:

The first, and currently unstable, is swapping out the implementation of liballoc for a different one. (we ship jemalloc and a pass-through to the system allocator with Rust)

The second, and currently in the "working on an RFC" phase, is per-object allocators.

Both of these are unstable because we're not 100% sure of the interface we want to stabilize yet, it's still a work in progress.

  > when boxing requires dynamic allocation?
Boxing always implies dynamic allocation.
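(For reference, the unstable switch to the system allocator currently looks roughly like this; being unstable, the exact incantation may change:)

    #![feature(alloc_system)]
    extern crate alloc_system; // whole program now uses the system allocator instead of jemalloc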


The vast majority of programs that you run on a day to day basis will abort when they run out of memory. On a default Linux system, there's not even any way to prevent that - you need to faff with the config to make it actually return errors when allocating memory.


That's all irrelevant for a language which is supposed to be a systems programming language, however that term is defined.


I don't think it is - less, bash, gcc, systemd, most interpreters, and Firefox will all pretty much just crash if malloc() fails - they're varying levels of things you'd want a "systems language" for. Unless you're trying to define systems language to mean one used for kernel or bare-metal embedded development (both cases where even in C, you skip the standard library and write your own memory allocation functions etc), there's really very few cases where you (a) are allowed to allocate memory and (b) need to fail gracefully somehow if you can't.


I'm not saying that aborting on OOM never makes sense. I'm saying there are many situations where you shouldn't abort on OOM, and those situations overlap considerably with the situations in which a systems language is ideal[1]. So a systems language that doesn't allow for robust and fine-grained OOM handling isn't much of a systems language.

Example: Basically any highly concurrent network daemon that multiplexes many clients on the same thread. In that case, you want much more control over where to put your recovery point. Even if the process doesn't abort, if the recovery point is beyond the scope of the kernel thread (i.e. a controller thread, which was the recommended solution before catch_unwind), that can be really inconvenient, and also requires a lot of unnecessary passing of mutable state between threads, which is usually something you try to avoid.

[1] Lua has robust support for OOM recovery, which is noteworthy because there are situations where you want to handle OOM but a scripting language is preferable. Example: an image manipulation program with scriptable filters, where you don't want an operation that can't complete to take down your process or thread.


On the other hand... who actually disables over-committing on their Linux servers or the machines they run image editors on so that you could actually handle out-of-memory errors? I've never come across anyone who's changed the setting unless told to by a specific piece of software (almost always database software).


I do. Not only do I disable overcommit, I disable swap.

Most software is riddled with buffer overflows and other exploits, and yet it's rare that you come across an intruder while he's installing his rootkit. That doesn't mean it's not happening, just that people are ignorant about it; and that things can appear normal even with rootkits installed.

Like buffer bloat, people can be experiencing a problem without even realizing it's a problem. When software crashes under load they just think that it's _normal_ to crash under load.

Or when it crawls to a snail's pace under load because it's swapping like mad, they think that's normal, even though QoS would have been much better if the software failed the requests it couldn't serve rather than slowing everybody down until they _all_ time out, sometimes even preventing administrators from diagnosing and fixing the problem.

OOM provides back pressure. Back pressure is much more reliable and responsive than, e.g., relying on magic constants for what kind of load you _think_ can be handled.


> Firefox will all pretty much just crash if malloc() fails

It's not nearly as simple as that. A general approach we try to use is the following.

- Small allocations are infallible (i.e. abort on failure) because if a small allocation fails you're probably in a bad situation w.r.t. memory.

- Large allocations, especially those controlled by web content, are fallible.

The distinction between small and large isn't clear cut, but anything over 1 MiB or so might be considered large. You certainly don't want to abort if a 100 MiB allocation fails, and allocations of that size aren't unusual in a web browser.


That approach may make sense on the desktop, but it doesn't make sense for network servers. On a network server allocations tend to be small. Your 10,001st request might fail not because of a huge allocation but because it just pushed you over the line.

Aborting all 10,000 requests provides really poor QoS. You can get away with that if you're Google, because 10,000 requests is still minuscule relative to your total load across "the cloud". For almost everybody else it will hit hard, and among other things makes you more susceptible to DoS.


So, to be clear, you're implying that you shouldn't write network services in Python, Ruby, PHP, Go, Java in practice [1], Perl, C++ without exceptions, JavaScript, etc. etc.?

[1]: http://stackoverflow.com/questions/1692230/is-it-possible-to...


Malloc can always fail, even when you have overcommit enabled. You can always run out of address space. It's pretty common on mobile, which is still predominantly 32-bit.



Oh well - not getting karma on HN isn't the worst thing that could've happened :)


[0] An old survey of decoding speeds of existing JPEG decoders (2010).

[0] - http://www.briancbecker.com/blog/2010/analysis-of-jpeg-decod...



