Why does LLVM generate "ud2" instructions? WTF?

pascal_cuoq · on May 12, 2011

Because it can. This is undefined behavior we are talking about.

KonradKlause · on May 12, 2011

Sorry, I don't get it. The Intel manual says this about UD2:

"Raises an invalid opcode exception in all operating modes."

What is undefined here? LLVM must not generate such instructions unless it really wants such an exception. (Like Linux's panic() does on x86)

schrototo · on May 12, 2011

The original piece of C code has undefined behaviour, meaning LLVM can generate anything it wants. It happens to generate ud2 instructions (because it's better to crash hard and fast) but it could just as well print "puppies puppies puppies" a million times.
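To make the "crash hard and fast" idea concrete, here is a minimal sketch of asking for the same kind of trap explicitly instead of leaving it to the compiler. It assumes GCC or Clang, where `__builtin_trap()` is a real builtin that is typically lowered to ud2 on x86. The function name `checked_div` is hypothetical and is not taken from the article under discussion.

```c
#include <limits.h>

/* Hypothetical helper: integer division that traps on the two inputs
   for which C leaves the result of `a / b` undefined. */
int checked_div(int a, int b)
{
    if (b == 0 || (a == INT_MIN && b == -1)) {
        /* GCC/Clang builtin; on x86 this is typically emitted as ud2. */
        __builtin_trap();
    }
    return a / b;
}
```

Without the check, a compiler is allowed to assume the undefined cases never happen, so whatever it emits for them (including a bare ud2) is conforming.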

repsilat · on May 13, 2011

> it's better to crash hard and fast

I'm not so sure about that. If you need to squeeze out a little more performance, code that is "technically undefined" can be more portable than dropping to ASM.

I think LLVM should emit a warning on code with undefined semantics and generate DWIM instructions instead of UD2s.

ToastOpt · on May 13, 2011

It really depends on your motives. If you will need to port to new platforms in the future, it's better to have a hard-and-fast crash now, so you can learn about the undefined behavior and avoid it today, rather than face a large bug backlog years down the line.

But you're correct that sometimes it can be expedient to exploit such technically undefined behavior. (I've committed this sin myself, most commonly in serializers/deserializers)
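The comment doesn't say which shortcut is meant, so the following is only an assumed example: reading an integer out of a byte buffer by casting the pointer. A sketch of the well-defined alternatives, with hypothetical function names `read_u32_native` and `read_u32_le`:

```c
#include <stdint.h>
#include <string.h>

/* The tempting shortcut is `*(const uint32_t *)p`. That is undefined if p
   is misaligned for uint32_t, and it also breaks the aliasing rules when
   the bytes were not originally a uint32_t object. */

/* Well-defined: copy the bytes into a properly typed object.
   The result uses the host's byte order. */
uint32_t read_u32_native(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Well-defined and independent of host byte order:
   decode a little-endian value one byte at a time. */
uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Mainstream optimizing compilers usually turn the `memcpy` form into a single load where the target allows it, so the portable version rarely costs anything.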

KonradKlause · on May 12, 2011

Thanks.

simias · on May 12, 2011

To make him or her aware of the issue? I'd rather have my program crash at a well identifiable point in the execution flow than start acting rogue for no obvious reason (in both debug and production environments).