When the author pointed out that bitcode may impact security, I expected to read about how the optimizations applied when compiling LLVM IR down to machine code might introduce security issues.
However, the article only mentioned that decompilation is easier with LLVM IR, because it is a higher-level representation than machine code. That is certainly a valid point, but it addresses binary obfuscation rather than algorithmic security.
I'm thus wondering if anyone can shed some light on the real security aspects. For example, let's say I compile C code that is supposed to run in constant time down to LLVM IR and submit it to Apple. Does Apple guarantee that their black-box optimizations will not introduce branches or other sources of variable timing into a constant-time algorithm? Can I do anything to ensure that my code will always run in constant time despite unknown optimizations being applied to it in the future?
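To make the concern concrete, here is a minimal sketch (my own illustration, not anything from the article) of the kind of routine I have in mind - a comparison written so that its running time does not depend on the input data:

```c
#include <stddef.h>
#include <stdint.h>

/* Constant-time comparison sketch: it always touches every byte and
 * accumulates differences instead of returning early, so the running
 * time does not depend on where (or whether) the inputs differ. */
int ct_compare(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= a[i] ^ b[i];              /* no data-dependent branch */
    /* Collapse to 0 (equal) or 1 (different) without branching. */
    return (diff | (uint8_t)-diff) >> 7;
}
```

Nothing in the bitcode records that the absence of branches is intentional, so as far as I can tell a future backend would be free to rewrite the loop with an early exit or some other data-dependent shortcut.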
It seems trivially obvious to me that there is no way to guarantee that any code you write won't be transformed into something with an arbitrary runtime under arbitrary "optimization" (transformation).
As in, you can have just a single numeric constant in your code and Apple is permitted to insert a for loop that counts up to that constant.
I only skimmed, but I found several mistakes. With bitcode, you still get one bitcode slice per arch, since LLVM bitcode isn't platform-independent. Pointers, for example, are sized differently, which causes the same C file to compile to different bitcode; preprocessor macros differ, which also causes the same C file to compile to different bitcode; and so on. App thinning can be implemented without bitcode, so that whole "What problems Apple’s Bitcode aims to solve?" section is bogus.
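To illustrate the arch-dependence (a toy example of mine, not something from the article): the preprocessor and the target's data layout are applied before bitcode is ever produced, so the same source yields different IR for armv7 and arm64.

```c
#include <stdint.h>
#include <stdio.h>

/* The IR emitted for this file bakes in target-specific facts:
 * sizeof(void *) becomes a literal constant, and the #if below is
 * resolved by the preprocessor before LLVM ever sees the code, so
 * 32-bit and 64-bit targets produce different bitcode. */
#if defined(__LP64__)
typedef uint64_t word_t;   /* 64-bit targets such as arm64 */
#else
typedef uint32_t word_t;   /* 32-bit targets such as armv7 */
#endif

int main(void)
{
    printf("word: %zu bytes, pointer: %zu bytes\n",
           sizeof(word_t), sizeof(void *));
    return 0;
}
```

Compile it with `clang -S -emit-llvm` for two different `-target` triples and diff the output: the types and constants differ.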
Agreed. My impression is that the main benefit of bitcode is that it allows Apple to rebuild apps to fix compiler backend bugs and perhaps to introduce improved micro-optimizations for newer CPU generations within the same architecture, but it certainly does not work as a "Java bytecode" layer - it definitely wouldn't enable Apple to switch to a completely different CPU architecture in newer iPhones or Macs. The LLVM IR and the Objective-C/Swift/libc ABIs are too architecture-specific to work as a portability layer.
> my impression is that the main benefit of bitcode is that it allows Apple to rebuild apps to fix compiler backend bugs and perhaps to introduce improved micro-optimizations for newer CPU generations within the same architecture
This is my impression, too, but it's always seemed strange to me - are the rare fix for a compiler bug and a few micro-optimizations worth all the effort?
The "What problems do Bitcode introduce?" section is a bit silly too: Production binaries you ship don't contain bitcode. For store apps, the app store will strip it, for non-store apps you strip it yourself. It's like shipping your app with debug info, that also makes reversing easier.
Finally, the bitcode_retriever code is fine, but you could get the same functionality with three lines of shell: `otool -l mybinary | grep -A 4 __bitcode` to find the offset and size of the part of the file containing the bitcode, then `dd` to extract it.
Another problem with the article is its description of how compilers work. In my experience over the years, the compilers that use assembly language as an intermediate form, and therefore require an assembler, are in the minority; most compilers I have encountered go straight from (lowered) IR to machine code generation.
As bla2 points out, it seems that the author has got entirely the wrong end of the stick. App thinning isn't related to bitcode; it's about delivering a program image that contains only the code for the instruction set architecture of the downloading client. LLVM bitcode use, rather, is about holding off generating that final machine image until the point of download instead of when the developer compiled the program, theoretically allowing improvements in code generation and IR lowering to be taken advantage of after the developer has provided a compiled application to Apple.
Understood, but I suppose what I meant was that you could write an emulator that works at a higher level than machine code, one which could interpret the IR on a foreign arch.