
Does anyone understand (or have any theories on) how this actually works? I don't understand how it's possible. Surely they didn't write a Linux version of Rosetta; it must be talking to the host OS somehow—but how? Where is the boundary?



> Surely they didn't write a Linux version of Rosetta

They did just that. The folder share is just for licensing as far as I can see.


I wonder how they handled TSO mode. Do they enable it for the whole VM? Otherwise I can't see how it would work safely, given that translated threads could be context-switched at any time by the guest kernel.


According to Twitter, as soon as you attach the Rosetta volume it switches TSO on.


Either that, or they're not relying on hardware TSO. I haven't yet evaluated which of those two paths they took.


Apple picked always-on TSO.


Huh? Are you saying it can't toggle?

Are you saying that it's always on for VMs?


Always on for VMs with the emulator shared filesystem attached.

That's... a weird way of coding a feature flag, but I guess it "just works".


That's super neat! I wonder if the "licensing trick" could be patched out some day, for use in e.g. Asahi Linux.



It seems pretty clear from TFA - there’s a directory share with the host, and given that Rosetta isn’t an emulator, but rather a translation layer, they don’t need a Linux version: x86 instructions go in, arm64 come out.


But surely that would be too slow? Although Rosetta is great at caching instructions ahead of time, it does need to emulate a lot of code (i.e., anything generated at runtime).


It’s doing an AOT translation of x86-64 opcodes to ARM64 equivalents. There isn’t really any back and forth, it just digests the binary all at once.

This would still be pretty slow (see: Microsoft’s version of this on Windows on ARM), because of the need to issue a ton of memory fence instructions to make ARM’s looser memory model behave like Intel’s. Except that Apple baked the ability to switch the CPU into an Intel-like memory model directly into the silicon.

So in practice it is shockingly fast.
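
Roughly, this is the problem those fences (or hardware TSO) solve. Purely illustrative C, nothing to do with Rosetta's actual code — just the classic message-passing pattern that works for free under x86-64's TSO model but needs explicit ordering on ARM64's weaker model:

    /*
     * Build with: cc -pthread msgpass.c
     * x86-64 (TSO): plain stores/loads already give the ordering below.
     * ARM64: a translator must emit release/acquire (or dmb) equivalents
     * for such access pairs, unless the hardware is in its TSO mode.
     */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static int payload;          /* ordinary, non-atomic data */
    static atomic_int ready;     /* "message is ready" flag   */

    static void *producer(void *arg) {
        (void)arg;
        payload = 42;
        /* x86-64: a plain store suffices; stores aren't reordered with stores.
         * ARM64: needs a release store (stlr) or a barrier before the flag. */
        atomic_store_explicit(&ready, 1, memory_order_release);
        return NULL;
    }

    static void *consumer(void *arg) {
        (void)arg;
        /* x86-64: a plain load suffices. ARM64: needs an acquire load (ldar). */
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;
        printf("payload = %d\n", payload);   /* guaranteed to print 42 */
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, producer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }

With the chip flipped into TSO mode, the plain translation of every load and store is already correct, which is why the translator can skip nearly all of those barriers.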


> It’s doing an AOT translation of x86-64 opcodes to ARM64 equivalents. There isn’t really any back and forth, it just digests the binary all at once.

No, it's not. Apple is not immune to fundamental computer science principles, whatever their marketing team says, and even the original keynote acknowledged that Rosetta 2 emulates some instructions at runtime.

Imagine you're running Python under Rosetta. The original Python interpreter takes Python code, translates it into x86 assembly, and runs that x86 assembly. Those x86 instructions did not exist prior to execution! Even if Rosetta could translate the entire interpreter into ARM code, the interpreter would still be producing x86 assembly.

Other types of programs produce code at runtime as well. Rosetta 2 is able to cache a very impressive amount of instructions ahead of time, but it's still doing emulation.


Yes, it includes a runtime JIT component for apps that happen to dynamically generate x86-64, but in all other cases the binary is AOT-translated. It does this by inserting, during the AOT translation, calls to a linked-in, in-process translation function wherever it sees mmap’d or malloc’d regions being marked executable and then jumped to. That data dependency on the jump instructions can be determined entirely from a static analysis of the executable; no violation of fundamental computer science principles required.

So yeah, no real back and forth to the host platform.
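
Concretely, this is the kind of guest program that trips the runtime path (illustrative C for Linux/x86-64, not Rosetta code): it writes machine code into an anonymous mapping, marks it executable, and jumps to it, so those bytes can't exist at AOT time.

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        /* x86-64 for:  mov eax, 42 ; ret */
        static const unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

        unsigned char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }
        memcpy(buf, code, sizeof code);

        /* The execute-permission flip plus the indirect call below is exactly
         * the transition a translator has to intercept and handle at runtime. */
        if (mprotect(buf, 4096, PROT_READ | PROT_EXEC) != 0) { perror("mprotect"); return 1; }
        int (*fn)(void) = (int (*)(void))buf;
        printf("%d\n", fn());   /* prints 42 */
        return 0;
    }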


AOT (ahead of time) really does work on whole binaries. That is also why the first launch of an Intel app seems longer sometimes. Intel binary in, ARM binary out.

You’re still right that that’s not sufficient: anything that generates Intel code at runtime, for example, will definitely need JIT (just-in-time) translation. But presumably a lot of code will still hit the happy AOT path.

That being said, a JIT does not have to be super slow. The early VMware products, back before Intel CPUs had hardware virtualization support, actually had to do some binary translation as well: https://www.vmware.com/pdf/asplos235_adams.pdf


> That is also why the first launch of an Intel app seems longer sometimes. Intel binary in, ARM binary out.

I mean, we can call it an ARM binary or we can call it an instruction cache. I generally prefer the latter term, because what Rosetta produces are not standalone executables; they're incomplete. I don't know how often the happy path is used, but Rosetta can always be observed doing work at runtime.

JITs are great and Rosetta 2 is incredible! I just can't imagine it working over any sort of shared filesystem; that would add an incredible amount of latency.


Note that the main Python implementation, CPython, actually does no translation. It's an interpreter with no JIT.


An interpreter is still producing x86 instructions at some point, right? Or else what does the CPU execute? Am I totally misunderstanding how interpreters work?


> An interpreter is still producing x86 instructions at some point, right?

Not dynamically. They just call predefined functions written in C (or whatever language the interpreter was written in), based on some internal mechanism.

> Or else what does the CPU execute?

Usually the interpreter is either walking the AST and calling C functions based on the parse tree’s node type (this is very slow), or it converts the AST into an opcode stream when parsing the file (not x86-64 opcodes, just internal names for integers, like OP_ADD = 0, OP_SUB = 1, etc.), and then the interpreter’s “core” looks something like a gigantic switch statement with cases like case OP_ADD: add(lhs, rhs), where “add” is a C function that implements the add semantics for that language.

(The latter approach, where the input file is converted to some intermediate form for more efficient execution after the parse tree is derived, is more properly termed a virtual machine; “interpreter” strictly refers to the AST-walking approach. People tend to use “interpreter” pretty broadly in informal conversations, but Python is, strictly speaking, a VM, not an interpreter.)

In either case, the only thing emitting x86-64 is the compiler that built the interpreter’s binary.
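
A toy version of that dispatch loop, purely illustrative and nothing like CPython’s real internals: the “opcodes” are plain integers, the dispatch is an ordinary switch, and every case runs C code that the compiler already turned into native instructions when the interpreter itself was built.

    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    static void run(const int *code) {
        int stack[64];
        int sp = 0, pc = 0;
        for (;;) {
            switch (code[pc++]) {
            case OP_PUSH:  stack[sp++] = code[pc++];          break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp];  break;
            case OP_PRINT: printf("%d\n", stack[--sp]);       break;
            case OP_HALT:  return;
            }
        }
    }

    int main(void) {
        /* "print(2 + 40)" lowered to the toy opcode stream */
        const int program[] = { OP_PUSH, 2, OP_PUSH, 40, OP_ADD, OP_PRINT, OP_HALT };
        run(program);   /* prints 42; no machine code is emitted at runtime */
        return 0;
    }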

> Am I totally misunderstanding how interpreters work?

You’re confusing them with JITs.

If every interpreter had to roll their own dynamic binary generation, they’d be a hell of a lot less portable (like JITs).


Have you tried Rosetta? It can be pretty impressive.


Rosetta is very impressive! I just don't see how they could maintain that by passing instructions back and forth over a shared drive; that would be a ridiculous amount of latency!


The shared drive is just a licensing trick. They do an ioctl on /proc/self/exe as the licensing mechanism (and that's routed to the host over virtio-fs).
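
For a rough mental model of that check from the guest side (the ioctl request number below is a made-up placeholder; the real one and its semantics are private to Apple): /proc/self/exe resolves to the rosetta binary on the virtio-fs share, so an ioctl on it ends up answered by the host-side file server rather than an ordinary Linux filesystem.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    #define FAKE_ROSETTA_IOCTL 0xbeef   /* placeholder request number, not the real one */

    int main(void) {
        int fd = open("/proc/self/exe", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* On an ordinary filesystem this simply fails (ENOTTY); on the shared
         * volume the host-side daemon is the one that sees the request. */
        if (ioctl(fd, FAKE_ROSETTA_IOCTL, 0) < 0)
            perror("ioctl");

        close(fd);
        return 0;
    }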


Didn’t Apple also implement a toggleable memory model for the M1 (and, I’m assuming, the M2)?


They are exporting some sort of Linux ARM binary under a virtual filesystem mount point that handles execution of x64 images.

That binary is probably passing the instructions to native macOS Rosetta code for translation, but it's also possible that the entire Rosetta codebase was ported to Linux.
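
FWIW, the guest-side wiring for "handles execution of x64 images" is the kernel's binfmt_misc mechanism (the same one qemu-user relies on). A rough C sketch of the registration follows; the /media/rosetta/rosetta path is an assumption (mount points vary, and Apple's docs describe an equivalent update-binfmts invocation), while the magic/mask is the standard x86-64 ELF pattern. Needs root and a mounted /proc/sys/fs/binfmt_misc.

    #include <stdio.h>

    int main(void) {
        /* Rule format: :name:type:offset:magic:mask:interpreter:flags
         * "M" = match on magic bytes (the kernel decodes the \xHH escapes itself).
         * "F" = open the interpreter binary at registration time. */
        const char *rule =
            ":rosetta:M::"
            "\\x7fELF\\x02\\x01\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x3e\\x00:"
            "\\xff\\xff\\xff\\xff\\xff\\xfe\\xfe\\x00\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xfe\\xff\\xff\\xff:"
            "/media/rosetta/rosetta:F";

        FILE *f = fopen("/proc/sys/fs/binfmt_misc/register", "w");
        if (!f) { perror("fopen"); return 1; }
        if (fputs(rule, f) == EOF) perror("fputs");
        fclose(f);
        return 0;
    }

After that, exec()ing any x86-64 ELF in the guest makes the kernel launch the shared rosetta binary as its interpreter.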


If I’m not mistaken, this is also available on WSL. I was surprised, while in WSL, to be able to run Windows binaries.


Sort of, yes: there's `binfmt_misc` handling of PE executables and a virtual filesystem (akin to virtio-fs) involved, but no binary/architecture translation like Rosetta.


Ahhh OK, thanks for the clarification instead of downvoting like others.



