Hacker News new | past | comments | ask | show | jobs | submit login
A Complete Formal Semantics of x86-64 User-Level Instruction Set Architecture (illinois.edu)
119 points by matt_d on April 23, 2019 | hide | past | favorite | 27 comments



> Like closely related previous work [33, 37], we omit the relaxed memory model of x86-64 and thus the concurrency- related operations.

Bummer


For the memory model you can take a look at some of the best research in the subfield:

https://www.cl.cam.ac.uk/~pes20/weakmemory/index3.html


Does anybody know what .k files are? I see them in the repo but haven't seen them before.

And for reference (from the page), their repo is here... but their file names contain colons, and hence can't be checked out on Windows: https://github.com/sdasgup3/binary-decompilation


They're specifications in the K Framework, developed by the same group of folks: http://www.kframework.org/index.php/Main_Page


Thank you!


I wonder if these formalism can be machine-translated into a form that Valgrind can use. IIRC, Valgrind is currently limited to pretty old x86 ISAs.


Valgrind does add support for newer instruction set extensions (for instance the latest release two weeks ago added the RDRAND and F16C instruction set extensions). It would probably be less work to just directly add Valgrind support for anything that's still missing, rather than try to implement a converter from this formalism. That might make sense if this was an official Intel thing and you could reasonably expect updates for future instruction set extensions (so effort now could be amortized in future), but as a third party academic thing that seems less likely to happen. You'd also need to clarify the licensing -- I couldn't see any license notice in a quick pass over the github repo. Last but not least, Valgrind cares about performance, and an automatic translation of a formal spec is going to be a lot slower than a hand-written implementation, especially for SIMD instructions (where V tries to use SIMD insns in its output generated code for SIMD input code, AIUI).


It would be interesting to compare this to "A Formal Description of System/360" by Falkoff and Iverson[1], using APL as the notation language, from 1964.

Sadly it is pay-walled by the IEEE.

https://ieeexplore.ieee.org/document/5388512



Oh! That's the IBM Journal article! Hah, suck on that, IEEE.

Also, note, the formal description is claimed to be executable code (by an APL interpreter).


It’s weird that the user manual isn’t this to begin with.


Not really.

The 3439 page Intel® 64 and IA-32 Architectures Software Developer’s Manual is a specification. It just isn't a formal specification suitable for use in proving properties of programs. ARM on the other hand, released a machine readable architecture specification for ARMv8 but that architecture isn't as old or as large as x86_64. Nor is Intel/AMD in the business of selling architecture licenses where having a machine readable architecture specification would be useful for verifying implementations and proving properties. Indeed they're kind of in the business of preventing that from happening.

https://alastairreid.github.io/ARM-v8a-xml-release/


Formal specifications aren't only useful for proving properties of programs. They can also be useful for providing definitive answers to questions about example program behavior which is useful for regular programmers.

For example the Herd tool kit can take small concurrent programs with some constraints and a memory model and then answer whether any executions are possible.

http://diy.inria.fr/doc/herd.html

Alternately there is the online tool for the same purpose for ARMv8:

https://www.cl.cam.ac.uk/~sf502/regressions/rmem/


I was going to ask if you could then fuzz a physical CPU to find CPU bugs, but sure enough, the paper authors have already been doing that!


There was a really good Blackhat talk a year or two ago in a sort of similar vein: they developed an approach to iterate and find undocumented opcodes.

https://youtu.be/KrksBdWcZgQ

The software they developed is called Sandsifter.


Oh yeah. That's Christopher Domas, the same guy that built a functioning compiler that only emits MOV instructions [0] [1].

His other talks are quite interesting, as well!

[0]:https://github.com/xoreaxeaxeax/movfuscator [1]:https://www.youtube.com/watch?v=2VF_wPkiBJY


Also, if the specification is too formal, no one besides academics would understand it and thus it would pretty useless as practical documentation.


Which is why there is a 7000+ page ARM ARM

https://static.docs.arm.com/ddi0487/da/DDI0487D_a_armv8_arm....

which is mostly written in plain English for the common programmer, but frequently suffers due to lack of specificity or general clarity.


Right.

Also, sometimes you want to leave things undefined. This killed the attempt to create a formal specification for ECMAScript 4 back in the day. For example, it was hard to formally specify things like "Array.sort calls the sorting order callback in such a way as to make the array sorted" without mandating a particular sorting algorithm.


Maybe formal specification could have been done for the language but not for the standard library.

Note that formal specification of ECMAScript 5 does exist (JSCert), although it is not de jure and it does not handle the standard library.


Can this be compiled into an emulator?


The semantics is executable but for an emulator you probably want something more efficient. I don't know whether K semantics can be "compiled", so to speak. It probably can with enough effort?


They can; Grigore was my advisor in college and I’ve seen him demo exactly that.


Have a look at the Sail ISA specification language: https://news.ycombinator.com/item?id=19735677


That's some work! How can it get used? Verifying compilers? OSs?


As the title says this is user-level, so can't be used to verify OS.


Almost all of the code in an os are the same instructions we use everywhere else. That can be verified to function as intended.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: