Say hello to x64 Assembly, part 1

robert_tweed · on Aug 30, 2014

Does anyone know what's a good macro-assembler these days? The article recommends NASM, but I don't know if it's comparable to MASM for larger, more complex projects.

I rather fancy getting back into writing asm, just to sharpen that skill. I haven't really written any since MASM 6.x on DOS, 20-ish years ago. I actually found it quite enjoyable and it's surprising how complex an application you can write from scratch in assembly without it becoming unmanageable, so long as you get into the right mindset and make effective use of macros.

Of course, any significant piece of assembly code is likely to contain considerably more bugs than just about anything else of the same complexity. You'll also experience a lot more segfaults during development than perhaps most are comfortable with, but there's something rewarding about controlling precisely what the machine is doing at that level. This is especially true if you manage to find a novel solution that just wouldn't exist when the hardware capabilities are abstracted away by a high level language.

In the same way that everyone should learn a Lisp to think in terms of ASTs and code-as-data, everyone should write at least one whole application in assembly just to appreciate how the hardware really works. Also to see how often there are many ways to solve the same problem (especially with an x86 instruction set), sometimes with wildly different performance characteristics.

userbinator · on Aug 31, 2014

A possible interesting side-effect of really learning Asm is that you start discovering just how horrible compilers/high-level languages actually are at exploiting the full capabilities of the machine, despite "common wisdom" suggesting the opposite. I started with Asm, and when I eventually decided to learn C, I remember the first time I looked at the compiler output of a program I'd written, compiled with full optimisation, I was astounded. Unnecessary moves and other instructions, very poor register utilisation, and blindness to status flags were just some of the things that compiled programs regularly contained.

This was many years ago, but I still see the same today. I do RE so I've read a lot of compiler output, and I've seen some isolated instances where a compiler did something "clever" (Intel's is not bad at this), but it tends to be rare and it's easy to see the rest of the code still has that "compiler-generated" feel to it.

I said "really learning" above, because I think there's two ways that people are learning Asm: the first, which is probably more common, is that they only learn the ways in which compilers generate instructions. Those who learn the first way would likely not do any better job than a compiler if asked to write a program, and not see the inefficiency of compiler-generated code, so they wouldn't find any particular advantages to using Asm.

On the other hand, I believe that if you learn Asm by starting with the machine itself, independent of any HLL, then you don't get any preconceived notions of what it can and cannot do, which leads to what I'd call "real Asm programming." Then you can see the inefficiencies in compiler-generated code and what HLL abstractions introduce, and can easily beat the compiler in size or speed (often both). Good hand-written Asm has a very different look to it than compiler output.

This is especially true if you manage to find a novel solution that just wouldn't exist when the hardware capabilities are abstracted away by a high level language.

For some entertaining examples of what Asm can do that compilers cannot, look at the sub-1k categories in the demoscene:

http://www.pouet.net/prodlist.php?type%5B%5D=32b&type%5B%5D=... One of my favourites: http://www.pouet.net/prod.php?which=3397

DanWaterworth · on Aug 31, 2014

everyone should write at least one whole application in assembly just to appreciate how the hardware really works

Unfortunately, with out-of-order execution and instruction-level parallelism, I doubt learning assembly teaches you much about how the hardware really works.

Edit: To the downvoter, care to comment?

userbinator · on Aug 31, 2014

Microarchitecture doesn't change the fact that the instructions in your program - the ones that you can work with - still have the same programmer-visible behaviour (except perhaps being a little faster.)

DanWaterworth · on Aug 31, 2014

I don't dispute that. I'm saying the model that you learn from learning assembly is very different to what the hardware is doing.

Concretely, learning assembly, you might assume each core has a set of physical registers that correspond to the registers you see and that isn't the case.

floody-berry · on Aug 30, 2014

NASM or Yasm are both good. NASM has really powerful macro support, and Yasm is a NASM clone/rewrite. Yasm additionally supports GAS syntax (if you're into that), although its documentation for non-NASM features is a bit lacking. Yasm is also a lot nicer to hack on due to its modular design.

RDeckard · on Aug 30, 2014

FASM: http://www.flatassembler.net/

robert_tweed · on Aug 30, 2014

Well, digging around the docs and FAQs on both sites I couldn't see much useful introductory information about what the unique features of either project are, but I did some further Googling and read a few discussions. For anyone else interested, my conclusions are:

NASM and FASM are really the only up-to-date and cross-platform capable assemblers. MASM is up to date, but Windows only. TASM is not up to date. Others appear to have been abandoned.

The differences:

NASM: Is written in C and generates object files. Requires a linker to produce executables. Slow, inefficient compilation. Has some syntax quirks. May be more flexible in some cases due to the multiple object formats available.

FASM: Written in FASM. Very fast compilation. Cleaner syntax, better debugging tools. Produces executables directly without a linker. Possibly limited due to smaller number of output formats, but likely good enough for most projects that would be written in pure asm anyway.

FASM looks like the best option to learn first and then move to NASM for any specific requirement that FASM cannot meet. The syntax is mostly compatible between the two, so porting code shouldn't be too much trouble in the worst case.

robert_tweed · on Aug 30, 2014

I found some helpful thoughts regarding GAS - agreed, for source distributions targeting Linux it has a place, but it's not really a full-blown macro assembler. Using the C preprocessor seems like a poor hack to me. I haven't tried it, but heavy preprocessor use is generally discouraged even in C, never mind in something it was never intended for. Also, AT&T syntax: Yuck!

http://x86asm.net/articles/what-i-dislike-about-gas/

MegaDeKay · on Aug 31, 2014

You don't have to use AT&T syntax in GAS. I wrote a blog post a while back showing how you can use Intel syntax instead, and skip a whole lot of % characters while you are at it.

http://madscientistlabs.blogspot.ca/2013/07/gas-problems.htm...

e12e · on Aug 31, 2014

Wow, thanks for that! It's still a little painful, but here goes:

    # file:hello.s
    #
    # Translated to gas syntax.
    # assemble with:
    # as --64 -o hello.o hello.s
    # link with:
    # ld -o hello hello.o
    #
    # Modifications to original code considered trivial and to be
    # public domain.
    #
    # Use Intel syntax instead of AT&T, and don't use % before register names
    .intel_syntax noprefix

    .section .data
        msg: .asciz "hello, world!\n"

    .section .text

    .global _start

    _start:
        # write syscall
        mov     rax, 1
        # file descriptor, standard output
        mov     rdi, 1
        # message address
        mov     rsi, OFFSET FLAT:msg
        # length of message
        mov     rdx, 14
        # call write syscall
        syscall

        # exit syscall
        mov    rax, 60
        # exit status 0
        mov    rdi, 0
        # call exit syscall
        syscall

Note the trailing new-line in the message (and length change from 13 to 14). For nasm:

    section .data
        msg db      "hello, world!",`\n`
    ;; Remember to use 14 for string length!

MegaDeKay · on Sept 2, 2014

Try these changes instead. Untested, but it should work:

    # String is read only.
    .section .rodata
    # .ascii rather than .asciz, so the size below excludes a trailing NUL
        msg: .ascii "hello, world!\n"
    # Put string length in a variable instead
        .set STR_SIZE, . - msg
    # <snip>
    mov     rdx, STR_SIZE

nkurz · on Aug 30, 2014

Are there compelling advantages to using one of the above rather than GNU 'as'? I ask out of ignorance rather than to suggest there are none. But 'as' is well documented, and if you access it as 'gcc -c foo.S' (with a capital S) it gets run through the C preprocessor first for macros and definitions. And if you are distributing Mac/Unix/Linux source, you can generally presume it or something compatible is preinstalled.
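As a rough sketch of the 'gcc -c foo.S' route (the file name, the macro names and the layout are illustrative assumptions, not taken from the article or from anyone's code in this thread), a capital-S source can use plain #define lines for the syscall numbers. This assumes x86-64 Linux:

```asm
/* file: hello.S -- hypothetical example.
 * Build (the capital S makes gcc run the C preprocessor first):
 *   gcc -c hello.S -o hello.o
 *   ld -o hello hello.o
 */
#define SYS_WRITE 1     /* x86-64 Linux syscall number for write */
#define SYS_EXIT  60    /* x86-64 Linux syscall number for exit  */
#define STDOUT    1
#define MSG_LEN   14    /* length of the string below, newline included */

    .intel_syntax noprefix

    .section .rodata
msg:
    .ascii "hello, world!\n"

    .text
    .global _start
_start:
    mov     eax, SYS_WRITE      /* writing eax zero-extends into rax */
    mov     edi, STDOUT
    lea     rsi, [rip + msg]    /* RIP-relative address of the message */
    mov     edx, MSG_LEN
    syscall

    mov     eax, SYS_EXIT
    xor     edi, edi            /* exit status 0 */
    syscall
```

The comments are C-style here because a '#' at the start of a line would be read as a preprocessor directive.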

One possible other tool to consider is 'terse': http://www.terse.com/howdoes.htm

It's got a lot of issues, and you probably don't want to actually use it. It's unmaintained, proprietary, DOS only, and according to the website, still distributed on a 3.5" floppy. But the syntax has a lot of appealing things about it. You can't actually read the real manual without buying the product, but a short-lived open source clone "nega" used a very similar one: http://webcache.googleusercontent.com/search?q=cache:7E6Ddug...

floody-berry · on Aug 30, 2014

Relying on 'as' won't work with Visual Studio.

It's easier to update an external assembler than the system assembler. A lot of distros don't ship with updated binutils so you can't reliably compile for newer CPU extensions on them.

Earlier versions of clang's integrated assembler (which clang uses instead of 'as') weren't fully compatible with 'as', e.g. no .intel_syntax support.

Different operating systems can have subtly different behavior, e.g. the ancient 'as' that ships with OS X uses $name for macro parameters while most? other systems use \name. I think gcc on OS X is intentionally forgotten so everyone will switch to clang.

Cross platform x86 asm is a real headache no matter what. NASM/Yasm/fasm just make it less of one.
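For anyone who hasn't seen the \name form of macro parameters mentioned above, here is a minimal sketch for a reasonably recent GNU 'as' on x86-64 Linux. The macro name write_str and the file name are made up for illustration; it would not assemble unchanged with an assembler that expects the $name form.

```asm
# file: macro.s (hypothetical)
# as --64 -o macro.o macro.s
# ld -o macro macro.o
    .intel_syntax noprefix

# write_str is an illustrative helper, not part of any library:
# it expands to write(fd, buf, len) via the x86-64 Linux syscall interface.
    .macro write_str fd, buf, len
        mov     eax, 1              # write syscall number
        mov     edi, \fd            # \fd is replaced by the first argument
        lea     rsi, [rip + \buf]   # address of the buffer label
        mov     edx, \len
        syscall
    .endm

    .section .rodata
msg:
    .ascii "hello, world!\n"

    .text
    .global _start
_start:
    write_str 1, msg, 14            # expands to the five instructions above

    mov     eax, 60                 # exit syscall number
    xor     edi, edi                # exit status 0
    syscall
```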

MegaDeKay · on Aug 31, 2014

FASM is interesting because it has an extension that supports ARM as well.

http://arm.flatassembler.net/

I don't count its fast compilation speed as much of a plus because you've got to write a heck of a lot of assembler before you'd ever notice much of a difference, I'd suspect.

e12e · on Aug 31, 2014

I came across Intel's intro to x64 assembly when I was looking for some information on working with wide characters on 64bit (I seem to recall there were some new instructions introduced for that, I'm guessing one would probably be better off using libicu to parse eg: utf-8 into some form of 16-bit characters first, though?). Anyway, nothing on wide characters, but essentially a work-a-like hello-world for Microsoft Windows and MASM:

https://software.intel.com/en-us/articles/introduction-to-x6...

jpgvm · on Aug 30, 2014

Being able to read x64 assembler even if you can't write it is great for debugging strange issues.

nl · on Aug 31, 2014

Also great is being able to read JVM bytecode. Even minor changes can be pretty interesting.

Back in the day I wrote [1] about simple Java string concatenation. I still get people quoting it now, even though I'm sure it is completely outdated by newer compilers.

It gets even more interesting when you see what x64 (or whatever!) assembly is generated by the JVM.

[1] http://nicklothian.com/blog/2005/06/09/on-java-string-concat...

NaNNaNNaNNaN · on Aug 30, 2014

Javascript is the assembly language of today. For debugging purposes you're better off with Javascript.

MrBuddyCasino · on Aug 30, 2014

If you use humor or irony here without adding any substance to the discussion, one of two things will happen:

- it will not be understood, you might get downvoted

- it will be understood and you'll definitely get downvoted

ThatOtherPerson · on Aug 30, 2014

Not everybody targets all their code for the web.

rev_bird · on Aug 30, 2014

You mean there's code... not on the web?!

mikeash · on Aug 30, 2014

Oh yeah, that'll totally help when debugging my iOS or Android code.

lovelearning · on Aug 30, 2014

Excellent! Just 2 days ago, somebody here was complaining about a shortage of asm programmers, and I told them that I'd personally be interested in a hello world x86_64 tutorial to rekindle my interest in asm...and today I see this posted! Exactly what I wanted, thank you.

innocenat · on Aug 30, 2014

I still don't understand why people who want to learn amd64 assembly won't read an x86 tutorial. I write both x86 and amd64 assembly from time to time, and I don't think there is anything that makes an x86 tutorial/manual inapplicable to amd64.

Sure, there are differences: register names, C ABI conventions, system calls, memory modes, etc. But that information can be found easily in references. And you need a reference for x86 anyway. Otherwise all the mechanisms are the same.

ANTSANTS · on Aug 30, 2014

It's a bit of a catch-22: you can't know that there's not much difference between x86 and x86-64 until you understand both. I guess asm newbies have some kind of mistaken idea that x86 is irrelevant and not worth their time learning (even if it were completely dead in the wild, which it absolutely isn't, you would still need to know it to truly understand the architecture); I've tried to post older x86 and ARM assembly language guides, much better written and more in-depth than this article (no offense intended to the author), and the only comments I get are along the lines of "this is old, it doesn't even cover x86-64/ARM64."

lovelearning · on Aug 30, 2014

With so many other things to learn and ideas to implement, something like assembly programming sits way back on the backburner. It's only recreational for me, not something critical for my work.

An easy tutorial like this injects just the right amount of motivation to at least dip my feet back in. Having to wade through Intel's 1000-page system manuals to check whether my past knowledge is still useful would require a lot more motivation than I can muster.

amenod · on Aug 30, 2014

For an experienced programmer this is most certainly true. If you are just starting, however, it is much easier if the tutorial uses the same platform you are targeting.

melling · on Aug 30, 2014

I think ARM assembler might be more useful because of all the mobile devices and Raspberry Pis.

pjc50 · on Aug 30, 2014

Indeed - I've found being able to read ARM assembler extremely useful, especially when dealing with buggy closed source libraries or doing WinCE development. Haven't had cause to write very much though. Writing assembler is really only useful when you're doing vectorised or other high-performance arithmetic on the CPU.

theoutlander · on Aug 31, 2014

Thanks for the writeup. I'm going to try it one of these days when I get a break from work/startup. I ordered a few books recently and have been tinkering with writing an OS....got as far as writing a bootloader and then switched gears into learning ASM.

I bought Peter Norton's Assembly Language Book for the IBM PC. This book is pretty awesome and so relevant even 25 years later!! It covers the basics really well and stops at 386 (the latest proc then) so I don't feel inundated with hundreds of CPU architectures. Yes, I'll eventually get to those.

The other two books (in case anyone else is interested): X86 Assembly Language and C Fundamentals by Joseph Cavanagh. Operating Systems Design and Implementation (3rd Edition).

Also, while tinkering with ASM I decided to install MSDOS since the ASM book uses Debug.exe and I couldn't seem to find it for newer OS's. While looking to download MSDOS, I discovered the source code for MSDOS 1.1 & 2.0 @ http://www.computerhistory.org/atchm/microsoft-research-lice....

okasaki · on Aug 30, 2014

The code doesn't render with links2. Does it rely on javascript?

krakensden · on Aug 30, 2014

Yup- they're gists. I'm pretty sympathetic, setting up pygments, etc., is sort of a pain.

DigitalJack · on Aug 30, 2014

desertmonad has a gist with 32bit and 64bit hello world's for osx. https://gist.github.com/desertmonad/36da2e83569bc8b120e0

CraigJPerry · on Aug 30, 2014

Another great resource, albeit for x86 rather than 64 http://savannah.nongnu.org/projects/pgubook/

pekk · on Aug 30, 2014

It isn't nearly as hard to find x86 guides as x86-64 guides.

0xAX · on Aug 30, 2014

That's why i started to write it :)

CraigJPerry · on Aug 30, 2014

In what material way, in the context of a beginners guide, are they different?

Calling convention, word size and the r- prefixes. If you complete the PGU book (you could just write 32 bit code if you want), these are trivialities.
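To make that concrete, here is a rough 32-bit counterpart of the hello world posted elsewhere in this thread, in the same gas Intel syntax. It is only a sketch assuming Linux on x86 (the file name is made up). The differences show up in the system call interface: different syscall numbers, e- registers instead of r- registers, and int 0x80 instead of syscall.

```asm
# file: hello32.s (hypothetical)
# as --32 -o hello32.o hello32.s
# ld -m elf_i386 -o hello32 hello32.o
    .intel_syntax noprefix

    .section .rodata
msg:
    .ascii "hello, world!\n"

    .text
    .global _start
_start:
    mov     eax, 4                  # write is syscall 4 on i386 (1 on x86-64)
    mov     ebx, 1                  # fd goes in ebx (rdi on x86-64)
    mov     ecx, OFFSET FLAT:msg    # buffer goes in ecx (rsi on x86-64)
    mov     edx, 14                 # length goes in edx (rdx on x86-64)
    int     0x80                    # 32-bit kernel entry ("syscall" on x86-64)

    mov     eax, 1                  # exit is syscall 1 on i386 (60 on x86-64)
    xor     ebx, ebx                # exit status 0
    int     0x80
```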

infoseckid · on Aug 31, 2014

You guys might enjoy this http://www.pentesteracademy.com/course?id=7

31reasons · on Aug 30, 2014

It's funny that after so many years code is still called Text in assembly.

robert_tweed · on Aug 30, 2014

That's not the way I remember it in MASM/DOS. I think it's a Unix-specific thing.