A book teaching assembly language programming on the ARM 64 bit ISA (github.com/pkivolowitz)
171 points by elvis70 on Dec 23, 2022 | 43 comments



For all those learning (or even those who've learned :P), my favorite cheatsheet that I always pull up while writing ARMv8-A assembly is this one [1] from the University of Washington. ARMv8-A has a lot of fairly complex instructions and sometimes it's hard to remember all the odds and ends.

[1] https://courses.cs.washington.edu/courses/cse469/19wi/arm64....


Do you have one that can explain the operand syntax for the instructions that have operands starting with 'v' here:

https://gcc.godbolt.org/z/16a91n4zc


Those are NEON registers. v0.2d is the vector register v0, interpreted as two double words (i.e. two 64-bit elements).

ARM's documentation for this kind of thing is fairly good if you can find it: https://developer.arm.com/documentation/102474/0100/Fundamen...
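
As a rough illustration (not taken from the linked Godbolt output, just a sketch), an element-wise add over those registers could look like this; the .2d arrangement tells the instruction to treat each 128-bit register as two 64-bit lanes:

  // load 128 bits into v0 and v1 (q0/q1 are the 128-bit views of the same registers)
  ldr   q0, [x0]
  ldr   q1, [x1]
  add   v0.2d, v0.2d, v1.2d   // two independent 64-bit adds, one per lane
  str   q0, [x0]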


> You will need to run a ARM Linux VM on the Macintosh - even on ARM-based Macs. Why? Apple. That's why.

I'm going to need a better answer than that. The whole reason I want to learn ARM is to get closer to the machine I'm actually using, not to get closer to a linux virtual machine that I will never use again running on the machine I'm actually using.



Oh, awesome. Thanks!


Because they probably have Linux syscall numbers in their examples and those won’t work on macOS.
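
For example, a Linux exit(0) looks roughly like this (a minimal sketch of the Linux AArch64 convention; macOS puts the syscall number in a different register, x16, and uses different numbers, so the same source won't run there):

  // Linux/AArch64: syscall number goes in x8, arguments in x0..x5
  mov   x0, #0       // exit status
  mov   x8, #93      // 93 = exit on arm64 Linux
  svc   #0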



No, it can't -- QEMU's usermode emulator translates Linux-to-Linux only; it does not let you run Linux binaries on macOS. And since it's an emulator, it's arguably taking you even further from the bare metal than running a VM, which (assuming Hypervisor.framework or equivalent) at least executes the guest instructions directly on the real CPU.


Criticism appreciated.

Reasons are now explained in the README.

We promise a future chapter to bridge what is used elsewhere back to the Mac M1. And also, a chapter bridging to what is used on Windows ARM machines.


Two notes to the author:

1. It would be nice to explain how to install an assembler (part of gcc?), how to run it on source code, how to run the program. Googled, not tested:

  as -o prog.o prog.s
  ld -o prog prog.o
  ./prog 
2. Apparently GitHub accepts commits in the future because the last ones in this repo are timestamped Dec 23 2023.


> Apparently GitHub accepts commits in the future

Of course.

And this is made very annoying by GitHub also insisting on reordering commits “chronologically” rather than topologically. If someone fucks up their local date (because they wanted to test something and forgot to reset it), you get a commit permanently pinned at the top of a branch until that date passes.


Why would GitHub prevent commits from the future? Git allows commit dates from the future and from before git was invented, and GitHub's job is to store git repositories.


Locally, git can't tell whether the clock is set to the future unless it starts checking NTP servers or the like, and IMHO that isn't git's business anyway.

As a collaboration tool, GitHub might refuse timestamps too far in the future, at least as an optional per-repository setting. If a coworker inadvertently moves the clock forward by a year while looking up the date of next New Year's Eve, then makes a day's worth of commits and pushes them, we'll have to delete those commits and redo them with the correct dates.


I am not the author, but I suspect qemu-user would be helpful here:

https://www.qemu.org/docs/master/user/main.html

He would just need to say how to get an assembler that would work. llvm-as might work, but I am not sure how to get it to produce code for another architecture offhand.


On my Ubuntu, there are many cross-assemblers (including ARM, RISC-V and PowerPC) and linkers available in packages. These packages have binutils in their name. Once installed, the assembler command has -as at the end of its name, and the linker has -ld there.
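
On Ubuntu that works out to something like the following (package names from memory, so double-check them); qemu-user can then run the resulting Linux binary regardless of the host architecture:

  sudo apt install binutils-aarch64-linux-gnu qemu-user
  aarch64-linux-gnu-as -o prog.o prog.s
  aarch64-linux-gnu-ld -o prog prog.o
  qemu-aarch64 ./prog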


Criticism appreciated.

The main README now contains install and build instructions.

The main body of the book will, over time, be updated to include build instructions for each example.

As for the dates... I'm at a loss. I believe my own machines are set correctly.


I’m a noob at assembly and this was something that I never figured out:

For assembly languages, how does package management work? As an example, how would I “import” code from my co-worker or from an open source library on GitHub?

My first thought is that “maybe this is what a linker does…”, but what sort of toolchains exist for dependency management in assembly programming? What’s the analogy to pip, npm, cargo, etc.?


The idea of having a dependency and package management system as an inherent part of a language's ecosystem is pretty new, and there isn't really any such thing even for C, let alone assembly language. So the short answer is "that's your problem", to solve however you wish! Personally I use git submodules.
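
For example (repository URL and paths made up):

  git submodule add https://example.com/someone/asm-string-lib.git third_party/asm-string-lib
  git submodule update --init --recursive
  # then assemble and link the files under third_party/ from your own Makefile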


>> What’s the analogy to pip, npm, cargo, etc.?

AFAIK no one has built such a convenient tool for assembly programming. So, simply put the library code inside your project folder, configure your Makefile/batch file, tell the linker where to find the required .a/.lib files, etc.
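
A typical link line then ends up looking something like this (library and path names made up):

  aarch64-linux-gnu-ld -o prog main.o -L./libs -lfoo   # picks up ./libs/libfoo.a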

More manual work, eh.


The way you interact with other code is via symbolic references; the linker resolves most of these. Each unresolved reference is stored in the object file as a "relocation".

It's considerably less convenient than a package system.
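
As a sketch (file and symbol names made up), the caller just uses the name and the assembler records a relocation for the linker to resolve against the other object file:

  // main.s -- references a routine defined elsewhere
  .global _start
  _start:
      // ... put a string address in x0 ...
      bl    str_length       // unresolved here; becomes a relocation in main.o
      mov   x8, #93          // exit on arm64 Linux, status is whatever str_length left in x0
      svc   #0

  // strlen.s -- provides the definition
  .global str_length
  str_length:
      // ... count the bytes pointed to by x0, leave the length in x0 ...
      ret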


Even being able to assemble that source as listed isn't guaranteed: there's no standard macro language, even when two assemblers share most of their syntax. As other posts have said, binary compatibility is what matters more.


> What’s the analogy to pip, npm, cargo, etc.?

mkdir and cp basically (or git submodules if you want to get fancy)


It's what the linker does.


Your operating system's package manager is the one that's going to do that.


Thank you for the post here. Interest in the book has exploded.

I've already implemented some of the criticism written here. It is appreciated.

-- book's author


You might be interested in this repo that converts asmtutor to Arm64. It's not the most faithful port though:

https://github.com/lirorc/arm-asm-examples


> In fact, we would argue that the study of assembly language is extremely important to the building of competent software engineers. Further, we would argue that teaching the x86 instruction set is cruel as that ISA was born in the 1970s and has simply gotten more muddled with age.

> The MIPS instruction set is another ISA that is often covered in College level courses. While kinder and gentler than the x86 ISA, the MIPS processor isn't nearly as relevant as the ARM family.

I disagree. My university taught me MIPS assembly and its only utility has been in enabling me to understand x86_64 assembly poorly. When I look at the disassembly of my code, it is in x86_64 assembly, yet I realized yesterday that I have no clue what the difference is between signed and unsigned subtraction on x86_64. Had that been covered in my university class, I would not be in a position where I need to reverse engineer how that works by working out the math based on the very succinct documentation.

I only recently dabbled with aarch64 and POWER9 because I felt inspired to try to make ZFS' fletcher4 checksum algorithm run faster on various architectures. I had no success on POWER9 due to the Talos II workstation being severely memory bandwidth limited, but had fairly good success on x86_64 and aarch64, although the PRs with that code are still works in progress.

A guide for NEON and SVE would be great, since that is the only aspect of the aarch64 assembly that I did not understand well enough to muddle through it in my recent fletcher4 adventures, but that does not appear to be here. Thankfully, Clang did an awesome job compiling GNU C dialect vector extensions into efficient, easy to understand NEON assembly (even if I had no clue about the notation used for registers), so I did not need to learn the syntax.

For students, I would suggest learning x86_64, since that is what they are going to be using and if they can learn that, they can learn just about any assembly language (although the RISC-V Vector Extension might pose some trouble). The broader industry shift from x86_64 to aarch64 is still years away and ARM seems determined to kill it before it is here by suing Qualcomm for wanting to build ARM processors under the license they already paid ARM to give them. ARM can say whatever they want in court, but at the end of the day, ARM IP now looks toxic. I hope RISC-V can take the baton from aarch64. It could very well be our only hope for an x86_64-free future given ARM's major strategic mistake.


> yet I realized yesterday that I have no clue what the difference is between signed and unsigned subtraction on x86_64

It's two's complement. There is no difference between signed and unsigned for addition, subtraction, or multiplication. https://en.m.wikipedia.org/wiki/Two%27s_complement


In the value, no; but in the condition-code flags, there may be differences.
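
The AArch64 analogue of the same point (a sketch; label names made up): subs produces one bit pattern either way, and signedness only shows up in which flags the following branch looks at.

  subs  x2, x0, x1          // same result bits for signed and unsigned; sets NZCV
  b.lt  less_signed         // signed "less than": taken when N != V
  b.lo  less_unsigned       // unsigned "lower": taken when C is clear (a borrow occurred)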


> ARM seems determined to kill it before it is here by suing Qualcomm for wanting to build ARM processors under the license they already paid ARM to give them

That is not what they are doing at all and it’s very dishonest of you to pretend it is.


Indeed. I'm really surprised that some seem to be saying that Arm trying to enforce the terms of contracts freely entered into by two of its customers is a bad thing for the wider ecosystem.


Is learning any sort of Assembly language a good investment of time? Or is that time more usefully spent studying other topics that help solve interesting problems?

Are there interesting problems that require knowledge of Assembly language to solve?

Would Dijkstra, for example, have advised for or against teaching Assembly language?


>Would Dijkstra, for example, have advised for or against teaching Assembly language?

Dijkstra makes a distinction between Computing Science and Computer Engineering. The former is the Theoretical Science part i.e. Mathematical Logic, Automata, Algorithms, Computability, Complexity etc. The latter is the design of practical machines where we can realize the running of algorithms from the former domain. Dijkstra did both but his interests/focus were more in the former domain.

However, he called the General-purpose Computer "A Radical Novelty" from the pov that it allowed us to do any computation that we can think of (subject to Mathematical limitations/Computability) unlike any other previous man-made inventions. Many of his EWDs talk about this. Given that Assembly language is the native language of a Computer and all Higher-level language abstraction constructs have to map to it there is great value in learning the structure of Assembly language and how to program using it. There is also the beauty of Engineering involved, where layers of abstraction come together in a harmonious manner to give you a Machine with almost limitless capabilities. Also note that Assembly language is the Computer Architecture view for a Programmer while Computer Organization is the engineering implementation of it.

The following two books are highly recommended for further understanding:

1) Code: The Hidden Language of Computer Hardware and Software by Charles Petzold.

2) Structured Computer Organization by Andrew Tanenbaum.


A minor niggle there: machine code is the native language of the computer. Assembly language is a slightly higher-level and slightly less expressive language. Most of the time, they can be seen as equivalent, but as a trivial example, a generation of us had to DB Z80 opcodes when using an 8080 assembler. More practically, things like breakpoints have to be set by writing binary to memory - there is no assembly language involved. At a less practical level, you can have things like self-modifying code, or code where the meaning changes depending on the byte offset at which you start executing.


Writing assembly language is a rather specialized skill that few people need to master nowadays.

Being able to read it, on the other hand, I find really useful for debugging. You can follow execution into places where you don't have code at hand, find code that was not translated the way you expected (e.g. due to surprising aggressive optimizations), etc.


> Is learning any sort of Assembly language a good investment of time?

Learning to understand Arm architecture specifically or computer architecture in general can be invaluable. A particular Assembly syntax is just a way of expressing what you want to do.

I would suggest reading a book about computer architecture or taking a course. And if you are like me and are interested in what complicated things hardware does to fulfil the needs of software, you might want to first read a book on operating systems (like OSTEP by the Arpaci-Dusseaus, which is a marvel of accessibility and challenge for students), because it will make you ask relevant questions.


Understanding lower levels can be satisfying for a certain type of person. I find it very useful to be able to reason about stuff that is mostly invisible to other people. But if you are building web apps then this stuff is 95% useless, as you can't leverage the knowledge because web apps are much higher level.

If you are curious about this stuff, definitely look into it. It may be worth it to you.


Not just web apps; isn't most of assembly language just incidental detail? Is it fundamental knowledge? I doubt it.


Most of it is detail you don't need. However, knowing how a function call works, what kinds of branching are available, and how to do basic operations on collections (arrays and more) at the assembly level will let you write more efficient code in higher-level languages.
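
A toy AArch64 sketch of those three things together (hypothetical, just to show the shapes of a call, a loop, and an array walk):

  // sum_array: x0 = pointer to 64-bit elements, x1 = count, result in x0
  sum_array:
      mov   x2, #0              // accumulator
  loop:
      cbz   x1, done            // branch out when the count reaches zero
      ldr   x3, [x0], #8        // load an element, post-increment the pointer
      add   x2, x2, x3
      sub   x1, x1, #1
      b     loop
  done:
      mov   x0, x2              // return value goes in x0 per AAPCS64
      ret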


It is fundamental knowledge, but you can skip it thanks to higher levels of abstraction.

In the end everything is just a bunch of ones and zeroes and clever hardware architectures.

Nand2tetris is a great resource for figuring out whether this is something you want to get into.


>> Are there interesting problems that require knowledge of Assembly language to solve?

If you find reverse engineering interesting, at minimum being able to read assembly is a must. I only learned assembly for fun, e.g. writing SNES/Sega/Atari games.

As a mobile application developer, obviously assembly is practically useless for work.


Any sort of serious optimization work may require mulling over disassembly listings of compiler output (at least after you've exhausted the 'simple' high level optimization options).



