Hacker News
A Friendly Introduction to Assembly for High-Level Programmers (shikaan.github.io)
286 points by shikaan 3 months ago | 60 comments



Biased opinion (because this is how I did it at uni): before looking at modern x86, you should learn 32-bit MIPS.

It's hopelessly outdated as an architecture, but the instruction set is way more suited for teaching (and you should look at the classic procedure from Harris and Harris where they actually design a pipelined MIPS processor component by component to handle the instruction set).


I'd recommend RISC-V instead. It is close to MIPS assembly but more consistent without MIPS's quirks.

I believe it has already replaced MIPS for teaching assembly language at many universities. Another one I've seen a lot is the Intel 8051, an 8-bit MCU.


People who say this rarely also recommend learning toki pona or Esperanto before learning a spoken language they have a practical need for. Some do, but few.

I don't think it makes sense to learn a "clean" language first. Always learn a useful language, and never learn a language that has no organic utility as step 1 of learning a useful one.

Unless the person taking this advice will program MIPS devices, then go ahead.


This is only true if your goal is only to speak some language rather than be a linguist.

The analogy is a little off because most people do not much need to be linguists.

But programmers almost always benefit from being linguists who can pick up any language and do a job in it, knowing the next job might be in any other language.

The abstract language allows you to see the difference between the universal principles and the arbitrary quirks.

You don't think it's worth going through the abstract stage to know that? I do.

And I'm not even primarily a developer; I say this as an ordinary user of computers. The plain utility of being even basically literate and functional, able to do a little coding or modifying in whatever language the current project happens to use, is extremely valuable for whatever I want to do at any given time.

Not to mention the other obvious point: you almost always learn a simple version of any new thing rather than starting immediately with the most advanced and complex version.

No, this reasoning does not hold up to scrutiny.


Comparing spoken languages with programming is a terrible comparison. Even at a fundamental level, the two have almost nothing in common. The brain uses completely different parts for programming and for human language.


Need-based learning is the way to go for sure. Z80 assembly (ZX Spectrum) used to be a popular online recommendation; it's probably a good thing it no longer is. On the other hand, I think most people who are casually interested in learning a bit more about assembly don't have a need, so they aren't going to actually use it or do anything with it.

If there's no need for any particular system, just a weak desire to learn a little, then learning fundamentals is a decent strategy, and picking a language that helps you get to and express those fundamentals will pay off even if you later use a different one. What are the relevant fundamentals for assembly? Maybe just a story about roughly how CPUs from the 70s worked is fine. With spoken languages, there's not much difference between "cleaner" things like Esperanto and useful things like English in showcasing fundamentals like nouns and verbs. With assembly, x86 is kind of insane, so I think the differences are bigger.

Of course, some people might consider modern hardware advances to be the fundamentals: the "if you aren't learning about SIMD and the like, and taking caches and multiple cores into account, what's even the point?" sort of opinion. I can almost get behind that.

My biased opinion, though, is that if you want to learn beyond a casual level, you should definitely be learning something that runs on actual hardware, not on some sort of emulation layer. x86 then comes back into prominence, since most PCs use it. However, with a captive audience like students in a classroom, the teacher can pick something less insane and more fun than, say, figuring out how to take an -O0 compiled C file and optimize a loop (never mind that -O2 will do it better). It can also showcase more of a "need": unlike in your x86 class assignment, you can't just trivially do it in a higher-level language, and you can't just swap an x86 chip onto the PCB.

My first introduction to real assembly in college was with a little PIC microcontroller controlling a robot car. I knew a bit of x86 by then, just enough to compare, and it was a breath of fresh air to use something that wasn't x86. It could have been ARM (we later used an ARM chip to program an RTOS and do other embedded-systems stuff), or anything really; the important thing is that it was real. In the quest to teach a "simpler assembly", I know some schools have done things I consider absurd, like having students learn some other architecture but only letting them run code in a provided emulator on Windows, not on a real chip with real peripherals you need to write code to talk to.


MIPS is a load-store architecture and has branch delay slots, both of which make it complicated to learn for beginners. If you want to learn an obsolete ISA with no commercial relevance because it's simpler, I would recommend 68000 or 6502 instead.


Those are the ones I was taught in the early 00s, and I can confirm they work fine for pedagogical purposes. But if it were my choice today, I think I'd go with Thumb on legacy 32-bit ARM chips. (This even has some practical use today: GBA games can be emulated on the 3DS and probably also the Switch.)


Learning assembly on the 68HC11 was the keystone moment in understanding programming at an intuitive level for me. What I learned applies to this day.


I suggest starting with 8-bit MCU assembly, like AVR or PIC, or even something like 6502. It's very easy and cheap to get started, anyone with an "arduino" could write assembly for it, and 8-bit assembly is still relevant for very low-latency real-time systems or cheap, simple solutions. You can do a lot with 10 MIPS (million instructions per second) 8-bit MCUs that have a bunch of easy I/O peripherals. It's easy and accessible. I think 32-bit assembly is less common because C/C++ is usually the right tool for most of the jobs a 32-bit CPU would be used for.


I think 6502 is a very good and accessible assembly language for beginners, and I've done a decent amount of work around it.


I disagree. I did 6502 (and a bit of Z80) in high school, worked through those old Turbo Pascal embedded-assembly demoscene tutorials in high school and college, learned MIPS in college, learned MSP430 (very RISC-y) when working through microcorruption.com, wrote a (very simple) Cool-to-MIPS compiler while working through the old Coursera compilers course, and wrote a 6502 emulator, an Apple II emulator, and a 6502 assembler.

I still feel completely intimidated and out of my depth with modern assembly. I have no idea where to start. There are just so many instructions, conventions, and registers with strange names that overlap each other, etc. On top of that, many of the examples you see are trying to beat the compiler, so they do clever, wide stuff. A comprehensive, modern tutorial that starts simple but goes more or less the whole way would be welcome.


>I still feel completely intimidated and out of my depth with modern assembly. I have no idea where to start.

I'm not surprised. But keep in mind that "create something that works on a modern CPU and beats the compiler's output" is not the only possible goal of learning assembly programming. I would say it's not even the usual goal, and hasn't been for a long time.


It's not that bad if you don't expect too much of yourself. I could patch some software's machine code just by drawing on my 6502 teenage experience from more than 20 years ago. x86 or IL ASM doesn't matter all that much. The core concepts are the same: registers, the stack, and relative jumps conditioned on the contents of the flags register.

One thing that tripped me up a bit is that on x86 it's standard practice to use the "load effective address" instruction, lea, combined with unusual addressing modes to do simple arithmetic. I had to ask ChatGPT wtf that is.
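The trick is that lea only evaluates the address expression of its memory operand (base + index*scale + displacement) and writes the result to a register without reading memory, so it doubles as a cheap multiply-and-add. A minimal C model of that arithmetic; the function names are made up for illustration, and whether a particular compiler emits lea for something like x*5 depends on the compiler and flags.

```c
#include <stdint.h>

/* Effective address as x86 computes it: base + index*scale + disp,
   where scale is 1, 2, 4 or 8 and disp is a signed 32-bit constant.
   LEA stores this value in a register; no memory access happens. */
uint64_t lea(uint64_t base, uint64_t index, unsigned scale, int32_t disp) {
    return base + index * scale + (uint64_t)(int64_t)disp;
}

/* x*5 + 3 as one address computation, i.e. what
   "lea rax, [rdi + rdi*4 + 3]" produces when x is in rdi. */
uint64_t times5_plus3(uint64_t x) {
    return lea(x, x, 4, 3);
}
```

For example, times5_plus3(7) is 7 + 7*4 + 3 = 38.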

In the process I learned that 686 assembly is a very different beast and they don't play nicely with each other (or at all).


Ah, interesting. I think, looking back, that "lea" is usually where I start getting confused!

I apparently also need to internalize harder that I should just ask llms about everything :-)


I might also suggest ARM. Way back in the day, one of my first-year CS modules included writing a rudimentary MIPS emulator for much the same reasons as above, but I came to that knowing 6502, x86 and ARM2 assembler already, having cut my teeth on Acorn stuff, and at the time I still preferred ARM2 out of all these for clarity. ARM obviously remains relevant today, albeit with many revisions.


Even System/370 [0] architecture (mainframe) is a great place to start. There's a one-to-one correspondence between assembler instructions and machine instructions, which makes writing and debugging considerably easier. It's actually an incredibly robust processor architecture that's simple to understand.

[0] IBM System/370 Principles of Operation (the "big yellow book")

http://bitsavers.trailing-edge.com/pdf/ibm/370/princOps/GA22...


Are delay slots unique to mips? They're a pretty significant aspect (complication) that I don't see anywhere else.

Also, the gp register: I don't know of any other architecture where you need to set something like that up to access global variables. It's another layer of indirection that makes the assembly code harder to read (especially PIC code that works off the value of the t9 register).
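The motivation for $gp is reach: a MIPS load or store such as lw $t0, off($gp) carries only a signed 16-bit offset, so keeping a register pointed into the small-data area lets one instruction reach any global within about 32 KiB on either side. A small C sketch of that address computation; the $gp value in the test is an arbitrary example, not what any particular linker picks.

```c
#include <stdint.h>

/* Address formed by "lw $t0, off($gp)": the 16-bit offset is
   sign-extended to 32 bits and added to the register, so reachable
   addresses span gp - 32768 .. gp + 32767. */
uint32_t gp_relative(uint32_t gp, int16_t offset) {
    return gp + (uint32_t)(int32_t)offset;  /* sign-extend, then add */
}
```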


SPARC has delay slots. Here's some code. The clr %o3 and restore %o0 are in the delay slots.

    10082410   9de3bf98      save        %sp, 0xffffff98, %sp
    10082414   90102001      mov         1, %o0
    10082418   92100018      mov         %i0, %o1
    1008241c   94100019      mov         %i1, %o2
    10082420   7fffff17      call        dirList
    10082424   96102000      clr         %o3
    10082428   81c7e008      ret
    1008242c   91e80008      restore     %o0


>Are delay slots unique to mips?

No, several other architectures of that period (80s/90s) have delay slots, and modern VLIWs/DSPs still often have them. Sometimes the slots are longer than a single instruction.
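To make the single-slot case concrete (the clr %o3 in the SPARC listing runs even though the call has already been issued), here is a toy interpreter for a made-up three-instruction ISA. It borrows SPARC's pc/npc pair: a jump only rewrites npc, so the instruction that was already next in line, the delay slot, still executes before control reaches the target. The opcodes and names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical toy ISA, just enough to show one branch delay slot. */
typedef enum { OP_ADDI, OP_JUMP, OP_HALT } Op;

typedef struct {
    Op op;
    int imm;  /* OP_ADDI: value added to acc; OP_JUMP: target index */
} Instr;

/* Runs the program with SPARC-style pc/npc registers and returns acc. */
int run_with_delay_slot(const Instr *prog, size_t len) {
    int acc = 0;
    size_t pc = 0, npc = 1;
    while (pc < len && prog[pc].op != OP_HALT) {
        const Instr in = prog[pc];
        size_t next = npc + 1;           /* default: fall through */
        if (in.op == OP_ADDI) {
            acc += in.imm;
        } else if (in.op == OP_JUMP) {
            next = (size_t)in.imm;       /* takes effect after the slot */
        }
        pc = npc;                        /* the delay slot runs next */
        npc = next;
    }
    return acc;
}
```

With the program {JUMP 3, ADDI 10, ADDI 100, ADDI 1, HALT} the result is 11: the ADDI 10 in the delay slot runs, the ADDI 100 is skipped. Without a delay slot the same program would return 1.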


I learned on the Motorola 68000 series. I can confirm, it is easier to learn on a smaller instruction set for a simpler architecture.


I learned 6502 back in the 80's, when it was current. Definitely easier than x86, but I kind of wonder if a modern learner wouldn't constantly be wondering if they were missing something and whether they were actually learning anything on what is now a completely obsolete framework. x86 is more complex, but you can actually see it _do_ something rather than hope that the emulator you're running it on is actually faithful to a real electronic device.


I learned on Z80. It's also relatively simple (and a lot of fun).


Emu86/assembler/MIPS/key_words.py: https://github.com/gcallah/Emu86/blob/master/assembler/MIPS/...

Emu86.assembler.virtual_machine > MIPSMachine(VirtualMachine) , RISCVMachine(VirtualMachine) , IntelMachine(VirtualMachine) https://github.com/gcallah/Emu86/blob/b48725898f37dede3ab254...

Learn x in y minutes > "Where X=MIPS Assembly" https://learnxinyminutes.com/docs/mips/


If we are not doing Intel/ARM architectures, a lot can be said for learning PDP-11 assembly. Among other things, it maps very cleanly onto concepts central to higher-level programming languages, like indirection.
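The textbook illustration is C's post-increment pointer idiom. A sketch in C; the PDP-11 mapping in the comment assumes src is held in R0 and dst in R1, a register assignment chosen purely for illustration.

```c
/* Copy a NUL-terminated string using post-increment pointers.
   With src in R0 and dst in R1, the PDP-11 loop is two instructions:
       1$: MOVB (R0)+, (R1)+   ; copy a byte, bump both pointers
           BNE  1$             ; MOVB set Z from the byte; stop after NUL */
char *copy_string(char *dst, const char *src) {
    char *start = dst;
    while ((*dst++ = *src++) != '\0')
        ;
    return start;
}
```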


I'd suggest RISC-V, not only because it is inevitable and rapidly growing the strongest ecosystem, or because the MIPS ISA is no more (the company that owns it switched to RISC-V), but because it is free of microarchitecture spilling into the ISA, which MIPS is guilty of.

From a programmer POV, RISC-V is not unlike MIPS, but much easier.


Or something like the MSP430: doing math on a 16-bit address space is fun, and addresses are only four hex digits. All the concepts transfer.
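A sketch of what that 16-bit math looks like, assuming the classic MSP430 with 16-bit registers and a 64 KiB address space (the later MSP430X extends addresses to 20 bits). The helper name is made up; the point is that results wrap modulo 2^16, so every address stays within four hex digits.

```c
#include <stdint.h>

/* 16-bit address arithmetic: the sum is computed in int and then
   truncated, so it wraps modulo 2^16 into 0x0000..0xFFFF. */
uint16_t addr_add(uint16_t addr, int16_t offset) {
    return (uint16_t)(addr + offset);
}
```

For example, 0xFFF0 + 0x20 wraps to 0x0010.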


Working through the microcorruption.com challenges left me really liking msp430 assembly


It is probably one of the best ways to learn how computers work :)


Very nice, I'm always for more assembly-level programming and education!

A very minor grammar thing, here:

In our example, the mnemonic is mov, which stands for move, and the operands are rax and rbx. This instruction in plain English would read: move the content of rbx in rax.

I believe the last part would be better as "... content of rbx to rax", i.e. "to" instead of "in". I'm not a native speaker, though.


Agreed, indeed I thought that "Copy the contents of rbx to rax" might be even clearer (mov doesn't remove the value from rbx IIUC!)
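That matches the semantics: a register-to-register mov is a copy. A trivial C model; the register array is only an illustration, not how the CPU actually stores registers.

```c
#include <stdint.h>

/* A few x86-64 general-purpose registers, modeled as an array. */
enum { RAX, RBX, RCX, RDX, NREGS };

/* Intel syntax "mov dst, src": copy src into dst; src is left unchanged. */
void mov(uint64_t regs[NREGS], int dst, int src) {
    regs[dst] = regs[src];
}
```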


I'd say 'Yoda speak' makes translating from Intel syntax to English a lot easier.

  mov rax, rbx

  Into rax, copy the contents of rbx.


if you tell someone you're moving, a generalized first question they would ask is "where?" not "what".

move Fiji belongings


Thanks for the comment and for taking the time to read :) I fixed it and it flows much better now


- "Ask HN: Best blog tutorial explaining Assembly code?" (2023) re ASM, HLA, WASM, WASI and POSIX, and docker/containerd/nerdctl support for various WASM runtimes: https://news.ycombinator.com/item?id=38772493

- "Show HN: Tetris, but the blocks are ARM instructions that execute in the browser" (2023) https://news.ycombinator.com/item?id=37086102 ; the emu86 jupyter kernel supports X86, RISC, MIPS, and WASM; and the iarm jupyter kernel supports ARMv6


Nothing gave me an intuition for assembly like TIS-100


I have a nice print copy of the Shenzhen IO manual. I love those games so much, and agree that they're actually not a half bad way to get started with assembly!


Really well written - it was a pleasure to read. Concepts were introduced in small, consumable chunks, without being too slow or overwhelming. I hope more articles are coming.


Thanks for the kind words! Yes, there's more coming. I planned a series of seven articles, the second of which is already out


A comment on the second blog post:

Some condition codes are different depending on signedness of the numbers being used. "Greater"/"Less"/"Overflow" are for signed, "Above"/"Below"/"Carry" for unsigned. The sign flag by itself is not what you would test when comparing two numbers, since the subtraction done by CMP might have overflowed - that's why the condition for "Less" is defined as "Sign XOR Overflow".

There are various arguments for always using signed types in C, but none of that applies to assembly, and unsigned is more appropriate in most cases. So maybe these conditions should be introduced first?
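A small C model of that point: CMP computes a - b, sets ZF/SF/CF/OF from the subtraction, and each conditional jump reads a combination of those flags. The sketch uses 8-bit operands to keep the numbers small; the flag formulas are the standard x86 rules for subtraction, while the names cmp8, jl and jb are just labels for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags left by "cmp a, b" on 8-bit operands (a - b, result discarded).
   Only the four flags discussed above are modeled. */
typedef struct { bool zf, sf, cf, of; } Flags;

Flags cmp8(uint8_t a, uint8_t b) {
    uint8_t r = (uint8_t)(a - b);
    Flags f;
    f.zf = (r == 0);
    f.sf = (r & 0x80) != 0;   /* top bit of the result */
    f.cf = a < b;             /* unsigned borrow */
    /* signed overflow: a and b differ in sign, and r's sign differs from a's */
    f.of = (((a ^ b) & (a ^ r)) & 0x80) != 0;
    return f;
}

bool jl(Flags f) { return f.sf != f.of; }  /* "less" (signed): SF XOR OF */
bool jb(Flags f) { return f.cf; }          /* "below" (unsigned): CF */
```

For a = 0x80, b = 0x01: as signed bytes that is -128 < 1, and jl is taken even though SF is 0, because the subtraction overflowed and set OF. As unsigned bytes it is 128 > 1, so jb is not taken.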

Readers might be confused why it is called EFLAGS, or about the register names, so maybe a little history should be included: registers were originally 16 bits, then "E"xtended to 32 bits, and later "R" was used to indicate 64 bits. AH/CH/DH/BH correspond to the high byte of the 16 bit registers AX/CX/DX/BX, not the extended ones. These aren't used much anymore.
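That history maps directly onto bit ranges of one 64-bit register. A sketch in C treating RAX as a uint64_t; the get_/set_ helper names are made up. The write rules in the comments are the x86-64 ones: writing a 32-bit register zero-extends into all 64 bits, while 16-bit and 8-bit writes leave the other bits untouched.

```c
#include <stdint.h>

/* Read views of RAX. */
uint8_t  get_al(uint64_t rax)  { return (uint8_t)rax; }
uint8_t  get_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); } /* high byte of AX */
uint16_t get_ax(uint64_t rax)  { return (uint16_t)rax; }
uint32_t get_eax(uint64_t rax) { return (uint32_t)rax; }

/* Writes: EAX zero-extends into RAX; AX and AH preserve the rest. */
uint64_t set_eax(uint64_t rax, uint32_t v) { (void)rax; return v; }
uint64_t set_ax(uint64_t rax, uint16_t v)  { return (rax & ~0xFFFFull) | v; }
uint64_t set_ah(uint64_t rax, uint8_t v) {
    return (rax & ~0xFF00ull) | ((uint64_t)v << 8);
}
```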

Good tutorial nonetheless!

So many others don't even mention that official documentation by Intel / AMD exists. Instead it's mostly "here's what code GCC/Clang generates for this C program", in that horrid AT&T syntax, and links to one of several third-party reference sites containing nothing but giant dense tables of mnemonics, opcodes and flags. No wonder when people reading those come away convinced that it's impossible to actually understand this stuff.


Hey, thanks for the feedback (:

The second article is still shaping up, but I felt like publishing it to get some early feedback. What you mention are a couple of the reasons.

At this point, I find it too dense (and it was even more so in the first drafts), hence the missing information. Maybe I should cut some parts, like the rip intro, and make room for more details about flags and conditions.

This is good signal for me. Thanks again for taking the time


Why does it say “constants here” in .data? .data is for initialized variables. Constants go into .rodata
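A C illustration of where each kind of object typically ends up with GCC or Clang on an ELF target. This is conventional toolchain behaviour rather than a language guarantee, and at higher optimization levels unused or constant statics may be folded away entirely; running objdump -t on the compiled object file shows which section each symbol landed in.

```c
/* Typical placement by GCC/Clang on ELF targets (not guaranteed by C). */
static int counter = 5;               /* initialized, writable  -> .data   */
static const int limit = 10;          /* initialized, read-only -> .rodata */
static int scratch[1024];             /* zero-initialized       -> .bss    */
static const char greeting[] = "hi";  /* read-only bytes        -> .rodata */

/* Touch every object so none of them is trivially unused. */
int sections_demo(void) {
    counter += limit;                 /* writable data can change */
    scratch[0] = greeting[0];         /* .bss starts out all zeros */
    return counter + scratch[0];
}
```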


You are correct.

The focus of the series is on accessibility. Sometimes it comes at the expense of telling the whole story: I was even on the fence about introducing the .bss block, honestly.

The article mentions initialized variables, but maybe this warrants a footnote.

Thanks for the feedback and for reading through the article. I appreciate it a lot :)


> The focus of the series is on accessibility.

Teaching wrong things does not make the article more accessible, just more confusing.


I would add that if you're targeting users of high-level programming languages, this is something they should already know. It does not take particular effort to map it onto what they already know.


You got me thinking and I simplified the whole block. Removed the .bss part and just calling what goes in there "data".

Thanks once again for the feedback :)


This is a nice introduction! Myself, I started with Jeff Duntemann's "Assembly Language Step By Step" - http://duntemann.com/assembly.html

(He is an engaging tech author; I have never not loved one of his books.)


As someone who used to eat assembly instructions for breakfast back in the day, and who remembers when a MUL took more than one cycle, is there any resource you'd recommend for learning to use the highly vectorized/parallelized instruction sets in modern CPUs?

I know about Daniel Lemire / lemire.me

Anybody / anything else you'd recommend?


There's https://www.agner.org/optimize/microarchitecture.pdf for a bunch of microarchitectural information, covering many interesting things outside of specific instruction breakdown as in uops.info. chipsandcheese.com also covers a bunch of stuff. Then there's various more specific things in a bunch of places, e.g. http://www.numberworld.org/blogs/2024_8_7_zen5_avx512_teardo..., https://web.archive.org/web/20240602004718/https://www.merse..., and a bunch more. https://dougallj.github.io/applecpu/firestorm.html for Apple M1.



Oh wow, tyvm: I looked a little bit already and it looks like a treasure trove. Very extensive.


I've been on Apple silicon since they've been out, I'd love a similar intro but based on this architecture.


Here are some simple programs in assembly for the M1 chip, annotated as best as possible. Perhaps they will help?

https://github.com/jdshaffer/Apple-Silicon-ASM-Examples


Thank you for the repository. I'm having a great time reading the code with all the comments. (The note about writing on an Oculus Quest had me cracking up.) You could easily write a great blog post with this content, or do some YT videos. Apple's ARM assembly ecosystem is short on resources like this.


Thank you for your kind comments, and I'm very glad you like the samples. I actually can't remember what I wrote with my comments, though I tend to not take myself very seriously. smile

I'm not an expert with M1 assembly, just a tinkerer, so I never felt wise enough to write a blog or record videos. I wish I could have found some for myself!


Excellent post. Thanks for your time and effort.


Good stuff. My level was over 9000 :o)


Do yourself a favour and use a good macro assembler, and if doing x86, Intel syntax.


That was a nice little tutorial!


One thing I don’t like about these articles in general is they omit a very important step in describing assembly: how it’s actually used by a computer. I’ll give a brief intro to folks and hopefully not confuse things any further. If you want hands on experience, grab an STM32 Nucleo board, or do Ben Eater’s 8-bit processor DIY course. This is now getting into a very closely related subject matter called Embedded Systems via foundational electronics and electrical engineering studies. Anyone interested in this - may the Gods be on your side.

So, here we go. Blog post as a comment.

Assembly and mnemonics are there as human-readable representations of a fictitious thing called bits, which are always “stored somewhere”. Now, bits don’t actually exist, nor does their storage, but voltage differences, magnetic/electric fields and currents do. There are no intermediate layers in hardware that translate your “program” into this pattern: your “files” already exist as this pattern somewhere because, again, it’s voltages, fields, and currents, and that’s all there is. This means all those abstractions and OSI layers don’t exist; all that stuff is make-believe. Files and file systems don’t exist. Hence, assembly doesn’t exist.

I’ll skip over a thing called microcode and microarchitecture. Both of these deal with taking your bit pattern and rearranging it into a different pattern that the exact processor you’re using is optimized/carefully designed for, and doing other mean things to your program. In general you won’t encounter this when writing assembly or any higher-level language, but certain processors do allow for some play at the program level: you can configure the processor exactly how you want it to function, and this is packaged with your program code.

Now, when we talk about data buses, instruction buses, “cache lines”, and all these things, what we are really talking about are parallel traces on a circuit board on which one can sample, usually though not always, a voltage difference on each trace and see if it’s “high” or “low” (your “bit”). The number of these traces usually corresponds to the number of bits. Hence an 8-bit, 16-bit, 9-bit or 32-bit processor will, again usually though not always, have physical lines on a circuit board corresponding to that number. Hence, your assembly instruction is an English term for an ordering of these lines at a point in time.

Here’s the OG 8086 processor chip. You can see physically it has AD0 to AD15 pins, hence 16 “bits” of voltage levels that are “high” or “low”: https://www.geeksforgeeks.org/pin-diagram-8086-microprocesso...

So your assembly instruction is an arrangement of the voltage levels for these pins. The processor samples it, and proceeds to raise or lower other pins it has. And it goes round and round like this until it overheats and dies on you after some years of abuse.
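To make the pin picture concrete: on the 8086 the 16-bit data bus is multiplexed onto AD0 to AD15, and during the data phase of a word read from an even address, the byte at the lower address travels on AD0 to AD7. A sketch, assuming the two instruction bytes 0x89 0xD8 (one encoding of mov ax, bx) sit at an even address; the helper names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Word seen on AD0..AD15 for a word read at an even address:
   the lower-addressed byte drives AD0..AD7 (little-endian). */
uint16_t bus_word(uint8_t low_addr_byte, uint8_t high_addr_byte) {
    return (uint16_t)(low_addr_byte | (high_addr_byte << 8));
}

/* true = pin ADn is "high", false = "low" */
bool pin_high(uint16_t word, unsigned n) {
    return (word >> n) & 1u;
}
```

For mov ax, bx the word is 0xD889: AD0 is high, AD1 is low, AD8 is low, AD15 is high.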

You write your program and store it in “memory” as an ordered arrangement of voltages/fields. A “controller” reads this and presents the processor at any point in time with the voltage levels, directly hooked up to its pins. The processor goes ahead and does its thing. There’s very little rocket surgery to it.

Processors, of course, are not usually this simple. Nowadays they are structured to do many steps at once, oftentimes abusing a clock (again, voltage differences) and ever-smaller chips to go faster and faster.

One point to mention here is, you can probably now see that the data pins of something like an 8086 can literally be hooked up to anything - there’s no separation between “code” and “data”. By now you should understand that these, too, are make-believe. A pattern of signals is a pattern of signals no matter where it’s coming from. Although some processors will tell you otherwise and they get fancy with a thing called memory maps. They too, don’t exist.

So, armed with this, the why of assembly is hopefully a bit less mysterious. It’s a convenient, human-readable form of instructions that a processor can execute. It is a fairly banal and trivial thing in the end. The real guts of it is knowing how the processor will execute it, and what it will do before and after. Assembly as a language can be picked up relatively quickly.

It should be noted that, should you choose to, you can open up the manual of your favourite processor, and chances are you’ll get a listing of not only mnemonics (for your reading pleasure), but the aforementioned voltage levels and functions of each of the pins, as well as the “bit pattern” for each instruction. Some manuals will, if you’re really lucky, tell you how to design your circuit board for the processor to function properly, how far your memory is allowed to physically be from the processor, and what other things you must keep in mind.
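As an example of those bit patterns: for a register-to-register MOV, Intel's manuals list opcode 0x89 (MOV r/m16, r16) followed by a ModRM byte that packs a 2-bit mod field, a 3-bit reg field and a 3-bit r/m field. A sketch that assembles it by hand for 16-bit registers in 16-bit mode, as on the 8086 (32- and 64-bit modes would need a 0x66 operand-size prefix); the function name is made up.

```c
#include <stdint.h>

/* 16-bit register numbers as encoded in the ModRM byte. */
enum { AX, CX, DX, BX, SP, BP, SI, DI };

/* Encode "mov dst, src" with opcode 0x89 (MOV r/m16, r16).
   ModRM = mod(2 bits) | reg(3 bits) | r/m(3 bits); mod = 11b means
   r/m names a register; reg holds the source, r/m the destination. */
void encode_mov_r16(uint8_t out[2], int dst, int src) {
    out[0] = 0x89;
    out[1] = (uint8_t)((3u << 6) | ((unsigned)src << 3) | (unsigned)dst);
}
```

For mov ax, bx this yields 0x89 0xD8, and for mov cx, dx it yields 0x89 0xD1.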

Clear as mud?

(epilogue: if you’ve found this info tickles your fancy, you should definitely look into electronics and start building circuits, both analog and digital ones, get comfortable with reading data sheets and understanding how processors interact with the peripherals around them to perform their duties. This is an immense field of lots of things which would fascinate you and are truly science fiction.)



