# as hello.S -o hello.o
# ld hello.o -o hello
# ./hello
.data # .data section starts
msg: .ascii "Hello, World!\n" # msg is ASCII chars (.ascii or .byte)
len: .int 14 # msg length is 14 chars, an integer
# .int 32 bits, .word 16 bits, .byte 8 bits
.text # text (instruction code) section starts
.global _start # _start is like main(), .global means public
# public symbols are for linker to link into runtime
_start: # _start starts here
movl len, %edx # value 14 copied to CPU register edx
movl $msg, %ecx # memory addr of msg copied to CPU register ecx
movl $1, %ebx # file descriptor 1 is stdout (normally the terminal)
movl $4, %eax # system call 4 is sys_write (output)
int $128 # interrupt 128 is entry to OS services
movl $1, %eax # system call 1 is sys_exit (prog exits)
movl $77, %ebx # status return (shell command: "echo $?" to see it)
int $128 # call OS to do it via interrupt #128
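For comparison, the same two system calls can be made from C through the POSIX wrappers write(2) and _exit(2). This is a minimal sketch that assumes Linux or another POSIX system. The helper names `hello_write` and `hello_exit` are hypothetical. `hello_write` takes the file descriptor as a parameter, so it can be pointed at something other than the terminal.

```c
#include <unistd.h>   /* write(), _exit(), ssize_t */

/* The message, equivalent to the .ascii string in the .data section. */
static const char msg[] = "Hello, World!\n";

/* Hypothetical helper: the equivalent of the sys_write (call 4) sequence.
 * sizeof msg - 1 computes the 14-byte length at compile time instead of
 * hard-coding it, skipping the NUL terminator that C string literals carry. */
ssize_t hello_write(int fd)
{
    return write(fd, msg, sizeof msg - 1);
}

/* Hypothetical helper: the equivalent of the sys_exit (call 1) sequence.
 * _exit() ends the process immediately with the given status,
 * which the shell reports via "echo $?". */
void hello_exit(int status)
{
    _exit(status);
}
```

Calling `hello_write(1)` and then `hello_exit(77)` mirrors the assembly above. Computing the length with `sizeof msg - 1` avoids keeping a hand-counted `.int 14` in sync with the string.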
I couldn't get the # style comments to work, so I had to resort to /* */ comments instead. It's very similar because the code is quite simple, but it could be a good starting point if you are interested in ARM assembly. You can use the same as/ld commands as in the x86 example to build and run.
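To make system calls in x86_64 assembly (at least on Linux), you'll want to use the syscall instruction and set up the registers according to the Linux 64-bit syscall calling convention. That convention differs from the 32-bit one. The call number goes in rax and the arguments go in rdi, rsi, rdx, r10, r8 and r9, instead of eax plus ebx, ecx, edx, esi, edi and ebp for int 0x80. The call numbers themselves are also different.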
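Here is a hedged sketch of that convention. The C function below (its name, `raw_syscall3`, is hypothetical) issues the `syscall` instruction through GCC/Clang extended inline assembly. It assumes x86_64 Linux and a GCC-compatible compiler. On other Linux architectures it falls back to libc's `syscall()` wrapper, which reports errors as -1 plus errno rather than as a negative errno.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_write, SYS_exit numbers for this architecture */
#include <unistd.h>        /* syscall() fallback */

/* Hypothetical helper: issue a Linux system call with up to three arguments.
 * On x86_64 it uses the syscall instruction directly, following the 64-bit
 * convention: number in rax, arguments in rdi, rsi, rdx (then r10, r8, r9),
 * result in rax. The instruction overwrites rcx and r11, so they are listed
 * as clobbered. On this path, errors come back as -errno. */
static long raw_syscall3(long n, long a1, long a2, long a3)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                               /* rax: result */
                      : "a"(n), "D"(a1), "S"(a2), "d"(a3)       /* rax, rdi, rsi, rdx */
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Other architectures: fall back to the libc wrapper,
     * which returns -1 and sets errno instead of returning -errno. */
    return syscall(n, a1, a2, a3);
#endif
}
```

For example, `raw_syscall3(SYS_write, 1, (long)"hi\n", 3)` writes three bytes to stdout. On x86_64, `SYS_write` is 1 and `SYS_exit` is 60, not the 4 and 1 used by the 32-bit `int $128` interface.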
Well, to be fair, as a tutorial on assembly language the linked article (which does nothing but make a C-linkage function (main) call another function (printf)) is going to be the better choice.
Things like system calls and program startup are more advanced topics you'd introduce later, even if they make for shorter code.
And to nit even further: your example is "lower level" but still relies on a metric ton of magic being executed by the linker. If you're allowed to "just invoke ld" the OP's "just call and return from main" trick doesn't sound so bad.
Sigh... "just". Find me 100 programmers capable of writing a C function that calls another and tell me how many of them understand the ELF format, or even know that it exists. It's a subject worthy of its own tutorial (which you conveniently linked to), and specifically it's a subject that is rather more complicated than how to write two x86 Linux system calls.
This is a much better example, since it does away with the needless complexity and is a good starting point for looking at interrupts and how they are used in this environment. Their importance isn't hidden behind a layer of needless logic and a C library call.
> Who thinks in terms of CPU registers these days, for instance?
The people for whom it makes sense, same as the people who think about RTCs or deep sleep modes or programming single-board computers to read temperature sensors.
It isn't like the low level has gone away or become completely inaccessible. It's that now we have more options to write working code, and we can optimize for things beyond cycle-level and byte-level efficiency. Optimizing for readability, for example, wasn't really an option when all of the readable algorithms were unacceptably slow due to constant terms.
Breaking every rule about data-hiding to get somewhat better constant terms in the big-O analysis isn't virtuous, it's just what was forced on us by insufficient hardware.
"Who thinks in terms of CPU registers these days..."
That's pretty much the only way I think these days, but I guess assembly does not qualify as a "modern language", even though it can be and is used to control the lion's share of the world's "modern" computers. It matters little to me; I'm hooked.
I do not write in Lua, and I am sure I will be quickly corrected by someone who does, but isn't it considered both (virtual) "register-based" and "modern"? My sincerest apologies to the Lua experts if I am wrong.
This is my sentiment exactly. With all these fancy functional languages, non-imperative programming paradigms and heaps of JavaScript frameworks, there's one thing that one might forget: computers are made of silicon, not category theory.
It is easy to write down a bunch of assembly instructions. It's hard to make them do the thing you want, and it is our desires and our intent that need more than a little silicon to express.
It's been a while since I did any assembler, but one thing I remember: if you seriously want to write anything, GAS may not be your best option.
nasm has a much saner syntax (and if you ever happened to have used tasm or masm it's much more similar to that).
Heavy lifting is done by an external routine being called.
That does not make you learn assembly language.
ASM is about interrupts (bottom halves), calling other pieces of assembly, saving and restoring registers, and above all mastering mov and the art of deciphering how registers are used.
How would someone go about learning assembly these days? I have seen quite a few assembly posts for starting out but never know where to go from there.
Assembly Language Step-by-Step by Jeff Duntemann remains one of my favorite books overall (not just programming, not just computers). It was updated in the last few years, and the 3rd edition remains quite good.
Nice, these look superior to the Intel books (which Intel graciously printed and mailed to me for free about 10 years ago, go Intel!). I'll check them out.
I went to see the printf or puts implementation rather than just calling a library function. Then I saw Intel assembly and reminded myself why I've never wrestled with it.
Agreed; most programmers I know who write a lot of x86 Asm tend to stick with the Intel syntax. I've never found the "it's easier to parse" argument for AT&T syntax particularly convincing, especially as GAS is written in C while some of the Intel syntax assemblers existing at the time were themselves written in x86 Asm. Nor do I buy the "it makes it more consistent with the other arch's assemblers" argument; just look at GAS' syntax for MIPS and ARM. It seems to me like someone was really obsessed with the SPARC syntax (which I find roughly as horrendous).
I can agree with most of that, except for the fact that Intel arguments are backwards. That throws me off every time. I don't know of any other assembler syntax that uses that argument order.
For me, I remember the notation by correlating it to its "high-level" equivalent.
mov eax, [ ptr ]
is like
eax = *ptr; // or, eax = ptr[ 0 ];
The offset/multiplier memory addressing format for AT&T syntax was always more troubling for me. Coming from a TASM/MASM/NASM/PASCAL/x86 background first, it felt "icky" to put offsets outside of the "brackets" (or parenthesis, as it were) [0][1].
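Here is a small C sketch that makes the brackets-versus-parentheses point concrete. It shows one base + index*scale + displacement load, with the Intel and AT&T spellings of the same instruction in the comments. The function `load_scaled` is hypothetical, and the sketch assumes 4-byte ints.

```c
/* Intel:  mov eax, [rbx + rcx*4 + 8]
 * AT&T:   movl 8(%rbx,%rcx,4), %eax
 * Both compute the address base + index*scale + displacement.
 * With rbx = base, rcx = i and 4-byte ints, that is base[i + 2]. */
int load_scaled(const int *base, long i)
{
    /* Do the address arithmetic in bytes, as the CPU does,
     * then load a 32-bit int from the resulting address. */
    return *(const int *)((const char *)base + i * 4 + 8);
}
```

In Intel syntax the whole address sits inside the brackets, much like the C expression. AT&T pulls the displacement out in front of the parentheses.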
Standard ARM, MIPS, PowerPC, x86, and Z80 Asm syntax all have the destination on the far left.
68k, Alpha, PDP-11, SPARC, and VAX have the destination on the far right.
The order is probably as contentious as the great endianness debate, but I think one of the most awkward parts of having src, dst order is that subtraction looks backwards. I prefer dst, src because it corresponds closely with the direction of assignment in higher-level languages:
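For instance, in this small C sketch the comments show the same subtraction in both syntaxes next to its C equivalent. The function name `sub_like_asm` is hypothetical.

```c
/* Intel (dst, src):  sub eax, ebx      ; eax = eax - ebx
 * AT&T  (src, dst):  subl %ebx, %eax   # eax = eax - ebx
 * Read left to right, the AT&T form looks like "ebx - eax",
 * but both compute the same thing as the C statement below. */
int sub_like_asm(int eax, int ebx)
{
    eax -= ebx;   /* destination on the left, as in Intel syntax */
    return eax;
}
```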
I'd love to see a version of this done using the linux amd64 syscall abi. I've written similar programs for 32bit linux (I know the syscall instruction by heart: int 0x80), but I understand 64bit is more complex.
I don't know if I would call it harder, it's just a bit different is all.
Here you go:
; Hello World, linux x86_64, nasm syntax
; nasm -f elf64 hello.asm -o hello.o
; ld hello.o -o hello
; ./hello
section .rodata ; Begin read only data section
hello: db "Hello, World",0x0a ; String, 0x0a is \n
hello_len equ $-hello ; $ is current address, length is address after string - address of start of string
section .text ; begin code section
global _start ; export _start so the linker can see it
_start: ; program entry point
mov rax, 1 ; write(2) syscall number
mov rdi, 1 ; stdout
mov rsi, hello ; string address
mov rdx, hello_len ; string length
syscall ; execute the write syscall
mov rax, 60 ; exit(2) syscall number
mov rdi, 0 ; exit status
syscall ; execute the exit syscall
As someone who hasn't touched assembly in over a decade but had been meaning to get back into it, this post actually gives me a good starting point. Thanks for sharing it!
As aw3c2 mentioned, it's the line spacing, which I hadn't realized. I could shrink the font (yes, I know I can, but why should I be forced to change how the web page was intended to be viewed?), but then it becomes unreadable. At the normal size, I have to scroll constantly because of all the comments.
Yeah, line height is one of the biggest problems in web design. I have to change it on many text-heavy pages to make it readable.
It's not ideal, and it should really fall upon designers to fix, but there is a workaround in Chrome or Firefox. Edit the page's styles (with Chrome DevTools or Firebug) and add a line-height property to <p> or whatever tag or class is most applicable, set to something like 125% or 135%. That usually results in a much more readable page.