“Hello World” in assembly language on Linux (jyotirmoy.net)
133 points by yomritoyj on April 22, 2015 | 51 comments



Smidge simpler and uses only kernel routines

    #   as hello.S -o hello.o
    #   ld hello -o hello
    #   ./hello
    
    .data                         # .data section starts
    msg: .ascii "Hello, World!\n" # msg is ASCII chars (.ascii or .byte)
    len: .int 14                  # msg length is 14 chars, an integer
                                  # .int 32 bits, .word 16 bits, .byte 8 bits
    .text                         # text (instruction code) section starts
    .global _start                # _start is like main(), .global means public
                                  # public symbols are for linker to link into runtime
    _start:                       # _start starts here
       movl len,  %edx            # value 14 copied to CPU register edx
       movl $msg, %ecx            # memory addr of msg copied to CPU register ecx
       movl $1,   %ebx            # file descriptor 1 is computer display
       movl $4,   %eax            # system call 4 is sys_write (output)
       int  $128                  # interrupt 128 is entry to OS services
    
       movl $1,   %eax            # system call 1 is sys_exit (prog exits)
       movl $77,  %ebx            # status return (shell command: "echo $?" to see it)
       int  $128                  # call OS to do it via interrupt #128


ARM version:

    .data
    msg: .ascii "Hello, World!\n"
    len = . - msg

    .text
    .global _start

    _start:
      mov     r0, #1                /* fd 1 = stdout */
      ldr     r1, =msg              /* message */
      ldr     r2, =len              /* length of message */
      mov     r7, #4                /* write */
      swi     #0                    /* syscall */

      mov     r0, #0                /* status */
      mov     r7, #1                /* exit */
      swi     #0                    /* syscall */
I couldn't get the # style comments to work, so had to resort to /* */ instead. It's very similar because the code is quite simple, but could be a good starting point if you are interested in ARM assembly. You can use the same commands as above to build and run.


To use syscalls in x86_64 assembly (at least on Linux), you'll want to use the syscall instruction and set up the registers according to the Linux 64-bit syscall calling convention (which is different from the 32-bit convention).
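For anyone who wants to poke at that convention from C first, here is a rough sketch, assuming GCC or Clang on Linux. raw_write is a made-up helper name, not something from the article. On x86_64 it loads the syscall number into rax and the three arguments into rdi, rsi and rdx, then executes syscall; on other architectures it simply falls back to libc's syscall() wrapper.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall(), used only by the fallback branch */

/* Hypothetical helper (not from the thread): issue write(2) directly.
 * On the x86_64 path a failure comes back as a negative errno value;
 * the fallback path returns -1 and sets errno instead. */
static long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                  /* result comes back in rax */
        : "a"((long)SYS_write),      /* rax = syscall number (1 for write) */
          "D"((long)fd),             /* rdi = first argument  */
          "S"(buf),                  /* rsi = second argument */
          "d"(len)                   /* rdx = third argument  */
        : "rcx", "r11", "memory");   /* the syscall instruction clobbers rcx and r11 */
    return ret;
#else
    return syscall(SYS_write, fd, buf, len);
#endif
}
```

Calling raw_write(1, "Hello, World!\n", 14) from main should print the greeting. Compare with the 32-bit int $128 version, where the number goes in eax and the arguments in ebx, ecx and edx.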


Well... to be fair, as a tutorial on assembly language, the linked article (which does nothing but make a C-linkage function, main, call another function, printf) is going to be a better choice.

Things like system calls and program startup are more advanced topics you'd introduce later, even if they make for shorter code.

And to nit even further: your example is "lower level" but still relies on a metric ton of magic being executed by the linker. If you're allowed to "just invoke ld" the OP's "just call and return from main" trick doesn't sound so bad.


There is no magic in what the linker is doing. It just needs to create an ELF file with two sections and set the start address.

Check this out http://www.muppetlabs.com/~breadbox/software/tiny/teensy.htm...


Sigh... "just". Find me 100 programmers capable of writing a C function to call another and tell me how many of them understand the ELF format, or even know that it exists. It's a subject worthy of its own tutorial (which you conveniently linked to), and specifically it's a subject that is rather more complicated than how to write two x86 Linux system calls.


This is a much better example, since it does away with the needless complexities and is a good starting point for looking at interrupts and how they are used in this environment, since their importance isn't hidden behind a layer of needless logic and a C library call.


Except that interrupts are no longer the efficient way to make a system call. The specialized syscall (sysenter) opcode is much better.


[deleted]


I explicitly mentioned that opcode. By name, even.


right, but it keeps it very simple


I think it's # ld hello.o -o hello


Looking at that and getting reminded of how far removed from it modern languages are.

Who thinks in terms of CPU registers these days, for instance?


> Who thinks in terms of CPU registers these days, for instance?

The people for whom it makes sense, same as the people who think about RTCs or deep sleep modes or programming single-board computers to read temperature sensors.

It isn't like the low level has gone away or become completely inaccessible. It's that now we have more options to write working code, and we can optimize for things beyond cycle-level and byte-level efficiency. Optimizing for readability, for example, wasn't really an option when all of the readable algorithms were unacceptably slow due to constant terms.

Breaking every rule about data-hiding to get somewhat better constant terms in the big-O analysis isn't virtuous, it's just what was forced on us by insufficient hardware.


"Who thinks in terms of CPU registers these days..."

That's pretty much the only way I think these days, but I guess assembly does not qualify as a "modern language" even though it can be, and is, used to control the lion's share of the world's "modern" computers. Matters little to me; I'm hooked.

I do not write in Lua, and I am sure I will be quickly corrected by someone who does, but isn't it considered both (virtual) "register-based" and "modern"? My sincerest apologies to the Lua experts if I am wrong.


The VM uses registers (as opposed to e.g. a stack), but that's an implementation detail and not anything you think about when you write Lua.


This is my sentiment exactly. With all these fancy functional languages, non-imperative programming paradigms and heaps of JavaScript frameworks, there's one thing that one might forget: computers are made of silicon, not category theory.


It is easy to write down a bunch of assembly instructions. It's hard to make them do the thing you want, and it is our desires and our intent that need more than a little silicon to express.


It's been a while since I did any assembler, but one thing I remember: if you seriously want to write anything, GAS may not be your best option. NASM has a much saner syntax (and if you ever happened to use TASM or MASM, it's much more similar to those).


Syntax, in assembly, is the least of your problems.


I couldn't disagree more.

Having tried the perfection that is MASM, I'm always disappointed by what NASM lacks in comparison and I'm not at all comfortable with GAS.

Choosing a good assembler, an assembler that you like, that you're comfortable with, is critical.


YASM might be a better choice if you are starting from scratch.


Hello world example is a great misdirection.

Heavy lifting is done by an external routine being called.

That does not make you learn assembly language.

ASM is about interrupts (bottom halves), calling other pieces of assembly, saving and restoring the registers, and above all mastering mov and the art of deciphering how registers are used.

This is just assembly quiche programming.


The next step would be to replace the printf() call with a call to write() and remove the dependency on libc.


Is it me or is his explanation of call and ret backwards? Shouldn't call decrement %rsp and ret increment it since the stack grows downward?
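To make the direction concrete, here is a toy model in C. It is only a sketch: the struct and function names are invented, and the stack pointer is a slot index rather than a byte address (on x86-64 the real %rsp moves by 8 bytes per slot). The point is just that call moves the stack pointer down before storing the return address, and ret reads it back and moves the pointer up again.

```c
#include <stdint.h>

/* Toy model of a downward-growing stack; every name here is hypothetical. */
enum { STACK_WORDS = 16 };

struct toy_cpu {
    uint64_t stack[STACK_WORDS];
    uint64_t sp;   /* index of the current top of stack; STACK_WORDS means empty */
    uint64_t pc;   /* address of the next instruction */
};

static void toy_init(struct toy_cpu *c, uint64_t pc)
{
    c->sp = STACK_WORDS;   /* start at the high end: the stack grows downward */
    c->pc = pc;
}

/* call: decrement sp, store the return address there, then jump. */
static void toy_call(struct toy_cpu *c, uint64_t target, uint64_t return_addr)
{
    c->sp -= 1;
    c->stack[c->sp] = return_addr;
    c->pc = target;
}

/* ret: load the return address from the top of stack, then increment sp. */
static void toy_ret(struct toy_cpu *c)
{
    c->pc = c->stack[c->sp];
    c->sp += 1;
}
```

After a toy_call the stack pointer is one slot lower than before, and after the matching toy_ret it is back where it started, with pc set to the saved return address.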


OP here. You are right. Fixed now.



I'm not sure if this is good.

For one, it uses more code than necessary to do things it doesn't need to do (checking arguments, storing the string in two pieces).

For another, it's just not in the spirit of hello world, which is supposed to show something that just displays hello world.

It's making assembly look harder than it is.


How would someone go about learning assembly these days? I have seen quite a few assembly posts for starting out but never know where to go from there.


Assembly Language step by step by Jeff Duntemann remains one of my favorite books overall (not just programming, not just computers). It was updated in the last few years and the 3rd edition remains quite good.


Definitely a great book! Once you're done, check out the MindShare books (especially their book on protected mode).


Nice, these look superior to the Intel books (which Intel graciously printed then mailed to me for free like 10 years ago, go Intel!). I'll check them out.


> An integer or pointer return value is returned in %ebx.

Is this a typo? The return value is stored in %rax or %eax, not %ebx.


Went to see the printf or puts implementation rather than just calling a library function. Then saw Intel assembly and reminded myself why I've never wrestled with it.


This is the GNU assembler AT&T syntax though, which is known to be horrendous. Other assemblers have much nicer syntax.


Agreed; most programmers I know who write a lot of x86 Asm tend to stick with the Intel syntax. I've never found the "it's easier to parse" argument for AT&T syntax particularly convincing, especially as GAS is written in C while some of the Intel syntax assemblers existing at the time were themselves written in x86 Asm. Neither is the "it makes it more consistent with the other arch's assemblers" - just look at GAS' syntax for MIPS and ARM. It seems to me like someone was really obsessed with the SPARC syntax (which I find roughly as horrendous.)

http://x86asm.net/articles/what-i-dislike-about-gas/


I don't think either syntax is nicer than the other. It's just a matter of personal preference and familiarity.


Intel syntax has more intuitive memory access syntax:

AT&T syntax:

    movl mem_location(%ebx,%ecx,4), %eax
Intel syntax:

    mov eax, [ebx + ecx*4 + mem_location]
The other differences are minor compared to that, IMHO.
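For reference, both spellings describe the same computation: base + index * scale + displacement. A small C sketch of that arithmetic, with helper names made up for illustration and the displacement fixed at 0 in the load helper:

```c
#include <stdint.h>

/* Hypothetical helper: the address named by disp(base,index,scale) in AT&T
 * syntax, or [base + index*scale + disp] in Intel syntax. */
static uintptr_t effective_address(uintptr_t base, uintptr_t index,
                                   unsigned scale, intptr_t disp)
{
    return base + index * scale + (uintptr_t)disp;
}

/* Roughly what a 32-bit load with scale 4 does: the C equivalent is base[index]. */
static int32_t load32(const int32_t *base, uintptr_t index)
{
    uintptr_t addr = effective_address((uintptr_t)base, index,
                                       (unsigned)sizeof(int32_t), 0);
    return *(const int32_t *)addr;
}
```

So with a scale of 4 the index register counts 32-bit elements, which is why this addressing mode maps so directly onto array indexing in C.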

https://en.wikipedia.org/wiki/X86_assembly_language#Syntax


I can agree with most of that, except for the fact that Intel arguments are backwards. That throws me off every time. I don't know of any other assembler syntax that uses that argument order.


For me, I remember the notation by correlating it to its "high-level" equivalent.

   mov eax, [ ptr ]
is like

   eax = *ptr;  // or, eax = ptr[ 0 ];
The offset/multiplier memory addressing format for AT&T syntax was always more troubling for me. Coming from a TASM/MASM/NASM/Pascal/x86 background first, it felt "icky" to put offsets outside of the "brackets" (or parentheses, as it were) [0][1].

[0] https://github.com/lpsantil/rt0/blob/master/src/lib/00_start...

[1] https://github.com/lpsantil/rt0/blob/master/src/lib/00_start...


Standard ARM, MIPS, PowerPC, x86, and Z80 Asm syntax all have the destination on the far left.

68k, Alpha, PDP-11, SPARC, and VAX have the destination on the far right.

The order is probably as contentious as the great endianness debate, but I think one of the most awkward parts of having src, dst order is that subtraction looks backwards. I prefer dst, src because it corresponds closely with the direction of assignment in higher-level languages:

    op a, b, c       ; a = b op c
    op a, b          ; a op= b


I'd love to see a version of this done using the Linux amd64 syscall ABI. I've written similar programs for 32-bit Linux (I know the syscall instruction by heart: int 0x80), but I understand 64-bit is more complex.


I don't know if I would call it harder, it's just a bit different is all.

Here you go:

  ; Hello World, linux x86_64, nasm syntax

  section .rodata                 ; Begin read only data section
  hello: db "Hello, World",0x0a   ; String, 0x0a is \n
  hello_len equ $-hello           ; $ is current address, length is address after string - address of start of string

  section .text          ; begin code section
  global _start          ; export _start so the linker can see it

  _start:                ; program entry point
      mov rax, 1         ; write(2) syscall number
      mov rdi, 1         ; stdout
      mov rsi, hello     ; string address
      mov rdx, hello_len ; string length
      syscall            ; execute the write syscall

      mov rax, 60        ; exit(2) syscall number
      mov rdi, 0         ; exit status
      syscall            ; execute the exit syscall


The syscall version isn't much more complex. You can compare the syscall ABIs here [0]

[0] http://en.wikibooks.org/wiki/X86_Assembly/Interfacing_with_L...


Sounds complicated, if AMD and Intel use different instructions for system calls: http://wiki.osdev.org/Sysenter http://articles.manugarg.com/systemcallinlinux2_6.html


syscall is the standard ABI for a 64-bit kernel in Long Mode [0].

[0] http://wiki.osdev.org/SYSENTER#Compatibility_across_Intel_an...


As someone who hasn't touched assembly in over a decade but had been meaning to get back into it, this post actually gives me a good starting point. Thanks for sharing it!


Is it just me or is that font way too large?


I think it's the line-height, super hard to keep track for me.


Sure would be nice if your Web browser allowed you to make text bigger or smaller whenever you pleased.


As aw3c2 mentioned, it's the spacing, which I hadn't realized. So even if I shrink the font (which, yes, I know I can, but why should I be forced to change how the web page was intended to be viewed?), it'll be unreadable by the time I can cover more ground without having to scroll.


Yeah, line height is one of the biggest problems in web design. I have to change it on many text-heavy pages to make it readable.

It's not ideal, and it should really fall to designers to fix, but a workaround (in Chrome or Firefox) is to edit the page's styles (with Chrome's developer tools or Firebug) and add a line-height declaration to <p> or whatever tag or class is most applicable, set to something like 125% or 135%. It usually results in a much more readable page.



