Where should I start learning Assembly?
67 points by shinvou on Jan 29, 2014 | 53 comments
So, yeah, where to start? I know C, Obj-C, Java and Python. I am self-taught. Now I want to get into reverse engineering and I don't know where to get started. Feedback and help are highly appreciated!



Reverse engineering is quite a different skill set from assembly. Unless you are reverse engineering malware, whatever you are analyzing is unlikely to have been written in assembly or to be heavily obfuscated. Then it's more about knowing how certain high-level programming constructs (think virtual function calls in C++) will be translated into assembly by a compiler, what residual information might be left in the binary, and what all that noise you are seeing is (think C++ templates, destructors called for stack-allocated variables...).
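To make the virtual call point a bit more concrete, here is a rough sketch in plain C of what a compiler typically builds for a class with one virtual function. The struct names, the layout and the helper functions are assumptions made up for illustration; real C++ ABIs put more into the table (type info, destructors, offsets) and differ in detail.

```c
/* Hand-written stand-in for what a C++ compiler generates for a class
 * with a virtual function. All names and the layout are illustrative
 * assumptions, not any particular compiler's output. */
struct shape;

struct shape_vtable {
    int (*area)(const struct shape *self); /* one slot per virtual function */
};

struct shape {
    const struct shape_vtable *vptr; /* hidden pointer, usually the first member */
    int w, h;
};

static int rect_area(const struct shape *self) { return self->w * self->h; }
static int tri_area(const struct shape *self)  { return self->w * self->h / 2; }

static const struct shape_vtable rect_vtable = { rect_area };
static const struct shape_vtable tri_vtable  = { tri_area };

/* Roughly what a constructor does: store the vptr, then the fields. */
struct shape make_rect(int w, int h)
{
    struct shape s = { &rect_vtable, w, h };
    return s;
}

struct shape make_tri(int w, int h)
{
    struct shape s = { &tri_vtable, w, h };
    return s;
}

/* "s->area()" in C++ becomes: load vptr, load the slot, indirect call. */
int call_area(const struct shape *s)
{
    return s->vptr->area(s);
}
```

In a disassembly, the body of call_area is the pattern to look for: two dependent loads followed by a call through a register instead of a call to a fixed address.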

For many reverse engineering projects, assembly might be a wholly useless skill, since whatever you are looking at is actually MSIL or running on Python with its own embedded interpreter. Here assembly only serves to tell you quickly that you would be wasting your time :)


You have to know assembly to be able to understand what you are looking at. Yes, you need to know more than just assembly, but you absolutely need to know assembly.


If you already know C, you can start out by looking at the machine code generated by your compiler with "objdump -d" on Linux and "otool -tV" on Mac. Start experimenting by writing out C constructs like functions, loops, switch statements, etc., and just looking at what the generated code looks like.
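As a starting point, a file along these lines gives you a loop, a switch and a function call to look at. The file name and function names are made up for this example.

```c
/* shapes.c: small constructs to compile and then disassemble.
 * Hypothetical file and function names, chosen for illustration. */

/* A plain loop: look for the compare and the backward jump. */
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}

/* A switch: compilers may emit a jump table or a chain of compares. */
int days_in_month(int month)
{
    switch (month) {
    case 2:
        return 28; /* ignores leap years */
    case 4: case 6: case 9: case 11:
        return 30;
    case 1: case 3: case 5: case 7: case 8: case 10: case 12:
        return 31;
    default:
        return -1;
    }
}

/* A call to another function: shows argument passing and the return value. */
int sum_of_days(int first, int last)
{
    int total = 0;
    for (int m = first; m <= last; m++)
        total += days_in_month(m);
    return total;
}
```

Compile it to an object file with something like "cc -O0 -c shapes.c" and run "objdump -d shapes.o" (or "otool -tV shapes.o" on a Mac). Comparing the -O0 output with -O2 is instructive, since the optimizer may restructure the loops and inline the call.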

Of course, to do that, you need to find the manual for your machine architecture. The x86 manuals are, for example, available here:

http://www.intel.com/content/www/us/en/processors/architectu...

You also then start to notice things like the operating system specific application binary interfaces (ABI):

http://www.x86-64.org/documentation/abi.pdf

and object file formats such as ELF that's used in Linux:

http://www.skyfree.org/linux/references/ELF_Format.pdf

or Mach-O used in Mac OS X:

https://developer.apple.com/library/mac/documentation/develo...

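You can also do the same thing with the JVM. The '-XX:+PrintCompilation' option logs which methods the JIT compiles, and '-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly' (which needs the hsdis disassembler plugin) prints the JIT-generated machine code: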

http://stackoverflow.com/questions/13086690/understanding-th...



This is probably the best guide (and I've read a lot of them) to actually writing assembly on your average PC one can get. I definitely recommend reading it.


IMHO, best ASM tutorial ever.


I started with the following book:

http://www.z80.info/zip/zaks_book.pdf

Wonderful book from which a lot of knowledge is applicable to other architectures straight away. It teaches you about planning, control structure implementation and the maths behind it all as well.


The glorious, good old Z80. Highly recommended.

Also, if you get one of those Z80-powered Texas Instruments calculators, you could do pretty neat things.


That's how I got my start. After TI-BASIC, Z80 assembly was the second programming language I learned, at age 12. It turned out to be a great foundation. For one thing, it was fairly easy to understand, from a syntactic perspective. Secondly, it gave me a much better foundation for understanding the lower-level aspects of C, letting me concentrate more on understanding the more complicated abstractions, and what they actually represent.


Code by Charles Petzold [1] is a fantastic introduction. It isn't so much the nitty gritty "this opcode performs this operation, and these are all the tricks to making it do things, edge cases and things you should worry about" and more along the lines of "what opcodes should a CPU have, and how do those translate into electricity flowing through physical wires?" I feel like really thinking through that book made MIPS and x86 assembly much easier for me.

1 - http://www.charlespetzold.com/code/


In addition, if you like "Code", I'd recommend The Pattern on the Stone by Danny Hillis (creator of Thinking Machines' Connection Machine supercomputer).

It's much shorter than "Code" and covers basically the same ground much more quickly, but "Code" might be the better one to read first because it explains everything so thoroughly.


Although I cannot claim to know a lot, http://microcorruption.com was a very nice "fun" way to at least start with a small, easy to grasp instruction set.


It's a great way to get into reversing and assembly if you like the "series of puzzles" format of a CTF.


1. I suggest diving a little into a processor architecture first. The Z80 and 8085 are almost the same, conceptually. Once you grasp the fundamentals, you can move on to x86. It too builds upon the architectures mentioned previously. The added concepts are pipelining, segmentation, etc. One of the best sources for me has been http://www.amazon.com/Microprocessors-Principles-Application...

2. Knowing how the microprocessor works comes in really handy while coding assembly, as you can't 'catch exceptions' out there. It is like treading through a minefield, and nothing can replace knowledge of the fundamental terrain: the architecture.

3. Since you know C, you can start with some serious gdb usage, as mentioned by @penberg.

4. Then find your sweet spot between these two ends. You could start with embedded robotics; another viable hobby could be IoT applications. Two added advantages of these over 'theoretical' assembly language learning are that:

a) You are doing something with a real-scenario implementation, so you're surely hooked.

b) You can eventually mold a business model around it if you end up with something really innovative.


Start with a computer architecture introduction. The McGraw Hill Computer Science series book "Computer Architecture" did a good job of creating a fictional processor and then designing the machine code for it. "Assembly" is just a way to represent machine code in text files.

That way you will learn what it is the computer is trying to do, and how constraints on how it is built change that.
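To illustrate the "assembly is just text for machine code" point, here is a minimal sketch of a made-up machine in C. The opcodes, the two-byte encoding and all names are invented for this example and are not the processor from the book.

```c
#include <stdint.h>

/* A made-up 8-bit machine, invented here for illustration.
 * Each instruction is two bytes: opcode, then operand. */
enum { OP_HALT = 0x00, OP_LOAD = 0x01, OP_ADD = 0x02, OP_JNZ = 0x03 };

/* Runs a program and returns the accumulator when HALT is reached,
 * when max_steps instructions have executed, or when pc runs off the end. */
uint8_t toy_run(const uint8_t *code, unsigned len, unsigned max_steps)
{
    uint8_t acc = 0;
    unsigned pc = 0;

    while (max_steps-- > 0 && pc + 1 < len) {
        uint8_t op = code[pc], arg = code[pc + 1];
        pc += 2;
        switch (op) {
        case OP_LOAD: acc = arg; break;                   /* LOAD #arg */
        case OP_ADD:  acc = (uint8_t)(acc + arg); break;  /* ADD  #arg, wraps at 256 */
        case OP_JNZ:  if (acc != 0) pc = arg; break;      /* JNZ  addr */
        case OP_HALT:
        default:      return acc;                         /* HALT or unknown opcode */
        }
    }
    return acc;
}

/* The bytes are the machine code; the comments are the "assembly". */
const uint8_t toy_program[] = {
    0x01, 0x05,   /* LOAD #5 */
    0x02, 0x03,   /* ADD  #3 */
    0x00, 0x00,   /* HALT    */
};
```

An assembler for this machine would do nothing more than turn the text in the comments into the bytes on the left, which is the whole relationship between assembly and machine code on a real CPU too, just with a much bigger table.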

Then I'd suggest some cheap 8-bit microcontrollers like the AVR series and the PIC series from Atmel and Microchip respectively (the AVR has solid C support so it's probably a better single choice, but the PIC has weirdness associated with architecture constraints, which is good to understand as well).

Once you are a pro writing AVR assembly code, then grab a copy of x86 assembly and a description of the Pentium architecture. To do it proper justice start with an 8086 assembly book, then a 286 assembly book, then a 386 one, and finally a Pentium one. That will let you see how the architecture evolved to deal with the availability of transistors.


Get IDA Pro and start reversing things with some clear objective. I learned a lot as a teenager by having friends who knew this stuff and competing with them to remove limits on commercial software.

Making trial versions complete and so on. Sometimes it was really easy (just finding a jmp and changing it); other times we had to compare with the complete program, find code blocks, patch the trial and make all the checksums and such work.

None of the software that we cracked was released to the public, it was just for fun.

At the time there were little exercises called "crackmes" for exercising your abilities.

It takes at least a year of work to start being really good at this, and it is not like Obj-C, Java or Python, or even C, but way more tedious. Without friends doing this too and clear objectives I would have found it boring.

It would probably be a better idea to buy a microprocessor and code simple things in assembly, like blinking LEDs.



First, find Core Wars and play it until you can beat the "tutorial" programs. Hell, I should reimplement Core Wars as a JavaScript app doing CodeCombat style instruction for assembly.


Yeah, an MMO Core wars could be fun :)


As an option to jumping into real world assembly language there is Knuth's MMIX [and MIX]. It provides access to the underlying concepts alongside structured exercises. One might say it's an "onramp to the foundations of computer science." I prefer "gateway drug to TAoCP" however.

http://www-cs-faculty.stanford.edu/~knuth/mmix.html

The first fascicle is a free download and the place to start.


I would recommend picking a project that you can do only in Assembly. For me, this was creating a special waveform on a microchip controller. I had to create a custom 800kHz signal using a 16MHz clock, so there was no way other than to respect each and every clock cycle, and make the most of it.
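The arithmetic behind that constraint is worth spelling out: 16 MHz divided by 800 kHz leaves 20 clock cycles per period of the signal. A small sketch of the calculation, with hypothetical function names:

```c
/* Clock cycles available per period of the output signal
 * (integer division). Returns 0 if the signal frequency is 0
 * or higher than the clock. Function names are hypothetical. */
unsigned long cycles_per_period(unsigned long clock_hz, unsigned long signal_hz)
{
    if (signal_hz == 0 || signal_hz > clock_hz)
        return 0;
    return clock_hz / signal_hz;
}

/* Length of one signal period in nanoseconds (0 for a 0 Hz signal). */
unsigned long period_ns(unsigned long signal_hz)
{
    return signal_hz ? 1000000000UL / signal_hz : 0;
}
```

With the numbers from the comment above, cycles_per_period(16000000UL, 800000UL) gives 20 and period_ns(800000UL) gives 1250. How many instructions fit into those 20 cycles depends on the chip, since some instructions take more than one cycle.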

The key is to choose a project that you are excited about. If you pick another blah assembly tutorial, without the excitement of a project pushing you, your enthusiasm will evaporate sooner or later.


Check out the bomb lab from CMU's systems course. It's an assignment specifically designed to teach you assembly and gdb via reverse engineering a binary "bomb". There are 6 levels, and you need to figure out the right password for each level by reading the assembly/inspecting the program via gdb.

http://csapp.cs.cmu.edu/public/labs.html
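If you want something to warm up on first, you can build your own tiny puzzle and take it apart. The following is a hypothetical toy in the same spirit, not code from the CMU lab; the check, the key and the constants are all invented for this example.

```c
#include <string.h>

/* Hypothetical "phase" in the spirit of a bomb lab level or a crackme.
 * Not taken from the CMU lab: the check and constants are invented.
 * Returns 1 if the input is accepted, 0 otherwise. */
int toy_phase(const char *input)
{
    /* The expected input XORed with 0x5a, so that it does not show up
     * as a readable string in the binary. */
    static const unsigned char expected[] = { 0x3b, 0x29, 0x37 };
    size_t n = sizeof expected;

    if (strlen(input) != n)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (((unsigned char)input[i] ^ 0x5a) != expected[i])
            return 0;
    }
    return 1;
}
```

Compile it, forget the source for a moment, and try to recover the accepted input from the disassembly alone: the length check, the XOR constant and the three comparison bytes should all be visible.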


A good way to learn asm is through books but there are not many for current architectures (especially x64, except the official Intel manuals which are quite good but also hard to read). Nevertheless, there are some on ARM which I can recommend, namely: ARM System Developer's Guide by Sloss, Symes and Wright. ARM Assembly Language by Hohl. ARM SoC Architecture by Furber.

IDA Pro is the industry standard for reverse engineering but it is also expensive (around USD $2k). There is a free version but it doesn't offer 64-bit support, so it's not really an option for modern ObjC or Intel computers. As you've mentioned ObjC, chances are you work on OS X. IDA Pro does not work well on OS X (the recommended way is to use the Windows version via VirtualBox and not the OS X version). Still, Hopper.app is a great alternative on OS X. It's not as good as IDA, but it has a Python interface, GDB support, and decompiler support for ARM and Intel (and some knowledge of ObjC). And it's only ~USD $100. [There is also a Windows version of Hopper.app but it doesn't seem ready to use yet; I've only heard bad things about it so far.]


For MIPS (recommended for starting out), check out my post. It walks you through creating the initial program in C all the way through finding its vulnerability and exploiting it. The buffer overflow building is done in Python through Bowcaster. http://csmatt.com/notes/?p=96 (also check out the links at the end). Good luck!


I'm writing a tutorial in x86-64 assembly on OS X that you might enjoy: https://plus.google.com/+MagnusHoff/posts/9gxSUZMJUF2

Its focus is actually writing assembly on an actual computer, with the goal of implementing a snake game.


If you're on a Mac, Xcode has a really nice feature: using the Assistant Editor (press the "bowtie icon"), you can get the (dis-)assembly parallel to your source code and step through it with the debugger. A really convenient way of learning what's going on, and also understanding potential inefficiencies!


Well, that depends how comfortable you are thinking in terms of machine code. It takes a completely different mindset because you're now literally dealing with blocks of memory -- even more so than C.

It also depends on how steep a learning curve you want to encounter. I, personally, have not yet played with x86 assembly because the documentation for it is so unfriendly to beginners. To that end, when I want to play around in assembly and learn techniques for that level of programming, I usually play with the DCPU (http://dcpu.com/dcpu-16/). It's fake and was designed for a (sadly) not-to-be-made game. But it is an absolute joy to program in.

Play around with that until you're comfortable and THEN tackle x86.


As an intermediate step, you could also study LLVM bitcode. It should give you a good idea of what assembly languages "feel" like without tying you to a particular architecture. It is easy enough to write smallish programs in the ASCII format and assemble them with llvm-as.


I'd second what others have said and go with a micro like an AVR or a PIC. Tons of open source support and a small system you can totally "own" will help you understand not just the code but how computers execute code at the lowest human-legible level.


I'd start with ARM first. It's a lot easier to pick up than x86. Also take a look at the Itanium C++ ABI. It can be found on the GCC website. It explains the rules of going from C++ to assembly.



I disagree. It diverges into HLA (high level assembly) which is pretty much a macro monoculture that is tied to this book and nothing else. I was rather disappointed with the book.


And most people have a brain tumor after reading it because of his use of high level abstractions.


I found this on HN a while back... This is a fun way to get your feet wet:

https://microcorruption.com/

I would also grab a copy of Art of Assembly Language.


I can suggest this free book called "PC Assembly Language" by Dr Paul Carter.

http://www.drpaulcarter.com/pcasm/

The tutorial has extensive coverage of interfacing assembly and C code and so might be of interest to C programmers who want to learn about how C works under the hood. All the examples use the free NASM (Netwide) assembler. The tutorial only covers programming under 32-bit protected mode and requires a 32-bit protected mode compiler.


Try having some fun with Core War. (https://en.wikipedia.org/wiki/Core_War)


As always, learning by doing is best. Look at this old-school website: http://www.japheth.de/index.html Aside from its manual, he also recommends the (partially free) book http://www.phatcode.net/res/223/files/html/toc.html


http://flatassembler.net/ is a very good assembler (Linux, Windows, DOS), http://flatassembler.net/docs.php is a good place to start and http://board.flatassembler.net/ is a very good place to explore


I enjoyed Jeff Duntemann's "Assembly Language Step-by-Step". I see there is a 3rd edition. Nice writing style and overall fun read.


Yes! I came in here to recommend that one as well. He does an excellent job of not only talking about the mechanics of the language, but also the system components to which the mechanics directly relate: and he does so in a way that is both easy to understand and thorough. Such a good book.


I second that. That book is excellent.


I found it very useful to read the Intel software developer's manual to get an understanding of the instruction set. If doing this for the x86 architecture seems too daunting at first, a fun alternative is to read the manual for the AVR microcontroller which powers the Arduino and then program an Arduino in assembly.


A good place to start programming assembly is on microcontrollers (Arduino etc.). They have a more limited set of instructions, registers, etc., and an easy-to-grasp memory layout. The development environments also often come with a pretty good debugger/simulator so you can step through your code and see how it works.

Good luck!


This isn't the most aesthetic site, but the content really is top-notch. If you really want to learn assembly (MIPS, in particular), I can't recommend this enough:

http://chortle.ccsu.edu/AssemblyTutorial/index.html



I highly recommend Computer Systems: A Programmer's Perspective

http://www.amazon.com/Computer-Systems-Programmers-Perspecti...


Because Transport Tycoon is written in Assembly by Chris Sawyer. (I know, pretty amazing right?)


1988?


Which assembly? x86, PowerPC, ARM, MIPS?

Personally my favourites are 6502 (http://skilldrick.github.io/easy6502/) and 68k (http://www.easy68k.com/) tho' neither of these are realistically of any commercial use.


6502 is great for getting into assembly. It counts as tiny, and I've done a great many things on the C64, including fixed-point arithmetic, cellular automata and the like. Another good place to start your "descent" into low-level code is that article that was recently on here: http://www.chiark.greenend.org.uk/~sgtatham/cdescent/


An example of a practical modern 8-bit assembly language to learn would be for Atmel AVR microcontrollers. It's small enough to get your head around, and useful for DIY hardware projects.


I concur that 6502 programming is great fun. When I had a bout of C=64 nostalgia recently, I created a simple Tetris clone as a relearning project:

https://github.com/cjauvin/tetris-c64


x86? This should get you started: http://www.asmcommunity.net/



