Don't want to take the wind out of anyone's sails, but this program is hardly hard-to-hack. Bravo for getting to grips with ELF, assembly and reverse engineering. But this article represents just the first few steps on a long and intriguing road.
If it was hard-to-hack then I would expect (at least) the following:
* Output messages can't be discovered using "strings"
* Program is self-encrypted
* Password isn't stored at all, just its hash.
The "hard-to-hack" program presented would take about 30 seconds using IDA[1].
That was mostly the point of the article: that it wasn't so hard to hack in the end and all the information needed to break it was visible in plain sight.
Like you said, if you really wanted to write a hard-to-hack binary, just store a strong hash of a hellish password and never the plaintext. Heck, just leave the hash in the strings output :)
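For what it's worth, here is a minimal sketch of the "hash only, no plaintext" idea. It is Python rather than C just to keep it short, and everything in it is an assumption for illustration: the names `make_record` and `check`, the choice of PBKDF2-SHA256, the iteration count and the sample password are mine, not anything from the article's program.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor, tune to taste


def make_record(password):
    """Run once at build time. Only the salt and digest would ship."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def check(candidate, record):
    """Re-derive the digest from the candidate and compare in constant time."""
    salt, digest = record
    attempt = hashlib.pbkdf2_hmac("sha256", candidate.encode(), salt, ITERATIONS)
    return hmac.compare_digest(attempt, digest)


# Hypothetical password, used here only to build the record for the demo.
# A real binary would embed just the resulting salt and digest.
RECORD = make_record("correct horse battery staple")
```

With this layout, dumping the salt and digest via "strings" or a disassembler gives an attacker nothing to read off directly; they are left with brute-forcing the password, which is where the "hellish password" part matters.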
Any chance someone who knows more assembly than me can explain how the symbol names for dlsym() are retrieved?
i.e. I would have expected to see 'ptrace', 'scanf' and 'printf' in the strings output, but they must be obfuscated in some way (otherwise I guess there's no point using the dlopen/dlsym trick at all).
I only see one call to dlsym (at 8048506), so it seems to me the program is doing something tricky to build each symbol name string and then calling a routine there to dlsym() it.
That's about where my x86-fu fails me, though, and I remember I should be working on other things. :/
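I don't know what this particular binary does, but one common way to get that effect is to store each symbol name XORed with a key and decode it right before the lookup. A rough sketch in Python (the key `0xA5`, the helper names and the fake `IMAGE` blob are all made up for illustration; `strings` below is a crude stand-in for strings(1), not the real tool):

```python
import re

KEY = 0xA5  # hypothetical single-byte key; the high bit makes ASCII unprintable


def obfuscate(name):
    """XOR every byte of an ASCII name with KEY."""
    return bytes(b ^ KEY for b in name.encode("ascii"))


def deobfuscate(blob):
    """XOR is its own inverse, so the same operation recovers the name."""
    return bytes(b ^ KEY for b in blob).decode("ascii")


def strings(data, min_len=4):
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]


NAMES = ["ptrace", "scanf", "printf"]
BLOBS = [obfuscate(n) for n in NAMES]

# Fake "binary": obfuscated names plus one plain string, NUL-separated.
IMAGE = b"\x00".join(BLOBS) + b"\x00libc.so.6\x00"
```

Here `strings(IMAGE)` only turns up `libc.so.6`, while `deobfuscate` recovers each name at run time. In a C program the decoded buffer would then be handed to dlsym(), which would fit with seeing a single dlsym call site inside a shared helper routine.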
Interesting, but the objdump output is very primitive compared to more advanced disassemblers, which should be able to provide string cross-references etc. inline.
I actually looked for a good one when I first tried to be lazy, but to no avail.
A few friends also (unsuccessfully) tried in parallel to get the password and they were using IDA (http://www.hex-rays.com/idapro/) but I have not personally tried it. It seems like a good option (although it is not open source, which irks me a little :P).
I also tried to use an existing ASM-to-C decompiler called Boomerang (http://boomerang.sourceforge.net/license.php) but the output was a complete mess to understand (and compile). Maybe I'll try writing one of these when I'm bored on another lazy Friday :)
Any other (preferably open) recommendations for Linux?
I didn't have much luck using Boomerang recently either.
The REC decompiler (http://www.backerstreet.com/rec/rec.htm) isn't horrible. For simple stuff, it'll give you reasonable-looking C-ish code. For anything slightly more complex, it may produce wrong code. It's not so good at eliminating duplicate variables, but manually removing them isn't hard; they're easy to spot.
I've recently been reversing a few stripped DLLs on Windows. REC worked well on the short functions but severely changed the logic of a few more complicated ones, especially those doing bit shifts, concatenating bytes, or running complex loops.
I've seen IDA. I'd love to use it but it's expensive and I don't reverse engineer enough to justify asking for the company to buy it. That, and I'd also have to learn how to use it effectively, which would add time and possibly stunt me learning the basics first. Since I'm most certainly doing my work for commercial purposes, the demo / educational versions of IDA aren't usable for me (license agreement says so).
EDIT: REC studio does not appear to be free (as in speech) software but it is free (as in beer) to use for most purposes and it runs on Windows / Linux / Mac.
Probably because writing disassemblers is a pain in the ass. (I have a half-finished x86 disassembler written in JavaScript: https://github.com/luser/disasmx86.js )
That is much better! I had to dig up the articles on the AT&T syntax to understand the plain objdump output, but with this, that's no longer needed. I need to get to know my tools much better :)
[1] http://www.hex-rays.com/idapro/
(And I consider myself an amateur at this kind of thing).