Decompiler output; product of the HBGary/Anonymous dump. The most interesting thing here is probably the emails from HBGary folks about StuxNet in the accompanying blog post[1]. (For public purposes, the CEO wants everyone to know, they know nothing about it, but Aaron Barr was talking about it with various people anyway.)
Unfortunately, the decompiler output doesn't convey much as it stands, unless you like sorting through pages and pages of
That said... I wonder how useful GitHub would be as a collaborative decompilation space: grab a file, annotate / simplify as much as possible, send a pull request, repeat...
I wonder if Google's translation tools could be adapted to the task: "Suggest a better decompilation," perhaps?
Does anybody have a good reference for how decompilers work? When converting assembly to C by hand I always went through multiple passes (not necessarily in this order): first a literal pass that looked a lot like this dump with gotos and variables named after registers, then I'd identify if/else and switch blocks, then I'd convert any reverse gotos into a while loop, convert while loops into for loops, identify data structures, duplicate lines that had been optimized to a single reference with jmps, etc.
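To make the passes above concrete, here is a minimal sketch in C of the same routine after the literal pass and after the later structuring passes. The routine itself (summing an array) and every name in it are made up for illustration; none of it comes from the dump.

```c
#include <assert.h>
#include <stddef.h>

/* Pass 1: literal translation. The names eax/ecx/edx/esi are hypothetical
   stand-ins for registers, and control flow is still raw jumps. */
int sum_literal(const int *esi, size_t ecx)
{
    int eax = 0;
    size_t edx = 0;

    assert(esi != NULL || ecx == 0);
loop_head:
    if (edx >= ecx)
        goto done;
    eax += esi[edx];
    edx++;
    goto loop_head;        /* backward jump: the signature of a loop */
done:
    return eax;
}

/* Later passes: the backward goto becomes a while loop, the while loop
   becomes a for loop, and the variables get meaningful names. */
int sum_structured(const int *values, size_t count)
{
    int total = 0;

    assert(values != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

Both functions compute the same result for the same input; only the amount of recovered structure differs.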
Also, you have specific idioms for popular compilers (vc, gcc, icc, borland) and value propagation. You could even include an SMT solver to identify constraints and propagate the range to child basic blocks.
I wanted to post about this, but I could not remember the name and googling things like '(collaborative OR crowdsource) static binary analysis' was futile. Thanks, it was driving me insane.
Although in this case, the object code may have been deliberately obscured. On reflection, I can't help wondering how much of the seemingly pointless diddling of local variables is the result of turning a naive decompiler on a bytestream where the actual instructions are somehow encrypted or masked.
Either local2 or global0 is a pointer to a structure, and neither is detected as one. Most sensible decompilers would change the names and just make it `local5 = global0[local2].blah` instead. Advanced decompilers can also avoid gotos in almost every case. This one also emits function calls without parameters or return values, probably assuming some stack layout.
It's really, really bad output, not a result of obscured compilation.
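A rough sketch in C of the difference being described, assuming a hypothetical two-field record (the layout and the `id` field are invented; only `blah`, `local2`, `local5` and `global0` are borrowed from the comment above). A naive decompiler shows something like `*(int *)(global0 + 8 * local2 + 4)`; the raw version below does the same arithmetic through `sizeof`/`offsetof` so it stays portable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record layout, invented for illustration. */
struct record {
    int id;
    int blah;
};

/* Untyped view: byte arithmetic on a base pointer, as naive output does. */
int read_blah_raw(const void *global0, size_t local2)
{
    const unsigned char *base = global0;
    int local5;

    assert(global0 != NULL);
    memcpy(&local5,
           base + local2 * sizeof(struct record) + offsetof(struct record, blah),
           sizeof local5);
    return local5;
}

/* Typed view: the same access once global0 is known to be an array of structs. */
int read_blah_typed(const struct record *global0, size_t local2)
{
    assert(global0 != NULL);
    return global0[local2].blah;
}
```

Once the type is known, the index and the field name carry the meaning that the constants 8 and 4 were hiding.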
I've seen HexRays do the same thing - most of the time you need to actually tell it that a variable is a pointer before it tries to do that.
Stating the function type explicitly also helps it display the right arguments given to it. This output could be significantly improved if the assembly version were available as well.
Hex-rays is much much better than this. You'll have to see the actual output to understand, but at least here are some examples. It does not do such a bad job with type identification, although it certainly isn't perfect.
I don't think it would necessarily crash, unless you intentionally wrote code to target a specific bug in Hex-rays itself.
You can certainly get Hex-rays to output confusing information on purpose without actually targeting it. If this were obfuscated, you'd expect that to be much more likely.
You could compile the decompiled code with an optimizing compiler, then decompile that, or use GCC's C backend, or something to automatically simplify the program?
Is this decompiler output accurate? I'm suspicious because proc12 [¹] looks like memcpy, except it's advancing a byte at a time and copying an int at a time:
"(For public purposes, the CEO wants everyone to know, they know nothing about it, but Aaron Barr was talking about it with various people anyway.)"
Which is of course not a good idea in the PR 2.0 age. Anyone found emails explaining why they did this?
late reply but hopefully you'll see.
AFAIK, stuxnet is not an application where you can simply 'remove' the centrifuge-targeting payload and add your own.
That, combined with the fact that all 4 0-days have been patched, means that I don't think the author's commentary is accurate.
There are much better options for HBGary had they wanted command of a distributed botnet and they would have known that.
It's just building the stack frames in C. In the end this is just assembly with C syntax, which is why I prefer disassembly to decompiler output.
Title insults the intelligence of HN readers, that's obviously nothing more than minimally annotated Hex Rays decompiler output.
There is nothing new to see here. A quick Google search for "stuxnet.zip" reveals other samples, undamaged by some PR whoring idiot running it through IDA.
Has anyone suggested using Github as a social space to work on decompiling malware before? Github has some potential to be a unique space to work on this sort of problem in a collaborative setting. The story is that someone thought of that.
As for the title, I'll forgive it just as I'd forgive any other nascent Github project stating its goal rather than its present state in the link. The point is to get interested people working on it together.
Given Stuxnet's purported sophistication, I'd be shocked if it didn't employ obfuscations that rendered decompilation ineffective, at least prior to annotation within IDA first, which doesn't appear to have been done here.
Decompiler output is nice to glance at quickly, but as demonstrated elsewhere in the thread, it is only of superficial benefit when faced with even remotely complex code. For example, it cannot discover a struct's fields - they must be manually inferred and input into IDA before decompilation. A mess will result on trying to merge the output from a run with this information with a run that did not have it.
There are tools already in use for collaborative disassembly over the Internet, but a Github repo containing auto-generated source is not one of them. For all intents and purposes, it looks like someone's made minimal use of the IDA GUI without much clue as to what they're doing. That's why I called it a PR stunt.
This is not Hex-rays output; Hex-rays is much better than this. This is Boomerang, a free decompiler that hasn't been maintained much lately. You'll note that not much has been updated on their page since 2006.
This github project is pretty much useless for those who want to learn about Stuxnet. Better to load the binary into IDA Freeware instead.
Stuxnet does appear to be an unusually large project (base classes, ungainly modular structure) for malware. This reinforces what I said earlier about its lack of stealth for the payload.
It does not appear to be sophisticated in any way except for its payload, which some evidence suggests was carefully constructed (e.g., with a PLC testbed). The "embarrassing" fact I was referring to in the above post is that its lack of stealth revealed its payload to the world, and no competent intelligence agency has that goal if the purpose of the worm itself is to do some damage.
Perhaps the worm is a way to draw the heat off the real deployment method. Or it is industrial sabotage gone awry. There is still not enough evidence to come to any conclusions on it, except this is not what an eleet cyberweapon would look like if you were to find one.
I don't know how any reasonably intelligent person could continue to stubbornly insist that stuxnet was too lame to be done by a government agency. This isn't even a question, of course it was. It isn't even speculation anymore. The only question is which one(s).
Yeah, good point - I would lump that in as being the same thing. They usually get companies like General Dynamics to do this type of thing. Point is, it wasn't amateurs or "basement patriots," and it cost a lot.
I like the fuss this "recent discovery" makes all around the web when this article here: http://ma2moun.com/blog/2010/09/stuxnet-source-code-samples/
is FOUR MONTHS OLD, and contains the exact same code output as the github "source code" (The second screenshot has the exact same content as 016169EBEBF1CEC2AAD6C7F0D0EE9026.c)
This looks like the output of HexRays [1], which is the best C decompiler I've seen. Defining some structures and typing the variables would make it a lot more readable in some cases. I've used HexRays to make sense of C++ games several times, after investing enough time to define and use data types, the result looks [2] much better, but then again, that's unobfuscated and not optimized C++, not old-school "I know what I'm doing and I don't need the compiler second-guessing me" C.
Unfortunately there's still a lot decompilers can't do, mostly compiler tricks (e.g. storing a pointer to the middle of a structure instead of its beginning, and using negative offsets to access other struct members [3] - this is usually obvious when looking at it, but HexRays can't make sense of it so it falls back to raw pointer math).
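For anyone who hasn't run into the middle-of-structure trick, here is a minimal sketch in C. The `widget` and `link` types are hypothetical; the point is only that code holding a pointer to an embedded member gets back to the enclosing object by subtracting the member's offset, which is the negative-offset access a decompiler tends to show as raw pointer math.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types, invented for illustration. */
struct link {
    struct link *next;
};

struct widget {
    int id;
    int flags;
    struct link link;   /* other code stores pointers to this member */
};

/* Recover the enclosing widget from a pointer to its embedded link. */
struct widget *widget_from_link(struct link *l)
{
    assert(l != NULL);
    return (struct widget *)((char *)l - offsetof(struct widget, link));
}

/* Reads a field that sits at a negative offset from the link pointer. */
int widget_id_from_link(struct link *l)
{
    return widget_from_link(l)->id;
}
```

Without knowing that `l` points into a `widget`, the second function is just a load from `l` minus a constant, which is exactly the case where the typed view gets lost.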
If you have it installed at the moment (I don't unfortunately), you can always give it a go for comparison... The binary is apparently available from http://tuts4you.com/request.php?3011 (pass tuts4you)
This decompiled output shows a very nice example where C can really be no more than "glorified assembly". (Can anyone help me with proper attribution for the quote?)
Heck, you don't even need that. Put this in a file named "Hello.hs":
module Hello where
main = print "Hello world!"
and run "ghc -fvia-C -keep-hc-files Hello.hs". Compilation may fail with linker errors, but today we don't care about that. We just want to look at the resulting Hello.hc file, which is the C representation of the Haskell, and generally looks like:
I admit to basing this on nothing more than a gut feeling, but a "sufficiently smart decompiler" seems even less plausible than a "sufficiently smart compiler".
Ah, that's a classic approach: no clue, but strong opinions. "goto" hardly qualifies as a sign of bad code; otherwise the Linux kernel would be in a terrible position. If used well, it's a valid solution to many flow-problems.
I know goto is not necessarily a sign of bad code and there are situations where it is needed. That being said, even though I am not the most experienced, I have never needed to use a goto (except in QuickBasic maybe).
The Linux kernel works very well, but does "just working" necessarily mean that that is the only way to have it function? "It works" doesn't mean it's good code.
I did not look at the code very closely but one goto I saw seemed to point from the main loop into a function. Which to me seems like an odd use, a use that can easily be avoided.
That's because you have not yet programmed enough / read enough programs.
> I did not look at the code very closely but one goto I saw seemed to point from the main loop into a function. Which to me seems like an odd use, a use that can easily be avoided.
Goto from one function to another? Not possible in standard C. If this was written in C, it is most probable that compiler optimisation caused the direct jump. Even if you weren't talking about something like that, you missed the obvious: compiler optimisation and the imperfection of the decompiler are by far the more probable causes of a curious goto.
You could also have noticed the far more interesting casts of char* to function pointers, which are then called. Did the original programmer write this mess? Hint: probably not, either...
As for the quality of the source code of the Linux kernel: unless you work on safety-critical software (in which case it's like comparing apples to carrots anyway, because there are far more important differences from general-purpose software than the mere presence or absence of gotos), I doubt you've seen a lot of far better code.
Indeed, I even doubt you could advance a reasonable argument for why goto should be 100% banished in all situations, especially when doing dynamic allocation of resources. Maybe you have never even read Dijkstra's original paper or Knuth's paper http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pd... (curiously (or not), that's the case for many cargo-cult goto haters, even for just the original Dijkstra paper)
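For reference, the resource-allocation case usually looks something like the sketch below: each failure jumps forward to a label that releases only what has been acquired so far. The function and its names are hypothetical, written for illustration in the style common in kernel-like C, not taken from any real codebase.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counts occurrences of 'a' in text using two heap buffers.
   Returns 0 on success and -1 if an allocation fails. */
int count_letter_a(const char *text, unsigned *out)
{
    int rc = -1;
    char *copy;
    unsigned *table;
    size_t len;

    assert(text != NULL && out != NULL);
    len = strlen(text);

    copy = malloc(len + 1);
    if (copy == NULL)
        goto out;                 /* nothing to release yet */
    memcpy(copy, text, len + 1);

    table = calloc(256, sizeof *table);
    if (table == NULL)
        goto out_free_copy;       /* release only the first buffer */

    for (size_t i = 0; i < len; i++)
        table[(unsigned char)copy[i]]++;
    *out = table[(unsigned char)'a'];
    rc = 0;

    free(table);
out_free_copy:
    free(copy);
out:
    return rc;
}
```

The same thing can be written with nested ifs, but every additional resource adds a level of nesting, whereas the goto version adds one label and stays flat.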
I'm a little surprised that nobody has pointed out this is a decompiled version of the code. That means a tool took machine language and tried to convert it back to C. Given that machine language only has goto in it, any loop you see in this code is the result of the decompiler recognizing a stereotypical use of the jump instructions and reverse engineering a C loop statement.
At the base level, programs are nothing but gotos strung together. Everything else is added at a higher layer.
Also, critiquing the coding style of decompiled output is particularly missing the point. But at least the brace style is consistent, right? Almost as if it were rigidly generated by a tool from another source of input, rather than by a human, no?
Seriously, it is one of the biggest open-source C codebases out there, with thousands of hackers working on it. And the result is very good, as the majority of the internet is running on it.
I don't argue that linux itself is bad, it's obviously very good overall. That doesn't say anything about overall code quality though - as usual there are good, bad and really ugly parts.
"I've looked at the source and there are pieces that are good and pieces that are not. A whole bunch of random people have contributed to this source, and the quality varies drastically." -Ken Thompson
Unfortunately, the decompiler output doesn't convey much as it stands, unless you like sorting through pages and pages of
That being one of the more interesting sections; there are stretches with dozens of lines in a row of the form "localfoo = localbar". It does seem to suggest, at least, that this dump didn't have the actual source.
[1] http://crowdleaks.org/hbgary-wanted-to-suppress-stuxnet-rese...