Stuxnet is now on GitHub (github.com/laurelai)
248 points by steipete on Feb 13, 2011 | 66 comments



Decompiler output; product of the HBGary/Anonymous dump. The most interesting thing here is probably the emails from HBGary folks about Stuxnet in the accompanying blog post[1]. (For public purposes, the CEO wants everyone to know, they know nothing about it, but Aaron Barr was talking about it with various people anyway.)

Unfortunately, the decompiler output doesn't convey much as it stands, unless you like sorting through pages and pages of

    local199 = local191;
    local203 = local191 + 0x6f02418d;
    local3 = proc2(0x10021238, param1, param2, param9, param5); /* Warning: also results in local190 */
    local208 = local3;
    local209 = local190;
    local211 = local203;
That being one of the more interesting sections; there are stretches with dozens of lines in a row of the form "localfoo = localbar".

It does seem to suggest, at least, that this dump didn't have the actual source.

[1] http://crowdleaks.org/hbgary-wanted-to-suppress-stuxnet-rese...


A silly decompiler that doesn't even recognise a for loop with a break (https://github.com/Laurelai/decompile-dump/blob/master/outpu...)

That said... I wonder how useful github would be as a collaborative decompilation space - grab a file, annotate / simplify as much as possible, send a pull request, repeat...


'collaborative decompilation space' (github-based or not) sounds like a sick idea, actually.


I wonder if Google's translation tools could be adapted to the task: "Suggest a better decompilation," perhaps?

Does anybody have a good reference for how decompilers work? When converting assembly to C by hand I always went through multiple passes (not necessarily in this order): first a literal pass that looked a lot like this dump with gotos and variables named after registers, then I'd identify if/else and switch blocks, then I'd convert any reverse gotos into a while loop, convert while loops into for loops, identify data structures, duplicate lines that had been optimized to a single reference with jmps, etc.
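To illustrate (a made-up example, not taken from this dump), here's roughly what that literal first pass looks like next to the loop those later passes converge on:

```c
#include <assert.h>

/* Hypothetical first-pass output: variables named after registers,
   the loop still expressed as a reverse goto. */
int sum_literal(const int *arr, int ecx) {
    int eax = 0;              /* accumulator */
    int esi = 0;              /* index */
top:
    if (esi >= ecx) goto done;
    eax += arr[esi];
    esi++;
    goto top;                 /* reverse goto -> becomes a while loop */
done:
    return eax;
}

/* After the loop-recovery and renaming passes. */
int sum_cleaned(const int *arr, int len) {
    int total = 0;
    for (int i = 0; i < len; i++)
        total += arr[i];
    return total;
}
```

Both compute the same thing; the entire difference is which control-flow idioms the translator has bothered to recognize.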


Pretty much that, yes.

Also, you have specific idioms for popular compilers (vc, gcc, icc, borland) and value propagation. You could even include an SMT solver to identify constraints and propagate the range to child basic blocks.


Bincrowd

http://www.zynamics.com/bincrowd.html

The plugin is open source and the community server is free.


I wanted to post about this, but I could not remember the name and googling things like '(collaborative OR crowdsource) static binary analysis' was futile. Thanks, it was driving me insane.


Although in this case, the object code may have been deliberately obscured. On reflection, I can't help wondering how much of the seemingly pointless diddling of local variables is the result of turning a naive decompiler on a bytestream where the actual instructions are somehow encrypted or masked.


I really don't think so. The decompilation result is genuinely silly. For example look at:

    if (*(int*)(local2 + global0) == 0x4550) {
        local5 = *(unsigned short*)(local2 + global0 + 20);
Either local2 or global0 is a pointer to a structure. Neither is detected. Most sensible decompilers would change the names and just emit `local5 = global0[local2].blah` instead. Advanced decompilers can also avoid gotos in almost every case. It also emits function calls without parameters or return values, probably assuming some particular stack layout.

It's a really really bad output, not a result of obscured compilation.
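Incidentally, the constants in that snippet give the intent away: 0x4550 is the little-endian "PE\0\0" signature, and offset 20 from it lands on IMAGE_FILE_HEADER.SizeOfOptionalHeader, so the code is almost certainly walking a PE header. A sketch of what the source likely looked like (field offsets from the PE/COFF layout; function and variable names are invented):

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical reconstruction of the snippet above: check the
   "PE\0\0" signature, then read SizeOfOptionalHeader, which sits
   20 bytes past the signature in IMAGE_NT_HEADERS. */
uint16_t optional_header_size(const uint8_t *image, uint32_t pe_offset) {
    uint32_t sig = *(const uint32_t *)(image + pe_offset);
    if (sig != 0x00004550)      /* "PE\0\0", little-endian */
        return 0;
    return *(const uint16_t *)(image + pe_offset + 20);
}
```

A decompiler that detected the struct would render this as field accesses instead of raw offset arithmetic.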


I've seen HexRays do the same thing - most of the time you need to actually tell it that a variable is a pointer before it tries to do that.

Stating the function type explicitly also helps it display the right arguments given to it. This output could be significantly improved if the assembly version were available as well.


This came from one of the better known commercial decompilers. Which ones do you consider 'advanced'?


Pointed out before: http://news.ycombinator.com/item?id=2214052 - this is from boomerang - not a known commercial decompiler really.


Hex-rays is much much better than this. You'll have to see the actual output to understand, but at least here are some examples. It does not do such a bad job with type identification, although it certainly isn't perfect.

http://www.hex-rays.com/hexcomp11.shtml


Stuxnet didn't really do anything to actively thwart reverse engineering. See here: http://rdist.root.org/2011/01/17/stuxnet-is-embarrassing-not...

Truly obfuscated code will just plain cause the Hex-Rays decompiler to crash, producing absolutely no output at all.


I don't think it would necessarily crash, unless you intentionally wrote code to target a specific bug in Hex-rays itself.

You can certainly get Hex-rays to output confusing information on purpose without actually targeting it. If this was obfuscated, you'd expect that would be much more likely.


You could compile the decompiled code with an optimizing compiler and then decompile that, or use GCC's C backend, or something similar to automatically simplify the program?


No. There is just too much information lost going from source to binary, and optimization loses more information, not less.

The idea works well enough if you just want to get optimized asm, though. http://events.ccc.de/congress/2010/Fahrplan/events/4096.en.h...


Is this decompiler output accurate? I'm suspicious because proc12 [¹] looks like memcpy, except it's advancing a byte at a time and copying an int at a time:

  *(int*)local6 = *(int*)local5;
  local5++;
  local6++;
Either the memcpy code is strangely broken, or the decompiler is putting int * where it means char *.

[¹] See full code at https://github.com/Laurelai/decompile-dump/blob/master/outpu... and same for proc20: https://github.com/Laurelai/decompile-dump/blob/master/outpu...
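To spell out the suspicion (an invented sketch, not the dump's actual code): as typed, each iteration does a 4-byte store through pointers that only advance 1 byte, so the loop both overlaps its own writes and runs up to 3 bytes past the end of the buffers. A plain byte copy is almost certainly what the binary really does:

```c
#include <string.h>
#include <assert.h>

/* What the dump literally says: int-sized loads/stores through
   byte-stepped pointers. Note the out-of-bounds access on the
   final iterations - a strong hint the int* typing is the
   decompiler's mistake, not the original code's. */
void copy_as_decompiled(char *dst, const char *src, int n) {
    for (int i = 0; i < n; i++) {
        *(int *)dst = *(const int *)src;  /* 4-byte store... */
        src++;                            /* ...advanced 1 byte */
        dst++;
    }
}

/* What the original almost certainly was: memcpy, one byte at a time. */
void copy_bytewise(char *dst, const char *src, int n) {
    for (int i = 0; i < n; i++)
        *dst++ = *src++;
}
```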


"(For public purposes, the CEO wants everyone to know, they know nothing about it, but Aaron Barr was talking about it with various people anyway.)" Which is of course not a good idea in the PR 2.0 age. Anyone found emails explaining why they did this?


"Throughout the following emails it is revealed that HBGary Federal may have been planning to use Stuxnet for their own purposes"

what does that even mean? nonsensical.


Parses fine here.

"Here are some emails that show this company was planning to use Stuxnet"

Stuxnet is a distributed bot. It can be commanded. The post is saying this company planned to do that.

I don't know whether that's true or not but it seems easy to understand.


late reply but hopefully you'll see it. AFAIK, Stuxnet is not an application where you can simply 'remove' the centrifuge-targeting payload and add your own. That, combined with the fact that all 4 0-days have been patched, means I don't think the author's commentary is accurate. Had HBGary wanted command of a distributed botnet, there were much better options, and they would have known that.


Many of the emails in that article are ordered newest to oldest... not a great link.


this is just in time for the RSA show that is starting on Monday.


it's just rebuilding the stack frames in C. In the end this is just assembly with C syntax - which is why I prefer disassemblies to decompiler output.


For those curious, a Microsoft employee broke down each of the exploits that Stuxnet used at a conference recently: http://www.youtube.com/watch?v=rOwMW6agpTI


For anyone watching this video, jump to 6:55 to skip a long, dull discussion about filling empty seats in the audience.


that's a great talk. I like how most of the bugs are within the OS logic as opposed to overflow bugs. That lends itself to the stability of the exploits.


Thx for this link, very kewl talk.


Title insults the intelligence of HN readers; that's obviously nothing more than minimally annotated Hex Rays decompiler output.

There is nothing new to see here. A quick Google search for "stuxnet.zip" reveals other samples, undamaged by some PR whoring idiot running it through IDA.


>There is nothing new to see here.

Has anyone suggested using Github as a social space to work on decompiling malware before? Github has some potential to be a unique space to work on this sort of problem in a collaborative setting. The story is that someone thought of that.

As for the title, I'll forgive it just as I'd forgive any other nascent Github project stating its goal rather than its present state in the link. The point is to get interested people working on it together.


Given Stuxnet's purported sophistication, I'd be shocked if it didn't employ obfuscations that rendered decompilation ineffective, at least without annotation within IDA first, which doesn't appear to have been done here.

Decompiler output is nice to glance at quickly, but as demonstrated elsewhere in the thread, it is only of superficial benefit when faced with even remotely complex code. For example, it cannot discover a struct's fields - they must be manually inferred and input into IDA before decompilation. Trying to merge output from a run that had this information with output from a run that did not will produce a mess.

There are tools already in use for collaborative disassembly over the Internet, but a Github repo containing auto-generated source is not one of them. For all intents and purposes, it looks like someone's made minimal use of the IDA GUI without much clue for what they're doing. That's why I called it a PR stunt.


This is not Hex-rays output; Hex-rays is much better than this. This is boomerang, a free decompiler that hasn't been maintained much lately. You'll note that not much has been updated on their page since 2006.

http://boomerang.sourceforge.net/

This github project is pretty much useless for those who want to learn about Stuxnet. Better to load the binary into IDA Freeware instead.

Stuxnet does appear to be an unusually large project (base classes, ungainly modular structure) for malware. This reinforces what I said earlier about its lack of stealth for the payload.

http://rdist.root.org/2011/01/17/stuxnet-is-embarrassing-not...

It does not appear to be sophisticated in any way except for its payload, which some evidence suggests was carefully constructed (e.g., with a PLC testbed). The "embarrassing" fact I was referring to in the above post is that its lack of stealth revealed its payload to the world, and no competent intelligence agency has that goal if the purpose of the worm itself is to do some damage.

Perhaps the worm is a way to draw the heat off the real deployment method. Or it is industrial sabotage gone awry. There is still not enough evidence to come to any conclusions on it, except this is not what an eleet cyberweapon would look like if you were to find one.


I don't know how any reasonably intelligent person could continue to stubbornly insist that stuxnet was too lame to be done by a government agency. This isn't even a question, of course it was. It isn't even speculation anymore. The only question is which one(s).


to me it looks like it was done by an external contractor _for_ a government agency.

cyber-warfare is the shit now - and everyone wants a piece of that cake. (even if it means mixing 0-days with ex-Java coders)


Yeah, good point - I would lump that in as being the same thing. They usually get companies like General Dynamics to do this type of thing. Point is, it wasn't amateurs or "basement patriots," and it cost a lot.


I like the fuss this "recent discovery" makes all around the web when this article here: http://ma2moun.com/blog/2010/09/stuxnet-source-code-samples/ is FOUR MONTHS OLD and contains the exact same code output as the github "source code" (the second screenshot has the exact same content as 016169EBEBF1CEC2AAD6C7F0D0EE9026.c).


I wonder if anyone is going to send a DMCA takedown to github.....


What's the license? :-)


Man oh man. That would take a long time to figure out. Is this really the best a C decompiler can do?


This looks like the output of HexRays [1], which is the best C decompiler I've seen. Defining some structures and typing the variables would make it a lot more readable in some cases. I've used HexRays to make sense of C++ games several times; after investing enough time to define and use data types, the result looks [2] much better. Then again, that's unobfuscated and unoptimized C++, not old-school "I know what I'm doing and I don't need the compiler second-guessing me" C.

Unfortunately there's still a lot decompilers can't do, mostly compiler tricks (e.g. storing a pointer to the middle of a structure instead of its beginning, and using negative offsets to access other struct members [3] - this is usually obvious when looking at it, but HexRays can't make sense of it so it falls back to raw pointer math).

[1] http://www.hex-rays.com/ [2] http://img824.imageshack.us/img824/1188/hexrayscpp.png [3] https://github.com/Laurelai/decompile-dump/blob/master/outpu...
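For concreteness, here's a toy version of that interior-pointer trick (struct and names invented): the optimizer keeps a pointer to a "hot" field in the middle of a struct and reaches the neighboring members with negative and positive offsets, which a decompiler then renders as raw pointer math:

```c
#include <assert.h>

/* Hypothetical illustration: a pointer into the middle of a struct,
   with the other members reached by offset arithmetic. All-int
   members, so there is no padding between fields. */
struct node {
    int refcount;
    int key;        /* the "hot" field the compiler keeps a pointer to */
    int value;
};

int sum_around_key(int *key_ptr) {
    int refcount = *(key_ptr - 1);   /* node.refcount, negative offset */
    int value    = *(key_ptr + 1);   /* node.value */
    return refcount + *key_ptr + value;
}
```

Looking at the struct definition, the intent is obvious; looking only at the pointer arithmetic, it isn't, which is exactly the situation the decompiler is in.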


If you have it installed at the moment (I don't unfortunately), you can always give it a go for comparison... The binary is apparently available from http://tuts4you.com/request.php?3011 (pass tuts4you)



Depends how much you want to spend on it. Hex-Rays for example is pretty decent and not even comparable to the mess on github.


Actually, the mess on github is hex-rays.


https://github.com/Laurelai/decompile-dump/blob/master/outpu...

"WARNING: CHLLCode::appendTypeIdent: declaring type void as int for param2"

http://www.google.com/codesearch?as_q=WARNING%3A+CHLLCode%3A...

http://boomerang.sf.net/ is clearly what was used.


Not likely, unless it's some ancient version. See http://img715.imageshack.us/img715/1051/hexraysstx.png posted above - that's hex rays. Not sure which one was used, but 1.1 gives similar output.


Yep, you're right. I was accidentally talking out of my ass. Sorry :)


This decompiled output shows a very nice example where C can really be no more than "glorified assembly". (Anyone help me with proper attribution for the quote?)



What do you suppose a decompiled chunk of optimized Haskell would look like?


Heck, you don't even need that. Put this in a file named "Hello.hs":

    module Hello where

    main = print "Hello world!"
and run "ghc -fvia-C -keep-hc-files Hello.hs". Compilation may fail with linker errors, but today we don't care about that. We just want to look at the resulting Hello.hc file, which is the C representation of the Haskell, and generally looks like:

    II_(rhG_closure);
    II_(si4_closure);
    FN_(Hello_main_entry) {
    FB_
    if ((W_)(((W_)Sp - 0x10UL) < (W_)SpLim)) goto _cip;
    Hp=Hp+2;
    if ((W_)((W_)Hp > (W_)HpLim)) goto _cip;
    Hp[-1] = (W_)&stg_CAF_BLACKHOLE_info;
    ;EF_(newCAF);
    {void (*ghcFunPtr)(void *);
and so on for quite a while longer. Passing this through a compile and decompile step might actually clean it up a bit....


Without a very clever decompiler, I'd say `ugly'.


I admit to basing this on nothing more than a gut feeling, but a "sufficiently smart decompiler" seems even less plausible than a "sufficiently smart compiler".


My first version was just `ugly'. But I guarded it.


On the contrary, I think the decompiled output highlights the difference between C and assembly, because it looks nothing like typical C code.


So I see gotos in that code. I hope that was the decompiler, not the author, because that's just sacrilege.


Ah, that's a classic approach: no clue, but strong opinions. "goto" hardly qualifies as a sign of bad code; otherwise the Linux kernel would be in a terrible position. Used well, it's a valid solution to many flow problems.


I know goto is not necessarily a sign of bad code and there are situations where it is needed. That said, even though I am not the most experienced, I have never needed to use a goto (except in QuickBasic, maybe).

The linux kernel works very well, but does "just working" necessarily mean that's the only way to have it function? "It works" doesn't mean it's good code.

I did not look at the code very closely, but one goto I saw seemed to jump from the main loop into a function, which seems like an odd use, one that could easily be avoided.


> I have never needed to use a goto

That's because you have not yet programmed enough / read enough programs.

> I did not look at the code very closely but one goto I saw seemed to point from the main loop into a function. Which to me seems like an odd use, a use that can easily be avoided.

A goto from one function to another? Not possible in standard C. If this was written in C, it is most likely that compiler optimisation produced the direct jump. And even if you weren't describing something like that, you missed the obvious point: compiler optimisation and the imperfection of the decompiler are by far the more probable cause of the curious gotos.

You could also have noticed the far more interesting casts of char* to function pointers, which are then called. Did the original programmer write this mess? Hint: probably not, either...

As for the quality of the source code of the Linux kernel: well, unless you work on safety-critical software (in which case it's like comparing apples to carrots anyway, because there are far more important differences from general-purpose software than the mere presence or absence of gotos), I doubt you've seen a lot of far better code.

Indeed, I even doubt you could advance a reasonable argument for why goto should be 100% banished in all situations, especially when doing dynamic allocation of resources. Maybe you haven't even read the original Dijkstra paper or Knuth's paper http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pd... (curiously (or not), that's the case for many cargo-cult goto haters, even for just the original Dijkstra paper).
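The "dynamic allocation of resources" case is the canonical one - a single exit path unwinding partially-acquired resources, the same idiom used throughout the Linux kernel's error handling. A minimal sketch (function and names invented):

```c
#include <stdlib.h>
#include <assert.h>

/* Hypothetical example of goto-based cleanup: on any failure,
   fall through one label that frees whatever was acquired.
   free(NULL) is a no-op, so a single label suffices here. */
int make_pair(int **a_out, int **b_out) {
    int *a = malloc(sizeof *a);
    int *b = NULL;
    if (!a) goto fail;
    b = malloc(sizeof *b);
    if (!b) goto fail;
    *a = 0;
    *b = 0;
    *a_out = a;
    *b_out = b;
    return 0;
fail:
    free(b);
    free(a);
    return -1;
}
```

The goto-free alternatives (nested ifs, or duplicating the cleanup at every failure point) are generally considered harder to get right, which is the argument for the idiom.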


I'm a little surprised that nobody has pointed out this is a decompiled version of the code. That means it took machine language and tried to convert it back to C. Given that machine language only has goto in it, anywhere where you might see a loop in this code is from the decompiler looking at a stereotypical usage of the jump instructions and reverse engineering a C loop statement.

At the base level, programs are nothing but gotos strung together. Everything else is added at a higher layer.

Also, critiquing the coding style of decompiled output is particularly missing the point. But at least the brace style is consistent, right? Almost as if it were rigidly generated by a tool from another source of input, rather than by a human, no?
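A concrete (invented) example of that reconstruction: the goto-only form the machine's jumps encode, next to the loop-with-break a decompiler is supposed to recover from that stereotypical jump pattern - the very shape the boomerang output in this dump fails to recognize:

```c
#include <assert.h>

/* The jump-level form: a backward goto for the loop, a forward
   goto for the early exit ("break"). */
int find_goto(const int *a, int n, int x) {
    int i = 0;
loop:
    if (i >= n) goto not_found;
    if (a[i] == x) goto found;   /* forward jump = break */
    i++;
    goto loop;                   /* backward jump = loop */
found:
    return i;
not_found:
    return -1;
}

/* The structured form a good decompiler should emit. */
int find_loop(const int *a, int n, int x) {
    for (int i = 0; i < n; i++)
        if (a[i] == x)
            return i;
    return -1;
}
```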


You just need to make sure you're using it well if you use it :)


Unless you have proper support for functions. Optimized tail calls are as cheap as goto, but cleaner in the code.


since when is linux kernel a reference on good code?


Since 1991 :)

Seriously, it is one of the biggest open-source C codebases out there, with thousands of hackers working on it. And the result is very good, as the majority of the internet is running on it.


I don't argue that linux itself is bad, it's obviously very good overall. That doesn't say anything about overall code quality though - as usual there are good, bad and really ugly parts.

"I've looked at the source and there are pieces that are good and pieces that are not. A whole bunch of random people have contributed to this source, and the quality varies drastically." -Ken Thompson



