If it has access to source code, it can instrument the build process and obtain disassembly of high enough quality to support rewriting. Using its Scheme API you can modify the CFG of each procedure directly, serialize the rewritten parts out as nasm, and even relink with the object files you don't have source for.
It works with any build system, and supports gcc / as / ld and cl / link.
So it may not have actually been written using a custom pl.
Also interesting that the CodeSurfer product referenced above is "sponsored by several government agencies, including the US Air Force, the US Navy, the Office of the Secretary of Defense, and the Department of Homeland Security", according to its own website.
There are some similar products from SGV Sarc, such as Crystal REVS (http://www.sgvsarc.com/products.htm). Does anyone know how they compare against CodeSurfer and related products from GrammaTech?
But surely the compiler (assuming a compiler is used) would convert the new high level language into regular Scheme primitives - I think it's unlikely that the result wouldn't be identifiable.
No... the product allows you to write scripts in Scheme to manipulate its machine code IR database, spit the machine code out as nasm assembly, assemble it, and then run the appropriate linker in the same way that was used to produce the original exe. Scheme is used as a macro language. So you use Scheme to say: change the code at EA 0xdeadbeef from a mov to a jmp. You can reorder functions, insert and remove code, etc. It works because it has very high quality disassembly based on observing compiler and linker invocations and introspecting the artifacts involved.
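To make the workflow above concrete, here is a toy sketch in Python (not Scheme, and not the CodeSurfer API, which I haven't seen). The `Insn` record, `rewrite` and `to_nasm` are made-up names, and the two-instruction "database" is invented for illustration; it only shows the idea of editing an instruction at a given EA and serializing the result as nasm-style text.

```python
from dataclasses import dataclass


@dataclass
class Insn:
    ea: int          # effective address in the original binary
    mnemonic: str
    operands: str


def rewrite(db, ea, mnemonic, operands):
    """Replace the instruction at address `ea` in place (hypothetical helper)."""
    for insn in db:
        if insn.ea == ea:
            insn.mnemonic, insn.operands = mnemonic, operands
            return
    raise KeyError(hex(ea))


def to_nasm(db):
    """Serialize the toy database as nasm-style text, one label per instruction."""
    return "\n".join(
        f"L_{i.ea:08x}: {i.mnemonic} {i.operands}".rstrip() for i in db
    )


# Invented two-instruction "IR database".
db = [
    Insn(0xDEADBEEF, "mov", "eax, ebx"),
    Insn(0xDEADBEF1, "ret", ""),
]

# "Change the code at EA 0xdeadbeef from a mov to a jmp."
rewrite(db, 0xDEADBEEF, "jmp", "L_deadbef1")
nasm_text = to_nasm(db)
print(nasm_text)
```

A real tool would then hand that text to nasm and replay the original link step; none of that is modelled here.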
Ahh, that makes more sense, I thought it meant simply creating a higher level language from Scheme rather than manipulating the last stage(s) of producing the binary.
What exactly is the benefit of obfuscating the source language? Your hypothesis that it's written in Scheme is reasonable, but a DSL by any other name is a basket of Lisp macros. It's not a new language, but at the same time, it's kind of a Domain-specific language.
At any rate, even if it was Scheme, I don't think the goal was to hide the fact that it was written in Scheme.
See my comment below. I don't think you quite understood what I meant. I'm not saying the code was written in scheme. I'm saying there is a product that allows you to write scheme macros to manipulate a database of machine code IR derived from disassembly and then turn the modified database back into an executable.
Hiding the source language makes identifying the origin of the malware difficult. There are obvious reasons to do that.
> Hiding the source language makes identifying the origin of the malware difficult.
How so? Knowing that it was written using VC would hardly help identifying the origin.
That's not to say that you're wrong about the tool used, but I don't believe the goal was to cover their tracks; more likely it was some kind of optimization. Viruses often face space constraints.
I am not knowledgeable enough to say much on this topic, but I was wondering if maybe such rewriting would also serve to make it easy to mutate code to change its signature?
I will be repeating a notion I read on YCombinator elsewhere - but I, too, find it incredibly cool that we live in a time when wars are fought online like that.
We have online revolutionaries, anarchists and REAL nation-wide revolutions started on online networks (talking about the Arab Spring here); we've got FBI agents looking through IP addresses on IRC networks to catch a small group of bragging attackers; we've got an invisible army of Chinese hackers that no one can identify, only that they are really good; some unknown entity making amazingly well done and thought-out trojans like Stuxnet and now Duqu, which seem to be right from the pages of some hyperbolic comic book; and, last but not least, the Russian mafia lords employing Zeus trojans and whatnot to make botnets that mine bitcoin, a purely digital currency.
It's an amazing world we live in. Can't wait to see what the future will bring.
You seem to be pretty enthusiastic about rather worrying and even disturbing developments. This is not a science fiction novel, this is real life. One day it is Israeli hackers destroying Iranian centrifuges, perhaps the next day it will be nuclear reactor facilities that are sent into meltdowns.
And yet, I can't help myself but watch in fascination as all this happens. Maybe it's because this time, the war is fought with means and tools I understand (if only a little)? Maybe.
Maybe it has something to do with the morbid fascination people have with anything destructive - the World War II books and movies still sell like hotcakes, while no one actually wants to repeat the world war.
I, for one, appreciate your enthusiasm. I mean, it's just amazing to look at how far man has advanced from just 100 years ago.
And for me, it's definitely not about the destruction. I just think it's awesome to realize just what humans are capable of. Now, humans are also capable of some ridiculously destructive things. But the problem with @adriand's point of view is that it's as if there's something to be lost from all of Man's destruction. When really, what significance does Earth have in the Universe anyway?
It's pretty significant to us, because we're here :) Seriously, from our point of view, including adriand's, there is something to be lost from Mankind's destruction because it would be the end of our entire species. From the perspective of the universe as a whole, it may not be significant, but it seems eminently sensible for humans to be concerned about it.
Also, it is possible that Earth is significant to the universe because it's the only place where intelligent life has arisen. I don't expect that this is actually the case, but so far we have no evidence to the contrary, and if it were true it would be tragic if we wiped ourselves out by playing with viruses and nuclear technology.
500 years ago all human settlements on Earth had always been agrarian economies with different levels of development but primitive nonetheless, until the industrial revolution started.
What if it had never happened? Now we know there were previous attempts at industry, like steam engines and chemistry, but there was always a war, a drought or some other disaster that destroyed the framework where said developments were made, and sometimes even killed the people making them (Archimedes for example).
What if the universe is just like that? What if we're the most advanced species and all the aliens out there are either animals or still haven't even figured out how to build clocks or engines?
If that's the case then all the knowledge that ever existed would die with us.
It's quite likely that humans need conflict in their lives to feel complete. The area of evolutionary psychology points strongly in that direction as does a subjective observation of our history.
I find online wars to be much more favorable than actual bloodshed. Even an occasional meltdown would be much less disastrous than real life conflict.
Maybe so, though more likely this is what the erosion of existing power structures looks like firsthand. I think we should just be thankful it is relatively peaceful compared to the natural order of such events, historically speaking.
...more likely an embattled people facing an existential threat resort to any means possible to slow down and thwart the efforts of an enemy developing a doomsday weapon.
I find this theory by a commenter on the original Kaspersky blog post very interesting. I certainly don't really have the knowledge to judge if it has any merits, but I still find it amusing nonetheless.
It's very cool from a geek / technological point of view but I can't help but think that non-lifesaving technology ultimately brings us little good, if you look at the big picture.
"Lifesaving" is a pretty nebulous concept. Does a plow which enables higher crop yields and thus less starvation count as lifesaving? Or a weather satellite, which allows prediction of rain patterns and more efficient irrigation usage? Modern technology is all interconnected, and only a few extreme leaf nodes could be unambiguously declared non-lifesaving.
Unless I'm mistaken it looks like a very dynamic language. The screenshot they're showing seems to point at initialisation of a new object, which actually copies function pointers for each of its methods. That's not needed for static languages which would just point to vtables. It looks like it doesn't use real GC though - object's destructor is called right away on a failed allocation. And the destructor is possible to change too...
So something like compiled javascript sans GC really. Or maybe like precompiled python.
Doesn't seem very obfuscated either imho - there's a bunch of static data copied in a series of moves. If someone really wanted to obfuscate those, this looks like a fairly low hanging fruit: grab a list of 5+ mov-s of constants and change them into xor+copy of a memory range to confuse pointer detection.
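A rough sketch of the xor+copy idea, in Python rather than assembly so it's easy to run. The key, the two pointer-like constants and the helper name are all made up; the point is only that the stored blob no longer contains anything that looks like a pointer until it is decoded at run time.

```python
KEY = 0x5A  # hypothetical single-byte key


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with `key`; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


# "Build time": the constants that would otherwise be a series of movs.
# Both values are invented pointer-like numbers.
plain = (0x00401000).to_bytes(4, "little") + (0x00402000).to_bytes(4, "little")
blob = xor_bytes(plain, KEY)      # this is what would sit in the binary

# "Run time": decode the range while copying it into place.
restored = xor_bytes(blob, KEY)

print(blob.hex(), restored.hex())
```

A single-byte XOR is trivially reversible by an analyst, of course; it only defeats naive scanning for pointer-shaped constants.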
Can you see any more characteristics in that fragment?
-Everything is wrapped into objects
-Function table is placed directly into the class instance and can be
modified after construction
-There is no distinction between utility classes (linked lists, hashes) and
user-written code
-Objects communicate using method calls, deferred execution queues and
event-driven callbacks
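For anyone who wants to see the "function table placed directly into the instance" point in miniature, here is a toy model in Python. It is only an analogy for what the disassembly seems to show: `Obj`, `TEMPLATE` and the two destructor functions are invented names, and a dict stands in for the table of function pointers.

```python
class Obj:
    """Toy object whose 'methods' live in a per-instance table."""

    def __init__(self, template):
        # Copy every function reference into the instance itself,
        # instead of pointing at one shared vtable.
        self.table = dict(template)
        self.log = []

    def call(self, name, *args):
        # Every call is dispatched through the instance's own table.
        return self.table[name](self, *args)


def default_dtor(obj):
    obj.log.append("default dtor")


def custom_dtor(obj):
    obj.log.append("custom dtor")


TEMPLATE = {"dtor": default_dtor}

a = Obj(TEMPLATE)
b = Obj(TEMPLATE)
b.table["dtor"] = custom_dtor   # modified after construction, b only

a.call("dtor")
b.call("dtor")
print(a.log, b.log)
```

With a shared vtable, swapping the destructor would affect every instance of the class; with per-instance tables only `b` changes.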
To me this just seems like someone wrote their own little OO system in C, similar to how GObject works. The book Object-Oriented Programming with ANSI-C by Axel-Tobias Schreiner[1] even has example types which use the nomenclature 'ctor' and 'dtor' as in the snippet of code they show (see section 2.5, page 17). It isn't hard to write a little class generator that writes out all this boilerplate code[2] from a C++/C#-like input file. The benefit is, of course, the resulting code size and avoiding any linkage to the standard C++ library.
The inconsistent placement of the "this" argument in function calls seems to support this being C. The vtable moving around would indicate that each class layout is hand-written, though.
Yeah, the author made a point of noting "this" could be in a register or the stack, but that to me just says "C". The functions moving around wouldn't necessarily mean it is written by hand, though. There just needs to be some rules governing the system and we don't know what those rules are (yet).
I would just be very surprised if this is anything other than some convention developed on top of C.
Differing calling conventions can point to a combination of a hand-crafted object system in C and a custom code generator that takes some high-level input and produces machine code directly, without C in between. When you generate machine code that does not directly interface with system libraries, it is often useful to ignore the platform ABI calling conventions and make up your own.
Perhaps they use some kind of right-to-left fastcall convention. Or maybe they are just unconventional, putting "this" at the end of the parameter list, hence ending up in different registers or the stack depending on the number of arguments?
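To illustrate the second guess: if "this" is passed last and the first couple of parameters go in registers, its location depends purely on the argument count. The register order below (ecx, then edx, then the stack) is my assumption, loosely modelled on fastcall, and not something taken from the Kaspersky post.

```python
REG_ARGS = ["ecx", "edx"]  # assumed register order, fastcall-like


def this_location(n_other_args: int) -> str:
    """Where `this` lands if it is passed as the LAST parameter."""
    index = n_other_args  # zero-based position of `this` in the list
    if index < len(REG_ARGS):
        return REG_ARGS[index]
    return f"stack slot {index - len(REG_ARGS)}"


locations = [this_location(n) for n in range(4)]
print(locations)
```

So a method with no other arguments would get "this" in one register, a one-argument method in another, and anything bigger on the stack, which would look inconsistent to someone expecting thiscall.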
Perhaps if they posted more examples.. I could see it being useful to put the "data" before the vtable in certain types so that one could simply cast the type to get the value instead of having to call an accessor function. A string type could have the char* as the first member. A linked list could have the data void* as the first member. If they posted a complete list of all the types they have found and which ones had non-standard vtables, it might be easier to make a call on whether it was done by hand or not.
> The code your referring to .. the unknown c++ looks like the older IBM compilers found in OS400 SYS38 and the oldest sys36.
> The C++ code was used to write the tcp/ip stack for the operating system and all of the communications. The protocols used were the following x.21(async) all modes, Sync SDLC, x.25 Vbiss5 10 15 and 25. CICS. RSR232. This was a very small and powerful communications framework. The IBM system 36 had only 300MB hard drive and one megabyte of memory,the operating system came on diskettes.
> This would be very useful in this virus. It can track and monitor all types of communications. It can connect to everything and anything.
But this comment never mentions specifically what makes the given examples "look like" what he suggests. Given all the networking/comm libraries available, I find it highly unlikely that old, proprietary IBM code would be used. Maybe there's something to it, but he certainly didn't mention anything convincing.
More unusual (to me) is that there are two separate comments suggesting it may be RPG (an OS400/iSeries language), which is very unlikely due to it not being an OOP language therefore not having constructor/destructor functionality, and otherwise a very high level language.
I'd guess some high level assembly, though this suggestion does look interesting.
Writing an unpolished programming language isn't that much work in comparison to writing a complex virus. Especially low level languages where instructions map pretty closely to the CPU instructions are easy to create.
I think it makes a lot of sense to write a custom programming language/compiler because virus scanners tend to use fingerprints to recognize dangerous pieces of code. So you want a compiler that deliberately obfuscates the code it writes and also outputs instructions in such a way that it avoids triggering known virus scanner fingerprints.
Agreed. Writing compilers is easy; The "hard" aspects of creating a new language usually boil down to issues like tooling, documentation and support libraries. In the case of a virus, the only users of the language are the virus authors and the language can be highly tailored to the domain.
Instead of trying to compile code examples in every candidate PL, they should:
1. Crawl x86 binaries from the Internet / download sites / code archives.
2. Write a MapReduce job that disassembles them and looks for the patterns they discovered.
3. Once the patterns are found, investigate the source of the binary (i.e. who uploaded it to the download site; maybe it was on a university FTP server, or maybe it's part of a commercial driver released by company XYZ).
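A minimal sketch of step 2, shrunk to run in one process. The byte signature here is invented (a standard prologue, two wildcard bytes, then ret), the two "binaries" are fake, and it matches raw bytes rather than real disassembly, so treat it as the shape of the map and reduce steps and nothing more.

```python
import re
from collections import Counter

# Invented signature: push ebp; mov ebp, esp; any two bytes; ret.
SIGNATURE = re.compile(rb"\x55\x8b\xec..\xc3", re.DOTALL)


def map_binary(name, data):
    """Map step: emit (binary name, offset) for every signature match."""
    for m in SIGNATURE.finditer(data):
        yield name, m.start()


def reduce_hits(pairs):
    """Reduce step: count matches per binary."""
    return Counter(name for name, _ in pairs)


# Fake corpus standing in for the crawled binaries.
corpus = {
    "a.exe": b"\x90\x55\x8b\xec\x33\xc0\xc3\x90",
    "b.dll": b"\x90\x90\x90\x90",
}

pairs = [hit for name, data in corpus.items() for hit in map_binary(name, data)]
hits = reduce_hits(pairs)
print(pairs, dict(hits))
```

The binaries with nonzero counts are the ones worth tracing back to their upload source in step 3.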
It might well be a macro language and not compiled. For example, HLA (http://en.wikipedia.org/wiki/High_Level_Assembly) has many of the features that are present here. It has its own library functions, objects/classes, and produces code that looks a bit like it was compiled.
Looking at the assembly I see two things: (1) no name mangling. So either this was lost in the decompilation/deciphering phase by the Kaspersky guys or the mysterious language does not support method overloading. (2) the assembly looks reasonably tight and optimized, so a solid code generation backend (GCC, LLVM, MSVC, ...) was used.
Especially because of the name mangling I was thinking of Vala [0]. However, Vala relies on GObject and probably does not work on Windows. Anyway, I guess it's an OO language compiled to C in an intermediate step. This would explain (2).
Read some hacker/cracker zines from the late 80's/early 90's (Phrack, 2600, etc. being the well known ones) and you'll quickly gain a deep perspective on the pace of computing/IT security advancement over the decades. Stuxnet and Duqu will suddenly appear to be minor revisions of 20 year old technology/techniques. You'll see code for polymorphic engines written in TurboPascal suddenly break out into obscured x86 assembler interrupt handlers. Detailed analysis of reverse engineering efforts against "packed" code on foreign and obscure architectures. And the word "cyber" used in all seriousness.
It is quite amusing to see how many old ideas from the dawn of computing have been reinvented multiple times over the years.
Not necessarily. Occam's Razor. It's more likely that they used something obscure to compile or obfuscate the code or wrote a tool to do so. Creating a new language just to write this seems highly unlikely.
Is writing something like this in pure assembly beyond the realm of possibility for some reason? There are still a few dedicated people that code on the metal for high level operating systems (Steve Gibson comes to mind immediately)
Writing something like this in assembly isn't impossible (or that difficult), but the patterns look like something that you wouldn't be using if you were writing in assembly.
My bet is that it's just C with a hand-rolled OO framework.
is it dumb to suggest that someone who understands c++ (or other compiled oo language) would/could write assembler in this way? just from reading the description (things like variable locations of method tables, various registers for "this", and lack of memory management) it sounds like it could be handwritten, but structured in a similar way to c++.
The disassembly snippet looks like a typical handcrafted C/assembly object system without real classes, but responses by the Kaspersky guys in the blog comments seem to imply that it uses these objects even in places where that would be highly impractical when writing code manually. So it's possibly C/assembly written by a typical hardcore Java programmer, but I find that highly unlikely.
If I read that properly, it sounds like Objects have their own function tables- this would seem to indicate an object oriented language based on prototypal inheritance.
I've seen the theory that Stuxnet/Duqu was developed by a state thrown around a lot, but what's the actual evidence?
It's not particularly hard to write a simple programming language. Worms are very specialized pieces of code. It doesn't seem that crazy that someone would create a language tailored for worm development.
The sheer complexity is off the charts. Stuxnet's sophistication and complexity are the cyber equivalent of the Manhattan Project. E.g.: as a domain expert on industrial automation of this kind, Langner states that whoever created Stuxnet _had to have a testing facility_. How many hackers do you know who build uranium enrichment centrifuges to test their cyber attack tools?
Yeah thought so.
Edit: TL;DR: There are two pieces of evidence. No.1: Motive, No.2: Sophistication and complexity.
They learnt from watching all the research firms reverse engineer Stuxnet and eventually stop it. What they are doing is obfuscating the output. If you look at a default DLL or EXE build from VS it is amazing how much information is included that helps you attach a debugger and work out how it works.
The authors learnt from the Stuxnet experience and I wouldn't be surprised if they are testing their own worm using black-box reverse engineering tools to figure out what the research guys will work out when they eventually find it in the wild.
This has worked so well that Kaspersky think that the authors actually invented a new language, when it is likely still just C++, some machine generated code, some obfuscator tools (game makers have been using them for years to stop crackers) and likely manual changes to the generated assembler.
> The authors learnt from the Stuxnet experience and I wouldn't be surprised if they are testing their own worm using black-box reverse engineering tools to figure out what the research guys will work out when they eventually find it in the wild.
Don't they mention that these components were floating around in 2007?
Where does it say that? All the references are to the 'Duqu Framework', which they recently found. I may have missed something.
They also completely rule out C++, C etc. when what they should be ruling out is C++ or C compiled with a standard VS compiler (or an easily recognizable compiler). It is silly to completely rule out C++ and C just because they don't immediately recognize the output and because it doesn't reference anything else.
"Duqu was first detected in September 2011, but Kaspersky Lab believes it has seen the first pieces of Duqu-related malware dating back to August 2007"
Why would creating your own programming language for a virus be a good thing? If viruses are the only thing written in this language, wouldn't the language make it easier for the anti-virus companies to detect it without having to worry as much about false positives?
It also seems to me that a new language for this exact purpose is unlikely, however, it could very well be a proprietary or otherwise unknown language that may have been built for another purpose (internal company, domain-specific development, etc) and not often seen in this context.
Normally compilers compile each operation into a single destination machine code chunk. If you ignore efficiency, there are many ways to implement the same operation in machine code. A Cracker Compiler could contain multiple destination chunks and randomly choose one at compile time.
Writing a detector for a virus with infinite code representations would be difficult.
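A toy version of that "Cracker Compiler", just to show how quickly the number of representations grows. The operation names and the variant table are made up; the variants have the same effect on eax but differ in how they touch the flags, so a real implementation would have to be more careful than this.

```python
import random

# Invented table: several x86 chunks per abstract operation.
# Same effect on eax; flag side effects differ between variants.
VARIANTS = {
    "zero_eax": ["xor eax, eax", "mov eax, 0", "sub eax, eax"],
    "inc_eax": ["inc eax", "add eax, 1", "sub eax, -1"],
}


def compile_ops(ops, rng):
    """Pick one destination chunk at random for each operation."""
    return [rng.choice(VARIANTS[op]) for op in ops]


program = ["zero_eax", "inc_eax", "inc_eax"]
build_a = compile_ops(program, random.Random(1))
build_b = compile_ops(program, random.Random(2))

# Number of distinct binaries this three-operation program can compile to.
n_representations = 1
for op in program:
    n_representations *= len(VARIANTS[op])

print(build_a, build_b, n_representations)
```

Three variants per operation already gives 27 encodings for a three-operation program, so a fixed byte signature stops being useful very quickly, though the count is large rather than literally infinite.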
Please forgive, and correct, me if I have this wildly wrong:
This is referred to as Stuxnet 2. And the original was "proven" to have been made to attack Iranian nuclear labs, and what not. Conclusion being that it was made by some government agency. I suppose foil hat theory would point fingers at CIA/NSA type people.
Assuming the above is correct, or correct enough, is it not surprising to see what might be a new language for this virus, if it has a nation state's resources behind it? If that is the case, what chance is there that anyone will be able to crack this mystery?
This makes fairly little sense to me. Why wouldn't one write such a virus using straight ASM, or possibly write a VM in ASM and write the payload in the VM's bytecode (this option makes it particularly easy to do metamorphic code, though doing it with well-written ASM is also very possible)? It seems like creating a custom language -- or hacking up compiled C++ or whatnot -- is a bit of overkill considering that the basic tenets of virus writing are: keep it simple, don't get caught; this wouldn't aid in either of those.
Perhaps because this is no simple virus? Perhaps because Stuxnet/Duqu is one of if not the most sophisticated examples of cyber warfare* we have public examples of? Perhaps because its compiled binary is a whopping half-megabyte in size?
The amount of effort that went into Stuxnet is truly massive. It'd be no surprise if a custom framework or even a domain-specific language were written to support it.
* Usually I cringe at just hearing the phrase cyber warfare and at the ridiculous way in which it is used by media and the government. But looking at what Stuxnet accomplished (physically damaging or disabling critical components of the Iranian nuclear enrichment program), the term applies completely.
Maybe they just embedded something like RTOS-32. The company that makes it (www.On-Time.com) says it can be integrated with Visual Studio and used alongside Visual C++.
What driverdan said makes sense. It is much more likely that they obfuscated or masked their code instead of inventing an "unknown" programming language. Even so, the obfuscation method used I'm sure would be pretty cunning (seeing as these folk mean business).
It's interesting to see Kaspersky suggest that the state is behind this solely based on some unknown code. Does anyone know why that would be a likely conclusion on their part?
From what I gathered from the article, this doesn't appear to be obfuscated since the purpose/method is transparent in the decompiled code, it simply doesn't match the constructs you would expect to see in common programming languages when decompiled.
I like the idea that it is an OO language that forgoes its own runtime library and instead uses the Win32 API as its native library. This would be a great language to write viruses in -- perfect for just glueing together APIs and doing some very small scale business logic without having to learn C++.
http://www.grammatech.com/research/products/CodeSurferx86.ht...