People who compress their Go binaries (or any other binaries, really) - please be aware that in doing so, you stop the OS from being able to page out your executable (rarely a big loss), and also from being able to share executable pages (not a huge loss for a 2MB executable, a huge loss for a 100MB one).
If there's only one copy of a program running, it won't matter - but if you are running hundreds of copies (even dockerized and such), you are likely better off NOT upxing.
> and also from being able to share executable pages (not a huge loss for a 2MB executable, a huge loss for a 100MB one).
I don't think people care about that nowadays, seeing how popular Docker containers are. I think Docker containers already make it so that you cannot share executable memory between different containers because each one runs in its private namespace.
The only namespace that could matter here would be mnt. And since bind mounts normally wouldn't cache pages separately, I don't think that would happen for the namespaces either. Happy to be proven wrong though.
More specifically, I wouldn't expect "free -m" to produce a different result depending on the namespace it's run in.
I think what OP meant is, if you have a page that's backed by disk, then when you're low on RAM, the OS can simply drop that page -- it's sitting there on disk if you ever need it back. But if you have a page that's not backed by disk, then you have to write it out to disk before you can drop it.
Indeed, as jlebar notes - since the actual executable code (the thing that the CPU executes) does not exist directly in the executable file, the OS will have to write it out to swap if it needs the memory (unlike an uncompressed executable, where it can just drop the pages and later reload them from the executable file).
It is rarely a big loss, because executables that are in use tend to remain in memory if the program is actually active. If you have a 300MB daemon that sleeps, though, you will likely notice a swap out to magnetic disk.
Also if you have the same executable running several times on a machine, they share the RAM for the code (read-only pages, which should be most of them).
That doesn't work for UPX because each execution decompresses anew, which makes it a "new executable" from the OS' point of view.
The only thing that would help is kernel samepage merging (KSM), but that's really only activated for some virtual machines.
I wrote "or any other binaries, really". It's just that
(a) Go compiles static binaries, which makes UPX especially effective, unlike e.g. common C and C++ projects with their hundreds of .so/.dll files; Delphi/FPC and Nim are two other ecosystems that share this trait, but neither is as common as Go.
(b) it's no good for binaries that aren't native and static - C#, Java, and Python see no benefit from this.
(c) At the time I posted it, there were already 3 or 4 posts extolling the virtues of compressing Go executables.
A little walk down memory lane:
I once ran the exe mailing list for exe packers and protection tools. There was a whole scene of people in the 90s writing such tools and writing unpackers and removal tools for such things. UPX was one of the later ones that still existed when most of this scene vanished.
(Incidentally, these advanced packers also tend to frustrate RE to some extent, since the same tricks they use to increase compression ratios can often greatly confuse RE tools.)
Yes, this was my first experience with this piece of software. You can pretty clearly tell that it was packed with UPX by examining the file in a hex editor.
I still have the malicious file on a VM so I can do some analysis on it later. (If anyone would like it, feel free to contact me.)
edit: added the contact me
I remember cases where the AV successfully detected the upxed executable, but not the original, because upx was so widespread that the most common version of the infected file was upxed.
Compressing them this way, however, makes the situation worse for the memory manager.
If you use an uncompressed (or transparently filesystem-compressed) binary, your process has mmapped the memory pages, which can be discarded and reloaded as needed.
If you use a self-extractor, your process has dirty pages that it itself wrote, which cannot be discarded but must be moved to swap if needed.
The more processes share the same executable, the worse the effect is: the read-only mmapped pages are shared among them all, while the written pages are private to each process.
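To make that difference concrete, here is a minimal, Linux-only sketch (my own illustration, nothing UPX actually ships): a read-only, file-backed mapping gives clean pages the kernel can simply drop, while decompressing into anonymous writable memory produces dirty pages that can only be evicted to swap. The file path is just a placeholder.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	// Placeholder path - substitute any real binary.
	f, err := os.Open("someBinary")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	st, err := f.Stat()
	if err != nil {
		panic(err)
	}

	// File-backed, read-only mapping: clean pages, shareable between
	// processes and reloadable from the file after being discarded.
	clean, err := syscall.Mmap(int(f.Fd()), 0, int(st.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(clean)

	// Anonymous, private, writable mapping: once written to, these pages
	// are dirty and can only leave RAM by being written to swap.
	dirty, err := syscall.Mmap(-1, 0, int(st.Size()),
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_PRIVATE|syscall.MAP_ANON)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(dirty)

	// Stand-in for the self-extractor's decompression step: every page
	// touched here becomes private and dirty.
	copy(dirty, clean)

	fmt.Printf("mapped %d clean bytes, dirtied %d anonymous bytes\n",
		len(clean), len(dirty))
}
```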
I would be surprised to see practical performance degradation from uncompressing executable code before jumping to the program on today's machines. The largest binary in my /usr/bin/ is 50 megabytes. On the other hand, for very, very large binaries it's probably faster to decompress in memory than to load all the bits from disk.
Further, most executables aren't static these days. (I often wish they were, though!). What type of binaries have you got, and are they really so big that it's worth the hassle to compress them just to save disk space?
The binaries are mostly stuff like pandoc and compiled statically so that I can run them anywhere. Nothing too special.
It's not technically needed, but it makes network transfer faster, and in general that's good enough. It's not really intended to reduce disk space; it's more a way to make things more manageable.
Curious - which network transfer protocol are you using that doesn't support compression on the fly? And if it's just for transfer, why not gzip instead?
UPX is usually better than generic compression utilities. IIRC it actually modifies the executable before compressing it; I think on Windows it reversibly changes relative addresses to absolute addresses, so you get better compression ratios.
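That address rewriting is the same general idea as the BCJ/E8 filters used by archivers like 7-Zip (this is a sketch of the technique, not UPX's actual code): an x86 CALL carries a 32-bit relative displacement, and turning it into an absolute target makes repeated calls to the same function byte-identical, which compresses much better. A minimal, reversible sketch:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// filterCalls rewrites the rel32 operand of every CALL (0xE8) to an absolute
// target, assuming the code will be loaded at the given base address. The
// naive byte scan also hits 0xE8 bytes that aren't real CALLs, but since the
// reverse pass finds exactly the same positions, the round trip is lossless.
func filterCalls(code []byte, base uint32) {
	for i := 0; i+5 <= len(code); i++ {
		if code[i] == 0xE8 { // CALL rel32
			rel := binary.LittleEndian.Uint32(code[i+1 : i+5])
			abs := base + uint32(i) + 5 + rel // target = next instruction + displacement
			binary.LittleEndian.PutUint32(code[i+1:i+5], abs)
			i += 4 // skip the operand we just rewrote
		}
	}
}

// unfilterCalls undoes the transform after decompression.
func unfilterCalls(code []byte, base uint32) {
	for i := 0; i+5 <= len(code); i++ {
		if code[i] == 0xE8 {
			abs := binary.LittleEndian.Uint32(code[i+1 : i+5])
			binary.LittleEndian.PutUint32(code[i+1:i+5], abs-(base+uint32(i)+5))
			i += 4
		}
	}
}

func main() {
	// Two CALLs to the same target at base 0x400000: different rel32 bytes
	// before filtering, identical absolute bytes after - hence better compression.
	code := []byte{
		0xE8, 0xFB, 0x0F, 0x00, 0x00, // call +0x0FFB -> target 0x401000
		0xE8, 0xF6, 0x0F, 0x00, 0x00, // call +0x0FF6 -> target 0x401000
	}
	filterCalls(code, 0x400000)
	fmt.Printf("% X\n", code) // both operands now encode 0x401000
	unfilterCalls(code, 0x400000)
	fmt.Printf("% X\n", code) // original bytes restored
}
```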
2GiB quotas on home directories are another reason, but explaining it all on HN isn't really a goal of mine. Suffice it to say there is a confluence of issues. As for network compression (via ssh, say) versus compressing the binary: time it - you might be surprised at the difference.
Not only does it exist, but it's insanely good at packing go static binaries. I don't remember the compression ratio, but I think it's something like 20% of the original size.
In the 90s I was still starting to explore computers and executables, etc., and it blew my mind when I found out about packers and protection tools and RE in general. Good times! Oh, nostalgia...
I just remembered: ProcDump32! Geez, that really blew my mind at the time and in a way still does.
I used it to compress a Lazarus (open source Delphi clone) executable. The results were great (executable size reduced by more than 50%, iirc from 2 MB to around 800 kB).
Offering a sub-MB executable in the era of 100 MB Electron apps is downright pioneering :)
I've seen some posts about it here recently and believe the critical mass of users/developers to make it a successful dev system has been reached; however, it still struggles to be taken into consideration in some contexts for not using a major language. If it allowed building GUIs for C++ (Qt/*TK and other builders are not even close to its usability level), it would probably become mainstream in a week.
Sometimes I wonder how hard it would be to make some C++ Builder-like modifications to Clang (like Embarcadero did for newer C++ Builder versions) to allow it to use Free Pascal objects directly, for people who really want to avoid Pascal.
Of course to do it properly it'd need:
* A modified Clang (or other C++ compiler) that can use .ppu files with the necessary C++ language extensions for properties, callbacks, sets, enhanced RTTI, etc
* A C/C++ library that uses the Free Pascal RTL for all memory operations
* Lazarus' CodeTools to add C++ support for automatically creating missing event handler code (and removing unnecessary code), handling syntax completion, code completion for missing identifiers, property getters/setters and private fields, inherited fields, etc to the same standard as the Free Pascal code
* All the involved teams to agree to play nice with each other :-P
Also, if such a thing were done, judging from what most Lazarus and FPC devs have done so far, it'd probably be done in a way that is as compatible with C++ Builder as possible.
TBH I'm not really holding my breath, but who knows, weird stuff has happened before in both FPC and Lazarus :-P
As someone "on the other side ;-)", I don't think you protected much. UPX is pretty much the classic "Hello World" of unpacking manually, and tools like PEid will still be able to tell it's UPX from the decompressor stub alone.
It's been years since I unpacked UPX manually, but I still remember what it looks like: a PUSHA at the start to save all the registers, a lot of decompression code, and finally a POPA and a JMP to the OEP (original entry point). Incidentally, this general pattern is also shared by a bunch of other simple packers (more focused on compression than anti-RE), so unpacking them follows the same process.
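Purely as a toy illustration of that pattern (real stubs usually have a few instructions between the POPA and the tail jump, so treat this as a sketch of the idea, not a working unpacker): scan the stub for a POPAD followed by a JMP rel32 and read off where control returns to the unpacked image.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// findTailJump looks for a POPAD (0x61) immediately followed by a JMP rel32
// (0xE9) and returns its offset and displacement, or -1 if not found.
func findTailJump(stub []byte) (offset int, rel int32) {
	for i := 0; i+6 <= len(stub); i++ {
		if stub[i] == 0x61 && stub[i+1] == 0xE9 { // POPAD ; JMP rel32
			return i, int32(binary.LittleEndian.Uint32(stub[i+2 : i+6]))
		}
	}
	return -1, 0
}

func main() {
	// Made-up stub bytes: a NOP, then POPAD, then JMP -0x1234 back into the
	// freshly decompressed image (where the OEP would live).
	stub := []byte{0x90, 0x61, 0xE9, 0xCC, 0xED, 0xFF, 0xFF}
	off, rel := findTailJump(stub)
	fmt.Printf("tail jump at offset %d, displacement %d\n", off, rel)
}
```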
That's interesting. The UPX string is most likely the name of a section in the PE file; it's the first "UPX" string you will find in the file.
How did the UPX loader manage to find the section in which the packed content is stored?
UPD: It's REALLY easy to "hack" this protection. You simply need to attach a debugger and you will see the unprotected exe in memory. There are tools to convert a loaded, unprotected exe back into a regular exe file on disk. So... no one really tried to hack you. Sorry.
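For anyone curious, the section table mentioned above is also the easiest thing to check programmatically - UPX typically renames the PE sections to UPX0/UPX1 and leaves a "UPX!" marker. A small sketch using Go's debug/pe package (the filename is just a placeholder):

```go
package main

import (
	"debug/pe"
	"fmt"
	"strings"
)

func main() {
	// Placeholder filename - point this at any Windows executable.
	f, err := pe.Open("packed.exe")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// List the section names; UPX0/UPX1 is a strong hint the file is packed.
	for _, s := range f.Sections {
		fmt.Printf("section %-8s raw size %d\n", s.Name, s.Size)
		if strings.HasPrefix(s.Name, "UPX") {
			fmt.Println("  -> looks UPX-packed")
		}
	}
}
```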
You can inspect the code running on your machine. The machine code.
At what level should one expect its user to understand the code running on one's machine? If I gave you the source to my application in Brainfuck, would that suffice?
Warning: although UPX is awesome, be wary of using it to distribute software to a wide audience as it seems to trigger false positives in some antivirus software.
I'm surprised to hear that - I have seen it happen with more advanced/obscure/protective packers, but UPX is so common and so easily unpacked (and thus scanned by AVs) that I'd say any AV which gets confused by UPX is not worth using at all.
And yet... in our case, the false-positive rate went from about one a month to one a year when we stopped using UPX. For a binary that didn’t change, mind you.
You'd think that after reporting a false positive once, an AV vendor would whitelist the hash of the binary, but no. Some of them were re-detecting malware time and time again. Until we stopped using UPX.
Could UPX put something in the header that said something akin to, 'I am not a signifier of malware, perform your check on the internal contents instead.'
Then AV companies could see that and not flag it as malware unless they had additional reason to think it was.
That doesn't seem like it'd be terribly difficult but there's a good chance I'm missing something.
You're talking as if the AV companies don't know what UPX is.
They know it very well, but adding code to do decompression while performing a scan is more complex and will surely reduce performance.
If the AV is already slow, they might decide to just label any UPX binary, since (let's not lie) most malware will be compressed with UPX or other tools.
> If the AV is already slow, they might decide to just label any UPX binary, since (let's not lie) most malware will be compressed with UPX or other tools.
IMHO an AV that doesn't know how to unpack UPX is almost like an AV that doesn't know how to unpack ZIP or RAR... and yet they universally do the latter.
> You'd think that after reporting a false positive once, an AV vendor would whitelist the hash of the binary, but no. Some of them were re-detecting malware time and time again. Until we stopped using UPX.
I have a feeling that your false positives are caused by the fact that UPX (and other compressors) naturally create very high-entropy files, and AVs which do signature-type comparisons would like to reduce signature length as much as possible, so they also choose very high-entropy portions of malware to be as distinctive as possible while remaining short; but that also increases the chances of such sequences being found in other benign high-entropy files.
I'm almost willing to bet that your re-detections are not of the same malware, but of new signatures as the AV vendor adds them - which coincidentally happen to match some other high-entropy portion of your binary.
Then again, the quest for speed and high detection rates (while false-positive rates seem to be less of a concern) among AV vendors has led to some massively embarrassing mistakes, like considering the mere existence of a path as detection of malware:
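To make the entropy point above concrete, here's a quick back-of-the-envelope sketch (plain Go, nothing AV-specific): compressed sections sit near the 8-bits-per-byte maximum, far above ordinary text or code, which is exactly the kind of region a short "distinctive" signature tends to be drawn from.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the byte-level entropy of data in bits per byte
// (0 for constant data, 8 for perfectly uniform random bytes).
func shannonEntropy(data []byte) float64 {
	var counts [256]int
	for _, b := range data {
		counts[b]++
	}
	var h float64
	n := float64(len(data))
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	text := []byte("plain text has lots of repeated, predictable bytes; plain text has lots of repeated, predictable bytes")
	noise := make([]byte, len(text))
	for i := range noise {
		noise[i] = byte(i*131 + 17) // cheap stand-in for compressed, noise-like data
	}
	fmt.Printf("text-like data:  %.2f bits/byte\n", shannonEntropy(text))
	fmt.Printf("noise-like data: %.2f bits/byte\n", shannonEntropy(noise))
}
```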
PortableApps.com used UPX for most open source releases up until a couple years ago. We stopped due to antivirus false positives combined with the fact that most folks have more space for their apps. We still make available the tool we use called PortableApps.com AppCompactor. It provides a simple GUI to use UPX on a whole directory and sub-directories. Plus it can optionally recompress JAR and ZIP files using 7-Zip. If it's useful to you, you can grab it here: https://portableapps.com/apps/utilities/portableapps.com_app...
We used to compress all our binaries (desktop software developers), but fighting false positives from antivirus vendors became an endless nightmare. We just gave up and stopped using binary compressors entirely.
Funny to see this here; it's been ages since I've seen UPX mentioned. In the early 2000s I had written some software whose executable was around a megabyte, maybe several megabytes, in size coming out of VB6. On one of the mid-90s test laptops we used at the time to ensure it would run on even the crummiest of machines, it launched NOTICEABLY faster when packed with UPX. The hard disk in that machine was so incredibly slow that loading less off the disk and decompressing the executable in RAM was easily an order of magnitude faster.
I do most of my web dev in Nim these days, meaning my ELFs are ultimately produced by GCC or Clang. Everything statically linked - and I mean everything: for libc I use musl. And then I UPX the bejesus out of them.
It's simply nice to ship a fully working app, with SQLite* and everything, which will basically run anywhere with a Linux kernel, in a single executable far below 2 MB.
*) Yes, the vast majority of the world's websites need nothing fancier than SQLite to keep them happy. And manageable.
SQLite-only can be a feature. On my long list of potential projects is a cloud storage/PIM/etc. application (similar to OwnCloud) that only supports SQLite in order to scale badly to more than 50-100 users, thus forcing users to decentralize and federate.
The one time I came across upx, it was used on some malware. It was a program named gnome-pty-helper in a user's .config directory, installed in cron and set to phone home to some locations that were stored in the clear once upx had been used to unpack it.
Years back I used gzexe and also some pkzip-based thing on DOS. On a modern system, you're better off enabling filesystem-level compression, which also won't break OS paging if the executable is run more than once.
The exceptions are NSIS installers, self-extracting archives (exe RAR files), and files with IDL interfaces.
When an NSIS installer starts, it will try to open its own exe file and find the section in which its packed data is stored. But UPX will remove those sections and create a .UPX section with the compressed data.
When I first found upx I did this a couple of times only to fail pretty badly and then I stopped doing it. This was like 7-8 years ago when I first tried the portable version. Never found the cause till today.
I used to use this a lot, back in the bad old days, when drive space was at a premium.
These days I struggle to fill my hard drives no matter how wasteful I am with downloading videos and not bothering to clean up afterwards... and the amount of hard drive space you can buy per dollar keeps growing faster than I can fill my disks.
Much trickier issues to tackle are speed (unless you go with SSDs, but then you run into space issues again, and reliability issues), backups, and data integrity. All of these issues are made much harder by the sheer amounts of data we're storing these days. Executables usually account for only a relatively small fraction of that space.
I think upx is more useful for static binaries like those of Haskell applications, which are kinda huge (GHC produces huge binaries - e.g. pandoc or ghc-mod). A 100-something MB binary is not what you usually have, and UPX can work its magic on stuff like that. More manageable, not necessarily essential - but when you need it, you need it badly.
Note that upx can also cause compatibility problems. For example, when macOS Sierra was released, many older apps that used upx needed to be uncompressed and recompressed with a newer upx version in order to get them working again.
Just to chip in with my experience on this matter: I use UPX as one of the RE defense methods on a couple of Delphi-based applications we build, which our customers run regularly on their servers. One of the challenges is that some A/V throws a false positive upon checking the resulting files. Somehow this stopped being an issue after applying code-signing to the UPX output executables.
How does UPX defend against reverse engineering? The binary literally contains the code to reverse the UPX compression (otherwise it couldn't run), and I'd expect all antiviruses to be able to unpack UPX executables.