Hacker News
Static analysis of an unknown compression format (epita.fr)
222 points by kalenz on April 7, 2012 | 12 comments



That's a fairly common compression technique on PS3 titles. Breaking the file into regularly sized chunks of about 64 KB (for LZMA) lets the file be decompressed in parallel on the SPUs, since the dictionary and decompressed data for each chunk fit entirely in local store.

Since optical drives have terrible seek times and low bandwidth, most assets are stored compressed on disc.
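
The chunking idea above can be sketched in a few lines of Python. This is only an illustration, not how any particular PS3 title does it: the 64 KB chunk size comes from the figure quoted above, the function names are made up for this example, and a thread pool stands in for the SPUs.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024  # assumed chunk size, from the "about 64 KB" figure


def compress_chunks(data, chunk_size=CHUNK_SIZE):
    """Compress each fixed-size chunk as its own independent LZMA stream."""
    return [
        lzma.compress(data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def decompress_chunks(chunks, workers=4):
    """Decompress chunks concurrently; map() preserves chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(lzma.decompress, chunks))


if __name__ == "__main__":
    payload = bytes(range(256)) * 1024  # 256 KiB of sample data
    chunks = compress_chunks(payload)
    assert decompress_chunks(chunks) == payload
    print(len(chunks), "chunks")
```

Because every chunk is a self-contained stream, no worker needs state from any other chunk, which is the property that makes the per-chunk working set small and the work parallel. The cost is a somewhat worse compression ratio than a single stream over the whole file.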


Is there any tool to generate bitmaps from binary files? For example using each byte as grey value and letting the user specify the column width. Or with more than one byte and then using color.

Wouldn't this be a trivial and somewhat useful thing to see structure in binary files?


Something I wrote a while back, using Python and PIL.

    import sys
    import math
    from PIL import Image

    def getSize(length, width):
        # Rows needed to hold `length` bytes at `width` pixels per row.
        h = int(math.ceil(float(length) / width))
        return (width, h)

    def getImage(file, width=256):
        # One byte per pixel in 8-bit greyscale ('L'); unfilled pixels stay black.
        d = file.read()
        s = getSize(len(d), width)
        im = Image.new('L', s)
        im.putdata(d)
        return im

    def main():
        # Both the input and output paths are required.
        if len(sys.argv) < 3:
            print('Usage: file2img.py <file> <output> [width]')
            return

        with open(sys.argv[1], 'rb') as f:
            if len(sys.argv) > 3:
                width = int(sys.argv[3])
                img = getImage(f, width)
            else:
                img = getImage(f)
            img.save(sys.argv[2])

    if __name__ == "__main__":
        main()

Here's a visualization of user32.dll on Windows. Notice how you can actually make out the images that are embedded in the resources!

http://i.imgur.com/7yEHR.png


You might find this useful: http://corte.si/posts/visualisation/binvis/index.html

and related: http://corte.si/posts/visualisation/entropy/index.html (visualizing entropy in binary files)

I used his software on an unknown format from the 1988 game Circuit's Edge to find out if/where the file was encrypted or compressed. It works well and was able to give me a (very) general idea of where in the file I needed to look further.

Edit: Nope, I remembered incorrectly. I used it on the game's .exe to tell me if I was dealing with a compressed executable. It turned out that it was compressed and so I went on a search for an ancient unpacker.

See before and after:

edge.exe packed: http://imgur.com/PlHGl

edge.exe unpacked: http://imgur.com/jeFLb
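
For anyone curious what an entropy view computes, here is a minimal sketch of per-block Shannon entropy in Python. It is not the code behind the linked tool; the function names and the 256-byte block size are assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(block):
    """Return the Shannon entropy of a bytes object, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(block).values()
    )


def entropy_profile(data, block_size=256):
    """Entropy of each consecutive block of the input."""
    return [
        shannon_entropy(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]
```

Long runs of blocks that all sit near the top of the range are a hint that a region is compressed or encrypted, while code, text, and tables tend to score lower and vary more from block to block.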


In my favorite static analysis toolset (Radare, http://radare.org/), there's a nice feature similar to this. It "zooms" out on a big file by summarizing portions of it into chunks and displaying colored ASCII characters in a chart to represent each block.

With the entropy and ASCII printable filters set up, it makes repeating patterns and TEXT sections pretty clear.
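
A rough idea of how such a zoomed-out view can be built, as a Python sketch. This is not Radare's implementation: the glyph ramp, the function names, and the default of 64 cells are all invented for this example, and it uses plain characters instead of colors.

```python
import string

PRINTABLE = frozenset(string.printable.encode("ascii"))
GLYPHS = " .:-=+*#%@"  # sparse to dense; an arbitrary ramp chosen for this sketch


def printable_ratio(block):
    """Fraction of bytes in the block that are printable ASCII."""
    if not block:
        return 0.0
    return sum(byte in PRINTABLE for byte in block) / len(block)


def zoom_chart(data, cells=64):
    """Summarize data into at most `cells` blocks, one glyph per block."""
    if not data:
        return ""
    block_size = max(1, -(-len(data) // cells))  # ceiling division
    chart = []
    for i in range(0, len(data), block_size):
        ratio = printable_ratio(data[i:i + block_size])
        chart.append(GLYPHS[min(int(ratio * len(GLYPHS)), len(GLYPHS) - 1)])
    return "".join(chart)
```

Swapping printable_ratio for a different per-block measure (entropy, for instance) gives a different filter over the same chart.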


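It is very simple to create the binary versions of http://en.wikipedia.org/wiki/Netpbm_format from the command line, if you have that binary chunk in a file. For 8-bit greyscale that means P5 (P4 is the 1-bit bitmap and takes no maximum-value line), and the blob needs to hold at least width × height bytes. Something like:

  echo P5 > file
  echo "640 480" >> file
  echo 255 >> file
  cat binaryBlob >> file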
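
The same wrapping can be done in Python when the blob's size is not known in advance. This sketch writes a binary PGM (P5, the 8-bit greyscale Netpbm variant, which is the one that takes a maximum-value line); the function name and the default width of 256 are assumptions for this example.

```python
import math


def to_pgm(data, width=256):
    """Wrap raw bytes in a binary PGM (P5) image, one byte per grey pixel."""
    height = max(1, math.ceil(len(data) / width))
    padding = bytes(width * height - len(data))  # zero-fill the last row
    header = b"P5\n%d %d\n255\n" % (width, height)
    return header + data + padding
```

Writing the result of to_pgm(open(path, "rb").read()) to a .pgm file gives something most Netpbm-aware viewers should open. The height is derived from the data length, so only the column width has to be chosen.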


When doing simple graphics work at university, I would name files that contained nothing but pixel data foo.raw and IIRC some image editors would offer the options you mention when opening the files, then display them fine. Maybe that was before .raw got associated with cameras (and presumably became more complex).


Raw can be interpreted from any stream of binary data: you can open an EXE in Photoshop as RAW, save it as a PNG, transfer it, then resave it as raw, and it will run.


Both GIMP and Photoshop can read any binary file as a "raw" file format, where you can specify the pixel format (e.g. greyscale, RGB, RGBA) and column width.


Reminds me of reverse engineering an email archiving format used by EMC IIRC. It used an ancient PKZIP compression algorithm to compress COM IStream data and then wrapped the compressed data in what was essentially a proprietary linked list.


I think this overlooks the classic technique of reverse engineers all over, IDA Pro!


Funny, I was just about to start playing "Tales of Symphonia" when I read this...



