File format wiki (archiveteam.org)
167 points by chei0aiV on Dec 9, 2015 | 41 comments



I worked in video games reverse engineering for several years, so I'll take the occasion to quote a recent tweet from a friend/colleague of mine:

> At HandmadeCon @mike_acton brought up the game data format problem. I've reverse engineered hundreds of proprietary formats from games, and he's right that all I would need to make things simple is a basic description of the file data. Just struct declarations are enough. No encryption has stopped me, I always find the data I need, and your data all has the same general architecture, so there's no reason a game developer should be afraid of the public knowing how their game's data is laid out. This zeitgeist of closely guarding that information is doing you more harm than good — moddability is a key feature of many successful games, and all modders want is docs.

--

There are a lot of other industries that could stand to learn from this. Reinventing the wheel sucks. Doing your own proprietary thing in your corner, taking on extra maintenance work, extra work to modify existing tools to make them compatible with your own image/archive/whatever format... it all sucks and nobody wins.


Any good sources you'd use to hit the ground running with file format reverse engineering?

I have a couple small projects on the backburner at work that require me to reverse engineer file formats. I've been able to decode a few basic details out of the files by looking for patterns in a hex editor but the finer details are still escaping me.


I'm self-taught so I won't be much help there. The usual path starts with learning assembly. If you have skills in reverse-engineering the actual executables, reversing the file formats is as simple as tracing back what happens when the files are read.

It gets tougher when there are anti-debugging protections in place but that generally happens only in the video games industry.

If the format is simple enough though, you will be fine with a good hex editor and some pattern recognition.
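For anyone without a hex editor at hand, the same kind of pattern hunting can be roughed out in a few lines of Python. This is only a sketch: `hexdump` and `find_all` are helper names made up for this example, not part of any tool mentioned in the thread.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, hex columns and printable ASCII, like a hex editor."""
    pad = width * 3 - 1  # width of a full row of "xx xx xx ..." columns
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk).ljust(pad)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part}  {text}")
    return "\n".join(lines)


def find_all(data: bytes, needle: bytes) -> list:
    """Return every offset at which `needle` occurs (overlaps included)."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets
```

Recurring byte sequences at regular offsets (as reported by `find_all`) often hint at record boundaries or repeated magic values, which is usually the first pattern worth chasing.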


Along with learning assembly, find an active project looking for help. I cut my teeth on deconstructing/modding Oni (see http://wiki.oni2.net/ONCC for example) and it was great to have other people to bounce ideas off of.


The only way I've been able to do it is start writing a program that reads the header and prints details.
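A minimal sketch of that kind of throwaway program, using Python's `struct` module. The header layout here (4-byte magic, two 16-bit fields, a 32-bit entry count, all little-endian) is purely an assumption for illustration; you would adjust the format string as you learn more about the real file.

```python
import struct

# Hypothetical header layout, invented for this example:
# 4-byte magic, uint16 version, uint16 flags, uint32 entry count (little-endian).
HEADER = struct.Struct("<4sHHI")


def read_header(data: bytes) -> dict:
    """Unpack the assumed fixed-size header from the start of the file's bytes."""
    if len(data) < HEADER.size:
        raise ValueError("data is shorter than the assumed header")
    magic, version, flags, entry_count = HEADER.unpack_from(data, 0)
    return {
        "magic": magic,
        "version": version,
        "flags": flags,
        "entry_count": entry_count,
    }


def describe(header: dict) -> str:
    """Format the decoded fields one per line for printing."""
    return "\n".join(f"{name:12} {value!r}" for name, value in header.items())
```

Reading the first `HEADER.size` bytes of a file in binary mode, passing them to `read_header` and printing `describe(...)` gives a quick way to check a guess about the layout against several sample files.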

Your other option is something like QuickBMS - on mobile, can't link.


Correct me if I'm wrong, but haven't the various game modding and emulation scenes been notoriously anti-open-source? Even the mighty Xentax had political issues which resulted in the creation of ZenHax


That depends heavily on the community. Often enough, they are mostly composed of teenagers who got into scripting their favourite game. Which is awesome, but unfortunately it means that they lack a lot of experience, end up on bad terms with the developers, etc.

As a counterpoint, the community I founded around Hearthstone (http://hearthsim.info/) is extremely open source friendly and on excellent terms with the developers. It's really about how you decide to manage it.


> Just struct declarations are enough

To echo this with code, I've open sourced my collection of game archive unpackers. Given how similar they all are, this is a typical example of how to add a new decoder:

https://github.com/shish/pyge/blob/master/archive/PackDat3.p...

(Note that two of those 6 short strings are user-friendliness metadata and can be omitted, and one of the others is redundant -- "Magic Bytes", "Header Format" and "Directory Entry Format" are all you really need in most cases)
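To give a rough idea of how little is needed, here is a self-contained sketch of a decoder driven by just those three pieces of information. It is not the pyge API: the `DemoArchive` class, its field names and the `DEMO` layout are all invented for this example.

```python
import struct
from typing import Iterator, NamedTuple


class Entry(NamedTuple):
    name: str
    offset: int
    size: int


class DemoArchive:
    # All three values are made up for illustration.
    MAGIC = b"DEMO"
    HEADER_FORMAT = "<4sI"    # magic bytes, number of directory entries
    ENTRY_FORMAT = "<16sII"   # NUL-padded name, data offset, data size


def list_entries(fmt, data: bytes) -> Iterator[Entry]:
    """Walk the directory that follows the header, using only the format description."""
    header = struct.Struct(fmt.HEADER_FORMAT)
    entry = struct.Struct(fmt.ENTRY_FORMAT)
    magic, count = header.unpack_from(data, 0)
    if magic != fmt.MAGIC:
        raise ValueError("magic bytes do not match")
    for i in range(count):
        raw_name, offset, size = entry.unpack_from(data, header.size + i * entry.size)
        yield Entry(raw_name.rstrip(b"\0").decode("ascii"), offset, size)


def extract(data: bytes, entry: Entry) -> bytes:
    """Slice one file's contents out of the archive."""
    return data[entry.offset:entry.offset + entry.size]
```

Under these assumptions, supporting another archive type of the same general shape means writing another small class with different constants, while the generic reading code stays untouched.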


When you say you worked in video game reverse engineering, can you elaborate at all? That sounds incredibly interesting.


Most of my work was on World of Warcraft. I wrote an early Django version of what is now WoWDB (http://www.wowdb.com/).

The work consisted mainly of reverse engineering WoW's file formats, protocols, etc. in order to understand, index and organize data, and to be able to produce "datamined" patch notes as soon as new game files were available.

I got to work with and learn from people a thousand times smarter than me. It was by far my favourite job, and it's why I still do various Blizzard-related work today.

It also creates an interesting relationship with the game's developers. I think on that front, many studios could learn a lot about working with the people interested in the technical aspects of their game, rather than working against them. Most get it horribly wrong, few of them get it right at all - Riot is one of the few that do, I wish I enjoyed LoL. :)


For an interesting yet sad comparison, see Rockstar vs Avalanche - GTAx vs Just Cause 2 and 3. While the modding future of JC3 is precarious due to Denuvo DRM (blame Square Enix), JC2 is pretty much the gold standard.

http://www.eurogamer.net/articles/2013-12-06-avalanche-gives...

vs

http://www.eurogamer.net/articles/2015-08-10-rockstar-bans-g...


Yeah. This happens a lot. It's infuriating.

Studios get too large for their own good and become completely disconnected from their userbase. They no longer see the players, the fun or even the game - they just see the numbers. Business guys thinking a video game company can be run like a bank.

The devs get blamed for it, too; as if they had anything to do with it. These decisions are made high above them.


You might want to talk him into putting that info on the fileformats wiki.


I already contribute to these types of project whenever I can, and I keep a close eye on the archive team.


FileFormat.info probably deserves a mention, given that it has pretty much the same objective: http://www.fileformat.info/format/a.htm


Wow, that site actually has a ton of stuff, and it's well structured.

just go to: http://www.fileformat.info/format/${file_extention}/internal...


Doesn't FileFormat.info already 'solve the problem'? What does the new wiki offer that FileFormat.info lacks, and which couldn't be added to FileFormat.info?


You may not publish, copy, display, distribute, transmit, perform, modify, create derivative works from, or sell any Materials, information, products, or services obtained from this Site, except as otherwise expressly permitted under applicable law and as described in these Terms of Use. FileFormat.Info retains all right, title, and interest to the Materials.


You can also help the Archive Team by running the warrior which archives websites that are closing: http://www.archiveteam.org/index.php?title=ArchiveTeam_Warri...


A bit off topic, but I would like to know why they chose MediaWiki as a platform.

In my limited experience working with Wikipedia and Wikidata, I've seen that it's not the best option for a) storing structured data and b) editing pages (Markdown is, imho, much better for that).


Mediawiki is honestly the only option when it comes to user-friendly wiki platforms.

[Last time I said that, someone recommended Moin. I hope this won't happen this time - Moin is a UX joke]


DokuWiki's UI is fine. Also, what about TWiki/Foswiki?


I think it's hosted somewhere that has automated tools for installing common software, and Mediawiki is what you get.


This reminds me of the Xentax wiki: http://wiki.xentax.com/


Good share, but why is this Show HN? Did you create it? It's not new...... etc


Does anyone have any idea what happened to wotsit.org?


To be honest, I expected a bit more from the site, considering its name. It is good, don't get me wrong, covering a lot of file formats, but it is limited to things on your computer. For me, if it is meant to be an all-inclusive wiki, it should be expanded to include things like financial file formats. Banks tend to use formats like ACH and EDI (for example). Those formats are extensive and can get quite complicated, but their file specs are published and available. The site is a great start, but I would love to see it expanded to include anything file format related. Even having the most obscure formats will make the site more appealing to people as a reference. Sort of 'the place to go' for file formats. Just my .02.


Then get editing.


Like many ambitious wiki-based projects, this site seems to suffer from lack of focus. Look at these pages:

http://fileformats.archiveteam.org/wiki/Dendrochronology

http://fileformats.archiveteam.org/wiki/Quantum_computer

http://fileformats.archiveteam.org/wiki/TLD_.mobi

http://fileformats.archiveteam.org/wiki/Endianness

How are any of these things file formats? Because the definition of a file format in the FAQ (<http://fileformats.archiveteam.org/wiki/FAQ:File_Format>) is so broad that it can be made to encompass basically everything. The manifesto that started this site is a rambling sermon that doesn't clarify anything in this respect: it just repeats "let's solve the problem" without even properly defining what the problem is.

There are also other issues. The classification scheme is often Procrustean and confused. Error messages were until recently mixed up with error detection codes, which conflates two different meanings of "error". Similarly, FUSE shares a category with HFS+, even though the former is an API and the latter a disk format: distinct things which just happen to share the name "file system". The pages are rather short and consist mostly of lists of links. Given the above-mentioned lack of clearly defined scope, I suspect many pages are created about topics just because they're in the news and/or just to have a place to put a link to a "neat" blog post: see for example <http://fileformats.archiveteam.org/wiki/Facebook#Links>.

Last but not least, the whole site is rather ugly, and the logo is awfully non-descriptive of what it's supposed to contain; what am I supposed to do with this thumb, stick it up my arse?

It's a shame, really, because documenting file formats is a hard and valuable endeavour. But I don't think these people are going to do a good job of it.


Endianness is extremely relevant to file formats. The others certainly seem like irrelevant nonsense; "organic file formats"?
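A quick illustration of why, as a sketch in Python: the same four bytes on disk decode to two different integers depending on the byte order the reader assumes. The byte values are arbitrary and `read_u32` is a helper name made up for this example.

```python
import struct

# Four arbitrary bytes as they might appear in a file.
raw = bytes([0x00, 0x00, 0x01, 0x00])

little = struct.unpack("<I", raw)[0]  # least significant byte first: 0x00010000
big = struct.unpack(">I", raw)[0]     # most significant byte first:  0x00000100


def read_u32(data: bytes, offset: int, byteorder: str) -> int:
    """Read an unsigned 32-bit integer at `offset`; byteorder is 'little' or 'big'."""
    return int.from_bytes(data[offset:offset + 4], byteorder)
```

Here `little` is 65536 and `big` is 256, so a format description that omits the byte order leaves every multi-byte field ambiguous.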


Archive Team is backing up DNA, just in case.


Looking at the "what links here" page, the "quantum computer" is probably there as a stop-gap definition because the term is used at "Quantum compressed archive".[1]

Any successful wiki will have a problem of demarcation; the initial signs being that people add whatever they want, including ramblings like Two cows. [2]

If the project keeps going then, taking Wikipedia as precedent, "inclusionist" and "deletionist" cultures will develop and fight over the definition of what the project should cover, but both would get rid of these extreme cases.

[1] http://fileformats.archiveteam.org/wiki/Quantum_compressed_a...

[2] http://fileformats.archiveteam.org/wiki/Two_cows


Except from the very first day, the declared final authority falls at my desk, and I am a hardcore inclusionist. So no, it won't be making that mistake.


Thanks for this. I write file parsers for language translation at my day job, so this could be a nice resource to have on hand.


Out of curiosity, are you familiar with Translate Toolkit? https://github.com/translate/

If you're not, and are interested in it or in contributing to it, you should email me - I'll put you in touch.


I have not worked with the Translate Toolkit yet, but I'm a little familiar with a few of the tools. I'm always interested in learning or contributing where I can.

I currently work on a proprietary l10n/i18n system, so a lot of what I do is under an NDA. However, I'm slowly gaining traction in moving us, at least in part, to a more open model.

If you let me know your email, I will reach out.


My email is on https://leclan.ch - Translate was my previous job.


I know the probability tends to be zero, but is it Telelingua?


Not Telelingua here. I currently work on a proprietary system. Why do you ask?


Oh, I just had the joy of connecting an application from the company I work at with Telelingua. Everything went really well.


What is the name of your wiki?



