Steganography and Piracy in the Age of Digital Distribution (zacharyburt.com)
17 points by zackattack on July 28, 2010 | 10 comments



My vote for encoding a key in a document goes as follows:

Make 3 spelling errors. Anywhere.

Copying all the text from my AgileW/Rails book comes out to 1392025 characters (wc -c). That's enough to make 2,697,367,802,211,997,800 unique copies by location alone[1]. Significantly less data than a SHA key, but far more than necessary. Heck, making two unique errors gives you almost 2 trillion unique copies. Most books have a fair number of errors already; what's a couple of intentional ones? And note that it can be hidden in metadata too, so there's even more room.

[1]: (1392025 * 1392024 * 1392023)
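
For anyone checking the footnote arithmetic, here is a rough sketch in Python (3.8+ for `math.perm` and `math.comb`). It assumes one error per character position and uses the character count quoted above; the variable names are only illustrative. The product in [1] counts ordered triples of positions. If two copies with the same three error locations count as the same copy regardless of order, the figure is six times smaller, which is still far more than any print run needs.

```python
from math import comb, log2, perm

n = 1392025  # character count reported by `wc -c` in the comment above

# Ordered choices of positions: n * (n - 1) * (n - 2), the product in footnote [1].
ordered_3 = perm(n, 3)
# Unordered sets of three positions (assumption: order of the errors carries no data).
unordered_3 = comb(n, 3)

# The same two counts for the "two errors" case.
ordered_2 = perm(n, 2)
unordered_2 = comb(n, 2)

# Rough payload size in bits if only the set of positions is used.
payload_bits = log2(unordered_3)

print(f"3 errors, ordered:   {ordered_3:,}")
print(f"3 errors, unordered: {unordered_3:,}")
print(f"2 errors, ordered:   {ordered_2:,}")
print(f"2 errors, unordered: {unordered_2:,}")
print(f"payload (3 errors, unordered): about {payload_bits:.1f} bits")
```

Either way the payload is somewhere under 64 bits, which fits the "significantly less data than a SHA key" remark.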


Spelling mistakes are better than nothing, but if I can obtain two copies of the document, I can diff them and fix the mistakes.


I could diff a dump of all the kerning values too.


I picked spelling specifically because copying / OCRing will destroy kerning values, but not (likely) spelling. And if you change words into other words, running a spell check might not clean them out either :)

But yes. Diffing is always an option, and it's an ideal way to reveal steganographic data if you encode more than one thing in the same document.
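
The diffing attack described in this sub-thread can be sketched with Python's standard `difflib`. This is only an illustration under simple assumptions: both copies are plain text, the marks are word-level misspellings, and the sample sentences and the helper name `differing_words` are made up for the example.

```python
from difflib import SequenceMatcher


def differing_words(copy_a: str, copy_b: str):
    """Return (words_in_a, words_in_b) for every span where the two copies differ."""
    words_a = copy_a.split()
    words_b = copy_b.split()
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    return [
        (" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical copies sold to two different purchasers,
# each carrying its own planted misspelling.
copy_a = "The quick brown fox jumps ovr the lazy dog."
copy_b = "The quick brwn fox jumps over the lazy dog."

for in_a, in_b in differing_words(copy_a, copy_b):
    print(f"copy A: {in_a!r}   copy B: {in_b!r}")
```

Every location where the copies disagree is a candidate mark. With only two copies you still need a spell check or a third copy to decide which side is the planted error, and swapping in real words instead of misspellings makes that step harder, as noted above.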


While steganography may be part of a new kinder, gentler copy-control policy for many content producers, I have my doubts as to whether it will be very effective, for a few reasons.

Even if you assume that the techniques are not discovered before distribution, this doesn't actually impede copying; it just helps you determine the original source. At that point, you are again faced with an enforcement problem. It's already quite possible to determine the source of pirated files using bittorrent, but even with thousands of lawsuits filed, use is not noticeably diminished.

But steganographic source determination is significantly weaker than bittorrent as far as enforcement and chilling effects go. With bittorrent, if you can make the expected cost of a lawsuit high enough to the users, you might make a dent in the wholesale copying of content. With steganographic techniques, only the original user is at risk. As soon as a single copy is out there, the rest of the sharers take no additional risk in spreading it far and wide.


To be fair it's an additional layer. Sharing a stolen PDF is just as illegal whether it's tagged with its purchaser or not.


Steganography is a misnomer for this kind of message. Steganography is about maximizing data payload while remaining undetectable. E.g., you want to send messages covertly without Big Brother knowing.

Watermarking is about maintaining message integrity in ways that are robust to distortion. E.g., you don't care who knows about the message, but it's impossible to remove the watermark without unacceptable distortion of the media.


Under this scheme, if my laptop with hundreds of paid-for documents is stolen, am I then liable for all the copies of these documents that are illicitly distributed?


I'm not sure I get the point. OK, so you figured out who bought the copy that was leaked. Then what? Sue them?


And who's to say they were the real leaker? It's pretty trivial to copy files; it could have easily been taken from their computer without their knowledge or consent.



