Steganography and Piracy in the Age of Digital Distribution

Groxx · on July 29, 2010

My vote for encoding a key in a document goes as follows:

Make 3 spelling errors. Anywhere.

Copying all the text from my AgileW/Rails book gives 1392025 characters (wc -c). That's enough to make 2,697,367,802,211,997,800 unique copies by location alone[1]. Significantly less data than a SHA hash, but far more than necessary. Heck, making two unique errors gives you almost 2 trillion unique copies. Most books already have a fair number of errors; what are a couple of intentional ones? And note that the key could be hidden in metadata too, so there's even more room.

[1]: (1392025 * 1392024 * 1392023)
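To make the "3 spelling errors" idea concrete, here is a minimal Python sketch. It assumes each purchaser is identified by an integer ID and that an error is placed at a 0-based character offset. The footnote's product counts ordered placements; if only the set of positions matters, the count drops by 3! = 6 (roughly 58.6 bits, versus 160 bits for a SHA-1 digest). The helper names `id_to_positions` and `positions_to_id` are hypothetical, and the mapping shown is the standard combinatorial number system, not anything described in the thread. It only picks *where* the errors go, not which misspelling to make.

```python
import math

# Character count reported in the post (output of wc -c on the book's text).
N_CHARS = 1_392_025
K_ERRORS = 3

# The footnote's figure: ordered placements of three errors (no repeats).
ordered = N_CHARS * (N_CHARS - 1) * (N_CHARS - 2)
# If only the *set* of positions matters, divide by 3! = 6.
unordered = math.comb(N_CHARS, K_ERRORS)
# Capacity in bits if every position set is assigned to a different buyer.
bits = math.log2(unordered)


def id_to_positions(buyer_id: int, n: int = N_CHARS, k: int = K_ERRORS) -> list[int]:
    """Map an integer in [0, C(n, k)) to a unique sorted list of k offsets.

    Combinatorial number system: greedily pick the largest c with
    C(c, i) <= remaining for i = k, k-1, ..., 1.
    """
    if not 0 <= buyer_id < math.comb(n, k):
        raise ValueError("buyer_id out of range for this text length")
    positions = []
    remaining = buyer_id
    upper = n  # each chosen offset must be strictly below the previous one
    for i in range(k, 0, -1):
        # Binary search for the largest c in [i - 1, upper - 1] with C(c, i) <= remaining.
        lo, hi = i - 1, upper - 1
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if math.comb(mid, i) <= remaining:
                lo = mid
            else:
                hi = mid - 1
        positions.append(lo)
        remaining -= math.comb(lo, i)
        upper = lo
    return sorted(positions)


def positions_to_id(positions: list[int]) -> int:
    """Inverse of id_to_positions: sum C(c_i, i) over the ascending offsets."""
    return sum(math.comb(c, i) for i, c in enumerate(sorted(positions), start=1))


if __name__ == "__main__":
    print(f"ordered placements:   {ordered:,}")
    print(f"unordered placements: {unordered:,} (~{bits:.1f} bits)")
    buyer_id = 123_456_789  # hypothetical purchaser ID
    offsets = id_to_positions(buyer_id)
    print("put errors at character offsets", offsets)
    assert positions_to_id(offsets) == buyer_id
```

Decoding only needs the three offsets where a leaked copy differs from the clean text, which is exactly what makes the scheme vulnerable to the diffing attack raised in the next reply.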

donaldc · on July 29, 2010

Spelling mistakes are better than nothing, but if I can obtain two copies of the document, I can diff them and fix the mistakes.
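A minimal sketch of that diff, assuming the marks are one-for-one word substitutions in plain text extracted from each copy. The function name `marked_word_positions` is hypothetical; it uses Python's standard `difflib.SequenceMatcher` to align the two word lists, which also tolerates unrelated insertions or deletions elsewhere in the text.

```python
import difflib


def marked_word_positions(copy_a: str, copy_b: str) -> list[tuple[int, str, str]]:
    """List (word index in A, word in A, word in B) wherever the copies disagree.

    Assumes marks are one-for-one word substitutions; unequal-length
    replacements are paired up only as far as zip() reaches.
    """
    words_a, words_b = copy_a.split(), copy_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    found = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag == "replace":
            for offset, (wa, wb) in enumerate(zip(words_a[a0:a1], words_b[b0:b1])):
                found.append((a0 + offset, wa, wb))
    return found


copy_a = "the quick brown fox jumps over the lazy dog"
copy_b = "the quikc brown fox jumps over teh lazy dog"
print(marked_word_positions(copy_a, copy_b))
# -> [(1, 'quick', 'quikc'), (6, 'the', 'teh')]
```

The diff only finds positions where the two copies disagree: a mark that happens to sit at the same offset with the same misspelling in both copies stays invisible, and the diff alone doesn't say which side is the correct spelling.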

eli · on July 29, 2010

I could diff a dump of all the kerning values too.

Groxx · on July 29, 2010

I picked spelling specifically because copying or OCRing will destroy kerning values, but (likely) not spelling. And if you change words into other real words, running a spell check might not clean them out either :)
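A toy illustration of why real-word substitutions can survive a spell check, assuming a checker that only flags words missing from its word list (many real checkers also use context, which this sketch ignores). `DICTIONARY` and `flag_non_words` are hypothetical stand-ins, not any real spell checker's API.

```python
# A tiny word list standing in for a real spell-check dictionary (hypothetical).
DICTIONARY = {"please", "send", "your", "reply", "replay", "by", "friday"}


def flag_non_words(text: str, dictionary: set[str] = DICTIONARY) -> list[str]:
    """Return words not found in the dictionary, as a naive spell checker would."""
    words = (w.strip(".,;:!?") for w in text.lower().split())
    return [w for w in words if w not in dictionary]


original = "Please send your reply by Friday."
non_word_mark = "Please send yuor reply by Friday."    # 'your' -> 'yuor'
real_word_mark = "Please send your replay by Friday."  # 'reply' -> 'replay'

for label, text in [("original", original), ("non-word", non_word_mark),
                    ("real-word", real_word_mark)]:
    print(label, flag_non_words(text))
```

The non-word mark gets flagged (and would likely be "fixed" by a cleanup pass), while the real-word mark passes a dictionary-only check untouched.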

But yes. Diffing is always an option, and it's an ideal way to reveal steganographic data if you encode more than one thing in the same document.

ianferrel · on July 29, 2010

While steganography may be part of a new kinder, gentler copy-control policy for many content producers, I have my doubts as to whether it will be very effective, for a few reasons.

Even if you assume that the techniques are not discovered before distribution, this doesn't actually impede copying; it just helps you determine the original source. At that point, you are again faced with an enforcement problem. It's already quite possible to determine the source of pirated files using BitTorrent, but even with thousands of lawsuits filed, use is not noticeably diminished.

But steganographic source determination is significantly weaker than BitTorrent tracking as far as enforcement and chilling effects go. With BitTorrent, if you can make the expected cost of a lawsuit high enough for users, you might make a dent in the wholesale copying of content. With steganographic techniques, only the original purchaser is at risk. As soon as a single copy is out there, the rest of the sharers take no additional risk in spreading it far and wide.

eli · on July 29, 2010

To be fair, it's an additional layer. Sharing a stolen PDF is just as illegal whether or not it's tagged with its purchaser.

msg · on July 29, 2010

Steganography is a misnomer for this kind of message. Steganography is about maximizing data payload while remaining undetectable. E.g., you want to send messages covertly without Big Brother knowing.

Watermarking is about maintaining message integrity in ways that are robust to distortion. E.g., you don't care who knows about the message, but it's impossible to remove the watermark without unacceptable distortion of the media.

donaldc · on July 29, 2010

Under this scheme, if my laptop with hundreds of paid-for documents is stolen, am I then liable for all the copies of these documents that are illicitly distributed?

eli · on July 29, 2010

I'm not sure I get the point. OK, so you figured out who bought the copy that was leaked. Then what? Sue them?

maqr · on July 29, 2010

And who's to say they were the real leaker? It's pretty trivial to copy files; it could easily have been taken from their computer without their knowledge or consent.