Not only that, BitTorrent also uses a Merkle tree of the torrent data split into...

sp332 · on Feb 23, 2017

If you find a collision for the smallest piece, you don't need to find collisions for any higher nodes on the Merkle tree because you'll just be hashing the same values as the legit version.
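To make that concrete, here is a rough sketch. Nobody can cheaply produce a real SHA-1 collision, so `toy_hash` (SHA-1 truncated to 16 bits) is a stand-in I made up purely so a collision can be brute-forced in a moment. `find_collision` and `merkle_root` are likewise hypothetical helpers, and the tree layout is a generic one, not necessarily the one BitTorrent's Merkle extension uses.

```python
import hashlib


def toy_hash(data: bytes) -> bytes:
    # ASSUMPTION: SHA-1 truncated to 2 bytes so collisions are cheap to find.
    # This is a stand-in for a real (expensive) SHA-1 collision.
    return hashlib.sha1(data).digest()[:2]


def find_collision():
    # Brute-force two different inputs with the same toy hash.
    # With only 65536 possible outputs this must terminate quickly.
    seen = {}
    i = 0
    while True:
        msg = b"piece-%d" % i
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
        i += 1


def merkle_root(leaves, h=toy_hash):
    # Hash every leaf, then hash pairs of child hashes until one node is left.
    # Odd levels duplicate their last node (one common convention).
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


legit_piece, evil_piece = find_collision()
legit = [b"piece A", legit_piece, b"piece C", b"piece D"]
evil = [b"piece A", evil_piece, b"piece C", b"piece D"]

# The colliding leaves hash to the same value, so every parent node is
# computed from identical inputs and the roots match.
print(legit_piece != evil_piece, merkle_root(legit) == merkle_root(evil))
```

The point is that only the leaf hashes feed into the upper levels, so a collision at one leaf carries all the way up to the root without any extra work.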

MayeulC · on Feb 23, 2017

I don't know enough specifics about SHA-1 or BitTorrent, but it seems to me that it depends on the data chosen for the base ("leaves") of the tree: if it just consists of taking the hashes, concatenating them, and moving higher until you only have one hash, then yes.

However, if the data itself is concatenated for the first step, that might not be the case (depending on the SHA-1 algorithm). I seem to recall that MD5 operates on 512-bit chunks (padded if necessary), and takes the result to seed the next computation. So, with MD5 (if I recall correctly, still), the attack could work if each part of the torrent was a multiple of 512 bits, Merkle or not, and regardless of the base. Of course, you cannot change the whole torrent in that case, as it would be prohibitively expensive. And you still have to keep the same file size.
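For what it's worth, MD5 and SHA-1 are both Merkle-Damgård constructions with 512-bit blocks, so the chaining idea can be sketched with a toy version. Everything here is an assumption for illustration: `compress` is a made-up 16-bit compression function (truncated SHA-1) so that a block collision can be brute-forced, and `toy_md` only mimics the general shape of the real padding and chaining.

```python
import hashlib

BLOCK = 64  # bytes, i.e. 512 bits, the block size of MD5 and SHA-1


def compress(state: bytes, block: bytes) -> bytes:
    # ASSUMPTION: toy compression function with a 16-bit state.
    return hashlib.sha1(state + block).digest()[:2]


def toy_md(msg: bytes) -> bytes:
    # Merkle-Damgård style padding: 0x80, zero bytes, then the 8-byte bit length.
    padded = msg + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % BLOCK)
    padded += (8 * len(msg)).to_bytes(8, "big")
    state = b"\x00\x00"  # toy initial value
    for i in range(0, len(padded), BLOCK):
        # Each block's output seeds the computation of the next block.
        state = compress(state, padded[i:i + BLOCK])
    return state


def find_block_collision():
    # Find two different full blocks that lead to the same internal state.
    seen = {}
    i = 0
    while True:
        block = i.to_bytes(BLOCK, "big")
        s = compress(b"\x00\x00", block)
        if s in seen:
            return seen[s], block
        seen[s] = block
        i += 1


a, b = find_block_collision()
suffix = b"the rest of the data, identical in both versions"

# Same length, same internal state after the first block, same suffix:
# the final hashes stay equal no matter what is appended.
print(a != b, toy_md(a + suffix) == toy_md(b + suffix))
```

So once two equal-length, block-aligned prefixes collide, any identical data appended to both keeps the collision, which is why the file size has to stay the same.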

I would love to hear a bit more on the topic from someone more knowledgeable about hashes and torrents.

the8472 · on Feb 23, 2017

The process is fairly simple. Files are concatenated and that blob is then chunked into multiple-of-16KiB-sized pieces. Each piece is hashed. The hashes are concatenated and embedded as a string in the torrent file. Then a specific subdictionary of the torrent file, including the piece hashes, filenames and the length of the data, is hashed to yield the infohash.

Magnets only convey the infohash. .torrent files convey the whole metadata, including the pieces hash string.
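A minimal sketch of those steps, in case it helps. The helper names (`bencode`, `piece_hashes`, `infohash`) and the sample file names are mine, not from any client, and the `info` dictionary only carries the basic multi-file keys, so treat it as an illustration rather than a complete torrent builder.

```python
import hashlib


def bencode(obj) -> bytes:
    # Tiny bencoder covering ints, byte strings, text, lists and dicts.
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:" % len(obj) + obj
    if isinstance(obj, str):
        return bencode(obj.encode("utf-8"))
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in obj.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(obj))


def piece_hashes(blob: bytes, piece_length: int) -> bytes:
    # Chunk the concatenated data and join the 20-byte SHA-1 digests.
    return b"".join(
        hashlib.sha1(blob[i:i + piece_length]).digest()
        for i in range(0, len(blob), piece_length)
    )


def infohash(name: str, files, piece_length: int) -> str:
    # files: list of (filename, content) pairs, concatenated in order.
    blob = b"".join(content for _, content in files)
    info = {
        "name": name,
        "piece length": piece_length,
        "pieces": piece_hashes(blob, piece_length),
        "files": [{"length": len(c), "path": [fn]} for fn, c in files],
    }
    # The infohash is the SHA-1 of the bencoded info dictionary.
    return hashlib.sha1(bencode(info)).hexdigest()


# Hypothetical sample data: two small files, 16 KiB pieces.
sample = [("a.txt", b"x" * 20000), ("b.txt", b"y" * 20000)]
print(infohash("demo", sample, 16 * 1024))
```

Note that the piece hashes are part of what gets hashed, so changing any piece hash changes the infohash, while a colliding piece leaves both untouched.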

Aissen · on Feb 23, 2017

Then this smallest piece must also change the file type to an executable, and contain a meaningful payload.

sp332 · on Feb 23, 2017

Well, you could put most of the payload in other parts of the file. People probably wouldn't notice if they didn't go looking. And the small part could exploit a popular codec instead of being an executable itself.

Natanael_L · on Feb 23, 2017

Polyglot files are trivial to create, that's the easiest part here.

Aissen · on Feb 23, 2017

I didn't say it wasn't, it was for context. Also you need to be polyglot + have enough space for the collision computation.

I just checked, it should be really easy. A piece/chunk size in torrents is 64kB. The total size of each PDF in the paper is 2.7kB.

loeg · on Feb 23, 2017

A more common chunk size would be 256 KiB to 2 MiB. Yeah, very doable.

runeks · on Feb 23, 2017

Good point. It should be possible to create a video file that fills up exactly the first n pieces, such that the last piece contains only the ambiguous data.

the8472 · on Feb 23, 2017

Most torrents use a flat list of hashes. The merkle tree is an extension. With a collision you could replace the metadata wholesale though, pointing at completely different content.

But then the torrent client would also present that different content from the start when asking you what to save.

JeremyBanks · on Feb 23, 2017

Do you know of anybody that's actually using merkle torrents? I expect they're probably being used in some limited scenarios, but they're probably less than 1% of BitTorrent activity, so it's a bit misleading to refer to it when defending BitTorrent's security model.