This is because git adds a header and zlib compresses the PDFs such that they no longer collide when stored in git. But of course, they still collide when extracted from git:
$ ls -l; for i in 1 2; do sha1sum < shattered-$i.pdf; \
git cat-file -p $(git hash-object -w shattered-$i.pdf) |
sha1sum; done; find .git/objects -type f -print0 | xargs -0 ls -l
total 1664
-rw-r--r--@ 1 jay staff 422435 Feb 23 10:32 shattered-1.pdf
-rw-r--r--@ 1 jay staff 422435 Feb 23 10:32 shattered-2.pdf
38762cf7f55934b34d179ae6a4c80cadccbb7f0a
38762cf7f55934b34d179ae6a4c80cadccbb7f0a
38762cf7f55934b34d179ae6a4c80cadccbb7f0a
38762cf7f55934b34d179ae6a4c80cadccbb7f0a
-r--r--r-- 1 jay staff 381104 Feb 23 10:41 .git/objects/b6/21eeccd5c7edac9b7dcba35a8d5afd075e24f2
-r--r--r-- 1 jay staff 381102 Feb 23 10:41 .git/objects/ba/9aaa145ccd24ef760cf31c74d8f7ca1a2e47b0
It's worth noting that either of the changes, adding a header or deflating the content, would remove the collision. The former because this is a chosen-prefix collision attack, the latter because the compression alters the content entirely.
I'm not a cryptographer, so I wonder: do the git header and zlib compression add significant complexity to manufacturing two files that collide inside git?
Probably just as easy. They used a fixed header for PDF and a tweakable middle. So just add the git blob header, as long as the tweakable middle is fixed size.
It's worth noting that either of the changes, adding a header or deflating the content, would remove the collision. The former because this is a chosen-prefix collision attack, the latter because the compression alters the content entirely.
I'm not a cryptographer, so I wonder: do the git header and zlib compression add significant complexity to manufacturing two files that collide inside git?