That is something I wanted to know, does IPFS guarantee that same two files have...

hot_gril · 2024-01-31T22:54:37 1706741677

Yes, a file's hash is only based on its contents. The way I understand it, a file doesn't really live in a directory, it's more like a directory (which is a kind of file itself) references files. So the same file can be in two directories, yet it'll have the same URL/hash. And if you "add" files to a directory, you're really uploading a separate copy of the dir that'll have a different hash.
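A minimal sketch of that idea, assuming a toy model rather than real IPFS: `content_id` and `directory_id` are hypothetical helpers, and a plain SHA-256 hex digest stands in for a real CID. It shows that the same bytes get the same identifier wherever they are referenced, and that "adding" a file produces a new directory object with a new hash.

```python
import hashlib
import json


def content_id(data: bytes) -> str:
    """Toy content identifier: SHA-256 hex digest of the bytes (not a real CID)."""
    return hashlib.sha256(data).hexdigest()


def directory_id(entries: dict[str, str]) -> str:
    """Toy directory object: a name -> content_id mapping, hashed like any other blob."""
    encoded = json.dumps(entries, sort_keys=True).encode("utf-8")
    return content_id(encoded)


readme = b"hello ipfs\n"
file_id = content_id(readme)

# The same file referenced from two different directories under different names.
dir_a = {"README.txt": file_id}
dir_b = {"docs.txt": file_id}

# "Adding" a file really builds a new directory object with its own hash.
dir_a_plus = {**dir_a, "notes.txt": content_id(b"more\n")}

print(file_id)
print(directory_id(dir_a))
print(directory_id(dir_a_plus))
```

In this model the file's identifier never mentions a directory, so `dir_a` and `dir_b` both point at the same `file_id`, while `dir_a` and `dir_a_plus` hash differently.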

I checked myself on this, but someone else might want to check me cause I'm not an expert.

wuiheerfoj · 2024-01-31T23:13:46 1706742826

This is generally true, though it’s possible to encode the same data into a slightly different shaped DAG to optimise for eg video streaming performance afaiu (balanced vs imbalanced). UnixFS vs raw bytes may also be different but I’m not 100%

hot_gril · 2024-01-31T23:26:30 1706743590

From the fs's point of view, these are different file contents. But yeah, there's nothing stopping you from pinning something different that looks the same to a person.

wuiheerfoj · 2024-02-03T08:31:48 1706949108

Once decoded they would be the same file contents - imagine one DAG where the depth is log(n) and it’s a perfectly balanced tree, and another where the depth of the left-hand branch is 1, the right-hand branch contains another subtree with left-hand size 1, etc etc etc.

The leaves are the same in both cases, so the file contents are the same, though the latter is quicker to stream (though not to verify), and the CIDs will be different.
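To make the two shapes concrete, here is a rough sketch using a toy Merkle tree, not the actual UnixFS encoding. The helpers `balanced_root` and `right_leaning_root` and the `leaf:`/`node:` prefixes are assumptions made up for illustration. Both trees are built over the same leaves, yet the root hashes differ.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def node(left: bytes, right: bytes) -> bytes:
    """Toy interior node: hash of the two child hashes."""
    return h(b"node:" + left + right)


def balanced_root(leaves: list[bytes]) -> bytes:
    """Pair hashes level by level, giving depth of about log2(n). Needs at least one leaf."""
    level = [h(b"leaf:" + leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = [node(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd node is carried up unchanged
        level = nxt
    return level[0]


def right_leaning_root(leaves: list[bytes]) -> bytes:
    """Every left child is a single leaf; the rest hangs off the right branch."""
    hashes = [h(b"leaf:" + leaf) for leaf in leaves]
    root = hashes[-1]
    for leaf_hash in reversed(hashes[:-1]):
        root = node(leaf_hash, root)
    return root


chunks = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
print(balanced_root(chunks).hex())
print(right_leaning_root(chunks).hex())
```

Concatenating `chunks` gives the same bytes either way; only the interior structure, and so the root hash, changes.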

fwip · 2024-01-31T23:03:45 1706742225

Yes, IPFS hashes individual "blocks" (pieces) of files. If two files have the same content, they will share block hashes.
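A small illustration of block-level hashing, assuming fixed-size blocks and plain SHA-256 (real IPFS blocks are far larger, and `block_hashes` is a hypothetical helper, not an IPFS API). Identical files share every block hash, and a file that differs only near the end still shares the earlier ones.

```python
import hashlib


def block_hashes(data: bytes, block_size: int = 4) -> list[str]:
    """Split data into fixed-size blocks and hash each block on its own."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


original = b"AAAABBBBCCCC"
copy = bytes(original)     # same content, separate object
edited = b"AAAABBBBDDDD"   # only the last block differs

shared = set(block_hashes(original)) & set(block_hashes(edited))
print(len(shared))
```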

ranger_danger · 2024-02-01T03:47:26 1706759246

Basically, it depends on client settings that control how the individual blocks are encoded, and therefore what the resulting hash ends up being. So no, there's no inherent guarantee, but you may get lucky with some copies of the same file.
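As a sketch of why a setting can matter, assume a toy scheme where the root hash is the hash of the concatenated chunk hashes (`root_hash` and its `chunk_size` parameter are hypothetical and not how any real client encodes blocks). The same bytes split at a different chunk size end up under a different root.

```python
import hashlib


def root_hash(data: bytes, chunk_size: int) -> str:
    """Hash each chunk, then hash the concatenation of the chunk digests."""
    chunk_digests = [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]
    return hashlib.sha256(b"".join(chunk_digests)).hexdigest()


payload = bytes(range(256)) * 16    # 4096 bytes of sample data

small = root_hash(payload, 256)     # 16 chunks
large = root_hash(payload, 1024)    # 4 chunks
print(small)
print(large)
```

Two people who import the same file with the same settings get the same root; change the chunking and the roots no longer match, even though the bytes are identical.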

mikegreenberg · 2024-02-01T01:08:41 1706749721

Caveat: The other comments say the hash depends only on the file's contents, but the algo used to hash would also need to be the same. If the hash algo differs between two cases, the same content will have a different hash in each.
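A quick way to see this with Python's standard `hashlib`, using raw digests rather than real CIDs (the variable names are just for illustration): the same bytes under three different hash functions give three unrelated identifiers.

```python
import hashlib

data = b"same bytes, different hash function\n"

sha256_id = hashlib.sha256(data).hexdigest()
sha512_id = hashlib.sha512(data).hexdigest()
blake2b_id = hashlib.blake2b(data).hexdigest()

for name, digest in [("sha256", sha256_id), ("sha512", sha512_id), ("blake2b", blake2b_id)]:
    print(name, digest)
```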

hot_gril · 2024-02-01T01:46:27 1706751987

In this case, would pinning the file make it accessible from either hash? I'd expect it to, but idk, I've only ever seen sha256 hashes on IPFS.

jaccarmac · 2024-02-01T01:59:26 1706752766

Kinda. Shooting from the hip based on fuzzy gatherings from IPFS usage here, but as I understand it: The leaf-level data blocks will be shared between the Merkle trees, but at least the tip (the object a given hash actually refers to) and maybe some of the other structural information will be different.