Hacker News new | past | comments | ask | show | jobs | submit login

Garage splis the data into chunks for deduplication, so it basically already does content addressed storage under the hood..

They probably don't expose it publicly though.




Yeah, and as far as I understood they use the key hash to address the overall object descriptor. So in theory using the hash of the file instead of the hash of the key should be a simple-ish change.

Tbh I'm not sure if content aware chunking isn't a sirens call:

  - It sounds great on paper, but once you start storing encrypted (which you have to do if you want e2e encryption) or compressed blobs (e.g. images) it won't work anymore.

  - Ideally you would store things with enough fine grained blobs that blob-level deduplication would suffice.

  - Storing a blob across your cluster has additional compute, lookup, bookkeeping, and communication overhead, resulting in worse latency. Storing an object as a contiguous unit makes the cache/storage hierarchies happy and allows for optimisations like using `sendfile`.

  - Storing the blobs as a unit makes computational storage easier to implement, where instead of reading the blob and processing it, you would send a small WASM program to the storage server (or drive? https://semiconductor.samsung.com/us/ssd/smart-ssd/) and only receive the computation result back.




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: