
> Embeddings are a type of lossy compression

Always felt they're more like hashes/fingerprints for the RAG use cases.

> Typically documents are broken down into chunks

That's what I would have guessed. It's still surprising that the embeddings don't fit into RAM though.

That said (and I only just realized this), even if the embeddings don't all fit into RAM at once, you don't actually need to load them all if you're just doing a linear scan and computing cosine similarity against each one. Sure, it may be slow to stream tens of GB of embeddings from disk, but at that point I'd wonder what kind of textual data one could feasibly have that reaches the terabyte range. (Also, generating that many embeddings requires a lot of compute!)
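
A minimal sketch of that linear-scan idea in Python: memory-map the embedding matrix and score it against the query in batches, so the full matrix never has to sit in RAM. The file name, dimensions, and batch size here are made up for illustration.

    import numpy as np

    DIM = 768               # embedding dimensionality (assumed)
    N_CHUNKS = 10_000_000   # number of stored chunk embeddings (assumed)

    # Memory-map the float32 matrix; pages are read lazily as we iterate.
    embeddings = np.memmap("embeddings.f32", dtype=np.float32,
                           mode="r", shape=(N_CHUNKS, DIM))

    def top_k(query: np.ndarray, k: int = 5, batch: int = 100_000):
        q = query / np.linalg.norm(query)
        best_scores = np.full(k, -np.inf)
        best_ids = np.full(k, -1)
        for start in range(0, N_CHUNKS, batch):
            block = np.asarray(embeddings[start:start + batch])
            # Cosine similarity = dot product of L2-normalised vectors.
            norms = np.linalg.norm(block, axis=1, keepdims=True) + 1e-12
            scores = (block / norms) @ q
            # Merge this batch's scores into the running top-k.
            ids = np.arange(start, start + len(block))
            all_scores = np.concatenate([best_scores, scores])
            all_ids = np.concatenate([best_ids, ids])
            keep = np.argsort(all_scores)[-k:]
            best_scores, best_ids = all_scores[keep], all_ids[keep]
        return best_ids[::-1], best_scores[::-1]

It won't be fast, but it's entirely disk-bound and needs only one batch of embeddings in memory at a time.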




> Always felt they're more like hashes/fingerprints for the RAG use cases.

Yes, I see where you’re coming from. Perceptual hashes[0] are pretty similar: the key property is that similar documents should have similar embeddings (unlike cryptographic hashes, where a single bit flip should produce a completely different hash).
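
A quick illustration of the contrast: a cryptographic hash changes completely on a tiny edit, while embeddings of near-identical texts stay close. The two "embedding" vectors below are made-up stand-ins for real model output.

    import hashlib
    import numpy as np

    a = "the quick brown fox jumps over the lazy dog"
    b = "the quick brown fox jumps over the lazy dog."   # one character added

    print(hashlib.sha256(a.encode()).hexdigest())  # two completely
    print(hashlib.sha256(b.encode()).hexdigest())  # different digests

    # Hypothetical embeddings of the two strings (in practice, model output).
    emb_a = np.array([0.12, -0.40, 0.88, 0.05])
    emb_b = np.array([0.11, -0.39, 0.90, 0.06])

    cos = emb_a @ emb_b / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
    print(round(cos, 3))   # close to 1.0: similar input, similar "fingerprint"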

Nice embeddings encode information spatially; a classic example of embedding arithmetic is king - man + woman = queen[1]. “Concept Sliders” is a cool application of this to image generation[2].
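
Here's a toy version of that arithmetic. The vectors are hand-made stand-ins (crude "gender, royalty" coordinates), not real word2vec output, but with a real model the same nearest-neighbour lookup applies.

    import numpy as np

    vocab = {
        "man":    np.array([ 1.0,  0.0]),
        "woman":  np.array([-1.0,  0.0]),
        "king":   np.array([ 1.0,  1.0]),
        "queen":  np.array([-1.0,  1.0]),
        "prince": np.array([ 1.0,  0.8]),
        "apple":  np.array([ 0.0, -1.0]),
    }

    target = vocab["king"] - vocab["man"] + vocab["woman"]

    def nearest(v, exclude=("king", "man", "woman")):
        # Cosine similarity against every remaining vocabulary entry.
        def cos(w):
            return v @ vocab[w] / (np.linalg.norm(v) * np.linalg.norm(vocab[w]) + 1e-12)
        return max((w for w in vocab if w not in exclude), key=cos)

    print(nearest(target))   # -> "queen"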

Personally I’ve not had _too_ much trouble with running out of RAM due to embeddings themselves, but I did spend a fair amount of time last week profiling memory usage to make sure I didn’t run out in prod, so it is on my mind!

[0] https://en.m.wikipedia.org/wiki/Perceptual_hashing

[1] https://www.technologyreview.com/2015/09/17/166211/king-man-...

[2] https://github.com/rohitgandikota/sliders



