
> Embeddings are a type of lossy compression

Always felt they're more like hashes/fingerprints for the RAG use cases.

> Typically documents are broken down into chunks

That's what I would have guessed. It's still surprising that the embeddings don't fit into RAM though.

That said (and I only just realized this), even if the embeddings don't all fit into RAM at once, you don't actually need to load them all if you're just doing a linear scan and computing cosine similarity against each one. Sure, it may be slow to stream tens of GB of embeddings from disk, but at that point I'd wonder what kind of textual data one could feasibly have that reaches the terabyte range. (Also, generating that many embeddings requires a lot of compute!)
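
A minimal sketch of that linear-scan idea in Python: memory-map the embedding matrix and score it against the query in batches, so the full matrix never has to sit in RAM. The file name, dimensions, and batch size here are made up for illustration.

    import numpy as np

    DIM = 768               # embedding dimensionality (assumed)
    N_CHUNKS = 10_000_000   # number of stored chunk embeddings (assumed)

    # Memory-map the float32 matrix; pages are read lazily as we iterate.
    embeddings = np.memmap("embeddings.f32", dtype=np.float32,
                           mode="r", shape=(N_CHUNKS, DIM))

    def top_k(query: np.ndarray, k: int = 5, batch: int = 100_000):
        q = query / np.linalg.norm(query)
        best_scores = np.full(k, -np.inf)
        best_ids = np.full(k, -1)
        for start in range(0, N_CHUNKS, batch):
            block = np.asarray(embeddings[start:start + batch])
            # Cosine similarity = dot product of L2-normalised vectors.
            norms = np.linalg.norm(block, axis=1, keepdims=True) + 1e-12
            scores = (block / norms) @ q
            # Merge this batch's scores into the running top-k.
            ids = np.arange(start, start + len(block))
            all_scores = np.concatenate([best_scores, scores])
            all_ids = np.concatenate([best_ids, ids])
            keep = np.argsort(all_scores)[-k:]
            best_scores, best_ids = all_scores[keep], all_ids[keep]
        return best_ids[::-1], best_scores[::-1]

It won't be fast, but it's entirely disk-bound and needs only one batch of embeddings in memory at a time.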




> Always felt they're more like hashes/fingerprints for the RAG use cases.

Yes, I see where you’re coming from. Perceptual hashes[0] are pretty similar: the key property is that similar documents should have similar embeddings (unlike cryptographic hashes, where a single bit flip should produce a completely different hash).
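
A quick illustration of the contrast: a cryptographic hash changes completely on a tiny edit, while embeddings of near-identical texts stay close. The two "embedding" vectors below are made-up stand-ins for real model output.

    import hashlib
    import numpy as np

    a = "the quick brown fox jumps over the lazy dog"
    b = "the quick brown fox jumps over the lazy dog."   # one character added

    print(hashlib.sha256(a.encode()).hexdigest())  # two completely
    print(hashlib.sha256(b.encode()).hexdigest())  # different digests

    # Hypothetical embeddings of the two strings (in practice, model output).
    emb_a = np.array([0.12, -0.40, 0.88, 0.05])
    emb_b = np.array([0.11, -0.39, 0.90, 0.06])

    cos = emb_a @ emb_b / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
    print(round(cos, 3))   # close to 1.0: similar input, similar "fingerprint"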

Nice embeddings encode information spatially; a classic example of embedding arithmetic is king - man + woman = queen[1]. “Concept Sliders” is a cool application of this to image generation[2].
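
Here's a toy version of that arithmetic. The vectors are hand-made stand-ins (crude "gender, royalty" coordinates), not real word2vec output, but with a real model the same nearest-neighbour lookup applies.

    import numpy as np

    vocab = {
        "man":    np.array([ 1.0,  0.0]),
        "woman":  np.array([-1.0,  0.0]),
        "king":   np.array([ 1.0,  1.0]),
        "queen":  np.array([-1.0,  1.0]),
        "prince": np.array([ 1.0,  0.8]),
        "apple":  np.array([ 0.0, -1.0]),
    }

    target = vocab["king"] - vocab["man"] + vocab["woman"]

    def nearest(v, exclude=("king", "man", "woman")):
        # Cosine similarity against every remaining vocabulary entry.
        def cos(w):
            return v @ vocab[w] / (np.linalg.norm(v) * np.linalg.norm(vocab[w]) + 1e-12)
        return max((w for w in vocab if w not in exclude), key=cos)

    print(nearest(target))   # -> "queen"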

Personally I’ve not had _too_ much trouble with running out of RAM due to embeddings themselves, but I did spend a fair amount of time last week profiling memory usage to make sure I didn’t run out in prod, so it is on my mind!

[0] https://en.m.wikipedia.org/wiki/Perceptual_hashing

[1] https://www.technologyreview.com/2015/09/17/166211/king-man-...

[2] https://github.com/rohitgandikota/sliders



