
Seems like a better comparison is to a one-way hash function.

Given a set of vectors produced by an embedding model, it's cheap and easy to check whether a given document is similar to the original source: you just run the embedding model on the comparison document and apply your favorite similarity metric.
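A minimal sketch of that similarity check (the model name and the 0.9 cutoff are illustrative assumptions, not anything specific to this thread):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # You already hold the source's vector; embed the candidate and compare.
    source_vec = model.encode("the original document")
    candidate_vec = model.encode("a document you suspect matches the source")
    print(cosine(source_vec, candidate_vec) > 0.9)  # assumed similarity cutoff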

However, it's very hard to recreate the source itself. As far as I can tell, you'd basically have to run a very expensive form of blind gradient descent: generate multiple candidate texts, embed each one, check its similarity to the target vector, pick the closest, and repeat with that text as the starting point.
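Something like this naive hill-climbing loop, reusing `model` and `cosine` from the sketch above; `mutate` and its word list are hypothetical stand-ins for a real perturbation strategy:

    import random

    VOCAB = ["the", "a", "secret", "document", "text", "embedding"]  # assumed word list

    def mutate(text):
        # Hypothetical perturbation: replace one random word.
        words = text.split()
        words[random.randrange(len(words))] = random.choice(VOCAB)
        return " ".join(words)

    def reconstruct(target_vec, seed_text, steps=1000, n_candidates=32):
        best, best_score = seed_text, cosine(model.encode(seed_text), target_vec)
        for _ in range(steps):  # each step costs n_candidates embedding calls
            for cand in (mutate(best) for _ in range(n_candidates)):
                score = cosine(model.encode(cand), target_vec)
                if score > best_score:  # keep the closest text seen so far
                    best, best_score = cand, score
        return best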

Maybe someone can correct me if there exists an efficient way of reconstructing the original text. I would be very interested to know.

Edit: I appear to have just described a super naive and slow implementation of vec2text (see sibling comment). Will have to read that paper in more detail but it looks really cool.

It is possible to reconstruct the text from an embedding, but more importantly, I believe that by "embedding" the OP means sending the computed embedding matrix instead of the input text. That is a simple matrix-inversion problem.
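A minimal sketch of that inversion, under one concrete assumption: the transmitted matrix is the model's input embeddings, so every row comes straight out of the embedding table, and a nearest-neighbor lookup (one practical way to "invert" it) recovers the tokens. All names here are illustrative:

    import numpy as np

    def invert_input_embeddings(sent, embedding_table, id_to_token):
        # sent: (seq_len, d) matrix transmitted instead of the raw text
        # embedding_table: (vocab_size, d) input-embedding weights of the model
        tokens = []
        for row in sent:
            dists = np.linalg.norm(embedding_table - row, axis=1)  # distance to every vocab entry
            tokens.append(id_to_token[int(np.argmin(dists))])      # closest row = original token
        return tokens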

Computing output embeddings is different.
