Ask HN: Create embeddings efficiently for an AI notes app with E2EE
3 points by satyajeetjadhav 15 days ago | 6 comments
Hi,

I am building a notes application that automatically finds related notes - thinkdeli.com. I want to implement end-to-end encryption sometime soon.

To find related notes, I create embeddings, add them to an index, and find the nearest neighbors.
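
A minimal sketch of that lookup, assuming the embeddings are plain lists of floats and the index is small enough to scan by brute force. The names `cosine_similarity` and `nearest_notes` and the three-dimensional toy vectors are made up for illustration; a real index would likely use an approximate nearest-neighbor structure instead.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)

def nearest_notes(query_id, index, k=2):
    """Return the ids of the k notes most similar to the note `query_id`."""
    query = index[query_id]
    scored = [
        (cosine_similarity(query, vec), note_id)
        for note_id, vec in index.items()
        if note_id != query_id  # a note is not its own neighbor
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [note_id for _, note_id in scored[:k]]

# Hypothetical toy index: note id -> embedding vector.
index = {
    "groceries": [0.9, 0.1, 0.0],
    "meal plan": [0.8, 0.2, 0.1],
    "tax return": [0.0, 0.1, 0.9],
}

related = nearest_notes("groceries", index, k=1)
```

Here `related` holds the id of the single closest note. Because the lookup only needs the vectors, it can run entirely on the device that holds the decrypted notes.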

I am creating embeddings of the user's notes locally using transformers.js. But setting up the pipeline takes a lot of memory (over 400 MB on Chrome). This makes it somewhat impractical to use on older devices.

Is there a more efficient way to create embeddings locally?

Creating embeddings via an API would be more efficient, but that would mean sending users' unencrypted notes to the service. Edit: what I mean is that the user's notes must be unencrypted and readable as plain text on the server to create embeddings. This defeats the purpose of end-to-end encryption.

I would appreciate any pointers. Thanks a lot!




> The user's notes must be unencrypted and readable as plain text on the server to create embeddings.

Consult a security expert before doing this, but here’s an idea: encrypt each word of the text, send the encrypted tokens over the wire, and then use an embedder trained on text encrypted with that method.

If you use an asymmetric encryption method, you could even throw away the private key.

The result would still be a substitution cipher on words, so it would not resist frequency analysis. It also has the weakness that, if your users manage to extract the key, they can encrypt text themselves to figure out the mapping. But it would protect against people ‘accidentally’ looking at the text of your users.

Periodically switching the encryption key wouldn’t be that hard.
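
A rough sketch of why the scheme leaks word frequencies, and of what key rotation does and does not change. It uses a keyed hash (HMAC-SHA256, truncated) as a stand-in for the deterministic per-word encryption; that substitution, the function name `encrypt_words`, and the key strings are all assumptions made for the example. Like the "throw away the private key" variant, the mapping here is one-way.

```python
import hashlib
import hmac
from collections import Counter

def encrypt_words(text, key):
    """Map each word to a deterministic keyed token (stand-in for encryption)."""
    return [
        hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()[:12]
        for word in text.lower().split()
    ]

note = "the cat sat on the mat near the door"

# Hypothetical keys for two rotation periods.
tokens_v1 = encrypt_words(note, b"key-period-1")
tokens_v2 = encrypt_words(note, b"key-period-2")

# The same word always yields the same token, so repetition stays visible:
# the most frequent token is the one standing in for "the".
most_common_token, count = Counter(tokens_v1).most_common(1)[0]

# Rotating the key changes the tokens but not the frequency profile.
profile_v1 = sorted(Counter(tokens_v1).values())
profile_v2 = sorted(Counter(tokens_v2).values())
```

Anyone who sees the token stream can count repeats without the key, and `profile_v1` equals `profile_v2`, which is the frequency-analysis weakness mentioned above. Rotation only forces an attacker to rebuild the mapping, and any embedder trained on the old tokens would need retraining or a re-mapping step.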


Which embedding model are you using?

Perhaps pick one with lower memory usage from this list?

https://huggingface.co/spaces/mteb/leaderboard
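
As a back-of-the-envelope aid when comparing candidates, the weight storage alone can be estimated from the parameter count and the numeric precision. The parameter counts below are hypothetical round numbers, not figures for any specific model on the leaderboard, and actual browser memory will be higher because of the runtime, tokenizer, and activations.

```python
def weight_megabytes(num_params, bytes_per_param):
    """Approximate size of the model weights in MB (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6

# Hypothetical model sizes, chosen only to show the arithmetic.
small_fp32 = weight_megabytes(22_000_000, 4)   # 32-bit floats
small_int8 = weight_megabytes(22_000_000, 1)   # 8-bit quantized weights
large_fp32 = weight_megabytes(110_000_000, 4)  # a larger model at full precision

print(f"small fp32: {small_fp32:.0f} MB")
print(f"small int8: {small_int8:.0f} MB")
print(f"large fp32: {large_fp32:.0f} MB")
```

Under these assumptions, both a smaller model and a quantized variant of the same model cut the weight footprint several-fold, so it is worth checking which precision the pipeline actually loads.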



Sorry, I should have phrased the last part of the problem better. I already use https.

The user's notes must be unencrypted and readable as plain text on the server to create embeddings. This defeats the purpose of end-to-end encryption.


I don’t understand the question fully but maybe you are looking for something like this?

https://aws.amazon.com/ec2/nitro/nitro-enclaves/


I’m not sure if there are implementations for browsers, but look into embeddings with homomorphic encryption.
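
To give a feel for the idea, here is a toy sketch of additively homomorphic encryption (the Paillier scheme) computing one weighted sum, the kind of linear step an embedding layer contains, on encrypted integers. The key is built from two small known primes and is far too weak for real use; the function names and the example vectors are made up. Real embedding models also apply nonlinear steps, which need fully homomorphic schemes and are much more expensive.

```python
import math
import secrets

# Toy key from two Mersenne primes. Illustration only, not secure.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                  # public key
N_SQ = N * N
PHI = (P - 1) * (Q - 1)    # private key
MU = pow(PHI, -1, N)       # modular inverse (Python 3.8+)

def encrypt(m):
    """Paillier encryption of a non-negative integer m < N, with g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + m * N) * pow(r, N, N_SQ)) % N_SQ

def decrypt(c):
    """Recover m using the private values PHI and MU."""
    u = pow(c, PHI, N_SQ)
    return ((u - 1) // N * MU) % N

def encrypted_dot(ciphertexts, weights):
    """Server side: build Enc(sum(w_i * x_i)) without seeing any x_i."""
    acc = 1  # 1 is a valid encryption of 0
    for c, w in zip(ciphertexts, weights):
        # Raising a ciphertext to w multiplies the plaintext by w;
        # multiplying ciphertexts adds their plaintexts.
        acc = (acc * pow(c, w, N_SQ)) % N_SQ
    return acc

# Client encrypts integer features; server holds plaintext integer weights.
features = [3, 1, 4]
weights = [2, 5, 1]
encrypted_features = [encrypt(x) for x in features]
encrypted_result = encrypted_dot(encrypted_features, weights)
result = decrypt(encrypted_result)  # 3*2 + 1*5 + 4*1
```

Only the client, holding `PHI` and `MU`, can read `result`; the server works with ciphertexts and `N` alone. Even this toy version does big-integer exponentiation per input value, which hints at why homomorphic approaches tend to be heavy for browser use.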



