
Dense vectors are pervasive in neural networks. Take a layer's activations and you have one (this is what NN embeddings are).

Indexing them is really hard: the volume of the local neighborhood to be searched grows as radius^dimension, so you need a specialized engine.
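A quick illustrative sketch of why high dimensions are hard (my own toy demo, not from the comment): as the dimension grows, distances from a query to random points concentrate, so "near" and "far" neighbors become hard to distinguish and naive spatial indexes stop helping.

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(dim, n=1000):
    # Contrast between the farthest and nearest of n random points
    # from a random query. High contrast -> easy to index; low
    # contrast -> distances have concentrated (curse of dimensionality).
    pts = rng.standard_normal((n, dim))
    q = rng.standard_normal(dim)
    d = np.linalg.norm(pts - q, axis=1)
    return (d.max() - d.min()) / d.min()

print(distance_contrast(2))    # low dim: large contrast
print(distance_contrast(512))  # high dim: distances concentrate
```

This is exactly the regime where approximate nearest-neighbor engines earn their keep.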

Now, say you are a three-letter agency and have Facebook's image data (2B faces, 100B images). You train a deep learning image classification network to identify faces. But for training-time reasons you can only do it on a dataset of about 10k different faces (10M images). The NN will learn characteristic visual features to make its classification, available as activations at the n-1 layer.
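A minimal sketch of "the n-1 layer as an embedding," using a toy NumPy MLP (the shapes, weights, and function names are stand-in assumptions; a real system would use a trained deep CNN):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 3-layer classifier. In practice these weights come from
# training on the small labeled face dataset.
W1 = rng.standard_normal((128, 64))
W2 = rng.standard_normal((64, 32))   # penultimate (n-1) layer
W3 = rng.standard_normal((32, 10))   # classification head

def embed(x):
    h1 = np.maximum(x @ W1, 0)       # ReLU
    h2 = np.maximum(h1 @ W2, 0)      # n-1 layer activations
    return h2                        # <- this vector is the embedding

def classify(x):
    # The head is only needed for training; at index time we
    # discard it and keep embed().
    return (embed(x) @ W3).argmax()

x = rng.standard_normal(128)         # stand-in for a preprocessed image
print(embed(x).shape)
```

The key point is that `embed()` is cheap to run on every image once the network is trained, even on images of faces it never saw a label for.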

But you need to have the rest of the dataset searchable! You now index the activations of all the 100B remaining images, and with this you can search by similarity of visual features. If you have an image of someone you want to track, get its activations and search for other images that have similar vectors.
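The search step itself, as a brute-force baseline (my own sketch; the database here is random stand-in data, and at 100B vectors you would need an approximate index rather than this exact scan):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical database of 100k unit-normalized embeddings (dim 32),
# stand-ins for the n-1 layer activations of the indexed images.
db = rng.standard_normal((100_000, 32)).astype("float32")
db /= np.linalg.norm(db, axis=1, keepdims=True)

def top_k(query, k=5):
    # Cosine similarity reduces to a dot product on unit vectors.
    q = query / np.linalg.norm(query)
    sims = db @ q
    idx = np.argpartition(-sims, k)[:k]   # unordered top-k
    return idx[np.argsort(-sims[idx])]    # sorted best-first

# Query with a slightly perturbed copy of entry 42: it should
# come back as the nearest neighbor.
query = db[42] + 0.05 * rng.standard_normal(32).astype("float32")
print(top_k(query))
```

This O(N·d) scan is fine at 100k vectors but is precisely what stops scaling to billions, hence the specialized engines.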

This works for everything that a NN can model: speech, words, words in sentences, videos, etc.

----

On another note, Facebook has a feature-rich, GPU-powered, scalable similarity-search and indexing engine called FAISS:

https://github.com/facebookresearch/faiss



