For those unaware, OpenAI recently announced [0] an API change where they said their newer embedding models use Matryoshka Representation Learning to support shortening embeddings. Basically, you can use a shorter prefix of the full representation to do query/lookup more cheaply without losing much quality. Quote:
“Native support for shortening embeddings:
Using larger embeddings, for example storing them in a vector store for retrieval, generally costs more and consumes more compute, memory and storage than using smaller embeddings.
Both of our new embedding models were trained with a technique [Matryoshka Representation Learning] that allows developers to trade-off performance and cost of using embeddings. Specifically, developers can shorten embeddings (i.e. remove some numbers from the end of the sequence) without the embedding losing its concept-representing properties by passing in the dimensions API parameter. For example, on the MTEB benchmark, a text-embedding-3-large embedding can be shortened to a size of 256 while still outperforming an unshortened text-embedding-ada-002 embedding with a size of 1536.”
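The shortening is exposed through the dimensions parameter mentioned in the quote. A minimal sketch with the official Python client, assuming an OPENAI_API_KEY in the environment (the input string is just an illustrative placeholder):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Ask the API for a 256-dimensional embedding directly, instead of
    # truncating the full 3072-d text-embedding-3-large vector yourself.
    resp = client.embeddings.create(
        model="text-embedding-3-large",
        input="Matryoshka embeddings nest smaller vectors inside larger ones.",
        dimensions=256,
    )
    vector = resp.data[0].embedding
    print(len(vector))  # 256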
Context: OpenAI was "caught" using Matryoshka embeddings in their new release. They apologized and added references to the paper in their release notes.
You can put just a prefix of the dimensions in your vector database, saving a lot of memory and compute when retrieving nearest neighbors.
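If you already have full-length embeddings, the client-side version of this is just truncation plus renormalization before the similarity search. A minimal numpy sketch with made-up sizes (the re-normalization matters because truncation changes vector norms, and cosine similarity assumes unit vectors):

    import numpy as np

    def truncate_and_normalize(embeddings: np.ndarray, dims: int) -> np.ndarray:
        # Keep only the first `dims` dimensions and rescale to unit length.
        truncated = embeddings[:, :dims]
        norms = np.linalg.norm(truncated, axis=1, keepdims=True)
        return truncated / norms

    # Hypothetical corpus of 10k full-size (3072-d) embeddings plus one query.
    corpus = np.random.randn(10_000, 3072).astype(np.float32)
    query = np.random.randn(1, 3072).astype(np.float32)

    # Store and search only the first 256 dimensions.
    corpus_256 = truncate_and_normalize(corpus, 256)
    query_256 = truncate_and_normalize(query, 256)

    # Cosine similarity reduces to a dot product on unit vectors.
    scores = (corpus_256 @ query_256.T).ravel()
    top_k = np.argsort(-scores)[:10]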
They declare by fiat that retaining only the first few dimensions should lead to low classification error and construct the training loss accordingly. (Equation 1 on page 4 of the PDF.)
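Roughly, that loss is an ordinary classification loss applied to several nested prefixes of the same embedding and summed. A minimal PyTorch sketch under my own assumptions (the prefix sizes and uniform weighting here are illustrative; the paper's Equation 1 also allows per-prefix weights c_m):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MatryoshkaLoss(nn.Module):
        # One linear classifier per nested prefix length of the embedding.
        def __init__(self, embed_dim: int, num_classes: int,
                     prefix_dims=(64, 128, 256, 512)):
            super().__init__()
            assert max(prefix_dims) <= embed_dim
            self.prefix_dims = prefix_dims
            self.heads = nn.ModuleList(
                [nn.Linear(m, num_classes) for m in prefix_dims]
            )

        def forward(self, z: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
            # Sum cross-entropy over prefixes so that the first m dimensions
            # must already be a usable representation for every m in prefix_dims.
            loss = z.new_zeros(())
            for m, head in zip(self.prefix_dims, self.heads):
                logits = head(z[:, :m])  # use only the first m dimensions
                loss = loss + F.cross_entropy(logits, labels)
            return loss

    # Toy usage: a batch of 32 embeddings of size 512 and 10 classes.
    z = torch.randn(32, 512, requires_grad=True)
    labels = torch.randint(0, 10, (32,))
    loss = MatryoshkaLoss(512, 10)(z, labels)
    loss.backward()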
Both are methods to reduce the overall size of your embeddings, but from what I understand, quantization is generally better than dimensionality reduction, especially if training is quantization-aware.
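For concreteness, here is a generic int8 scalar-quantization sketch (a plain illustration of the idea, not OpenAI's or the paper's method): float32 embeddings shrink 4x, and the round trip shows how much precision is lost.

    import numpy as np

    def quantize_int8(x: np.ndarray):
        # Symmetric per-vector int8 quantization: 4x smaller than float32.
        scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
        q = np.round(x / scale).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
        return q.astype(np.float32) * scale

    emb = np.random.randn(1_000, 1536).astype(np.float32)
    q, scale = quantize_int8(emb)
    recon = dequantize(q, scale)
    print("max abs reconstruction error:", np.abs(emb - recon).max())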
More seriously, it looks like a potential reduction in the cost to train a neural network.
You can think of every meaningful step forward in deep learning as a reduction in the cost of training, or an improvement in the ability of the signal to propagate.
The paper talks about a specific type of model called an embedding model that produces vectors for datapoints that are useful for downstream tasks.
Normally you have to choose a single dimensionality for the vectors, and the storage and search cost of your database scale with that dimensionality.
The method described here trains a model that offers multiple “options” for the length of its embeddings: you can use the smaller vectors to save space, but they don’t work quite as well. The paper analyzes this trade-off in more depth.
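One practical way to pick a size is to measure, on your own corpus, how often the truncated search still finds the same neighbors as the full-size search. A small sketch of that check, entirely my own suggestion rather than anything from the paper or the API:

    import numpy as np

    def _normalize(x: np.ndarray) -> np.ndarray:
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    def truncated_recall(corpus: np.ndarray, queries: np.ndarray,
                         dims: int, k: int = 10) -> float:
        # Fraction of each query's full-vector top-k neighbors that are still
        # retrieved when corpus and queries are truncated to `dims` dimensions.
        full_c, full_q = _normalize(corpus), _normalize(queries)
        short_c, short_q = _normalize(corpus[:, :dims]), _normalize(queries[:, :dims])
        hits = 0
        for fq, sq in zip(full_q, short_q):
            full_top = set(np.argsort(-(full_c @ fq))[:k])
            short_top = set(np.argsort(-(short_c @ sq))[:k])
            hits += len(full_top & short_top)
        return hits / (k * len(queries))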
[0] https://openai.com/blog/new-embedding-models-and-api-updates