It's just OpenAI embeddings. We fetch them and push them to Redis with a 30-day TTL. The backing data that gets embedded rarely changes, so we don't need to create new embeddings very often. We batch whatever does need to be embedded.
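Roughly, the caching layer looks like this (a minimal sketch; the Redis setup, key scheme, and hashing are illustrative, not our exact code):

```python
import hashlib
import json

import redis
from openai import OpenAI

r = redis.Redis(host="localhost", port=6379)  # connection details are placeholders
client = OpenAI()

TTL_30_DAYS = 30 * 24 * 60 * 60


def get_embeddings(texts: list[str]) -> list[list[float]]:
    """Return embeddings for texts, hitting OpenAI only for cache misses."""
    results: dict[str, list[float]] = {}
    misses: list[str] = []

    for text in texts:
        key = "emb:" + hashlib.sha256(text.encode()).hexdigest()
        cached = r.get(key)
        if cached is not None:
            results[text] = json.loads(cached)
        else:
            misses.append(text)

    if misses:
        # The embeddings endpoint accepts a list of inputs, so misses go out in one batch.
        resp = client.embeddings.create(model="text-embedding-ada-002", input=misses)
        for text, item in zip(misses, resp.data):
            key = "emb:" + hashlib.sha256(text.encode()).hexdigest()
            r.set(key, json.dumps(item.embedding), ex=TTL_30_DAYS)
            results[text] = item.embedding

    return [results[t] for t in texts]
```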
The full RAG workflow - embedding the user input with ADA, deserializing the cached embeddings, running cosine similarity, and calling gpt-3.5-turbo - takes about 3 seconds end-to-end to get a result.
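In code, that query path is roughly the following (a sketch, assuming chunk embeddings have already been deserialized from the cache; the function names and prompt are illustrative):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def answer(question: str, docs: list[tuple[str, list[float]]], top_k: int = 4) -> str:
    """docs is a list of (chunk_text, embedding) pairs pulled from the cache."""
    # 1. Embed the user input with ADA.
    q_emb = np.array(
        client.embeddings.create(
            model="text-embedding-ada-002", input=[question]
        ).data[0].embedding
    )

    # 2. Rank chunks by cosine similarity against the query embedding.
    ranked = sorted(
        docs, key=lambda d: cosine_similarity(q_emb, np.array(d[1])), reverse=True
    )
    context = "\n\n".join(chunk for chunk, _ in ranked[:top_k])

    # 3. Ask gpt-3.5-turbo to answer from the retrieved context.
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```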
OpenAI embeddings are one input per request payload, right? Have you hit any rate limits doing that?
We have a performance budget of ~1 second for the generate-index-search pipeline, which may or may not be feasible. I discounted OpenAI because it seemed like we're guaranteed to hit the rate limit if we flood them with concurrent embedding requests. The typical corpus we need to work with is 20 concurrent documents ranging from ~100 KB to ~2 MB, and chunking those documents to fit the 8k-token context window balloons the request count further.
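To put rough numbers on that fan-out (back-of-the-envelope only; the ~4 bytes/token ratio and 500-token chunk size are assumptions, not measurements):

```python
# Estimate how many embedding calls a corpus generates if each chunk is its own request.
BYTES_PER_TOKEN = 4    # rough average for English text
CHUNK_TOKENS = 500     # assumed chunk size, well under the 8k window


def chunks_per_doc(doc_bytes: int) -> int:
    tokens = doc_bytes / BYTES_PER_TOKEN
    return max(1, round(tokens / CHUNK_TOKENS))


for size in (100_000, 2_000_000):  # ~100 KB and ~2 MB documents
    print(f"{size:>9,} bytes -> ~{chunks_per_doc(size)} chunks")

# ~50 chunks for a 100 KB doc, ~1,000 for a 2 MB doc. Across 20 concurrent documents
# that's anywhere from ~1,000 to ~20,000 embedding calls if sent one chunk per request,
# which is why batching multiple inputs per request matters for rate limits.
```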
You absolutely want to chunk them smaller than 8k. Have you tested different chunking strategies? It can make a huge difference in whether you actually recall useful information in chunks small enough to be usable.
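For reference, here's a simple fixed-size chunker with overlap, one of many possible strategies (the sizes are arbitrary starting points worth tuning):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character-based chunks.

    Sizes are in characters for simplicity; a token-based splitter
    (e.g. using tiktoken) tracks the model's limits more precisely.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks
```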