Creating AI assistant with GPT and Ruby and Redis using embeddings (release.com)
113 points by erik_landerholm on April 26, 2023 | 25 comments



From a discussion with a friend today...

Are embeddings a hack? Is building out tooling and databases and APIs and companies around embeddings all going to be for naught as soon as there's a solid LLM/API with a big enough context window?


I don't think the context window will ever be big enough for some use cases. There was a recent paper talking about a million tokens, but that's still only about the length of the Harry Potter books. Which is amazing, but you know there will be use cases with far more than seven books' worth of material to draw on. Furthermore, performance will be better when you don't have to give the model all 1 million tokens, just the most relevant parts of the context.


The short answer is that, yes, embeddings are probably a hack in the same way that using bits or short variable names were hacks to reduce memory usage. At some point you'll be correct: someone will just prompt "given <large amount of data>, answer <user request>".


But embedding-based semantic search can handle arbitrarily sized databases. I fully believe context windows are going to grow: I am skeptical they will grow to cover "all your company's documents" or even "the full encyclopedia" sizes.
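
To make that concrete, the retrieval side is tiny. A rough Ruby sketch (hypothetical docs list, assuming the ruby-openai gem and ada-002 embeddings; not the article's exact code) -- the corpus can be arbitrarily large because only the top few matches ever reach the context window:

    # Gemfile: gem "ruby-openai"
    require "openai"

    client = OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"])

    # Hypothetical corpus -- could be millions of chunks in a real store.
    docs = ["Refund policy: refunds are issued within 5 business days.",
            "Onboarding guide: invite teammates from the settings page.",
            "API limits: 100 requests per minute per key."]

    def embed(client, text)
      res = client.embeddings(parameters: { model: "text-embedding-ada-002", input: text })
      res["data"].first["embedding"]
    end

    def cosine(a, b)
      dot = a.zip(b).sum { |x, y| x * y }
      dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
    end

    doc_vectors = docs.map { |d| [d, embed(client, d)] }
    query_vec   = embed(client, "How long do refunds take?")

    # Only these top matches get pasted into the prompt, not the whole corpus.
    top = doc_vectors.max_by(2) { |_, vec| cosine(query_vec, vec) }
    puts top.map(&:first)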


> I fully believe context windows are going to grow: I am skeptical they will grow to cover "all your company's documents" or even "the full encyclopedia" sizes.

This is the type of statement that I feel is often/usually wrong -- at least for the common case. The last time I had this argument it was about CDs, and how eventually we'd stop burning them because everything would be in the cloud, with my friend arguing that storage and network bandwidth would make that impractical if everyone did it.

I expect context window compression, or smart ways to embed contexts so they still provide useful information in "most" cases even if the compression is lossy, will be an active area of research.

EDIT: That said, looking at the original question -- I do think vector embeddings are still useful in their own right and somewhat orthogonal to context window sizes. IMO.


It's more than just optimizing for space (which is still going to be important); it's also about using vector databases to seed the prompt with data from a wider dataset and translate that into something the AI can use. I mean, technically, in the far future you could dump a whole database into the 'context' and work off of it, but vector DBs will fill that role in the meantime and add a memory layer on top of it for future queries.
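
Since the article uses Redis, here's a sketch of what that "memory layer" looks like there (assuming Redis Stack's RediSearch vector type and the redis-rb gem; the index name, field names, and dimension are just illustrative, not the article's schema):

    require "redis"

    redis = Redis.new(url: ENV.fetch("REDIS_URL", "redis://localhost:6379"))

    # One-time: index hashes with a 1536-dim float32 vector field (ada-002 size).
    redis.call("FT.CREATE", "doc_idx", "ON", "HASH", "PREFIX", "1", "doc:",
               "SCHEMA", "content", "TEXT",
               "embedding", "VECTOR", "FLAT", "6",
               "TYPE", "FLOAT32", "DIM", "1536", "DISTANCE_METRIC", "COSINE")

    # Store a chunk plus its embedding (float array packed to a float32 blob).
    def store(redis, id, content, embedding)
      redis.hset("doc:#{id}", "content", content, "embedding", embedding.pack("f*"))
    end

    # Pull the k nearest chunks for a query embedding -- the "memory" lookup.
    def nearest(redis, query_embedding, k: 3)
      redis.call("FT.SEARCH", "doc_idx",
                 "*=>[KNN #{k} @embedding $vec AS dist]",
                 "PARAMS", "2", "vec", query_embedding.pack("f*"),
                 "SORTBY", "dist", "RETURN", "1", "content", "DIALECT", "2")
    end

Whatever that query returns is what gets handed to the model, so the dataset behind it can keep growing without touching the context limit.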


Can't agree with that more.

LLMs should not be trained to simply memorize information. Instead, they should be designed to understand and identify patterns in the data, and use the knowledge stored in vector databases to organize and summarize information.

Vector databases can be used to store and organize knowledge in a way that is more accessible to LLMs. By using vector representations, an LLM can be handed just the knowledge it needs at query time, letting it process and analyze much larger amounts of information than fit in its context.


I'd say:

Yes - embeddings are a hack.

No - there won't be anything like a "real API" unless there's a new discovery or a shift in the way LLMs are constructed. It's not theoretically impossible, but there's no clear way to get guaranteed results from present-day LLMs; all they do is output guesses from their input text (combining prompt text and then user text).


I can't say I'm very well versed in all of this, but I was asking my coworkers today whether embeddings were the way forward or if doing your own training would be more beneficial. Or better yet, could you take an open source model and train it specifically on just your content; would that yield better results?

Expanding context seems like an approach, but if you're trying to get an answer about your company's documentation, why would you need the entirety of GPT-X?


Every time I've asked this question the answer has been that injecting relevant content into the prompt provides much better results than attempting to fine-tune a model on your own content.

Here's a relevant quote: https://simonwillison.net/2023/Apr/15/ted-sanders-openai/
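
Concretely, "injecting relevant content" is just string assembly before the chat call. A rough Ruby sketch (ruby-openai gem assumed; relevant_chunks here are hypothetical results from an embedding search):

    require "openai"

    client = OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"])

    # Hypothetical: chunks already selected by an embedding search (the "open notes").
    relevant_chunks = ["Refunds are issued to the original payment method.",
                       "Refunds are processed within 5 business days."]
    question = "How long do refunds take?"

    prompt = "Answer using only the context below. If the answer isn't there, say so.\n\n" \
             "Context:\n#{relevant_chunks.join("\n---\n")}\n\nQuestion: #{question}"

    response = client.chat(parameters: {
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: prompt }],
      temperature: 0
    })
    puts response.dig("choices", 0, "message", "content")

No fine-tuning involved; the model is just handed its notes at question time.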


Thanks for that. The "taking a test with open notes" analogy makes a lot of sense.

Given that knowledge, as an end user it seems I would want to spend my time ensuring that the embedding data being selected is as good as possible.


The broad general training of GPT-X (and fine-tuning on your content) provides context and (loosely speaking, at least) "analytical" ability; search-via-embeddings to inject material into the prompt provides exact recall of specific material, with capacity greater than the context limit.

Analogous, more or less, to a human with general experience (base training), experience with your code base (fine-tuning), and the ability to reference the current code base directly (embedding-based search/recall). All three have a role; they are complementary rather than mutually exclusive.


Thanks for the explanation. Do you think that, because GPT-X will likely have more base training than an open source model someone trains themselves, the outcomes may end up better even if, say, the fine-tuning and embeddings were the same for both options?


Even with an incredibly long context window (say, 1M tokens), attention still suffers from a problem with long-term dependencies. This is probably why OpenAI hasn't publicly released their 32k token length model just yet.


I think they haven't released it because its capabilities are simply too powerful when combined with a vector DB.


Maybe. It's also probably staggeringly expensive to run.


Probably all of the above


Curious what you mean by “too powerful”?


Embeddings are useful for sentiment analysis and search in general, but given a "powerful enough AI with enough of a context window" they may indeed become obsolete, if it can do all of those things.


That's gotta be a Hacker News bingo if I've ever seen one.


Think we can optimize this with Rust.


And Postgres.


And TypeScript.


Don't forget Kubernetes.


Can use https://github.com/alexrudall/ruby-openai to do this sort of thing also :)




