Emerging architectures for LLM applications (a16z.com)
255 points by makaimc on June 20, 2023 | 95 comments



I am an AI researcher. Most actual AI researchers and engineers use very few of these tools; the only ones are model providers like the OpenAI API and the public clouds (AWS, Azure, GCP). The rest are infra-centric tools whose importance a16z is highly incentivized to over-inflate.


This does look like the sort of complex ecosystem that emerges after an inflection point and before consolidation happens. It reminds me of adtech in the early 2010s.

That said, while much of this might not have any real traction long-term, looking at what researchers use seems to miss the mark a bit. It’s like saying network technology researchers aren’t using Vercel.


There are some other useful ones in there. Hugging Face jumps out. W&B. I haven't used Mosaic, but I could see myself using it for bigger projects; I know of at least two PIs at Stanford using them.


I think the diagram was just meant to be comprehensive. The writeup itself doesn’t imply all the tool nonsense.


Yes, because you're not working on applications.


This blog post is not about AI researchers — it's about developers who make products out of LLMs.


> So, agents have the potential to become a central piece of the LLM app architecture (or even take over the whole stack, if you believe in recursive self-improvement). [...] There's only one problem: agents don't really work yet.

I really appreciate that they called out and separated some hype vs. practice, specifically with regard to agents. This is something I keep hoping works better than it does, and in practice every attempt I've made in this direction has led to disappointment.


What vector DB are you using? What is the data structure you're vectorizing? What is your chunk size? Have you implemented memory? What prompt or technique are you using (ReAct, CoT, few-shot, etc.)? Are you only using vector DBs? Do you use sequential chains? Does it need tools? Depending on your data, business case, and what output you expect from the agent, there is no one-size-fits-all.
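To make a few of those knobs concrete, here's a minimal sketch of the retrieval side with everything inlined; the chunk size, top-k, and model name are placeholder assumptions to tune, not recommendations (OpenAI Python API as of mid-2023):

    import numpy as np
    import openai

    CHUNK_SIZE = 500  # characters per chunk; one knob among many
    TOP_K = 4         # how many chunks make it into the prompt

    def embed(texts):
        # text-embedding-ada-002 was the common default at the time
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
        return np.array([d["embedding"] for d in resp["data"]])

    def top_chunks(question, document):
        chunks = [document[i:i + CHUNK_SIZE] for i in range(0, len(document), CHUNK_SIZE)]
        scores = embed(chunks) @ embed([question])[0]  # ada vectors are ~unit norm
        return [chunks[i] for i in np.argsort(scores)[::-1][:TOP_K]]

Every constant above is a judgment call that depends on your data.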


Ha, I had the exact same reaction

I feel like lots of papers are getting published and reviewed, which is good, as bad ideas don't get to propagate for ages.


This mirrors my experience as well. I've tried to use them for pretty straightforward support agent type tasks and found that they very often go down wrong paths trying to solve the problem.


Similarly for retrieval-augmented LLM agents: they break down very quickly once the question isn't directly addressed in the documents.


I am so glad that top VCs are thinking along these lines, of architectures that incorporate AI as part of the flow.

We've spun off a company to realize the vision of bringing an open-source, standardized framework to the PHP ecosystem, where we've been building apps for communities for over a decade. It's "AI for the rest of us", but at the same time promoting positive collaboration between communities and AI. It also involves micropayments for tasks done by either an AI or human agent.

If you're a VC or an expert in the space, I'd love to get feedback on this: https://engageusers.ai/ecosystem.pdf

And if you want to get involved in any capacity, whether as an investor or developer, please email me greg at the domain engageusers.ai -- this time around we are planning to take on venture capital funding for this project, and syndicate a round later this summer.


This blog post is way more complex than it needs to be. A lot of what most people are doing with LLMs right now boils down to using vector databases to provide the "best" info/examples to your prompt. This is a slick marketing page, but I'm not sure what they think they're providing beyond that.
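Concretely, the pattern is little more than this (a toy, with hypothetical retrieved chunks):

    # Chunks retrieved from the vector DB get pasted straight into the prompt.
    retrieved = ["Refunds are processed within 5 days.", "Contact support via the portal."]
    question = "How long do refunds take?"
    prompt = (
        "Answer using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(retrieved) + "\n\n"
        "Question: " + question
    )
    print(prompt)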


This is what a16z does. A few years ago it was the "Modern Data Stack" and a few years before that it was "DevOps." For some reason, venture capitalists really like making these fancy charts to describe the obvious, and then mostly ignoring them during their investment decisions (or sometimes they make the investments, then they make the charts and put their portfolio companies in the boxes).


To protect their downside risk by seeding the broader marketplace with a narrative that later-stage investors and acquirers will be influenced by when making decisions to invest or acquire.


The greater fool theory?


At this point I don’t think it’s fair to call it a theory.


Bit of self-promotion, but Milvus (https://milvus.io) is another open-source vector database option (I have a pretty good idea as to why it isn't listed in a16z's blog post). We also have milvus-lite, a pip-installable package that exposes the same API, for folks who don't want to stand up a local service.

    pip install milvus
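Roughly, the embedded server then starts in-process like this (a sketch of the mid-2023 API, not exact docs):

    # milvus-lite runs Milvus inside the Python process; pymilvus connects to it
    from milvus import default_server
    from pymilvus import connections

    default_server.start()
    connections.connect(host="127.0.0.1", port=default_server.listen_port)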
Other than that, it's great to see the shout-out for Vespa.


Appreciate you sharing; will try Milvus.

The vector database space is the Wild West. Keep at it!


Hell yeah! Feel free to reach out if you need any help.


Enjoying you guys' podcast!


Why do you think it's not featured? Has a16z funded many of those companies, lol? And somehow rejected Milvus?


The guys at Milvus raised a total of $113M according to Crunchbase, second only to Pinecone, which is funded by a16z. You're not going to highlight the main competitor of one of your portfolio companies.


The post mentions six alternatives to Pinecone. The reality is Milvus isn't as relevant today as it was a year ago. I'm from Pinecone but I'll give credit where it's due: Weaviate, Chroma, and Qdrant completely lapped Milvus in the open-source space. That's why they got mentioned and Milvus didn't.


Interesting to see that the word "generative" does not appear in this blog post (apart from the tags). Six months ago Generative AI was all the rage: https://a16z.com/2023/01/19/who-owns-the-generative-ai-platf...

I think this is a very well-articulated breakdown of the "LLM Core, Code Shell" (https://www.latent.space/p/function-agents#%C2%A7llm-core-co...) view of the world, but relegating the agents stuff to a three-paragraph "what about agents?" piece at the end undersells its potential. The emerging architecture of "Code Core, LLM Shell", decentralizing and specializing the role of the LLM, will hopefully get more airtime in the December a16z landscape chart!


Hi @swyx, Thanks for the kind words!

We actually purposefully left that part a bit sparse because we have something else coming up on the topic! I'm sure we will be chatting through it soon :)


This and other end-to-end architectures are offered in deepset/haystack, one of the best and most mature frameworks for working with LLMs (it predates the GPT craze) and doing retrieval augmentation, etc.

I do feel the article presents old concepts as "emerging".

if you are curious about building something quickly, you can jump into one of the tutorials https://haystack.deepset.ai/tutorials

Over a weekend I used deepset/haystack to build a Q&A engine over open-source communities' Slack and Discord threads that can potentially have an answer - it was a joy and a breeze to implement. If you have a question about Metaflow, K8s, Golang, Deepset, Deep Java Library, or some other tech - try asking your quick question on https://www.kwq.ai :-)
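For anyone curious what that looks like, the extractive Q&A setup is only a few lines. A sketch following the shape of the Haystack 1.x tutorials (document contents and query are placeholders):

    from haystack.document_stores import InMemoryDocumentStore
    from haystack.nodes import BM25Retriever, FARMReader
    from haystack.pipelines import ExtractiveQAPipeline

    store = InMemoryDocumentStore(use_bm25=True)
    store.write_documents([{"content": "Example Slack thread text goes here."}])

    pipeline = ExtractiveQAPipeline(
        reader=FARMReader(model_name_or_path="deepset/roberta-base-squad2"),
        retriever=BM25Retriever(document_store=store),
    )
    result = pipeline.run(query="How do I deploy Metaflow on K8s?",
                          params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 3}})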


Thanks for the mention and being part of Haystack's community :)


Microsoft guidance is legit and useful. It's a bunch of prompting features piled on top of Handlebars syntax. (And it has its own caching: set temp to 0 and it caches. No need for LLM-specific caching libs :) )

https://github.com/microsoft/guidance
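From memory of the mid-2023 API, usage looks roughly like this (prompt contents made up):

    import guidance

    # One global LLM; with temperature=0, repeated runs hit guidance's cache.
    guidance.llm = guidance.llms.OpenAI("text-davinci-003")

    # Handlebars-style template: plain variables in, {{gen}} slots filled by the LLM.
    program = guidance("Q: {{question}}\nA: {{gen 'answer' temperature=0 max_tokens=50}}")

    print(program(question="What is the capital of France?")["answer"])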


Yes, but I've found LMQL equally impressive. It has much better documentation than Guidance.


How prescient is the "Hidden Technical Debt" [1] paper from ~8 years ago compared to this? See the top of p. 4 for a figure that I've personally found useful in explaining all the pieces necessary to put together a reasonable app using ML/DL (up until today, anyway).

I see all the same bits called out:

- Data collection

- Machine /resource management

- Serving

- Monitoring

- Analysis

- Process mgt

- Data verification

There are some new concepts that aren't quite captured in the original paper, though, like the "playground".

I've kind of been expecting a follow-up that shows an update to that original paper.

[1] https://proceedings.neurips.cc/paper_files/paper/2015/file/8...


> "For devs who see every database-shaped hole and try to insert Postgres"

I see sub-second performance on >1M vectors with pgvector. Vector databases have a place, but this statement seems disingenuous at best. Bringing on a vector database adds complexity, and a giant chunk of use cases simply don't need it. Not to mention the additional latency you'd be adding.
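For reference, the whole pgvector setup fits in a handful of statements. A sketch using psycopg2, with a made-up table and a stand-in query vector:

    import psycopg2

    conn = psycopg2.connect("dbname=app")
    cur = conn.cursor()
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute("CREATE TABLE IF NOT EXISTS docs "
                "(id bigserial PRIMARY KEY, body text, embedding vector(1536))")
    # ivfflat trades recall for speed; 'lists' is the main tuning knob
    cur.execute("CREATE INDEX IF NOT EXISTS docs_idx ON docs "
                "USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)")

    query_vec = [0.0] * 1536  # stand-in for a real embedding
    # <=> is pgvector's cosine-distance operator
    cur.execute("SELECT body FROM docs ORDER BY embedding <=> %s::vector LIMIT 5",
                (str(query_vec),))
    print(cur.fetchall())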


>In-context learning solves this problem with a clever trick: instead of sending all the documents with each LLM prompt, it sends only a handful of the most relevant documents.

I don't even think this is a correct definition of "in-context learning". In-context learning is a type of few-shot learning in which examples of input/output pairs are provided as part of the prompt. The idea is that the model is able to "learn" the pattern of the task from the examples. Quoting from the GPT-3 paper:

>what we call “in-context learning”, using the text input of a pretrained language model as a form of task specification: the model is conditioned on a natural language instruction and/or a few demonstrations of the task and is then expected to complete further instances of the task simply by predicting what comes next.

I really don't think it's standard to refer to the process of embedding-based retrieval as "in-context learning".


I was taken aback by the terminology too. Isn’t Retrieval-Augmented Generation the standard nomenclature for this pattern? https://arxiv.org/abs/2005.11401


It's still "in-context learning" as per the GPT-3 definition, because they are supplying some demonstrations of the task in the prompt.

The only special thing is that the input for each demonstration is obtained through embedding-based retrieval.


Is the emerging architecture made out to be more complicated than what most companies are currently building? Perhaps! But this is most likely the general direction things will trend towards as the auxiliary ecosystem matures.

Shameless plug: For fellow Ruby-ists we're building an orchestration layer for building LLM applications, inspired by the original, Langchain.rb: https://github.com/andreibondarev/langchainrb


Reading the comments, it seems like we need better human-agent interaction tools

Many are frustrated about not being able to better direct the agents

It's like the agents have certain pre-learned things they can do, but they aren't really learning how to apply those things to the environments their human operators want them to work in.

Or at least it is not easy or straightforward to teach the model new tricks.


100% agree with this.

The Agent & Tools metaphor makes a ton of sense for its simplicity, but has yet to scale beyond very simple agents.

I’m long on it as a programming model, but a lot of work is needed.


I was looking at various indexing solutions to solve search and clustering problems with TerminusDB for clients. When I compared those solutions against embeddings from LLMs, the LLMs were just far easier to work with and got much better results. I believe traditional text indexing will die quickly, as will a lot of the entity resolution and traditional clustering methods, replaced completely by LLMs. We found them so compelling we wrote our own open-source vector database sidecar: https://github.com/terminusdb-labs/terminusdb-semantic-index...


Oh, I should probably mention a blog I wrote describing how it works: https://github.com/terminusdb/technical-blogs/blob/main/blog...


It's a good article for people without a strong background in NLP or LLMs. It gives a comprehensive overview of how this could be applicable for startups.

For enterprise, not quite so: there is a lot of other stuff to consider that the post misses, like ethical application, filtering, and security, points that are very important for enterprise customers.

Also, in-context learning is just one way to apply LLMs; there are more, like few-shot learning and fine-tuning, depending on the cost and application involved, as I've highlighted here:

https://twitter.com/igorcosta/status/1671316499179667456


I agree with the excerpt on agents. Reliability and reproducibility of task completion is the biggest problem for agents to cross the chasm to real-life use cases. When agents are given an objective, they reason about the next best action from first principles (or from scratch) every time, and the agent trajectory ends up becoming more of a linguistic dance. But we are solving some of these agent-specific problems at SuperAGI https://github.com/TransformerOptimus/SuperAGI (disclaimer: I'm the creator) by doing agent trajectory fine-tuning using recursive instructions. Think of the objective as telling the agent to go from A to B, and the instructions as giving it directions about the route. These instructions can be self-created after every run and fed into subsequent runs to improve the trajectory.

The other problem with agents: most independent agents are capable of handling only a very thin slice of a use case, but for complex knowledge-work tasks, more often than not, one agent is not enough. You need a team of agents. We introduced the concept of Agent Clusters, which operate in a master-slave architecture and coordinate among themselves via shared memory and a shared task list to complete nuanced tasks.

Another big bottleneck, I think, is the lack of a notion of Knowledge for agents. We have LTM and STM, but knowledge is a specialized understanding of a particular class of objectives (e-commerce customer support, account-based marketing, medical diagnostics for a particular condition, etc.) plugged into the agent. Currently agents lean on the knowledge available in the LLMs. LLMs are great for intelligence, but not necessarily for the knowledge required for an objective. So we added the concept of knowledge: an embedding plugged into the agent apart from LTM/STM.

There are a lot of other challenges that need to be solved, like agent performance monitoring, agent-specific models, and agent-to-agent communication, to truly solve for agents deployed in production. I'm not sure about the point in the article that they might even take over the entire stack, because autonomous agentic behaviour is good for certain use cases, not for all kinds of apps.


Qdrant should be in the list of vector DBs.


I think sidecar vector databases that work with existing DBs will become more prevalent than pure vector DBs. I also think the vector & graph combo on highly interconnected data will have additional benefits for those building a wide range of LLM applications. A good example is the VectorLink architecture within TerminusDB [1], which is based on Hierarchical Navigable Small World graphs, written in Rust.

[1] https://github.com/terminusdb-labs/terminusdb-semantic-index...


With everyone writing about LLMs, and no time to read even 1% of it all, is the reason to read a16z its technical/analytical merit, or the investment-pumping angle?


In the future we will see a ton of similar charts borrowing elements from graph theory and signal theory. There's no limit on the number of different LLM multi-agent systems.


This sounds quite interesting, could you expand on it? What is some of the low-hanging fruit in your opinion? Do you have any examples of projects that are explicitly building on top of these ideas?


The space of possible GPT-4 outputs is hard to comprehend.

The space of possible different "graphs" of LLM agents connected to each other is even larger.

Each graph represents a multi-agent system.

Here's a generic syntax for notating graphs that don't have loops (essentially trees):

  AgentName: Descriptive Name
  Goals:
    - Goal1
    - Goal2
    ...
  Techniques:
    - Instruction1
    - Instruction2
    ...
  Inputs:
    - From AgentName: Description of input
    - From OtherAgentName: Description of input
    ...
  Outputs:
    - To AgentName: Description of output
    - To OtherAgentName: Description of output
    ...

  -> SubAgentName1: Descriptive Name
  Goals:
    - Goal1
    - Goal2
    ...
  Techniques:
    - Instruction1
    - Instruction2
    ...
  Inputs:
    - From AgentName: Description of input
    - From OtherAgentName: Description of input
    ...
  Outputs:
    - To AgentName: Description of output
    - To OtherAgentName: Description of output
    ...

  -> SubSubAgentName1: Descriptive Name
  Goals: 
    - Goal1
    - Goal2
  Techniques:
    - Instruction1
    - Instruction2
  Inputs:
    - From SubAgentName1: Description of input
  Outputs:
    - To SubAgentName1: Description of output
  ...

  -> SubSubAgentName2: Descriptive Name
  Goals: 
    - Goal1
    - Goal2
  Techniques:
    - Instruction1
    - Instruction2
  Inputs:
    - From SubAgentName1: Description of input
  Outputs:
    - To SubAgentName1: Description of output
  ...
  -> SubAgentName2: Descriptive Name
  Goals:
    - Goal1
    - Goal2
    ...
  Techniques:
    - Instruction1
    - Instruction2
    ...
  Inputs:
    - From AgentName: Description of input
    ...
  Outputs:
    - To AgentName: Description of output
    ...

---

Here's an example researcher agent and its interior using the syntax (originally written in Finnish in my notes):

  Research:
  Goals:
    - Produce and analyze inquiries
    - Build new research
    - Explore unknown topics
    - Reason from inquiries
  Technical instructions:
    - Instruction / rule for reasoning 1
    - Instruction / rule for reasoning 2
  Inputs:
    - From agent Meta-awareness: Suggestions
    - From agent Subresearch: Results
  Outputs:
    - To agent Subresearch: Commands
    - To agent Memory: Research

  -> Subresearch:
  Goals:
    - Carry out specific research tasks
  Technical instructions:
    - Instruction / rule for following commands 1
    - Instruction / rule for following commands 2
  Inputs:
    - From agent Research: Commands
  Outputs:
    - To agent Research: Results

  -> Memory:
  Goals:
    - Maintain a record of the research
  Techniques:
    - Instruction / rule for using memory stores 1
    - Instruction / rule for using memory stores 2
  Inputs:
    - From agent Research: Research
  Outputs: None

---

Signal theory becomes relevant when thinking about I/O, embedded agency, and when the agents aren't / cannot be constantly "reading" each other.

---

For similar projects: the current AutoGPT-style systems are very primitive and haven't adopted these ideas. If what I call the cognitive architectures of LLM multi-agent systems were carefully designed, which I predict will become a thing (and the subject of a ton of future research!), our AI systems could gain very advanced cognitive capabilities, perhaps even approaching humans, but in their own formal manner.

One person suggested this to me:

https://princeton-nlp.github.io/SocraticAI/

I haven't read it, but it seems to have similarities.



I appreciate the detailed response. I'll look into the SocraticAI project as well.

FWIW I asked because I'm working on a toolkit for applying Monte Carlo tree search to agent-graph generation, and I'm always on the lookout for fundamental insights that could help direct its development.


I'm available via XMPP and email if you want to talk (addresses in profile).


I second the request for you to expand on that.....


I covered the subject during a Python Atlanta talk last month. There isn't much that's new at the moment, mostly because an LLM can be considered a software agent. That may change soon as things become more complex, though. Things like AWS's Kendra show there are some new patterns in the pipeline.

I'll say this post is rather shallow to be considered technical, or even to fit the title.


Ugh. Of course the Enterprise Architecture rears its ugly head here.

Just here to say that you can quickly build a robust feature with only OpenAI's APIs, Redis, a versioned text file for the prompt you parameterize, and a little bit of glue code (no LangChain). You can add instrumentation for observability around that like you would for any other code.
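For the skeptical, here's roughly the whole thing as a sketch; file names and cache keys are made up, and the OpenAI call uses the 2023-era ChatCompletion API:

    import hashlib
    import openai
    import redis

    r = redis.Redis()

    def answer(prompt_file, **params):
        # The versioned prompt template is just a text file in the repo.
        prompt = open(prompt_file).read().format(**params)
        key = "llm:" + hashlib.sha256(prompt.encode()).hexdigest()
        cached = r.get(key)
        if cached is not None:
            return cached.decode()
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        text = resp["choices"][0]["message"]["content"]
        r.set(key, text)
        return text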

I would wager that most enterprise use cases don't need most of the tools listed in this article, and using them is complete overkill.


Exactly!

We are talking about calling an API here, people. Maybe what is behind the API seems magical and powerful, but it's just an API that takes some context, the number of tokens to generate, and a temperature setting.


Part of that diagram is about fine-tuning / retraining. Part is about managing canned prompts, since those matter enough to have their own development cycle. Part is about caching, since the fancier models are very expensive. Part is about filtering the output to not upset people, which is built into the hosted versions (currently called "AI safety"). Etc.

Doing everything in that diagram is probably overkill for most uses. But using it as a starting point and trimming what you don't need will help a bit with avoiding "oops, I forgot to include that".


Makes sense.

But caching, for instance, doesn't need to be its own lib, does it? I don't want "semantic caching"; I want to cache the exact same query, and I can do that without anything LLM-specific:

    from joblib import Memory

    memory = Memory("./llm_cache", verbose=0)  # disk-backed function cache

    @memory.cache
    def call_chat_completion_api_cached(max_tokens, messages, temperature):
        ...
I mean, I guess then I might want to store that somewhere central like Redis, and maybe slowly I'd need a specific cache tool. So I get your point. It's helpful to see the possibilities for approaching these problems.

But it also does feel like a land grab of supporting libs and infrastructure.


Most enterprises likely won't be able to use OpenAI since they have proprietary information, so setting up a good on-prem open-source LLM can be necessary. In fact, I am doing exactly that right now at my company.


You can create your own instances of the OpenAI models in Azure to keep proprietary data within the bounds of your tenant there.


Interesting, can you please provide a link to the relevant documentation? The closest I found is https://azure.microsoft.com/en-us/blog/chatgpt-is-now-availa... and https://learn.microsoft.com/en-us/azure/cognitive-services/o... but I wasn't able to find anything about the data-provisioning terms you mention.



Thank you! I believe you may be referring to this:

> Fine-tuned OpenAI models. The Fine-tunes API allows customers to create their own fine-tuned version of the OpenAI models based on the training data that they have uploaded to the service via the Files APIs. The trained fine-tuned models are stored in Azure Storage in the same region, encrypted at rest and logically isolated with their Azure subscription and API credentials. Fine-tuned models can be deleted by the user by calling the DELETE API operation.

My follow-up question is: how is this different from OpenAI's finetuning?


You don't necessarily have to load "proprietary info" into the model to generate valuable data.


Sure you could, and I bet there are a number of VC-backed startups that do only this.

But there are usually a lot of reasons why these architectures can be a useful reference. In my project, I host my own trained LLM, and one of the cost efficiencies comes from being able to cache at every step along the way. Then there is a large private media-hosting consideration.

There is room for all sorts of setups and I kind of liked how the article mapped out some of the common paths.


Building flexibility in helps pivot easier.


100% this. If you’re building apps that call LLMs, most of the magic is already in the model, and what you are concerned about is tracking inputs and outputs.

A browser database and React is all I have needed for my LLM apps.


> You can add instrumentation for observability around that like you would any other code.

I built my own last weekend: https://github.com/smol-ai/logger

It dumps things to JSON files, or to a log store. All you need for prompt engineering and monitoring, really! No VC needed; no "Datadog of AI" yet.


"This work is based on conversations with AI startup founders and engineers." - when you have millions in venture funding there is pressure from the VC, CEO and board to show some non-trivial "architecture"... telling them you have a prompt text file, redid and a bit of glue code might not go over that well (unfortunately)


Don't read it as "you should use these tools"; read it as "if you want this feature, here are some example tools that provide it".


> quickly build a robust feature with only

Much like how building Twitter is a weekend-sized project.


> Much like how building Twitter is a weekend-sized project.

More like a month, but yes. That's what we did - a month to launch, then another month to harden and add several features. Rolled out to all customers.

Building product features with LLMs is difficult, but not because of the architectural needs. It's an API you pass data to.


How can they make a sale to enterprise IT without a giant architecture diagram? The next thing you know, Java may rear its ugly head as well, just when we thought Python had eliminated it completely for these applications.


Simple or complex, API and workflow orchestration will always be a thing.


Still relatively simple. With the stack centered on the LLM, hopefully most of the actual "stack" work will be transferred inside the LLM. For example, if context size becomes unlimited, you could do away with vector DBs.


The accuracy degrades with a larger context size, as pointed out in the article itself:

> Claude offers fast inference, GPT-3.5-level accuracy, more customization options for large customers, and up to a 100k context window (though we’ve found accuracy degrades with the length of input).


Great starting point! These diagrams notably miss an LLM firewall layer, which in practice is critical for safe LLM adoption. Source: we work with thousands of users at logicloop.com/ai


What do you mean by firewall layer? What tools do you use here?


These common issues tend to prevent LLMs from being used in the wild:

- Data leakage

- Hallucination

- Prompt injection

- Toxicity

So yes, it does include prompt injection, but it is a bit broader. Data leakage is one that several customers have called out, i.e. accidentally leaking PII to the underlying models when asking them questions about your data.

I'm evaluating tools like Private AI, Arthur AI etc. but they're all fairly nascent.


I'm a researcher in the space, exploring a few ideas with the intention of starting up. I would love to reach out and talk to you. Is there a way I can contact you?

My email is beady.chap-0f@icloud.com


I imagine he's talking about preventing prompt injection (or making shit up)


Yup, that's part of it, but I mean it bidirectionally: users can accidentally leak data to models too, which is concerning to SecOps teams without a way to monitor / auto-redact.


That doesn't seem like the type of problem that can be solved with a drop-in solution.


I think we can detect at least a few things, like PII leaks. Don't you think those alone are valuable?
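Even a crude pre-filter catches the obvious cases. A toy sketch (nowhere near a real DLP product, and regexes alone won't cut it):

    import re

    # Toy patterns; real PII detection needs much more than regexes.
    PII_PATTERNS = {
        "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
        "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    }

    def redact(prompt):
        for label, pattern in PII_PATTERNS.items():
            prompt = pattern.sub("[REDACTED %s]" % label.upper(), prompt)
        return prompt

    print(redact("Contact john@example.com, SSN 123-45-6789"))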


No but that won't stop them from making a startup to sell you some snake oil that doesn't work!


Are VCs doing architecture now? Huh... architecture astronauts much.

Looks like they are playing catch-up and trying to stay relevant.

What happened to their web3 vision?


This feels exactly like what we did with full-stack engineering: recommending that everyone in the space needs all of this…


They mention the contextual stack is relatively underdeveloped. Any ideas on what could be improved there?


Building contexts for structured data used in AI / GPT tasks is something I've seen little written about, but it is obviously quite important.

Confluent calls it a "customer 360" problem [1] and I don't disagree.

We (Estuary) also wrote up a post showing an approach for Slack => ChatGPT => Google Sheets [2], and have more content coming for Salesforce, HubSpot, and some others.

[1] https://www.confluent.io/blog/chatgpt-and-streaming-data-for...

[2] https://estuary.dev/gpt-real-time-pipeline/


Does a16z invest in small-scale AI companies? Or are they only doing Series B+ investments?


They do. You can't be their size and not do everything.


Something not obvious to me with these VC diagrams: the memory tier being just vector DBs, vs. also including knowledge graphs.

Good: we're (of course) doing a lot of these architectures behind the scenes for louie.ai and client projects around that. Vector embeddings are an easy way to do direct recall for data that's bigger than context. As long as the user has a simple question that just needs to recall a text snippet that fairly directly overlaps with the question, vector embeddings are magical. Conversational memory for sharing DB queries across teammates, simple discussion of decades of PDF archives and internal wikis... amazing.

Not so good: what happens when the text data to answer your question isn't a direct semantic-search match away? "Why does Team X have so many outages?" => "What projects is Team X on?" + "Outages for those projects" + "Analysis for each outage". AFAICT, this gets into:

A. Failure: stick with query -> vector DB -> LLM summary and get the wrong answer over the wrong data.

B. AutoGPT: get into an AutoGPT-style LangChain loop that iteratively queries the vector DB, iteratively reasons over results, and iteratively plans until it finds what it wants. But AutoGPT seems to be more excitement than production use, with many open questions about speed, cost, and quality...

C. Knowledge graphs: use the LLM to generate a higher-quality knowledge graph of the data that is more receptive to LLM querying. The above question then becomes a simpler multi-hop query over the KG, so it's both fast and cost-effective... if you've indexed correctly and taught your LLM to generate the right queries.
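To illustrate C: in the toy case the multi-hop query is just a couple of lookups. A sketch over a dict-shaped KG with made-up entities and relations:

    # Toy knowledge graph: entity -> relation -> [entities]
    kg = {
        "Team X": {"works_on": ["Project A", "Project B"]},
        "Project A": {"has_outage": ["Outage 1"]},
        "Project B": {"has_outage": ["Outage 2", "Outage 3"]},
    }

    def hop(entities, relation):
        return [t for e in entities for t in kg.get(e, {}).get(relation, [])]

    # "Why does Team X have so many outages?" becomes two hops,
    # then the outage list is handed to the LLM for the analysis step.
    outages = hop(hop(["Team X"], "works_on"), "has_outage")
    print(outages)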

(Related: if you're into this kind of topic, we're hiring here to build out these systems and help use them with our customers in investigative areas like cyber, misinfo, and emergency response. See new openings at https://www.graphistry.com/careers !)


Out of interest, how do sentence embeddings work? I just got to the point of understanding what a transformer "does".

So you have token embeddings, but tokens are too small to be useful.

Is "what a sentence means" encoded as a vector once you have passed the embeddings through a transformer or two?


Yes -- this is a good article on it: https://txt.cohere.com/sentence-word-embeddings

As a black box, it is a generalization of word2vec to sequence2vec. For example, simply summing or averaging the word vectors in a sentence can give you a fast & cheap sentence embedding.
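As a toy illustration of the averaging idea (made-up 2-D vectors; real ones come from a trained model):

    import numpy as np

    word_vecs = {
        "the": np.array([0.1, 0.3]),
        "cat": np.array([0.8, 0.2]),
        "sat": np.array([0.4, 0.9]),
    }

    def naive_sentence_embedding(sentence):
        # Averaging is order-insensitive: "not good" and "good not" collapse
        # to the same vector, which is exactly the weakness noted below.
        vecs = [word_vecs[w] for w in sentence.split() if w in word_vecs]
        return np.mean(vecs, axis=0)

    print(naive_sentence_embedding("the cat sat"))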

But natural-language sentences have more structure than natural-language words. E.g., it matters precisely where "not" goes in a sentence. So a lot of impressive scientific experimentation went into making these models smarter, with many evolutions. Impressively, this is so black-boxed now that it doesn't much matter.

Implicit to my post here... that's powerful, and easy to use... but not necessarily a great knowledge representation for someone who wants good Q&A over enterprise-scale data. One of our customer scenarios: "What is known vs believed about incident X." We can index each paragraph as multiple sentence embeddings, so if any phrase matches a query, the full paragraphs can get thrown into GPT as part of our answer. Easy. However, if information in the paragraph may lead to wanting to get information from elsewhere in the system (mention of another team, project, incident, ...), that means either a Planning agent needs to then realize that and recursively generate more vector search queries (mini-AutoGPT)... or we need to index on more than the sentence embedding.

Again, super interesting problems, and we're hiring for folks interested in helping work on it!


Is anyone building vector DBs in-browser, possibly using WASM?


Any companies making vector databases for iOS or Android?


AKA one box and edge for every funded a16z startup




