Emerging architectures for LLM applications (a16z.com)
255 points by makaimc on June 20, 2023 | 95 comments



I am an AI researcher. Most actual AI researchers and engineers use very few of these tools; the only ones are model providers like the OpenAI API and the public clouds (AWS, Azure, GCP). The rest are infra-centric tools whose importance a16z is highly incentivized to over-inflate.


This does look like the sort of complex ecosystem that emerges after an inflection point and before consolidation happens. It reminds me of adtech in the early 2010s.

That said, while much of this might not have any real traction long-term, looking at what researchers use seems to miss the mark a bit. It’s like saying network technology researchers aren’t using Vercel.


There are some other useful ones in there. Hugging Face jumps out. W&B. I haven't used Mosaic, but I could see myself using it for bigger projects; I know of at least two PIs at Stanford using them.


I think the diagram was just meant to be comprehensive. The writeup itself doesn’t imply all the tool nonsense.


Yes, because you're not working on applications.


This blog post is not about AI researchers — it's about developers who make products out of LLMs.


> So, agents have the potential to become a central piece of the LLM app architecture (or even take over the whole stack, if you believe in recursive self-improvement). [...] There's only one problem: agents don't really work yet.

I really appreciate that they called out and separated some hype vs. practice, specifically with regard to agents. This is something I keep hoping works better than it does, and in practice every attempt I've made in this direction has led to disappointment.


What vector DB are you using? What is the data structure you're vectorizing? What is your chunk size? Have you implemented memory? What prompt or technique are you using (ReAct, CoT, few-shot, etc.)? Are you only using vector DBs? Do you use sequential chains? Does it need tools? Depending on your data, business case, and what output you expect from the agent, there is no one-size-fits-all.
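To make a few of those knobs concrete, here's a minimal sketch of the retrieval side with everything inlined; the chunk size, top-k, and model name are placeholder assumptions to tune, not recommendations (OpenAI Python API as of mid-2023):

    import numpy as np
    import openai

    CHUNK_SIZE = 500  # characters per chunk; one knob among many
    TOP_K = 4         # how many chunks make it into the prompt

    def embed(texts):
        # text-embedding-ada-002 was the common default at the time
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
        return np.array([d["embedding"] for d in resp["data"]])

    def top_chunks(question, document):
        chunks = [document[i:i + CHUNK_SIZE] for i in range(0, len(document), CHUNK_SIZE)]
        scores = embed(chunks) @ embed([question])[0]  # ada vectors are ~unit norm
        return [chunks[i] for i in np.argsort(scores)[::-1][:TOP_K]]

Every constant above is a judgment call that depends on your data.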


Ha, I had the exact same reaction

I feel like lots of papers are getting published and reviewed, which is good, as bad ideas don't get to propagate for ages.


This mirrors my experience as well. I've tried to use them for pretty straightforward support agent type tasks and found that they very often go down wrong paths trying to solve the problem.


Similarly for retrieval-augmented LLM agents: they break down very quickly once the question isn't directly addressed in the documents.


I am so glad that top VCs are thinking along these lines, of architectures that incorporate AI as part of the flow.

We've spun off a company to realize the vision of bringing an open-source, standardized framework to the PHP ecosystem, where we've been building apps for communities for over a decade. It's "AI for the rest of us", but at the same time promoting positive collaboration between communities and AI. It also involves micropayments for tasks done by either an AI or human agent.

If you're a VC or an expert in the space, I'd love to get feedback on this: https://engageusers.ai/ecosystem.pdf

And if you want to get involved in any capacity, whether as an investor or developer, please email me greg at the domain engageusers.ai -- this time around we are planning to take on venture capital funding for this project, and syndicate a round later this summer.


This blog post is way more complex than it needs to be. A lot of what most people are doing with LLMs right now boils down to using vector databases to provide the "best" info/examples to your prompt. This is a slick marketing page, but I'm not sure what they think they're providing beyond that.
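Concretely, the pattern is little more than this (a toy, with hypothetical retrieved chunks):

    # Chunks retrieved from the vector DB get pasted straight into the prompt.
    retrieved = ["Refunds are processed within 5 days.", "Contact support via the portal."]
    question = "How long do refunds take?"
    prompt = (
        "Answer using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(retrieved) + "\n\n"
        "Question: " + question
    )
    print(prompt)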


This is what a16z does. A few years ago it was the "Modern Data Stack" and a few years before that it was "DevOps." For some reason, venture capitalists really like making these fancy charts to describe the obvious, and then mostly ignoring them during their investment decisions (or sometimes they make the investments, then they make the charts and put their portfolio companies in the boxes).


To protect their downside risk by seeding the broader marketplace with a narrative that later-stage investors and acquirers will be influenced by when making decisions to invest or acquire.


The greater fool theory?


At this point I don’t think it’s fair to call it a theory.


Bit of self-promotion, but Milvus (https://milvus.io) is another open-source vector database option (I have a pretty good idea as to why it isn't listed in a16z's blog post). We also have milvus-lite, a pip-installable package that exposes the same API, for folks who don't want to stand up a local service.

    pip install milvus
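Roughly, the embedded server then starts in-process like this (a sketch of the mid-2023 API, not exact docs):

    # milvus-lite runs Milvus inside the Python process; pymilvus connects to it
    from milvus import default_server
    from pymilvus import connections

    default_server.start()
    connections.connect(host="127.0.0.1", port=default_server.listen_port)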
Other than that, it's great to see the shout-out for Vespa.


Appreciate you sharing; will try Milvus.

The vector database space is the Wild West. Keep at it!


Hell yeah! Feel free to reach out if you need any help.


Enjoying you guys' podcast!


Why do you think it's not featured? Has a16z funded many of those companies, lol? And somehow rejected Milvus?


The guys at Milvus raised a total of $113M according to Crunchbase, second only to Pinecone, which is funded by a16z. You're not going to highlight the main competitor of one of your portfolio companies.


The post mentions six alternatives to Pinecone. The reality is Milvus isn't as relevant today as it was a year ago. I'm from Pinecone but I'll give credit where it's due: Weaviate, Chroma, and Qdrant completely lapped Milvus in the open-source space. That's why they got mentioned and Milvus didn't.


Interesting to see that the word "generative" does not appear in this blog post (apart from the tags). Six months ago Generative AI was all the rage: https://a16z.com/2023/01/19/who-owns-the-generative-ai-platf...

I think this is a very well-articulated breakdown of the "LLM Core, Code Shell" (https://www.latent.space/p/function-agents#%C2%A7llm-core-co...) view of the world, but relegating the agents stuff to a three-paragraph "what about agents?" piece at the end undersells its potential. The emerging architecture of "Code Core, LLM Shell", decentralizing and specializing the role of the LLM, will hopefully get more airtime in the December a16z landscape chart!


Hi @swyx, Thanks for the kind words!

We actually purposefully left that part a bit sparse because we have something else coming up on the topic! I'm sure we will be chatting through it soon :)


This and other end-to-end architectures are offered in deepset/haystack, one of the best and most mature frameworks for working with LLMs (it predates the GPT craze) and doing retrieval augmentation, etc.

I do feel the article presents old concepts as "emerging".

if you are curious about building something quickly, you can jump into one of the tutorials https://haystack.deepset.ai/tutorials

Over a weekend I used deepset/haystack to build a Q&A engine over open-source communities' Slack and Discord threads that can potentially have an answer - it was a joy and a breeze to implement. If you have a question about Metaflow, K8s, Golang, Deepset, Deep Java Library, or some other tech - try asking your quick question on https://www.kwq.ai :-)
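For anyone curious what that looks like, the extractive Q&A setup is only a few lines. A sketch following the shape of the Haystack 1.x tutorials (document contents and query are placeholders):

    from haystack.document_stores import InMemoryDocumentStore
    from haystack.nodes import BM25Retriever, FARMReader
    from haystack.pipelines import ExtractiveQAPipeline

    store = InMemoryDocumentStore(use_bm25=True)
    store.write_documents([{"content": "Example Slack thread text goes here."}])

    pipeline = ExtractiveQAPipeline(
        reader=FARMReader(model_name_or_path="deepset/roberta-base-squad2"),
        retriever=BM25Retriever(document_store=store),
    )
    result = pipeline.run(query="How do I deploy Metaflow on K8s?",
                          params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 3}})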


Thanks for the mention and being part of Haystack's community :)


Microsoft guidance is legit and useful. It's a bunch of prompting features piled on top of Handlebars syntax. (And it has its own caching: set temp to 0 and it caches. No need for LLM-specific caching libs :) )

https://github.com/microsoft/guidance
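From memory of the mid-2023 API, usage looks roughly like this (prompt contents made up):

    import guidance

    # One global LLM; with temperature=0, repeated runs hit guidance's cache.
    guidance.llm = guidance.llms.OpenAI("text-davinci-003")

    # Handlebars-style template: plain variables in, {{gen}} slots filled by the LLM.
    program = guidance("Q: {{question}}\nA: {{gen 'answer' temperature=0 max_tokens=50}}")

    print(program(question="What is the capital of France?")["answer"])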


Yes, but I've found LMQL equally impressive. It has much better documentation than Guidance.


How prescient is the "Hidden Technical Debt" [1] paper from ~8 years ago compared to this? See the top of p. 4 for a figure that I've personally found useful in explaining all the pieces necessary to put together a reasonable app using ML/DL (up until today, anyway).

I see all the same bits called out:

- Data collection

- Machine /resource management

- Serving

- Monitoring

- Analysis

- Process mgt

- Data verification

There are some new concepts that aren't quite captured in the original paper, though, like the "playground".

I've kind of been expecting a follow-up that shows an update to that original paper.

[1] https://proceedings.neurips.cc/paper_files/paper/2015/file/8...


> "For devs who see every database-shaped hole and try to insert Postgres"

I see sub-second performance on >1M vectors with pgvector. Vector databases have a place, but this statement seems disingenuous at best. Bringing on a vector database adds complexity, and a giant chunk of use cases simply don't need it. Not to mention the additional latency you'd be adding.
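For reference, the whole pgvector setup fits in a handful of statements. A sketch using psycopg2, with a made-up table and a stand-in query vector:

    import psycopg2

    conn = psycopg2.connect("dbname=app")
    cur = conn.cursor()
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute("CREATE TABLE IF NOT EXISTS docs "
                "(id bigserial PRIMARY KEY, body text, embedding vector(1536))")
    # ivfflat trades recall for speed; 'lists' is the main tuning knob
    cur.execute("CREATE INDEX IF NOT EXISTS docs_idx ON docs "
                "USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)")

    query_vec = [0.0] * 1536  # stand-in for a real embedding
    # <=> is pgvector's cosine-distance operator
    cur.execute("SELECT body FROM docs ORDER BY embedding <=> %s::vector LIMIT 5",
                (str(query_vec),))
    print(cur.fetchall())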


>In-context learning solves this problem with a clever trick: instead of sending all the documents with each LLM prompt, it sends only a handful of the most relevant documents.

I don't even think this is a correct definition of "in-context learning". In-context learning is a type of few-shot learning in which examples of input/output pairs are provided as part of the prompt. The idea is that the model is able to "learn" the pattern of the task from the examples. Quoting from the GPT-3 paper:

>what we call “in-context learning”, using the text input of a pretrained language model as a form of task specification: the model is conditioned on a natural language instruction and/or a few demonstrations of the task and is then expected to complete further instances of the task simply by predicting what comes next.

I really don't think it's standard to refer to the process of embedding-based retrieval as "in-context learning".


I was taken aback by the terminology too. Isn’t Retrieval-Augmented Generation the standard nomenclature for this pattern? https://arxiv.org/abs/2005.11401


It's still "in-context learning" as per the GPT-3 definition, because they are supplying some demonstrations of the task in the prompt.

The only special thing is that the input for each demonstration is obtained through embedding-based retrieval.


Is the emerging architecture made out to be more complicated than what most companies are currently building? Perhaps! But this is most likely the general direction things will trend towards as the auxiliary ecosystem matures.

Shameless plug: For fellow Ruby-ists we're building an orchestration layer for building LLM applications, inspired by the original, Langchain.rb: https://github.com/andreibondarev/langchainrb


Reading the comments, it seems like we need better human-agent interaction tools

Many are frustrated about not being able to better direct the agents

It's like the agents have certain pre-learned things they can do, but they aren't really learning how to apply those things to the environments their human operators want them to work in.

Or at least it is not easy or straightforward to teach the model new tricks.


100% agree with this.

The Agent & Tools metaphor makes a ton of sense for its simplicity, but has yet to scale beyond very simple agents.

I’m long on it as a programming model, but a lot of work is needed.


I was looking at various indexing solutions to solve search and clustering problems with TerminusDB for clients. When I compared those solutions against embeddings from LLMs, the LLMs were just far easier to work with and got much better results. I believe traditional text indexing will die quickly, as will a lot of the entity resolution and traditional clustering methods, replaced completely by LLMs. We found them so compelling we wrote our own open-source vector database sidecar: https://github.com/terminusdb-labs/terminusdb-semantic-index...


Oh, I should probably mention a blog I wrote describing how it works: https://github.com/terminusdb/technical-blogs/blob/main/blog...


It's a good article for people without a strong background in NLP or LLMs. It gives a comprehensive overview of how this could be applicable for startups.

For enterprise, not quite so: there is a lot of other stuff to consider that the post misses, like ethical application, filtering, and security, points that are very important for enterprise customers.

Also, in-context learning is just one way to apply LLMs; there are more, like few-shot learning and fine-tuning, depending on the cost and application involved, as I've highlighted here:

https://twitter.com/igorcosta/status/1671316499179667456


I agree with the excerpt on agents. Reliability and reproducibility of task completion is the biggest problem for agents to cross the chasm to real-life use cases. When agents are given an objective, they reason about the next best action from first principles (or from scratch) every time, and the agent trajectory ends up becoming more of a linguistic dance. But we are solving some of these agent-specific problems at SuperAGI https://github.com/TransformerOptimus/SuperAGI (disclaimer: I'm the creator) by doing agent trajectory fine-tuning using recursive instructions. Think of the objective as telling the agent to go from A to B, and the instructions as giving it directions about the route. These instructions can be self-created after every run and fed into subsequent runs to improve the trajectory.

The other problem with agents: most independent agents are capable of handling only a very thin slice of a use case, but for complex knowledge-work tasks, more often than not, one agent is not enough. You need a team of agents. We introduced the concept of Agent Clusters, which operate in a master-slave architecture and coordinate among themselves via shared memory and a shared task list to complete nuanced tasks.

Another big bottleneck, I think, is the lack of a notion of Knowledge for agents. We have LTM and STM, but knowledge is a specialized understanding of a particular class of objectives (e-commerce customer support, account-based marketing, medical diagnostics for a particular condition, etc.) plugged into the agent. Currently agents lean on the knowledge available in the LLMs. LLMs are great for intelligence, but not necessarily for the knowledge required for an objective. So we added the concept of knowledge: an embedding plugged into the agent apart from LTM/STM.

There are a lot of other challenges that need to be solved, like agent performance monitoring, agent-specific models, and agent-to-agent communication, to truly solve for agents deployed in production. I'm not sure about the point in the article that they might even take over the entire stack, because autonomous agentic behaviour is good for certain use cases, not for all kinds of apps.


Qdrant should be in the list of vector DBs.


I think sidecar vector databases that work with existing DBs will become more prevalent than pure vector DBs. I also think the vector & graph combo on highly interconnected data will have additional benefits for those building a wide range of LLM applications. A good example is the VectorLink architecture within TerminusDB [1], which is based on Hierarchical Navigable Small World graphs, written in Rust.

[1] https://github.com/terminusdb-labs/terminusdb-semantic-index...


With everyone writing about LLMs, and no time to read even 1% of it all, is the reason to read a16z its technical/analytical merit, or the investment-pumping angle?


In the future we will see a ton of similar charts borrowing elements from graph theory and signal theory. There's no limit on the number of different LLM multi-agent systems.


This sounds quite interesting, could you expand on it? What is some of the low-hanging fruit in your opinion? Do you have any examples of projects that are explicitly building on top of these ideas?


The space of possible GPT-4 outputs is hard to comprehend.

The space of possible different "graphs" of LLM agents connected to each other is even larger.

Each graph represents a multi-agent system.

Here's a generic syntax for notating graphs that don't have loops (essentially trees):

  AgentName: Descriptive Name
  Goals:
    - Goal1
    - Goal2
    ...
  Techniques:
    - Instruction1
    - Instruction2
    ...
  Inputs:
    - From AgentName: Description of input
    - From OtherAgentName: Description of input
    ...
  Outputs:
    - To AgentName: Description of output
    - To OtherAgentName: Description of output
    ...

  -> SubAgentName1: Descriptive Name
  Goals:
    - Goal1
    - Goal2
    ...
  Techniques:
    - Instruction1
    - Instruction2
    ...
  Inputs:
    - From AgentName: Description of input
    - From OtherAgentName: Description of input
    ...
  Outputs:
    - To AgentName: Description of output
    - To OtherAgentName: Description of output
    ...

  -> SubSubAgentName1: Descriptive Name
  Goals: 
    - Goal1
    - Goal2
  Techniques:
    - Instruction1
    - Instruction2
  Inputs:
    - From SubAgentName1: Description of input
  Outputs:
    - To SubAgentName1: Description of output
  ...

  -> SubSubAgentName2: Descriptive Name
  Goals: 
    - Goal1
    - Goal2
  Techniques:
    - Instruction1
    - Instruction2
  Inputs:
    - From SubAgentName1: Description of input
  Outputs:
    - To SubAgentName1: Description of output
  ...
  -> SubAgentName2: Descriptive Name
  Goals:
    - Goal1
    - Goal2
    ...
  Techniques:
    - Instruction1
    - Instruction2
    ...
  Inputs:
    - From AgentName: Description of input
    ...
  Outputs:
    - To AgentName: Description of output
    ...

---

Here's an example researcher agent and its interior using the syntax (originally written in Finnish in my notes):

  Research:
  Goals:
    - Produce and analyze inquiries
    - Build new research
    - Explore unknown topics
    - Reason from inquiries
  Technical instructions:
    - Instruction / rule for reasoning 1
    - Instruction / rule for reasoning 2
  Inputs:
    - From agent Meta-awareness: Suggestions
    - From agent Subresearch: Results
  Outputs:
    - To agent Subresearch: Commands
    - To agent Memory: Research

  -> Subresearch:
  Goals:
    - Carry out specific research tasks
  Technical instructions:
    - Instruction / rule for following commands 1
    - Instruction / rule for following commands 2
  Inputs:
    - From agent Research: Commands
  Outputs:
    - To agent Research: Results

  -> Memory:
  Goals:
    - Maintain a record of the research
  Techniques:
    - Instruction / rule for using memory stores 1
    - Instruction / rule for using memory stores 2
  Inputs:
    - From agent Research: Research
  Outputs: None

---

Signal theory becomes relevant when thinking about I/O, embedded agency, and when the agents aren't / cannot be constantly "reading" each other.

---

For similar projects: the current AutoGPT-style systems are very primitive and haven't adopted these ideas. If what I call the cognitive architectures of LLM multi-agent systems were carefully designed, which I predict will become a thing (and the subject of a ton of future research!), our AI systems could gain very advanced cognitive capabilities, perhaps even approaching humans, but in their own formal manner.

One person suggested this to me:

https://princeton-nlp.github.io/SocraticAI/

I haven't read it, but it seems to have similarities.



I appreciate the detailed response. I'll look into the SocraticAI project as well.

FWIW I asked because I'm working on a toolkit for applying Monte Carlo tree search to agent-graph generation, and I'm always on the lookout for fundamental insights that could help direct its development.


I'm available via XMPP and email if you want to talk (addresses in profile).


I second the request for you to expand on that.....


I covered the subject during a Python Atlanta talk last month. There isn't much that's new at the moment, mostly because an LLM can be considered a software agent. That may change soon as things become more complex, though. Things like AWS's Kendra show there are some new patterns in the pipeline.

I'll say this post is rather shallow to be considered technical, or even to fit the title.


Ugh. Of course the Enterprise Architecture rears its ugly head here.

Just here to say that you can quickly build a robust feature with only OpenAI's APIs, Redis, a versioned text file for the prompt you parameterize, and a little bit of glue code (no LangChain). You can add instrumentation for observability around that like you would for any other code.
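For the skeptical, here's roughly the whole thing as a sketch; file names and cache keys are made up, and the OpenAI call uses the 2023-era ChatCompletion API:

    import hashlib
    import openai
    import redis

    r = redis.Redis()

    def answer(prompt_file, **params):
        # The versioned prompt template is just a text file in the repo.
        prompt = open(prompt_file).read().format(**params)
        key = "llm:" + hashlib.sha256(prompt.encode()).hexdigest()
        cached = r.get(key)
        if cached is not None:
            return cached.decode()
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        text = resp["choices"][0]["message"]["content"]
        r.set(key, text)
        return text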

I would wager that most enterprise use cases don't need most of the tools listed in this article, and using them is complete overkill.


Exactly!

We are talking about calling an API here, people. Maybe what is behind the API seems magical and powerful, but it's just an API that takes some context, the number of tokens to generate, and a temperature setting.


Part of that diagram is about fine-tuning / retraining. Part is about managing canned prompts, since those matter enough to have their own development cycle. Part is about caching, since the fancier models are very expensive. Part is about filtering the output to not upset people, which is built into the hosted versions (currently called "AI safety"). Etc.

Doing everything in that diagram is probably overkill for most uses. But using it as a starting point and trimming what you don't need will help a bit with avoiding "oops, I forgot to include that".


Makes sense.

But caching, for instance, doesn't need to be its own lib, does it? I don't want "semantic caching"; I want to cache the exact same query, and I can do that without anything LLM-specific:

    from joblib import Memory

    memory = Memory("./llm_cache", verbose=0)  # disk-backed function cache

    @memory.cache
    def call_chat_completion_api_cached(max_tokens, messages, temperature):
        ...
I mean, I guess then I might want to store that somewhere central like Redis, and maybe slowly I'd need a specific cache tool. So I get your point. It's helpful to see the possibilities for approaching these problems.

But it also does feel like a land grab of supporting libs and infrastructure.


Most enterprises likely won't be able to use OpenAI since they have proprietary information, so setting up a good on-prem open-source LLM can be necessary. In fact, I am doing exactly that right now at my company.


You can create your own instances of the OpenAI models in Azure to keep proprietary data within the bounds of your tenant there.


Interesting, can you please provide a link to the relevant documentation? The closest I found is https://azure.microsoft.com/en-us/blog/chatgpt-is-now-availa... and https://learn.microsoft.com/en-us/azure/cognitive-services/o... but I wasn't able to find anything about the data-provisioning terms you mention.



Thank you! I believe you may be referring to this:

> Fine-tuned OpenAI models. The Fine-tunes API allows customers to create their own fine-tuned version of the OpenAI models based on the training data that they have uploaded to the service via the Files APIs. The trained fine-tuned models are stored in Azure Storage in the same region, encrypted at rest and logically isolated with their Azure subscription and API credentials. Fine-tuned models can be deleted by the user by calling the DELETE API operation.

My follow-up question is: how is this different from OpenAI's finetuning?


You don't necessarily have to load "proprietary info" into the model to generate valuable data.


Sure you could, and I bet there are a number of VC-backed startups that do only this.

But there are usually a lot of reasons why these architectures can be a useful reference. In my project, I host my own trained LLM, and one of the cost efficiencies comes from being able to cache at every step along the way. Then there is a large private media-hosting consideration.

There is room for all sorts of setups and I kind of liked how the article mapped out some of the common paths.


Building flexibility in helps pivot easier.


100% this. If you’re building apps that call LLMs, most of the magic is already in the model, and what you are concerned about is tracking inputs and outputs.

A browser database and React is all I have needed for my LLM apps.


> You can add instrumentation for observability around that like you would any other code.

I built my own last weekend: https://github.com/smol-ai/logger

It dumps things to JSON files, or to a log store. All you need for prompt engineering and monitoring, really! No VC needed; no "Datadog of AI" yet.


"This work is based on conversations with AI startup founders and engineers." - when you have millions in venture funding there is pressure from the VC, CEO and board to show some non-trivial "architecture"... telling them you have a prompt text file, redid and a bit of glue code might not go over that well (unfortunately)


Don't read it as "you should use these tools"; read it as "if you want this feature, here are some example tools that provide it".


> quickly build a robust feature with only

Much like how building Twitter is a weekend-sized project.


> Much like how building Twitter is a weekend-sized project.

More like a month, but yes. That's what we did - a month to launch, then another month to harden and add several features. Rolled out to all customers.

Building product features with LLMs is difficult, but not because of the architectural needs. It's an API you pass data to.


How can they make a sale to enterprise IT without a giant architecture diagram? The next thing you know, Java may rear its ugly head as well, just when we thought Python had eliminated it completely for these applications.


Simple or complex, API and workflow orchestration will always be a thing.


Still relatively simple. With the stack centered on the LLM, hopefully most of the actual "stack" work will be transferred inside the LLM. For example, if context size becomes unlimited, you could do away with vector DBs.


The accuracy degrades with a larger context size, as pointed out in the article itself:

> Claude offers fast inference, GPT-3.5-level accuracy, more customization options for large customers, and up to a 100k context window (though we’ve found accuracy degrades with the length of input).


Great starting point! These diagrams notably miss an LLM firewall layer, which in practice is critical for safe LLM adoption. Source: we work with thousands of users at logicloop.com/ai


What do you mean by firewall layer? What tools do you use here?


These common issues tend to prevent LLMs from being used in the wild:

- Data leakage

- Hallucination

- Prompt injection

- Toxicity

So yes, it does include prompt injection, but it is a bit broader. Data leakage is one that several customers have called out, i.e. accidentally leaking PII to the underlying models when asking them questions about your data.

I'm evaluating tools like Private AI, Arthur AI etc. but they're all fairly nascent.


I'm a researcher in the space, exploring a few ideas with the intention of starting up. I would love to reach out and talk to you. Is there a way I can contact you?

My email is beady.chap-0f@icloud.com


I imagine he's talking about preventing prompt injection (or making shit up)


Yup, that's part of it, but I mean it bidirectionally: users can accidentally leak data to models too, which is concerning to SecOps teams without a way to monitor / auto-redact.


That doesn't seem like the type of problem that can be solved with a drop-in solution.


I think we can detect at least a few things, like PII leaks. Don't you think those alone are valuable?
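Even a crude pre-filter catches the obvious cases. A toy sketch (nowhere near a real DLP product, and regexes alone won't cut it):

    import re

    # Toy patterns; real PII detection needs much more than regexes.
    PII_PATTERNS = {
        "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
        "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    }

    def redact(prompt):
        for label, pattern in PII_PATTERNS.items():
            prompt = pattern.sub("[REDACTED %s]" % label.upper(), prompt)
        return prompt

    print(redact("Contact john@example.com, SSN 123-45-6789"))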


No but that won't stop them from making a startup to sell you some snake oil that doesn't work!


Are VCs doing architecture now? Huh... architecture astronauts much.

Looks like they are playing catch-up and trying to stay relevant.

What happened to their web3 vision?


This feels exactly like what we did with full-stack engineering: recommending that everyone in the space needs all of this…


They mention the contextual stack is relatively underdeveloped. Any ideas on what could be improved there?


Building contexts for structured data used in AI / GPT tasks is something I've seen little written about, but it is obviously quite important.

Confluent calls it a "customer 360" problem [1] and I don't disagree.

We (Estuary) also wrote up a post showing an approach for Slack => ChatGPT => Google Sheets [2], and have more content coming for Salesforce, HubSpot, and some others.

[1] https://www.confluent.io/blog/chatgpt-and-streaming-data-for...

[2] https://estuary.dev/gpt-real-time-pipeline/


Does a16z invest in small-scale AI companies? Or are they only doing Series B+ investments?


They do. You can't be their size and not do everything.


Something not obvious to me with these VC diagrams: the memory tier being just vector DBs, vs. also including knowledge graphs.

Good: we're (of course) doing a lot of these architectures behind the scenes for louie.ai and client projects around that. Vector embeddings are an easy way to do direct recall for data that's bigger than context. As long as the user has a simple question that just needs to recall a text snippet that fairly directly overlaps with the question, vector embeddings are magical. Conversational memory for sharing DB queries across teammates, simple discussion of decades of PDF archives and internal wikis... amazing.

Not so good: what happens when the text data to answer your question isn't a direct semantic-search match away? "Why does Team X have so many outages?" => "What projects is Team X on?" + "Outages for those projects" + "Analysis for each outage". AFAICT, this gets into:

A. Failure: stick with query -> vector DB -> LLM summary and get the wrong answer over the wrong data.

B. AutoGPT: get into an AutoGPT-style LangChain loop that iteratively queries the vector DB, iteratively reasons over results, and iteratively plans until it finds what it wants. But AutoGPT seems to be more excitement than production use, with many open questions about speed, cost, and quality...

C. Knowledge graphs: use the LLM to generate a higher-quality knowledge graph of the data that is more receptive to LLM querying. The above question then becomes a simpler multi-hop query over the KG, so it's both fast and cost-effective... if you've indexed correctly and taught your LLM to generate the right queries.
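To illustrate C: in the toy case the multi-hop query is just a couple of lookups. A sketch over a dict-shaped KG with made-up entities and relations:

    # Toy knowledge graph: entity -> relation -> [entities]
    kg = {
        "Team X": {"works_on": ["Project A", "Project B"]},
        "Project A": {"has_outage": ["Outage 1"]},
        "Project B": {"has_outage": ["Outage 2", "Outage 3"]},
    }

    def hop(entities, relation):
        return [t for e in entities for t in kg.get(e, {}).get(relation, [])]

    # "Why does Team X have so many outages?" becomes two hops,
    # then the outage list is handed to the LLM for the analysis step.
    outages = hop(hop(["Team X"], "works_on"), "has_outage")
    print(outages)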

(Related: if you're into this kind of topic, we're hiring here to build out these systems and help use them with our customers in investigative areas like cyber, misinfo, and emergency response. See new openings at https://www.graphistry.com/careers !)


Out of interest, how do sentence embeddings work? I just got to the point of understanding what a transformer "does".

So you have token embeddings, but tokens are too small to be useful.

Is "what a sentence means" encoded as a vector once you have passed the embeddings through a transformer or two?


Yes -- this is a good article on it: https://txt.cohere.com/sentence-word-embeddings

As a black box, it is a generalization of word2vec to sequence2vec. For example, simply summing or averaging the word vectors in a sentence can give you a fast & cheap sentence embedding.
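As a toy illustration of the averaging idea (made-up 2-D vectors; real ones come from a trained model):

    import numpy as np

    word_vecs = {
        "the": np.array([0.1, 0.3]),
        "cat": np.array([0.8, 0.2]),
        "sat": np.array([0.4, 0.9]),
    }

    def naive_sentence_embedding(sentence):
        # Averaging is order-insensitive: "not good" and "good not" collapse
        # to the same vector, which is exactly the weakness noted below.
        vecs = [word_vecs[w] for w in sentence.split() if w in word_vecs]
        return np.mean(vecs, axis=0)

    print(naive_sentence_embedding("the cat sat"))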

But natural-language sentences have more structure than natural-language words. E.g., it matters precisely where "not" goes in a sentence. So a lot of impressive scientific experimentation went into making these models smarter, with many evolutions. Impressively, this is so black-boxed now that it doesn't much matter.

Implicit to my post here... that's powerful, and easy to use... but not necessarily a great knowledge representation for someone who wants good Q&A over enterprise-scale data. One of our customer scenarios: "What is known vs believed about incident X." We can index each paragraph as multiple sentence embeddings, so if any phrase matches a query, the full paragraphs can get thrown into GPT as part of our answer. Easy. However, if information in the paragraph may lead to wanting to get information from elsewhere in the system (mention of another team, project, incident, ...), that means either a Planning agent needs to then realize that and recursively generate more vector search queries (mini-AutoGPT)... or we need to index on more than the sentence embedding.

Again, super interesting problems, and we're hiring for folks interested in helping work on it!


Is anyone building vector DBs in-browser, possibly using WASM?


Any companies making vector databases for iOS or Android?


AKA one box and edge for every funded a16z startup




