I wanted to share a walkthrough with you all that breaks building an AI chat app over your own data into a simple process of a few commands, using SuperDuperDB and MongoDB Atlas.
At SuperDuperDB (an open-source project) we built and deployed an AI chatbot that answers questions about technical documentation. You can check it out here: https://www.question-the-docs.superduperdb.com/
A typical implementation of such a chat application involves a fairly long sequence of steps: converting the text data in your database to vectors, setting up a vector index for efficient vector lookup, establishing an endpoint for an LLM such as OpenAI's, setting up another endpoint that converts a question to a vector, finding the documents relevant to the question via vector search, and sending those documents to the LLM as context.
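To make those steps concrete, here is a rough, dependency-free sketch of the manual pipeline. Everything in it is an assumption for illustration: embed is a toy word-count stand-in for a real embedding endpoint, search is a brute-force stand-in for a vector index, the two documents are made up, and the final LLM call is left out so that only the prompt assembly is shown.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(question, docs, k=1):
    """Brute-force vector search: embed the question, rank docs by similarity."""
    q_vec = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_docs):
    """Assemble the text that would be sent to the LLM endpoint."""
    context = "\n".join(context_docs)
    return f"Answer using only this content:\n{context}\n\nQuestion: {question}"


# Made-up example documents, standing in for rows in your database
docs = [
    "The company changed strategy this year to focus on cloud products.",
    "The office cafeteria menu is updated every Monday.",
]
question = "Why did the company change strategy?"
prompt = build_prompt(question, search(question, docs, k=1))
```

Even in toy form, each stage (embedding, indexing, search, prompt assembly) is a separate piece you would have to build, host, and keep in sync yourself.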
But with SuperDuperDB and MongoDB Atlas you can complete these steps in a more streamlined way.
Here's a quick look at how easy this is.
Connect MongoDB and OpenAI with SuperDuperDB:
from superduperdb.db.base.build import build_datalayer
from superduperdb import CFG
import os

ATLAS_URI = "mongodb+srv://<user>@<atlas-server>/<database_name>"
OPENAI_API_KEY = "<your-open-ai-api-key>"

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

CFG.data_backend = ATLAS_URI
CFG.vector_search = ATLAS_URI

db = build_datalayer()
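If you would rather not hardcode credentials, one option is to read them from the environment before assigning them to CFG. This is a small sketch and not part of SuperDuperDB: the helper require_env and the use of ATLAS_URI as an environment variable name are hypothetical choices made for this example.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demo values only, so the sketch runs on its own; in practice you would
# export these in your shell instead of setting them in code.
os.environ.setdefault("ATLAS_URI", "mongodb+srv://<user>@<atlas-server>/<database_name>")
os.environ.setdefault("OPENAI_API_KEY", "<your-open-ai-api-key>")

ATLAS_URI = require_env("ATLAS_URI")
OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```

The two values can then be assigned to CFG.data_backend and CFG.vector_search exactly as in the snippet above.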
Set up a vector index:
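# Collection is used below but was not imported in the original snippet;
# this import path is an assumption for this SuperDuperDB version
from superduperdb.db.mongodb.query import Collection
from superduperdb.container.vector_index import VectorIndex
from superduperdb.container.listener import Listener
from superduperdb.ext.openai.model import OpenAIEmbedding

# The MongoDB collection that holds the documents to index
collection = Collection('documents')

db.add(
    VectorIndex(
        identifier='my-index',
        indexing_listener=Listener(
            model=OpenAIEmbedding(model='text-embedding-ada-002'),
            # 'txt' is the document field whose text gets embedded
            key='txt',
            select=collection.find(),
        ),
    )
)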
Here the model that creates the vectors is OpenAIEmbedding, but you can swap it out: options range from the Cohere API and Hugging Face transformers to sentence-transformers and self-built models in torch.
The Listener component tracks incoming data and computes vectors for it as it arrives, while the VectorIndex connects user queries with the computed vectors and the model. Adding this nested component to db activates both components and prepares them for vector search.
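As a mental model only (this is not SuperDuperDB's implementation), the listener idea can be sketched in plain Python: every insert triggers the embedding model, so the stored vectors stay in sync with the data. The names ToyCollection and toy_model are made up for this sketch, and the "model" is just a character and word counter standing in for a real embedding model.

```python
class ToyCollection:
    """In-memory stand-in for a collection with one listener attached."""

    def __init__(self, model, key):
        self.model = model   # callable: text -> vector
        self.key = key       # document field to embed, like key='txt'
        self.docs = []
        self.vectors = []    # vectors[i] belongs to docs[i]

    def insert(self, doc):
        """Store the document and compute its vector right away."""
        self.docs.append(doc)
        self.vectors.append(self.model(doc[self.key]))


def toy_model(text):
    """Placeholder 'embedding': [number of characters, number of words]."""
    return [len(text), len(text.split())]


toy_collection = ToyCollection(model=toy_model, key="txt")
toy_collection.insert({"txt": "hello world"})
toy_collection.insert({"txt": "vector search with MongoDB Atlas"})
```

The point of the pattern is that nobody has to remember to re-run an embedding job: new documents are searchable as soon as they land.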
Add a question-answering component:
from superduperdb.ext.openai.model import OpenAIChatCompletion

chat = OpenAIChatCompletion(
    model='gpt-3.5-turbo',
    prompt=(
        'Use the following content to answer this question\n'
        'Do not use any other information you might have learned\n'
        'Only base your answer on the content provided\n'
        '{context}\n\n'
        'Here\'s the question:\n'
    ),
)

db.add(chat)
This single command sets up an OpenAI-hosted LLM to work in tandem with MongoDB Atlas. You can modify the prompt; the format variable '{context}' marks where the results of the vector search are inserted.
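To illustrate the '{context}' mechanism, the snippet below fills the same template with Python's built-in str.format. How SuperDuperDB joins the retrieved documents and appends the question internally is not shown in this post, so joining with newlines and the helper render_prompt are assumptions for the sake of the example, and the two context strings are made up.

```python
PROMPT = (
    'Use the following content to answer this question\n'
    'Do not use any other information you might have learned\n'
    'Only base your answer on the content provided\n'
    '{context}\n\n'
    'Here\'s the question:\n'
)


def render_prompt(template, context_docs, question):
    """Insert the retrieved texts at {context}, then append the question."""
    context = '\n'.join(context_docs)
    return template.format(context=context) + question


# Made-up context strings, standing in for vector-search results
rendered = render_prompt(
    PROMPT,
    ['Doc A: revenue fell in Q1.', 'Doc B: the board approved a new plan.'],
    'Why did the strategy change?',
)
print(rendered)
```

Seeing the rendered text makes it easier to tune the wording of the instructions around the context block.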
Now, when you have a question about your data, you can dig into your MongoDB documents with the help of AI!
input = 'Explain to me the reasons for the change of strategy in the company this year.'

response, context = db.predict(
    'gpt-3.5-turbo',
    input=input,
    context=collection.like({'txt': input}, vector_index='my-index').find(),
)
This command runs the vector-search query passed in the 'context' parameter and adds the results to the prompt, so the LLM bases its answer on the relevant documents in your MongoDB database.
We hope you find this method helpful!