Hi HN,
A few of our team members at Airbyte (and Joe, who killed it!) recently built an internal support chatbot, using Airbyte, LangChain, Pinecone, and OpenAI, that answers the questions we run into when developing a new connector for Airbyte.
As we prototyped it, we realized it could be applied to many other use cases and data sources, so... we created a tutorial that other community members can leverage [http://airbyte.com/tutorials/chat-with-your-data-using-opena...] and a GitHub repo to run it [https://github.com/airbytehq/tutorial-connector-dev-bot]
The tutorial shows:
- How to extract unstructured data from a variety of sources using Airbyte Open Source
- How to load data into a vector database (here Pinecone), preparing the data for LLM usage along the way
- How to integrate a vector database into ChatGPT to ask questions about your proprietary data
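In case it helps to see the shape of the last two steps, here's a rough sketch using the 2023-era LangChain APIs. The chunk sizes, index name, and Pinecone environment are placeholders (not what the tutorial uses), and it assumes `langchain`, `pinecone-client`, and `openai` are installed with API keys in the environment:

```python
import os


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping chunks so each piece fits
    comfortably in the embedding model's context window."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


def build_qa_chain(docs: list[str], index_name: str = "connector-docs"):
    """Embed chunked docs into Pinecone and wire the index into a QA chain.

    Assumes OPENAI_API_KEY / PINECONE_API_KEY are set; the Pinecone
    environment below is a placeholder for your own.
    """
    import pinecone
    from langchain.chains import RetrievalQA
    from langchain.chat_models import ChatOpenAI
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import Pinecone

    pinecone.init(api_key=os.environ["PINECONE_API_KEY"],
                  environment="us-east1-gcp")  # placeholder environment

    # Chunk every document, then embed and upsert into the Pinecone index.
    texts = [chunk for doc in docs for chunk in chunk_text(doc)]
    vectorstore = Pinecone.from_texts(texts, OpenAIEmbeddings(),
                                      index_name=index_name)

    # "stuff" simply packs the retrieved chunks into the prompt.
    return RetrievalQA.from_chain_type(llm=ChatOpenAI(),
                                       chain_type="stuff",
                                       retriever=vectorstore.as_retriever())


# Usage (with credentials configured):
#   qa = build_qa_chain([open("connector-guide.md").read()])
#   qa.run("How do I test a new connector?")
```

The overlap between chunks is there so an answer spanning a chunk boundary still lands intact in at least one retrieved piece.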
I hope some of it is useful, and would love your feedback!
https://python.langchain.com/docs/integrations/llms/ollama
This can be a great option if you'd like to keep your data local rather than sending it to a cloud LLM, with the added benefit of saving costs if you're submitting many questions in a row (e.g. in batches).
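For anyone curious what that swap looks like, here's a minimal sketch of the LangChain Ollama integration. The model name is a placeholder, and it assumes `langchain` is installed and `ollama serve` is running locally with that model pulled:

```python
def ask_local(question: str, model: str = "llama2") -> str:
    """Send a question to a locally running Ollama server via LangChain.

    Placeholder model name; requires `ollama serve` on localhost.
    """
    from langchain.llms import Ollama

    llm = Ollama(model=model)
    return llm(question)


# Usage (with an Ollama server running):
#   print(ask_local("What is a vector database?"))
```

For the batching case mentioned above, LangChain LLMs also expose a `generate()` method that takes a list of prompts, so you can push many questions through the local model in one call.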