Hi HN,
I am Jiayuan, and I'm here to introduce a tool we've been building over the past few months: Devv (https://devv.ai). In simple terms, it is an AI-powered search engine specifically designed for developers.
Now, you might ask, with so many AI search engines already available—Perplexity, You.com, Phind, and several open-source projects—why do we need another one?
We all know that Generative Search Engines are built on RAG (Retrieval-Augmented Generation)[1] combined with Large Language Models (LLMs). Most of the products mentioned above use indexes from general search engines (like Google/Bing APIs), but we've taken a different approach.
We've created a vertical search index focused on the development domain, which includes:
- Documents: These are essentially the single source of truth for programming languages or libraries; I believe many of you are users of Dash (https://kapeli.com/dash) or devdocs (https://devdocs.io/).
- Code: While not natural language, code contains rich contextual information. If you have a question related to the Django framework, nothing is more convincing than code snippets from Django's repository.
- Web Search: We still use data from search engines because these results contain additional contextual information.
Our reasons for doing this include:
- The quality of the index is crucial to the RAG system; its effectiveness determines the output quality of the entire system.
- We focus more on the index (the retrieval side of RAG) than on the LLM because LLMs evolve rapidly: a model that performs well today may be superseded by a better one within a few months, and fine-tuning an LLM is now relatively cheap.
- Everyone is still exploring what kind of LLM product works best, and we hope to contribute some different insights of our own. We also plan to open source parts of our underlying infrastructure as a way of giving back to the open source community.
Some brief product features:
- Three modes:
  - Fast mode: Offers quick answers within seconds.
  - Agent mode: For complex queries, where the Devv Agent first works out what your question is asking and then selects an appropriate way to solve it.
  - GitHub mode (currently in beta): Links directly with your own GitHub repositories so you can ask questions about specific codebases.
- Clean & intuitive UI/UX design.
- Currently available only as a web version, but a Chrome extension and a VSCode plugin are planned soon!
Some technical details on how we build our index:
- Documents: We crawl most documentation sources with scripts inspired by the devdocs project's crawler logic, slice the pages up at the function/symbol level, and then embed the slices into a vector database.
- Code: Code needs more than embeddings alone, so we developed custom parsers for each language. They extract the logical structure of a repository, such as its architectural layout, the calling relationships between functions, and definitions, and that structure is then processed semantically by an LLM.
- Web Search: We combine our own index, which targets developer-focused sites, with traditional search API results. We crawled relevant sites, including blogs, forums, and tech news outlets.
For the Agent Mode, we have actually developed a multi-agent framework. It first categorizes the user's query and then selects different agents based on these categories to address the issues. These various agents employ different models and solution steps.
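The categorize-then-dispatch flow can be sketched roughly as below. Everything here is an assumption for illustration: the category names, the keyword-based `classify` function (a stand-in for whatever classifier is really used), and the placeholder agents, which in a real system would each wrap a different model and set of solution steps.

```python
from typing import Callable, Dict, Tuple


def classify(query: str) -> str:
    """Toy keyword classifier; stands in for a real query classifier."""
    q = query.lower()
    if any(word in q for word in ("error", "traceback", "exception")):
        return "debug"
    if any(word in q for word in ("how do i", "how to", "example")):
        return "howto"
    return "general"


# Placeholder agents: each just describes what it would do with the query.
def debug_agent(query: str) -> str:
    return f"[debug agent] search issues and source code for: {query}"


def howto_agent(query: str) -> str:
    return f"[howto agent] search documentation for: {query}"


def general_agent(query: str) -> str:
    return f"[general agent] run a web search for: {query}"


AGENTS: Dict[str, Callable[[str], str]] = {
    "debug": debug_agent,
    "howto": howto_agent,
    "general": general_agent,
}


def route(query: str) -> Tuple[str, str]:
    """Categorize the query, then hand it to the agent for that category."""
    category = classify(query)
    return category, AGENTS[category](query)
```

Keeping the category-to-agent mapping in a plain dictionary means a new agent can be added without touching the routing logic.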
Future Plans:
- Build a more comprehensive index that includes internal context (the Devv for Teams version will support indexing team repositories, documents, and issue trackers for Q&A).
- Fully local: All of the above technologies can be run locally, so your code and queries stay on your own machine for privacy and security.
Devv is still in its very early stages and can be used without logging in. We welcome everyone to try it and give us feedback on any issues; we will continue to iterate on it.
[1]: https://arxiv.org/abs/2005.11401
https://devv.ai/search?threadId=dl3rtxmcsruo
EDIT: The syntax came from a language proposal in a GitHub issue from 8 years ago, so I guess it's not fully hallucinated. But it's still not the best choice of source to use.