Reor is a really interesting project with admirable goals. I believe this is just the beginning, but you have already done a great job!
I have been working on my note-taking application (https://github.com/dvorka/mindforger) for some time and wanted to go in the same direction. However, I gave up (for now). I used ggerganov/llama.cpp to host LLM models locally on a CPU-only machine with 32GB RAM, and used them for both RAG and note-taking use cases (like https://www.mindforger.com/index-200.html#llm). It did not work well for me: the performance was poor (high hardware utilization, long response times, failures, and crashes) and the actual responses were rarely useful (off-topic and impractical answers, hallucinations). I tried Llama 2 7B with 4-bit quantization and a couple of similar models. Although I'm not happy about it, I switched to an online commercial LLM because it performs really well in terms of response quality, speed, and affordability. I frequently use the LLM integrated in my note-taking app, as it is useful for many things.
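For context, my setup was roughly the following - a minimal sketch using the llama-cpp-python bindings for illustration (the model file name and parameters here are examples, not my exact configuration):

```python
# Rough sketch of a CPU-only llama.cpp setup via the llama-cpp-python
# bindings. The model path is a placeholder for a 4-bit quantized GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-7b.Q4_K_M.gguf",  # hypothetical 4-bit quantized model
    n_ctx=2048,     # context window
    n_threads=8,    # on a CPU-only box, thread count is the main knob
)

out = llm(
    "Summarize the following note:\n...",
    max_tokens=256,
    temperature=0.2,  # conservative sampling
)
print(out["choices"][0]["text"])
```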
Anyway, Reor "only" uses the locally hosted LLM in the generation phase of RAG, which is a nicely constrained use case. I believe that a really lightweight LLM - I'm thinking of a tiny base model fine-tuned for summarization - could be the way to go (fast, and far less prone to hallucination); see the sketch below. I'm really curious whether you have any suggestions, or will have any in the future!
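Just to illustrate the idea (a rough sketch; t5-small is only a stand-in for a purpose fine-tuned model, and retrieved_chunks is assumed to come from the retrieval step):

```python
# Sketch: a tiny summarization-tuned model as the RAG generation step.
from transformers import pipeline

summarizer = pipeline("summarization", model="t5-small")

# Chunks returned by similarity search over the notes (assumed here).
retrieved_chunks = ["...note chunk 1...", "...note chunk 2..."]
context = "\n".join(retrieved_chunks)

summary = summarizer(context, max_length=120, min_length=30, do_sample=False)
print(summary[0]["summary_text"])
```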
As for the vector DB, considering the resource-related problems I mentioned earlier, I was thinking of something like facebookresearch/faiss, which, unlike LanceDB, is not a fully-fledged vector DB - just a similarity-search library. Have you run any experiments with similarity-search projects or vector DBs? I would be interested in the trade-offs, similar to those between small/large/hosted LLMs.
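To make the comparison concrete, this is the kind of thing I have in mind - a minimal faiss sketch where the index lives entirely in-process (the dimension and the random vectors are placeholders for real note embeddings):

```python
import faiss
import numpy as np

d = 384  # embedding dimension, e.g. from a small sentence embedding model
index = faiss.IndexFlatL2(d)  # exact search; a library, not a database server

# Placeholder for embeddings of note chunks (faiss expects float32).
note_embeddings = np.random.rand(1000, d).astype("float32")
index.add(note_embeddings)

query = np.random.rand(1, d).astype("float32")
distances, ids = index.search(query, 5)  # top-5 nearest note chunks
```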
Overall, I think that RAG with my personal notes as the corpus, combined with a locally hosted general-purpose LLM for the use cases I mentioned above, can take personal note-taking apps to a new level. This is the way! ;)
Good luck with your project!