Show HN: I made an app to use local AI as daily driver (recurse.chat)
637 points by xyc 6 months ago | 237 comments
Hi Hackers,

Excited to share a macOS app I've been working on: https://recurse.chat/ for chatting with local AI. While it's amazing that you can run AI models locally quite easily these days (through llama.cpp / llamafile / ollama / llm CLI etc.), I missed a feature-complete chat interface. Tools like LMStudio are super powerful, but there's a learning curve to them. I'd like to hit a middle ground of simplicity and customizability for advanced users.

Here's what sets RecurseChat apart from similar apps:

- UX designed for you to use local AI as a daily driver. Zero config setup, supports multi-modal chat, chat with multiple models in the same session, link your own gguf file.

- Import ChatGPT history. This is probably my favorite feature. Import your hundreds of messages, search them, and even continue previous chats using local AI offline.

- Full text search. Search for hundreds of messages and see results instantly.

- Private and capable of working completely offline.

Thanks to the amazing work of @ggerganov on llama.cpp which made this possible. If there is anything that you wish to exist in an ideal local AI app, I'd love to hear about it.




> Thanks to the amazing work of @ggerganov on llama.cpp which made this possible. If there is anything that you wish to exist in an ideal local AI app, I'd love to hear about it.

The app looks great! Likewise, if you have any requests or ideas for improving llama.cpp, please don't hesitate to open an issue / discussion in the repo


Oh wow it's the goat himself, love how your work has democratized AI. Thanks so much for the encouragement. I'm mostly a UI/app engineer, total beginner when it comes to llama.cpp, would love to learn more and help along the way.


Wow I've been following your work for a while, incredible stuff! Keep up the hard work, I check llama.cpp's commits and PRs very frequently and always see something interesting in the works (the alternative quantization methods and Flash Attention have been interesting).


Did not expect to see the Georgi Gerganov here :) How is GGML going?

Поздрави! (Cheers!)


So far it's going great! Good community, having fun. Many ideas to explore :-)


Nothing to add except that your work is tremendous


> Full Text Search. Blazingly fast search over thousands of messages.

Natural language processing has come full circle and just reinvented Ctrl+F.

I had to double check that a regular '90s search function was actually the thing being advertised here, and sure enough, there is a gif demonstrating exactly that.


Ctrl+F only gets you so far. It doesn't allow you to perform semantic searches, for example. If you don't happen to know a unique word (or set of words) to search for, you're out of luck.

Just the other day, I was able to find a song by typing the phonetic pronunciation (well, as best I could) into ChatGPT, and it knew which song I was talking about right away. No way a regular search engine would've helped me there.


No. Your own data only gets you so far. And this is exactly the issue. No local model will make sense because the dataset it's given is so small compared to what you are referring to - ChatGPT.

It's useless locally.


I’ve been using 7b models to work with large text volumes (like entire books) with nothing short of phenomenal results. It has cut the time I need to accomplish many tasks by >90% and often offers insights I might easily have missed. My methodology requires a bit of time and compute to prepare a new subject matter expert system, but the results are absolutely worth it.


Would you care to share how you are technically doing this? Please


If you only need to query your own data, and can't upload that data to ChatGPT for compliance/security reasons, I think local LLMs are far from useless.


I have been downvoted here, which is fine. But no one took the time to elaborate on where I am wrong? Tell me where I am wrong, please.


Yes, but this is missing from most chatbot UIs (ChatGPT, Gemini, Claude, etc.) and is therefore very useful. Machato has it but it is very laggy.


I'm a big fan of ctrl+f, but semantic search is a life saver that conventional search simply cannot compare to.


Yeah, I think the callout here is specifically because the ChatGPT interface doesn't have a search feature (on web). Interestingly, on their iOS app, you can search.

I often find myself opening the app on my phone if I want to find a previous conversation, even if I'm at my desk.


The search feature on ChatGPT for Android only works for a tiny number of recent chats.


I'm more interested in how to perform semantic search over messages efficiently, i.e. to receive a reference back to the original message. Is it creating an LLM response with the potential content, and how does it find the original message? Is it performing a TF-IDF + cosine search after that, or how?


and yet ChatGPT doesn't support it.


I will totally pay for something like this if it answers from my local documents, bookmarks, browser history etc.


There are already several RAG chat open source solutions available. Two that immediately come to mind are:

Danswer

https://github.com/danswer-ai/danswer

Khoj

https://github.com/khoj-ai/khoj


Stupid question but what does RAG stand for?


Retrieval-augmented generation. In short, you use an LLM to classify your documents (or chunks from them) up front. Then, when you want to ask the LLM a question, you pull the most relevant ones back to feed it as additional context.


I don't get it. To my understanding, it takes huge amounts of data to build any form of RAG, simply because it enlarges the statistical model you later prompt. If the model is not big enough, how would you expect it to answer you in a qualified manner? It simply can't.

So I don't really buy it, and I have yet to see it work better than any RDBMS search index.

Tell me I am wrong; I would like to see a local model based on my own docs give me quality answers to quality prompts.


RAG doesn't require much data or involve any training, it is a fancy name for "automatically paste some relevant context into the prompt"

Basically, if you have a database of three emails and ask when Biff wanted to meet for lunch, a RAG system would select the most relevant email based on any kind of search (embeddings are most fashionable) and create a prompt like

"""Given this document: <your email>, answer the question "When does Biff want to meet for lunch?"""


That's not how RAG works. What you're describing is something closer to prompt optimization.

Sibling comment from discordance has a more accurate description of RAG. There's a longer description from Nvidia here: https://blogs.nvidia.com/blog/what-is-retrieval-augmented-ge...


Right, you read something nebulous about how "the LLM combines the retrieved words and its own response to the query into a final answer it presents to the user", and you think there is some magic going on, and then you click one link deeper and read at https://ai.meta.com/blog/retrieval-augmented-generation-stre... :

> Given the prompt “When did the first mammal appear on Earth?” for instance, RAG might surface documents for “Mammal,” “History of Earth,” and “Evolution of Mammals.” These supporting documents are then concatenated as context with the original input and fed to the [...] model

Finding the relevant context to put in the prompt is a search problem. Nearest neighbour search on embeddings is one basic way to do it, but the singular focus on "vector databases" is a bit of a hype phenomenon IMO - a real-world product should factor a lot more than just pure textual content into the relevancy score. Or is your personal AI assistant going to treat emails from yesterday as equally relevant as emails from a year ago?
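
As an illustration of factoring recency in, here's a rough Python sketch; the 30-day half-life and the 0.8/0.2 weights are arbitrary assumptions, and the similarity argument is whatever embedding (or other) score you already compute:

    import time

    def relevance(similarity, doc_timestamp, now=None, half_life_days=30.0):
        # similarity: e.g. cosine similarity between query and document embeddings.
        now = now if now is not None else time.time()
        age_days = (now - doc_timestamp) / 86400.0
        recency = 0.5 ** (age_days / half_life_days)  # weight halves every 30 days
        return 0.8 * similarity + 0.2 * recency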


Legit explanation, that's how it works AFAIK.


RAG:

1. First you create embeddings from your documents

2. Store that in a vector db

3. Ask what the user wants and do a search in the vector db (cosine similarity etc)

4. Feed the relevant search results (the retrieved chunks of the documents) to your LLM as context and generate the response as usual (see the sketch below)
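
To make those steps concrete, here's a minimal sketch in Python. The embed() and chat() callables are hypothetical stand-ins for whatever embedding model and local LLM you run, and the "vector db" is just a list, which is enough to show the idea:

    import math

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def answer(question, docs, embed, chat, k=3):
        # Steps 1-2: embed the documents and keep them in a (vector, text) "store".
        index = [(embed(d), d) for d in docs]
        # Step 3: rank documents by cosine similarity to the question.
        q = embed(question)
        top = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)[:k]
        # Step 4: paste the retrieved chunks into the prompt and generate as usual.
        context = "\n---\n".join(text for _, text in top)
        prompt = f"Given these documents:\n{context}\n\nAnswer the question: {question}"
        return chat(prompt)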


Although RAG is often implemented via vector databases to find 'relevant' content, I'm not sure that's a necessary component. I've been doing what I call RAG by finding 'relevant' content for the current prompt context via a number of different algorithms that don't use vectors.

Would you define RAG only as 'prompt optimisation that involves embeddings'?


Sure thing, your RAG approach sounds intriguing, especially since you're sidestepping vector databases. But doesn't the input context length cap affect it? (ChatGPT Plus at 32K [0] or GPT-4 via the OpenAI API at 128K [1].) Seems like those cases would be pretty rare though.

[0]: https://openai.com/chatgpt/pricing#:~:text=8K-,32K,-32K

[1]: https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turb...


Yes, context window is a limiting factor, but that's true however you identify the content to augment generation.


You're misunderstanding. Imagine your query is matched against chunks of text from the database, where the relevance of the information is evaluated for the window each time it slides. The n most relevant chunks are then collected and included in the prompt, so the LLM can provide answers from the source documents verbatim. This is useful for cases where precise and exact answers are needed - for example, searching the docs of some package for the right API to call. You don't want a name that's close to right; it has to be correct to the character.
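
A rough Python sketch of that sliding-window idea, where score() stands in for whatever relevance measure you use (embedding similarity, BM25, ...); the window sizes are illustrative:

    def top_chunks(text, query, score, window=1000, overlap=200, n=3):
        # Slide a fixed-size window over the text so the exact wording is preserved.
        step = window - overlap
        chunks = [text[start:start + window]
                  for start in range(0, max(len(text) - overlap, 1), step)]
        # Keep the n chunks most relevant to the query; these go into the prompt verbatim.
        return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:n]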


Ahh ok I see. It's basically what MS CoPilot 365 does too, with its "Grounding" step.


Yes.


This. There was a post on HN last week, IIRC, referring to just such a solution called Zenfetch (?). I would have adopted it in a heartbeat, but they don't currently have a means of exporting the source data you feed to it (should you elect it as your sole means of bookmarking, etc.)


Hey there,

This is Gabe, the founder of Zenfetch. Thanks for sharing. We're putting together an export option where you can download all your saved data as a CSV and should get that out by end of week.


Seems like this would be a good tool to build lessons on - if you could share a "class" and export a link for others to then copy the class and expand on the lesson/class/topic into their own AI. But as a separate "class" and not fully integrated into my regular history blob?

I want the ability to search all my downloaded files and organize them based on context within. Have it create a category table, and allow me to "put all pics of my cat in this folder, and upload them to a gallery on imgur."


We're working on the ability to share folders of your knowledge so that others can search/chat across them.

We've been thinking of this as a "subscription" to the creator's folder. Similar to how you might subscribe to a Spotify playlist


Consider using tar files for this. There's lots of tooling (versioning, hashing, storage) around them already, and Docker layers come to mind.


Or an RSS feed?


Yes, this would be the next big focus. Personal data connectivity is where I see local AI excelling, despite differences in model power.


I have doubts about that. Most personal data actually lives in the cloud these days. If you need your Gmail emails, you'll need to use their API, which is guarded behind a $50k certification fee or so. I think there is a simpler version for personal use, but you still need to get the API key. Who's going to teach their mom about API keys? So I think for a lot of these data sources you'll end up with enterprise AIs integrating them first for a seamless experience.


Why wouldn't you be able to use IMAP over the Gmail API? IMAP returns the text and headers of all your emails, which is what you'd want the LLM to ingest anyway.
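
For what it's worth, a bare-bones sketch with Python's standard imaplib, no Gmail API involved; the address and credentials are placeholders, and for Gmail you'd authenticate with an app password (or OAuth, as discussed below):

    import email
    import imaplib

    def fetch_recent(user, app_password, n=10):
        imap = imaplib.IMAP4_SSL("imap.gmail.com")
        imap.login(user, app_password)
        imap.select("INBOX")
        _, data = imap.search(None, "ALL")
        ids = data[0].split()[-n:]  # the newest n message ids
        messages = []
        for msg_id in ids:
            _, parts = imap.fetch(msg_id, "(RFC822)")
            msg = email.message_from_bytes(parts[0][1])
            messages.append((msg["Subject"], msg["From"]))  # headers (and body) to ingest
        imap.logout()
        return messages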


Seconding a sibling question: What $50k API fee? To access your Gmail? I've been using Gmail since 2008 or so without ever touching their web/app interface or getting an API key. You just use it as an IMAP server.


To use Google's sensitive APIs in production you have to certify your product, and that costs tens of thousands. To be honest, I didn't think about IMAP at first, but it looks like that could be getting tougher soon too: https://support.google.com/a/answer/14114704?hl=en. Soon they will require OAuth for IMAP, and with OAuth you'll need the certification: https://developers.google.com/gmail/imap/xoauth2-protocol. If it's for personal use, you might be able to get by with just some warnings in the login flow, but it won't be easy to get the OAuth flow set up in the first place.


Yeah, Thunderbird integrated OAuth in the last few releases, mainly to keep up with the Gmail and Hotmail requirements. They made it very user-friendly to set up in the GUI right within T-bird. I don't see this being a major obstacle.

I'm not sure I can imagine a scenario in production where Google would, or should, allow API access to individual gmail accounts. What's that for? So you can read all your employees' mail without running your own email server?


I'm not sure what you mean.

> You will no longer use a password for access (with the exception of app passwords)

I'm not seeing anywhere that I'd need to pay money to use OAuth via an app like Thunderbird or another email client. That app would either need to support using OAuth to let the user auth and get credentials, or use an app password.


Right, but Thunderbird had to pay up and set themselves as a middleman to allow this. My point is that local LLMs might not have that many advantages for personal data, because most of that data doesn't live locally on your computer to begin with. I guess an argument could be made that running them locally prevents an AI provider from gobbling up ALL of your data. On the other hand, Google already has most of my data: emails, YouTube, Gmail, etc.


I think this is a good take. While there's a big enough niche for personal data stored locally, I'd love it if there were a way to solve for email/cloud data requiring API keys.


Ideally, though, a sufficiently smart LLM shouldn't need API access. It could navigate to your social media login page, supply your credentials, and scrape what it sees. Better yet, it should just reverse-engineer the API ;)


What?

I manage both gmail and protonmail via thunderbird - where I have better search and sort using IMAP.


Good to know there's a market for that. Currently building out something. Integrating from numerous sources, processing and then utilizing those.

nice.


Yeah, we’re getting closer to “Her”


I would even let it have longer processing times for queries to apply against each document in my system, allow it to specialize/train itself on a daily basis…

Use all the resources you want if you save me brainpower


Agree, there's a non real-time angle to this.


"give me a summary of the news around this topic each morning for my daily read"

Help me plan for upcoming meetings whereby if I put something in calendar, it will build a little dossier for the event, and include relevant info based on the type of event or meeting, mostly scheduling reminders or prompting you with updates or changes to the event etc.


“filter out baby pictures from my family text threads”



Next version of MacOS will probably have that.


As long as you use Safari for browsing, Notes for note taking, iCloud for mail …


https://news.ycombinator.com/item?id=38787892 ("Show HN: Rem: Remember Everything (open source)") ?

https://github.com/jasonjmcghee/rem


Plus one. I would love to configure a folder of markdown/txt (and eventually image and PDF) files that this can have access to. Ideally it could RAG over them in a sensible way. Would love to help support this!


Thank you! I'd love to learn more about your use cases. Would you mind sending an email to feedback@recurse.chat or DM me on https://x.com/chxy to get the conversation started?


I use paperless-ngx for that.


Thank you for the work.

Please take this in a nice way: I can't see why I would use this over ChatbotUI+Ollama https://github.com/mckaywrigley/chatbot-ui

Seems the only advantage is having it as a macOS native app, and the only real distinction is maybe fast import and search - I've yet to try that though.

ChatbotUI (and other similar stuff) are cross-platform, customizable, private, debuggable. I'm easily able to see what it's trying to do.


Not everyone is a dev


HN users keep forgetting that


Dangerously close to the old famous Dropbox comment to the original ShowHN post


Thanks for sharing ChatbotUI. While I'm not an author, I use it extensively and contribute to it. Thanks to the permissive license, I could offer ChatbotUI as a hosted solution with our API keys. https://labs.writingmate.ai.


Hey, I bought it, nice work!

A few things:

* The main thing that makes ChatGPT's UI useful to me is the ability to change any of my prompts in the conversation & it will then go back to that part of the conversation and regenerate, while removing the rest of the conversation after that point.

Such a chat ui is not usable for me without this feature.

* The feedback button does nothing for me, just changes focus to chrome.

* The LLaVA model tells me that it can not generate images since it is a text based AI model. My prompts were "Generate an image of ..."


> * The main thing that makes ChatGPT's UI useful to me is the ability to change any of my prompts in the conversation & it will then go back to that part of the conversation and regenerate, while removing the rest of the conversation after that point.

Agreed, but what I would also really like (from this and ChatGPT) would be branching: take a conversation in two different ways from some point and retain the separate and shared history.

I'm not sure what the UI should be. Threads? (like mail or Usenet)


ChatGPT does this. You just click an arrow and it will show you other branches.


I have ChatGPT 4, and I have no idea what arrow you are talking about. Could you be more specific? I see no arrow on any of my previous messages or current ones.


By George, ItsMattyG is right! After editing a question (with the "stylus"/pen icon), the revision number counter that appears (e.g. "1 / 2") has arrows next to it that allow forward and backward navigation through the new branches.

This was surprisingly undiscoverable. I wonder if it's documented. I couldn't find anything from a quick look at help.openai.com .


Careful what you trust on help.openai.com. You used to be able to share conversations; now it's login-walled when you share, and the docs don't reflect this. (If someone can recommend a frontend that has this functionality, for quick sharing of conversations with others via a link, I'm taking recommendations - thank you in advance.)


I have a very simple UI with threading. It's really unpolished though.

https://eimi.cns.wtf/

https://github.com/python273/eimi


Nice suggestion! Threading / branching won't be too crazy to support. I'll explore ChatGPT style branch or threads and see what'll work better.


1000 upvotes for you. My brain can't compute why someone hasn't made this, along with embeddings-based search that doesn't suck.


They did make it, in 2021. https://generative.ink/posts/loom-interface-to-the-multivers... (click through to the GitHub repo and check the commit history, the bulk of commits is at least 3 years old)


I bet UI and UX innovation will follow, but model quality is the most important thing.

If I were OpenAI, I would put 95% of resources into ChatGPT 5 and 5% into UX.

Once the dust settles, if humanity still exists, and human customers are still economically relevant, AI companies will shift more resources to UX.


I understand your point, but my take is that when we talk about AI and its impact, we're talking about the entire system: the model, and what is buildable with the model. To me, the gains available from doing innovative stuff w/ what we're colloquially calling "UI" exceed, by a bunch, what the next model will unlock. But perhaps the main issue is that whatever this amazing UI might provide, it's not protectable in the way the model is. So maybe that's the answer.


Thank you for the support and the valuable feedback! Sorry about the response time; I hadn't expected the incoming volume of requests.

* For changing the prompt in the middle - I'll take a crack at it this week. It's at the top of my post-launch list.

* Feedback button: Thanks for reporting this. The button was supposed to open the default email client to email feedback@recurse.chat

* LLaVA model: I'll add more documentation. You are right, LLaVA cannot generate images. It can only describe images (similar to GPT-4V). Image generation is not supported in the app, and while I don't have immediate plans for it, check out these projects for local image generation.

- https://diffusionbee.com/

- https://github.com/comfyanonymous/ComfyUI

- https://github.com/AUTOMATIC1111/stable-diffusion-webui


> The LLaVA model tells me that it can not generate images since it is a text based AI model.

Because it can't generate images, it can only describe images provided by the user.


So there are a few questions that leap out at me:

  * What are you using for image generation? Is that local as well (stable diffusion?) Does it have integrated prompt generation? 


  * You mention the ability to import ChatGPT history, are you able to import other documents?

  * How many "agent"-style capabilities does it have? Can it search the web? Use other APIs? Prompt itself? 

  * Does it have a plugin framework? you mention that it is "customizable" but that can mean almost anything. 

  * What is the license? what assurances do users have that their usage is private? I mean, we all know how many "local" apps exfiltrate a ton of data.


> What are you using for image generation?

It doesn't look like it supports image generation unfortunately. If it did then I would definitely adopt this as my daily driver.


Cool, instant buy for me. A few little suggestions:

- Make the system font (San Francisco) an option for the UI. Maybe even SF Mono as an option as well?

- A little more help about which model to use for beginners would be nice. Maybe just an intro screen telling you how to get going.

- Would be great if Command-comma opened settings, like most Mac apps.

- Would be great if clicking web links opened Safari (or my preferred browser), rather than a small window that loads nothing!


Thank you! and thanks so much for the feature suggestions:

- Make the system font (San Francisco) an option for the UI. Maybe even SF Mono as an option as well?

Reasonable request! Won't be too hard to add

- A little more help about which model to use for beginners would be nice. Maybe just an intro screen telling you how to get going.

Yes, a better onboarding wizard would definitely make this easier for beginners. I don't have much capacity right now, but I'll keep this in mind.

- Would be great if Command-comma opened settings, like most Mac apps.

Nice suggestion. Will probably get to this when I add some keyboard shortcuts like new chat / search etc.

- Would be great if clicking web links opened Safari (or my preferred browser), rather than a small window that loads nothing!

Ah, that's odd - it's supposed to open the link. Which link do you have, if you don't mind sharing? (Feel free to email support@recurse.chat)


What are the macOS and hardware requirements? How does it perform on a slightly older, lower-powered Mac? I wish I could test this to see how it would perform, and while it's only $10, I don't want to spend that just to realize it won't work on my older, underpowered Mac mini.


Good question, I'll put some system requirements on the website. It only supports Macs with Apple Silicon now, if that's helpful.


Instant buy, great work and the price point is exactly right. Good luck!


Appreciate your support. Thank you so much!


Possibly a strange question, but do you have plans to add online models to the app? Local models just aren't at the same level, but I would certainly appreciate a consistent chat interface that lets me switch between GPT/Claude/local models.


You could try out Prompta [1], which I made for this use case. Initially created to use OpenAI as a desktop app, but can use any compatible API including Ollama if you want local completions.

[1]: https://github.com/iansinnott/prompta


This one doesn't seem to support system prompts, which are absolutely essential for getting useful output from LLMs.


You can update the system prompt in the settings. Admittedly this is not mentioned in the README, but is customizable.


> the system prompt

There isn't a singular system prompt. It really does matter!

Copy the OpenAI playground, you'll thank yourself later


Fair point, and it's not implemented that way currently. It's more like "custom instructions" but thanks for pointing that out. I haven't used multiple system prompts in the OpenAI playground either, so I hadn't given it much thought.


You use multiple system prompts in a single chat? What for?


I've run into the same problem with deploying Gemini locally; it does not seem to support system prompts. I've cheated around this by auto-prepending the system prompt to the user prompt, and then deleting it from the user-displayed prompt again.
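
A tiny sketch of that workaround, assuming OpenAI-style message dicts; the display layer would keep showing only user_prompt:

    def build_messages(system_prompt, user_prompt, supports_system_role):
        if supports_system_role:
            return [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ]
        # Fallback: fold the instructions into the first user turn.
        return [{"role": "user", "content": f"{system_prompt}\n\n{user_prompt}"}]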


Can you speak more to this? I get useful output from LLMs all the time, but never use system prompts. What am I missing?


Sure, I use one system prompt template to make ChatGPT be more concise. Compare these two: https://sharegpt.com/c/fEZKMIy vs https://sharegpt.com/c/S2lyYON

I use similar ones to get ChatGPT to be more thorough or diligent as well. From my limited experience with local models, this type of system prompting is even more important than with ChatGPT 4.


Is there a difference in using a system prompt and just pasting the "system prompt" part at the beginning of your message?


Haven't tested, but having it built-in is more convenient, and convenience is why I'm using these tools in the first place (as a replacement for StackOverflow, for example).


Not strange at all! It's a very valid ask. The focus is local AI, but GPT-3.5/GPT-4 are actually included in the app (bring your own key), although customization is limited. Planning to expose some more customizability there including API base urls / model names.


Shameless plug: if you need multiple AI Service Provider, give BoltAI[0] a try. It’s native (not Electron), and supports multiple services: OpenAI, Azure OpenAI, OpenRouter, Mistral, Ollama…

It also allows you to interact with LLMs via multiple different interfaces: a Chat UI, a context-aware feature called AI Command, and an Inline mode.

[0]: https://boltai.com



...how did you highlight a specific sentence like that?


Looks like a Chromium-specific feature: https://web.dev/articles/text-fragments

Pretty cool. Doesn't work on Firefox.


It just worked on Safari on iOS. That’s pretty impressive.



I read the website for 30 seconds and instantly bought it.

It's clean, easy to use, and works really well! Easy local server hosting was cool, too. I've used the other LLM apps, and this feels like those, but simplified. It just feels good to use. I like it a lot!

I'm gonna test drive it for a while, and if I keep using it regularly, I'll definitely be sending in some feedback. Other users have made a lot of really great recommendations already, I'm excited to see how this evolves!


Thanks so much for the kind words and giving it a spin!

Feel free to send feedback, issues, feature suggestion as you use it more, I'm all ears. My twitter DM is also open: https://x.com/chxy.


Any chance to see it available on other operating systems as well?


Unfortunately not now. If you are interested in email updates: https://tally.so/r/wzDvLM


There's another one someone made for blind users like themselves and me, called VOLlama (they use a Mac, so VoiceOver + Llama). It's really good. I haven't tested many others for accessibility, but it has RAG and uses Ollama as the backend, so it works very well for me.

https://github.com/chigkim/VOLlama/


It's very nice that something like that exists. I am the author of one of the similar apps [1] someone listed in a different thread. I was hoping I could get in touch with someone like you who could give me some feedback on how to make my app more accessible for users like you. I really want it to be an "LLM for all" kind of app, but despite my best efforts and intentions, I suck at it. Any chance of getting in touch with you to get some feedback? Only if you want and have time, no pressure at all.

[1] https://msty.app


Sure, I'll probably join the discord tomorrow morning, but a few notes:

* For apps like this, using live regions to speak updates may be helpful. Either that or change the buttons, like from "download local AI" to "configuring." Maybe a live region would be best for that one, since sighted people would probably be looking near the bottom for the status bar, but anyway...

* Using live regions for chats is pretty important, because otherwise we don't know when a message is ready to read, and it makes reading those messages much simpler. The user types the message, presses Enter, and the screen reader reads the message to them. So, making a live region and then sending the finished message, or a finished part of a message, to that live region would be really helpful.

* Now on to the UI. At the top, we have "index /text-chat-sessions". I guess that should just say "chats"? Below that, we have a list, with a button saying the same thing. After that list with one item is a button that says "index /local-ai". That should probably just be "local AI". Afterwards, there is "index /settings", which should just be "settings." Then, there is an unlabeled button. I'm guessing this is styled to look like a menu bar across the top of the window, so it'd be the item on the right side. Now, there's a button below that that says "New Chat^N". I, being a technical user, am pretty sure the "^N" means "Control + N", but almost no one else knows that. So, maybe change that text label. Between that and the Recent Chats menu button are two unlabeled buttons. I'm not sure why a region landmark was used for the recent chats list, but after the chat name ("hello" in this case), where I can rename the chat, there is an unlabeled button. The button after the model chooser is unlabeled as well. After the user input in the conversation, there are three unlabeled buttons. After the response, there is a menu button with (oh, that's cool) items to transform the response into bullets, a table, etc., but that menu button was unlabeled, so I had to open it to see what's inside. After that, all other buttons, like for adding instructions to refine this message, are also unlabeled.

So, live regions for speaking chat messages and state changes like "loading" or "ready" or whatever (keep them short), and label controls, and you should be good to go.

Live regions: https://developer.mozilla.org/en-US/docs/Web/Accessibility/A...


Wow! This is already very helpful and was the kind of feedback I was looking for. Thank you!


Hi, I just use msty. Could it use an already downloaded gguf file?


Not right now, but that's something we plan to support soon. Support for Ollama-downloaded models is getting released either today or tomorrow; gguf support might go into the next release. Would love to chat with you to learn more about your use case. Mind saying hi on our Discord?


Hey. I'm sorry about your condition. I feel I'm approaching blindness eventually. This is very random, but perhaps you could share any resources I could learn from to prepare for this, so I could continue using the web when/if it happens.


I'll try. To get things started, if you have an iPhone, check out AppleVis:

https://applevis.com/

If you have Android:

https://blindandroidusers.com/

I believe Hadley is still a good resource: https://hadleyhelps.org/welcome-hadley

I hope this helps get you started.


Honest question - can it be used for programming? Or can anyone recommend a local-first development LLM which would take in a whole project (Python / Angular) and write code based on the full repo, not only the active window as with Copilot or JetBrains AI?


Check out the Continue dev plugin (available for VS Code and JetBrains). You can attach it to OpenAI or local models and it can consider files in your codebase. It has a @Codebase keyword, but so far I get better results by specifically pointing to the needed files.


Have you tried using Copilot's @workspace command in the chat?


Looks promising, but after looking at the website I'm yearning to learn more about it! How does it compare to alternatives? What's the performance like? There isn't enough to push me to stop using ChatGPT and use this instead. Offline is good, but to get users at scale there has to be a compelling reason to shift. I don't think that offline capabilities are going to be enough to get a significant number of users.

Another tip: I try out a new chat interface to LLMs almost every week, and they're free to use initially. There isn't a compelling reason for me to spend $10 from the get-go for a use case that I'm not sure about yet.


The compelling reason to shift to local/decentralized AI is that all of compute will soon be AI and that means your entire existence will go into it. The question you should ask yourself is do you want everything about you being handled by Sam Altman, Google, Microsoft, etc? Do you want all of your compute dependent on them always being up and do you want to trust their security team with your life? Do you want to still be using closed/centralized/hosted AI when truly open AI surpasses all of them in performance and capability. If you have children or family, do you want them putting their entire lives in the hands of those folks.

Decentralized AI will eventually become p2p and swarmed and then the true power of agents and collaboration will soar via AI.

Anyway, excuse the soap box, but there are zero valid reasons for supporting and paying centralized keepers of AI that rarely share, collaborate or give back to the community that made what they have possible.


> when truly open AI surpasses all of them in performance and capability.

Is this true? I've tried llama last year and it was not very helpful. GPT4 is already full of problems and I have to keep circumventing them, so using something less capable doesn't get me too excited.


Maybe this isn't for everyone, just the people who place a high value on privacy.


If your ultimate goal is privacy, then you should only be looking at open source chat UI front ends:

https://github.com/mckaywrigley/chatbot-ui

https://github.com/oobabooga/text-generation-webui

https://github.com/mudler/LocalAI

And then connecting them to offline model servers:

- Ollama

- llama.cpp

And you should avoid closed source frontends:

- Recurse

- LM Studio

And closed source models

- ChatGPT

- Gemini


Are you implying Claude is an open source model?


I don't think the list was meant to be exhaustive.


But how can I guarantee this app is private?

I'm assuming I cannot block internet access to the app because it needs to verify App Store entitlement.


I mean, ok, then how do you distinguish yourself from LM Studio (Free)


Looks great! Does it support different sized models, i.e. can I run llama 70B and 7B, and is there a way to specify which model to chat with? Are there plans to allow users to ingest their own models through this UI?


If you have a gguf file you can link it. For ingesting new models - I'm thinking about adding some CRUD UIs to it, but I'd like to keep a very small set of default models.


Thanks, it's a great project.


Love this! Just purchased. I am constantly harping on decentralized AI and love seeing power in simplicity.

Are you on Twitter, Threads, Farcast? Would like to tag you when I add you to my decentralized AI threads.


Thank you so much for the support! Simplicity is power indeed. I'm on twitter: https://x.com/chxy


Found your Twitter account in a previous post. Just tagged you.


Awesome, thanks for the tag!


What's your farcaster?


For an app like this, I would really like a spoken interface. Any possibility of adding text-to-speech and speech-to-text so that users can not only type but also talk with it?


Yes, I wish it could talk. It's behind other priorities, but I might try something experimental.


There are a lot of tools listed in this thread, but I am not seeing the thing I want which is:

- Ability to use local and OpenAI models (ideally it has defaults for common local models)

- Chat UX

- Where I can point it to my JS/TS codebase

- It indexes the whole thing including dependencies for RAG. Ideally indexing has some form of awareness of model context length.

- I can use it for codegen / debugging.

The closest I have found has been aider, but it's python and I get into general python hell every time I try and run it.

Would appreciate a suggestion.


You will sell more if instead of telling us it's for "chatting with local AI" you tell us what we can accomplish by chatting with local AI. I don't need to chat, I need to get certain tasks done. What tasks can it do? (Don't answer me, put it on your landing page and app store listing)


Wow, I did not expect at all that this would end up on the front page. Thank you for all the enthusiasm. I'll try to get to more questions later today, but if there's something I missed, my X/Twitter DM is open: https://x.com/chxy


It seems "local" is all you need :)


You can’t buy these apps from the Apple store without providing identity to Apple in the form of a phone number (required for an Apple ID) and linking it to a hardware serial number (the app store app transmits this).

It would be nice to be able to buy the app directly from you, instead of putting a surveillance company in the loop. I don’t use an Apple ID on a macintosh.

(I would like to avoid using one on an iPhone/iPad/Vision Pro/AppleTV, but it is impossible to install apps on those without one. Please do not bring this terrible circumstance to be the default on macs, too.)


It would be cool to have the option to use the OpenAI API as well in the same interface. http://jan.ai does this, so that's what I'm using at the moment.


I've found great utility with `llm` https://llm.datasette.io, a CLI to interact with LLMs. It has plugins for remote and local models.


Good to know. I've learned lots of things from Simon Willison's blog (datasette's author), so can't imagine llm being unuseful.


Not a bit of open code, while I'm 100% sure they use some that require it. If you're using AI + your data without insight into how it's used, you're a fool. 2 cents


Congrats! Plans on Windows support?


Thanks! Sorry no immediate plan. People have recommended Chat with RTX so it might be worth checking out. https://www.nvidia.com/en-us/ai-on-rtx/chat-with-rtx-generat...


It looks amazing, OP! I'm sad I'm missing out as a Windows user.


You can try https://curiosity.ai, supports Windows and macOS


Is the haiku example a real Haiku?

I think it gives you 4, 7, and 9 syllables in the lines.

I bet you can coax it to give you a better example, if you tinker a bit.


This is awesome. I currently use Ollama with OpenWebUI but am a big fan of native apps so this is right up my alley.


It looks like an Electron app, and not a native app.

https://imgur.com/a/pz0kzJ1


Thank you!


Won't work on my Intel Macbook :-(


This is disappointing. Anything similar available for Intel Macs?


The app is great but honestly I'm impressed with the home page! Can you go into more details on how you made the home page? What did you use to make the screenshots, and are you using any tools to generate the HTML/CSS/etc?


Thanks! Honestly it's a quick hack compared to the app. The screenshots are from screen.studio. The website is built with https://astro.build


Seriously? It grinds my phone to a near halt just trying to scroll from top to bottom. Worse in Firefox but still pretty bad in chrome.


Interesting. I was using my PC to view it and it was fast and beautiful.


How big is the local model? What is the Mac spec requirement? I don't want to download it and find out it won't work on my computer. It seems like the first question everyone would ask, and it should be addressed on the website.


Appreciate the feedback! It works on Macs with Apple Silicon only. I'll put some system requirements on the website.


It uses ollama which is based on llama.cpp, and adds a model library with dozens of models in all quant sizes.


No, this doesn't use Ollama; it's just based on llama.cpp.


I want something that starts as a simple manager for my reminders, something that tells me what to do next. And then, as features are being added, grows into a full-blown personal assistant that can book flights for me.


This looks fantastic on macos. I like the project.

What does this have that is better than https://github.com/open-webui/open-webui ?


Without Apple Shortcuts support I can't pay for this. I get pretty much the same experience from GPT4All. Hoping you add support for a CLI, Shortcuts, or something along those lines.


Thanks for the suggestion! I play with Apple Shortcuts sometimes. It's a prime example of how easy end-user programming could be. Will keep this in mind.


The headline had me thinking you had a DIY self-driving car for a moment there. Didn't initially register that this was just the common metaphor. Looks like a great app.


How different is this compared to Jan.ai for example?


As I understand it, jan.ai is more focused on enterprise / platform, while where I'd see RecurseChat going is more like "obsidian.md" but as your personal AI.


Obsidian has add-ons which do much of this.


People are treating Obsidian like it's the next Emacs


Hey! This is awesome! How hard would it be to plug it into something like Raindrop.io (bookmark manager) to train on all bookmarks collected?


Haven't tried Raindrop.io, looks neat! Saw some other posts mentioning bookmarks as well. I'll keep this in mind, but will have to try it out first to find out.


Appreciate it, thank you.


Out of curiosity – how is this app built?:-)

There is a demo clip with a vertical scroll bar which does not fade out as it would do in a native mac app:)


Scroll bars don't fade out if you're using a mouse (as opposed to just a trackpad) or if you've set Mac OS Settings > Appearance > Show scroll bars to "Always".


I see! I've not used a mouse on a Mac:-o

Anyway, the UI doesn't look Mac-native. I'm interested in what it is:-)


Yeah I am curious what the app is built with. I saw someone mention it's using Electron, so that's a start.



No iPhone app? Assuming it looks to connect to a local server, or are you actually downloading the LLMs locally to the device?


Will this work on an M1 MacBook Air? Looking for an offline solution like this but wary of hardware requirements.


This looks interesting -- might implement it. I'm curious how to ensure that it is local only?


I'll give it a shot. Appreciate the effort on keeping it local.


I am very glad to see that kind of app. Well done!


Will it work fine on Macbook Air M2 16GB ?


I wonder how much space it takes.


Any censorship?

(Can't try MacOS Apps)


any plans on supporting ollama integration?


Sadly I can't try this because I'm on Windows or Linux.

Was testing apps like this if anyone is interested:

Best / Easy to use:

- https://lmstudio.ai

- https://msty.app

- https://jan.ai

More complex / Unpolished UI:

- https://gpt4all.io

- https://pinokio.computer

- https://www.nvidia.com/en-us/ai-on-rtx/chat-with-rtx-generat...

- https://github.com/LostRuins/koboldcpp

Misc:

- https://faraday.dev (AI Characters)

No UI / Command line (not for me):

- https://ollama.com

- https://privategpt.dev

- https://serge.chat

- https://github.com/Mozilla-Ocho/llamafile

Pending to check:

- https://recurse.chat

Feel free to recommend more!


LM Studio is using a dark pattern I really hate. Don't have a GitHub logo on your webpage if your software is not source-available. It just takes you to GitHub, to some random config repos they have. This is a poor choice in my opinion.


We call that stolen valor.


Since I couldn't find it in your list, I'd like to plug my own macOS (and iOS) app: Private LLM [1]. Unlike almost every other app in the space, it isn't based on llama.cpp (we use mlc-llm) or naive RTN quantized models (we use OmniQuant). Also, the app has deep integrations with macOS and iOS (Shortcuts, Siri, macOS Services, etc).

Incidentally, it currently runs the Mixtral 8x7B Instruct [2] and Mistral [3] models faster than any other macOS app. The comparison videos are with Ollama, but it generalizes well to almost every other macOS app I've seen that uses llama.cpp for inference. :)

nb: Mixtral 8x7B Instruct requires an Apple Silicon Mac with at least 32GB of RAM.

[1]: https://privatellm.app/

[2]: https://www.youtube.com/watch?v=CdbxM3rkxtc

[3]: https://www.youtube.com/watch?v=UIKOjE9NJU4


What's the performance like in tokens/s?


You can see ms/token in a tiny font on the top of the screen, once the text generation completes in both the videos I'd linked to. Performance will vary by machine. On my 64GB M2 Mac Studio Max, I get ~47 tokens/s (21.06ms/token) with Mistral Instruct v0.2 and ~33 tokens/s (30.14ms/token) with Mixtral Instruct v0.1.
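
(For conversion: tokens/s is just the reciprocal of ms/token, so 1000 / 21.06 ≈ 47.5 tokens/s and 1000 / 30.14 ≈ 33.2 tokens/s.)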


Interesting! What's the prompt eval processing speed like compared to llama.cpp and kin?


I haven't run any specific low-level benchmarks lately. But chunked prefilling and TVM auto-tuned Metal kernels from mlc-llm seemed to make a big difference, the last time I checked. Also, compared to stock mlc-llm, I use a newer version of Metal (3.0) and have a few modifications to make models have a slightly smaller memory and disk footprint, and also slightly faster execution. Because unlike the mlc-llm folks, I only care about compatibility with Apple platforms. They support so much more than that in their upstream project.


thanks, I'll give it a crack


MacGPT is way handy because of a global keyboard shortcut which opens a Spotlight-like prompt. I would love to have a local equivalent.


I am the author of the Msty app mentioned here. So humbled to see an app that is just about a month old, and that I mostly wrote for my wife and some friends to begin with (who got overwhelmed with everything that was going on in the LLM world), at the top of your list. Thank you!


Looks interesting, but can't see what it is doing. Any link to the source code?


One bit of feedback: there's nowhere to put system messages. These can be much more influential than user prompts when it comes to shaping the tone and style of the response.


That's at the top of our list. It got pushed back because we want to support creating a character/profile (basically select a model and apply some defaults, including a system prompt). But I feel like it was a mistake to wait for that. Regardless, it is getting added in the next release (the one after something that is dropping in a day or 2, which is a big release in itself).


1) What are the mac system requirements? Does it need a specific OS version?

2) If you're privacy-first, many would feel a lot more comfortable if this was released as an app in the App Store so it would be sandboxed. This is important because it's not open source, so we have no idea what is happening in the background. Alternatively, open source it, which many here have requested.


If you need help for testing the Linux version let me know, I’d be happy to help


I was actually looking for one! What's the best way to reach you? Mind jumping on our Discord so that I can share the installer with you soon?



Add Open-WebUI (used to be Ollama-WebUI)

https://github.com/open-webui/open-webui

a well featured UI with very active team


Try this one: https://uneven-macaw-bef2.hiku.app/app/

It loads the LLM in the browser, using webgpu, so it works offline after the first load, it's also PWA you can install. It should work on chrome > 113 on desktop and chrome > 121 on mobile.


Oh thanks! I didn't know there were quite a few local ChatGPT alternatives. I was wondering what users they are targeting - engineers or average users? I guess average users will likely choose ChatGPT and Perplexity over local apps for more recent knowledge of the world.


Hi. I'm the author of the Msty app, 2nd on the list above. You are right about average users likely choosing ChatGPT over local models. My wife was the first and the biggest user of my app - a software engineer by profession and training, but she prefers not to worry about the LLM world and just to use it as a tool that makes you more productive. As soon as she took Msty for a ride, I realized that some users, despite their background, care about online models. This actually led me to add support for online models right away. However, she really likes to make use of the parallel chat feature and uses both Mistral and ChatGPT models to give the same prompt, then compares the output and chooses the best answer (or sometimes makes a hybrid choice). She says that being able to compare multiple outputs like that is tremendously helpful. But that's the extent of local LLMs for her. So far my effort has been to target a bit higher than the average user while making it approachable for more advanced users as well.


Looks great, though the fact that you have to ignore your anti-virus warning during installation, and the fact that it phones home (to insights.msty.app) directly after launch despite the line in the FAQ about not collecting any data, make me a little skittish.


I’m looking for a ChatGPT client alternative, i.e. I can use my own OpenAI API key in some other client.

Offline isn’t important for me, only that $20 is a lot of money, when I’d wager most months my usage is a lot less. However, I’d still want access to completion, DALL-E, etc.

Would Msty be a good option for me?


Give it a try and see how you feel. "Yes, it will" would be a dishonest answer, to be completely honest, at least at this point. The app has been out for just about a month and I am still working on it. I would love for a user like you to give it a try and give me some feedback (please). I am very active on our Discord if you want to get in touch (just mention your HN username and I will wave).


Thank you so much, I’m excited to give this a try in the next few days.


Thanks for the list. Tried Jan just now as it is both easy and open source. It is a bit buggy I think, but the concept is ace: the quick install, telling you which models work on your machine, one-click download, and then a ChatGPT-style interface. Mistral 7B running on my low-spec laptop at 6 tokens/s and making some damn sense is amazing. The bugs are at inference time. Could be hardware issues though, not sure. YMMV


Do any of these let you dump in a bunch of your own documents to use as a corpus and then query and summarize them?


Author of Msty here. Not yet, but I am already working on the design for it to be added in the very near future. I am happy to chat more with you to understand your needs and what you are looking for in such apps. Please hop on the Discord if you don't mind :)


Some of my use cases would be summarizing a PDF report, analyzing JSON/CSV data, uploading a dev project to write a function or feature or build a UI, renaming image files, categorizing images, etc.



Yes, GPT4All has RAG-like features. Basically you configure some directories and then have it load docs from whatever folders you have enabled for the model you're currently using. I haven't used it a ton, but I have used it to review long documents and it's worked well depending on the model.


Open-WebUI has support for doing that, it works using #tags for each document so you can ask questions about multiple specific documents.


The new one straight from Nvidia does I believe.


Khoj was one of the first 'low-touch' solutions out there I think. It's ok, but still under active development, like all of them really.

https://khoj.dev/


What about https://github.com/open-webui/open-webui ?

Seems to have more features than all of them


We just added local LLM support to our curiosity.ai app too - if anyone wants to try we're looking for feedback there!


Just FYI, llamafile includes a web-based chat UI. It fires up automatically.


have you seen llamafile[0]?

[0] https://github.com/Mozilla-Ocho/llamafile



