Heeey! I built a macOS copilot that has been useful to me, so I open-sourced it in case others would find it useful too.
It's pretty simple:
- Use a keyboard shortcut to take a screenshot of your active macOS window and start recording the microphone.
- Speak your question, then press the keyboard shortcut again to send your question and screenshot off to OpenAI Vision.
- The Vision response is presented in context, overlaid on the active window, and also read back to you as audio.
- The app keeps running in the background, only taking a screenshot/listening when activated by keyboard shortcut.
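The press-once-to-start, press-again-to-send behavior above boils down to a small toggle. Here's a minimal sketch of that logic; `onStart` and `onSend` are hypothetical stand-ins for the app's actual capture and upload code, and in Electron the returned handler would be wired to a global shortcut (Electron's `globalShortcut.register` works even while the app sits in the background):

```javascript
// Toggle between "idle" and "recording" on each shortcut press.
// onStart would take the screenshot and start recording the mic;
// onSend would stop recording and send audio + screenshot to OpenAI.
function createShortcutToggle(onStart, onSend) {
  let recording = false;
  return () => {
    if (!recording) {
      recording = true;
      onStart();
    } else {
      recording = false;
      onSend();
    }
  };
}

// In the Electron main process, the handler would be registered like:
// globalShortcut.register('CommandOrControl+Shift+P', handler);
// (the accelerator string here is illustrative, not the repo's actual binding)
```

So the first press starts a capture session and the second press ends it and fires off the request, without the app needing any visible UI in between.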
It's built with Node.js/Electron, and uses the OpenAI Whisper, Vision, and TTS APIs under the hood (BYO API key).
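For the Vision step, the screenshot travels inside a regular chat-completions request as a base64 data URL alongside the Whisper transcript. A hedged sketch of building that payload (the model name, prompt handling, and `max_tokens` value are illustrative assumptions, not necessarily what the repo uses):

```javascript
// Pair the transcribed question (from Whisper) with the window screenshot,
// encoded as a base64 PNG data URL, in a single chat-completions payload.
function buildVisionPayload(transcript, screenshotBase64) {
  return {
    model: 'gpt-4-vision-preview', // illustrative; check the repo for the model it actually uses
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: transcript },
          {
            type: 'image_url',
            image_url: { url: `data:image/png;base64,${screenshotBase64}` },
          },
        ],
      },
    ],
    max_tokens: 300, // illustrative cap
  };
}

// This payload would be POSTed to https://api.openai.com/v1/chat/completions
// with an "Authorization: Bearer <your API key>" header -- the BYO-key part.
```

The response text can then be both rendered in the overlay window and sent to the TTS endpoint to be spoken back.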
There's a simple demo and a longer walk-through in the GitHub readme https://github.com/elfvingralf/macOSpilot-ai-assistant, and I also posted a different demo on Twitter: https://twitter.com/ralfelfving/status/1732044723630805212
I was skimming through the video you posted and was curious.
https://www.youtube.com/watch?v=1IdCWqTZLyA&t=32s
code link: https://github.com/elfvingralf/macOSpilot-ai-assistant/blob/...