Hacker News new | past | comments | ask | show | jobs | submit login

Great question -- I was waiting for someone to ask this!

Their product and launch is super cool. It's incredible how much it's able to do by just relying on tool use + micro agents + screen shots + coordinates to interact with websites.

There are a couple of thoughts here:

(1) Will their competitors wait around and not build something similar? Will xAI / Gemini / OpenAI / Mistral / MetaAI teams wait around? Probably not. This is likely a huge part of the future, and one company will not "take it all"

(2) How is value actually derived from these systems? Is a demo + cool usable product enough? Likely not. Most people actually want their workflow automated. For personal use-cases, this might be enough.. but enterprises likely want something more complex

(3) Will this be optimized for Claude only? What if you want to run this with your own open source LLMs? Or you want to point this at the best model on the market all the time? Will you get that flexibility through a solution provided by a big player? Likely not -- Anthropic has incentive to get you to use Claude under the hood

The last point is the one that gives me hope. Our open source users are able to pick their favourite model to run on. You're not locked into Cluade. You can run it on Gemini / GPT-4O or open source ones such as Llama 3.2.




Congrats on the launch! Curious to know, which OSS models you see works best at the moment?


We've had a decent amount of luck with InternVL 2.0 w/ Llama, and are pretty excited about Llama 3.2

It's still super early in the open source x vision model space. The limiter actually seems to be the vision encoder -- advancements here will pay off huge dividends

https://huggingface.co/spaces/opencompass/open_vlm_leaderboa...


Thank you! Great insight.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: