
Claude has been pretty great. I stood up an 'auto-script-writer' recently that iteratively sends a Python script + prompt + test results to either GPT-4 or Claude, takes the output as the new script, runs the tests on that, and sends those results back for another loop (it usually took about 10-20 loops to get it right; rough sketch of the loop below). After "writing" about 5-6 Python scripts this way, it became pretty clear that Claude is far, far better, if only because I often ended up using Claude to clean up GPT-4's attempts. GPT-4 would eventually go off the rails: changing the goal of the script, getting stuck in a local minimum with bad outputs, pruning useful functions. Claude stayed on track and reliably produced good output. Makes sense that it's more expensive.

Edit: yes, I was definitely making sure to use gpt-4o
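For the curious, here's roughly what the loop looks like. `ask_model` and `run_script` are placeholder names (ask_model is whatever thin wrapper you have around the GPT-4o or Claude API), and "tests pass" here just means the script exits cleanly:

    import subprocess
    import sys

    def run_script(src, timeout=60):
        # Execute the candidate script in a subprocess; "passing" here just
        # means a clean exit. Swap in a real test harness as needed.
        result = subprocess.run(
            [sys.executable, "-c", src],
            capture_output=True, text=True, timeout=timeout,
        )
        return result.returncode == 0, result.stdout + result.stderr

    def refine(ask_model, goal, script="", max_loops=20):
        # ask_model(prompt: str) -> str hides which model is on the other end.
        for _ in range(max_loops):
            ok, output = run_script(script) if script else (False, "no script yet")
            if ok:
                return script
            prompt = (
                f"Goal:\n{goal}\n\n"
                f"Current script:\n{script}\n\n"
                f"Test output:\n{output}\n\n"
                "Return only the corrected, complete Python script."
            )
            script = ask_model(prompt)
        raise RuntimeError(f"no passing script after {max_loops} loops")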




I installed Aider last week, and it does exactly this prompt-write-run-ingest-errors-restart cycle out of the box. Because it works with git, you can also undo code changes if something goes wrong. It's free and open source, and scriptable too (sketch below).

https://aider.chat/
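If you'd rather drive it from Python than the chat UI, this is roughly what its scripting API looks like (going from aider's scripting docs; the model name and file are just examples):

    from aider.coders import Coder
    from aider.models import Model

    # Aider commits each edit to git, so a bad change is one `git revert`
    # (or an in-chat /undo) away.
    coder = Coder.create(main_model=Model("gpt-4o"), fnames=["script.py"])
    coder.run("fix the failing tests in script.py")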


My experience reflects this, generally speaking.

I've found that GPT-4o is better than Sonnet 3.5 at writing in certain languages like Rust, but maybe that's just because I'm better at prompting OpenAI models.

The latest example I ran was a Rust task that went 20 loops without a successful compile on Sonnet 3.5, but compiled and was correct with GPT-4o on the second loop.


Weird. I used the same prompt with both and just swapped out the model API. I went with Python because GPT-4 seemed to gravitate toward it. I wonder if OpenAI pushed for newer training data? Maybe Sonnet 3.5 just hasn't seen enough recent Rust code.

Also curious, I run into trouble when the output program is >8000 tokens on Sonnet. Did you ever find a way around that?


Sonnet 3.5 has a max output of 8192 tokens[0].
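For reference, a minimal sketch of requesting that full ceiling with the Anthropic Python SDK (the model string is the June 2024 snapshot; if I remember right, 8192-token output was initially gated behind a beta header, so check the docs[0] if the call errors):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=8192,  # 3.5 Sonnet's output ceiling; larger values are rejected
        messages=[{"role": "user", "content": "Write the complete script."}],
    )
    print(response.content[0].text)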

I break most tasks down into parts. Aider[1] is essential to my workflow and helps with this as well; it's also a fantastic tool to learn from. In fact, as of v0.52 I was able to remove some of my custom code for running and testing.

I've started playing around with Nous[2] as well (aider is its code-editing agent), but not enough that I'm using it practically yet.

[0] https://docs.anthropic.com/en/docs/about-claude/models

[1] https://github.com/paul-gauthier/aider/

[2] https://github.com/TrafficGuard/nous


Yeah, I know about the max tokens. As long as the code stays below that limit, I can get Sonnet to emit complete Python scripts, run those directly, and return the results to Sonnet, and I have a great feedback loop. That breaks down when Sonnet can't emit all the code at once, because then you have to figure out how to predictably assemble a larger script from smaller pieces, and that's where I couldn't find a decent solution.
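One workaround worth sketching, though I haven't battle-tested it: the Messages API lets you prefill the assistant turn, and the model resumes from the end of the prefilled text. So when a response stops on max_tokens, send everything so far back as a prefilled assistant message and let it continue mid-script:

    import anthropic

    def get_full_script(prompt, model="claude-3-5-sonnet-20240620"):
        # Stitch an over-length reply together by letting the model
        # resume its own partially written turn.
        client = anthropic.Anthropic()
        script = ""
        while True:
            messages = [{"role": "user", "content": prompt}]
            if script:
                # Prefilled assistant turn: the model continues from its end.
                messages.append({"role": "assistant", "content": script})
            response = client.messages.create(
                model=model, max_tokens=8192, messages=messages
            )
            script += response.content[0].text
            if response.stop_reason != "max_tokens":
                return script
            # The API rejects prefill that ends in whitespace, so strip it;
            # at worst you lose a trailing newline at each seam.
            script = script.rstrip()

The seams aren't always clean (the model can repeat or drift at the join point), which may be why this never felt like a decent solution in practice.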


Do you have a GitHub repo for this process? I'm learning how to do this kind of stuff, and it would be cool to see how the pros do it.


That's pretty cool. Can I take a look at it? If not, it's okay, just curious.


It's just bash + Python, tightly integrated with a specific project I'm working on, i.e. it's ugly and doesn't make sense out of context ¯\_(ツ)_/¯


Alright, no worries. Thanks for the reply



