I've found that GPT-4o is better than Sonnet 3.5 at writing certain languages like Rust, but maybe that's just because I'm better at prompting OpenAI models.
The latest example I ran was a Rust task that went 20 loops without a successful compile on Sonnet 3.5, but compiled and was correct with GPT-4o on the second loop.
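For what it's worth, the loop itself is dead simple, roughly this (a sketch; `ask_model` is a stand-in for whichever model API you're calling, and it assumes a scratch cargo project):

    import subprocess

    def compile_loop(source: str, max_loops: int = 20) -> bool:
        # write the model's output into a scratch cargo project,
        # build it, and feed compiler errors back until it's clean
        for _ in range(max_loops):
            with open("scratch/src/main.rs", "w") as f:
                f.write(source)
            build = subprocess.run(["cargo", "build"], cwd="scratch",
                                   capture_output=True, text=True)
            if build.returncode == 0:
                return True  # clean compile; go run/verify it
            # hand rustc's errors back and ask for a corrected file
            source = ask_model("Fix these compiler errors and return the "
                               f"full file:\n{build.stderr}\n\n{source}")
        return False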
Weird. I actually used the same prompt with both, just swapped out the model API. Used Python because GPT-4 seemed to gravitate towards it. I wonder if OpenAI went for newer training data? Maybe Sonnet 3.5 just hasn't seen enough recent Rust code.
Also curious: I run into trouble when the output program is >8000 tokens on Sonnet. Did you ever find a way around that?
I break most tasks down into parts. Aider[1] is essential to my workflow and helps with this as well, and it's a fantastic tool to learn from. In fact, as of v0.52 I've been able to remove some of my custom code for running and testing.
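Aider also exposes a Python scripting API, which is roughly what I drive (a minimal sketch based on aider's documented scripting interface; the model id and file names are illustrative):

    from aider.coders import Coder
    from aider.models import Model

    # files aider is allowed to edit in this session
    fnames = ["src/main.rs"]

    coder = Coder.create(main_model=Model("claude-3-5-sonnet-20240620"),
                         fnames=fnames)

    # executes one edit instruction against those files, then returns
    coder.run("add error handling to the parse function")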
I've started playing around with adding Nous[2] as well (aider is its code-editing agent), but not enough that I'm using it practically yet.
Yeah, I know about the max tokens. As long as the code stays below that limit, I can get Sonnet to emit complete Python scripts, run those directly, and return the results to Sonnet, and that gives me a great feedback loop. It breaks down when Sonnet can't emit all the code at once, because then you have to figure out how to predictably assemble a larger script from smaller pieces... that's where I couldn't find a decent solution.
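Concretely, the loop is something like this (a minimal sketch using the anthropic SDK; `extract_code` is a hypothetical helper that pulls the script out of the reply, and the model id is illustrative):

    import subprocess
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

    def ask(messages):
        resp = client.messages.create(
            model="claude-3-5-sonnet-20240620",  # illustrative model id
            max_tokens=8192,  # the output ceiling discussed above
            messages=messages,
        )
        return resp.content[0].text

    messages = [{"role": "user",
                 "content": "Emit a complete python script that <task>."}]
    for _ in range(5):
        reply = ask(messages)
        code = extract_code(reply)  # hypothetical helper
        run = subprocess.run(["python", "-"], input=code,
                             capture_output=True, text=True, timeout=60)
        if run.returncode == 0:
            break  # script ran clean; inspect run.stdout
        messages += [
            {"role": "assistant", "content": reply},
            {"role": "user",
             "content": f"That failed:\n{run.stderr}\n"
                        "Emit the full corrected script."},
        ]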