I've found that GPT-4o is better than Sonnet 3.5 at writing certain languages like Rust, but maybe that's just because I'm better at prompting OpenAI models.
The latest example I ran was a Rust task that went 20 loops without a successful compile on Sonnet 3.5, but compiled and was correct with GPT-4o on the second loop.
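For what it's worth, the loop itself is dead simple, roughly this (a sketch; `ask_model` is a stand-in for whichever model API you're calling, and it assumes a scratch cargo project):

    import subprocess

    def compile_loop(source: str, max_loops: int = 20) -> bool:
        # write the model's output into a scratch cargo project,
        # build it, and feed compiler errors back until it's clean
        for _ in range(max_loops):
            with open("scratch/src/main.rs", "w") as f:
                f.write(source)
            build = subprocess.run(["cargo", "build"], cwd="scratch",
                                   capture_output=True, text=True)
            if build.returncode == 0:
                return True  # clean compile; go run/verify it
            # hand rustc's errors back and ask for a corrected file
            source = ask_model("Fix these compiler errors and return the "
                               f"full file:\n{build.stderr}\n\n{source}")
        return False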
Weird. I actually used the same prompt with both, just swapped out the model API. Used Python because GPT-4 seemed to gravitate towards it. I wonder if OpenAI went for newer training data? Maybe Sonnet 3.5 just hasn't seen enough recent Rust code.
Also curious: I run into trouble when the output program is >8000 tokens on Sonnet. Did you ever find a way around that?
I break most tasks down into parts. Aider[1] is essential to my workflow and helps with this as well, and it's a fantastic tool to learn from. In fact, as of v0.52 I've been able to remove some of my custom code for running and testing.
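Aider also exposes a Python scripting API, which is roughly what I drive (a minimal sketch based on aider's documented scripting interface; the model id and file names are illustrative):

    from aider.coders import Coder
    from aider.models import Model

    # files aider is allowed to edit in this session
    fnames = ["src/main.rs"]

    coder = Coder.create(main_model=Model("claude-3-5-sonnet-20240620"),
                         fnames=fnames)

    # executes one edit instruction against those files, then returns
    coder.run("add error handling to the parse function")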
I've started playing around with adding Nous[2] as well (aider is its code-editing agent), but not enough that I'm using it practically yet.
Yeah, I know about the max tokens. As long as the code stays below that limit, I can get Sonnet to emit complete Python scripts, run those directly, and return the results to Sonnet, and that gives me a great feedback loop. It breaks down when Sonnet can't emit all the code at once, because then you have to figure out how to predictably assemble a larger script from smaller pieces... that's where I couldn't find a decent solution.
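Concretely, the loop is something like this (a minimal sketch using the anthropic SDK; `extract_code` is a hypothetical helper that pulls the script out of the reply, and the model id is illustrative):

    import subprocess
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

    def ask(messages):
        resp = client.messages.create(
            model="claude-3-5-sonnet-20240620",  # illustrative model id
            max_tokens=8192,  # the output ceiling discussed above
            messages=messages,
        )
        return resp.content[0].text

    messages = [{"role": "user",
                 "content": "Emit a complete python script that <task>."}]
    for _ in range(5):
        reply = ask(messages)
        code = extract_code(reply)  # hypothetical helper
        run = subprocess.run(["python", "-"], input=code,
                             capture_output=True, text=True, timeout=60)
        if run.returncode == 0:
            break  # script ran clean; inspect run.stdout
        messages += [
            {"role": "assistant", "content": reply},
            {"role": "user",
             "content": f"That failed:\n{run.stderr}\n"
                        "Emit the full corrected script."},
        ]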