How does debugging, versioning and replicability work?
A more useful construct might be as a commenting format
# description: ai prompt and human description
# expected: what this block is supposed to do
# some begin marker
... code ...
# some end marker
And then if say, an API changes in the future or other incompatibility happens, then the "test" fails and the AI is given the old code, output, expected output, and description and asked to spruce it up to the modern times and then it gets somehow put inline with a rollback option and some audit log.
You can also have some semvar extension "version x.y.z (ai mutation syntax signature)" to allow others to replicate behavior.
This construct also allows people to run it with or without AI, even after mutation, so there is no forced change on the executer and the code has a consistent repeatable comparable ground truth so that diagnostics and expectations can be preserved.
You can even extend existing document formatters to support 'AI-ifying' since in a well formatted documented codebase you're actually most of the way there.
Heck, maybe you can even sloppily inference it to well documented code already
A more useful construct might be as a commenting format
And then if say, an API changes in the future or other incompatibility happens, then the "test" fails and the AI is given the old code, output, expected output, and description and asked to spruce it up to the modern times and then it gets somehow put inline with a rollback option and some audit log.You can also have some semvar extension "version x.y.z (ai mutation syntax signature)" to allow others to replicate behavior.
This construct also allows people to run it with or without AI, even after mutation, so there is no forced change on the executer and the code has a consistent repeatable comparable ground truth so that diagnostics and expectations can be preserved.
You can even extend existing document formatters to support 'AI-ifying' since in a well formatted documented codebase you're actually most of the way there.
Heck, maybe you can even sloppily inference it to well documented code already