Well, the accrual of "useless code" (there's a name for this that I forgot) is a known problem, but it's also something that stabilizes the learning process.
I don't think it's as simple as putting the length of the AST in the goal function (but it's something interesting to try).
Depending on compile speed vs. running speed, you might be better off interpreting your ASTs.
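On the AST-length idea: the naive version is classic parsimony pressure, i.e. subtract a size penalty from the raw score. A minimal sketch, where ALPHA and raw_fitness are my placeholders and not anyone's actual setup:

    import ast

    ALPHA = 0.01  # made-up weight; too large and selection favors tiny broken programs

    def ast_size(source: str) -> int:
        # Crude program-length measure: count every node in the parsed tree.
        return sum(1 for _ in ast.walk(ast.parse(source)))

    def penalized_fitness(source: str, raw_fitness) -> float:
        # Parsimony pressure: raw task score minus a length penalty.
        return raw_fitness(source) - ALPHA * ast_size(source)

The catch (and presumably why it's "not as simple"): any fixed ALPHA trades task fitness against size globally, so evolution can start preferring short wrong programs over long correct ones.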
It's bloat due to 'introns' (useless statements that don't affect the output, like x = x * 1). And yes, just adding a program-length penalty to the fitness function isn't optimal. I've found it easier to evolve successful programs (letting the bloat happen) and then keep removing statements from correctly generated programs whilst checking that the output stays the same. Probably not optimal either, but I feel like it gives better results.
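For the curious, that remove-and-recheck pass looks roughly like this; run(tree, inputs) is a hypothetical harness you'd supply that compiles and executes the program and returns its output:

    import ast

    def prune_introns(tree: ast.Module, run, inputs):
        # Greedily drop top-level statements that don't change the output,
        # restarting the scan each time a removal sticks.
        baseline = run(tree, inputs)
        changed = True
        while changed:
            changed = False
            for i in range(len(tree.body)):
                candidate = ast.Module(body=tree.body[:i] + tree.body[i + 1:],
                                       type_ignores=[])
                try:
                    same = run(candidate, inputs) == baseline
                except Exception:
                    continue  # removal broke the program; this statement matters
                if same:
                    tree = candidate  # statement was an intron, keep it removed
                    changed = True
                    break
        return tree

This only checks equivalence on the inputs you test with, of course, which is usually fine in GP since fitness was only ever measured on those cases anyway.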
That's kinda horrifying... Accruing useless code to reduce the probability of mutating the useful code? Sounds like a better regularization strategy is needed.
I'm generally pretty suspicious of genetic algorithms; why take a random walk when you can march along the gradient towards a solution?
It might be interesting to try using GA for neural architecture, though, and gradient descent to train the network... (Though it sounds expensive.)
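That's roughly the shape of neuroevolution: the GA searches over architectures and ordinary gradient descent fits the weights inside each fitness evaluation. A toy sketch, where the genome is just a list of hidden-layer widths and train_and_evaluate (train briefly with SGD, return validation accuracy) is a placeholder you'd have to supply:

    import random

    def mutate(widths):
        # Perturb one layer width, or occasionally add/remove a layer.
        widths = widths[:]
        r = random.random()
        if r < 0.1 and len(widths) > 1:
            widths.pop(random.randrange(len(widths)))
        elif r < 0.2:
            widths.insert(random.randrange(len(widths) + 1),
                          random.choice([16, 32, 64, 128]))
        else:
            i = random.randrange(len(widths))
            widths[i] = max(8, widths[i] + random.choice([-16, 16]))
        return widths

    def evolve(train_and_evaluate, generations=20, pop_size=8):
        population = [[random.choice([32, 64, 128])] for _ in range(pop_size)]
        for _ in range(generations):
            # Each fitness call is a full (if short) training run, the expensive part.
            ranked = sorted(population, key=train_and_evaluate, reverse=True)
            parents = ranked[: pop_size // 2]  # truncation selection
            population = parents + [mutate(random.choice(parents)) for _ in parents]
        return max(population, key=train_and_evaluate)

The cost is exactly what you'd guess: pop_size x generations training runs, which is why real NAS systems put so much effort into weight sharing and early stopping.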