Has anyone tried classical program synthesis techniques on competitive programming problems? I wonder what would have been possible with tech from more than 2 years ago.
I don't know if anyone has tried it, but it's not a very objective evaluation. We have no good measure of the coding ability of the "median-level competitor", so doing better or worse than that doesn't really tell us anything useful about the coding capability of an automated system.
So my hunch is that it probably hasn't been done, or hasn't been done often, because the program synthesis community would recognise it's pointless.
What you really want to look at is formal program synthesis benchmarks and how systems like AlphaCode do on them (hint: not so well).