Yes, but the original 5v5 Dota 2 bot already demonstrated that PPO can learn very long-range strategies given enough computation, when most people would have expected it to diverge or fail to learn at all. The same thing is probably happening here: it can handle the long-range learning and so does fine at human-level APM, while the SSBM AI is stuck learning short-term strategies that rely heavily on simple, fast reactive policies.
There's nothing about PPO that helps it learn long-range strategies. It primarily lets you take multiple gradient steps on a single batch of experience so you converge faster.
In fact, for a single step with no policy lag, it's equivalent to a standard policy gradient update.
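To see why, here's a quick PyTorch check (made-up numbers): when the current and old policies are identical, the importance ratio is exactly 1, the clip is inactive, and the clipped surrogate's gradient matches the vanilla policy-gradient one.

```python
import torch

eps = 0.2
advantage = torch.tensor(1.7)            # arbitrary example advantage

# Log-probability of the sampled action under the current policy.
logp = torch.tensor(-1.2, requires_grad=True)
logp_old = logp.detach()                 # "old" policy identical: no lag

# Vanilla policy gradient loss: -log pi(a|s) * A
pg_loss = -logp * advantage
(pg_grad,) = torch.autograd.grad(pg_loss, logp)

# PPO clipped surrogate: -min(r*A, clip(r, 1-eps, 1+eps)*A), r = pi/pi_old
ratio = torch.exp(logp - logp_old)       # equals 1 exactly here
ppo_loss = -torch.min(ratio * advantage,
                      torch.clamp(ratio, 1 - eps, 1 + eps) * advantage)
(ppo_grad,) = torch.autograd.grad(ppo_loss, logp)

print(pg_grad, ppo_grad)                 # same gradient: -1.7 and -1.7
```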
I suspect what actually makes it possible to train with a reaction-time delay is an RNN, or some other way of compensating for the lag. I'm testing that out right now with my own SSBM bot: https://www.twitch.tv/vomjom
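For reference, a minimal sketch of the kind of thing I mean, not how any existing bot does it: delay the observations the agent sees (a stand-in for reaction time) and give the policy an LSTM so its hidden state can partially compensate for the lag. The gym-style env interface and the 15-frame delay (~250ms at 60fps) are assumptions.

```python
import collections
import torch.nn as nn

class DelayedObs:
    """Env wrapper: the agent only ever sees observations from `delay`
    frames ago. delay=15 is an assumed stand-in for human reaction time."""
    def __init__(self, env, delay=15):
        self.env, self.delay = env, delay
        self.buf = collections.deque(maxlen=delay + 1)

    def reset(self):
        obs = self.env.reset()
        self.buf.clear()
        for _ in range(self.delay + 1):
            self.buf.append(obs)         # pad so early steps see something
        return self.buf[0]

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.buf.append(obs)
        return self.buf[0], reward, done, info   # oldest entry = delayed obs

class RecurrentPolicy(nn.Module):
    """LSTM over the delayed observation stream; the hidden state carries
    forward what the policy has seen, compensating somewhat for the lag."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, state=None):
        out, state = self.lstm(obs_seq, state)   # (batch, time, hidden)
        return self.head(out), state             # action logits per step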