And there's your wide execution width, which makes it possible to do the checks in parallel with the access. A check like that does use up entries in your branch prediction structures, as well as instruction bandwidth, but I doubt the function will run any slower once past decode.
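To make that concrete, here is a minimal sketch of the kind of checked access being discussed. The name `checked_get` and its signature are made up for illustration. The compare on `idx` and the load from `data[idx]` don't depend on each other, so a wide out-of-order core can issue both at once and simply discard the load if the (normally well-predicted) branch turns out to be taken.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounds-checked read: returns false instead of reading
 * out of range. In correct code the branch is almost never taken, so it
 * predicts well; the cost is one predictor entry plus a couple of extra
 * instructions through fetch and decode. */
bool checked_get(const int32_t *data, size_t len, size_t idx, int32_t *out)
{
    if (idx >= len) {
        return false;       /* out of bounds: caller handles the error */
    }
    *out = data[idx];       /* load can be issued alongside the compare */
    return true;
}
```

Whether the check is actually free depends on the surrounding code and the specific core, so treat this as an illustration of the argument rather than a benchmark.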
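It is. As you say, there's a category of Spectre bugs (variant 1, bounds check bypass) where the CPU speculatively assumes a bounds check succeeds, then goes on to execute the dependent loads with an attacker-controlled out-of-bounds index. The attacker doesn't get to run their own code; the victim's own code leaks data through a cache side channel before the misprediction is rolled back.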
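For anyone who hasn't seen it, the bounds-check Spectre pattern and one common style of mitigation look roughly like this. It's a sketch with made-up names (`table`, `probe`, `lookup_unsafe`, `lookup_masked`, `index_mask`), not code from any particular project. The mask is the same general idea as the Linux kernel's `array_index_nospec`: clamp the index with arithmetic rather than relying on the branch, so a mispredicted check can only ever read element 0. The arithmetic assumes `size` is at most `SIZE_MAX / 2`.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 16u                 /* hypothetical table length */

static uint8_t table[TABLE_SIZE];      /* data the check is meant to protect */
static uint8_t probe[256 * 64];        /* array whose cache state leaks a byte */

/* Vulnerable shape: if the branch is mispredicted, both loads can run
 * speculatively with an out-of-range idx, and the second load leaves a
 * cache footprint that depends on the out-of-bounds byte. */
uint8_t lookup_unsafe(size_t idx)
{
    if (idx < TABLE_SIZE) {
        return probe[table[idx] * 64];
    }
    return 0;
}

/* All ones if idx < size, zero otherwise, computed without a branch.
 * Assumes size <= SIZE_MAX / 2. */
size_t index_mask(size_t idx, size_t size)
{
    size_t top = (idx | (size - 1 - idx)) >> (sizeof(size_t) * CHAR_BIT - 1);
    return top - 1;                    /* top is 0 in bounds, 1 out of bounds */
}

/* Mitigated shape: even under misspeculation the index is forced to 0
 * when out of range, so nothing outside table[] is read. */
uint8_t lookup_masked(size_t idx)
{
    if (idx < TABLE_SIZE) {
        idx &= index_mask(idx, TABLE_SIZE);
        return probe[table[idx] * 64];
    }
    return 0;
}
```

Architecturally the two lookups return the same values; the difference only shows up in what a mispredicted path is able to touch. Real implementations often use inline assembly or a compiler builtin for the mask so the optimizer can't turn it back into a branch.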