We got approximately the same numbers (your clock speed is likely much higher). User/system time is quasi-bogus on a high-core-count system (though the numbers are still a bit concerning). I accounted for possible cache effects by running each command a number of times and reporting the last run. I'm not really trying to make an absolute time comparison here, just pointing out that if 100ms is unacceptable, numpy would miss that bar too. Once you're past the bar of "this needs to be kept running", I don't think a constant factor of 100ms vs 1s makes much difference in QoL, and now we're just comparing apples and oranges. A constant-factor gain on the rest of the time, by contrast, can make a huge difference in the rate of results per second. But I actually hope both will improve!
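For anyone who wants to reproduce this, here's a minimal sketch of the methodology described above (hypothetical harness, not the exact commands used): run the same workload several times and report only the last run, so OS file caches are warm and the first-run penalty is excluded.

```python
import subprocess
import sys
import time

# Assumed workload being timed: a bare numpy import in a fresh interpreter.
CMD = [sys.executable, "-c", "import numpy"]

elapsed = 0.0
for _ in range(5):
    t0 = time.perf_counter()
    subprocess.run(CMD, check=True)
    # Overwritten each iteration, so only the last (warm-cache) run survives.
    elapsed = time.perf_counter() - t0

print(f"last (warm-cache) run: {elapsed:.3f}s")
```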
Ah. I was trained by zsh's time reporting to focus on the last figure, and only noticed after posting mine that "real" is at the top in your comment. And then I was still left scratching my head at "user" and "system" times an order of magnitude higher than "real".
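The likely explanation for that gap: "user" and "sys" are CPU time summed across every core the process's threads ran on, while "real" is wall-clock, so a parallel workload can report far more CPU time than elapsed time. A small illustration, assuming NumPy is linked against a multithreaded BLAS:

```python
import os
import time
import numpy as np

a = np.random.rand(3000, 3000)

t0 = time.perf_counter()
c0 = os.times()  # snapshot of cumulative user/system CPU time
a @ a            # BLAS spreads this matmul across many threads
real = time.perf_counter() - t0
c1 = os.times()

# CPU time accumulates per core, so user+sys can dwarf real here.
cpu = (c1.user - c0.user) + (c1.system - c0.system)
print(f"real {real:.2f}s  user+sys {cpu:.2f}s")
```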