I don't think the paper itself is misleading. I taught SAM earlier this week in my Frontiers in Deep Learning course and showed a figure from the paper that breaks down how long each component takes, separating the components run on a GPU from those run on the CPU/web browser.