They mention this multiple times in the paper. For example, in the “Limitations” section they write: “SAM can process prompts in real-time, but nevertheless SAM’s overall performance is not real-time when using a heavy image encoder.”
This paper is one of the highest quality papers released this year. I wish more papers were so clear and informative.
Yes, I read that section. They should have included the time required by the heavy image encoder, unless you know of a way to make SAM work with another encoder.