Many convolutional neural networks for image classification work on 299×299 inputs. With a higher-resolution camera, you'd just scale down every frame before feeding it to the model.
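In practice you'd downscale with OpenCV or Pillow, but the idea is simple enough to sketch in pure Python. This is just a nearest-neighbor downscale over a frame represented as rows of pixels (the 1920×1080 input and 299×299 target are illustrative numbers, not anything model-specific):

```python
def downscale_nearest(frame, out_w, out_h):
    """Nearest-neighbor downscale of a frame given as a list of pixel rows."""
    in_h = len(frame)
    in_w = len(frame[0])
    # For each output pixel, sample the proportionally located input pixel.
    return [
        [frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# e.g. shrink a 1080p frame to the 299x299 input a classifier expects
hi_res = [[(x, y) for x in range(1920)] for y in range(1080)]
small = downscale_nearest(hi_res, 299, 299)
```

A real pipeline would use bilinear or area interpolation (e.g. `cv2.resize`) rather than nearest-neighbor, but the cost of this step is trivial compared to inference either way.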
Most of the models listed there are even smaller, 224×224 px. These are models for Coral, though, right? The page also says "These are not production-quality models; they are for demonstration purposes only."; the resolution may be part of the reason for that.
These CNN architectures really are designed to operate on such small inputs, even when you run them on RTX 4090s. It makes sense: even at 224×224 pixels you'd find it easy to identify the subject of an image. Having a 50 MP image of a chair doesn't help you figure out what it is, and can even make it harder if you're only looking at one zoomed-in region of the image at a time.
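A rough back-of-envelope calculation shows why huge inputs don't scale: activation memory and compute in a conv layer grow with the pixel count. Assuming a typical 64-channel stem (as in ResNet-style networks) and ignoring stride, just to compare orders of magnitude:

```python
def first_layer_activations(h, w, channels=64):
    """Per-image activation count for a conv layer with `channels` outputs.

    Stride and pooling are ignored; this is only an order-of-magnitude
    comparison, not an exact memory figure for any specific architecture.
    """
    return h * w * channels

print(first_layer_activations(224, 224))    # ~3.2 million values
print(first_layer_activations(8192, 6144))  # ~3.2 billion values for a ~50 MP frame
```

A thousand-fold jump in activations (before you even consider the deeper layers) is why architectures standardize on small inputs and leave resolution to the preprocessing step.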
"Not production-quality models" might refer to: Training was not done for as many epochs as you might want to achieve peak accuracy, or different quantization methods might yield better performance, etc. Or, it's just a disclaimer that if you decide to sell a product using one of these models, don't blame them if it is bad at detecting hot dog vs not hot dog.
You can see the input sizes of Google's pre-trained models here: https://coral.ai/models/image-classification/