To be fair - there's probably not a whole lot of real world use cases that aren't highly specialised where there's a requirement to run object detection/identification on every single frame of a 24fps video.
If you want to run hours or days worth of video through an object detector - you probably want to go out and buy a GPU and a machine of your own to stick it in...
I'm curious what application you're thinking of where this counts as "real world usage"? (I can imagine applications like vision-controlled drones, but I'm pretty sure places like ETH Zurich have better solutions, as in "less generalised and more applicable to drone control", and in-house hardware to train and run them on.)
There are plenty of real applications for running inference on every frame of video. Any real-time monitoring application would run all the frames. Even if you cut it to 1 FPS, with multiple video sources the monthly price doesn't make sense.
One application would be nudity detection for a family-friendly site, where lots of video would need to be checked.
The argument that you would want to run your own machine validates my point. However, the same could have been said for video encoding or any other form of intensive processing, all of which now have cloud alternatives.
OK. I don't think this is the solution you're after if your problem includes "crowd sourced video".
Nudity detection though - I'd probably at least try doing something like "Check every 50+rand(100) frames, and only examine more carefully if you get hits on that sampling". Sure - that's "game-able" - but subliminal nudity isn't something I'd expect trolls or griefers to expend too much effort to slide past your filters...
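A rough sketch of that sampling idea in Python, in case it helps. Everything here is an assumption for illustration: `is_hit` is a stand-in for whatever detector you'd actually call, `frames` is any indexable sequence of frames, and the `window` size for the "examine more carefully" pass is made up.

```python
import random
from typing import Callable, Iterator, List, Optional, Sequence


def sample_indices(n_frames: int, base: int = 50, jitter: int = 100,
                   rng: Optional[random.Random] = None) -> Iterator[int]:
    """Yield frame indices with a random stride of base + rand(jitter)."""
    rng = rng or random.Random()
    i = 0
    while i < n_frames:
        yield i
        # Stride is between base and base + jitter - 1 (50..149 by default).
        i += base + rng.randrange(jitter)


def scan(frames: Sequence, is_hit: Callable[[object], bool],
         window: int = 25, rng: Optional[random.Random] = None) -> List[int]:
    """Check sampled frames; on a hit, re-check every frame around it."""
    flagged = set()
    for i in sample_indices(len(frames), rng=rng):
        if is_hit(frames[i]):
            lo = max(0, i - window)
            hi = min(len(frames), i + window + 1)
            # Hypothetical "examine more carefully" step: dense pass near the hit.
            flagged.update(j for j in range(lo, hi) if is_hit(frames[j]))
    return sorted(flagged)
```

With the default stride, any run of flagged content longer than about 150 frames (roughly six seconds at 24fps) is guaranteed to be sampled at least once; shorter runs can slip through, which is the "game-able" part.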
Hey, yes, there is a need to run detection at 20+ FPS, but you don’t need to run it “one frame at a time”. You’re mostly processing the same information again and again, redundantly.
I agree you need 24 FPS output, but you don’t need to process all 24 frames as raw images.
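One way to read that, sketched in Python: only run the detector when a frame differs enough from the last one you actually processed, and reuse the previous result otherwise. This is just an illustration under assumptions, not necessarily what's meant above (a tracker or the codec's motion data would be other options). Frames here are assumed to be non-empty flat sequences of grayscale pixel values, and `detect` and `threshold` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Average per-pixel absolute difference between two equal-size frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_with_reuse(frames: Sequence[Sequence[int]],
                      detect: Callable[[Sequence[int]], object],
                      threshold: float = 10.0) -> Tuple[List[object], int]:
    """Return one result per frame, plus how many times detect() really ran."""
    results: List[object] = []
    calls = 0
    key_frame = None   # last frame that was actually sent to the detector
    last_result = None
    for frame in frames:
        if key_frame is None or mean_abs_diff(frame, key_frame) > threshold:
            last_result = detect(frame)
            key_frame = frame
            calls += 1
        # Frames close to the key frame reuse its detections.
        results.append(last_result)
    return results, calls
```

You still get a result for every output frame, but the expensive call only happens when the picture has changed by more than the threshold.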