I feel like all those deep learning papers distorted people's perception of scale. If you need to take those 7k images by hand because your application domain is obscure and they aren't available in an existing dataset, that's way beyond feasible.
You could generate 7k images at a resolution of 4096x2160 pixels by walking around the vehicle for just under four minutes while shooting 4k video at 30 FPS, something of which modern phones are capable.
Yes, but how different would those 7K frames would really be? Same lighting, same background, same surrounding objects, the exact same condition of the vehicle's interior and exterior, same quirks of the camera's color profile, etc, etc. It would be an interesting experiment to actually try this, but I have a feeling the results wouldn't be all that good. Point being, you probably wouldn't get most of the benefits of deep learning and you might as well use the same approach the author used.
No they won't. All of the pictures will have lighting from the time of day and weather conditions from the time and place the pictures were taken. The same problems will happen for the background. If I want my neural network to identify the make and model of cars, but every picture I have of a Mazda3 is taken at noon on a sunny day in suburbia then it is reasonably likely to train on the wrong features and either identify trucks on sunny days in suburbia as Mazda3's or not recognize a Mazda3 photographed on a rainy night.
A human might have difficulty recognizing a Mazda 3 on a rainy night as well. You can adjust color temperature and white balance in post-processing, or film a couple minutes at night too. Point is, generating 7k images is not insurmountable, especially in this case with the criteria that it only has to recognize a particular car.
I feel like all those deep learning papers distorted people's perception of scale. If you need to take those 7k images by hand because your application domain is obscure and they aren't available in an existing dataset, that's way beyond feasible.