You'd need both, of course. And huge teams of artists churning out proprietary dataset updates constantly.
The current approach, where you download billions of free images off the Internet, only works once, and only if you can somehow justify it as a research endeavour.
Once this thing is monetized, intellectual property laws will kick in.