Inevitably there will be copyrighted images, audio, and text mixed in with random social updates and discussions. It should be on the LLM builder to seek active consent, rather than on everyone else to be vigilant and/or sue to get their work out of the model's training data.