Video and audio streams contain recurring patterns, and more often than not these coincide with recurring patterns in the message, which leaks information. As an example, take some pages of a novel as the one-time pad. The ciphertext will show statistical variation (not all symbols in the ciphertext occur with equal frequency), and the most frequent combination will correspond to an "e" in the plaintext paired with an "e" in the pad, because "e" is the most common letter in English. You can calculate the probabilities for each combination and thus deduce the plaintext, provided the ciphertext is long enough. For a one-time pad to be secure, the pad must be truly random; a cryptographically secure pseudo-random stream is the practical substitute, though strictly speaking that makes it a stream cipher rather than a one-time pad.
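To make the novel-as-pad problem concrete, here is a rough sketch in Python. It rests on assumptions that are mine and not part of the argument above: letters are combined by addition mod 26, each letter is treated as independent of its neighbours, and the frequency table holds approximate values for English that vary by corpus. The names `ENGLISH_FREQ` and `ciphertext_distribution` are made up for this illustration.

```python
import string

ALPHABET = string.ascii_lowercase

# Approximate English letter frequencies in percent (illustrative values;
# exact figures depend on the corpus that was counted).
ENGLISH_FREQ = {
    "a": 8.167, "b": 1.492, "c": 2.782, "d": 4.253, "e": 12.702,
    "f": 2.228, "g": 2.015, "h": 6.094, "i": 6.966, "j": 0.153,
    "k": 0.772, "l": 4.025, "m": 2.406, "n": 6.749, "o": 7.507,
    "p": 1.929, "q": 0.095, "r": 5.987, "s": 6.327, "t": 9.056,
    "u": 2.758, "v": 0.978, "w": 2.360, "x": 0.150, "y": 1.974,
    "z": 0.074,
}


def normalise(freq):
    """Turn the percentage table into probabilities that sum to 1."""
    total = sum(freq.values())
    return [freq[ch] / total for ch in ALPHABET]


def ciphertext_distribution(plain_probs, pad_probs):
    """Distribution of c = (p + k) mod 26 for independent p and k."""
    n = len(plain_probs)
    dist = [0.0] * n
    for p, prob_p in enumerate(plain_probs):
        for k, prob_k in enumerate(pad_probs):
            dist[(p + k) % n] += prob_p * prob_k
    return dist


english = normalise(ENGLISH_FREQ)
uniform = [1 / 26] * 26

# Pad taken from a novel: pad letters follow English frequencies.
with_novel_pad = ciphertext_distribution(english, english)
# Truly random pad: every pad letter is equally likely.
with_random_pad = ciphertext_distribution(english, uniform)

# The single most likely (plaintext letter, pad letter) pairing.
most_likely_pair = max(
    ((a, b) for a in range(26) for b in range(26)),
    key=lambda ab: english[ab[0]] * english[ab[1]],
)

print("novel pad :", max(with_novel_pad), min(with_novel_pad))
print("random pad:", max(with_random_pad), min(with_random_pad))
print("likeliest pair:", ALPHABET[most_likely_pair[0]], ALPHABET[most_likely_pair[1]])
```

With the random pad every ciphertext letter comes out equally likely, so the ciphertext carries no statistical fingerprint. With the novel as pad the distribution is visibly uneven, and that unevenness is what an attacker works from. Real text is even more exploitable than this model suggests, since neighbouring letters are not independent.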
As a way out, you can agree on a seed for a secure pseudo-random generator, but such a seed is what is commonly called a password or pre-shared secret. In fact, it is more secure to use just the string 'The Nth 4096 bits from this week's mod(N,100) top video on YouTube' itself as the pre-shared secret.
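As a rough sketch of what "a seed for a secure pseudo-random algorithm" could look like, the Python below expands a pre-shared string into a keystream with SHAKE-256 from the standard library and XORs it onto the message. The choice of SHAKE-256, the 16-byte per-message nonce, and the helper names `keystream` and `xor_bytes` are my assumptions for illustration. This is not a vetted cipher: a real system would run a low-entropy password through a slow key-derivation function and use an authenticated cipher.

```python
import hashlib
import secrets


def keystream(secret: str, nonce: bytes, length: int) -> bytes:
    """Expand secret + nonce into `length` pseudo-random bytes."""
    if len(nonce) != 16:
        raise ValueError("this sketch assumes a fixed 16-byte nonce")
    # SHAKE-256 is an extendable-output function: it can emit as many
    # bytes as requested from a single input.
    return hashlib.shake_256(nonce + secret.encode("utf-8")).digest(length)


def xor_bytes(data: bytes, stream: bytes) -> bytes:
    """XOR data with an equally long keystream (same call en/decrypts)."""
    return bytes(d ^ s for d, s in zip(data, stream))


# The pre-shared secret is the string itself, not the video data it names.
shared_secret = (
    "The Nth 4096 bits from this week's mod(N,100) top video on YouTube"
)

message = b"meet me at noon"
nonce = secrets.token_bytes(16)  # fresh per message, sent in the clear

ciphertext = xor_bytes(message, keystream(shared_secret, nonce, len(message)))
recovered = xor_bytes(ciphertext, keystream(shared_secret, nonce, len(ciphertext)))
print(recovered)
```

The nonce matters: reusing the same keystream for two messages brings back the same statistical leak as reusing a pad.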
Can you tell me what's wrong with that approach? In my head, it seems reasonable.