I skimmed through the article, but if so, that's a lot of assumptions.
1. So let's say the assumed range of possible values is correct (10 characters from a specific range + 1). That would represent one big circle of possible area where videos might be.
2. Distribution of identifiers (valid videos) is everything. If YouTube applied some constraints (or skewing) to IDs that we don't know about, then the actual existing video IDs might be a small(er) circle within that bigger circle of possibilities and not equally dispersed throughout, or there might be clumping or whatever... So you'd need to sample the space by throwing darts in a way that gives you a silhouette of their skew, or shows whether it's random-ish, say by comparing the hits against a Poisson distribution.
Only then one could estimate the size. So is this what they're doing?
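To make the "throwing darts" check in point 2 concrete, here's a rough sketch of one way it could be done. This is not what the article does, just an illustration: split the ID space into equal buckets, count hits per bucket, and look at variance divided by mean. If hits are spread Poisson-like, that ratio sits near 1; heavy clumping pushes it far above 1. The names `bucket_counts` and `dispersion_index` are made up for this example, and IDs are treated as plain integers in `[0, space)`.

```python
from statistics import mean, pvariance


def bucket_counts(hit_positions, space, n_buckets):
    """Count how many hits fall into each of n_buckets equal slices of the space."""
    counts = [0] * n_buckets
    for pos in hit_positions:
        # Integer math maps a position in [0, space) to a bucket in [0, n_buckets).
        counts[pos * n_buckets // space] += 1
    return counts


def dispersion_index(counts):
    """Variance-to-mean ratio of the bucket counts.

    Roughly 1 for Poisson-like (evenly random) hits, much larger when clumped.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("no hits to analyse")
    return pvariance(counts) / m
```

A ratio near 1 doesn't prove the IDs are uniform, it only fails to show clumping at the bucket size you picked, so you'd want to repeat it at a few different bucket counts.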
I see what you did there. So basically the overlap proportion (or hit proportion) would be the number of hits divided by the number of samples run, and the estimated total would be that proportion multiplied by the total space of possibilities. That would work.
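As a sanity check on that arithmetic, a small sketch (my own, not the article's code). `ID_SPACE = 2**64` is an assumption based on 10 characters with 64 possible values each plus one final character with 16 possible values; `estimate_total` and `simulate` are hypothetical names, and the simulation uses a tiny made-up space rather than real IDs.

```python
import random

# Assumption: 10 chars x 64 values, plus 1 char x 16 values = 2**64 possible IDs.
ID_SPACE = 2**64


def estimate_total(hits, samples, space=ID_SPACE):
    """Scale the observed hit proportion (hits / samples) up to the whole space."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    return hits * space / samples


def simulate(space, n_valid, samples, seed=0):
    """Toy check: hide n_valid IDs uniformly in a small space, then throw darts."""
    rng = random.Random(seed)
    valid = set(rng.sample(range(space), n_valid))
    hits = sum(1 for _ in range(samples) if rng.randrange(space) in valid)
    return estimate_total(hits, samples, space)
```

This only recovers the true count when the darts are uniform over the space; the estimate is unbiased however the valid IDs are arranged, but it gets noisy when hits are rare.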
Also... has anyone bothered to, you know, ask YouTube?