
> O(2^n) equal to O(3^n)

Yep. So I looked into it, and it seems they are not equal. Every f(n) in the set O(2^n) belongs to O(3^n). While this means that O(2^n) is a subset of O(3^n), we can see from the trivial case f(n) = 3^n that O(3^n) is not a subset of O(2^n), since 3^n > 2^n for all n as n approaches infinity; or, there is no k such that there exists an n0 such that n > n0 implies 3^n < k*2^n. Therefore the two sets are not equal.

Great. I still don't see how I am more qualified to do absolutely anything in life.




> we can see from the trivial case f(n) = 3^n that O(3^n) is not a subset of O(2^n), since 3^n > 2^n for all n as n approaches infinity; or, there is no k such that there exists an n0 such that n > n0 implies 3^n < k*2^n. Therefore the two sets are not equal.

For what it's worth, this isn't really acceptable as a proof. The first half of your argument would go through equally well for 3n and 2n in place of 3^n and 2^n even though O(2n) = O(3n). The second half is simply a re-statement of what it means to say that 3^n is not in O(2^n) and doesn't actually prove it. The pertinent fact is that there is no constant k such that k >= (3/2)^n for all sufficiently large n because (3/2)^n goes to infinity as n goes to infinity. Compare this to the situation with 2n and 3n where (3n/2n) = 3/2 is a constant and hence bounded. A less trivial example would be something like n^2 + n versus n^2 where the ratio 1 + 1/n isn't constant but is nevertheless bounded for n >= 1.
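To put the same criterion in symbols (just a sketch of the ratio argument above):

    \frac{3^n}{2^n} = \left(\tfrac{3}{2}\right)^n \to \infty
      \quad\Rightarrow\quad 3^n \notin O(2^n),
    \qquad
    \frac{3n}{2n} = \tfrac{3}{2} \ \text{(constant)}
      \quad\Rightarrow\quad 3n \in O(2n),
    \qquad
    \frac{n^2 + n}{n^2} = 1 + \tfrac{1}{n} \le 2 \ \text{for } n \ge 1
      \quad\Rightarrow\quad n^2 + n \in O(n^2).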


That's a great textbook summary. Now for applications:

You've got an array of your friends, an array of people who upvoted this post, and an array of people who replied to this post. Filter for all of your friends who both upvoted and replied to this post.

Now, before you write the code: how fast or slow will it run? For 1,000 friends? For 1,000,000? Will the runtime grow extremely fast? Why?

Here's a Redis command that intersects the sets stored at the given keys - http://redis.io/commands/sinter - the complexity is "O(N*M) worst case where N is the cardinality of the smallest set and M is the number of sets." Do you understand why? Do you understand when it will be fast and when it will be slow?
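For a feel of where that bound comes from, here's a rough sketch of an intersection that walks the smallest set and probes the others, assuming hash-based sets with O(1) membership tests (illustrative only, not Redis's actual implementation):

    def sinter_like(*sets):
        # N = size of the smallest set, M = number of sets.
        smallest = min(sets, key=len)
        # For each of the N elements, do up to M expected-O(1) membership probes,
        # giving O(N*M) in the worst case.
        return [e for e in smallest if all(e in s for s in sets)]

It's fast when the smallest set is small or empty, and slowest when every set is large and most elements survive every probe.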


I'll give this a shot.

The code I would write would take longer the more friends you have. Basically, it would grow linearly with the size of all the sets and the number of intersections found. Something like O(N * M) in the worst case. But I don't get why you say that N is the size of the smallest set.

So I don't see how you can beat that. I guess you could sort the lists, but that takes O(n log n) over M lists, and then lookups would take O(log n).

To me, it will be fastest when the algorithm can terminate quickly: the smallest set contains nothing, the second smallest set contains no intersecting keys, anything like that. It will take the longest in the worst case (each set is a proper subset of the next), and it gets progressively worse as the subsequent sets get larger.

As for that algorithm running in O(N * M), I am not sure I understood that at first, but I think I do now. If you hold all other set sizes constant and vary only the size of the smallest set or the number of sets, you will notice the running time follows O(N * M). If instead you increase the size of the largest set by a factor of k, the algorithm will take up to k times as long to run (in the worst case); but if the running time function f(N, M) belongs to O(N * M), then k * f(N, M) belongs to O(N * M) as well, so the O notation still represents the running time complexity even if you increase the size of the largest set.

So the running time will still increase with the largest set, but the algorithm's complexity still belongs to the same O class.
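In symbols, that constant-factor point is just (a one-line sketch):

    f(N, M) \le c \cdot N M \ \text{for large inputs}
      \quad\Rightarrow\quad
    k \cdot f(N, M) \le (k c) \cdot N M,

so the scaled function still satisfies the definition of O(N * M); the constant is simply absorbed into the bound.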


With extra memory, and if you don't care about ordering, multiple set intersection can be made O(N), where N is the sizes of all the sets added together, assuming no duplicates in any single set (which should be true, but it depends on how loosely you are using the word set). You only have to walk each element exactly once, and all other operations are (amortized) O(1).

For each element in every set: insert the element into a hash table. If it does not exist, insert it with key = element, value = 1. If it exists, increment the counter. This is O(1) per element.

When done over all the sets, this is O(N).

Walk the hash table; the only elements in the intersection are those with value = number of sets. This is O(N).

O(N) + O(N) = O(N). This only works if there are no duplicates within a single set. It transforms set intersection into a counting problem over the universe of elements.

You can optimize it as well. You never need to deal with elements that don't exist in the hash table after the first set is processed, as they can never be in the intersection. This does not change the asymptotic complexity, except it means the hash table will never resize after the first set. So it's advantageous to choose the smallest set to process first if you can do so easily. If you use dynamic perfect hashing or something similar, you can also guarantee all operations after the first set will be O(1), rather than amortized O(1).
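A minimal sketch of that counting approach in Python (a dict as the hash table; assumes no duplicates within any single set; names are illustrative):

    def intersect_by_counting(sets):
        # Process the smallest set first so the table never has to grow later.
        sets = sorted(sets, key=len)
        counts = {elem: 1 for elem in sets[0]}
        for s in sets[1:]:
            for elem in s:
                # Elements absent after the first set can never be in the intersection.
                if elem in counts:
                    counts[elem] += 1
        # An element is in the intersection iff it was counted once per set.
        return [e for e, c in counts.items() if c == len(sets)]

Total work is one expected-O(1) hash operation per element, so O(N) overall where N is the combined size of all the sets.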


That ("how would you implement ruby array intersection operator & and what computational complexity does your implementation have") was actually one of my interview warmup questions. In ruby core it's done exactly like you described, using a hash table, but I didn't know that until after the interview. I proposed a slightly worse solution, with sorting both inputs and then walking them simultaneously, which was O(n*log(n)). Still got the job, though :)

As a side note, I think that tasks like "implement a data structure with the following operations and calculate the computational complexity of each operation" are much better for an exam than trivia questions like "Name the heap operations used by heapsort and the asymptotic running time of each.", with all due respect to @cperciva.


That's a very interesting solution; I would never have thought to try that.



