Hacker News new | past | comments | ask | show | jobs | submit login

With extra memory, and if you don't care about ordering, multiple set intersection can be made O(N) where N = size of each set added together, assuming no duplicates in any single set (which should be true, but it depends on how loosely you are using the word set) You only have to walk each element exactly once, and all other operations are (amortized) O(1)

For each element in all sets: Insert element into hash table. If it does not exist, use key = element, value = 1. If it exists, increment counter. This is O(1)

When done over all sets, it's O(N)

Walk hash table, the only elements that are in the intersection are those with value = number of sets. This is O(N)

O(N) + O(N) = O(N). This only works if there are no duplicates within a single set. It transforms set intersection into a counting problem over the universe of elements.

You can optimize it as well. You never need to deal with elements that don't exist in the hash table after the first set is processed, as they can never be in the intersection. This does not change the asymptotic complexity, except it means the hashtable will never resize after the first set. So it's advantageous to choose the smallest set to process first if you can do so easily. If you use dynamic perfect hashing or something similar, you can also guarantee all operations after the first set will be O(1), rather than amortized O(1).




That ("how would you implement ruby array intersection operator & and what computational complexity does your implementation have") was actually one of my interview warmup questions. In ruby core it's done exactly like you described, using a hash table, but I didn't know that until after the interview. I proposed a slightly worse solution, with sorting both inputs and then walking them simultaneously, which was O(n*log(n)). Still got the job, though :)

As a side note, I think that tasks like "implement a data structure with following operations and calculate computational complexity for each operation" are much better for exam than trivia questions like "Name the heap operations used by heapsort and the asymptotic running time of each.", with all due respect to @cperciva.


That's a very interesting solution, I would have never thought to try that.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: