
I have no educational training in computer science, but I work as a software developer. I could not answer any of those questions, literally 0 out of 5.

1. Is O(2^n) equal to O(3^n)? Why? I have absolutely no idea what that means.

But I did have to answer this question: why does the application crash? "Oh...that's because the developer that was pontificating yesterday about how heap operations behave asymptotically forgot to check if the database connection was open before he called the Save method. Apparently, such arcane trivia is beneath him. I can fix it."




> But I did have to answer this question: why does the application crash? "Oh...that's because the developer that was pontificating yesterday about how heap operations behave asymptotically forgot to check if the database connection was open before he called the Save method. Apparently, such arcane trivia is beneath him. I can fix it."

False dichotomy. You posit that the developer who understands asymptotic notation can't figure out why the application crashes, or how to write to databases. I don't see how understanding asymptotic notation negatively affects someone's programming ability.


> I don't see how understanding asymptotic notation negatively affects someone's programming ability

Why would it make them worse?* I was just arguing it doesn't necessarily make them better. Just because you are a bad developer doesn't mean you get better when you learn about asymptotic behavior or heapsort or O-notation. Lots of really bad developers know about those things. And lots of really bad developers don't know about them, either.

Knowledge of those items is neither sufficient nor necessary to be either a competent or incompetent developer (at least for reasonable definitions of both competent and developer).

* Ok - maybe too much knowledge has some odd 2nd order negative effects on productivity: analysis paralysis, over-confidence, unwillingness to ask for help, programming toward a theoretically pure solution rather than a pragmatic solution.


Of all the questions, this is the one you MUST know if you want me (or anyone) to trust you with a piece of code. It takes one hour (on Wikipedia!) to learn everything you'll need for day-to-day complexity assessment with Big O notation.

They asked us this question on our high school final exams; I'm sure you'll manage.


Yeah, I mean, I'm also a self-taught, workaday programmer, and I agree with you. Understanding time complexity to a degree that you can communicate about it isn't something to shrug off. Don't be proud of ignorance.

The problem is that the perspective expressed by the GP comment is a pure expression of the "don't know what you don't know" adage. Sure, you can hammer out solutions to problems, but the more exposure you've got to more concepts in your field, 1. the bigger your toolkit and 2. the quicker you can start drawing conclusions about the problem.

I'm actually pretty shocked someone's confessing to not knowing what big O notation is and proud of it.


> O(2^n) equal to O(3^n)

Yep. So I looked into it and it seems they are not equal. Every f(n) in the set O(2^n) belongs to O(3^n). While this means that O(2^n) is a subset of O(3^n), we can see from the trivial case f(n)=3^n that O(3^n) is not a subset of O(2^n), since 3^n > 2^n for all n as n approaches infinity; or, there is no k such that there exists an n0 such that all n > n0 implies 3^n < k*2^n. Therefore the two sets are not equal.

Great. I still don't see how I am more qualified to do absolutely anything in life.


> we can see from the trivial case f(n)=3^n that O(3^n) is not a subset of O(2^n), since 3^n > 2^n for all n as n approaches infinity; or, there is no k such that there exists an n0 such that all n > n0 implies 3^n < k*2^n. Therefore the two sets are not equal.

For what it's worth, this isn't really acceptable as a proof. The first half of your argument would go through equally well for 3n and 2n in place of 3^n and 2^n even though O(2n) = O(3n). The second half is simply a re-statement of what it means to say that 3^n is not in O(2^n) and doesn't actually prove it. The pertinent fact is that there is no constant k such that k >= (3/2)^n for all sufficiently large n because (3/2)^n goes to infinity as n goes to infinity. Compare this to the situation with 2n and 3n where (3n/2n) = 3/2 is a constant and hence bounded. A less trivial example would be something like n^2 + n versus n^2 where the ratio 1 + 1/n isn't constant but is nevertheless bounded for n >= 1.
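
Spelled out against the definition of Big O (just a sketch, in LaTeX-ish notation):

    3^n \in O(2^n)
        \iff \exists\, k,\ n_0 \ \text{such that}\ 3^n \le k \cdot 2^n \ \text{for all}\ n > n_0
        \iff \exists\, k,\ n_0 \ \text{such that}\ (3/2)^n \le k \ \text{for all}\ n > n_0,

    \text{which is impossible because } (3/2)^n \to \infty.
    \text{Compare } 3n \le 2 \cdot 2n \text{ for all } n, \text{ so } 3n \in O(2n) \text{ and hence } O(2n) = O(3n).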


That's a great textbook summary; now for applications:

You've got an array of your friends, an array of people who upvoted this post, and an array of people who replied to this post. Filter all your friends who upvoted and replied to this post.

Now, before you write the code, how fast/slow will it run? For 1,000 friends? 1,000,000? Will the runtime grow extremely fast? Why?

Here's a Redis command that intersects the sets stored at the given keys - http://redis.io/commands/sinter - the complexity is "O(N*M) worst case where N is the cardinality of the smallest set and M is the number of sets." Do you understand why? Do you understand when it will be fast and when it will be slow?
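
To make it concrete, here's a rough sketch in Python - illustrative only, not Redis internals, and the names are made up. It iterates over the smallest set and tests membership in the others; with hash-based sets each test is ~O(1), so the total work is roughly O(N * M). Swap the sets for plain lists and every membership test becomes a linear scan, which is where the painful growth comes from.

    def intersect_all(groups):
        groups = sorted(groups, key=len)             # smallest set first
        smallest, rest = groups[0], groups[1:]
        # keep x only if it is present in every other set (~O(1) per lookup)
        return [x for x in smallest if all(x in g for g in rest)]

    friends = {"ann", "bob", "eve"}
    upvoted = {"bob", "eve", "zoe"}
    replied = {"eve", "bob"}
    print(intersect_all([friends, upvoted, replied]))  # ['bob', 'eve'] in some order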


I'll give this a shot.

The code I would write would take longer the more friends you have. Basically, it would grow linearly with the size of all the sets and the number of intersections found. Something like O(n * M) in the worst case. But I don't get why later you say that n is the size of the smallest set.

So I don't see how you can beat that. I guess you could sort the lists, but that takes O(n log n) over M lists, and then lookups would take O(log n).

To me it will be fastest when the algorithm terminates the quickest - the smallest set contains nothing, the second-smallest set contains no intersecting keys, anything like that - and it will be slowest in the worst case (each set is a proper subset of the next), getting progressively worse as the subsequent sets get larger.

As for that algorithm running in O(N * M), I am not sure I understood that at first, but I think I do now. If you hold all the other set sizes constant and vary only the size of the smallest set or the number of sets, you will notice the running time follow O(N * M). If instead you increase the size of the largest set by a factor of k, the algorithm will take k times as long to run (in the worst case); but if the running-time function f(N,M) belongs to O(N * M), then k * f(N,M) belongs to O(N * M) as well, so the O notation still represents the running-time complexity even if you increase the size of the largest set.

So the running time will still increase with the largest set, but the algorithm will still belong to the same O class.


With extra memory, and if you don't care about ordering, multiple set intersection can be made O(N), where N = the sizes of all the sets added together, assuming no duplicates in any single set (which should be true, but it depends on how loosely you are using the word "set"). You only have to walk each element exactly once, and all other operations are (amortized) O(1).

For each element in all sets: look the element up in a hash table. If it does not exist, insert it with key = element, value = 1. If it exists, increment the counter. This is O(1) per element.

When done over all sets, it's O(N).

Walk the hash table; the only elements in the intersection are those with value = number of sets. This is O(N).

O(N) + O(N) = O(N). This only works if there are no duplicates within a single set. It transforms set intersection into a counting problem over the universe of elements.

You can optimize it as well. You never need to deal with elements that don't exist in the hash table after the first set is processed, as they can never be in the intersection. This does not change the asymptotic complexity, except it means the hash table will never resize after the first set. So it's advantageous to choose the smallest set to process first if you can do so easily. If you use dynamic perfect hashing or something similar, you can also guarantee all operations after the first set will be O(1), rather than amortized O(1).
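
Here's a minimal sketch of that counting approach in Python (again assuming no duplicates within any single set; Counter is just a convenient hash table):

    from collections import Counter

    def intersect_by_counting(sets):
        counts = Counter()
        for s in sets:                       # walk each element exactly once
            for x in s:
                counts[x] += 1               # amortized O(1) per element
        # an element is in the intersection iff it was seen in every set
        return [x for x, c in counts.items() if c == len(sets)]

    print(intersect_by_counting([{1, 2, 3}, {2, 3, 4}, {9, 2, 3}]))  # [2, 3] in some order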


That ("how would you implement ruby array intersection operator & and what computational complexity does your implementation have") was actually one of my interview warmup questions. In ruby core it's done exactly like you described, using a hash table, but I didn't know that until after the interview. I proposed a slightly worse solution, with sorting both inputs and then walking them simultaneously, which was O(n*log(n)). Still got the job, though :)

As a side note, I think that tasks like "implement a data structure with the following operations and calculate the computational complexity of each operation" are much better for an exam than trivia questions like "Name the heap operations used by heapsort and the asymptotic running time of each.", with all due respect to @cperciva.


That's a very interesting solution, I would have never thought to try that.


Hmm, does all this imply that a blue-collar class of IT workers is rising? When I was working during the summers at construction sites in my youth, I noticed that everybody complained all the time about the impractical decisions the engineers and architects had made, and grumbled that they had to be there to fix them.

Would they have been happier if they hadn't been needed at all?

(disclaimer: I tend to change my shirt between white-collar and blue-collar [figuratively, of course] where I work)


(Shameless plug) I just wrote about this very thing on Friday: http://mattdeboard.net/2012/10/05/larry-the-software-guy/

Not my best writing ever but that round's already downrange. Plus there's an excellent response from Alex Feinberg in the comments.


Interesting analogy. I think it more strongly implies that for the type of applications most frequently written, general intelligence and experience trump theoretical knowledge. Honestly the education you get from a BS in comp sci (the level of knowledge the post addresses) isn't going to qualify you for much more than the guy who has 4 years of experience on his résumé. Much less in some circumstances actually. This is a strange field.


Meanwhile, you discover a long time down the road that the algorithm you thought was fine is actually incredibly slow when scaled to large inputs because you didn't have a good understanding of runtime analysis.

Or maybe, as you say, understanding more things makes you worse at coding. Who knows.


I guess I'm not buying that Big-O is a necessary requirement to understand how fast or slow an algorithm will be. It could be that someone understands and uses the process behind it without being at all familiar with the notation.


Some things you can look up quickly. Being functionally innumerate is not something you can fix with Google.

The tedious details of whatever god-forsaken CRUD framework is flavour-of-the-week with the cool kids probably WERE beneath him.


I think that question 1 is not well formulated (unless that is the answer he expects). Equality makes only very limited sense for asymptotic notation.


It is perfectly well formulated. The notation O(f(n)) denotes a set of functions. The question is asking if the two sets O(2^n) and O(3^n) are the same set.


That's right. Some people get confused due to the rather unfortunate convention of writing f(n) = O(n) instead of the correct f(n) ∈ O(n).


I think the OP is looking for a bit more of an answer than a simple Yes/No.


He is also asking "Why?", i.e., for a proof[1] to back up your Yes/No answer. Seems like a perfectly good question.

[1] Calculation, if you prefer.


It's a perfectly sensible question. Knowing that both O(2^n) and O(3^n) are sets, and using the definition of set equality, it asks whether "f in O(2^n)" implies "f in O(3^n)", and vice versa. Only one of the implications holds...



