What sort of programming positions? I've done a fair amount of web development i...

beoba · on Feb 21, 2011

Woah, you may want to learn about those.

Understanding algorithmic complexity and knowing which data structure(s) to use to solve a given problem are universal skills.

paulnelligan · on Feb 21, 2011

I'm going to play devil's advocate and say that while it is good to know these things, they're no longer essential. I graduated two years ago, and have been developing in Ruby since then, I haven't seen a linked list since College. Why?, because I chose to work with a newer language, with a higher level of abstraction. The programmer of the near future will have a different set of challenges to previously. In the same vein, the guys who graduated 10-15 years ago weren't learning Fortran. But 25 or 30 years ago, I'm guessing they were.

brown9-2 · on Feb 21, 2011

But if you ever choose to stray outside of Ruby and web development, you may find yourself in a whole different boat.

beoba · on Feb 21, 2011

Here's a quick example of a case where knowledge of data structures and algorithmic complexity is useful, regardless of the language.

Let's say you've got two lists, named "A" and "B", which each have 1000 integers. You want to return a list of unique integers which are present in both lists (ie the intersection of two lists). These 'lists' could be arrays or linked lists.

The brute force method would be something like this:

  Out = []
  for a in A:
    for b in B:
      if a == b and !Out.contains(b)://O(n) list search
        Out.append(b)
  return Out

But this can be very slow, because you can end up in the territory of O(n^3) iterations across A, B, and Out (internally the language would generally be iterating across Out in order to complete that 'contains' call). In this specific example, that works out to be something along the lines of 1000x1000x1000 = 1000000000 iterations, vastly larger than the size of the original lists.

---

A better way is to sort one or both of the lists then do the comparisons along each of them (this syntax assumes the lists are specifically arrays, but it'd effectively be the same for linked lists):

  Out = []
  sort(A)
  sort(B)
  b = 0
  for a in xrange(len(A)):
    vala = A[a]
    valb = B[b]
    if vala == valb:
      Out.append(vala)
      ++a
      ++b
    elif vala < valb:
      ++a
    else://vala > valb
      ++b
  return Out

This is definitely better than the above example. Now we're performing two sorts, each O(n log n), then we're doing a linear iteration across both of those lists in parallel, each O(n). So now we end up with an overall complexity of O(n log n) as a result of those initial sorts. Let's estimate the total number of iterations to be around, I dunno, 10000(sort)+2000(iterate/compare) = 12000? The exact number of comparisons in a sort can vary by algorithm used and how things were ordered in the lists to begin with.

Not bad, definitely better than what we were doing before. But we might be able to go a little better...

---

Yet another way is to use some hash sets, which (generally) have O(1) insertions and retrievals, and only store unique values, such that if you insert the same value twice, the second insertion is effectively ignored. We can do something like this:

  Out_set = set([])
  A_set = set([a in A])
  for b in B:
    if A_set.contains(b)://O(1) set search
      Out_set.append(b)
  return list(Out_set)//optional, could return Out_set

Now we end up with an algorithm which is O(n), where we iterated over the items in A once to fill in A_set, then we iterated over the items in B once, and each "contains" call was an O(1) hash lookup inside of A_set. Then finally we iterated over Out_set once to create the output list. This final step is optional, we could also have just returned Out_set directly, but it doesn't effect algorithmic complexity in either case. Now we've got 2000-3000 iterations, depending on how whether we return a set or a list. And this additionally looks a bit simpler than the sort+iterate version I gave above.

---

So just by using our knowledge of data structures, we've turned ~1000000000 iterations into ~2000-3000 iterations. This is on the order of reducing the distance to the moon to around a kilometer. And that's just with two lists that are limited to 1000 items each.

And this sort of thing is a pretty common scenario in any language, no matter how 'abstracted' it is.

bryanh · on Feb 21, 2011

Thanks a bunch for the detailed examples. I've noticed this is a huge problem in my code: too many nested for loops. I've generally gotten better about using built in functions like set (in Python at least):

    Out = list(set(A+B)) # right?

Or even DISTINCT in SQL (when dealing with webapps).

And now I never doubt the usefulness of fuzzing the DB with hundreds of thousands of random entries. That quicky code that took 40ms to run over several dozen entries turns into a minute or more for several thousand.... Ouch! ;-)

naner · on Feb 21, 2011

A linked list is a pretty elementary data structure. That being said, these questions seem like they're for a job writing client software on Windows some time before .NET existed.

Stormbringer · on Feb 21, 2011

No, even in VB you wouldn't do that.

Of course, way back in the dark ages when I wrote VB for client Windows apps I was all "lets keep this nice and simple, nice and simple.... arrrrrrghhh, why doesn't this POS language do what I want?! HULK SMASH", and then I'd break out the Win32 APIs and hit it with the big guns.

I feel a little sorry for the 'normal' VB programmers, because they'd be following my code and then BLAM it'd disappear into hyperspace, and re-emerge and something magical† would have happened.

†Read as: a sufficiently advanced technology

Vivtek · on Feb 21, 2011

Ha! Fond, fond memories of VB for Word 6.0.

xiongchiamiov · on Feb 21, 2011

It's worth noting that Python has just one linear structure - lists* (Ruby avoids lists and just has arrays). Now, they're actually much fancier under the hood, which is why you can get away with not having a bunch of choices, but it's the approach of "Oh hey, I want to store a bunch of stuff in a sequence of some sort - why should it matter how that's implemented?".

Now, I happen to agree with you that it's important to know how a basic linked list works. Beyond that, though, is far less important to some people than others.

* Okay, I lied. Halfway through writing this I remembered tuples (duh!), and I'm sure someone is going to point out several other things I've missed.