Shuffling

ryandrake · on Nov 29, 2014

While understanding the how/why of algorithms is great, a word of advice: If this were a real project and your standard library has a shuffle() function, just use that one and get on with life.

Save yourself from having to worry about your hand-crafted one being subtly wrong. Save your peers from having to debug your hand-crafted one years after you've left the company. This tip goes for basically any routine that already comes built-into your language or libraries. Spend your time writing code that doesn't already exist. I'm looking at you, co-worker from 5 or so years ago who felt he had to re-implement strings, linked lists, and sorting.

Kortaggio · on Nov 28, 2014

This is great, I never had intuitive understanding of why the naive shuffle was bad other than a vague notion of the fallibility of pseudo-random number generators. Seeing a math equation is one thing but seeing a tree of all possible outcomes visualized in front of you not only makes the bias blatant, but also makes it obvious why the bias is occurring.

squeakynick · on Nov 28, 2014

Thanks for kind comments.

mattxxx · on Nov 29, 2014

I love the idea of combining this with the idea of proving code.

Like, "how can I prove my approach 'shuffles' an array?" ... I mean, just because it looks right... doesn't mean it is.

mattxxx · on Nov 29, 2014

Really useful to remind yourself of this!

karmadude · on Nov 28, 2014

Here is a demo of shuffling using a combination of Hindu & Riffle shuffle http://karmadude.github.io/CardShuffler/

I heard someone say once that we have to shuffle a deck of cards 7 times to get a good shuffle. This demo was setup to try to simulate how we shuffle cards, and turns out shuffling around 5 times gives a pretty good shuffle.

jpitz · on Nov 29, 2014

To paraphrase, 7 is reasonably able, 14 is guaranteed.

Knuth cites this in TAOVP vol 2, from Aldous And Diaconis, AMM 1986, pp 333-348.

betaveros · on Nov 30, 2014

There was a Google Code Jam problem this year precisely about detecting the difference between the fair shuffle and the "terribly, terribly wrong" shuffle: https://code.google.com/codejam/contest/2984486/dashboard#s=...

chuckledog · on Nov 30, 2014

Coincidentally, shuffling was the subject of this week's puzzle on "Interview Cake": https://www.interviewcake.com/question/shuffle

glomph · on Nov 28, 2014

What motivated that choice of psudocode? I found it very hard to read.

richcuteguy34 · on Nov 28, 2014

It's looks very close to Visual Basic to me ... :)

squeakynick · on Nov 28, 2014

Yes, it is VB6.

I did actually put quite a bit of thought into how I was going to put in the code samples. I elected to use BASIC like syntax because IMHO, this is close to as common denominator as possible and most universally understood (this is not an invite to start a flame war!)

I also elected to start by arrays at 1, rather than 0. The purest in my wanted to use zero, but to make it more universally understood, I elected to use one (because it also relates to the real world and card #1). I think more people will understand it this way.

The other thing I almost did not do (but changed my mind at the last minute), was to in-line the swap function and enumerate all three steps. I almost wrote it with Swap(i,s) function instead. Half of me thinks using a function is easier to understand, the other half likes the in-line for explaining it.

It's not as if the logic is very hard, there's one loop.

I'm curious what others think. Was it hard to understand? I'm always keen to learn more. Could I have explained it easier with the changes above?

keithpeter · on Nov 29, 2014

As a civilian (i.e. non-programmer) I found the example code understandable. I too am wondering what alternative presentation method the post above yours would suggest.

Nice article.

glomph · on Nov 29, 2014

Coming from python and not knowing VB I found the Do Loop syntax tricky at first. I am used to a loop explaining how long it will run for at the top (similar to the for loop here).

Being pseudocode I found it odd to mix loop syntax. One loop has its conditions at the top and the other at the bottom. I guess that is just taste though.

toolslive · on Nov 28, 2014

numbering should start at 0. http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EW...

He also made comments about using Basic. Seems you made the wrong choices...