i'm steve

Unbiased Shuffle - Fisher-Yates Shuffle Algorithms

Sun 08 Sep 2017

The Game

In a lucky draw game, say there are 3 prizes and 5 candidates. We put each of their name on a small piece of paper and drop them into a container. The game host will draw 3 papers from the container one by one randomly, and on each successive draw, the drawn paper will be PUT BACK into the container for the next draw. Name written on the drawn paper will be given the prize.

Is that how the game usually work? Does that sound fair to you?

Let’s look at the pesudo-code below for shuffling array. Ever written something similar?

var array_to_be_shuffled = [1, 2, 3, 4, 5];

for (int i=array_to_be_shuffled.length - 1; i>=0; i--) {
    int idx = randInt(0, 4);
    array_to_be_shuffle[i] = array_to_be_shuffled[idx];
}

print array_to_be_shuffle;
// Result: array_to_be_shuffle
// e.g. [2, 5, 1, 3, 4]

Loop thru the whole array, for each entry, randomly swap it with another entry within the array (as determined by idx)

At the first glance, it looks perfectly fine and snippets like above are pretty common in projects when you need to do some quick array shuffle.

Unfortunately, the above shuffling method are quite biased. ‘What?’ You may ask.

Hummm…

If you look at the code carefully, noticed the line written int idx = randInt(0, 5). This is where the biase originates from, as in every run of the loop, you are randomly swapping the current entry with another entry in the list EVEN IF THE ENTRY HAS BEEN SWAPPED BEFORE. Think about the lucky draw game described in the beginning of the article, winners are allowed to participate in another draw within the SAME game. Not Fair ehh?

Okay, let’s do some counting which can better illustrate the bias.

Say you are shuffling an array [1 2 3]

Original First Shuffle Second Shuffle
1 2 3 1 2 3 1 2 3
2 1 3
3 2 1
1 3 2
1 2 3 2 1 3 2 1 3
1 2 3
3 1 2
2 3 1
1 2 3 3 2 1 3 2 1
1 2 3
2 3 1
3 1 2
1 2 3 1 3 2 1 3 2
1 2 3
3 1 2
2 3 1
  • [1 2 3] : 4 times
  • [2 1 3] : 2 times
  • [3 2 1] : 2 times
  • [1 3 2] : 2 times
  • [3 1 2] : 3 times
  • [2 3 1] : 3 times

As illustrated, the probability of each possible shuffled sequence are not equal, some are more likely to appear – e.g. [1 2 3] [2 3 1]. So how do we fix it?

Ahhhaaaaa!

Since int idx = randInt(0, 5); is where the bias is, how about we modify it into, int idx = randInt(0, i); where i is the current loop index. In the lucky draw game terms, we don’t put back the drawn ball into the container, so as to remove the winner from being drawn again in the same game.

// Fisher-Yates Shuffle Algorithms

var array_to_be_shuffled = [1, 2, 3, 4, 5];

for (int i=array_to_be_shuffled.length - 1; i>=0; i--) {
    int idx = randInt(0, i);
    array_to_be_shuffle[i] = array_to_be_shuffled[idx];
}

Step Random (Idx) Original Array Shuffled Result
1 2 1 2 3 4 5 1 2 5 4 3
2 0 1 2 3 4 5 4 2 5 1 3
3 1 1 2 3 4 5 4 5 2 1 3
4 0 1 2 3 4 5 5 4 2 1 3
5 0 1 2 3 4 5 5 4 2 1 3

Shuffled entry are at the back of the array and wont be shuffled again

This becomes the famous Fisher-Yates Shuffle Algorithms, which is less biased when used for shuffling array.

Conclusion

Give Fisher-Yates Shuffle Algorithms a try next time when you want to shuffle an array / list. It has a simple implementation while removing bias from the process, but nevertheless, pay good attention on picking a good random generating function (randInt in the pesudo-code) which could be another potential source of bias.