How I randomize

Mostly for fun, I started doing some research on random and pseudo-random numbers. There's a lot to tell, but the long and short of it is that computers can't make truly random numbers by themselves. They need input from a truly random source. So the randomizers I had been using in my destinator before weren't perfectly random.

1. Requirements

The destinator, to produce a random destination, has to mimic 3 dice. But since the third die is only used as odd or even, we can just call it a coin toss. With two dice and a coin, there are 6 * 6 * 2 = 72 possible outcomes. This instantly confuses some, because the chart that comes with the game only seems to indicate 22 possibilities (2-12 odd and 2-12 even). However, not all the numbers from 2-12 are equally represented. There's only one way to roll a 2: 1 + 1, but there are 3 ways to roll a 4 (1+3, 2+2, 3+1), so 4 is three times more likely to come up than 2 (Settlers of Catan makes use of this in a more visible way). So to make this 2-dice-plus-1-coin roll into a single number, we take our numbers 1-72 and map them to a die roll:

So, to choose a region or city, I need to randomly generate a number from 1-72.

2. Sources

I used to use the rand() and mt_rand() functions in PHP to generate random numbers. This was probably sufficient for this game, but I'm obsessive. These functions only produce pseudo-random numbers which are therefore slightly predictable (I'm not sure how much because I certainly can't predict them). I wanted true randomness!

I now use random numbers generated by two different online services:

RANDOM.ORG

Random.org gets its random numbers using atmospheric noise. I get random numbers from them using their random integer generator.

The ANU Quantum Random Numbers Server

This service gets its numbers by measuring the quantum fluctuations of a vaccuum. Unlike random.org, this site doesn't have the option of giving me integers from 1 and 72, so I get raw binary from them. This means that I need to scale the results myself.

Scaling

The idea of scaling random data ends up being pretty complex. It basically comes down to taking a number between 1 and x, and transforming that into a number between 1 and y, but without losing the random distribution. Since the random data comes in as binary, I can only split it up into numbers that are powers of 2. If I look at just one bit (a bit is a 1 or a 0), it gives me a 50-50 randomization (like a coin toss) with 2 possible outcomes. Using 2 bits, I get 4 outcomes, (00, 10, 01, 11), using 3 bits gives me 8, 4 bits gives me 16, etc.

I need to take a binary number and get a result that's between 1 and 72. That needs 7 bits, but 7 bits can make 128 different numbers. What do I do if the number generated is over 72? If I just wrap around so that 73 = 1 and 74 = 2, then 128 = 56, and that means we are twice as likely to get 1-56 than 57-72, so that won't work. I could just throw out all the results that are too high, but that's wasteful, and I wanted to be a little bit more efficient than that. So I came up with my own scaling algorithm.

My little algorithm works as follows. Note, though, that my algorithm works with numbers from 0-71. Then at the end I convert any 0's to 72.

  1. I start with a long stream of 1's and 0's. We'll say 01011101100010011001011110100111001.
  2. I start crawling through the stream to collect 7 bits to make a number: 0-1-0-1-1-1-0 which equals 46. Perfect! Put that in my cache, and it's ready to use
  3. Now my remaining bits to be used are 1100010011001011110100111001. Continuing to crawl, I go to 1-1, but now I have a problem: any 7-digit binary number starting with 11 will be 96 or more, so I can't use this one.
  4. Rather than throw out all 7 bits, I only throw out these two and am left with 00010011001011110100111001. I continue crawling: 0-0-0-1-0-0-1 = 9! Cache it!
  5. 1001011110100111001. Continue crawling: 1-0-0-1-STOP! The 7-digit number 1001000 is 72 which is too high (remember, I'm currently looking for numbers from 0 to 71)
  6. This time I throw out all 4 of these bits and get 011110100111001. Now I continue: 0111101 = 61, cache it. 0011100 = 28, cache it! Now we only have one bit left over, so we have to throw it out (or maybe save it to put on the beginning of our next chunk!)
  7. So by throwing out numbers that start with 11, 101 or 1001 we got 46, 9, 61 and 28. 4 numbers out of 35 bits. Not bad.
  8. Let's check our work. If we just threw out high numbers, we'd split our original number into 7-bit chunks: 0101110,1100010,0110010,1111010,0111001. That's 46, 98, 50, 123, 57. We have to throw out 98 and 123, and only wind up with 3 good numbers out of 35 bits. Looks like my way was more efficient.
  9. Of course random numbers mean that the results will be different every time. I ran both methods over thousands of bits and got 8.7 bits per number using my method, and 11.5 bits per number using the plain-old way.

I'm not pretending to be a genius, I'm just excited to have found out this little algorithm and listed it here so you can see how my destinator works and, if you happen to find a flaw in my math that makes things less random, you can correct me.

3. Fallbacks

Since these random numbers rely on my server making a connection with a 3rd party servers (which have limits of their own), I needed a fallback. If I run out of numbers from random.org, it will fall back to PHP's rand() function. If the ANU server fails us, we fall back to the mt_rand() function.

4. Multiplying the randomness

Another possibility I took into account was one of my sources providing a nonrandom and predictable sequence. To mitigate this risk, every "roll" uses two sources of randomness. Unless there is an error, the two sources are random.org and ANU. I get a number from 1-72 from each of those sources and combine them by adding them together, and rotating them around the number 72. Simply put, if the two numbers added together are greater than 72, we subtract 72. So 70 + 70 = 140, and 140 - 72 = 68. For more examples:

So you see that whatever one source gives me, the other source can change it to any other number at all. This means that if either source is truly random, the resulting number will be truly random, even if the nonrandom source is known. Say we have three truly random numbers (23, 54, 2) and three known numbers (7, 7, 7). The result is 30, 61, 9. If you can only know/guess the 777 source, the other one effectively scrambles the results, and you wind up with perfect randomness. You could potentially have numbers that seem to unrandomize each other like (2, 4, 6) + (5, 3, 1) = (7, 7, 7), but 7, 7, 7 will eventually appear in a truly random sequence that's long enough, because true randomness means any number could come up, regardless of how many times it has come up already.

In the event that we're combining two pseudo-random numbers (like the ones from our fallbacks, rand() and mt_rand()), combining them like this will increase the randomness. If someone is able to predict one with 50% accuracy and the other with 40% accuracy, combining the two will make their guesses only 20% accurate!

5. Caching

I mentioned that I download chunks of numbers from my sources and then use them up one at a time. I do this because otherwise it really slows down the gameplay and puts unnecessary load on the two random number services I'm using. Instead, when my server needs more random numbers, it gets a big chunk of randomness from the source, processes it, and saves the resulting numbers for later use. So, now we've got a big list of random numbers in the same order they were received from the source, and while you're clicking "Go, Go, Go, Go," I'm just giving you the next number from the list (cache) which is local and super quick to get, rather than asking the source for new randomness, processing it, etc.

This does create one possible shortcoming in all of this: Cheating. If you can find and access my cache, then you will be able to predict what will be rolled next. My defense against that is to say: "Don't do that, you'll ruin the game!" Also, I've blocked access to the cache, so nobody should be able to access it anyways. But if you figure out a way, you're only hurting yourself. If you're playing against someone who may go to those extremes to cheat, you'd better stick to using dice... that you provide.