Shuffle algorithm?

Post by **Edward Faith** » Mon 24 Sep, 2007 1:46 am

This can probably only be answered by the programmer.

Explanatory background: With a physical deck, statisticians have discovered that it takes about six good riffle shuffles of a deck to thoroughly randomize it (i.e., remove predictable relationships between the final deck and the initial deck). Persi Diaconis (statistician) says seven shuffles. And so I really do riffle the physical deck six or more times.

I want do do the same with the electronic deck. I'd like to make a decision about how many times to shuffle it and how long a string to use for a seeded shuffle. I'm not comfortable with a shuffle algorithm that is only based on milliseconds, because there aren't enough milliseconds in a day to really do justice to all the possible permutations of the deck. In short, it's not random enough for my comfort. So I've been using seeds. But I'd like to know whether I'm wasting my time coming up with long seeds. Are long seeds hashed to some standard length before being used to seed the algorithm? If so, I'll just use a seed that length instead of wasting my time coming up with longer seeds. (The seeds I come up with are fully and densely random - I use data extracted from atmospheric noise with this service: http://www.random.org/strings/ , so I'm not talking about English phrases as strings or anything like that.)

Anyway, that's my question. The general question is, what is the randomization algorithm (roughly)? The more specific question is, how are the seeds processed? My main interest is how long the seed can usefully be.

Post by **Programmer** » Tue 25 Sep, 2007 3:40 pm

Hi Edward,

That is a very interesting question. The brief overview is as follows: the program takes the word or phrase that the user types in and then converts it to a number by adding up the ascii values of the characters typed in. It then adds the tick count since the computer started, and this new number is passed as a seed to the System.Random class to generate a series of pseudo random numbers.

These numbers are "pseudo" random because they are generated by the computer using a mathematical algorithm. The series could therefore be predicted by someone knowing the starting value and how the algorithm works. Each number that is generated becomes the seed for the next one and so on, and if you begin the cycle with the same seed twice you will in fact get the same series of numbers, which is why I add the tick count. If you don't give a seed the random function uses the tick count anyway, because it has to use something. Ticks are based on CPU cycles so they are a lot shorter than milliseconds, but there still aren't enough of them in the day if you ask me :)

Anyway, the bad news is that you aren't gaining anything at all by passing in a long string. There is no difference in the degree of randomness produced by the System.Random class whether you give it a seed number of 3 or a seed number of 3,000,000 - i.e. there is not really any "true" randomness at all besides that first number.

The only way to really introduce random shuffling would be to allow the user to plug in a hardware RNG and have the program interface with it. Hardware generators watch physical processes like the degradation of some element with an unstable decay to produce truly unpredictable results, but obviously they are not standard kit for a home PC. It would be interesting to look into making this available as an option, but I have no idea what it might involve or if it would really be feasible.

It is fair to say though that the program does come up with a sequence of numbers that would be almost impossible to predict for anyone who didn't have a brain the size of a planet and a lot of time on their hands. Not completely though, and here is a story that illustrates this. A couple of years ago some virus writers used a pseudo random function to select network addresses with which to propagate their script. Because each instance of the virus was seeded from the previous one it was possible to work backwards and find the "patient zero" - the original machine on which the virus was first executed. http://www.newscientist.com/article.ns? ... news_rss20.

That aside though, the other thing to consider is how well the program performs at shuffling the deck. Having generated a series of numbers using the process described above the program takes the deck and swaps two cards. It repeats this a thousand times. I wonder how this would compare to the affect of the riffle shuffle that you are describing. Do you have any numbers for that? I suspect that after 1000 position changes on a 78 card deck you would probably get close to 100% shuffled each time, so it might even be necessary to reduce this number in order to achieve greater realism on a single shuffle.

I can see that it would definitely be advantagous to have the program behave more transparently as far as shuffling goes, for the sake of those users who are interested.

Cheers,
Richard

Post by **Programmer** » Tue 25 Sep, 2007 5:08 pm

Hi Again,

Well, I have been thinking about this since posting, and really it shouldn't be too hard to add a tool that will take a long string of user-provided digits and use it as the basis for the shuffle. I don't really think it would be useful as standard functionality, but it would make a nice little addition for the utilities page on the Options form. I took a look at the generator on the random.org page and that would be able to do the job nicely. I think we would just need to agree on the number of card swaps that constitute a normal shuffle. Then the program can tell you that you need n numeric strings of such-and-such a length etc. So how many card swaps might it take?

Richard

Post by **Edward Faith** » Tue 25 Sep, 2007 5:43 pm

Richard,

Thank you for the answer. It was extremely helpful. While the pseudo-random sequence is not what I would consider "random" for my purpose, the tick count is legitimately "random" for my purposes, because it depends on the user's own timing, which is as "random" as a coin flip or riffling the cards, both of which depend on the user's own randomness. (I'm not going to go into whether the user is random - I'll take it as a given that he or she is.)

So the tick count really is a random event, or at any rate as far as I'm concerned it counts as one. And it's essentially the only random event (adding a random seed is equivalent to letting time pass so the tick count increments naturally, because the seed is merely converted into a number and added to the tick count).

Do you happen to know what the space of possible shuffles is that the algorithm can produce? To give an example of what I mean, as a first step the algorithm might represent the number in a 16-bit numeric variable. That variable can take on only 2^16 possible different values, or about 65,000 different values. If this is how the algorithm works, then there are at most about 65,000 ways to shuffle the cards.

The reason this is important is that to really thoroughly shuffle the deck we ideally want a method that can, with about equal probability, reach each of the possible orderings of the deck. Since there are 78 cards, then there are 78! (factorial) possible ways to order the deck (unless I've completely forgotten that part of combinatoric math). That's about 10^115. If our actual method can really only shuffle the deck in one of 65,000 ways, there is a large gap between what can actually be done and what can ideally be done.

This gap can (in principle) be closed by repeated shuffling. For example, if you shuffle twice, then the number of ways of doing this (i.e. of shuffling twice) is about 65,000 times 65,000 - that is, 2^16 times 2^16, or 2^32. I'm assuming there's no overlap. There's probably very little overlap. Shuffle six times and you've massively increased the number of possible orderings you can reach, and you're starting to make significant progress in the direction of 78!, which is about 10^115.

This is actually overkill, because you only need such a thorough shuffling if you are going to use the whole deck, and in actual fact your spread is going to use much fewer than 78 cards. The Celtic Cross spread uses 10 cards, I think. The first card of the cross is any one of the 78, the second one is any one of the remaining 77, etc. So the total number of possible Celtic Cross spreads is no higher than 78 x 77 x 76 x ... x 69, or ((78 choose 9) x 9!) (I think - feel free to correct my math). That's about 7 x 10^6. That's way smaller than 10^115.

What this means is that you probably don't have to shuffle as much to thoroughly randomize the deck simply for the purpose of displaying the first 10 cards.

Anyway, that's my thinking, which I'm sharing in order to explain where I'm coming from and why I'm interested in just how large a space of effective possibilities that initial seed (i.e., the tick count or the tick count plus user-supplied seed) occupies. I think that if I know this information, then I can with some confidence figure out how many times I need to shuffle the deck to get a really satisfactory result, in terms of really making available all the 7 x 10^6 possible celtic cross spreads (or other spread). I hope I've been coherent.

Edward

Post by **Edward Faith** » Tue 25 Sep, 2007 6:14 pm

Hi Richard,

Our comments "crossed in the mail". I posted my reply before I saw your second reply.

(First, let me acknowledge a previous blunder of mine, "(78 choose 9) x 9!)" should have 10 in place of 9. That's in addition to whatever other errors I committed.)

Well, I guess it would take me a while to calculate what you're asking for. The simplest number to express is the number of possible final orders of the deck, which I'm pretty sure is 78! (78 factorial), for a 78-card deck. Now, if you're talking about two-card swaps between two cards each of which was genuinely chosen at random, then what I need to do (I think) is calculate how many of these truly random two-card swaps need to be performed before we get comfortably close to reaching every possible final deck order with approximately equal probability.

The "dumb math" answer is easiest. Ignoring any subtleties (and doing so at our peril, but nevertheless doing so for convenience), each swap involves picking two cards at random and swapping them, and therefore there are (78 choose 2) possible swaps. Do it again, and the number of pairs of swaps is (78 choose 2) x (78 choose 2). I think one subtlety at this point is that the actual number of possibilities is somewhat smaller (about half, I think), but I'm ignoring subtleties. If we keep on calculting in this simplistic way then at a very rough guess a mere 39 swaps will be more than enough to get us to 78! and beyond. If you're doing 1000 swaps where each one is fully random then, estimating embarrassingly roughly, I would say that you've gone way beyond what you need to do.

This is very rough, and it would require me remembering a lot of math long forgotten to even think about getting closer to an actual value. Chances are good that there's a huge error in my estimate.

Post by **WebWeaver** » Tue 25 Sep, 2007 7:13 pm

Just as an aside to the idea that 7 is the number of shuffles needed to create true randomness, you need to remember that was for a 52 card deck, and Tarot has 78 cards, so the number of shuffles is different. I'm not sure what the exact number is for a 78 card deck, but if my memory is correct it was something like 13, but that is a total guess from my memory. It's a complicated formula to determine the number of shuffles needed.

$\frac{3}{2} \log_2 n + \theta$ shuffles are necessary and sufficient to mix up $n$ cards

Which doesn't make any sense to me

I real life if your looking for randomness your better off dealing into piles (I use 7) and then smooshing the cards.

Shari

Post by **Edward Faith** » Tue 25 Sep, 2007 7:28 pm

Richard,

I don't mean to break your program, but purely thinking about this as a puzzle with no restraints, I believe that a full randomization of the deck could be achieved in 77 steps. The method does not involve card swapping. The method is as follows:

1) Randomly pick a card from among the 78 cards. This is your first card in the new order of the deck.

2) Randomly pick a card from among the 77 remaining cards. This is your second card in the new order.

3) Repeat until you have reached the end. There are 77 steps in all. If each choice was truly random and unbiased, then your deck is fully shuffled and cannot become more shuffled than it is. Every possible final ordering of the deck is equally probable.

If your program is built around card swapping then please ignore this. It's of interest mainly as an algorithm whose success is easy to see - as a starting point for thinking about a rigorously complete shuffle. I've also been reading up (Wikipedia) on something called the "Knuth shuffle", which involves card swapping with certain restrictions. It also fully shuffles the deck in 77 steps, provided each step is fully randomized.

Edward

Post by **Edward Faith** » Tue 25 Sep, 2007 7:33 pm

Shari,

You're absolutely right. I can't say whether the equation is right, but I completely forgot about the different size of the deck. Another wrinkle is that if you're using reversals, that might involve some extra shuffling to randomize those thoroughly.

Edward

Post by **Programmer** » Tue 25 Sep, 2007 9:15 pm

Hey Guys,

Well this is all very interesting, I don't think I have ever thought about this as much as I have today. I remember when I came up with the idea of swapping cards it was because I thought that it was a fair approximation of the way people shuffle. The big idea of the program has always been that it is a deck emulator, so this seemed the right way to go. But on reflection that doesn't seem to emulate a human dealer very well at all. If I picture myself shuffling then the first thing that I do is cut the deck somewhere (but never right at the bottom or top, because my fingers are too thick). Then I take part of the deck and skim cards into the remaining part off the top of the first part. Then after a few times I take that part, put it at the other end of the deck, and get a fresh few off the bottom, repeat the process and so on. So human dealing isn't really random, each dealer probably has their own "fingerprint", even though if they deal long enough they have a good chance of having a truly randomized deck.

I find it very hard to follow the numbers because I am not a mathematician, but here are two extreme cases:

With the program working as it stands, taking 1000 swaps, and those swaps being either psuedo random at present, or truly random as we are speculating about, you could still end up by taking the same two cards each time and simply changing their positions, so the final deck is exactly the same as the first deck, except that two cards are slightly warmer. Or, the other extreme, as per Edward's later post, you might get really lucky and find that in just 34 swaps (or a small number, but not thousands) you have changed the position of every card in the deck, and therefore any card could be in any position.

It strikes me that what happens in reality will always be a curve between these two extremes, but the curve produced by a human dealer will be different to the curve produced by the swapping algorithm, which will be different to the curve produced by the cards migrating in an entirely random way from one part of the deck to another. Which curve is best might come down to a matter of opinion. Some users might find a more human shuffle psychologically more comforting, while others would prefer something more rarefied.

The reason I have the seeded shuffle in there at the moment, even though it doesn't actually add to the randomization, is because I think that people probably like to feel that the shuffle is influenced by their question (which it is, of course). When I think about a shuffle generated entirely by the decay of some radioactive isotope it sounds very interesting indeed. How about a shuffle generated by ambient noise emissions picked up by a probe somewhere on the surface of Mars? That would be sooo

, but you could get the same randomness without going so far afield. I think maybe the best thing I could do to help people see how the shuffling works, and maybe partly answer Edward's original request, is by having the program give some feedback about how the shuffling has affected the deck - what percentage of the cards are in a different position from their original ones. Then they could evaluate for themselves whether a reshuffle is needed.

Anyway, it is past my bedtime now, I am going to go and dream about shuffling decks :)

Richard

PS Shari, that was such a cool equation. You really had me going there, you shouldn't have let on that it didn't make any sense to you either :)

Post by **Edward Faith** » Wed 26 Sep, 2007 1:29 am

Richard,

That's a great thought - to make the program more open about how the deck has changed. I'm not sure that "percentage of the cards are in a different position from their original ones" really says much though, because I think moving the cards is a pretty low hurdle. After all, you could just cut the deck once, and all the cards would move to a new position, but nobody would consider that a really good shuffle.

Here's how I look at it. My real goal here is to make all possible orderings of the cards available as possibilities. Suppose that you ask a question for which there's some really good answer. If you're using a Celtic cross spread, then this best answer is one of (78 choose 10) * (10 factorial), or about 4.6 × 10^18 possible answers that could be given. But if you're using a pseudo-random shuffle seeded by a 32-bit random number, then mathematically after one shuffle there is no way you can reach more than 2^32 (or about 4 billion) possible answers. You're actually missing out on a lot of possible answers, which are mathematically excluded. This is true no matter how shuffled the deck looks to the naked eye, if the shuffling is being done pseudo-randomly based on a 32-bit number.

The only way that you can even hope to make all the conceivable answers really possible is to shuffle the deck at least twice. That's assuming the seed is a 32-bit number (if it's a 16-bit number then that's even less random). However, what I've read about tick counts (which seems to suggest that they typically really are just milliseconds and not literally the number of cpu cycles) suggests that realistically it varies over a much smaller number of possibilities unless you've literally had your program open for weeks. If you've had your program open only for half an hour or so, then there are only about 2 million possible values that your tick value can have taken, which is on the order of 10^6. If you shuffle three times, waiting about half an hour between shuffles, then you can raise the randomness to something like 10^18. That's pretty tough. Much better to shuffle many times. If you do that, and if the ticks are only about 1000 per second (as they seem typically to be from what I've read, though I can definitely be corrected on this) and if you shuffle about once every couple of seconds, then the randomness you're introducing each time you shuffle is about 10^3. To get near 10^18 you need to shuffle at least 6 times. Heh - this is actually a coincidence, I wasn't trying to make it come out similar to a physical deck.

Shuffling multiple times is a really powerful way to increase the randomness of your final result. Each time you shuffle, by choosing the exact moment of shuffling you pick a new random number. The randomness accumulates very quickly - it increases about exponentially. If shuffling once explores a space of 1000 possible shuffles, then shuffling twice explores a space of about 1 million possible shuffles, and shuffling three times explores a space of about a billion (i.e. a thousand million) possible shuffles. (I'm crossing my fingers here - I may be overly optimistic about certain aspects of this.)

Randomness is really important because if randomness is low then the number of possible answers that the tarot deck can give is low, for any given initial ordering of the deck. A good pseudo-random algorithm can do a great job of hiding the poverty of the seed that it uses, but the poverty is real nonetheless. The good news is that this is easily remedied by shuffling more than once. (Well, so I hope.)

Anyway, just some thoughts. The take-away message is that if I know how many ticks there really are in a second, the size of the variable that is used to hold the resulting tick count, and how that variable is employed by the algorithm (e.g., is it wasting valuable information), then I can guesstimate how many times I need to shuffle the deck. I feel guilty because it seems like a tall order. But I can't think of any way of simplifying the problem.

Edward

Orphalese Tarot Zone Community Forum

Shuffle algorithm?

Shuffle algorithm?

Re: Shuffle algorithm?

Re: Shuffle algorithm?

Re: Shuffle algorithm?

Re: Shuffle algorithm?

Re: Shuffle algorithm?

Re: Shuffle algorithm?

Re: Shuffle algorithm?

Re: Shuffle algorithm?

Re: Shuffle algorithm?