Shuffle algorithm?

Bug reports, feature suggestions etc...

Moderators: Programmer, WebWeaver, WillowsHeart

User avatar
Programmer
Major Contributor
Major Contributor
Posts: 1725
Joined: Sat 01 Jan, 2005 12:00 am
Location: Spain
Contact:

Re: Shuffle algorithm?

Post by Programmer »

Hi Edward,

It would be interesting to know how many shuffles/swaps are required to give this guarantee.

If you want to try casting around on the internet, you could start with a search on the Microsoft .NET Random(Int) class's Next() function which is in the System namespace. I don't know if you will find much on how the function works internally. I haven't been able to find much more than the basic information about the seed parameter and return value, both of which are 32bit signed integers, eg in the range of + or - 2,147,483,648 (hexadecimal 0x80000000). I guess if the tick count ever got to this number it would clock. It is novel to talk about a tick clocking, isn't it ;)

If you can't find out anything useful on this, there is also possibility of me making the program use the RNGCryptoServiceProvider, which is in the Cryptography namespace. It generates "cryptographically strong random numbers and is suitable for key and salt/IV generation. While RNGCyptoServiceProvider is technically a pseudo-random generator, it is NIST-certified as cryptographically strong". It might also be better documented.

Cheers,
Richard
Edward Faith
Registered User
Registered User
Posts: 16
Joined: Sat 22 Sep, 2007 5:49 am

Re: Shuffle algorithm?

Post by Edward Faith »

Richard,

Thanks. Like you, I haven't had success so far finding how the Random.Next() method works internally. I suppose it might change from version to version of .Net. I'll keep looking and let you know if I find anything, but for now I'll trust that it's a good algorithm that makes good use of the seed it's provided.

I've been thinking about the seed. That, at least, is not hidden, so it's possible to think about it more effectively. It occurs to me that if the tick count is really only a count of milliseconds, then it's not making full use of the capacity of the 32-bit variable, which can store a value up into the billions, whereas if the tick count is thousandths and if you shuffle the deck twice within about 5 seconds of each other, then the difference between the two tick counts will not be more than about 5000, which is less than optimal if your goal with multiple shuffling is to increase the randomness. (This is not dire, since you can just increase the number of shuffles - it's simply less than optimal.)

So I am interested to know just what the tick count is. If it's actually a count of cpu cycles that would be really nice, but it may be too much to hope for.

Also, correct me if I misunderstood, but I think you said that you do something like adding the ascii values of the input string. Or something along those lines. I've been thinking about this and I think it may be an inefficient use of the input string. Let me explain with a simplified analogy. Suppose (for this simplified demonstration) that the string is limited to the numerals 0 through 9. Now consider the space of all possible 10-character strings, for example:

9384629075

If your approach is to add the numerals, then the range of possible values that your result can take is very small, it's between 0 and 90. That is, the string giving the highest possible output value is:

9999999999

If you add up the numerals, they make 9*10 = 90.

But if, instead of adding up the numerals, you treat the string as the direct representation of a number base 10, then the range of possible values is every value between 0 and 9999999999, which is ten billion possible different values.

So I think that there is a much more efficient way of using the input string to create a number, one that does not throw away so much data and gives you a much wider range of possible numbers. There is any number of ways you could do this. For example, you could take each input character's ascii value, take its value mod 10 (which will be 0 to 9), and then treat the resulting string of numerals as a direct base-10 representation of a number. I am not specifically proposing you do this - I am just illustrating the idea of a different approach.

An efficient use of the input string will, I think, really increase its value as a supplemental source of randomness. This is especially valuable if the tick count alone is not making full use of the 32-bit variable.

Edward
User avatar
Programmer
Major Contributor
Major Contributor
Posts: 1725
Joined: Sat 01 Jan, 2005 12:00 am
Location: Spain
Contact:

Re: Shuffle algorithm?

Post by Programmer »

Hi Edward,

Well I can confirm that ticks are based on CPU cycles. The .NET Framework's TimeSpan class gives a TicksToMillisecond property for conversion purposes, which returns a figure based on the CPU's characteristics. I just tried it on my machine here, a bog standard Pentium 4, and it gives a figure of 10,000.

I am still not sure how to do the maths, but I think that even in a standard shuffle, the amount of time that needs to elapse on the system clock for a user to conceivably be able to end up with any possible final deck, just based on the inherently random matter of when their thumb hits the enter key, is probably quite short.

I am interested in your idea of using the seed more effectively though. It seems to me that one weakness with the current arrangement is that the whole sequence is based on one instance of the Random class. This means that if the same seed were used twice, however unlikely that is, the user would get the same shuffle. What do you think of the idea of using two instances of this class in the shuffle, each derived from the users search expression, with the program ensuring that the seed was different in each case. One RNG would be used to provide the indexes of the cards which are moved, and the other would provide the indexes of the reinsertion points. Wouldn't this exponentially increase the number of possible decks you could end up with?

Cheers,
Richard
Edward Faith
Registered User
Registered User
Posts: 16
Joined: Sat 22 Sep, 2007 5:49 am

Re: Shuffle algorithm?

Post by Edward Faith »

Hi Richard,

Just to clarify, does the 10,000 figure mean 10,000 ticks per millisecond, or in other words, 10,000,000 ticks per second? It sounds like it, I just wanted to double-check. If that's true, that's really good. That may, unfortunately, not be enough to guarantee that the tick count can take on any of 10 million values per second, because there may be a limitation on the computer's ability to time a mouse click or key press. But let's not pursue this right now because there are other things to think about.

On the matter of the maths, here is how I think of it. If there are 10 million ticks per second then there are 600 million ticks per minute. Each tick corresponds to one final deck, becuse the program takes a tick and runs it through the pseudo-random number generator (I'll call this the P-RNG), which can only give one result for that particular tick, so can only produce one final deck for that particular tick. Since there are 600 million ticks in a minute, then there is a list of 600 million final decks that you are in effect picking a final deck from by pressing a key randomly in the space of a minute.

This list of 600 million final decks, however, is vastly smaller than the complete list of all possible final decks. That list is on the order of 10^115, which is a number that is much bigger than the number of atoms in the universe (the latter number is something like 10^79). And much bigger, also, than 600 million, which is merely 6*10^8.

The user can conceivably end up with any of the list of 600 million final decks that correspond to the 600 million ticks in a minute, but the user cannot conceivably end up with any of the others - and the decks that he cannot conceivably end up with vastly outnumber the decks that he can.

"What do you think of the idea of using two instances of this class in the shuffle, each derived from the users search expression, with the program ensuring that the seed was different in each case."

But if they are derived from the same expression, then why aren't they the same? I'm not sure what you have in mind here. There are ways to do it right, but there are ways to do it wrong which superficially look right.

"One RNG would be used to provide the indexes of the cards which are moved, and the other would provide the indexes of the reinsertion points."

If they were the same algorithm keyed off the same seed, the result would be no shuffle at all - the cards would stay in place because the indexes would be identical.

So it's obviously very important that they be different. But I'm not clear on what you have in mind to create (or preserve) a difference.

"Wouldn't this exponentially increase the number of possible decks you could end up with?"

Depending one what you did exactly, it might do just that. Or it might give you absolutely no benefit at all. In the best case scenario my guess (which I'm pretty confident of) is that it would approximately square the number of possible final decks.

My thinking here is that the way you have it now, you're throwing away a lot of the information from the expression by adding up the ascii values. You can recover a lot of that information if, instead of producing one sum of the ascii values of all the characters, you divided up the characters into two piles and took one sum of ascii values for each pile and then used these two sums for the two PRNGs, one sum per PRNG.

But if all you do is take the first seed, and then pass it once through a third PRNG to produce a second seed for the second PRNG, then you will not increase the number of possible decks at all.
User avatar
Programmer
Major Contributor
Major Contributor
Posts: 1725
Joined: Sat 01 Jan, 2005 12:00 am
Location: Spain
Contact:

Re: Shuffle algorithm?

Post by Programmer »

Hi Edward,

Yes, let me clarify. I was definitely talking about two P-RNGs in the above example. One of them might be seeded from the first half of the users input, and one of them from the second half, for example.

I think we are back to the drawing board though because that really is a very big number we are looking for (your example of the number of atoms in the universe certainly brings it home).

What it makes me think, in fact, is that your original assertion that a human shuffler, in six shuffles, can produce a thoroughly randomised deck can't be right. I think he can do a shuffle which will satisfy the most demanding poker player, but I think that the tiny factors which randomise the shuffle (e.g. the amount of perspiration on the dealers hands, microscopic differences in the thickness of the cards etc) will only be enough to produce a tiny fraction of all those possible decks.

I think probably the way ahead is to have the seeded shuffle box treat the users input in one of two ways, depending on the form it takes. If it is purely numeric and more than a certain required length the program would assume that it is a string of random numbers provided by the user. It would then convert that into an array, discard every number over 77, and use each remaining member alternately as the card to move or the position to insert it at. No random function would be involved.

On the other hand text input would be treated more or less at present using a P-RNG, but perhaps with some improvements - for example, why not, doing 6 or 7 iterations each with a different seed (i.e. user input plus incrementing tick count). So it could attempt to emulate a really thorough professional dealer, even if, like him, it doesn't manage to reach every possible combination.

Do you agree that this might be the best way of satisfying everyone?

The other idea I was thinking about was having a background process going on whenever the program runs, which would store up random numbers based on small variations in mouse movement. This is a technique I have seen used on some cryptography software. It would be like a self-winding watch: a small graphic could show how the random number stock accumulates as the program runs, and how some of it is discharged each time the user shuffles. Like the solution suggested above, each card movement in the shuffle would consume one number. It might be a satisfying solution in some ways but I thing the number of people who would really appreciate what it was doing would be quite small.

Cheers,
Richard
Edward Faith
Registered User
Registered User
Posts: 16
Joined: Sat 22 Sep, 2007 5:49 am

Re: Shuffle algorithm?

Post by Edward Faith »

Hi Richard,

You make the point:

"What it makes me think, in fact, is that your original assertion that a human shuffler, in six shuffles, can produce a thoroughly randomised deck can't be right."

Well, I did make a mistake but it wasn't that. The mistake was to forget, in my last reply, to restate an earlier point, which is that our ultimate object may be more modest than the complete shuffling of the tarot deck. That was an important point, it really does make our task a lot easier, and I completely forgot about it so let me make up for that. There are, as I said, 78! permutations of the tarot deck, and this is vastly larger than the number of atoms in the universe. However, when the tarot deck is used to answer a question, only the top handful of the cards is used. So, while it is very hard to fully reshuffle the deck, it is much easier to fully reshuffle it for the purposes of getting an answer from the tarot deck.

There are 78! permutations of the complete deck, but there are far fewer possible answers given if (say) you are using a celtic cross spread. If you do this, you are only using the top 10 cards of the shuffled deck. Let me calculate this, and we'll see that it's much smaller than 78!.

The first card of a 10-card spread can be any of the 78 cards of the pack. The second card can be any of the remaining 77 cards. The third card can be any of the remaining 76 cards. A three-card spread, then, for starters, comes in only 78 x 77 x 76 varieties, which is a very small number. Specifically, it is:

78 * 77 * 76 = 456 456

That is, slightly less than half a million. That's pretty big in ordinary terms but pretty small compared to the number of ticks in a second.

The number of possible ten-card spreads is a lot bigger than this but still much smaller than the number of permutations of the 78-card deck. The idea of the number is the same. It's:

78 x 77 x 76 x 75 x 74 x 73 x 72 x 71 x 70 x 69

using Google calculator:

78 * 77 * 76 * 75 * 74 * 73 * 72 * 71 * 70 * 69 = 4.56617697 × 10^18

(in my earlier discussion I used a more compact way of representing this product, but this way is more clear what is going on)

So we're talking about a number on the order of 10^18. I'm not sure of your nationality - in the US a "billion" is 1000 million, and in London I think the term is "1000 million", with "billion" reserved for "million million". I'll use the (foreign to me) term, "1000 million". Using that terminology, then 1000 million is 10^9, and if you multiply that by itself, then you get:

1 000 million * 1 000 million = 1.0 × 10^18

So the number of possible celtic cross spreads is on the order of 10^18, which is just 1000 million times 1000 million. A big number, but still much smaller than the number of atoms in the universe.

In principle, if a single shuffle of the deck can produce one of 10^3 possible final decks (this is making the modest assumption of 1000 ticks per second, and a shuffle based on a random keypress within a second), then two shuffles can produce about 10^3 * 10^3 = 10^6, and three shuffles can produce about 10^3 * 10^3 * 10^3 = 10^9. There is no guarantee here for reasons I won't go into but I'm going to assume (maybe optimistically) that we don't have to worry about problems. Following this line of reasoning, then 6 shuffles can produce one of on the order of 10^18 possible final decks. This isn't quite right because the closer you get to reaching all the possibilities, the harder it is to reach every last possibility. But I think that after 6 shuffles we're still pretty close.

So that's why I said 6 shuffles. This means that the task is not all that bad.

To really illustrate this point (that you don't need to completely shuffle the deck), if you're doing a 1-card spread, then it may very well be enough simply to cut the deck a few times. If you cut the deck a few times, then at the end pretty much any card might be on top. It might be 2 of cups, 9 of swords, wheel of fortune, or any other one of the 78 possibilities. Just - don't try to do a two-card spread this way. The typical one-card spreads will be

2 of cups

or

9 of swords

and so on. Perfectly fine - all the 78 possibilities are accounted for.

But the typical two-card spreads, if you took an ordered pack and merely cut it a few times, would be:

2 of cups in the first card, 3 of cups in the second

9 of swords in the first card, 10 of swords in the second

and so on. This is not fine at all. You don't want the second card to be guaranteed to be the next card after the first card.

This illustrates the point that if you're going to do a spread with N cards, then your shuffling burden depends on the N you choose. Just, don't shuffle for an N-card spread, and then deal N+1 cards. That's a no-no.

"If it is purely numeric and more than a certain required length the program would assume that it is a string of random numbers provided by the user. It would then convert that into an array, discard every number over 77, and use each remaining member alternately as the card to move or the position to insert it at. No random function would be involved."

Yes, that would work totally and completely to fully shuffle the tarot deck, and there would be no worries whatsoever. The downside is that it would take a really long string to do this justice, or multiple shufflings, because 4 numerals will only swap 2 cards. You need a lot of 2-card swaps in order to get close to thoroughly shuffling the deck. So that would be a very long input numerical string. Doable, yes, but I'm not sure how easy. I would definitely be delighted with it and would find some way of obtaining the very long random numerical string that it would need. I think your method is to do 1000 swaps, right? I think that's definitely enough. But to achieve it with an input string I would need to provide a 4000-character-long numerical string. More, actually, because values above 78 would be dropped.

"The other idea I was thinking about was having a background process going on whenever the program runs, which would store up random numbers based on small variations in mouse movement. This is a technique I have seen used on some cryptography software. It would be like a self-winding watch: a small graphic could show how the random number stock accumulates as the program runs, and how some of it is discharged each time the user shuffles. Like the solution suggested above, each card movement in the shuffle would consume one number."

I agree that that would work.

"It might be a satisfying solution in some ways but I thing the number of people who would really appreciate what it was doing would be quite small."

I agree. However, my guess is that if you implemented a really bad pseudo-RNG approach that doesn't use any user input or tick counts but simply stores and uses the output from the previous invocation of the PRNG, the vast majority of people would not notice, and so they would not appreciate the difference between the bad pseudo-RNG system and the good one that you actually have. How could anyone really tell without reverse-engineering the code or doing a labor-intensive study of the output? And yet, you seem to agree with me that a bad pseudo-RNG system is not desirable, since you took the trouble to base it off of tick counts.

So, I think the difference between a good randomizing system and a bad one, or between a good one and a better one, is going to be lost on the vast majority of users, who will simply not be able to tell, even if they understand the difference, and the vast majority are not likely to understand the issues even. And as for actually verifying, even I can't actually tell - I trust you, that's all, I can't actually see that the user-supplied seed is having any effect, I can't actually see that you're using the random class the right way. I trust that you are, same as everyone else.

The point about users not appreciating it - I'm afraid that applies to this whole topic. I haven't noticed anybody else join in aside from Shari. So I'm in a pretty weak position with this concern. I do realize that.

Edward
Edward Faith
Registered User
Registered User
Posts: 16
Joined: Sat 22 Sep, 2007 5:49 am

Re: Shuffle algorithm?

Post by Edward Faith »

Richard,

May I suggest the following hybrid: the program collects mouse data in the way you suggest. Do not display prominently what it is doing but allow the user to find out if he's curious. Then, when the user performs the shuffle, use the mouse data until it is used up, and when it is used up switch to the pseudo-RNG based on tick count. This way, users who don't know and/or don't care about the difference are completely unaffected. The change is transparent to them. Meanwhile, users who care about "true randomness" versus pseudo-randomness can check at any time to see how much true random data has accumulated.

As with my previous suggestions, this is only meant to illustrate a general idea - the idea of having a true random system with a pseudo-random backup for when the true randomness runs out, and with some unobtrusive way for the user to find out - if he cares - how truly random his next shuffle is going to be.

One additional point, speaking to your suggestion about the user-inputed numerical string, is that a properly implemented Knuth shuffle would greatly cut down on the length of the numerical string that would need to be provided. It wouldn't need to be longer than 300 characters, I think, and that's supplying a generous buffer for the numeral-pairs that are bigger than 77.

Edward
Edward Faith
Registered User
Registered User
Posts: 16
Joined: Sat 22 Sep, 2007 5:49 am

Re: Shuffle algorithm?

Post by Edward Faith »

Richard,

Oops, I misinterpreted your comment. You wrote:

"What it makes me think, in fact, is that your original assertion that a human shuffler, in six shuffles, can produce a thoroughly randomised deck can't be right."

I misinterpreted what you were talking about. I thought you were talking about a human using Orphalese. Now it occurs to me you were talking about a human using a deck of playing cards. The source of my confusion is that the number I gave both times was 6. That comment of mine about a physical riffle of a physical deck of cards, that I made early on, was about an ordinary playing deck (and my thoughtless extension of it to the 78-card deck was corrected by Shari).

The number of permutations of a 52-card playing deck is 52! (52 factorial), which is

52 ! = 8.06581752 × 10^67

Which is lower than the number of atoms in the universe, but still enormous.

Also, it was not really my own assertion but something I said that statisticians claimed. That's significant because since I didn't actually originate the claim, I'm not really in a position to explain and defend the claim.

Still, it's an interesting claim. From what I understand, it's the product of empirical study rather than theoretical analysis. We can try to imagine how it might be that six riffle shuffles might thoroughly randomize a 52-card deck. I've looked at some of my own riffle shuffles, and my idealized image of a perfectly alternating riffle (Right Left Right Left Right Left) is wrong. It's more like RLLRLRRRLRLLRRLLR - that is, a rather messy alternation. At a very rough guesstimate, I would say that the number of possibilities is a significant fraction of 52 choose 26. I'm not going to explain why because that might take a while to justify. According to Google:

52 choose 26 = 4.95918533 × 10^14

If you riffle the deck six times, then a rough guesstimate of the number of possible outcomes is the above value to the power of 6. That is:

(52 choose 26)^6

which is (Google again)

(52 choose 26)^6 = 1.48751732 × 10^88

The actual value is probably somewhat smaller than this. In fact it definitely is, because there is a strict upper limit on the number of possible outcomes, namely 52!. Nevertheless, this napkin calculation at least helps to show how it might be that a mere six riffle shuffles of the deck might have the power to produce any possible permutation of the 52-card deck at all. I'm sure that any serious statistician would laugh this simple calculation out of the classroom, but I often find this sort of back-of-the-envelope guesstimate helpful.

Edward
User avatar
Programmer
Major Contributor
Major Contributor
Posts: 1725
Joined: Sat 01 Jan, 2005 12:00 am
Location: Spain
Contact:

Re: Shuffle algorithm?

Post by Programmer »

Hi Edward,

Well, I am still very doubtful about that... If you look at the physics of shuffling the deck in a very mechanical, modernistic way, in which the universe is seen as a giant machine, there can only really be one outcome from a human shuffling a deck. Supposing you have perfect, godlike knowledge of the exact location and velocity of every atom in the dealers brain and hands, and those in the cards, you would be able to see exactly what the outcome of the shuffle would be, even before it happened. To an omniscient being the human would be no more than a P-RNG.

Of course, nowadays we know about quantum mechanics, the uncertainty principle, the butterfly effect and so on (even if we don't understand them). But I wonder how much butterfly effect there can be between a dealer starting his six shuffles, and him finishing them a minute later. I think it would only punch a thin torch beam into that great sea of possibilities.

Anyway, that isn't really our problem here, even though it is a very interesting one. I am going to keep thinking about the best way to incorporate some of this into the program, bearing in mind that there are quite a few changes already lined up for the next beta - we might have to go for a partial solution this time round and a fuller one further down the line. I have friends staying this weekend so I won't be able to get near the computer very much, but I am sure I will still be pondering this from time to time.

BTW, I am British since you asked. There are probably a few clues in my spellings if you look for them (see, they are not all mistakes!). We used to regard a billion as a million million. That is how my dictionary defined it when I went to school, but these days the idea of a billion as a thousand million has all but won out, probably because that is the way the Bank of England defines it.

Cheers,
Richard
Edward Faith
Registered User
Registered User
Posts: 16
Joined: Sat 22 Sep, 2007 5:49 am

Re: Shuffle algorithm?

Post by Edward Faith »

Richard,
Well, I am still very doubtful about that... If you look at the physics of shuffling the deck in a very mechanical, modernistic way, in which the universe is seen as a giant machine, there can only really be one outcome from a human shuffling a deck. Supposing you have perfect, godlike knowledge of the exact location and velocity of every atom in the dealers brain and hands, and those in the cards, you would be able to see exactly what the outcome of the shuffle would be, even before it happened. To an omniscient being the human would be no more than a P-RNG.
The possibility of Laplace's demon has no bearing on what I am trying to say. I'll try to explain.

I do not usually define randomness in terms of non-determinism. I define it in different ways, but physical non-determinism is not one of those ways (unless I am specifically talking about quantum mechanics, which I hardly ever do). One of my favorite ways to define randomness is in terms of Kolmogorov complexity. Kolmogorov complexity is not a question of how something was produced (whether deterministically or non-deterministically) but of intrinsic qualities of it. Kolmogorov complexity is related to compressibility (in the sense of zipping a file). So from the point of view of Kolmogorov complexity, the possibility of determinism and Laplace's demon has no bearing on whether something is or is not random.

But, while Kolmogorov randomness does lurk in the background, here I did something more modest. I simply defined my terms early on. I wrote:
While the pseudo-random sequence is not what I would consider "random" for my purpose, the tick count is legitimately "random" for my purposes, because it depends on the user's own timing, which is as "random" as a coin flip or riffling the cards, both of which depend on the user's own randomness. (I'm not going to go into whether the user is random - I'll take it as a given that he or she is.)
I've defined my terms here. The quotation marks are intended to signal that I am defining terms. Randomness is nothing more or less than a measure of how much of the final selection of the tarot deck permutation originates outside the computer program - most importantly, in the human user, but possibly in something else that the human delegates to (e.g., atmospheric noise data).

For example, if a pseudo-random number generator does not use user input at all (e.g. not even the timing of the mouse click), then the resulting shuffle is not random at all, because the human is unable to do anything that would have the effect of selecting the final deck. The complete lack of randomness is nothing other than the complete lack of user input.

If, however, the precise timing of the mouse click creates a seed, then that seed constitutes human input. How much input it constitutes depends on how many possible values it can take on. For example, if there are 600 million ticks in a minute, and if the human clicks the mouse at some time within a minute, then the human has selected one seed out of 600 million possible seeds. And since this seed is used to select the final deck, he has selected one out of up to 600 million possible final decks.

And so on.

What I am describing is compatible with the possibility of determinism and Laplace's demon, because what I am talking about is independent of the question of whether humans are deterministic mechanisms or not. My concern is to allow the human to select the final permutation of the deck from among all possible permutations. It is outside the scope of my concern whether the human, himself, is an automaton.

Anyway, I hope that clarifies where I'm coming from.

Edward
Post Reply