## Thursday, January 5, 2012

### Information Highway: Hypergeometric Distribution

From a theoretical point of view, one of the most interesting questions in VtES is about card drawing probabilities. That is, how probable is it (for example) to draw one of our 2 Information Highways in our opening hand of 7 cards (from a deck of 75 cards)? The tool you need to answer this type of question is called the hypergeometric distribution.

In probability theory and statistics, the hypergeometric distribution is a discrete probability distribution that describes the probability of (exactly) k successes in n draws from a finite population (of size N) without replacement.(1)

The mathematical formula for the hypergeometric distribution (i.e. the probability mass function) is

$$P(X = k) = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}$$

where
• N is the population size
• m is the number of potential successes in the population
• n is the number of draws
• k is the number of successes
You can see that this generalization is easily adapted to a (trading) card game:
• N is the size of the deck
• m is the number of copies of the card in the deck
• n is the number of draws (the number of cards in our opening hand)
• k is the number of copies of the card I want to draw
If we continue the example from above, then N=75 (the decksize), m=2 (the number of Information Highways in our deck), n=7 (the number of draws = our initial handsize), and k=1 (the number of copies we want to draw initially).
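As a minimal sketch, the standard hypergeometric probability mass function, C(m,k)·C(N−m,n−k)/C(N,n), can be written in a few lines of Python. The function name `hypergeom_pmf` is made up for this illustration, and `math.comb` needs Python 3.8 or later.

```python
from math import comb

def hypergeom_pmf(k, N, m, n):
    """P(X = k): exactly k copies among n cards drawn from a deck of N
    cards that contains m copies of the card (hypothetical helper)."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

# Example from the text: 75-card deck, 2 Information Highways, 7-card hand.
p_exactly_one = hypergeom_pmf(1, N=75, m=2, n=7)
print(f"P(X = 1) = {p_exactly_one:.4f}")  # prints P(X = 1) = 0.1715
```

Note that this is the probability of exactly one copy, which is why the next step in the text adjusts the question.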

Oh wait, that last number for k is not entirely right! Actually we do not want to calculate the probability of having exactly one copy in our opening hand (X=1), but of having at least one copy in our opening hand (X≥1). Now we have two ways of solving this:
1. Add the two probabilities for X=1 and X=2, i.e. $P(X \ge 1) = P(X=1) + P(X=2)$ (in general, the sum of $P(X=k)$ over all possible $k \ge 1$), or
2. Take 1 and then subtract the probability for X=0 from it, i.e. $P(X \ge 1) = 1 - P(X=0)$.
Actually the latter is the probability of not having zero copies of the card in our opening hand (hence the term "1 minus ..."). What is the result of the example now?
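Worked through by hand with N=75, m=2 and n=7, the second way looks roughly like this (a reconstruction from the standard formula, in which the binomial coefficients cancel down to a simple fraction):

$$P(X \ge 1) = 1 - \frac{\binom{2}{0}\binom{73}{7}}{\binom{75}{7}} = 1 - \frac{68 \cdot 67}{75 \cdot 74} = 1 - \frac{4556}{5550} \approx 0.179$$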
So there's a 17.9% chance that we have at least one Information Highway in our opening hand (with 2 copies in our 75 card deck). Now the next question would be how our chances of drawing at least one copy of the card change if we either change the decksize or change the number of copies (of Information Highways) in our deck.
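The two ways of getting "at least one" can be cross-checked numerically. The sketch below uses only the Python standard library; all function names are hypothetical and chosen just for this illustration.

```python
from math import comb

def hypergeom_pmf(k, N, m, n):
    """P(X = k) for n draws from N cards containing m copies (hypothetical helper)."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

def p_at_least_one_by_sum(N, m, n):
    # Way 1: add up P(X = k) for every possible k >= 1.
    return sum(hypergeom_pmf(k, N, m, n) for k in range(1, min(m, n) + 1))

def p_at_least_one_by_complement(N, m, n):
    # Way 2: 1 minus the probability of drawing zero copies.
    return 1 - hypergeom_pmf(0, N, m, n)

print(f"{p_at_least_one_by_sum(75, 2, 7):.2%}")         # prints 17.91%
print(f"{p_at_least_one_by_complement(75, 2, 7):.2%}")  # prints 17.91%
```

The complement is usually the more convenient route, because it needs a single term no matter how many copies are in the deck.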

As you can see above, the calculations become tedious fast if you want to calculate the resulting percentages over and over again for different parameters (different values of k, n, m or N). Luckily, modern spreadsheet programs(2) have built-in functions for this type of calculation:
• In Microsoft Excel you can use the function "HYPGEOM.DIST" (Excel 2010 or later; the older "HYPGEOMDIST" takes the same first four parameters but has no "cumulative" parameter) to calculate a certain probability. In general the function looks like this when you enter it (depending on your locale, the argument separator is a semicolon or a comma):
= HYPGEOM.DIST(successes_in_sample; sample_size; successes_in_population; population_size; cumulative)(3)
= HYPGEOM.DIST(k; n; m; N; TRUE|FALSE)
When entering the parameters from our example above, it would look like this:
= 1-HYPGEOM.DIST(0; 7; 2; 75; FALSE)
• A similar function, named "HYPGEOMDIST", exists in Apache OpenOffice:
= HYPGEOMDIST(k; n; m; N)
When entering the parameters from our example above, it would look like this:
= 1-HYPGEOMDIST(0; 7; 2; 75)
The chart below (using the Excel functions as shown above) shows the probability of drawing at least one copy of the two copies in a deck of x cards within the first 7 cards drawn.
As you can see, it makes a big difference how large your deck is. For the 60 card deck the probability is as high as 22.15%, and for the 90 card deck as low as 15.03%. This is one of the reasons for playing smaller decks, especially if you're playing a number of cards that you only have one or two copies of in your deck.
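The decksize comparison can also be reproduced outside a spreadsheet. This is a small sketch, assuming a 7-card opening hand and 2 copies; the helper name `p_at_least_one` and the step of 5 cards are choices made only for this example.

```python
from math import comb

def p_at_least_one(N, m, n=7):
    """Chance of at least one of m copies among the first n cards of an
    N-card deck (hypothetical helper; computed as 1 - P(X = 0))."""
    return 1 - comb(N - m, n) / comb(N, n)

# Sweep the decksize from 60 to 90 cards with 2 copies in the deck.
for deck_size in range(60, 91, 5):
    print(f"{deck_size} cards: {p_at_least_one(deck_size, m=2):.2%}")
```

The first and last lines of the output correspond to the 22.15% and 15.03% quoted above.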

What happens if we now vary the number of copies of the card we want to have in our opening hand (in a deck of 75 cards)?
As you can see again, there's a dramatic change in the probabilities when we increase the number of copies in our deck. I am not suggesting that you add seven Information Highways to your deck, but if you want to have a 50% chance of having some sort of crypt acceleration in your opening hand, you could add something like 2 Information Highway, 2 Dreams of the Sphinx and 3 Zillah's Valley to your deck (for example).
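The "how many copies for a 50% chance" question can be answered with a short search. The sketch below assumes that every crypt acceleration card counts as the same kind of "success", which is how the example above treats them; the names `p_at_least_one` and `min_copies` are hypothetical.

```python
from math import comb

def p_at_least_one(N, m, n=7):
    """Chance of at least one of m 'success' cards among the first n cards
    of an N-card deck (hypothetical helper; 1 - P(X = 0))."""
    return 1 - comb(N - m, n) / comb(N, n)

# Probability for 1 to 7 copies in a 75-card deck.
for copies in range(1, 8):
    print(f"{copies} copies: {p_at_least_one(75, copies):.1%}")

# Smallest number of copies that reaches at least a 50% chance.
min_copies = next(m for m in range(1, 76) if p_at_least_one(75, m) >= 0.5)
print(min_copies)  # prints 7
```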

Of course I'm aware that all of you know that increasing the number of copies in your deck or decreasing your decksize increases the chances of drawing a particular card, but I wanted to give you exact numbers instead of an educated guess.

Varying both the decksize and the number of copies in the deck, the final chart looks like this:
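For readers who prefer numbers to a chart, a table over both parameters can be sketched as follows. The chosen decksizes (60 to 90 in steps of 10) and copy counts (1 to 7) are assumptions for this illustration, not necessarily the values plotted in the chart.

```python
from math import comb

def p_at_least_one(N, m, n=7):
    """Chance of at least one of m copies among the first n cards of an
    N-card deck (hypothetical helper; 1 - P(X = 0))."""
    return 1 - comb(N - m, n) / comb(N, n)

deck_sizes = list(range(60, 91, 10))  # assumed columns: 60, 70, 80, 90
copy_counts = list(range(1, 8))       # assumed rows: 1 to 7 copies

# table[(copies, deck_size)] -> probability of at least one copy in hand
table = {(m, N): p_at_least_one(N, m) for m in copy_counts for N in deck_sizes}

print("copies" + "".join(f"{N:>8}" for N in deck_sizes))
for m in copy_counts:
    print(f"{m:>6}" + "".join(f"{table[(m, N)]:>8.1%}" for N in deck_sizes))
```

Each row grows from top to bottom (more copies) and shrinks from left to right (larger deck), which matches the two trends discussed above.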
Next time: more fun with multivariate hypergeometric distribution!

Footnotes:
(1) cf. the binomial distribution, which describes the probability of k successes in n draws with replacement.
(2) In Perl you can use the module Math-GSL-0.26 (Math::GSL::CDF).
(3) Only Excel 2010 (or higher) has the 5th parameter (cumulative).

Anonymous said...

I like statitics alot. Thanks!

Anonymous said...