User blog:Belthazar451/How To Get A Ghazt; Or, So How Does This Binomial Probability Thing Work, Anyway?

So, the probability of getting a Ghazt has been tentatively established as being 1% per attempt. Each attempt is an independent chance - so it's a 1% chance on the first attempt, and 1% on the second, the tenth, the hundredth, et cetera. "So," people ask, "how many times do I need to try before I can get one?"

Welcome to binomial probability.

Binomial Probability?
Binomial probability is the way to calculate the odds of a certain number of successes out of some number of attempts, in a situation where each attempt is simply either/or, succeed-or-fail. For example, if we were flipping a coin, heads could be success and tails would be fail. If we wanted to roll a single die and count the number of sixes we get, success would be "six" and fail would be "not a six". The question we're asking here is (for example) "If we roll the die a hundred times, what are the odds of getting exactly sixteen sixes?" (If you're wondering, it's 10.65%.)

Mathematically speaking, the probability of success exactly k times out of n attempts, when the probability of success on each attempt is p, is:

$$P(k) = {}^{n}C_{k}\, p^{k} (1-p)^{n-k}$$

where ${}^{n}C_{k} = \frac{n!}{k!\,(n-k)!}$ is what's called the binomial coefficient - it's where binomial probability gets its name. n!, incidentally, is read as "n-factorial" and is equal to the product of all the numbers from 1 to n - that is, $1 \times 2 \times 3 \times \dots \times (n-2) \times (n-1) \times n$. Most calculators have a factorial button on them somewhere. A few have nCk buttons too, but that's less common. If you want to do it in Excel, the function is COMBIN.

For my example above, rolling a die a hundred times and getting exactly sixteen sixes, we can substitute what we know - n = 100, k = 16, p = 1/6, which gives us

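$$P(16) = {}^{100}C_{16} \left(\tfrac{1}{6}\right)^{16} \left(\tfrac{5}{6}\right)^{100-16} \approx 1.346 \times 10^{18} \left(\tfrac{1}{6}\right)^{16} \left(\tfrac{5}{6}\right)^{84} \approx 0.1065$$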
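
If you'd rather not push those numbers through a calculator, the same sum can be sketched in a few lines of Python. This is only an illustration: the function name `binomial_pmf` is made up for the example, and `math.comb` needs Python 3.8 or later.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent attempts.

    Hypothetical helper name; it is just the formula nCk * p^k * (1-p)^(n-k).
    """
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Rolling a die 100 times and getting exactly sixteen sixes.
prob = binomial_pmf(16, 100, 1 / 6)
print(f"{prob:.2%}")  # should agree with the 10.65% quoted earlier
```

Python's integers don't overflow, so `comb(100, 16)` is worked out exactly before it gets multiplied by the two small powers.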

You can kinda see here that when n starts getting large, nCk starts getting ginormous - 69! is the largest factorial that most hand-held calculators can handle - so it's not really a good idea to work it out by hand.
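
As a rough check on that 69! figure, here's a small Python sketch. It assumes the calculator gives up at 10^100 (the usual two-digit exponent limit), which is my assumption about "most hand-held calculators" rather than a spec for any particular model.

```python
from math import factorial

# Assumption: the calculator overflows once a result reaches 10**100.
LIMIT = 10**100

# Walk upward until the *next* factorial would hit the limit.
largest = 0
while factorial(largest + 1) < LIMIT:
    largest += 1

print(largest)                       # largest n whose factorial still fits
print(len(str(factorial(largest))))  # how many digits that factorial has
```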

The binomial probability distribution function (for n=100 and p=0.5) looks like this. The graph is a plot of the probability (on the y-axis) of getting exactly a specific number of successful attempts (on the x-axis). You can see the curve is symmetrical, with the mean (and mode) at x = np (which in this case is 100*0.5 = 50). If you add up all of the values from 0 to 100, you'll always find the grand total is 1 (because you're guaranteed to get some number of successes, from 0 to 100, out of those 100 attempts).
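
Both of those claims (the total is 1, and the peak sits at np = 50) are easy to check numerically. A minimal Python sketch, with variable names chosen just for this example:

```python
from math import comb

n, p = 100, 0.5

# Probability of exactly k successes, for every k from 0 to n.
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

total = sum(pmf)                                 # should come out as 1 (up to rounding)
peak = max(range(n + 1), key=lambda k: pmf[k])   # the k with the highest probability

print(total, peak)
```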



... Binomial Coefficients?
This section's kinda complex, so skip it if you think it's getting confusing.

Specifically, ${}^{n}C_{k}$ is the coefficient of the kth term of the binomial expansion $(x+y)^{n}$. In general,

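$$(x+y)^{n} = {}^{n}C_{0}x^{n} + {}^{n}C_{1}x^{n-1}y + {}^{n}C_{2}x^{n-2}y^{2} + \dots + {}^{n}C_{k-1}x^{n-(k-1)}y^{k-1} + {}^{n}C_{k}x^{n-k}y^{k} + {}^{n}C_{k+1}x^{n-(k+1)}y^{k+1} + \dots + {}^{n}C_{n-2}x^{2}y^{n-2} + {}^{n}C_{n-1}x\,y^{n-1} + {}^{n}C_{n}y^{n}$$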

Note that the coefficients are symmetrical, the same as the probability distribution plotted above for p = 0.5, so ${}^{n}C_{0} = {}^{n}C_{n}$ and ${}^{n}C_{1} = {}^{n}C_{n-1}$, et cetera. For example, for n=4:

$$(x+y)^{4} = x^{4} + 4x^{3}y + 6x^{2}y^{2} + 4xy^{3} + y^{4}$$

So,


 * ${}^{4}C_{0} = 1$
 * ${}^{4}C_{1} = 4$
 * ${}^{4}C_{2} = 6$
 * ${}^{4}C_{3} = 4$
 * ${}^{4}C_{4} = 1$

The binomial coefficients are also the values of the nth row of Pascal's Triangle. You can already see here that the binomial probability above is a specific application of this binomial expansion where x = 1-p and y = p - if you take the kth term in the full expansion above, ${}^{n}C_{k}x^{n-k}y^{k}$, and substitute for x and y, you get ${}^{n}C_{k}(1-p)^{n-k}p^{k}$, which is the same as what I wrote in the first section above.
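
To see the n=4 coefficients and the x = 1-p, y = p substitution in action, here's a short Python sketch. The helper name `n_choose_k` and the value p = 0.25 are picked purely for illustration.

```python
from math import comb, factorial


def n_choose_k(n: int, k: int) -> int:
    """Binomial coefficient straight from the factorial formula n! / (k! (n-k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


# Row n = 4 of Pascal's Triangle, i.e. the coefficients of (x + y)^4.
row4 = [n_choose_k(4, k) for k in range(5)]
print(row4)  # [1, 4, 6, 4, 1]

# Substituting x = 1 - p and y = p turns each term into a binomial probability,
# and the whole expansion adds up to (x + y)^n = 1^n = 1.
n, p = 4, 0.25  # p is an arbitrary example value
terms = [comb(n, k) * (1 - p) ** (n - k) * p**k for k in range(n + 1)]
print(sum(terms))
```

The built-in `math.comb` gives the same coefficients as the factorial formula; it just skips building the huge intermediate factorials.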

But what does this have to do with the Ghazt?
So basically the question is "how many times do I need to try breeding a Ghazt before probability dictates that I'm more likely to get one than not?" We could calculate the probability of one success, then two, then three, and so on, but that would take us all day. If not all week. Fortunately, there's an easier way. Remember how I said the sum of all of the values added together was 1? So if we subtract the probability of some specific number of successes from 1, we get the total probability of every possible result except for the one we subtracted.

So, since we don't care about how many times we succeed (so long as it's not zero), what we want to do is subtract the probability of getting exactly zero, which will give us the probability of everything that's not-zero.

So, mathematically, what we want to find is $P(\text{not } k) = 1 - {}^{n}C_{k}\, p^{k}(1-p)^{n-k}$ where k = 0. That is, $P(\text{not } 0) = 1 - {}^{n}C_{0}\, p^{0}(1-p)^{n-0}$. But ${}^{n}C_{0} = 1$ and $p^{0} = 1$, so our probability of not getting zero successes in n attempts collapses to

$$P(\text{not } 0) = 1 - (1-p)^{n}$$
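
As a sanity check that the shortcut really does match "1 minus the probability of exactly zero", here's a minimal Python sketch. The function name `at_least_one` is made up for the example.

```python
from math import comb


def at_least_one(n: int, p: float) -> float:
    """Probability of one or more successes in n independent attempts."""
    return 1 - (1 - p) ** n


p = 0.01  # the tentative 1% Ghazt chance
for n in (1, 10, 50):
    # The long way round: 1 minus the binomial probability of exactly 0 successes.
    long_way = 1 - comb(n, 0) * p**0 * (1 - p) ** (n - 0)
    print(n, at_least_one(n, p), long_way)
```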

Much simpler, yeah?

I guess... So?
So, the first obvious implication is that so long as p<1 then P(not 0) will never be 1 ( = 100%), regardless of the value of n, so no number of attempts will ever guarantee us a Ghazt (though if p = 1, you'd get a Ghazt on the first attempt anyway). In this case, p = 0.01 ( = 1%). So what can we do instead, then? We can use this equation to calculate the probability of success in n attempts, or we can rearrange to find the required number of attempts to give us a specific probability. That's... kind of a complex calculation (if you're wondering, it's $n = \log_{1-p}\left[1 - P(\text{not } 0)\right]$, which is the same as $\ln\left[1 - P(\text{not } 0)\right] / \ln(1-p)$), so another option is to just plug numbers into Excel until you get what you're after. It's a bit brute-force, but it works.
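
Both routes (the logarithm and the brute-force plugging) can be sketched in Python instead of Excel. The function names here are hypothetical, and the log version rounds up to the next whole attempt, since you can't make 68.97 of an attempt.

```python
from math import ceil, log


def attempts_needed(target: float, p: float) -> int:
    """Smallest whole n with 1 - (1-p)^n >= target, via the log rearrangement.

    Assumes 0 < p < 1 and 0 < target < 1; floating-point rounding could be off
    by one if the exact answer happens to be a whole number.
    """
    return ceil(log(1 - target) / log(1 - p))


def attempts_needed_brute(target: float, p: float) -> int:
    """Same answer the brute-force way: keep adding attempts until we get there."""
    n = 0
    while 1 - (1 - p) ** n < target:
        n += 1
    return n


for target in (0.5, 0.75, 0.9):
    print(target, attempts_needed(target, 0.01), attempts_needed_brute(target, 0.01))
```

At p = 0.01 the first whole number of attempts that reaches 50% is 69, and 90% takes 230.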

Here's one I prepared earlier. This is the graph of the probability of at least one success for 1 to 250 attempts. You can see it's getting gradually closer and closer to 1, but never quite gets there.



Basically, after 70 attempts, the probability of at least one success from those attempts is just over 0.5 (= 50%). By 140 attempts, it's risen to just over 0.75 (= 75%). It crosses 0.9 (= 90%) at 230 attempts. As an interesting side note, for any binomial probability where p is small, when n = 1/p (so 100 attempts for p = 0.01), then P(not 0) is approximately 1-1/e ( = roughly 63%). It falls out of the definition of e, since $(1 - 1/n)^{n}$ gets closer and closer to 1/e as n grows.
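
Those milestones, and the 1-1/e side note, can be checked with a few lines of Python. This sketch takes n = 1/p as the checkpoint for the side note (100 attempts when p = 0.01), and the helper name is made up for the example.

```python
from math import e


def at_least_one(n: int, p: float) -> float:
    """Probability of one or more successes in n independent attempts."""
    return 1 - (1 - p) ** n


# The milestones on the graph, at the tentative 1% chance.
for n in (70, 140, 230):
    print(n, round(at_least_one(n, 0.01), 4))

# The 1 - 1/e note: take n = 1/p attempts and compare against 1 - 1/e.
for p in (0.1, 0.01, 0.001):
    n = round(1 / p)
    print(p, n, at_least_one(n, p), 1 - 1 / e)
```

The match isn't exact for any finite n, but it tightens as p gets smaller.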

Ok, I've made sixty-nine attempts and still don't have a Ghazt
Does that mean I've got a fifty-fifty chance from now on?

Well, no. Each attempt is still an independent chance, so it's still 1% for the next one, and 1% for the one after. You're not owed a Ghazt by the game - the idea that you are is called the Gambler's Fallacy. In fact, this whole thing is fairly academic, and not terribly helpful when you're in the middle of those attempts. This won't tell you when you'll succeed - all it tells you is that you can reasonably expect to get one within seventy attempts (basically, exactly as reasonably as you can toss a coin and expect to get a head). However, if you've already made those seventy attempts, they don't give you a leg-up for the next one - you've still got a fifty-fifty chance of getting at least one at some point in the next seventy attempts. Failing for the first sixty attempts doesn't somehow concentrate that fifty-fifty into the last ten attempts - it's still 1% per attempt.
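
The "no leg-up" point can be worked through as a conditional probability: the chance of "m failures and then a success" divided by the chance of "m failures". A minimal Python sketch, with function names invented for the example:

```python
def next_attempt_given_streak(m: int, p: float) -> float:
    """P(success on attempt m+1, given that the first m attempts all failed)."""
    streak = (1 - p) ** m   # probability of failing m times in a row
    joint = streak * p      # ...and then succeeding on the very next attempt
    return joint / streak


def within_next_given_streak(m: int, n: int, p: float) -> float:
    """P(at least one success in the next n attempts, given m failures so far)."""
    streak = (1 - p) ** m
    joint = streak * (1 - (1 - p) ** n)
    return joint / streak


p = 0.01
for m in (0, 60, 69, 500):
    print(m, next_attempt_given_streak(m, p), within_next_given_streak(m, 70, p))
```

The losing streak cancels out of the fraction every time, so however long it has been, the next attempt is still 1% and the next seventy are still roughly fifty-fifty.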

Sooo.... how do I get a Ghazt?
Easy: If at first you don't succeed, try, try again. And again. And again. =P