Calculating how many iterations are needed to find a big upset? (Formative Studies)

Coordinator
Dec 21, 2012 at 1:30 AM

How many iterations do you need before you find a big upset winner?

The question of how many iterations you need before you can spot a big upset winner of a Swiss tournament is answered by statistics called formative studies.  Formative studies answer the question:  how big should your sample size be before you find a single instance of the event under consideration? 


The formula where you want to be XYZ% sure you find the correct sample size (or iterations in our case) for finding at least one event (or big upset winner in our case) is given by:
P (x >= 1) = 1 – (1-p)^N

where p is the probability of an event (e.g., the probability of tossing a coin and getting heads) n is the number of opportunities for the event to occur (e.g., the number of coin tosses), and P(x ≥1) is the probability of the event occurring at least once in N tries (or iterations), and ^N is a quantity raised to the Nth power

Solving for “N” gives
N = ln (1 - P (x >= 1)) / ln (1-p) 
where ln = natural logarithm. 

Conventionally, we choose P(x ≥1) to be 80% (we want to be 80% sure of seeing at least one event), so the formulae for N, the number of tries (or iterations) necessary to give 80% chance of seeing an event, say a big upset winner is:


N = ln (0.2) / ln (1-p)   = -1.61 / ln(1-p)      {“EQUATION ‘L’”}

We can use this formula in actual examples to illustrate.

Example 1:  Suppose you have a fair coin that flips heads or tails.  How many times should you flip the coin before you are 80% sure that you’ll get at least three heads out of four tries? 

Answer:   since coin flips are independent events, we can use a formula for summing the probability of getting at least three heads out of four flips as the probability of getting three heads out of four tries plus the probability of getting four heads out of four tries, with the probability of getting heads or tails as 50% = 0.50, so:  where P=probability: P(>= 3 H | 4 tries) = P(3 heads (H) out of 4 tries) + P(4 H out of 4 tries) = where p=Prob of heads = 0.50 = 1-p=Prob of tails; n=number of tries, ^=’raised to the power’ hence 2^3 =8, so probability of getting at least three heads out of four tries:


The probability of obtaining k heads out of n flips is given by the binomial formula
P(X = k) = n!/[k!(n-k)!] p^k (1 - p)^(n-k)   {“EQUATION ‘M’”}


= (4)*[(0.5)^3*(0.5)] + (1)*(0.5)^4 = 0.25 + 0.0625 = 0.3125 = 31.25%


So using “EQUATION ‘L’” we get:
N = -1.61/ln(1-0.3125) = 4.295 = 5 times (rounding to the next integer).  So we must do five iterations of flipping the coin four times in a row before we see (with 80% probability) at least one occurrence of a coin toss having at least three heads out of four flips.

Example 2:  same as example 1 but for at least 9 heads out of 10 coin flips.

Answer: from the previous example, using  “EQUATION M”:
 (10)*[(0.5)^9*(0.5)] + (0.5)^10
= 0.0107421875 = 1.07%, so using “EQUATION ‘L’” we get:
N = -1.61/ln(1-0.01074) = 149.0178709  =  150 times (rounding to the next integer).  So we must do 150 iterations of flipping the coin ten times in a row before we see (with 80% probability) at least one occurrence of a coin toss having at least nine heads out of ten flips.

Example 3:  How many iterations should you run Swiss Chess Tournament Winner Prediction Simulator (SCTWPS) in order to get at least one big upset winner having an Elo that is 200 Elo points lower than the average, which has a standard deviation of 100 Elo points, when 10 games are played in the tournament, and a large number of players (30 to 100) are involved?

Answer:  first you run the SCTWPS simulation with the following parameters to get a feel for the statistics:   The population mean is: 2000.0, and std Dev: 100.00, The Player Elo is: 1800, The number games played: 10, The number iterations: 10000.  Run it with a good number of players, say 30 to 100, to get a nice distribution.  Observe in the SCTWPS.txt output file for the winning players that the average opposition is between 2000 and 2100 Elo for the winners, and to win the tournament you need a total score of between 8 and 10 points, or at least 8 wins or more should win the tournament. 
Observing the output in SCTWPS.txt for the average opposition shows about 2050 Elo is the opposition, while you can deduce that to beat the opposition, if you are rated 1800, will require an Elo somewhere greater than this (the Performance Elo stats are not reliable for figuring what the Elo that should be used however).  Pick an Elo of say 350 points greater, so 1800 Elo + 350 = 2150 Elo.  From the Table 1 below, which is from the source code for SCTWPS (class3.cs), an 350 Elo gives about 11% to 12% chance of success for the weaker player (from the table (351 Elo = 11% to 12% chance of winning), here the target 1800 Elo player. 
If you choose 12% and use “Equation M”, with the following values:  over 10 games, you must win 8, 9 or 10 games, with p= 0.12 (12%)
k=8, n=10  gives 1.4984E-06 for P(X = 8)
k=9, n=10 gives 4.54061E-08 for P(X=9)
k=10, n=10 gives 6.19174E-10 for P(X=10)
Summing up these events, as is proper since they don’t depend on each other, gives the total odds of such a weak player, rated 1800, of winning with an Elo of 350 points higher, or 2150 Elo, as: 1.54443E-06
Substituting this number into “Equation L” gives: 1,042,095 iterations are needed before we have 80% probability of seeing at least one win by the 1800 Elo target player. So for these parameters, to see any big upsets  you should run the SCTWPS program for 1M iterations, which is the maximum iterations allowed.

When you are doing formative studies and looking for big upsets the Equations L and M are very sensitive to inputs.  This can only be found out from trial and error.  For example, picking 11% as the input for Equation M gives a total probability for K=8,9 and 10 of:  7.85317E-07 and in Equation L: 2,049,412 iterations needed to get at least one ‘hit’ with 80% probability.
Hence, this gives the user a clue that to find ‘big upsets’ involving a target player having Elo of 1800 that is 200 Elo points lower than the mean of 2000 Elo, having a standard deviation of 100 Elo points, and with 10 games in the Swiss tournament, where to win you need from between 8 to 10 points, you need to run the iterations at least 1M times, which is the limit of the SCTWPS program.  Hence, instead of choosing 10000 iterations, choose 1M iterations, and, about 16 minutes later on an Intel Core 2 Duo chip, the output indeed should show, a good portion of the time, that the target 1800 Elo player does win.  And indeed, one such listing where 100 players played is shown here:

Subject Player of Elo 1800 won 4 times out of 1000000 iterations, a total of 0.0% percent
Winning Players within 50 Elo of Subject player (1750 to 1850) won 41 times out of 1000000 iterations, a total of 0.0%

From the above example it should be clear that a severe underdog, having an Elo of say three sigma below the mean, would almost never be shown to win any Swiss tournament having a sizable number of games (say 10 games, as in the above example) given that the iterations of the SCTWPS program stops at 1 million.  You would need many more iterations to find such a big upset.  For example using an Excel spreadsheet the following iterations are needed for the same sample as above but with an Elo of 1700 rather than 1800, so 400 Elo = 8.5% is the difference:  total probability (Eq. M): 1.048E-07 and, iterations from Eq.L: 15,357,213 (15 M).  Hence you would have to run the program 15 times at the maximum number of iterations allowed, 1M, just to have an 80% chance of seeing at least one such big upset.  In fact, the program does not allow Elos to be entered that are less than 3 sigma below the mean for this reason. 


However, if the tournament was short (few games played, say four games), and the standard deviation is big enough, you can simulate the chances of an upset, which increase since the number of games that need to be won is much smaller, and thus a weak player can have a run of ‘good luck’ and string together enough upset wins to win the entire tournament.  Thus, for n=6 games in the tournament, an average Elo of 2000, a standard deviation of 100 points, a target player Elo of 1700, 30 players, and 1M iterations we get running the program this output:

Subject Player of Elo 1700 won 52 times out of 1000000 iterations, a total of 0.0% percent
Winning Players within 50 Elo of Subject player (1650 to 1750) won 52 times out of 1000000 iterations, a total of 0.0%
Winning Players having an Elo below: 1800, below 2 sigmas of mean, won 758 out of 1000000 iterations, a total of: 0.1%

Conclusion:  to find big upsets, pick a tournament with a small number of games.

Jay

TABLE 1:
    if (pW < 0.01) { rd = 5555; return rd; }
            if (pW >= 0.01 && pW < 0.02) { rd = 677; return rd; }
            if (pW >= 0.02 && pW < 0.03) { rd = 589; return rd; }
            if (pW >= 0.03 && pW < 0.04) { rd = 538; return rd; }
            if (pW >= 0.04 && pW < 0.05) { rd = 501; return rd; }
            if (pW >= 0.05 && pW < 0.06) { rd = 470; return rd; }
            if (pW >= 0.06 && pW < 0.07) { rd = 444; return rd; }
            if (pW >= 0.07 && pW < 0.08) { rd = 422; return rd; }
            if (pW >= 0.08 && pW < 0.09) { rd = 401; return rd; }
            if (pW >= 0.09 && pW < 0.10) { rd = 383; return rd; }
            if (pW >= 0.10 && pW < 0.11) { rd = 366; return rd; }
            if (pW >= 0.11 && pW < 0.12) { rd = 351; return rd; }
            if (pW >= 0.12 && pW < 0.13) { rd = 336; return rd; }
            if (pW >= 0.13 && pW < 0.14) { rd = 322; return rd; }
            if (pW >= 0.14 && pW < 0.15) { rd = 309; return rd; }
            if (pW >= 0.15 && pW < 0.16) { rd = 296; return rd; }
            if (pW >= 0.16 && pW < 0.17) { rd = 284; return rd; }
            if (pW >= 0.17 && pW < 0.18) { rd = 273; return rd; }
            if (pW >= 0.18 && pW < 0.19) { rd = 262; return rd; }
            if (pW >= 0.19 && pW < 0.20) { rd = 251; return rd; }
            if (pW >= 0.20 && pW < 0.21) { rd = 240; return rd; }
            if (pW >= 0.21 && pW < 0.22) { rd = 230; return rd; }
            if (pW >= 0.22 && pW < 0.23) { rd = 220; return rd; }
            if (pW >= 0.23 && pW < 0.24) { rd = 211; return rd; }
            if (pW >= 0.24 && pW < 0.25) { rd = 202; return rd; }
            if (pW >= 0.25 && pW < 0.26) { rd = 193; return rd; }
            if (pW >= 0.26 && pW < 0.27) { rd = 184; return rd; }
            if (pW >= 0.27 && pW < 0.28) { rd = 175; return rd; }
            if (pW >= 0.28 && pW < 0.29) { rd = 166; return rd; }
            if (pW >= 0.29 && pW < 0.30) { rd = 158; return rd; }
            if (pW >= 0.30 && pW < 0.31) { rd = 149; return rd; }
            if (pW >= 0.31 && pW < 0.32) { rd = 141; return rd; }
            if (pW >= 0.32 && pW < 0.33) { rd = 133; return rd; }
            if (pW >= 0.33 && pW < 0.34) { rd = 125; return rd; }
            if (pW >= 0.34 && pW < 0.35) { rd = 117; return rd; }
            if (pW >= 0.35 && pW < 0.36) { rd = 110; return rd; }
            if (pW >= 0.36 && pW < 0.37) { rd = 102; return rd; }
            if (pW >= 0.37 && pW < 0.38) { rd = 95; return rd; }
            if (pW >= 0.38 && pW < 0.39) { rd = 87; return rd; }
            if (pW >= 0.39 && pW < 0.40) { rd = 80; return rd; }
            if (pW >= 0.40 && pW < 0.41) { rd = 72; return rd; }
            if (pW >= 0.41 && pW < 0.42) { rd = 65; return rd; }
            if (pW >= 0.42 && pW < 0.43) { rd = 57; return rd; }
            if (pW >= 0.43 && pW < 0.44) { rd = 50; return rd; }
            if (pW >= 0.44 && pW < 0.45) { rd = 43; return rd; }
            if (pW >= 0.45 && pW < 0.46) { rd = 36; return rd; }
            if (pW >= 0.46 && pW < 0.47) { rd = 29; return rd; }
            if (pW >= 0.47 && pW < 0.48) { rd = 21; return rd; }
            if (pW >= 0.48 && pW < 0.49) { rd = 14; return rd; }
            if (pW >= 0.49 && pW <= 0.50) { rd = 5; return rd; }