
This program estimates the probability of a given chess player winning a Swiss-style chess tournament.

The user should be familiar with statistics such as the mean and standard deviation.

The program requires from the user the following inputs:

--The mean Elo and the standard deviation of all the chess players in the tournament.

--The Player Elo = the Elo of the chosen chess player under consideration to win the tournament.

--The total number of chess players in the tournament

--The number of games to be played by each player

--The number of iterations that the chess simulator will run.

The program takes from a few seconds to several hours to run, depending on the parameters chosen (larger values mean longer run times). The user can abort the program at any time by closing the window. A progress bar shows percent completion.

The program outputs its main results to the screen, and optionally more data to a small text file (limited to typically ~130 kb). The results tell how often the chosen chess player wins first place, and how the other players fare.

The program makes no changes to a user's machine. Nevertheless, the user should take care to download this program from a reputable source, such as CodePlex.com, and run it through a virus checker and/or in a sandbox environment.

SCTWPS PROGRAM DETAILS

The program uses the mean and sigma to form a Gaussian distribution (bell-shaped curve) of players. The players are ranked so that the strongest player plays the weakest player in round 1, the second strongest plays the second weakest, and so on. After the first round, however, players nearest in strength play each other. There is no provision against playing the same opponent twice in a row, and no provision is made for colors or for draws (both are ignored; in the long run this should not matter).
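
The field generation and round 1 pairing described above can be sketched as follows. This is a minimal illustration in Python, not the program's C# source: the function names are hypothetical, and it is an assumption that the chosen player is added on top of the Gaussian draws and that ratings are rounded to whole numbers.

```python
import random

def make_field(mean, sigma, n_players, player_elo, seed=None):
    """Draw n_players - 1 ratings from a Gaussian, then add the chosen player.

    Assumption: the chosen player is added to the random draws rather than
    replacing one of them; the original program may do this differently.
    """
    rng = random.Random(seed)
    ratings = [round(rng.gauss(mean, sigma)) for _ in range(n_players - 1)]
    ratings.append(player_elo)
    return sorted(ratings, reverse=True)  # strongest first

def round_one_pairings(ratings_desc):
    """Pair strongest with weakest, second strongest with second weakest, etc.

    With an odd number of players the middle player is left unpaired here.
    """
    n = len(ratings_desc)
    return [(ratings_desc[i], ratings_desc[n - 1 - i]) for i in range(n // 2)]

ratings = make_field(mean=2000, sigma=100, n_players=10, player_elo=1800, seed=1)
pairs = round_one_pairings(ratings)
for strong, weak in pairs:
    print(f"{strong} plays {weak}")
```

Passing a fixed seed makes a run repeatable, which is convenient when comparing two parameter settings.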

Players are sorted from strongest to weakest according to the following tie breaks: first, points won; second, the points won by the player's opponents (Solkoff tiebreak); and third, initial Elo rating. This system should give an accurate picture over many iterations.
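
The three-level ordering can be expressed as a single sort key. The sketch below is illustrative Python rather than the program's own code; the Player class and the function names are hypothetical, and it assumes that a higher initial Elo ranks first in the third tie break.

```python
from dataclasses import dataclass, field

@dataclass
class Player:
    pid: int
    elo: float
    points: int = 0
    opponents: list = field(default_factory=list)  # IDs of opponents faced

def solkoff(player, by_id):
    """Sum of the points scored by every opponent the player has faced."""
    return sum(by_id[o].points for o in player.opponents)

def standings(players):
    """Sort by points, then Solkoff, then initial Elo (all descending)."""
    by_id = {p.pid: p for p in players}
    return sorted(
        players,
        key=lambda p: (p.points, solkoff(p, by_id), p.elo),
        reverse=True,
    )

# Four players after two rounds: 0 beat 3 and 1; 1 beat 2; 2 beat 3.
players = [
    Player(0, 2100, 2, [3, 1]),
    Player(1, 2000, 1, [2, 0]),
    Player(2, 2050, 1, [1, 3]),
    Player(3, 1900, 0, [0, 2]),
]
for p in standings(players):
    print(p.pid, p.points, p.elo)
```

Players 1 and 2 are tied on points, but player 1 faced the stronger-scoring opponents, so the Solkoff tie break places player 1 ahead even though player 2 has the higher Elo.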

Program output appears on the console screen; optionally, the user may elect to have more information written to a small text file. The text file is limited to about 130 to 300 KB in size.

Output is given as the % wins by: (1) the chosen player (CP); (2) all players within 50 pts of the CP; (3) all players having an Elo two sigma or more above the mean; (4) 1-2 sigma above the mean; (5) +/- 1 sigma from the mean; (6) 1-2 sigma below the mean; and (7) two sigma or more below the mean.
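
Categories (3) to (7) amount to binning a winner's rating by its distance from the mean in units of sigma. The following Python sketch shows one way to do that; the function name and band labels are hypothetical, and the handling of ratings that fall exactly on a boundary is an assumption, since the description does not say which band such a rating belongs to (except that the sample log counts a rating exactly two sigma below the mean in the "1-2 sigma below" band).

```python
def sigma_band(elo, mean, sigma):
    """Return the sigma band a rating falls in.

    Boundary handling is an assumption: exactly +2 and +1 sigma go to the
    upper band; exactly -1 and -2 sigma go to the "1-2 sigma below" band.
    """
    z = (elo - mean) / sigma
    if z >= 2:
        return "2+ sigma above"
    if z >= 1:
        return "1-2 sigma above"
    if z > -1:
        return "within 1 sigma"
    if z >= -2:
        return "1-2 sigma below"
    return "2+ sigma below"

for rating in (2163, 2021, 1800, 1799):
    print(rating, sigma_band(rating, mean=2000, sigma=100))
```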

Finally, the text file logs the first 10 upset wins (wins by players rated 2+ sigma below the mean) and the first 1000 tournament wins of all winners.

During setup, before the simulation runs, default values are offered for the population mean, standard deviation, number of players, number of games per player and number of iterations. Each default may be accepted by hitting the Enter key. The larger the number of iterations, the more accurate the program's prediction.

The user should be familiar with terms such as the mean and the standard deviation, which are found in any book on statistics. Arpad E. Elo, the inventor of the Elo system, states in the book “The Rating of Chessplayers, Past & Present” (2nd ed., 1986) that the distribution of ratings on the 1983 USCF Rating List has, for all players, a mean of 1505 and a standard deviation (sigma) of 335, while for “established players” the mean is 1649 and the sigma is 288. Picking a large mean relative to the chosen player's Elo will make it harder for a weak player to win the tournament. Picking a large number of games to play in the tournament (up to 30 games are allowed) will also make it nearly impossible for a severe underdog to win the tournament. The best chances for an upset in a Swiss tournament come from a small number of games and a narrow range of player strengths (sigma).

The program should typically take a few seconds to run on a modern multi-core personal computer, but if the user inputs a large number of players (up to 500 are allowed), a large number of games per tournament (up to 30 are allowed) and a large number of program iterations (up to 1 million are allowed), the program could take several hours. On an Intel Core 2 Duo, a run with every parameter at its maximum allowed value took six hours to complete. Incidentally, because at most 1M iterations are allowed, the chance of finding an upset winner given a standard deviation of, say, 200 Elo points and 30 games per tournament is very small. To find such a winner reliably, many more than 1M iterations would be needed. If any large upsets occur (a winner rated more than 2 sigma below the mean) in a tournament of ten or more players, the program logs them in the text file.

The data input by the user is checked to see if it is within legal boundaries, and if the data input would result in an unusually long time to process, the user is warned. The user may abort the program by simply closing the window of the program. A progress bar shows percent completion of the program in deciles.
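
A bounds check of this kind might look like the sketch below. It is written in Python for illustration and is not the program's actual validation code. The upper limits are the ones stated in this document; the lower limits, the function name, and the threshold that triggers the long-run warning are assumptions made up for the example.

```python
MAX_PLAYERS = 500           # stated limit
MAX_GAMES = 30              # stated limit
MAX_ITERATIONS = 1_000_000  # stated limit

# Hypothetical threshold: total games simulated before a warning is shown.
SLOW_WORK_THRESHOLD = 1_000_000_000

def check_inputs(n_players, n_games, n_iterations):
    """Return (errors, warnings) for the three size parameters.

    The lower bounds (2 players, 1 game, 1 iteration) are assumptions.
    """
    errors, warnings = [], []
    if not 2 <= n_players <= MAX_PLAYERS:
        errors.append(f"players must be between 2 and {MAX_PLAYERS}")
    if not 1 <= n_games <= MAX_GAMES:
        errors.append(f"games must be between 1 and {MAX_GAMES}")
    if not 1 <= n_iterations <= MAX_ITERATIONS:
        errors.append(f"iterations must be between 1 and {MAX_ITERATIONS}")
    if not errors and n_players * n_games * n_iterations > SLOW_WORK_THRESHOLD:
        warnings.append("this run may take a long time")
    return errors, warnings

print(check_inputs(10, 8, 1000))
print(check_inputs(500, 30, 1_000_000))
```

Separating errors from warnings lets the caller refuse illegal input outright while still allowing a slow but legal run to proceed after the user confirms.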

Sample output from the default values is shown below (abbreviated) for ten players with a chosen player having an Elo of 1800 and 8 games per tournament, with 1000 iterations of the tournament run.

START SAMPLE OUTPUT

Log, file created at time:11/14/2012 23:50:46

---

The population mean is: 2000.0, and std Dev: 100.00

The Player Elo is: 1800

The number games played: 8

The number iterations: 1000

Player ID: 0, has rating: 1800.0

Player ID: 1, has rating: 1934.0

Player ID: 2, has rating: 1937.0

Player ID: 3, has rating: 1942.0

Player ID: 4, has rating: 2021.0

Player ID: 5, has rating: 2037.0

Player ID: 6, has rating: 2060.0

Player ID: 7, has rating: 2077.0

Player ID: 8, has rating: 2106.0

Player ID: 9, has rating: 2163.0

Subject Player of Elo 1800 won 1 times out of 1000 iterations, a total of 0.1% percent

Winning Players within 50 Elo of Subject player (1750 to 1850) won 1 times out of 1000 iterations, a total of 0.1%

Winning Players having an Elo:2200 or greater, two sigma or greater from the mean, won 0 times out of 1000 iterations, a total of 0.0%

Winning Players having an Elo between:2100 and 2200, 1-2 sigma above mean, won 563 times out of 1000 iterations, a total of 56.3%

Winning Players having an Elo of 1900 to 2100, within 1 sigma of the mean:2000.0, won 436 out of 1000 iterations, a total of: 43.6%

Winning Players having an Elo of 1800 to 1900, between 1 to 2 sigmas below mean (2000), won 1 out of 1000 iterations, a total of: 0.1%

Winning Players having an Elo below: 1800, below 2 sigmas of mean, won 0 out of 1000 iterations, a total of: 0.0%

.................................

after 8 rounds per iteration, and 1000 iterations

First 1000 winners displayed (max.)

Winning Player Stats (ID:9, Rating:2163, Round:0, Total Score:7, Avg. Elo Opponent:1950.0,Performance Elo:2286.0)

Winning Player Stats (ID:9, Rating:2163, Round:1, Total Score:6, Avg. Elo Opponent:2028.5,Performance Elo:2221.5)

… {deletions}...

Winning Player Stats (ID:8, Rating:2106, Round:385, Total Score:6, Avg. Elo Opponent:2048.6,Performance Elo:2241.6)

Winning Player Stats (ID:0, Rating:1800, Round:386, Total Score:6, Avg. Elo Opponent:2102.8,Performance Elo:2295.8)

The winning player was also the chosen player and has the following stats for the round 386

OpponentID, Rating, W/L result (W=Win; L=Loss): 9,2163.0,W

OpponentID, Rating, W/L result (W=Win; L=Loss): 9,2163.0,W

OpponentID, Rating, W/L result (W=Win; L=Loss): 8,2106.0,W

OpponentID, Rating, W/L result (W=Win; L=Loss): 2,1937.0,W

OpponentID, Rating, W/L result (W=Win; L=Loss): 4,2021.0,W

OpponentID, Rating, W/L result (W=Win; L=Loss): 8,2106.0,L

OpponentID, Rating, W/L result (W=Win; L=Loss): 9,2163.0,L

OpponentID, Rating, W/L result (W=Win; L=Loss): 9,2163.0,W

...end chosen player data...

Winning Player Stats (ID:4, Rating:2021, Round:387, Total Score:6, Avg. Elo Opponent:1975.9,Performance Elo:2168.9)

...{deletions}...

Winning Player Stats (ID:7, Rating:2077, Round:998, Total Score:6, Avg. Elo Opponent:2077.9,Performance Elo:2270.9)

Winning Player Stats (ID:1, Rating:1934, Round:999, Total Score:6, Avg. Elo Opponent:2038.3,Performance Elo:2231.3)

Program elapsed time: 00:00:00.6094112

---

END OF SAMPLE OUTPUT

For the chosen player (here having an Elo of 1800) the percent first place win statistics are shown, and for all players plus or minus 50 Elo points from the chosen player's Elo (here 1750 to 1850). Given the low rating of the chosen player, and the small number of iterations (1000), there was only one first place win. To give a more accurate measure of the probability of winning for such a low rated player, the number of iterations should be increased, up to the maximum of one million iterations.
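
To see why a single win in 1000 iterations is only a rough estimate, one can put a confidence interval around the observed win rate. The sketch below uses the Wilson score interval, a standard statistical formula that behaves reasonably for small counts; it is an illustration in Python and is not something the program itself computes. The function name is hypothetical, and the second call uses made-up counts purely to show the effect of more iterations.

```python
import math

def wilson_interval(wins, iterations, z=1.96):
    """Approximate 95% Wilson score interval for a win probability."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    p = wins / iterations
    z2 = z * z
    denom = 1 + z2 / iterations
    centre = (p + z2 / (2 * iterations)) / denom
    half = z * math.sqrt(
        p * (1 - p) / iterations + z2 / (4 * iterations ** 2)
    ) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Observed in the sample output: 1 win in 1000 iterations.
print(wilson_interval(1, 1000))
# Hypothetical: the same 0.1% rate observed over one million iterations.
print(wilson_interval(1000, 1_000_000))
```

With 1000 iterations the interval spans well over an order of magnitude, while at one million iterations the same observed rate would be pinned down far more tightly.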

Winning percent statistics for all players who won at the following intervals are shown: more than two standard deviations (sigma) above the mean, within 1-2 sigma above the mean, plus or minus one sigma from the mean, within 1-2 sigma below the mean, and more than two sigma below the mean (these are big upsets).

Since there were no players having an Elo greater than 2 sigma above the mean, no such players won. Increasing the number of players from 10 to say 100 would give a few such players. Having a few such highly rated players will result in them winning first place in a significant percent of the iterations.

For the first 1000 tournament winners, the statistics recorded are: winning player ID, rating, the round number in which the win occurred (here "Round" is the simulation iteration, numbered from 0), total score (win = 1 point, loss = 0; draws are not allowed since, on average, they do not matter statistically in the long run), average Elo of the opponents faced by the winner, and the winner's performance Elo.
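
For readers who want to post-process the text file, a "Winning Player Stats" line can be pulled apart with a regular expression. The sketch below is in Python and is based only on the sample lines shown in this document; the function name and dictionary keys are hypothetical, and other versions of the program may format the line differently.

```python
import re

# Pattern modeled on the sample log lines; spacing after commas varies there,
# so optional whitespace is allowed between fields.
STATS_RE = re.compile(
    r"Winning Player Stats \(ID:(?P<id>\d+),\s*Rating:(?P<rating>\d+),\s*"
    r"Round:(?P<round>\d+),\s*Total Score:(?P<score>\d+),\s*"
    r"Avg\. Elo Opponent:(?P<avg_opp>[\d.]+),\s*"
    r"Performance Elo:(?P<perf>[\d.]+)\)"
)

def parse_winner(line):
    """Return the fields of a stats line as a dict, or None if no match."""
    m = STATS_RE.search(line)
    if m is None:
        return None
    return {
        "id": int(m["id"]),
        "rating": int(m["rating"]),
        "round": int(m["round"]),
        "score": int(m["score"]),
        "avg_opponent_elo": float(m["avg_opp"]),
        "performance_elo": float(m["perf"]),
    }

sample = ("Winning Player Stats (ID:9, Rating:2163, Round:0, Total Score:7, "
          "Avg. Elo Opponent:1950.0,Performance Elo:2286.0)")
print(parse_winner(sample))
```

Because the pattern is searched for rather than anchored at the start of the line, it also matches the "Upset Winning Player Stats" lines.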

Notice that no 'big upset winners' with an Elo more than two sigma below the mean were found above. This is because no such players existed. In general, if the number of iterations is too small, the program will not reliably capture such outliers. To find big upsets, you must decrease the number of games played in the tournament, and/or decrease the standard deviation of the population, and/or increase the number of iterations for the simulation (up to a maximum of 1M iterations). Further, there must be at least 10 players in the tournament. If such upset winners are found, the text file will contain output like the following example:

START OF SAMPLE OUTPUT FOR BIG UPSET WINNERS

The list of top ten upsets (Elo less than 2 sigma below mean) for 10 or more players

Upset Winning Player Stats (ID:0, Rating:1554, Round:38159, Total Score:8,Avg. Elo Opponent:1849.7,Performance Elo:2100.7)

The upset player has the following stats for the round

OpponentID, Rating, W/L result (W=Win; L=Loss): 48,1942.0,W

OpponentID, Rating, W/L result (W=Win; L=Loss): 48,1942.0,L

OpponentID, Rating, W/L result (W=Win; L=Loss): 48,1942.0,L

OpponentID, Rating, W/L result (W=Win; L=Loss): 17,1755.0,W

OpponentID, Rating, W/L result (W=Win; L=Loss): 17,1755.0,W

OpponentID, Rating, W/L result (W=Win; L=Loss): 28,1810.0,W

OpponentID, Rating, W/L result (W=Win; L=Loss): 42,1917.0,W

OpponentID, Rating, W/L result (W=Win; L=Loss): 15,1746.0,W

OpponentID, Rating, W/L result (W=Win; L=Loss): 6,1673.0,W

OpponentID, Rating, W/L result (W=Win; L=Loss): 51,2015.0,W

...end upset player data...

END OF SAMPLE OUTPUT FOR BIG UPSET WINNERS

For these big upset winners, the statistics for the winning player are given as: ID, rating, round #, total score, average Elo of the opponents and performance Elo of the winner. In addition, for each game played, the opponent ID, the opponent Elo, and the result (W=Win, L=Loss) are given. Opponents are listed from most recent to least recent; thus the Upset Winning Player of ID:0 first played the higher-rated player of ID:51, rated 2015 Elo, and last played the player of ID:48, rated 1942 Elo.

In the event the user's chosen player wins, the text file would have the following entry (if the win occurs in the first 1000 wins):

SAMPLE WHERE CHOSEN PLAYER WINS TOURNAMENT:

Winning Player Stats (ID:4, Rating:1999, Round:11, Total Score:6, Avg. Elo Opponent:2062.3,Performance Elo:2255.3)

The winning player was also the chosen player and has the following stats for the round 11

OpponentID, Rating, W/L result (W=Win; L=Loss): 2,1953.0,L

OpponentID, Rating, W/L result (W=Win; L=Loss): 9,2154.0,W

OpponentID, Rating, W/L result (W=Win; L=Loss): 9,2154.0,L

OpponentID, Rating, W/L result (W=Win; L=Loss): 0,1848.0,W

OpponentID, Rating, W/L result (W=Win; L=Loss): 9,2154.0,W

OpponentID, Rating, W/L result (W=Win; L=Loss): 9,2154.0,W

OpponentID, Rating, W/L result (W=Win; L=Loss): 6,2043.0,W

OpponentID, Rating, W/L result (W=Win; L=Loss): 5,2038.0,W

...end chosen player data...

The statistics have the same meaning as before. At the very bottom of the text file is the time elapsed for the execution of the program, in hours:minutes:seconds format.

Future improvements to the program might be to allow a user to input, in a text file, the actual players in a tournament rather than approximate them as a Gaussian distribution; to allow draws to count (though that should not change the statistics in the long run); and to show a graphical output rather than text.

The program is stored at http://swisschesswinpredict.codeplex.com/, including the source code, and is freeware released AS IS AND WITHOUT ANY WARRANTY OF ANY KIND. It is written in C# using Visual Studio.

(c) 2012, by JayRodrequez

Last edited Nov 18, 2012 at 8:44 PM by JayRodrequez, version 9