Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
(Message started by: The_Jeh on Nov 3rd, 2007, 12:10am)

Title: Arimaa Top 25 COMPUTER Power Ranking Results
Post by The_Jeh on Nov 3rd, 2007, 12:10am
With the help of Fritzlein, I was able to get data for all rated Human v. Human games (played through noon Oct 28, 2007), and I put the results through a least-squares regression model to obtain the following rankings:

All Players:

1. xabiron 1-0 2.26468
2. dethwing 2-0 1.76469
3. spela 2-0 1.36315
4. Sameer 1-2 1.26475
5. omarFast 1-0 1.13060
6. GordonBlack 1-0 1.11907
7. acroninj 2-0 1.0
7. archigavr 1-0 1.0
7. Virgeist 1-0 1.0
7. Yaron 1-0 1.0
7. pikachamp 1-0 1.0
7. BLooodyANgel 1-0 1.0
7. marcgb 1-0 1.0
7. emeryaj 1-0 1.0
7. glitch 1-0 1.0
7. Gesuma 2-0 1.0
7. brad 1-0 1.0
7. i_am_you 1-0 1.0
7. ZeroOne 1-0 1.0
7. Yzaxtol 1-0 1.0
7. mightybyte 2-0 1.0
7. Asturianuco 3-0 1.0
7. Guest5409 3-0 1.0
7. travis 1-0 1.0
25. Fritzlein 385-51 .86405

Now, you may notice that the list above is pretty meaningless, and I have to agree. For a better ranking, I will ignore players with fewer than 15 rated HvH games played. The first number is the rating, the second is the schedule strength. This is actually a list of all players who have played 15+ rated HvH games:

1. Fritzlein 385-51             .86405     .09799
2. 99of9 182-55                .71591     .18005
3. chessandgo 227-98      .56717     .17025
4. robinson 185-112         .55218     .30639
5. Adanac 105-80             .45143      .31630
6. PMertens 246-144        .41130     .14976
7. RonWeasley 49-24       .33907     -.00340
8. Belbo 60-67                  .27918      .33430
9. omar 81-64                  .26476       .14752
10. UltraWeak 10-5         .26066      -.07268
11. thorin 13-9                .22853         .04671
12. Paul 26-26                 .21480       .21480
13. BlackKnight 7-11        .19097      .41319
14. Akhenaten 10-7         .17647       0.00000
15. clauchau 23-21          .17627       .13082
16. Ryan_Cable 76-78     .16039        .17337
17. petitprince 11-6         .14689       -.14723
18. mdk 15-13                 .14515         .07372
19. naveed 117-129        .13060        .17938
20. kamikazeking 37-38   .11908         .13241
21. Brendan 30-46           .10497         .31550
22. OLTI 59-58                .02792         .01937
23. jdb 104-112              .02441            .06145
24. blue22 51-66             -0.00199      .12622
25. arimaa_master 170-83 -.04366     -.38753
26. Swynndla 72-38        -.08838          -.39747
27. nbarriga 14-9             -.16625         -.38364
28. appalachia 7-10          -.17647         0.00000
29. kerdamdam 18-18       -.20366      -.20366
30. KT2006 10-10             -.20886        -.20886
31. Soter 23-7                  -.20965       -.74298
32. megamau 21-41        -.23047         .09211
33. camelback 11-8           -.23434      -.39224
34. mistre 22-16              -.26178           -.41967
35. Tanker_JD 15-11         -.29379       -.44763
36. The_Jeh 13-48           -.33582        .23795
37. Arimanator 23-47       -.36339        -.02054
38. IdahoEv 31-42           -.38449         -.23381
39. purplebaron 7-11       -.42379          -.20156
40. woh 21-37                  -.42449        -.14863
41. Mr. Brain 6-18             -.43030        .06970
42. H_Bobbeltoff 8-35       -.45670          .17121
43. rick 8-17                      -.46878        -.10878
44. Chegorimaa 10-34       -.46912        .07633
45. seanick 62-159           -.52921          -.09030
46. frostlad 8-22               -.53164          -.06497
47. friztforpresident 6-17   -.53320        -.05494
48. Tore 10-17                   -.54907         -.28981
49. grey_0x2A 3-20           -.57682         .16231
50. dtj 8-25                       -.62151          -.10636
51. Keith 3-14                   -.65800         -.01094
52. NIC1138 25-97            -.70551        -.11535
53. aaaa 3-19                   -.82908         -.10181
54. Gregorius 0-17             -.83280          .16720
55. Slowstorm 7-20          -.84630          -.36482
56. Erezap 3-18               -.89784           -.18356
57. mentalsurge 4-13       -1.01733         -.48792
58. proselyte 7-19            -1.07071        -.60917
59. Calumet45 3-27           -1.10712       -.30712
60. Kruschak 0-17             -1.14554         -.14554

Here are the top 10 players in schedule strength among the 15+ game group:
1. BlackKnight .41319
2. Belbo .33430
3. Adanac .31630
4. Brendan .31550
5. robinson .30639
6. The_Jeh .23795
7. Paul .21480
8. 99of9 .18005
9. naveed .17938
10. Ryan_Cable .17337

Extra notes: 376 distinct users have participated in a rated HvH game, and there have been 3068 such games as of noon October 28, 2007.
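
A quick way to read the two columns in the 15+ game table: the listed numbers are consistent with rating = (wins - losses) / games + schedule strength. The sketch below checks that relation against three rows copied from the table. It is only an arithmetic check under that assumed relation, not The_Jeh's actual program, and the function name is made up for illustration.

```python
def rating_from_record(wins, losses, schedule_strength):
    """Rating implied by a win-loss record plus schedule strength.

    Assumes rating = (wins - losses) / games + schedule_strength,
    which is inferred from the posted table, not stated by the author.
    """
    games = wins + losses
    return (wins - losses) / games + schedule_strength


# (name, wins, losses, schedule strength, listed rating) copied from the table.
rows = [
    ("Fritzlein", 385, 51, 0.09799, 0.86405),
    ("99of9", 182, 55, 0.18005, 0.71591),
    ("Soter", 23, 7, -0.74298, -0.20965),
]

for name, wins, losses, schedule, listed in rows:
    implied = rating_from_record(wins, losses, schedule)
    print(f"{name:10s} implied {implied:+.5f}   listed {listed:+.5f}")
```

Under this reading, a player with an even record has a rating equal to his schedule strength, which matches the rows for Paul and kerdamdam.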

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by Fritzlein on Nov 3rd, 2007, 12:23am
Very interesting.  I think this ranking is reasonable if all games in history count equally.  However, if you weight more recent games more heavily, and allow for the strength of players to have fluctuated over time, chessandgo would rise and robinson would fall, I expect.

Also, is there some elegant way to weight ratings toward the middle, so that 1-0 players don't rate so high?

I'm curious to see how this compares to the p8 ratings, since those ratings do have a ballast to hold inexperienced players near 1500, as well as a decay factor to weight older results less heavily.

Still, it's good to know that, taking all games at once, I'm the #1 player of all time.  (Well, #25, at least... :P)

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by Janzert on Nov 3rd, 2007, 12:39am
Would it be possible to calculate a confidence interval then rank by the conservative rating, i.e. the rating minus the confidence interval?

Interesting result as it stands though.

Janzert

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by omar on Nov 4th, 2007, 1:41am
Nice job on producing this list, Jeh. The rankings here seem to match very closely with our intuitive feel for players' strengths. I could probably use this for ordering the players in the Swiss preliminary, if we can't produce a better list before January.

If you want to take a crack at generating P8 ratings you can find the code for it here:
http://arimaa.com/arimaa/rating/testRatings.tgz

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by The_Jeh on Nov 4th, 2007, 11:57pm
I thought it might be interesting to do the calculations again using only games played after the conclusion of the last WC, which would mark the start of a new season. Thanks again to Fritzlein. Here are the results of that calculation:

1. clauchau 1-0
2. ntroncos 1-0
3. chessandgo 66-7
4. Rabbit 2-0
5. Yzaxtol 2-0
6. ZeroOne 1-0
7. Virgeist 1-0
8. PatoGuy 1-0
9. Fritzlein 66-6
10. challenger 2-0
11. Raymond 2-0
12. RonWeasley 23-6
13. 99of9 11-4
14. Brendan 12-7
15. knarl 1-0
16. PMertens 15-8
17. smonroy 1-0
18. jdb 12-6
19. UltraWeak 2-1
20. blue22 17-10
21. OLTI 5-5
22. robinson 2-2
23. omar 6-3
24. arimaa_master 90-28
25. petitprince 11-6

The problem inevitably encountered with these calculations is isolated pools of players who play each other but do not play anyone outside of their circle. For those players who are connected by games, the results are reasonable enough relative to each other. Just to make things look better, I'll cut out players who've played fewer than five games:

1. chessandgo 66-7   1.108   .300
2. Fritzlein 66-6   .903   .069
3. RonWeasley 23-6   .732   .146
4. 99of9 11-4  .689   .222
5. Brendan 12-7   .667   .404
6. PMertens 15-8   .582   .278
7. jdb 12-6   .476   .143
8. blue22 17-10   .437   .178
9. OLTI 5-5   .403   .403
10. omar 6-3   .375   .041
11. arimaa_master 90-28   .367   -.156
12. petitprince 11-6   .357   .063
13. mdk 15-13   .324   .252
14. Adanac 6-10   .215   .465
15. nbarriga 5-3   .177   -.073
16. Soter 25-7   .100   -.462
17. mistre 23-16   .090   -.089
18. camelback 11-7 .044   -.178
19. woh 15-17   .043   .106
20. Tanker_JD 15-11    .033   -.121
21. seanmcl 4-4   0   0
21. Asubfive 4-4   0   0
23. JacquesB 5-4   -.008   -.119
24. kerdamdam 5-5   -.061   -.061
25. megamau 2-1   -.113   -.446
26. IdahoEv 16-17   -.127   -.097
27. seanick 9-11   -.149   -.049
28. The_Jeh 13-32   -.154   .268
29. Chegorimaa 7-17   -.170   .247
30. Erezap 3-8   -.433   .021
31. NIC1138 25-85   -.450   .096
32. K_Hayes 3-5   -.493   -.243
33. ChrisB 5-6   -.517   -.426
34. aaaa 3-19   -.547   .180
35. Slowstorm 3-11   -.595   -.023
36. naveed 1-14   -.622   .244
37. Ganesha 0-5   -.623   .377
38. dougk 0-6   -.631  .369
39. nogard 3-6   -.656    -.323
40. BBcardsRI 0-5   -.715   .285
41. gunananda 1-4   -.738   -.138
42. Kruschak 0-17   -.759   .241
43. proselyte 7-19   -.760   -.299
44. froody 6-13   -.815   -.447
45. pcpdams 1-8   -.818   -.041
46. Krasnotron 4-7   -.842   -.569
47. willwould 2-4   -.984   -.650
48. casparix 0-9   -1.076   -.076

What are your opinions of the second option compared to the first? If only people would play a variety of opponents, this would work much better.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by Fritzlein on Nov 5th, 2007, 8:01am
For those of us who have been playing for longer than a year, I think the second list better reflects our results in the most recent year.  All my "learning losses" to 99of9 and chessandgo's learning losses to me are not included, which probably gives a better indication of current playing strength.

Still, there are players like mdk and mistre who have improved a great deal within the last year.  I'm not sure what one can do about that, because at some point using only the most recent games makes the sample of games too small to be useful.

Fortunately, pre-tournament ratings only have a limited impact, because the preliminary sorts things out better to seed the final, and in the final everyone gets two lives again.  The tournament will be long enough this year that it will be unequivocally settled over the board within the tournament, rather than being too influenced by ratings generated during the rest of the year.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by Fritzlein on Nov 8th, 2007, 10:19pm

on 11/04/07 at 01:41:13, omar wrote:
I could probably use this for ordering the players in the Swiss preliminary, if we can't produce a better list before January.

I like that you are opening up the process, Omar, and that you will possibly use a ranking list from the community.  However, my current preference would be for using the p8 HvH ratings rather than the list produced by The_Jeh's program, because for seeding the Swiss preliminary, we need to be able to seed everyone.  The_Jeh's list is quite reasonable when we cut out everyone who played too few games, but for seeding the tournament we don't have the luxury of omitting players.

John, do you think you could tweak your algorithm so that players with few games also have a reasonable rating?  One idea would be to add in an anchor player, and fake results that everyone has one win and one loss against the anchor player.  That will bias everyone towards the mean and (perhaps) produce reasonable seeds for inexperienced players.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by The_Jeh on Nov 8th, 2007, 11:32pm
Your idea of adding an anchor player might work. It would bring everyone toward the mean, but it would affect players with fewer games more than those with many games. It might punish good players with few games more than we'd like. I'll have to see the results to know for sure.

One thing that I know it would help is connecting players into one pool. For example, in a pool of two players who've played one game, there are an infinite number of solutions. If A defeats B, as long as -A=B, any ratings would solve the system. So in my previous posts, players who are 1-0 and have a rating of 1 would have had a rating of 0 had I done an odd number of iterations. In the case of everyone else, the ratings do converge to a single solution that minimizes the squared error. I think with an anchor, everyone will be connected to the big pool that has one solution, so everyone's rating will converge.
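
The isolated-pool behaviour described above can be reproduced with a small sketch. It assumes the ratings are found by repeated simultaneous sweeps of "rating = average over games of (+1 for a win or -1 for a loss, plus the opponent's current rating)", starting from zero. The thread does not show the actual program, so the update rule, the starting point, and the name `iterate_ratings` are all assumptions.

```python
def iterate_ratings(games, sweeps):
    """Simultaneous-update sweeps of r = record + average opponent rating.

    games: list of (winner, loser) pairs. All ratings start at 0.
    This update rule is an assumed reading of the method in the thread.
    """
    players = sorted({p for game in games for p in game})
    ratings = {p: 0.0 for p in players}
    for _ in range(sweeps):
        totals = {p: 0.0 for p in players}
        counts = {p: 0 for p in players}
        for winner, loser in games:
            # Each game contributes +/-1 plus the opponent's old rating.
            totals[winner] += 1.0 + ratings[loser]
            totals[loser] += -1.0 + ratings[winner]
            counts[winner] += 1
            counts[loser] += 1
        ratings = {p: totals[p] / counts[p] for p in players}
    return ratings


# An isolated pool: A beat B once and neither played anyone else.
isolated = [("A", "B")]
print(iterate_ratings(isolated, 1))  # {'A': 1.0, 'B': -1.0}
print(iterate_ratings(isolated, 2))  # {'A': 0.0, 'B': 0.0}

# Same pool, but everyone also goes 1-1 against a fictitious anchor player.
anchored = (isolated
            + [(p, "anchor") for p in "AB"]
            + [("anchor", p) for p in "AB"])
print(iterate_ratings(anchored, 50))
```

Without the anchor the pair flips between (1, -1) and (0, 0) forever, and which value gets reported depends on the sweep count and the starting point. With the anchor games added, the same sweeps settle toward A = 0.25, B = -0.25, anchor = 0.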

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by The_Jeh on Nov 9th, 2007, 10:57am
With the anchor player added, the rankings are as follows. (And I've consolidated Arimanator's accounts.) The W-L are given without the anchor games. This still uses the data from last time:

1. Chessandgo 66-7
2. Fritzlein 66-6
3. RonWeasley 23-6
4. Brendan 12-7
5. 99of9 11-4
6. PMertens 15-8
7. clauchau 1-0
8. Arimanator 8-2
9. jdb 12-6
10. arimaa_master 90-28
11. blue22 17-10
12. petitprince 11-6
13. mdk 15-13
14. omar 6-3
15. OLTI 5-5
16. Soter 25-7
17. Rabbit 2-0
18. UltraWeak 2-1
19. mistre 23-16
20. ntroncos 1-0
21. Tau 3-0
22. Robinson 2-2
23. Adanac 6-10
24. woh 15-17
25. camelback 11-7

I'm not sure I like this yet, either.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by The_Jeh on Nov 9th, 2007, 12:00pm
I guess the problem is always that assumptions have to be made. You assume players who are 1-0 or 2-0 on the list are weaker than what this rating says because you have access to knowledge the computer doesn't. You know they might have gotten lucky or might have lost other games not considered here. I, however, cannot maintain absolute objectivity by adding presumptions into the formula. And adding these presumptions always helps some things while hurting others.

I've tried several different schemes of adding fictitious games, such as the Anchor player, and also Genius/Idiot players who always win or always lose, but the results are always better in some respects and worse in others. The only way for me to achieve greater accuracy is to add more true games.

So that's what I'm going to do. Fritzlein, if you would be so kind, please e-mail me the spreadsheet of all rated games, HH HB and BB, played within the last 12 months, and a second list with only the last 6 months. I really won't know if it's feasible to calculate all that until I try. Actually, I'm thinking it's possible. If it can be done, the results should be perfectly acceptable. If there still are players you think should be lower or higher, you will have no evidence to point to that the computer won't have considered.

I am not necessarily saying that this should replace p8, though.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by Janzert on Nov 9th, 2007, 2:17pm
Let's say you have player A beating player C in 100 games and losing 50 games. At the same time player B beats player C in 2 games and loses 1 game.

While on the one hand you can say that from the data available it appears that players A and B are both twice as good as C, you should also be able to say that you are much more confident player A is twice as good as C than you are that player B is.

Janzert

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by The_Jeh on Nov 9th, 2007, 3:18pm
Yes, but I cannot translate lack of confidence into a lower rating.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by Janzert on Nov 9th, 2007, 4:36pm
The way I've seen is to subtract the confidence interval from the apparent rating. Basically this means the resulting rating says we believe this player's true rating to be at least this good, with whatever confidence level the interval uses.

Janzert
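
As a rough illustration of ranking by the conservative end of an interval, the sketch below applies the idea to the 100-50 versus 2-1 example from earlier in the thread. Janzert does not say which interval to use, so the Wilson score interval on the raw win fraction is a stand-in chosen for this example only, and it bounds a win fraction rather than a rating.

```python
import math


def wilson_lower_bound(wins, games, z=1.96):
    """Lower end of the Wilson score interval for a win fraction.

    z = 1.96 corresponds to roughly 95% two-sided confidence.
    Using this particular interval is an assumption of the sketch.
    """
    if games == 0:
        return 0.0
    p = wins / games
    denom = 1 + z * z / games
    centre = p + z * z / (2 * games)
    margin = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games))
    return (centre - margin) / denom


# Player A is 100-50 against C; player B is 2-1 against C.
bound_a = wilson_lower_bound(100, 150)
bound_b = wilson_lower_bound(2, 3)
print(f"A: observed {100 / 150:.3f}, conservative {bound_a:.3f}")
print(f"B: observed {2 / 3:.3f}, conservative {bound_b:.3f}")
```

Both players have the same observed win fraction, but A's conservative figure stays much closer to it than B's does, which is the ordering Janzert is arguing for.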

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by The_Jeh on Nov 9th, 2007, 5:03pm
I see. You want each player to be given a performance rating on each game played, a standard deviation calculated from these games, and then a t-model used to determine the confidence interval of their true rating?

I admit, it's getting a bit complicated for me. Right now, I am anxious to see the results from all the rated games of the past year, including bots. Anyone considering entering the WC, though he finds it hard to find humans to play, likely plays the bots several times. I know a reason why HB games aren't used for p8's for the WC - because playing a thousand games against weak bots will inflate one's rating. I know p8 attempts to correct this, but it does so imperfectly. That is a nonissue with this system. But we'll get sufficient quantity with bot games also considered, and we can benefit from being able to include the type of game most often played on the server. Sorry if I keep asking for more, Fritzlein, but I think I'm nearing the max of what I could ask.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by mistre on Nov 9th, 2007, 5:51pm
I am continuing to watch this topic with interest.  Thanks for all of your research, John!


Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by The_Jeh on Nov 9th, 2007, 9:32pm
I haven't done much research, honestly.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by Fritzlein on Nov 9th, 2007, 11:46pm

on 11/09/07 at 12:00:51, The_Jeh wrote:
Fritzlein, if you would be so kind, please e-mail me the spreadsheet of all rated games, HH HB and BB, played within the last 12 months, and a second list with only the last 6 months.

Mailed.  19402 games from 11/1/06 through 10/31/07.  10444 games from 5/1/07 through 10/31/07.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by Fritzlein on Nov 9th, 2007, 11:59pm

on 11/09/07 at 12:00:51, The_Jeh wrote:
I, however, cannot maintain absolute objectivity by adding presumptions into the formula.

It is entirely objective to assume that an unknown player is near the mean until proven otherwise.  Based on concrete data you can say (for example) only 5% of players are better than 2000 in strength.  If someone plays a single game and wins, that provides some objective evidence of their skill level, but why should you say that single game is weightier than objective evidence that that player is probably not in the top ten?  Having prior assumptions about a newcomer's skill is not unscientific if those assumptions are based on observation.


Quote:
The only way for me to achieve greater accuracy is to add more true games.

The observer changes whatever is observed.  The system we choose to seed the tournament will alter people's behavior as they try to get higher seeds.  The current rating system is a perfect example, as people (including me) engage in silly behavior simply because their silly behavior is rewarded with a higher rating.

In my opinion HvB games are far from being "true games", and I expect the rating list you generate to reflect that.  Still I'm curious to see what the results are.  Trying out various things to see what works is research in my book.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by The_Jeh on Nov 10th, 2007, 12:24am
You are correct as usual, Fritzlein. As far as newcomers go, the way we tabulate the results using recent games, I cannot assume someone with few games is a newcomer. For example, Robinson only had 4 HvH games since the last WC.

You are also right that the results will be manipulated. However, I don't know how you can manipulate my system (which is really just a fundamental system no one can claim as his own) to your advantage except by actually improving yourself. Playing a ton of bots won't help you if you're already known to be better than them. Playing a ton of top humans won't help you if you can't win. And if you can win, you deserve the higher rating. Besides, my system is not something people can keep track of, like the p8 ratings are.

Thank you for the data. I'm working on getting a ranking, but it's a ton to plow through.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by Fritzlein on Nov 10th, 2007, 10:42am

on 11/09/07 at 10:57:04, The_Jeh wrote:
With the anchor player added, the rankings are as follows. (And I've consolidated Arimanator's accounts.) The W-L are given without the anchor games. This still uses the data from last time:

1. Chessandgo 66-7
2. Fritzlein 66-6
3. RonWeasley 23-6
4. Brendan 12-7
5. 99of9 11-4
6. PMertens 15-8
7. clauchau 1-0
8. Arimanator 8-2
9. jdb 12-6
10. arimaa_master 90-28
11. blue22 17-10
12. petitprince 11-6
13. mdk 15-13
14. omar 6-3
15. OLTI 5-5
16. Soter 25-7
17. Rabbit 2-0
18. UltraWeak 2-1
19. mistre 23-16
20. ntroncos 1-0
21. Tau 3-0
22. Robinson 2-2
23. Adanac 6-10
24. woh 15-17
25. camelback 11-7

I generated rankings using the same 629 games, and the same assumption that all games count equally (not more recent ones more heavily), and the same methodology of having everyone win one game and lose one game to an anchor player.  The difference is that I used our ratings model and maximum likelihood estimation.

The reason to prefer maximum likelihood estimation is that in The_Jeh's method you can be penalized for beating a weak player or rewarded for losing to a strong player.  Strength of schedule can be more important than winning and losing.  With maximum likelihood estimation, beating a weak player always helps your rating, and losing to a strong player always hurts your rating, albeit perhaps very slightly.
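
A small worked example of that penalty, as a sketch only. It assumes the least-squares rating behaves like an average over games of (+1 or -1, plus the opponent's rating), and it holds every opponent's rating fixed while one extra game is added, which the real simultaneous solution would not do exactly. The ratings are taken from the first all-time list in this thread, and the 2.0-rated opponent is hypothetical.

```python
def game_value(won, opponent_rating):
    """Contribution of one game under the assumed averaging model."""
    return (1.0 if won else -1.0) + opponent_rating


def rating_after(rating, games, won, opponent_rating):
    """Running average after one more game, opponents held fixed."""
    return (rating * games + game_value(won, opponent_rating)) / (games + 1)


fritzlein_rating, fritzlein_games = 0.86405, 385 + 51
kruschak_rating = -1.14554

# A win over a much weaker player is worth less than the current average...
after_win = rating_after(fritzlein_rating, fritzlein_games, True, kruschak_rating)
# ...while a loss to a (hypothetical) 2.0-rated player is worth more.
after_loss = rating_after(fritzlein_rating, fritzlein_games, False, 2.0)

print(f"before {fritzlein_rating:.5f}")
print(f"after beating a -1.14554 player: {after_win:.5f}")
print(f"after losing to a 2.0 player:    {after_loss:.5f}")
```

In this simplified picture the win lowers the average and the loss raises it, which is the behaviour maximum likelihood estimation avoids.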

The results are rather similar:

2374      chessandgo      66-7
2357      Fritzlein      66-6
2054      PMertens      15-8
2044      Brendan      12-7
2040      RonWeasley      23-6
2023      99of9      11-4
2002      clauchau      1-0
1910      arimaa_master      90-28
1910      jdb      12-6
1905      blue22      17-10
1859      mdk      15-13
1803      OLTI      5-5
1790      omar      6-3
1775      petitprince      11-6
1752      Adanac      6-10
1751      Soter      25-7
1725      Arimabuff      3-1
1724      mistre      23-16
1722      challenger      2-0
1722      Raymond      2-0
1722      Rabbit      2-0
1695      UltraWeak      2-1
1692      robinson      2-2
1683      woh      15-17
1661      ntroncos      1-0
1655      Tau      3-0
1650      Tanker_JD      15-11
1639      camelback      11-7
1631      Yzaxtol      2-0
1614      nbarriga      5-3

Then I did a version of Janzert's lower confidence idea.  I asked how much it would hurt a player's ratings to lose an additional game to the anchor player. We might say that if one loss would cause a player's rating to tumble, they don't deserve a high seed in the World Championship.  Here are what you might call the lower confidence ratings:

2344      chessandgo      66-7
2317      Fritzlein      66-6
2000      RonWeasley      23-6
1980      Brendan      12-7
1977      PMertens      15-8
1939      99of9      11-4
1898      arimaa_master      90-28
1872      blue22      17-10
1858      jdb      12-6
1816      mdk      15-13
1735      OLTI      5-5
1728      petitprince      11-6
1724      Soter      25-7
1707      omar      6-3
1705      mistre      23-16
1703      Adanac      6-10
1658      woh      15-17
1627      Tanker_JD      15-11
1617      clauchau      1-0
1609      Arimabuff      3-1
1603      camelback      11-7
1595      challenger      2-0
1595      Raymond      2-0
1593      robinson      2-2
1590      Rabbit      2-0
1585      UltraWeak      2-1
1569      Chegorimaa      7-17
1568      IdahoEv      16-17
1567      JacquesB      5-4
1561      nbarriga      5-3
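
For readers who want to experiment with the approach described in this post, here is a minimal sketch of maximum likelihood fitting with an anchor player. It assumes a Bradley-Terry style model where P(A beats B) = a / (a + b), holds the anchor's strength at 1, and uses the standard iterative update for that model. The gameroom ratings model used above may differ in its details, the toy players and results are invented, and giving the extra anchor loss to one player at a time is only one reading of the description.

```python
from collections import defaultdict

ANCHOR = "anchor"  # fictitious player; the name is an assumption


def fit_strengths(games, anchor_record, sweeps=2000):
    """Maximum likelihood strengths with the anchor fixed at 1.

    games: list of (winner, loser) pairs between real players.
    anchor_record: {player: (wins, losses)} fake results against the anchor.
    """
    all_games = list(games)
    for player, (won, lost) in anchor_record.items():
        all_games += [(player, ANCHOR)] * won + [(ANCHOR, player)] * lost

    wins = defaultdict(int)
    pair_games = defaultdict(int)  # games played between each pair
    for winner, loser in all_games:
        wins[winner] += 1
        pair_games[frozenset((winner, loser))] += 1

    players = sorted({p for game in all_games for p in game} - {ANCHOR})
    strength = {p: 1.0 for p in players}
    strength[ANCHOR] = 1.0
    for _ in range(sweeps):
        updated = {ANCHOR: 1.0}
        for p in players:
            denom = sum(n / (strength[p] + strength[q])
                        for pair, n in pair_games.items() if p in pair
                        for q in pair - {p})
            updated[p] = wins[p] / denom
        strength = updated
    return strength


# Toy data: a veteran with many games and a newcomer with a single win.
games = ([("veteran", "regular")] * 20 + [("regular", "veteran")] * 5
         + [("newcomer", "regular")])
one_and_one = {p: (1, 1) for p in ("veteran", "regular", "newcomer")}
baseline = fit_strengths(games, one_and_one)


def drop_factor(player):
    """How far the player's strength falls after one more anchor loss."""
    record = dict(one_and_one)
    record[player] = (1, 2)
    return baseline[player] / fit_strengths(games, record)[player]


for name in ("veteran", "newcomer"):
    print(name, round(baseline[name], 3), round(drop_factor(name), 3))
```

In this toy data the newcomer's strength falls by a larger factor than the veteran's when the extra anchor loss is added. Because every player has at least one win and one loss against the anchor, the fitted strengths are always finite.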

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by The_Jeh on Nov 10th, 2007, 1:49pm
Your ratings look pretty good. I question your statement that you counted all games equally. Yours is a sequential system. Even if you leave the RU the same, doesn't the final result differ slightly depending on the order in which you tally the games? The system I am using right now, on the other hand, does consider all games simultaneously. I'm not saying that's good or bad, but isn't that the way it is?

I'm having trouble pushing the games through my system. Using the 6-month list, I was only able to get results from one iteration, which is meaningless. That is very disappointing. I can think of only one possible solution, which would be to use player ID #'s instead of player names, which would drastically reduce the length of the string and perhaps make it easy enough to handle. Could that be arranged? I am tired of asking for more, please believe me.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by jdb on Nov 10th, 2007, 2:31pm
This is a link to a decent ratings calculator:

http://remi.coulom.free.fr/Bayesian-Elo/

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by The_Jeh on Nov 10th, 2007, 5:47pm
I'm working on something like what's found on http://www.pro-football-reference.com/blog/wordpress/?p=171#comment-12705.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by The_Jeh on Nov 10th, 2007, 9:03pm
Now I have two systems. I'm going to put the newer one aside for the time being.

I still think having a string with only player IDs would be easier to use, but I found an inefficiency in my programming technique and corrected it, and what once took 6 hours now takes less than half an hour. Wow. So never mind the request.

So, here are the results of the older system using all rated games of the past 6 months, no anchors added:

1. obiwan 1-0
2. chessandgo 30-2
3. Ryan_Cable 1-0
4. Fritzlein 60-7
5. RonWeasley 13-2
6. robinson 1-0
7. blue22 92-26
8. Arimanator 532-65
9. smonroy 2-0
10. Aamir 1-0
11. PMertens 13-6
12. syed 1106-34
13. 6sense 173-131
14. OLTI 3-2
15. arimaa_master 102-27
16. petitprince 1-1
17. willwould 2-0
18. jdb 8-4
19. naveed 113-111
20. 99of9 4-2
21. UltraWeak 4-3
22. Brendan 4-3
23. mdk 182-118
24. omar 19-19
25. Adanac 9-10

I think you're right, Fritzlein. This looks kind of ugly.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by The_Jeh on Nov 10th, 2007, 9:32pm
I'm going to give an example of the newer system now, with Janzert and Fritzlein's lower confidence method, 1 win and 2 losses to the Anchor (same 629 games):

1. chessandgo 66-7  20.38245
2. Fritzlein 66-6   18.42715
3. RonWeasley 23-6   4.01976
4. Brendan 12-7   3.39503
5. PMertens 15-8   3.30043
6. 99of9 11-4   3.03360
7. arimaa_master 90-28   2.35544
8. jdb 12-6   2.08053
9. blue22 17-10   2.03331
10. Arimanator 8-2   1.84120
11. mdk 15-13   1.49077
12. clauchau 1-0   1.34560
13. Soter 25-7   1.17840
13. OLTI 5-5   1.17840
15. omar 6-3   1.13829
16. petitprince 11-6   1.13657
17. mistre 23-16   0.94523
18. Rabbit 2-0   0.89194
19. Adanac 6-10   0.87342
20. UltraWeak 2-1   0.79323
21. woh 15-17   0.74131
22. robinson 2-2   0.73276
23. ntroncos 1-0   0.69139
24. Tau 3-0   0.67645
25. Yzaxtol 2-0   0.66307
26. Tanker_JD 15-11   0.64187
27. camelback 11-7   0.61546
28. knarl 1-0   0.59045
29. Virgeist 1-0   0.57366
30. nbarriga 5-3   0.56981

Under this system, the probability that player A defeats player B is given by A/(A+B). For example, my rating is 0.44583. So, the probability Fritzlein defeats me is 98% (even though we know it's 100% ;) )
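
The A/(A+B) formula is easy to try out on the ratings listed above. The sketch below does so; the function name is made up for illustration, and the ratings are copied from this post.

```python
def win_probability(rating_a, rating_b):
    """Probability that A beats B when ratings are positive strengths."""
    return rating_a / (rating_a + rating_b)


fritzlein = 18.42715
the_jeh = 0.44583
chessandgo = 20.38245

print(f"Fritzlein over The_Jeh:    {win_probability(fritzlein, the_jeh):.3f}")    # about 0.976
print(f"chessandgo over Fritzlein: {win_probability(chessandgo, fritzlein):.3f}")  # about 0.525
```

Only the ratio of the two ratings matters, so multiplying every rating by the same constant leaves all the probabilities unchanged.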

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by Fritzlein on Nov 11th, 2007, 12:21am

on 11/10/07 at 13:49:11, The_Jeh wrote:
Your ratings look pretty good. I question your statement that you counted all games equally. Yours is a sequential system.

The ratings I posted a while ago in another thread used a sequential system, and the p8 ratings do as well, but for the ratings I posted in this thread, the order of games was irrelevant.  I generally prefer to weight later games more heavily, but didn't do so this time.  It made maximum likelihood much easier to calculate.  It only took me a second or two per iteration, so I could get convergent ratings in a hurry.


Quote:
I can think of only one possible solution, which would be to use player ID #'s instead of player names, which would drastically reduce the length of the string and perhaps make it easy enough to handle. Could that be arranged? I am tired of asking for more, please believe me.

Sure, I can get you the data with ID's instead of player names, albeit not until tomorrow.  What exactly did you want for the date range?  Tonight the data that is downloadable from arimaa.com will automatically update with games through November 10.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by Fritzlein on Nov 11th, 2007, 9:39am

on 11/10/07 at 21:32:00, The_Jeh wrote:
Under this system, the probability that player A defeats player B is given by A/(A+B).

I like that formula a lot.  For one thing it never sets impossible expectations such that a player can lose rating points for defeating a weak opponent.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by The_Jeh on Nov 11th, 2007, 2:19pm
So, what do you think of this list compared to your recent lists, Fritzlein? Are either of them adequate? Have we reached an acceptable system? I think we have punished unproven players sufficiently, and I don't think there will be too many of them signing up, anyway. As it is, both our systems seed the current entrants the same way.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by Arimanator on Nov 11th, 2007, 3:14pm

on 11/11/07 at 14:19:53, The_Jeh wrote:
So, what do you think of this list compared to your recent lists, Fritzlein? Are either of them adequate? Have we reached an acceptable system? I think we have punished unproven players sufficiently, and I don't think there will be too many of them signing up, anyway. As it is, both our systems seed the current entrants the same way.

For some reason that expression evoked in me an old character of "Saturday Night Live", "Unfrozen Caveman Lawyer". I know that it's not relevant to the discussion at hand; still I find it funny enough to be mentioned in passing. ;D

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by Fritzlein on Nov 11th, 2007, 11:45pm

on 11/11/07 at 14:19:53, The_Jeh wrote:
So, what do you think of this list compared to your recent lists, Fritzlein? Are either of them adequate? Have we reached an acceptable system?

I think that either your last list or my list would be clearly better than the gameroom ratings, and good enough to be acceptable for seeding the World Championship.  One important point is that people who have played more games against humans are penalized less by the losses to the anchor player, so if people "play to the system", they will play against more humans, which is a good thing.

Your list is better than mine by virtue of consolidating Arimanator's games; otherwise I can't see a reason to choose between them.

As long as the seeding is roughly accurate it will be fine.  The advantage of the #1 seed over the #2 seed is much less in Swiss pairing than in Floating Double Elimination, so we don't have to stress about it as much.

That said, there are two features of p8 ratings that beat both our systems, so I would still like to use p8 ratings for the seeds if Omar has the time to run them.  First, p8 ratings take the variety of opposition into consideration.  It's considered more impressive to beat ten different opponents than the same opponent ten times.  Second, p8 ratings weight more recent games more heavily, so we don't have to have an arbitrary date cutoff.  We can run the ratings over all time and still get reasonable seeding.  This would be more important for players like robinson and clauchau who have significant history that could be used to seed them but few recent games.

In any event, it is good to get this conversation started.  Thanks for your efforts, John.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by The_Jeh on Nov 12th, 2007, 11:43am
That's fine. Computer rankings are something that I've been wanting to understand for a long time, and I'm glad I've had the occasion to figure them out.

p8 is Omar's invention, I take it? I am impressed with how sophisticated it is.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by Fritzlein on Nov 12th, 2007, 12:40pm
Omar and I developed the p8 ratings together.  At the time I thought they would produce reasonable ratings even with HvB games included, but I have since changed my mind.  Even a sophisticated system can't fix what are essentially social issues.

There are two things that kill the rating system: the inability of bots to learn, and the freedom of players to select their own opponents.  If we had neither of these problems, i.e. only HvH games, and opponents assigned rather than chosen, then even the current game room rating system would be very accurate.  Conversely, since we actually do have both of these problems, it doesn't matter how clever the rating system is, because it won't be accurate anyway.

It is no coincidence that on ICC (the Internet Chess Club) folks judge that the only "real" ratings are the ones where you sign up to play a 5-minute game, without knowing your opponent in advance, without being able to escape playing after you are paired, and with no bots involved. ICC took an administrative route to eliminating the two major sources of error, and voila, the ratings distortions disappeared.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by Fritzlein on Nov 12th, 2007, 5:08pm

on 11/10/07 at 14:31:22, jdb wrote:
This is a link to a decent ratings calculator:

http://remi.coulom.free.fr/Bayesian-Elo/

Thanks for the link, JDB.  That does look like a good calculator.

Two of the nice features aren't very relevant for us, namely handling first-player advantage and handling draws.  For Arimaa we can't even measure the first-player advantage, and there has yet to be a draw between humans.

A third feature is of critical importance.  It is a terrible approximation to take the combined winning percentage against one's "average opponent".  That approximation would only work if winning percentage were linear, for example if there were an extra 10% chance of winning for every 100 rating points, which isn't true.  If you make the assumption of linearity, you get absurd situations where, for example, A is rated 600 points higher than B and thus is expected to win 110% of the time, so winning only 100% of the time against B hurts A's rating.
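
The 110% figure can be reproduced in a few lines. The linear rule below is the one from the example (an extra 10% per 100 rating points). The logistic curve on a 400-point scale is the conventional Elo formula and is included only for contrast; whether it matches the exact curve used by any system in this thread is an assumption.

```python
def linear_expectation(rating_diff):
    """Expected score if each 100 points added a flat 10%."""
    return 0.5 + 0.10 * rating_diff / 100


def logistic_expectation(rating_diff):
    """Expected score on the conventional 400-point Elo curve."""
    return 1 / (1 + 10 ** (-rating_diff / 400))


for diff in (0, 100, 300, 600):
    print(f"{diff:4d}  linear {linear_expectation(diff):.3f}"
          f"  logistic {logistic_expectation(diff):.3f}")
```

At a 600-point gap the linear rule asks for more than a perfect score, so even winning every game falls short of expectation. The logistic curve stays below 1 at any gap, so a win can never count against the winner.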

I'm afraid I don't understand the fourth feature, namely the prior distribution.  What does he mean that the prior distribution will be chosen to be uniform?  Uniform over what interval?  I don't think the prior can be uniform over the whole real line.  I'm embarrassed that I don't understand the math.  I'll think about it some more, and if I figure it out I'll post again.

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by Fritzlein on Nov 12th, 2007, 5:36pm
Ah, I thought I had heard the name Rémi Coulom somewhere before.  He wrote the Go program Crazy Stone.  JDB, did your knowledge of his Go efforts lead you to his ratings calculator?

Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results
Post by jdb on Nov 13th, 2007, 1:21pm
I can't really remember how I came across his website; it was a longish time ago.

An internet search for "bradley terry model" will yield a wealth of info on generalizations for the a/(a+b) ratings model.


