Author |
Topic: Arimaa Top 25 COMPUTER Power Ranking Results (Read 4039 times) |
|
The_Jeh
Forum Guru
Arimaa player #634
Posts: 460
|
|
Arimaa Top 25 COMPUTER Power Ranking Results
« on: Nov 3rd, 2007, 12:10am » |
With the help of Fritzlein, I was able to get data for all rated Human v. Human games (played through noon Oct 28, 2007), and I put the results through a least-squares regression model to obtain the following rankings:

All Players:
1. xabiron 1-0 2.26468
2. dethwing 2-0 1.76469
3. spela 2-0 1.36315
4. Sameer 1-2 1.26475
5. omarFast 1-0 1.13060
6. GordonBlack 1-0 1.11907
7. acroninj 2-0 1.0
7. archigavr 1-0 1.0
7. Virgeist 1-0 1.0
7. Yaron 1-0 1.0
7. pikachamp 1-0 1.0
7. BLooodyANgel 1-0 1.0
7. marcgb 1-0 1.0
7. emeryaj 1-0 1.0
7. glitch 1-0 1.0
7. Gesuma 2-0 1.0
7. brad 1-0 1.0
7. i_am_you 1-0 1.0
7. ZeroOne 1-0 1.0
7. Yzaxtol 1-0 1.0
7. mightybyte 2-0 1.0
7. Asturianuco 3-0 1.0
7. Guest5409 3-0 1.0
7. travis 1-0 1.0
25. Fritzlein 385-51 .86405

Now, you may notice that the list above is pretty meaningless, and I have to agree. For a better ranking, I will ignore players with fewer than 15 rated HvH games played. In the list below, the first number is the rating and the second is the schedule strength. This is actually a list of all players who have played 15+ rated HvH games:

1. Fritzlein 385-51 .86405 .09799
2. 99of9 182-55 .71591 .18005
3. chessandgo 227-98 .56717 .17025
4. robinson 185-112 .55218 .30639
5. Adanac 105-80 .45143 .31630
6. PMertens 246-144 .41130 .14976
7. RonWeasley 49-24 .33907 -.00340
8. Belbo 60-67 .27918 .33430
9. omar 81-64 .26476 .14752
10. UltraWeak 10-5 .26066 -.07268
11. thorin 13-9 .22853 .04671
12. Paul 26-26 .21480 .21480
13. BlackKnight 7-11 .19097 .41319
14. Akhenaten 10-7 .17647 0.00000
15. clauchau 23-21 .17627 .13082
16. Ryan_Cable 76-78 .16039 .17337
17. petitprince 11-6 .14689 -.14723
18. mdk 15-13 .14515 .07372
19. naveed 117-129 .13060 .17938
20. kamikazeking 37-38 .11908 .13241
21. Brendan 30-46 .10497 .31550
22. OLTI 59-58 .02792 .01937
23. jdb 104-112 .02441 .06145
24. blue22 51-66 -0.00199 .12622
25. arimaa_master 170-83 -.04366 -.38753
26. Swynndla 72-38 -.08838 -.39747
27. nbarriga 14-9 -.16625 -.38364
28. appalachia 7-10 -.17647 0.00000
29. kerdamdam 18-18 -.20366 -.20366
30. KT2006 10-10 -.20886 -.20886
31. Soter 23-7 -.20965 -.74298
32. megamau 21-41 -.23047 .09211
33. camelback 11-8 -.23434 -.39224
34. mistre 22-16 -.26178 -.41967
35. Tanker_JD 15-11 -.29379 -.44763
36. The_Jeh 13-48 -.33582 .23795
37. Arimanator 23-47 -.36339 -.02054
38. IdahoEv 31-42 -.38449 -.23381
39. purplebaron 7-11 -.42379 -.20156
40. woh 21-37 -.42449 -.14863
41. Mr. Brain 6-18 -.43030 .06970
42. H_Bobbeltoff 8-35 -.45670 .17121
43. rick 8-17 -.46878 -.10878
44. Chegorimaa 10-34 -.46912 .07633
45. seanick 62-159 -.52921 -.09030
46. frostlad 8-22 -.53164 -.06497
47. friztforpresident 6-17 -.53320 -.05494
48. Tore 10-17 -.54907 -.28981
49. grey_0x2A 3-20 -.57682 .16231
50. dtj 8-25 -.62151 -.10636
51. Keith 3-14 -.65800 -.01094
52. NIC1138 25-97 -.70551 -.11535
53. aaaa 3-19 -.82908 -.10181
54. Gregorius 0-17 -.83280 .16720
55. Slowstorm 7-20 -.84630 -.36482
56. Erezap 3-18 -.89784 -.18356
57. mentalsurge 4-13 -1.01733 -.48792
58. proselyte 7-19 -1.07071 -.60917
59. Calumet45 3-27 -1.10712 -.30712
60. Kruschak 0-17 -1.14554 -.14554

Here are the top 10 players in schedule strength among the 15+ game group:

1. BlackKnight .41319
2. Belbo .33430
3. Adanac .31630
4. Brendan .31550
5. robinson .30639
6. The_Jeh .23795
7. Paul .21480
8. 99of9 .18005
9. naveed .17938
10. Ryan_Cable .17337

Extra notes: 376 distinct users have participated in a rated HvH game, and there have been 3068 such games as of noon October 28, 2007.
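A quick consistency check on the 15+ game list: in the rows tested here, the rating appears to equal the schedule strength plus the net win fraction, (W - L) / (W + L). That relation is inferred from the published numbers and is an assumption, not a description of the program that produced them. A minimal Python sketch, using a few rows copied from the list (the function name `implied_rating` is made up for illustration):

```python
# Rows copied from the 15+ game list:
# (name, wins, losses, listed rating, listed schedule strength)
rows = [
    ("Fritzlein", 385, 51, 0.86405, 0.09799),
    ("99of9", 182, 55, 0.71591, 0.18005),
    ("Akhenaten", 10, 7, 0.17647, 0.00000),
    ("Soter", 23, 7, -0.20965, -0.74298),
    ("Kruschak", 0, 17, -1.14554, -0.14554),
]


def implied_rating(wins, losses, schedule):
    """Schedule strength plus net win fraction (assumed relation, inferred from the table)."""
    return schedule + (wins - losses) / (wins + losses)


for name, wins, losses, rating, schedule in rows:
    implied = implied_rating(wins, losses, schedule)
    print(f"{name:10s} implied={implied:+.5f} listed={rating:+.5f}")
```

If the relation holds, a player with an even record should have a rating equal to his schedule strength, which is what the Paul and kerdamdam rows show.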
|
« Last Edit: Nov 3rd, 2007, 2:25am by The_Jeh » |
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: Arimaa Top 25 COMPUTER Power Ranking Results
« Reply #1 on: Nov 3rd, 2007, 12:23am » |
Very interesting. I think this ranking is reasonable if all games in history count equally. However, if you weight more recent games more heavily, and allow for the strength of players to have fluctuated over time, chessandgo would rise and robinson would fall, I expect. Also, is there some elegant way to weight ratings toward the middle, so that 1-0 players don't rate so high? I'm curious to see how this compares to the p8 ratings, since those ratings do have a ballast to hold inexperienced players near 1500, as well as a decay factor to weight older results less heavily. Still, it's good to know that, taking all games at once, I'm the #1 player of all time. (Well, #25, at least... )
|
|
Janzert
Forum Guru
Arimaa player #247
Posts: 1016
|
|
Re: Arimaa Top 25 COMPUTER Power Ranking Results
« Reply #2 on: Nov 3rd, 2007, 12:39am » |
Would it be possible to calculate a confidence interval then rank by the conservative rating, i.e. the rating minus the confidence interval? Interesting result as it stands though. Janzert
|
|
omar
Forum Guru
Arimaa player #2
Posts: 1003
|
|
Re: Arimaa Top 25 COMPUTER Power Ranking Results
« Reply #3 on: Nov 4th, 2007, 1:41am » |
Nice job on producing this list, Jeh. The rankings here seem to match very closely with our intuitive feel for players' strengths. I could probably use this for ordering the players in the Swiss preliminary, if we can't produce a better list before January. If you want to take a crack at generating P8 ratings you can find the code for it here: http://arimaa.com/arimaa/rating/testRatings.tgz
|
|
The_Jeh
Forum Guru
Arimaa player #634
Posts: 460
|
|
Re: Arimaa Top 25 COMPUTER Power Ranking Results
« Reply #4 on: Nov 4th, 2007, 11:57pm » |
I thought it might be interesting to do the calculations again using only games played after the conclusion of the last WC, which would mark the start of a new season. Thanks again to Fritzlein. Here are the results of that calculation:

1. clauchau 1-0
2. ntroncos 1-0
3. chessandgo 66-7
4. Rabbit 2-0
5. Yzaxtol 2-0
6. ZeroOne 1-0
7. Virgeist 1-0
8. PatoGuy 1-0
9. Fritzlein 66-6
10. challenger 2-0
11. Raymond 2-0
12. RonWeasley 23-6
13. 99of9 11-4
14. Brendan 12-7
15. knarl 1-0
16. PMertens 15-8
17. smonroy 1-0
18. jdb 12-6
19. UltraWeak 2-1
20. blue22 17-10
21. OLTI 5-5
22. robinson 2-2
23. omar 6-3
24. arimaa_master 90-28
25. petitprince 11-6

The problem inevitably encountered with these calculations is isolated pools of players who play each other but do not play anyone outside of their circle. For those players who are connected by games, the results are reasonable enough relative to each other. Just to make things look better, I'll cut out players who've played fewer than five games:

1. chessandgo 66-7 1.108 .300
2. Fritzlein 66-6 .903 .069
3. RonWeasley 23-6 .732 .146
4. 99of9 11-4 .689 .222
5. Brendan 12-7 .667 .404
6. PMertens 15-8 .582 .278
7. jdb 12-6 .476 .143
8. blue22 17-10 .437 .178
9. OLTI 5-5 .403 .403
10. omar 6-3 .375 .041
11. arimaa_master 90-28 .367 -.156
12. petitprince 11-6 .357 .063
13. mdk 15-13 .324 .252
14. Adanac 6-10 .215 .465
15. nbarriga 5-3 .177 -.073
16. Soter 25-7 .100 -.462
17. mistre 23-16 .090 -.089
18. camelback 11-7 .044 -.178
19. woh 15-17 .043 .106
20. Tanker_JD 15-11 .033 -.121
21. seanmcl 4-4 0 0
21. Asubfive 4-4 0 0
23. JacquesB 5-4 -.008 -.119
24. kerdamdam 5-5 -.061 -.061
25. megamau 2-1 -.113 -.446
26. IdahoEv 16-17 -.127 -.097
27. seanick 9-11 -.149 -.049
28. The_Jeh 13-32 -.154 .268
29. Chegorimaa 7-17 -.170 .247
30. Erezap 3-8 -.433 .021
31. NIC1138 25-85 -.450 .096
32. K_Hayes 3-5 -.493 -.243
33. ChrisB 5-6 -.517 -.426
34. aaaa 3-19 -.547 .180
35. Slowstorm 3-11 -.595 -.023
36. naveed 1-14 -.622 .244
37. Ganesha 0-5 -.623 .377
38. dougk 0-6 -.631 .369
39. nogard 3-6 -.656 -.323
40. BBcardsRI 0-5 -.715 .285
41. gunananda 1-4 -.738 -.138
42. Kruschak 0-17 -.759 .241
43. proselyte 7-19 -.760 -.299
44. froody 6-13 -.815 -.447
45. pcpdams 1-8 -.818 -.041
46. Krasnotron 4-7 -.842 -.569
47. willwould 2-4 -.984 -.650
48. casparix 0-9 -1.076 -.076

What are your opinions of the second option compared to the first? If only people would play a variety of opponents, this would work much better.
|
« Last Edit: Nov 5th, 2007, 8:16am by The_Jeh » |
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: Arimaa Top 25 COMPUTER Power Ranking Results
« Reply #5 on: Nov 5th, 2007, 8:01am » |
For those of us who have been playing for longer than a year, I think the second list better reflects our results in the most recent year. All my "learning losses" to 99of9 and chessandgo's learning losses to me are not included, which probably gives a better indication of current playing strength. Still, there are players like mdk and mistre who have improved a great deal within the last year. I'm not sure what one can do about that, because at some point using only the most recent games makes the sample of games too small to be useful. Fortunately, pre-tournament ratings only have a limited impact, because the preliminary sorts things out better to seed the final, and in the final everyone gets two lives again. The tournament will be long enough this year that it will be unequivocally settled over the board within the tournament, rather than being too influenced by ratings generated during the rest of the year.
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: Arimaa Top 25 COMPUTER Power Ranking Results
« Reply #6 on: Nov 8th, 2007, 10:19pm » |
on Nov 4th, 2007, 1:41am, omar wrote:
I could probably use this for ordering the players in the Swiss preliminary, if we can't produce a better list before January.

I like that you are opening up the process, Omar, and that you will possibly use a ranking list from the community. However, my current preference would be for using the p8 HvH ratings rather than the list produced by The_Jeh's program, because for seeding the Swiss preliminary, we need to be able to seed everyone. The_Jeh's list is quite reasonable when we cut out everyone who played too few games, but for seeding the tournament we don't have the luxury of omitting players. John, do you think you could tweak your algorithm so that players with few games also have a reasonable rating? One idea would be to add in an anchor player, and fake results that everyone has one win and one loss against the anchor player. That will bias everyone towards the mean and (perhaps) produce reasonable seeds for inexperienced players.
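The anchor-player idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the update rule (rating = mean opponent rating plus net win fraction) is inferred from the published tables rather than taken from The_Jeh's program, the anchor is pinned at rating 0, and the function name `anchored_ratings` and the sample game are hypothetical.

```python
from collections import defaultdict


def anchored_ratings(games, anchor_games=2, iterations=200):
    """Iteratively rate players from a list of (winner, loser) pairs.

    Each player also gets `anchor_games` fictitious games (half wins,
    half losses) against an anchor fixed at rating 0. Those games add
    nothing to the opponent-rating sum or to wins minus losses; they
    only enlarge the game count, which pulls the rating toward 0.
    """
    opponents = defaultdict(list)
    net = defaultdict(int)  # wins minus losses in real games
    for winner, loser in games:
        opponents[winner].append(loser)
        opponents[loser].append(winner)
        net[winner] += 1
        net[loser] -= 1

    ratings = {player: 0.0 for player in opponents}
    for _ in range(iterations):
        # Simultaneous update of every player from the previous ratings.
        ratings = {
            player: (sum(ratings[o] for o in opps) + net[player])
            / (len(opps) + anchor_games)
            for player, opps in opponents.items()
        }
    return ratings


# Hypothetical pool: A beat B once, and neither has played anyone else.
demo = anchored_ratings([("A", "B")])
print(demo)
```

In this two-player example the ratings settle at about +0.25 and -0.25. Because the anchor games are a fixed count, they dilute a one-game record far more than a record of several hundred games.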
|
|
The_Jeh
Forum Guru
Arimaa player #634
Posts: 460
|
|
Re: Arimaa Top 25 COMPUTER Power Ranking Results
« Reply #7 on: Nov 8th, 2007, 11:32pm » |
Your idea of adding an anchor player might work. It would bring everyone toward the mean, but it would affect players with fewer games more than those with many games. It might punish good players with few games more than we'd like. I'll have to see the results to know for sure. One thing that I know it would help is connecting players into one pool. For example, in a pool of two players who've played one game, there are an infinite number of solutions. If A defeats B, as long as -A=B, any ratings would solve the system. So in my previous posts, players who are 1-0 and have a rating of 1 would have had a rating of 0 had I done an odd number of iterations. In the case of everyone else, the ratings do converge to a single solution that minimizes the squared error. I think with an anchor, everyone will be connected to the big pool that has one solution, so everyone's rating will converge.
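The oscillation described above for an isolated two-player pool can be reproduced with a small Python sketch. It assumes a simultaneous update in which each rating becomes the mean opponent rating plus the net win fraction; that rule is inferred from the published tables, not confirmed as the actual program, and the helper name `jacobi_step` is made up for illustration.

```python
def jacobi_step(ratings, opponents, net):
    """One simultaneous update: mean opponent rating plus net win fraction."""
    return {
        player: (sum(ratings[o] for o in opps) + net[player]) / len(opps)
        for player, opps in opponents.items()
    }


# Hypothetical isolated pool: A beat B once, and neither played anyone else.
opponents = {"A": ["B"], "B": ["A"]}
net = {"A": 1, "B": -1}  # wins minus losses

ratings = {"A": 0.0, "B": 0.0}
history = []
for step in range(1, 5):
    ratings = jacobi_step(ratings, opponents, net)
    history.append((step, ratings["A"], ratings["B"]))
    print(step, ratings)
```

Starting from zero, the pair flips between (1, -1) and (0, 0) on alternate steps and never settles. Which parity lands on which value depends on the starting ratings and on how iterations are counted.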
|
« Last Edit: Nov 9th, 2007, 12:40am by The_Jeh » |
The_Jeh
Forum Guru
Arimaa player #634
Posts: 460
|
|
Re: Arimaa Top 25 COMPUTER Power Ranking Results
« Reply #8 on: Nov 9th, 2007, 10:57am » |
With the anchor player added, the rankings are as follows. (And I've consolidated Arimanator's accounts.) The W-L are given without the anchor games. This still uses the data from last time:

1. Chessandgo 66-7
2. Fritzlein 66-6
3. RonWeasley 23-6
4. Brendan 12-7
5. 99of9 11-4
6. PMertens 15-8
7. clauchau 1-0
8. Arimanator 8-2
9. jdb 12-6
10. arimaa_master 90-28
11. blue22 17-10
12. petitprince 11-6
13. mdk 15-13
14. omar 6-3
15. OLTI 5-5
16. Soter 25-7
17. Rabbit 2-0
18. UltraWeak 2-1
19. mistre 23-16
20. ntroncos 1-0
21. Tau 3-0
22. Robinson 2-2
23. Adanac 6-10
24. woh 15-17
25. camelback 11-7

I'm not sure I like this yet, either.
|
« Last Edit: Nov 9th, 2007, 11:09am by The_Jeh » |
The_Jeh
Forum Guru
Arimaa player #634
Posts: 460
|
|
Re: Arimaa Top 25 COMPUTER Power Ranking Results
« Reply #9 on: Nov 9th, 2007, 12:00pm » |
I guess the problem is always that assumptions have to be made. You assume players who are 1-0 or 2-0 on the list are weaker than what this rating says because you have access to knowledge the computer doesn't. You know they might have gotten lucky or might have lost other games not considered here. I, however, cannot maintain absolute objectivity by adding presumptions into the formula. And adding these presumptions always helps some things while hurting others. I've tried several different schemes of adding fictitious games, such as the Anchor player, and also Genius/Idiot players who always win or always lose, but the results are always better in some respects and worse in others. The only way for me to achieve greater accuracy is to add more true games. So that's what I'm going to do. Fritzlein, if you would be so kind, please e-mail me the spreadsheet of all rated games (HH, HB, and BB) played within the last 12 months, and a second list with only the last 6 months. I really won't know if it's feasible to calculate all that until I try. Actually, I'm thinking it's possible. If it can be done, the results should be perfectly acceptable. If there still are players you think should be lower or higher, you will have no evidence to point to that the computer won't have considered. I am not necessarily saying that this should replace p8, though.
|
« Last Edit: Nov 9th, 2007, 12:55pm by The_Jeh » |
Janzert
Forum Guru
Arimaa player #247
Posts: 1016
|
|
Re: Arimaa Top 25 COMPUTER Power Ranking Results
« Reply #10 on: Nov 9th, 2007, 2:17pm » |
Let's say you have player A beating player C in 100 games and losing 50 games. At the same time, player B beats player C in 2 games and loses 1 game. On the one hand, you can say that from the data available it appears that players A and B are both twice as good as C. On the other hand, you should also be able to say that you are much more confident player A is twice as good as C than you are that player B is. Janzert
|
« Last Edit: Nov 9th, 2007, 2:18pm by Janzert » |
The_Jeh
Forum Guru
Arimaa player #634
Posts: 460
|
|
Re: Arimaa Top 25 COMPUTER Power Ranking Results
« Reply #11 on: Nov 9th, 2007, 3:18pm » |
Yes, but I cannot translate lack of confidence into a lower rating.
|
« Last Edit: Nov 9th, 2007, 3:20pm by The_Jeh » |
Janzert
Forum Guru
Arimaa player #247
Posts: 1016
|
|
Re: Arimaa Top 25 COMPUTER Power Ranking Results
« Reply #12 on: Nov 9th, 2007, 4:36pm » |
The way I've seen it done is to subtract the confidence interval from the apparent rating. Basically this means the resulting rating is saying we believe this player's true rating to be at least this good, at whatever confidence level the interval uses. Janzert
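One way to sketch the conservative-rating idea, using the 100-50 and 2-1 records from the earlier example, is to compute a lower confidence bound on each player's win fraction. The Wilson score interval used here is a standard technique swapped in for illustration; nobody in this thread proposed it, and it works on the raw win fraction rather than on the least-squares ratings. The function name and the choice of z = 1.96 (roughly 95%) are assumptions.

```python
import math


def wilson_lower_bound(wins, games, z=1.96):
    """Lower end of the Wilson score interval for a win fraction."""
    if games == 0:
        return 0.0
    p = wins / games
    denom = 1 + z * z / games
    centre = p + z * z / (2 * games)
    margin = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games))
    return (centre - margin) / denom


# Both players won two thirds of their games against C.
bound_a = wilson_lower_bound(100, 150)  # player A: 100 wins, 50 losses
bound_b = wilson_lower_bound(2, 3)      # player B: 2 wins, 1 loss
print(f"A: {bound_a:.3f}  B: {bound_b:.3f}")
```

With these inputs the bound comes out at about 0.59 for A and about 0.21 for B, so the two identical win fractions are separated purely by sample size.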
|
« Last Edit: Nov 9th, 2007, 4:39pm by Janzert » |
The_Jeh
Forum Guru
Arimaa player #634
Posts: 460
|
|
Re: Arimaa Top 25 COMPUTER Power Ranking Results
« Reply #13 on: Nov 9th, 2007, 5:03pm » |
I see. You want each player to be given a performance rating on each game played, a standard deviation calculated from these games, and then a t-model used to determine the confidence interval of their true rating? I admit, it's getting a bit complicated for me. Right now, I am anxious to see the results from all the rated games of the past year, including bots. Anyone considering entering the WC, though he finds it hard to find humans to play, likely plays the bots several times. I know a reason why HB games aren't used for p8's for the WC - because playing a thousand games against weak bots will inflate one's rating. I know p8 attempts to correct this, but it does so imperfectly. That is a nonissue with this system. But we'll get sufficient quantity with bot games also considered, and we can benefit from being able to include the type of game most often played on the server. Sorry if I keep asking for more, Fritzlein, but I think I'm nearing the max of what I could ask.
|
« Last Edit: Nov 9th, 2007, 5:28pm by The_Jeh » |
mistre
Forum Guru
Posts: 553
|
|
Re: Arimaa Top 25 COMPUTER Power Ranking Results
« Reply #14 on: Nov 9th, 2007, 5:51pm » |
I am continuing to watch this topic with interest. Thanks for all of your research, John!
|
|