|
||||
Title: Arimaa Top 25 COMPUTER Power Ranking Results Post by The_Jeh on Nov 3rd, 2007, 12:10am
With the help of Fritzlein, I was able to get data for all rated Human v. Human games (played through noon Oct 28, 2007), and I put the results through a least-squares regression model to obtain the following rankings:
All Players:
1. xabiron 1-0 2.26468
2. dethwing 2-0 1.76469
3. spela 2-0 1.36315
4. Sameer 1-2 1.26475
5. omarFast 1-0 1.13060
6. GordonBlack 1-0 1.11907
7. acroninj 2-0 1.0
7. archigavr 1-0 1.0
7. Virgeist 1-0 1.0
7. Yaron 1-0 1.0
7. pikachamp 1-0 1.0
7. BLooodyANgel 1-0 1.0
7. marcgb 1-0 1.0
7. emeryaj 1-0 1.0
7. glitch 1-0 1.0
7. Gesuma 2-0 1.0
7. brad 1-0 1.0
7. i_am_you 1-0 1.0
7. ZeroOne 1-0 1.0
7. Yzaxtol 1-0 1.0
7. mightybyte 2-0 1.0
7. Asturianuco 3-0 1.0
7. Guest5409 3-0 1.0
7. travis 1-0 1.0
25. Fritzlein 385-51 .86405
Now, you may notice that the list above is pretty meaningless, and I have to agree. For a better ranking, I will ignore players with fewer than 15 rated HvH games played. This is a list of all players who have played 15+ rated HvH games; the first number after the W-L record is the rating, the second is the schedule strength:
1. Fritzlein 385-51 .86405 .09799
2. 99of9 182-55 .71591 .18005
3. chessandgo 227-98 .56717 .17025
4. robinson 185-112 .55218 .30639
5. Adanac 105-80 .45143 .31630
6. PMertens 246-144 .41130 .14976
7. RonWeasley 49-24 .33907 -.00340
8. Belbo 60-67 .27918 .33430
9. omar 81-64 .26476 .14752
10. UltraWeak 10-5 .26066 -.07268
11. thorin 13-9 .22853 .04671
12. Paul 26-26 .21480 .21480
13. BlackKnight 7-11 .19097 .41319
14. Akhenaten 10-7 .17647 0.00000
15. clauchau 23-21 .17627 .13082
16. Ryan_Cable 76-78 .16039 .17337
17. petitprince 11-6 .14689 -.14723
18. mdk 15-13 .14515 .07372
19. naveed 117-129 .13060 .17938
20. kamikazeking 37-38 .11908 .13241
21. Brendan 30-46 .10497 .31550
22. OLTI 59-58 .02792 .01937
23. jdb 104-112 .02441 .06145
24. blue22 51-66 -0.00199 .12622
25. arimaa_master 170-83 -.04366 -.38753
26. Swynndla 72-38 -.08838 -.39747
27. nbarriga 14-9 -.16625 -.38364
28. appalachia 7-10 -.17647 0.00000
29. kerdamdam 18-18 -.20366 -.20366
30. KT2006 10-10 -.20886 .20886
31. Soter 23-7 -.20965 -.74298
32. megamau 21-41 -.23047 .09211
33. camelback 11-8 -.23434 -.39224
34. mistre 22-16 -.26178 -.41967
35. Tanker_JD 15-11 -.29379 -.44763
36. The_Jeh 13-48 -.33582 .23795
37. Arimanator 23-47 -.36339 -.02054
38. IdahoEv 31-42 -.38449 -.23381
39. purplebaron 7-11 -.42379 -.20156
40. woh 21-37 -.42449 -.14863
41. Mr. Brain 6-18 -.43030 .06970
42. H_Bobbeltoff 8-35 -.45670 .17121
43. rick 8-17 -.46878 -.10878
44. Chegorimaa 10-34 -.46912 .07633
45. seanick 62-159 -.52921 -.09030
46. frostlad 8-22 -.53164 -.06497
47. friztforpresident 6-17 -.53320 -.05494
48. Tore 10-17 -.54907 -.28981
49. grey_0x2A 3-20 -.57682 .16231
50. dtj 8-25 -.62151 -.10636
51. Keith 3-14 -.65800 -.01094
52. NIC1138 25-97 -.70551 -.11535
53. aaaa 3-19 -.82908 -.10181
54. Gregorius 0-17 -.83280 .16720
55. Slowstorm 7-20 -.84630 -.36482
56. Erezap 3-18 -.89784 -.18356
57. mentalsurge 4-13 -1.01733 -.48792
58. proselyte 7-19 -1.07071 -.60917
59. Calumet45 3-27 -1.10712 -.30712
60. Kruschak 0-17 -1.14554 -.14554
Here are the top 10 players in schedule strength among the 15+ game group:
1. BlackKnight .41319
2. Belbo .33430
3. Adanac .31630
4. Brendan .31550
5. robinson .30639
6. The_Jeh .23795
7. Paul .21480
8. 99of9 .18005
9. naveed .17938
10. Ryan_Cable .17121
Extra notes: 376 distinct users have participated in a rated HvH game, and there have been 3068 such games as of noon October 28, 2007.
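The_Jeh's program isn't posted, but the numbers above behave like the simple least-squares system in which a player's rating is their average game result (+1 for a win, -1 for a loss) plus the average rating of their opponents, with "schedule strength" being that average-opponent term; note, for instance, that Paul and kerdamdam, whose records are even, have ratings exactly equal to their schedule strengths. Below is a minimal Python sketch under that assumption; the function name, parameters, and sample games are illustrative, not taken from the thread.

```python
from collections import defaultdict

def least_squares_ratings(games, iterations=2000):
    """Rating = average game result + average opponent rating, iterated.

    `games` is a list of (winner, loser) pairs; each game counts +1 for the
    winner and -1 for the loser.  The fixed point of this iteration is the
    least-squares solution, and the "schedule strength" reported above would
    be the average-opponent-rating term.
    """
    opponents = defaultdict(list)   # player -> opponents faced (with multiplicity)
    margin = defaultdict(float)     # player -> wins minus losses
    for winner, loser in games:
        opponents[winner].append(loser)
        opponents[loser].append(winner)
        margin[winner] += 1.0
        margin[loser] -= 1.0

    ratings = {p: 0.0 for p in opponents}
    for _ in range(iterations):
        ratings = {
            p: margin[p] / len(opps) + sum(ratings[o] for o in opps) / len(opps)
            for p, opps in opponents.items()
        }
    return ratings

# Tiny made-up example: two wins for A, one for B.
print(least_squares_ratings([("A", "B"), ("A", "C"), ("B", "C")]))
```

As The_Jeh points out later in the thread, isolated pools of players (say, two players who have only ever played each other) leave this system underdetermined, which is what the anchor-player idea further down is meant to fix.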
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by Fritzlein on Nov 3rd, 2007, 12:23am Very interesting. I think this ranking is reasonable if all games in history count equally. However, if you weight more recent games more heavily, and allow for the strength of players to have fluctuated over time, chessandgo would rise and robinson would fall, I expect. Also, is there some elegant way to weight ratings toward the middle, so that 1-0 players don't rate so high? I'm curious to see how this compares to the p8 ratings, since those ratings do have a ballast to hold inexperienced players near 1500, as well as a decay factor to weight older results less heavily. Still, it's good to know that, taking all games at once, I'm the #1 player of all time. (Well, #25, at least... :P) |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by Janzert on Nov 3rd, 2007, 12:39am Would it be possible to calculate a confidence interval then rank by the conservative rating, i.e. the rating minus the confidence interval? Interesting result as it stands though. Janzert |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by omar on Nov 4th, 2007, 1:41am Nice job on producing this list, Jeh. The rankings here seem to match very closely with our intuitive feel for players' strengths. I could probably use this for ordering the players in the Swiss preliminary, if we can't produce a better list before January. If you want to take a crack at generating P8 ratings you can find the code for it here: http://arimaa.com/arimaa/rating/testRatings.tgz
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by The_Jeh on Nov 4th, 2007, 11:57pm
I thought it might be interesting to do the calculations again using only games played after the conclusion of the last WC, which would mark the start of a new season. Thanks again to Fritzlein. Here are the results of that calculation:
1. clauchau 1-0
2. ntroncos 1-0
3. chessandgo 66-7
4. Rabbit 2-0
5. Yzaxtol 2-0
6. ZeroOne 1-0
7. Virgeist 1-0
8. PatoGuy 1-0
9. Fritzlein 66-6
10. challenger 2-0
11. Raymond 2-0
12. RonWeasley 23-6
13. 99of9 11-4
14. Brendan 12-7
15. knarl 1-0
16. PMertens 15-8
17. smonroy 1-0
18. jdb 12-6
19. UltraWeak 2-1
20. blue22 17-10
21. OLTI 5-5
22. robinson 2-2
23. omar 6-3
24. arimaa_master 90-28
25. petitprince 11-6
The problem inevitably encountered with these calculations is isolated pools of players who play each other but do not play anyone outside of their circle. For those players who are connected by games, the results are reasonable enough relative to each other. Just to make things look better, I'll cut out players who've played fewer than five games:
1. chessandgo 66-7 1.108 .300
2. Fritzlein 66-6 .903 .069
3. RonWeasley 23-6 .732 .146
4. 99of9 11-4 .689 .222
5. Brendan 12-7 .667 .404
6. PMertens 15-8 .582 .278
7. jdb 12-6 .476 .143
8. blue22 17-10 .437 .178
9. OLTI 5-5 .403 .403
10. omar 6-3 .375 .041
11. arimaa_master 90-28 .367 -.156
12. petitprince 11-6 .357 .063
13. mdk 15-13 .324 .252
14. Adanac 6-10 .215 .465
15. nbarriga 5-3 .177 -.073
16. Soter 25-7 .100 -.462
17. mistre 23-16 .090 -.089
18. camelback 11-7 .044 -.178
19. woh 15-17 .043 .106
20. Tanker_JD 15-11 .033 -.121
21. seanmcl 4-4 0 0
21. Asubfive 4-4 0 0
23. JacquesB 5-4 -.008 -.119
24. kerdamdam 5-5 -.061 -.061
25. megamau 2-1 -.113 -.446
26. IdahoEv 16-17 -.127 -.097
27. seanick 9-11 -.149 -.049
28. The_Jeh 13-32 -.154 .268
29. Chegorimaa 7-17 -.170 .247
30. Erezap 3-8 -.433 .021
31. NIC1138 25-85 -.450 .096
32. K_Hayes 3-5 -.493 -.243
33. ChrisB 5-6 -.517 -.426
34. aaaa 3-19 -.547 .180
35. Slowstorm 3-11 -.595 -.023
36. naveed 1-14 -.622 .244
37. Ganesha 0-5 -.623 .377
38. dougk 0-6 -.631 .369
39. nogard 3-6 -.656 -.323
40. BBcardsRI 0-5 -.715 .285
41. gunananda 1-4 -.738 -.138
42. Kruschak 0-17 -.759 .241
43. proselyte 7-19 -.760 -.299
44. froody 6-13 -.815 -.447
45. pcpdams 1-8 -.818 -.041
46. Krasnotron 4-7 -.842 -.569
47. willwould 2-4 -.984 -.650
48. casparix 0-9 -1.076 -.076
What are your opinions of the second option compared to the first? If only people would play a variety of opponents, this would work much better.
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by Fritzlein on Nov 5th, 2007, 8:01am For those of us who have been playing for longer than a year, I think the second list better reflects our results in the most recent year. All my "learning losses" to 99of9 and chessandgo's learning losses to me are not included, which probably gives a better indication of current playing strength. Still, there are players like mdk and mistre who have improved a great deal within the last year. I'm not sure what one can do about that, because at some point using only the most recent games makes the sample of games too small to be useful. Fortunately, pre-tournament ratings only have a limited impact, because the preliminary sorts things out better to seed the final, and in the final everyone gets two lives again. The tournament will be long enough this year that it will be unequivocally settled over the board within the tournament, rather than being too influenced by ratings generated during the rest of the year. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by Fritzlein on Nov 8th, 2007, 10:19pm on 11/04/07 at 01:41:13, omar wrote:
I like that you are opening up the process, Omar, and that you will possibly use a ranking list from the community. However, my current preference would be for using the p8 HvH ratings rather than the list produced by The_Jeh's program, because for seeding the Swiss preliminary, we need to be able to seed everyone. The_Jeh's list is quite reasonable when we cut out everyone who played too few games, but for seeding the tournament we don't have the luxury of omitting players. John, do you think you could tweak your algorithm so that players with few games also have a reasonable rating? One idea would be to add in an anchor player, and fake results so that everyone has one win and one loss against the anchor player. That will bias everyone towards the mean and (perhaps) produce reasonable seeds for inexperienced players. |
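Fritzlein's anchor suggestion amounts to a small preprocessing step on the game list before any ratings are fit. A sketch of that augmentation, reusing the (winner, loser) game-list representation from the earlier sketch; the anchor name is arbitrary:

```python
def add_anchor_games(games, anchor="Anchor"):
    """Give every real player one fake win and one fake loss against a
    fictitious anchor player, as Fritzlein proposes.  This connects all
    pools of players and pulls thinly-tested players toward the mean.
    """
    players = {p for game in games for p in game}
    augmented = list(games)
    for p in players:
        augmented.append((p, anchor))   # fake win over the anchor
        augmented.append((anchor, p))   # fake loss to the anchor
    return augmented
```

The earlier least-squares sketch (or any other fitting method) can then be run on the augmented game list.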
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by The_Jeh on Nov 8th, 2007, 11:32pm Your idea of adding an anchor player might work. It would bring everyone toward the mean, but it would affect players with fewer games more than those with many games. It might punish good players with few games more than we'd like. I'll have to see the results to know for sure. One thing I know it would help with is connecting players into one pool. For example, in a pool of two players who've played one game, there are an infinite number of solutions: if A defeats B, the equations only constrain the two ratings relative to each other, so any number of rating pairs would solve the system. So in my previous posts, players who are 1-0 and have a rating of 1 would have had a rating of 0 had I done an odd number of iterations. In the case of everyone else, the ratings do converge to a single solution that minimizes the squared error. I think with an anchor, everyone will be connected to the big pool that has one solution, so everyone's rating will converge. |
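The convergence problem The_Jeh describes can be seen directly in the least-squares formulation assumed in the earlier sketches: each game only constrains a difference of ratings, so an isolated pool carries its own free constant and the system has no unique solution. A small numpy illustration (with purely illustrative players and games) of how the anchor games restore a unique least-squares solution:

```python
import numpy as np

# Two isolated pools: A beats B, and C beats D.  One least-squares row per
# game, columns are (r_A, r_B, r_C, r_D), target is +1 from the winner's side.
X = np.array([[1, -1, 0, 0],
              [0, 0, 1, -1]], dtype=float)
print(np.linalg.matrix_rank(X))        # 2 of 4 unknowns: each pool has its own free level

# Add anchor games (anchor pinned at rating 0, so it needs no column):
# every player gets one row for a win over the anchor and one for a loss to it.
rows, results = [[1, -1, 0, 0], [0, 0, 1, -1]], [1.0, 1.0]
for i in range(4):
    row = [0.0] * 4
    row[i] = 1.0
    rows += [row, row]
    results += [1.0, -1.0]
Xa, ya = np.array(rows), np.array(results)
print(np.linalg.matrix_rank(Xa))       # 4: now there is a unique least-squares solution
ratings, *_ = np.linalg.lstsq(Xa, ya, rcond=None)
print(dict(zip("ABCD", ratings.round(3))))   # {'A': 0.25, 'B': -0.25, 'C': 0.25, 'D': -0.25}
```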
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by The_Jeh on Nov 9th, 2007, 10:57am
With the anchor player added, the rankings are as follows. (And I've consolidated Arimanator's accounts.) The W-L are given without the anchor games. This still uses the data from last time:
1. Chessandgo 66-7
2. Fritzlein 66-6
3. RonWeasley 23-6
4. Brendan 12-7
5. 99of9 11-4
6. PMertens 15-8
7. clauchau 1-0
8. Arimanator 8-2
9. jdb 12-6
10. arimaa_master 90-28
11. blue22 17-10
12. petitprince 11-6
13. mdk 15-13
14. omar 6-3
15. OLTI 5-5
16. Soter 25-7
17. Rabbit 2-0
18. UltraWeak 2-1
19. mistre 23-16
20. ntroncos 1-0
21. Tau 3-0
22. Robinson 2-2
23. Adanac 6-10
24. woh 15-17
25. camelback 11-7
I'm not sure I like this yet, either. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by The_Jeh on Nov 9th, 2007, 12:00pm I guess the problem is always that assumptions have to be made. You assume players who are 1-0 or 2-0 on the list are weaker than what this rating says because you have access to knowledge the computer doesn't. You know they might have gotten lucky or might have lost other games not considered here. I, however, cannot add presumptions to the formula and still maintain absolute objectivity. And adding these presumptions always helps some things while hurting others. I've tried several different schemes of adding fictitious games, such as the Anchor player, and also Genius/Idiot players who always win or always lose, but the results are always better in some respects and worse in others. The only way for me to achieve greater accuracy is to add more true games. So that's what I'm going to do. Fritzlein, if you would be so kind, please e-mail me the spreadsheet of all rated games, HvH, HvB, and BvB, played within the last 12 months, and a second list with only the last 6 months. I really won't know if it's feasible to calculate all that until I try. Actually, I'm thinking it's possible. If it can be done, the results should be perfectly acceptable. If there still are players you think should be lower or higher, you will have no evidence to point to that the computer won't have considered. I am not necessarily saying that this should replace p8, though. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by Janzert on Nov 9th, 2007, 2:17pm Let's say you have player A beating player C in 100 games and losing 50 games. At the same time player B beats player C in 2 games and loses 1 game. While on the one hand you can say that from the data available it appears that players A and B are both twice as good as C. You should also be able to say that you are much more confident player A is twice as good as C than you are that player B is. Janzert |
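Janzert's point can be made concrete by putting a binomial confidence interval around each head-to-head win probability. A quick sketch using the Wilson score interval; the choice of interval is mine, not something specified in the thread:

```python
from math import sqrt

def wilson_interval(wins, games, z=1.96):
    """Approximate 95% confidence interval for the true win probability."""
    p = wins / games
    denom = 1 + z * z / games
    centre = p + z * z / (2 * games)
    spread = z * sqrt(p * (1 - p) / games + z * z / (4 * games * games))
    return ((centre - spread) / denom, (centre + spread) / denom)

# Both A and B win two games out of three against C on average,
# but the data says very different things about how sure we can be.
print(wilson_interval(100, 150))   # A vs C: roughly (0.59, 0.74)
print(wilson_interval(2, 3))       # B vs C: roughly (0.21, 0.94)
```

Ranking by the lower end of such an interval is essentially the "conservative rating" idea Janzert describes a couple of posts below.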
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by The_Jeh on Nov 9th, 2007, 3:18pm Yes, but I cannot translate lack of confidence into a lower rating. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by Janzert on Nov 9th, 2007, 4:36pm The way I've seen it done is to subtract the confidence interval from the apparent rating. Basically this means the resulting rating says we believe this player's true rating to be at least this good, with whatever confidence the interval provides. Janzert |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by The_Jeh on Nov 9th, 2007, 5:03pm I see. You want each player to be given a performance rating on each game played, a standard deviation calculated from these games, and then a t-model used to determine the confidence interval of their true rating? I admit, it's getting a bit complicated for me. Right now, I am anxious to see the results from all the rated games of the past year, including bots. Anyone considering entering the WC, even if he finds it hard to find humans to play, likely plays the bots several times. I know one reason why HvB games aren't used in the p8 ratings for the WC: playing a thousand games against weak bots will inflate one's rating. I know p8 attempts to correct this, but it does so imperfectly. That is a nonissue with this system. But we'll get a sufficient quantity of games with bot games also considered, and we can benefit from being able to include the type of game most often played on the server. Sorry if I keep asking for more, Fritzlein, but I think I'm nearing the max of what I could ask. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by mistre on Nov 9th, 2007, 5:51pm I am continuing to watch this topic with interest. Thanks for all of your research, John! |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by The_Jeh on Nov 9th, 2007, 9:32pm I haven't done much research, honestly. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by Fritzlein on Nov 9th, 2007, 11:46pm on 11/09/07 at 12:00:51, The_Jeh wrote:
Mailed. 19402 games from 11/1/06 through 10/31/07. 10444 games from 5/1/06 through 10/31/07. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by Fritzlein on Nov 9th, 2007, 11:59pm on 11/09/07 at 12:00:51, The_Jeh wrote:
It is entirely objective to assume that an unknown player is near the mean until proven otherwise. Based on concrete data you can say (for example) only 5% of players are better than 2000 in strength. If someone plays a single game and wins, that provides some objective evidence of their skill level, but why should you say that single game carries more weight than the objective evidence that that player is probably not in the top ten? Having prior assumptions about a newcomer's skill is not unscientific if those assumptions are based on observation. Quote:
The observer changes whatever is observed. The system we choose to seed the tournament will alter people's behavior as they try to get higher seeds. The current rating system is a perfect example, as people (including me) engage in silly behavior simply because their silly behavior is rewarded with a higher rating. In my opinion HvB games are far from being "true games", and I expect the rating list you generate to reflect that. Still I'm curious to see what the results are. Trying out various things to see what works is research in my book. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by The_Jeh on Nov 10th, 2007, 12:24am You are correct as usual, Fritzlein. As far as newcomers go, since we tabulate the results using only recent games, I cannot assume that someone with few games is a newcomer. For example, Robinson only had 4 HvH games since the last WC. You are also right that the results will be manipulated. However, I don't know how you can manipulate my system (which is really just a fundamental system no one can claim as his own) to your advantage except by actually improving yourself. Playing a ton of bots won't help you if you're already known to be better than them. Playing a ton of top humans won't help you if you can't win. And if you can win, you deserve the higher rating. Besides, my system is not something people can keep track of, like the p8 ratings are. Thank you for the data. I'm working on getting a ranking, but it's a ton to plow through. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by Fritzlein on Nov 10th, 2007, 10:42am on 11/09/07 at 10:57:04, The_Jeh wrote:
I generated rankings using the same 629 games, the same assumption that all games count equally (not more recent ones more heavily), and the same methodology of having everyone win one game and lose one game to an anchor player. The difference is that I used our ratings model and maximum likelihood estimation. The reason to prefer maximum likelihood estimation is that in The_Jeh's method you can be penalized for beating a weak player or rewarded for losing to a strong player; strength of schedule can be more important than winning and losing. With maximum likelihood estimation, beating a weak player always helps your rating, and losing to a strong player always hurts your rating, albeit perhaps very slightly. The results are rather similar:
2374 chessandgo 66-7
2357 Fritzlein 66-6
2054 PMertens 15-8
2044 Brendan 12-7
2040 RonWeasley 23-6
2023 99of9 11-4
2002 clauchau 1-0
1910 arimaa_master 90-28
1910 jdb 12-6
1905 blue22 17-10
1859 mdk 15-13
1803 OLTI 5-5
1790 omar 6-3
1775 petitprince 11-6
1752 Adanac 6-10
1751 Soter 25-7
1725 Arimabuff 3-1
1724 mistre 23-16
1722 challenger 2-0
1722 Raymond 2-0
1722 Rabbit 2-0
1695 UltraWeak 2-1
1692 robinson 2-2
1683 woh 15-17
1661 ntroncos 1-0
1655 Tau 3-0
1650 Tanker_JD 15-11
1639 camelback 11-7
1631 Yzaxtol 2-0
1614 nbarriga 5-3
Then I did a version of Janzert's lower confidence idea. I asked how much it would hurt a player's rating to lose an additional game to the anchor player. We might say that if one loss would cause a player's rating to tumble, they don't deserve a high seed in the World Championship. Here are what you might call the lower confidence ratings:
2344 chessandgo 66-7
2317 Fritzlein 66-6
2000 RonWeasley 23-6
1980 Brendan 12-7
1977 PMertens 15-8
1939 99of9 11-4
1898 arimaa_master 90-28
1872 blue22 17-10
1858 jdb 12-6
1816 mdk 15-13
1735 OLTI 5-5
1728 petitprince 11-6
1724 Soter 25-7
1707 omar 6-3
1705 mistre 23-16
1703 Adanac 6-10
1658 woh 15-17
1627 Tanker_JD 15-11
1617 clauchau 1-0
1609 Arimabuff 3-1
1603 camelback 11-7
1595 challenger 2-0
1595 Raymond 2-0
1593 robinson 2-2
1590 Rabbit 2-0
1585 UltraWeak 2-1
1569 Chegorimaa 7-17
1568 IdahoEv 16-17
1567 JacquesB 5-4
1561 nbarriga 5-3 |
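Fritzlein doesn't post his code, and "our ratings model" presumably refers to the formula behind the gameroom ratings; the sketch below simply assumes the textbook logistic (Elo-style) curve with the anchor pinned at 1500, fit by maximizing the likelihood of the observed results. The per-player Newton update and every name and parameter here are assumptions for illustration, not his actual method.

```python
import math

def mle_elo(games, anchor="Anchor", base=1500.0, iterations=200):
    """Maximum-likelihood ratings under P(A beats B) = 1/(1 + 10**((R_B - R_A)/400)).

    `games` is a list of (winner, loser) pairs and should already include the
    fake anchor games; the anchor itself stays pinned at `base`.  Each pass
    applies a Newton step per player, holding the other ratings fixed.
    """
    k = math.log(10) / 400.0
    players = {p for game in games for p in game}
    r = {p: base for p in players}
    for _ in range(iterations):
        grad = {p: 0.0 for p in players}
        curv = {p: 1e-9 for p in players}          # small floor avoids division by zero
        for winner, loser in games:
            p_win = 1.0 / (1.0 + 10 ** ((r[loser] - r[winner]) / 400.0))
            grad[winner] += k * (1.0 - p_win)
            grad[loser] -= k * (1.0 - p_win)
            curv[winner] += k * k * p_win * (1.0 - p_win)
            curv[loser] += k * k * p_win * (1.0 - p_win)
        for p in players:
            if p != anchor:
                r[p] += grad[p] / curv[p]           # per-player Newton update
    return r
```

For the lower-confidence variant, one could refit after appending one extra loss to the anchor for every player, which matches the "1 win and 2 losses to the Anchor" setup The_Jeh uses a few posts later.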
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by The_Jeh on Nov 10th, 2007, 1:49pm Your ratings look pretty good. I question your statement that you counted all games equally. Yours is a sequential system. Even if you leave the RU the same, doesn't the final result differ slightly depending on the order in which you tally the games? The system I am using right now, on the other hand, does consider all games simultaneously. I'm not saying that's good or bad, but isn't that the way it is? I'm having trouble pushing the games through my system. Using the 6-month list, I was only able to get results from one iteration, which is meaningless. That is very disappointing. I can think of only one possible solution: use player ID #'s instead of player names, which would drastically reduce the length of the string and perhaps make it easy enough to handle. Could that be arranged? I am tired of asking for more; please believe me. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by jdb on Nov 10th, 2007, 2:31pm This is a link to a decent ratings calculator: http://remi.coulom.free.fr/Bayesian-Elo/ |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by The_Jeh on Nov 10th, 2007, 5:47pm I'm working on something like what's found on http://www.pro-football-reference.com/blog/wordpress/?p=171#comment-12705. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by The_Jeh on Nov 10th, 2007, 9:03pm
Now I have two systems. I'm going to put the newer one aside for the time being. I still think having a string with only player ID's would be easier to use, but I found an inefficiency in my programming technique, corrected it, and what once took 6 hours now takes less than half an hour. Wow. So never mind the request. So, here are the results of the older system using all rated games of the past 6 months, no anchors added:
1. obiwan 1-0
2. chessandgo 30-2
3. Ryan_Cable 1-0
4. Fritzlein 60-7
5. RonWeasley 13-2
6. robinson 1-0
7. blue22 92-26
8. Arimanator 532-65
9. smonroy 2-0
10. Aamir 1-0
11. PMertens 13-6
12. syed 1106-34
13. 6sense 173-131
14. OLTI 3-2
15. arimaa_master 102-27
16. petitprince 1-1
17. willwould 2-0
18. jdb 8-4
19. naveed 113-111
20. 99of9 4-2
21. UltraWeak 4-3
22. Brendan 4-3
23. mdk 182-118
24. omar 19-19
25. Adanac 9-10
I think you're right, Fritzlein. This looks kind of ugly. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by The_Jeh on Nov 10th, 2007, 9:32pm
I'm going to give an example of the newer system now, with Janzert and Fritzlein's lower confidence method, 1 win and 2 losses to the Anchor (same 629 games):
1. chessandgo 66-7 20.38245
2. Fritzlein 66-6 18.42715
3. RonWeasley 23-6 4.01976
4. Brendan 12-7 3.39503
5. PMertens 15-8 3.30043
6. 99of9 11-4 3.03360
7. arimaa_master 90-28 2.35544
8. jdb 12-6 2.08053
9. blue22 17-10 2.03331
10. Arimanator 8-2 1.84120
11. mdk 15-14 1.49077
12. clauchau 1-0 1.34560
13. Soter 25-7 1.17840
13. OLTI 5-5 1.17840
15. omar 6-3 1.13829
16. petitprince 11-6 1.13657
17. mistre 23-16 0.94523
18. Rabbit 2-0 0.89194
19. Adanac 6-10 0.87342
20. UltraWeak 2-1 0.79323
21. woh 15-17 0.74131
22. robinson 2-2 0.73276
23. ntroncos 1-0 0.69139
24. Tau 3-0 0.67645
25. Yzaxtol 2-0 0.66307
26. Tanker_JD 15-11 0.64187
27. camelback 11-7 0.61546
28. knarl 1-0 .59045
29. Virgeist 1-0 0.57366
30. nbarriga 5-3 0.56981
Under this system, the probability that player A defeats player B is given by A/(A+B). For example, my rating is 0.44583. So, the probability Fritzlein defeats me is 98% (even though we know it's 100% ;) ) |
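The A/(A+B) form is the Bradley-Terry model (the same a/(a+b) model jdb mentions later in the thread), and the standard way to fit it is the classic minorization-maximization update. A minimal sketch that again assumes the anchor games are already included in the game list; it is a generic Bradley-Terry fit, not The_Jeh's actual program, and the overall scale of the strengths is arbitrary:

```python
from collections import defaultdict

def bradley_terry(games, iterations=500):
    """Fit strengths a_i in the model P(i beats j) = a_i / (a_i + a_j)
    using the standard MM update  a_i <- wins_i / sum_j n_ij / (a_i + a_j).

    `games` is a list of (winner, loser) pairs, anchor games included so
    that every player has at least one win and one loss.
    """
    wins = defaultdict(float)
    pair_count = defaultdict(float)              # unordered pair -> games played
    players = set()
    for winner, loser in games:
        players.update((winner, loser))
        wins[winner] += 1.0
        pair_count[frozenset((winner, loser))] += 1.0

    a = {p: 1.0 for p in players}
    for _ in range(iterations):
        new = {}
        for i in players:
            denom = sum(n / (a[i] + a[j])
                        for pair, n in pair_count.items() if i in pair
                        for j in pair if j != i)
            new[i] = wins[i] / denom
        mean = sum(new.values()) / len(new)
        a = {p: v / mean for p, v in new.items()}   # the scale is arbitrary
    return a

# With strengths like those posted above, P(Fritzlein beats The_Jeh) is
# 18.42715 / (18.42715 + 0.44583), about 0.98 -- the 98% quoted in the post.
```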
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by Fritzlein on Nov 11th, 2007, 12:21am on 11/10/07 at 13:49:11, The_Jeh wrote:
The ratings I posted a while ago in another thread used a sequential system, and the p8 ratings do as well, but for the ratings I posted in this thread, the order of games was irrelevant. I generally prefer to weight later games more heavily, but didn't do so this time. It made maximum likelihood much easier to calculate. It only took me a second or two per iteration, so I could get convergent ratings in a hurry. Quote:
Sure, I can get you the data with ID's instead of player names, albeit not until tomorrow. What exactly did you want for the date range? Tonight the data that is downloadable from arimaa.com will automatically update with games through November 10. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by Fritzlein on Nov 11th, 2007, 9:39am on 11/10/07 at 21:32:00, The_Jeh wrote:
I like that formula a lot. For one thing it never sets impossible expectations such that a player can lose rating points for defeating a weak opponent. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by The_Jeh on Nov 11th, 2007, 2:19pm So, what do you think of this list compared to your recent lists, Fritzlein? Are either of them adequate? Have we reached an acceptable system? I think we have punished unproven players sufficiently, and I don't think there will be too many of them signing up, anyway. As it is, both our systems seed the current entrants the same way. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by Arimanator on Nov 11th, 2007, 3:14pm on 11/11/07 at 14:19:53, The_Jeh wrote:
For some reason that expression evoked in me an old character of "Saturday Night Live", "Unfrozen Caveman Lawyer". I know that it's not relevant to the discussion at hand; still I find it funny enough to be mentioned in passing. ;D |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by Fritzlein on Nov 11th, 2007, 11:45pm on 11/11/07 at 14:19:53, The_Jeh wrote:
I think that either your last list or my list would be clearly better than the gameroom ratings, and good enough to be acceptable for seeding the World Championship. One important point is that people who have played more games against humans are penalized less by the losses to the anchor player, so if people "play to the system", they will play against more humans, which is a good thing. Your list is better than mine by virtue of consolidating Arimanator's games; otherwise I can't see a reason to choose between them. As long as the seeding is roughly accurate it will be fine. The advantage of the #1 seed over the #2 seed is much less in Swiss pairing than in Floating Double Elimination, so we don't have to stress about it as much. That said, there are two features of p8 ratings that beat both our systems, so I would still like to use p8 ratings for the seeds if Omar has the time to run them. First, p8 ratings take the variety of opposition into consideration. It's considered more impressive to beat ten different opponents than the same opponent ten times. Second, p8 ratings weight more recent games more heavily, so we don't have to have an arbitrary date cutoff. We can run the ratings over all time and still get reasonable seeding. This would be more important for players like robinson and clauchau who have significant history that could be used to seed them but few recent games. In any event, it is good to get this conversation started. Thanks for your efforts, John. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by The_Jeh on Nov 12th, 2007, 11:43am That's fine. Computer rankings are something that I've been wanting to understand for a long time, and I'm glad I've had the occasion to figure them out. p8 is Omar's invention, I take it? I am impressed with how sophisticated it is. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by Fritzlein on Nov 12th, 2007, 12:40pm Omar and I developed the p8 ratings together. At the time I thought they would produce reasonable ratings even with HvB games included, but I have since changed my mind. Even a sophisticated system can't fix what are essentially social issues. There are two things that kill the rating system: the inability of bots to learn, and the freedom of players to select their own opponents. If we had neither of these problems, i.e. only HvH games, and opponents assigned rather than chosen, then even the current game room rating system would be very accurate. Conversely, since we actually do have both of these problems, it doesn't matter how clever the rating system is, because it won't be accurate anyway. It is no coincidence that on ICC (the Internet Chess Club) folks judge that the only "real" ratings are the ones where you sign up to play a 5-minute game, without knowing your opponent in advance, without being able to escape playing after you are paired, and with no bots involved. ICC took an administrative route to eliminating the two major sources of error, and voila, the ratings distortions disappeared. |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by Fritzlein on Nov 12th, 2007, 5:08pm on 11/10/07 at 14:31:22, jdb wrote:
Thanks for the link, JDB. That does look like a good calculator. Two of the nice features aren't very relevant for us, namely handling first-player advantage and handling draws. For Arimaa we can't even measure the first-player advantage, and there has yet to be a draw between humans. A third feature is of critical importance. It is a terrible approximation to take the combined winning percentage against one's "average opponent". That approximation would only work if winning percentage were linear in the rating difference, for example if there were an extra 10% chance of winning for every 100 rating points, which isn't true. If you make the assumption of linearity, you get absurd situations where, for example, A is rated 600 points higher than B and thus is expected to win 110% of the time, so winning only 100% of the time against B hurts A's rating. I'm afraid I don't understand the fourth feature, namely the prior distribution. What does he mean by saying the prior distribution will be chosen to be uniform? Uniform over what interval? I don't think the prior can be uniform over the whole real line. I'm embarrassed that I don't understand the math. I'll think about it some more, and if I figure it out I'll post again. |
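For reference, the "110%" absurdity comes from treating the expected score as linear in the rating difference; under the logistic curve that Elo-style models use, the expectation stays below 1 no matter how large the gap. A two-line illustration of the 600-point example:

```python
# Expected score at a 600-point rating gap.
logistic = 1.0 / (1.0 + 10 ** (-600 / 400))   # about 0.97 under the logistic model
linear = 0.5 + 0.10 * (600 / 100)             # 1.10, the "110%" from the naive linear rule
print(logistic, linear)
```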
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by Fritzlein on Nov 12th, 2007, 5:36pm Ah, I thought I had heard the name Rémi Coulom somewhere before. He wrote the Go program Crazy Stone. JDB, did your knowledge of his Go efforts lead you to his ratings calculator? |
||||
Title: Re: Arimaa Top 25 COMPUTER Power Ranking Results Post by jdb on Nov 13th, 2007, 1:21pm I can't really remember how I came across his website; it was a longish time ago. An internet search for "bradley terry model" will yield a wealth of info on generalizations of the a/(a+b) ratings model. |
||||
Arimaa Forum » Powered by YaBB 1 Gold - SP 1.3.1! YaBB © 2000-2003. All Rights Reserved. |