||||||||
Title: Improving Arimaa Rates Post by Tachyon on Nov 5th, 2008, 5:39pm There have been numerous threads where the topic of Arimaa game ratings has come up, and it seems that most agree there is room for improvement. A while back Fritzlein suggested I start a thread to raise suggestions as to how this could be done. I thought that a good way to kick this off is to get some consensus about what the problems are and what causes them before looking at possible solutions. So far it seems to me that the main issues are:
1) Standardise the rating method. Not a huge problem, but I think there should be consensus as to which system is used for ratings and why (e.g. P8 or not), and we should stick with it. Having different rating values for the same individual is not very helpful and dilutes the sense of value attached to the rating.
2) Rating manipulation. Here I think we need to identify the ways in which ratings can be distorted and which of them have the most significant impact. So far I am thinking these are:
   1) Bot bashing
   2) Game platform connectivity issues
   3) Player collusion in HvH games
   4) Strong players taking advantage of weak or new players in HvH games
Any constructive input will be appreciated.
||||||||
Title: Re: Improving Arimaa Rates Post by omar on Nov 14th, 2008, 10:45am Thanks for starting this, Tachyon. My current thoughts on the Arimaa rating system, and on rating systems in general, are that a rating system really has two parts: the first is the mathematical model and the second is the game filter. Most rating systems, including the current Arimaa rating system, are based on the Elo model. This model works pretty well and I don't think there is much to be gained by tweaking it. The game filter is the part of the rating system that determines which games are used as input to the first part, and I think this is where there is room for improvement. Take for example the USCF or FIDE rating systems; they are more accurate not because the mathematical model is so different, but because the games used are highly filtered. You can't just take any game you played and apply it towards your USCF rating; it has to be supervised and played under strict conditions, then submitted through the appropriate channels, and finally approved by the USCF to be included in the ratings. It can take a while before a game you just played has an effect on your rating, but this is compensated by having more accurate ratings. In the Arimaa gameroom, players get to pick whether the game will be rated, who the opponent will be, what time control will be used and when the game is played, and even afterwards they can go back and unrate the game under some conditions. They may even be playing the game just for bot bashing or to experiment with a particular setup. The games going into the rating model are pretty much unfiltered. The good part is that players get the instant gratification of seeing their ratings move right after the game, and we build up a larger pool of games from which ratings can be computed. So to get really accurate ratings, I think the rated games we have need to be filtered.
Automatically deciding which games should be used and which should not is a bit of an AI problem, not that different from, say, the problem of approving loan applications. And we know how well banks are doing at that :-) More and more I am leaning towards leaving the gameroom rating system as it is (so that players have the freedom to choose which games are rated and get the instant gratification of seeing their ratings change) and filtering the rated games through another system to get more accurate ratings for different needs. Perhaps we should have a contest to see who can develop the best filtering system. All your system has to do is go through the games archive and pick out which games should be used as rated games. The system is free to look at player histories, the actual moves of the game and everything else that is available about the game when selecting the rated games. Those games would then be put through the same mathematical model as is currently used in the gameroom, to see which filter produces the most accurate ratings. But before we could judge the filtering systems, we would have to answer the question of what we consider to be accurate ratings :-)
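A minimal sketch of the two-part split described above, written in Python rather than the Perl used later in the thread: a filter decides game by game whether a game counts, and the surviving games go through an unchanged Elo model. The `Game` record, the `human_vs_human_only` filter, the starting rating of 1300 and K = 30 are illustrative assumptions, not the gameroom's actual data format or parameters.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Game:
    # Hypothetical archive record; the real games database has other fields.
    winner: str
    loser: str
    winner_is_bot: bool = False
    loser_is_bot: bool = False


# A filter decides, game by game, whether the game is used as a rated game.
GameFilter = Callable[[Game], bool]


def human_vs_human_only(game: Game) -> bool:
    """Example filter: keep only games with no bot on either side."""
    return not (game.winner_is_bot or game.loser_is_bot)


def elo_ratings(games: Iterable[Game], start: float = 1300.0,
                k: float = 30.0) -> Dict[str, float]:
    """The unchanged mathematical model: plain Elo over the given games."""
    ratings: Dict[str, float] = {}
    for g in games:
        rw = ratings.setdefault(g.winner, start)
        rl = ratings.setdefault(g.loser, start)
        expected_w = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        delta = k * (1.0 - expected_w)  # the winner scored 1
        ratings[g.winner] = rw + delta
        ratings[g.loser] = rl - delta
    return ratings


def filtered_ratings(archive: List[Game], keep: GameFilter) -> Dict[str, float]:
    """Filter first, then feed the surviving games to the same Elo model."""
    return elo_ratings(g for g in archive if keep(g))
```

Because the model is untouched, different filters can be compared simply by calling `filtered_ratings` on the same archive with different `keep` functions.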
||||||||
Title: Re: Improving Arimaa Rates Post by Fritzlein on Nov 14th, 2008, 2:38pm A game filter could also operate on a continuum. Instead of games either counting or failing to count, they could be weighted anywhere between one and zero. Omar and I talked about the desirability of having repeated games between the same two opponents count less than the same number of games against a variety of opponents. I'm not sure how to implement that, though. Let's say I play a series of 100 games against Bomb. If we institute any kind of "diminishing returns" policy, whereby each game counts less than the previous one, then my 100th game will count the least. But shouldn't it count the most, since it is my most recent? On the other hand, the reverse philosophy of counting the most recent game fully and counting older games less and less requires some kind of retroactive recalculation. Folks have resisted historical revisionism on the grounds that it is too complicated, too CPU-intensive, and/or too counter-intuitive.
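The tension between the two policies is easy to see numerically. The sketch below assumes geometric decay with a made-up factor and computes per-game weights for a series against one opponent, listed oldest first. Under diminishing returns the newest game gets the smallest weight; under recency weighting every new game lowers the weight of all the older ones, which is the retroactive recalculation described above.

```python
from typing import List


def diminishing_weights(n_games: int, decay: float) -> List[float]:
    """Diminishing returns: game i (0 = oldest) counts decay**i.
    Weights never change once assigned, but the newest game counts least."""
    return [decay ** i for i in range(n_games)]


def recency_weights(n_games: int, decay: float) -> List[float]:
    """Recency: the newest game counts fully and older games count less.
    Adding one more game changes every earlier weight."""
    return [decay ** (n_games - 1 - i) for i in range(n_games)]


if __name__ == "__main__":
    print(diminishing_weights(3, 0.5))  # [1.0, 0.5, 0.25]
    print(recency_weights(3, 0.5))      # [0.25, 0.5, 1.0]
    print(recency_weights(4, 0.5))      # [0.125, 0.25, 0.5, 1.0]
```

Comparing the last two lines shows why recency weighting needs revisionism: the oldest game's weight drops from 0.25 to 0.125 as soon as a fourth game is played.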
||||||||
Title: Re: Improving Arimaa Rates Post by mistre on Nov 14th, 2008, 7:22pm Let's look at Tachyon's four categories of potential rating abuse: 1) bot bashing, 2) game platform connectivity issues, 3) player collusion in HvH games, and 4) strong players taking advantage of weak or new players in HvH games.
1) Bot bashing is, I think, the #1 culprit for inaccurate ratings under the current system. Here is a potential fix. Once a player has beaten a particular bot more than x times in a row (in rated games), his future games against that bot do not count towards his rating until he loses one. Then he could win another x times in a row and have them count before he would have to lose one again. Ideally, losing that one game would lower his rating enough to stop a player from just winning x and then losing one on purpose. I have no idea what value x should be; an outside guess would be about 7. Once a player can win 7 in a row against a particular bot, it could be assumed that he can win about 90% of the time against it, so any further wins should not raise his rating. I think there are enough different bots available that a player can still raise his rating substantially, but once he runs out, he will have to face tougher bots to raise it further.
2) Game platform issues. The unrate feature helps with this, but there are still instances where a player feels he is winning and Bomb declares him losing, or where a player is unaware of the unrate feature. I don't see an easy fix for this, but I don't see it as a big problem either; it pales in comparison to #1.
3) Player collusion. This one should probably be handled on a case-by-case basis. I haven't seen any evidence of it happening, and I think it would be pretty easy to spot if it did.
4) Strong vs. weak in human games. Again, I don't see this as a major issue. Weak players will generally avoid strong players after losing to them once or twice. It is also much harder to play many games in a row against human opponents, because of their availability, than to bash a bot as many times in a row as you want. There has been no evidence that anyone has an inflated rating just from picking on newcomers.
Overall, I only see #1 as a real problem, and hopefully my attempt at a solution has some merit or will help someone else come up with one.
||||||||
Title: Re: Improving Arimaa Rates Post by omar on Nov 16th, 2008, 9:07am on 11/14/08 at 14:38:51, Fritzlein wrote:
Yes, definitely; in fact that would be the desired way to do it. The weight would in effect scale the K factor used in the Elo formula, so if a game is given a weight of 0.5, only half of the normal K value would be used. For a long time Karl has been suggesting that I change the gameroom rating system so that games between humans and bots use half the normal K value. I've been resisting, since it seemed like adding a very ad hoc component to the rating formula. But Karl's suggestion fits nicely in the context of a game filter: the filter simply returns 0.5 for the weight of games between humans and bots and 1 otherwise. Quote:
Yes, more complex filters that look at the complete game histories of the players are pretty computationally intensive, but they can definitely be run offline.
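To make the "weight scales K" idea in this post concrete, here is a sketch of a single Elo update in which the filter's weight multiplies K, with a filter that returns 0.5 for human-vs-bot games and 1 otherwise. K = 30 and the function names are assumptions for illustration. Note that this is the interpretation stated in this post; Fritzlein's own proposal, clarified in his next reply, is different.

```python
from typing import Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expectation for the player rated r_a (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def human_bot_weight(a_is_bot: bool, b_is_bot: bool) -> float:
    """Filter from this post: half weight when exactly one side is a bot."""
    return 0.5 if a_is_bot != b_is_bot else 1.0


def weighted_update(r_a: float, r_b: float, score_a: float,
                    weight: float, k: float = 30.0) -> Tuple[float, float]:
    """One Elo update with the effective K reduced to weight * k.
    score_a is 1.0 if player A won and 0.0 if A lost."""
    delta = weight * k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta
```

For two 1500-rated players, a human's win over a bot moves each rating by 7.5 points instead of the 15 points a human-vs-human win would.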
||||||||
Title: Re: Improving Arimaa Rates Post by omar on Nov 16th, 2008, 9:33am on 11/14/08 at 19:22:21, mistre wrote:
That sounds like an interesting filter to try out. Quote:
Even though I don't play many bot-bashing games, I tend to play a lot of late-night blitz or fast games without much concern for the effect on my rating. I also occasionally experiment with different setups and don't bother to play these as unrated games. So, in addition to bot bashing, these kinds of games also degrade the rating system. We are also mixing games of vastly different speeds into a single rating number, and I think this contributes significantly to making the ratings inaccurate.
||||||||
Title: Re: Improving Arimaa Rates Post by omar on Nov 16th, 2008, 9:41am I would really like to see a situation where different rating lists are posted as a web page and also as a web service. To facilitate this I can make available a version of the games database that contains only the rated games and is updated hourly. That way anyone who wants to try out an idea for filtering the games can download it and experiment. I think this will let various ideas be tried out so we can see what the community likes.
||||||||
Title: Re: Improving Arimaa Rates Post by Fritzlein on Nov 16th, 2008, 10:46am on 11/16/08 at 09:07:30, omar wrote:
No, no, that isn't at all what I have been suggesting. I have been advocating that while games between humans use the win probability formula 1/(1+10^((A-B)/400)), games between a human and a bot should use the formula 1/(1+10^((A-B)/200)), where A and B are the two players' ratings and each formula, as written, gives the probability that the player rated B beats the player rated A. The K value affects the volatility of the ratings, i.e. it determines how fast ratings change. I don't have strong feelings about the current volatility. The scaling factor of 200 versus 400 is a completely separate matter, and is justified as follows. Take it as given that if I can beat someone 10 times out of 11, I deserve to be 400 points higher than them, and if I lose 10 times out of 11, I deserve to be 400 points lower. This is the normal scale of Elo ratings. However, it doesn't transfer well to bots. If I can beat a bot 10 games out of 11, I might not be that much better than the bot; I might be only a little better and winning by rote. Therefore the system should put me only 200 points above that bot. Similarly, if I lose to a bot 10 times out of 11, I might actually be almost at its level but still lose lots of games due to my blunders while the bot is infallible. Therefore I should be rated only 200 points below the bot. To repeat, I wasn't suggesting changing the volatility of the ratings, or counting HvB games for less than HvH games, although both of those ideas are reasonable. The thrust of my suggestion was to scale HvB games to be twice as compressed as HvH games.
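The arithmetic behind the 400 versus 200 scale can be checked directly. In the sketch below, `win_probability` is written from the perspective of the player rated A, i.e. 1/(1+10^((B-A)/scale)); inverting it gives the rating gap a given win rate implies. A 10-in-11 record maps to a 400-point gap on the usual scale and a 200-point gap on the compressed human-vs-bot scale. The function and constant names are illustrative.

```python
import math

HUMAN_VS_HUMAN_SCALE = 400.0
HUMAN_VS_BOT_SCALE = 200.0  # the proposed compression for HvB games


def win_probability(r_a: float, r_b: float, scale: float) -> float:
    """Probability that the player rated r_a beats the player rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def implied_gap(win_rate: float, scale: float) -> float:
    """Rating gap r_a - r_b at which win_probability equals win_rate."""
    return scale * math.log10(win_rate / (1.0 - win_rate))


if __name__ == "__main__":
    print(round(implied_gap(10 / 11, HUMAN_VS_HUMAN_SCALE)))  # 400
    print(round(implied_gap(10 / 11, HUMAN_VS_BOT_SCALE)))    # 200
```

Halving the scale halves every implied gap, which is why a rote 10-in-11 record against a bot earns only half as many rating points above it.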
||||||||
Title: Re: Improving Arimaa Rates Post by Fritzlein on Nov 16th, 2008, 10:46am on 11/16/08 at 09:41:29, omar wrote:
Sounds like a great idea!
||||||||
Title: Re: Improving Arimaa Rates Post by Tachyon on Nov 16th, 2008, 7:24pm Omar : Quote:
I totally agree ... This should be taken into consideration. In a recent chatroom discussion Fritzlein referred to fast and slow games as two different games. I believe we should separate the ratings for fast and slow games. Omar : Quote:
Players' quality of play tends to vary over time, and since I would think we want the rating to reflect their current level of play, I do not see the value of including games older than a certain time limit ... say 1 year. Fritzlein: Quote:
Surely ... not making blunders and playing consistently are key aspects of what constitutes a good player. Why should that serve as a reason to differentiate between bot and human play? Fritzlein: Quote:
I agree ... However, another player may have won 10 games out of 11 without using any bot-bashing tactics ... how does a system distinguish between the two?
||||||||
Title: Re: Improving Arimaa Rates Post by Adanac on Nov 17th, 2008, 12:48pm on 11/16/08 at 19:24:22, Tachyon wrote:
Rather than just slow/fast, I would prefer to have 3 categories (fast/slow/postal), since they all require different skill sets. I tend to think of 15 or 30 seconds/move as “fast”, though some may argue that 45 seconds/move is also “fast”. There are many redundant bots, such as botXblitz, botXfast, botXP1, botX, etc. It’d be nice to have a single botX with multiple ratings that can play at any speed, although I suppose that would get pretty messy with tens of thousands of archived bot games to reassign to a consolidated bot. Fritzlein’s 1/(1+10^((A-B)/200)) formula is a surprisingly simple and effective way to halve the number of points that can be “stolen” by rote from any bot; I like it. The (good) unintended side effect is that it encourages new players to play more games against other humans, rather than just sticking to the bot ladder, if they want a faster climb up the rating chart. I’m not partial to the idea of reducing the weight of each additional game played between the same 2 players. If World Championship seeding is the greatest concern, then how about using previous World Championship results to generate the seeds? Or create a 4th rating category, “tournament”, which would apply only to the WC and possibly the Continuous and Postal tournaments, or any other controlled events.
||||||||
Title: Re: Improving Arimaa Rates Post by Tachyon on Nov 17th, 2008, 4:46pm The biggest problem with fast games is that they promote unforced blunders by relatively good players, causing them to lose games they would not have lost given more time. I also believe that time controls that force games to be played at a steady pace are advantageous to bots, since humans tend to need a lot more thinking time in some positions than in others. There also seem to be arguments that faster time controls and/or per-move time limits are implemented more for the spectators' benefit than for the players'. I think we should define slow and fast games according to the purpose they serve.
Fast games:
1) Strongly favour those who have many hundreds or thousands of games of experience.
2) Favour those who are better at thinking fast than deep.
3) Somewhat favour bots over humans.
4) Put spectator interest above quality of play.
Slow games:
1) To some extent level the playing field between well-practised players and strong new players.
2) Put those who think slower but deeper on a more equal footing with the fast thinkers (maybe even somewhat favour them).
3) Put bots and humans on a more equal footing.
4) Put quality of play above spectator interest.
Given the above objectives, I would define fast games as any games faster than 2 minutes per move. Slow games would be 2 or more minutes per move, up to a total game time limit of, say, about 8 hours. I agree with Adanac that postal games need to be treated differently, since the greater time span also allows players to research their moves or get some other form of help.
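Tachyon's proposed boundaries can be written as a small classifier, which a filter could use to route each game to a separate fast, slow or postal rating list. This is a sketch only: it assumes a game's time control has already been reduced to a per-move allowance and an optional total game limit in seconds (real gameroom time-control settings carry more information), and the 8-hour cut-off between slow and postal is the rough figure given above.

```python
from enum import Enum
from typing import Optional


class Speed(Enum):
    FAST = "fast"
    SLOW = "slow"
    POSTAL = "postal"


SLOW_MIN_SECONDS_PER_MOVE = 120        # 2 minutes per move
POSTAL_MIN_GAME_SECONDS = 8 * 60 * 60  # about 8 hours of total game time


def classify(seconds_per_move: float,
             game_limit_seconds: Optional[float]) -> Speed:
    """Assign a game to a rating category.
    game_limit_seconds=None means the game has no overall time limit."""
    if seconds_per_move < SLOW_MIN_SECONDS_PER_MOVE:
        return Speed.FAST
    if game_limit_seconds is None or game_limit_seconds > POSTAL_MIN_GAME_SECONDS:
        return Speed.POSTAL
    return Speed.SLOW
```

Adanac's narrower notion of "fast" (15 to 30 seconds per move) would only mean changing the first threshold.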
||||||||
Title: Re: Improving Arimaa Rates Post by Janzert on Nov 18th, 2008, 6:04pm on 11/17/08 at 16:46:23, Tachyon wrote:
Hmm, yet at least in go and chess the opposite end of the spectrum seems to be regarded as very favorable to bots. Absolute time control is often cited as unreasonably bad for humans, because they aren't very good at leaving themselves enough time to play the endgame out correctly. I think the Arimaa time controls in general are fairly good at getting humans to actually use their time well. They allow a player to make "obvious" moves quickly without losing that time for later use, but generally discourage them from spending too long on one move and running out of time before the end of the game. Janzert
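A simplified model of the per-move-plus-reserve behaviour described here: each move has a fixed allowance, time saved on quick moves is carried into a reserve (at a configurable percentage), and overruns are paid from the reserve; the player loses on time when the reserve goes negative. Setting the starting reserve and the carry-over to zero gives a fixed time per move. This is an illustrative sketch with hypothetical names, not the gameroom's implementation, and it ignores any reserve cap or overall game limit a real time control may have.

```python
from typing import List, Optional


def reserve_after_moves(move_times: List[float], per_move: float,
                        reserve: float,
                        carry_percent: float = 100.0) -> Optional[float]:
    """Return the reserve left after the given moves (all in seconds),
    or None if the player ran out of time along the way."""
    for used in move_times:
        if used <= per_move:
            # Quick move: part (or all) of the unused allowance is banked.
            reserve += (per_move - used) * carry_percent / 100.0
        else:
            # Slow move: the overrun is paid from the reserve.
            reserve -= used - per_move
            if reserve < 0:
                return None
    return reserve
```

For example, with 15 seconds per move and a 60-second reserve, two 5-second "obvious" moves bank 20 seconds, so a 50-second think on the third move still leaves 45 seconds in reserve.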
||||||||
Title: Re: Improving Arimaa Rates Post by Tachyon on Nov 18th, 2008, 7:37pm I do not think it is the purpose of time controls to help players regulate their time usage. Using your time efficiently is part of the skill of the game. It only makes things worse if the computer regulates time in a way that is too inflexible to allow players to play in a style that suits them. I would rather lose a game because I ran out of time due to my own bad time management than make blunders or bad moves due to a forced pace in a difficult position.
||||||||
Title: Re: Improving Arimaa Rates Post by jdb on Nov 18th, 2008, 7:54pm The standard Arimaa time controls are nice for spectators: the players play at a regular pace. Watching a game under the chess time control can be horrible for spectators if one player decides to use an hour for one move.
||||||||
Title: Re: Improving Arimaa Rates Post by Janzert on Nov 18th, 2008, 8:52pm on 11/18/08 at 19:37:45, Tachyon wrote:
Ahh, then surely the fairest time control is absolute time. ;) But I think you'll find computers will benefit more than humans from this. My main point was really that I don't believe the current time controls give the bots an advantage; in general they actually do the opposite. Quote:
I think you really want "time control", not "computer", there. ;) Yes, the extreme in this direction is of course a fixed time per move (no reserve; any unused time is simply lost). This may also be an advantage for computers, although I would guess less so than absolute time. Quote:
I've seen more people argue that they had a won endgame and simply didn't have the time to finish the game, but deserved the win anyway because they were obviously in a won position. They ignore the possibility that the reason for both the won endgame and running out of time is that they spent more time earlier, which let them make better moves but left them unable to finish the game over the board. In other words, I think most people consider the time limits more of a necessary evil to ensure the game progresses (or ends within a certain time) than something intrinsic to the game to which game skills should need to be applied. So in general, the more a time control can disappear from the players' consideration, the better it is. Janzert
||||||||
Title: Re: Improving Arimaa Rates Post by Tuks on Nov 18th, 2008, 9:38pm That is my main problem right now. I'm not tactically strong enough to instinctively know good moves in a winning position, and I find in many, many blitz games that I can get myself into a good position but slowly lose focus under time pressure as the game progresses, and I lose. Overall, I think it's important that the rating reflects all aspects of a person's game, not just the good aspects.
||||||||
Title: Re: Improving Arimaa Rates Post by Tachyon on Nov 19th, 2008, 1:08am Quote:
This may be true ... The question is ... which is the more deserved loss: managing your entire game time badly, or making one hasty blunder under time pressure on a single move?
||||||||
Title: Re: Improving Arimaa Rates Post by Janzert on Nov 19th, 2008, 1:40am With all the Arimaa time controls in use that I'm aware of, I would consider the latter to be a subset of the former. Janzert
||||||||
Title: Re: Improving Arimaa Rates Post by omar on Nov 19th, 2008, 7:47am on 11/16/08 at 10:46:18, Fritzlein wrote:
Thanks for clarifying this, Karl. It's been a while since we discussed it. It would be interesting to use the weight of the game within this equation: 1/(1+10^((A-B)/(W*400+1))), where the weight W is between 0 and 1, instead of using it to scale the K parameter. Or maybe two weights could be specified: one to scale the rating difference and another to scale the K parameter.
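A sketch of the two-weight variant proposed here, assuming `w_scale` plays the role of W (so the scale is W*400 + 1) and `w_k` multiplies K. The expectation is written from player A's side, so the exponent uses B - A. With W = 1 the scale is 401, close to the usual 400; with W = 0.5 it is 201, close to Fritzlein's 200; the +1 keeps the scale from reaching zero when W = 0. K = 30 and the function names are assumptions.

```python
from typing import Tuple


def expected_score(r_a: float, r_b: float, w_scale: float = 1.0) -> float:
    """Expected score of player A when the Elo scale is w_scale * 400 + 1."""
    scale = w_scale * 400.0 + 1.0
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def update(r_a: float, r_b: float, score_a: float, w_scale: float = 1.0,
           w_k: float = 1.0, k: float = 30.0) -> Tuple[float, float]:
    """One rating update: w_scale compresses the rating scale and
    w_k damps the size of the change (an effective K of w_k * k)."""
    delta = w_k * k * (score_a - expected_score(r_a, r_b, w_scale))
    return r_a + delta, r_b - delta
```

Keeping the two weights separate lets a filter express both ideas from the thread at once: compressing HvB games (`w_scale = 0.5`) and counting them less (`w_k = 0.5`).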
||||||||
Title: Re: Improving Arimaa Rates Post by omar on Nov 20th, 2008, 8:46am I wrote a program that generates offline ratings using a front-end game filter. You can download it and try it out. It includes filters for human-human games, human-bot games (as proposed by Fritzlein), postal games and seven wins (as proposed by mistre). You can also create your own filters pretty easily. You can get it here: http://arimaa.com/arimaa/rating/gameFilter.zip
||||||||
Title: Re: Improving Arimaa Rates Post by mistre on Nov 20th, 2008, 1:43pm I downloaded the file, but I don't have the slightest idea how to use it. ???
||||||||
Title: Re: Improving Arimaa Rates Post by omar on Nov 20th, 2008, 6:57pm Just unzip it and check the README file. Maybe my instructions on how to use it are not clear. Let me know if you have any specific questions. You will need Perl on your system to run it.
||||||||
Title: Re: Improving Arimaa Rates Post by omar on Nov 24th, 2008, 6:35pm Just wondering if anyone had a chance to try this.
||||||||
Title: Re: Improving Arimaa Rates Post by camelback on Nov 25th, 2008, 8:54am Yes omar, I tried it and it worked fine. I also created a filter for non-casual games and was excited to see that it worked ;D Code:
|
||||||||
Title: Re: Improving Arimaa Rates Post by camelback on Nov 25th, 2008, 9:07am The code is very modular, which makes it easy to create a filter. We could even combine all the default filters in the zip file into one single filter, giving a different weight to each, and get a composite rating. Great job, omar! :o
||||||||
Title: Re: Improving Arimaa Rates Post by omar on Nov 26th, 2008, 6:15am Thanks, camelback. Glad to hear that it worked for you. Here are the top 22 players with no filter applied, which is basically the current gameroom ratings, except that all players started with a rating of 1300:

username              rating  ratingk  played
Arimanator              2444       30    1162
syed                    2321       30    3220
chessandgo              2221       30     578
Fritzlein               2189       30    1230
DorianGaray             2080       30     343
RonWeasley              2060       30     217
ArifSyed                2057       30    1945
Adanac                  2009       30     930
6sense                  1989       30     843
99of9                   1976       30     417
Raymond                 1957       30     110
blue22                  1935       30    1359
mistre                  1931       30    1059
The_Jeh                 1930       30     691
robinson                1919       30     435
Belbo                   1899       30     914
bot_Bomb2005Lightning   1886       30     219
mdk                     1872       30     396
PMertens                1867       30     959
PierreHenry             1864       52      48
arimaa_master           1856       30     712
bot_OpFor               1839       30     530

Here are the top players using the filter proposed by Karl, where games between humans and bots count less than human-human (HH) or bot-bot (BB) games:

username              rating  ratingk  played
chessandgo              2137       30     578
Fritzlein               2059       30    1230
Arimanator              2056       30    1162
RonWeasley              1990       30     217
DorianGaray             1915       45     343
99of9                   1901       30     417
Adanac                  1891       30     930
syed                    1869       55    3220
robinson                1856       30     435
UltraWeak               1850       38      95
Raymond                 1848       55     110
6sense                  1825       45     843
Belbo                   1810       30     914
The_Jeh                 1803       30     691
ArifSyed                1802       30    1945
mistre                  1794       30    1059
PierreHenry             1794       72      48
bot_Bomb2005Lightning   1790       55     219
naveed                  1787       30    1661
blue22                  1784       30    1359
PMertens                1783       30     959
challenger              1783       66      52

Here are the top players using the filter proposed by Mark, where once you have won 7 games against an opponent, further wins don't count:

username              rating  ratingk  played
chessandgo              2345       30     433
blue22                  2053       30     688
ArifSyed                2035       30    1550
naveed                  1994       30     990
RonWeasley              1970       30     205
mistre                  1937       30     802
99of9                   1932       30     299
The_Jeh                 1916       30     426
Fritzlein               1914       30     714
robinson                1910       30     340
Adanac                  1893       30     713
6sense                  1887       30     549
mdk                     1887       30     301
PierreHenry             1887       52      48
Arimabuff               1878       30     348
Ryan_Cable              1867       30     550
Arimanator              1851       30     381
challenger              1840       48      52
omar                    1836       30     540
bot_lightning           1833       40      60
Kraizy_Dave             1832       30     331
roger                   1826       56      44

Here are the top players using only postal games:

username              rating  ratingk  played
Fritzlein               2024       37      63
chessandgo              1912       30      72
RonWeasley              1827       30     108
Adanac                  1796       47      53
99of9                   1794       70      30
UltraWeak               1714       86      17
mistre                  1648       30      83
blue22                  1632       30     100
clauchau                1627       78      22
jdb                     1622       49      51
Belbo                   1598       77      23
OLTI                    1578       73      27
arimaa_master           1576       30     274
bot_OpFor               1546       86      17
ChrisB                  1535       72      28
The_Jeh                 1502       69      31
omar                    1499       46      54
Soter                   1484       60      40
Brendan                 1474       98      11
PMertens                1465       82      19
thorin                  1455       98      11
Tanker_JD               1443       82      19

Omar
||||||||
Title: Re: Improving Arimaa Rates Post by mistre on Nov 26th, 2008, 11:59am Wow! Under the 7 wins system I am ranked higher than Fritzlein, so something is definitely wrong with it! Omar, I think you didn't get the entire gist of my idea. Let me explain again. The idea was that once a player wins 7 in a row against any bot, any further wins against that bot do not count UNTIL he loses one. Then the count would start over: his next 7 wins would count, and wins beyond 7 would not. What I think this system would do is lower the ratings of the few players who have abused the bot system by playing the same bot over and over when they know there is virtually no chance of losing to it. Also, the postal rankings look pretty accurate, although I have yet to beat blue22 or Omar in a postal game, so in my eyes they still rank ahead of me. :)
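The rule as restated here can be sketched as a per-pair win-streak counter: a human's wins against a given bot count while the consecutive-win streak is 7 or fewer, further wins get weight 0, and a loss counts normally and resets the streak. The tuple format and function name are hypothetical, and this is an independent sketch rather than the code in omar's filter package.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def seven_wins_weights(results: List[Tuple[str, str, bool]],
                       max_streak: int = 7) -> List[float]:
    """results: (human, bot, human_won) tuples, oldest first.
    Returns one weight per game: 1.0 if it counts, 0.0 if it is ignored."""
    streak: Dict[Tuple[str, str], int] = defaultdict(int)
    weights: List[float] = []
    for human, bot, human_won in results:
        pair = (human, bot)
        if human_won:
            streak[pair] += 1
            weights.append(1.0 if streak[pair] <= max_streak else 0.0)
        else:
            streak[pair] = 0      # a loss resets the count...
            weights.append(1.0)   # ...and is always rated
    return weights
```

Streaks are tracked per (human, bot) pair, so a long run against one bot does not affect games against a different bot.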
||||||||
Title: Re: Improving Arimaa Rates Post by omar on Nov 29th, 2008, 12:36pm on 11/26/08 at 11:59:28, mistre wrote:
I think I said it wrong in the description, but I think the code is doing what you intended: Code:
|
||||||||
Title: Re: Improving Arimaa Rates Post by Tachyon on Nov 30th, 2008, 1:02am OK ... I have tried my hand at this ... the code may be clunky, as I have never coded in Perl before. The following is a filter that only includes games played in the 2 years preceding the current date. My rationale is that we want the rating to reflect a player's current strength as far as possible. However, we need at least 70 games for the rating to "stabilise", so I set the period at 2 years. Below are the filter and its result for the top 50 players. Please check it ... it is quite possible I did something stupid :) The filter: Quote:
The Result: Quote:
|
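An independent Python sketch of the same idea (not Tachyon's Perl code): keep a game only if it was played within a fixed window before a reference date. The window length is a parameter (2 years in this post, 1 year in Tachyon's earlier suggestion); 2 years is approximated as 730 days, and the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def within_window(played_at: datetime, now: Optional[datetime] = None,
                  window_days: int = 730) -> bool:
    """True if the game was played no more than window_days before `now`.
    730 days approximates 2 years (leap days are ignored)."""
    if now is None:
        now = datetime.now()
    age = now - played_at
    return timedelta(0) <= age <= timedelta(days=window_days)
```

Passing a fixed `now` instead of the current time makes the filter reproducible, e.g. for comparing lists computed on different days.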
||||||||
Title: Re: Improving Arimaa Rates Post by Tachyon on Nov 30th, 2008, 1:26am Quote:
Omar, what does this mean ... what starting rating are the current gameroom ratings calculated with? Maybe for the sake of comparison we should keep the filter tests' starting rating the same as the gameroom's?
||||||||
Title: Re: Improving Arimaa Rates Post by Janzert on Nov 30th, 2008, 1:55am Currently gameroom ratings do start at 1300, but they used to start at 1500. So players will have a slightly different rating in the gameroom than what Omar calculated here, but the effect should be fairly small and get smaller over time. Janzert
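Why the effect shrinks: under Elo, a player who starts higher has a higher expected score in every game, so identical results raise that rating less (or lower it more) than they would from a lower start, and the gap between the two starting points closes steadily. A small simulation, assuming fixed opponent ratings, K = 30 and a made-up sequence of results, replays the same games from 1500 and from 1300.

```python
from typing import List, Tuple


def expected(r: float, opp: float) -> float:
    """Elo expected score for a player rated r against one rated opp."""
    return 1.0 / (1.0 + 10 ** ((opp - r) / 400.0))


def replay(start: float, results: List[Tuple[float, float]],
           k: float = 30.0) -> float:
    """Apply the same (opponent_rating, score) results from a given start.
    Opponent ratings are held fixed to isolate the starting-point effect."""
    r = start
    for opp, score in results:
        r += k * (score - expected(r, opp))
    return r


if __name__ == "__main__":
    results = [(1500.0, 1.0), (1600.0, 0.0)] * 20  # 40 hypothetical games
    for n in (0, 10, 40):
        gap = replay(1500.0, results[:n]) - replay(1300.0, results[:n])
        print(n, round(gap, 1))
```

The printed gap starts at 200 and falls with every game, though it never quite reaches zero; in the real gameroom, opponents' ratings move too, so this only illustrates the trend.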
||||||||
Title: Re: Improving Arimaa Rates Post by Tuks on Nov 30th, 2008, 2:45am When you manage to make a filter that puts me in the top 50, then you will have succeeded ;) Wouldn't combining Tachyon's "current strength" filter with the "human games are worth more" filter be more accurate? I still see Arif and Arimanator up at the top...
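One simple way to combine filters, in the spirit of this suggestion and of camelback's idea of merging the default filters into one, is to multiply their weights: a time-window filter contributes 0 or 1 and a human-vs-bot filter contributes 0.5 or 1. Multiplication is only one possible combination rule, and the `Game` fields and filter names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Game:
    # Hypothetical fields; the real archive stores much more detail.
    a_is_bot: bool
    b_is_bot: bool
    age_days: float  # days between the game and the rating date


Filter = Callable[[Game], float]


def recent_only(game: Game) -> float:
    """'Current strength': drop games older than about 2 years."""
    return 1.0 if game.age_days <= 730 else 0.0


def human_bot_half(game: Game) -> float:
    """'Human games are worth more': half weight for human-vs-bot games."""
    return 0.5 if game.a_is_bot != game.b_is_bot else 1.0


def combine(filters: List[Filter]) -> Filter:
    """Build one filter whose weight is the product of the given weights."""
    def combined(game: Game) -> float:
        weight = 1.0
        for f in filters:
            weight *= f(game)
        return weight
    return combined
```

With a product, any filter that returns 0 excludes the game outright, while fractional weights compound.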
||||||||
Title: Re: Improving Arimaa Rates Post by Tachyon on Nov 30th, 2008, 4:31am I'll try my best, Tuks ;D. I know this list is still far from correct. It does not deal with bot bashing or slow vs. fast games. I just wanted to test one change at a time before starting to combine them.
||||||||
Title: Re: Improving Arimaa Rates Post by omar on Dec 3rd, 2008, 7:22am on 11/30/08 at 01:02:11, Tachyon wrote:
Code looks right. Glad to see that you were able to start experimenting with it.
||||||||
Arimaa Forum » Powered by YaBB 1 Gold - SP 1.3.1! YaBB © 2000-2003. All Rights Reserved. |