Title: Ratings: quick fix for intransitivity Post by Fritzlein on Feb 22nd, 2006, 4:36pm

In another thread we discussed how to reduce the ratings distortion caused by someone picking a single opponent (particularly a bot) and playing that opponent over and over. We tossed around some complicated formulas that may some day be implemented, but here's an easy band-aid:

Whenever two players sit down to a game, scan the last 50 rated games of each. If either player has played the other 7 or more times in their last 50 rated games, then the game automatically converts to unrated.

In other words, you can play a given opponent all you like, but no more than 14% of your rating can be based on playing that opponent. We might want to make an exception for both bots and humans rated 1500 or under, but once your rating starts to climb, you should have to start mixing it up between a variety of opponents for the games to count in your rating.

What do you think?
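To make the proposal concrete, here is a minimal Python sketch of the check as described in the post. The function name `is_rated` and the `recent_opponents` data layout are assumptions made for illustration, not anything the game server is known to provide. The possible exception for players rated 1500 or under is left out, because the post does not pin down how it would work.

```python
WINDOW = 50       # rated games scanned per player (from the proposal)
MAX_REPEATS = 7   # 7 of 50 games = 14%


def is_rated(player_a, player_b, recent_opponents,
             window=WINDOW, max_repeats=MAX_REPEATS):
    """Return True if a new game between the two players should be rated.

    recent_opponents is a hypothetical dict mapping each player's name to
    the list of opponents from that player's rated games, oldest first.
    """
    for me, other in ((player_a, player_b), (player_b, player_a)):
        last_games = recent_opponents.get(me, [])[-window:]
        # Convert to unrated once either side already has max_repeats
        # games against the other inside the window.
        if last_games.count(other) >= max_repeats:
            return False
    return True
```

With these constants, a seventh game against the same opponent inside the window still counts, and the eighth does not, which is where the 14% cap comes from.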
||
Title: Re: Ratings: quick fix for intransitivity Post by jdb on Feb 22nd, 2006, 5:28pm

Nice idea. Maybe tweak it a bit: only make the game unrated if one side is winning a "large" percentage of the games. If the last 7 games are split 4 to 3, then it would be OK to make the game rated. If the last 7 are 7-0, then play unrated.
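A rough Python sketch of this tweak follows. The post does not say what counts as a "large" percentage, so the 85% cutoff (`lopsided_share`) is purely an assumed value, and the function name and input format are hypothetical as well.

```python
def is_rated_with_balance(head_to_head_winners, max_repeats=7,
                          lopsided_share=0.85):
    """Return True if the next game between a pair should be rated.

    head_to_head_winners is a hypothetical list with the winner's name for
    each recent rated game between the two players, oldest first.
    lopsided_share is an assumed cutoff; the post only says "large".
    """
    recent = head_to_head_winners[-max_repeats:]
    if len(recent) < max_repeats:
        # Too few recent meetings for the repeat rule to apply at all.
        return True
    # Share of the recent meetings won by the more successful side.
    top_wins = max(recent.count(name) for name in set(recent))
    return top_wins / len(recent) < lopsided_share
```

Under this cutoff a 4-3 or 5-2 split stays rated, while 6-1 and 7-0 convert the game to unrated.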
||
Title: Re: Ratings: quick fix for intransitivity Post by PMertens on Feb 22nd, 2006, 6:09pm

Imagine a total noob who plays only against a top player and gets personal training ... his rating will not go up? In fact his rating would first go down and then freeze, even when he starts to win once in a while ...

I do not think that "rating" should be an incentive to play in a certain way. We have PotM for that ...

If I decide in the future to play only against Top 2000 players, then that might be a loss for the community but certainly not for my quality of play. Since we do not really have that many Top 2000 players ... plenty of games would be lost ...
||
Title: Re: Ratings: quick fix for intransitivity Post by omar on Feb 22nd, 2006, 7:16pm

You would really have to try it out to know how this affects the whole system and how much it fixes the problem. Plus you would want to try it with different constants. We can't just make changes to the system based on our mental simulations. But we can use mental simulations to guide us in deciding which actual simulations to try.
||
Title: Re: Ratings: quick fix for intransitivity Post by frostlad on Feb 22nd, 2006, 8:20pm

How many cases have there been of a player playing one opponent over and over to boost their rating? I am just wondering, as it seems like a lot of hassle to fix a problem if it doesn't really happen. If you are looking at really fixing it, I like the idea of looking at the last few times the two opponents have played each other and letting the game stand if the winning percentage isn't grossly tilted in one player's favor.
||
Title: Re: Ratings: quick fix for intransitivity Post by Ryan_Cable on Feb 23rd, 2006, 12:19am

Personally, I don't want to do something that is such a hack unless we have run out of other options. Moreover, I don't think that this will really fix the problem.

Currently there are seven bots rated above 1700: 5 are Bombs and 2 are Cluelesses. I could beat any of these bots 95% of the time, if I focused on it. Except, perhaps, for the Blitz bots, I think the entire bot ladder is mostly consistent with me being rated 2300+. There is no significant intransitivity between me and the bots.

However, intransitivity becomes very important when we add top humans to the picture. I am 1-22 against 99of9, Fritzlein, robinson, and Adanac, and 2-29 when you add in PMertens and blue22. I have probably gained in skill since a few of those losses, but still it seems like some, if not all, of these people should be 400+ above me. However, it doesn't seem reasonable to have the top human 900+ above the top bot. To some degree, I think this is less intransitivity than just the fact that the sigmoid curve (in the Elo formula) is an imperfect model of skill distribution.

The upshot of this is that the key to rating inflation is not selection of opponents but rejection of them. The secret to getting a high rating is to lock your invite flag in the off position and avoid skilled humans like the plague. The only real way I see to solve this is to remove bots from the rating system (which I think would be a terrible idea for other reasons). Anything that only degrades the importance of bot bashing is likely to just make skilled humans more threatening to a highly ratings-conscious player.

I think we should create a secondary rating system for use in selecting rankings for tournaments and other such things. Something simple like the 20-game human performance rating or the 90-day human performance rating would probably work reasonably well. Flaws like having a win cause a drop in rating or having ratings change without any playing would be easier to overlook in a secondary system.

on 02/22/06 at 20:20:32, frostlad wrote:

Arimanator and I both took advantage of the bots to inflate our ratings. When I first played Bomb2005Blitz, I was rated 109 points below it. I used the Bait and Tackle strategy just to give myself a fighting chance at 15s per move. However, I soon discovered that Bomb could be induced to maneuver its E back against the edge of the board into a crushing blockade. With this discovery, Bomb became a much weaker bot and therefore became highly overrated. I then transferred this overratedness to myself by playing many games against BombBlitz and BombFast. I managed to push myself all the way to 2120, very briefly being the second highest rated human in the world.

In part, I did it just for the experience of playing and winning a bunch of Blitz games against a fairly skilled opponent. In fact, I think I gained a surprising amount of tactical skill, even though I was playing approximately the same game every time. I think I am currently able to beat BombBlitz without using the Bait and Tackle 80% of the time, whereas before all of the bot bashing, I doubt I could have beaten it 20% of the time.

In part, I did it out of an obsessive desire to see how far I could inflate my rating.

And in part, I did it to get a high ranking (3) in the World Championship, to help me achieve my goal of winning 2+ games (I won exactly 2). It is this last item that makes it somewhat important to limit the possibilities for rating inflation.
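As a rough check on the "400+ above me" estimate in the post above, here is a small Python sketch of the standard logistic Elo curve. It assumes the textbook 400-point scale; the site's actual rating formula may differ, and both function names are made up for this example.

```python
import math


def expected_score(rating, opponent_rating, scale=400.0):
    """Expected score (win probability, ignoring draws) under standard Elo."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / scale))


def implied_rating_gap(wins, losses, scale=400.0):
    """How many points above you the opposition would have to be, on
    average, for this win-loss record to match the Elo curve exactly."""
    if wins == 0 or losses == 0:
        raise ValueError("the gap is unbounded for a perfect or winless record")
    return scale * math.log10(losses / wins)


print(round(expected_score(2000, 2400), 3))  # score expected at a 400-point deficit
print(round(implied_rating_gap(1, 22)))      # gap implied by a 1-22 record
print(round(implied_rating_gap(2, 29)))      # gap implied by a 2-29 record
```

Under that assumption a 400-point deficit corresponds to an expected score of about 9%, and the 1-22 and 2-29 records imply gaps of roughly 537 and 465 points, which fits the estimate that these opponents should be rated 400 or more points higher.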