Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)

(Message started by: omar on Feb 21st, 2011, 8:04pm)

Title: The Elo++ rating system
Post by omar on Feb 21st, 2011, 8:04pm

Yannis Sismanis, the winner of the Chess Rating competition on Kaggle.com, has written a paper describing his rating system, which he calls Elo++.

http://arxiv.org/abs/1012.4571

It looks quite interesting. I wonder if it would work better than WHR on the Arimaa data.

Title: Re: The Elo++ rating system
Post by Fritzlein on Feb 21st, 2011, 9:23pm

Thanks for the link, Omar. It was an interesting read. I respect the author for having the guts to settle on a system with only two parameters, the first of which is well known. The "white advantage" parameter worked out to almost exactly 55%-45% between equal players, which is the round number that I have been quoting for years, so he could hardly expect this parameter to win him the competition.
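For a sense of scale, here is a rough sketch of what a 55%-45% split between equal players means in rating points. It assumes the conventional Elo logistic curve with a 400-point scale, which is not necessarily the parametrization used in the paper, and the function names are made up for illustration.

```python
import math

def elo_gap_for_expected_score(expected):
    """Rating gap at which the favored side scores `expected` (0 < expected < 1).

    Assumes the conventional Elo logistic curve with a 400-point scale.
    """
    return 400.0 * math.log10(expected / (1.0 - expected))

def expected_score(gap):
    """Expected score of the side that is `gap` rating points ahead."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))

# White scoring 55% against an equal opponent is equivalent to a head start
# of this many rating points on the assumed 400-point scale.
white_bonus = elo_gap_for_expected_score(0.55)
print(f"white advantage: {white_bonus:.1f} points")
print(f"check: expected score at that gap = {expected_score(white_bonus):.2f}")
```

On that assumed scale the white advantage comes out to roughly 35 points.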

The other parameter is the interesting one, namely how much you rely on the assumption that folks who play each other are nearly equal in strength. This is a very clever idea, and apparently a competition-winning one. However, I would be strongly opposed to implementing any rating system that actually used such a parameter.

The problem is that chess ratings serve more societal functions than simply predicting future game outcomes. They also measure prestige. Chess players get invited to or excluded from top-tier chess tournaments almost entirely based on their ratings. Losing or gaining a few Elo points can substantially impact a chess professional's income. It's not just science at stake here; it's money.

If Elo++ (or any similar system) were adopted by FIDE, the players would all quickly understand that if you play strong opponents, the system assumes you are a strong player, and similarly assumes you are weak if you play weak opponents. Thus every player has an instant incentive to play above his grade whenever possible, and avoid playing below his grade at all costs. By carefully choosing his opponents, a player can improve his rating above what would be implied by his results alone. Elo++ would be a classic case of an attempted prediction changing the very behavior that it is attempting to predict. But even if this logical issue didn't doom the system, the social impact of everyone trying to avoid playing below his grade would be very undesirable.
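The incentive described above can be seen in a toy model. This is only a sketch under my own assumptions, not the fitting procedure from the paper: it fits a single player's rating against opponents whose ratings are held fixed, uses log loss on a natural-log logistic scale, and adds a penalty `lam * (r - anchor)**2` that pulls the rating toward the average opponent rating. The names `fit_rating`, `lam`, and `anchor` are invented for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_rating(opponents, scores, lam):
    """Fit one player's rating with the opponents' ratings held fixed.

    Minimizes log loss over the games plus lam * (r - anchor)**2, where
    anchor is the mean opponent rating. The objective is convex in r, so
    bisection on its derivative finds the minimum.
    """
    anchor = sum(opponents) / len(opponents)

    def gradient(r):
        data_term = sum(sigmoid(r - opp) - s for opp, s in zip(opponents, scores))
        return data_term + 2.0 * lam * (r - anchor)

    lo, hi = -20.0, 20.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if gradient(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

# Two hypothetical players of identical true strength 0.0. Each scores exactly
# what a 0.0-rated player is expected to score against his chosen opposition.
true_strength = 0.0
peers = [0.0] * 10      # plays opponents at his own level
stronger = [1.0] * 10   # plays only opponents rated one unit higher
peer_scores = [sigmoid(true_strength - opp) for opp in peers]
stronger_scores = [sigmoid(true_strength - opp) for opp in stronger]

for lam in (0.0, 0.5):
    a = fit_rating(peers, peer_scores, lam)
    b = fit_rating(stronger, stronger_scores, lam)
    print(f"lam={lam}: plays peers -> {a:.3f}, plays stronger -> {b:.3f}")
```

With `lam` set to zero, both players are rated equally, as their results imply. With a positive `lam`, the one who picked stronger opponents ends up rated higher on identical performance, which is exactly the loophole.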

We here on arimaa.com have plenty of evidence that people will try to game any system, and that there are negative consequences whenever there is an exploitable loophole. Anyone else who has tried to implement online rating systems has surely encountered the same issues. It is a waste of energy to try to squeeze the last juice of significance out of a dataset that is assumed to have been generated under controlled conditions. Far more important is to constantly consider what behavior will be induced if players understand the system and play solely to maximize their own ratings.