Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
Arimaa >> Off Topic Discussion >> Player ratings in team games
(Message started by: chessandgo on Aug 3rd, 2013, 6:22am)

Title: Player ratings in team games
Post by chessandgo on Aug 3rd, 2013, 6:22am
Hi,

A few questions on regular WHR (whole-history ratings) first:

1) Do WHR ratings depend on the order in which the games are played? I.e. with the same set of games, but played in a different order, would one obtain a different WHR?

2) Is there a program or online service available which takes game results as input and computes WHR? If not, how difficult would it be for a newbie programmer to write one?


On to the subject of computing player ratings from team results. With friends we play a team-versus-team computer game called Left 4 Dead (it's an FPS). We'd like to play without fixed teams, i.e. on any given night some players turn up, form random teams and play. We would get a set of team-versus-team results, with players playing for different teams throughout. The result is numerical, for example the first team beats the second team 800 points to 650.

We want to compute player ratings out of that (yes, we're competitive geeks). So far we're thinking of maintaining a rating for each player, and saying that the rating of a team is the average of its players' ratings. We could use a regular rating system to compute how many rating points a team should win or lose given the game's result, and apply this win/loss to each player.

One of my friends hopes to use a WHR-type rating system (presumably using only win or lose results). Instead, I would think of having ratings which express the average margin of victory, i.e. a team rated 200 points higher would on average win by 200 points. Call these 200 points the handicap. The change of rating after a game would be proportional to the number of points scored above/under the handicap. For example, a team rated 1800 beating a team rated 1600 by 300 points would have performed 100 points above average, and would win 100 / C rating points, with C a constant (say 10). If the 1800 team had won by only 100 points, they would have performed 100 points under average, and would lose 100/10 = 10 points.
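Something like this is what I have in mind (a rough sketch; I'm assuming the team rating is the average of its players' ratings as above, and a fixed C = 10; the names are just placeholders):

Code:
# Sketch of the margin-based update described above.  Assumptions for
# illustration: a team's rating is the average of its players' ratings,
# every player on a team gets the same adjustment, and C = 10.

def update_ratings(ratings, team_a, team_b, score_a, score_b, C=10.0):
    """Adjust player ratings in place after one game.

    ratings: dict mapping player name -> current rating
    team_a, team_b: lists of player names
    score_a, score_b: the game's point totals (e.g. 800 vs 650)
    C: scaling constant; larger C means smaller rating swings
    """
    rating_a = sum(ratings[p] for p in team_a) / len(team_a)
    rating_b = sum(ratings[p] for p in team_b) / len(team_b)

    expected_margin = rating_a - rating_b           # the "handicap"
    actual_margin = score_a - score_b
    delta = (actual_margin - expected_margin) / C   # points above/under the handicap, scaled

    for p in team_a:
        ratings[p] += delta
    for p in team_b:
        ratings[p] -= delta

# Example from above: an 1800-average team beats a 1600-average team by 300 points.
ratings = {"A1": 1800, "A2": 1800, "B1": 1600, "B2": 1600}
update_ratings(ratings, ["A1", "A2"], ["B1", "B2"], 950, 650)
# actual margin 300 vs handicap 200 -> each A player gains 10, each B player loses 10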

3) Do you think WHR can be modified to fit this team setting? Could it be modified to fit a numbered setting (i.e. not just win/lose, but win by 150 and such)?

4) Any thoughts on the above sketch for a rating system based on numbered results? Does something similar exist already? Do you think the one described above should converge towards the true strength of a player (assuming, for example, that the constant C increases so that the rating adjustments shrink towards 0 as the number of games played tends to infinity)?

5) Such a player rating for team sports could be applied to sports like the NBA. Given how much you Americans love your statistics and rankings, I suppose something like that has to have been done already? You could for example say that a team's rating for a game is the average of its players' ratings, weighted by playing time in that game. Do you know of anything like that?

Any ideas welcome! Sorry for the overly long post :)
Jean

Title: Re: Player ratings in team games
Post by rbarreira on Aug 3rd, 2013, 10:01am
I play a game called Quake Live; one of its most popular modes, CA (Clan Arena), is also team-based. The game itself ranks players in tiers (there are 4 or 5 tiers, I believe).

But there's also a website that tracks all CA games and builds up an Elo ranking. I'm not sure what system they use, though:

http://www.qlranks.com/ca/

Title: Re: Player ratings in team games
Post by JimmSlimm on Aug 3rd, 2013, 11:44am

on 08/03/13 at 10:01:00, rbarreira wrote:
I play a game called Quake Live; one of its most popular modes, CA (Clan Arena), is also team-based. The game itself ranks players in tiers (there are 4 or 5 tiers, I believe).

But there's also a website that tracks all CA games and builds up an Elo ranking. I'm not sure what system they use, though:

http://www.qlranks.com/ca/



nice to see another quake live player :) I play FFA in quake live, watching quakecon finals today? :)

Title: Re: Player ratings in team games
Post by rbarreira on Aug 3rd, 2013, 1:56pm

on 08/03/13 at 11:44:30, JimmSlimm wrote:
nice to see another quake live player :) I play FFA in quake live, watching quakecon finals today? :)


Indeed :)

Title: Re: Player ratings in team games
Post by clyring on Aug 3rd, 2013, 2:25pm
1) WHR ratings depend on the relative times at which the games are played. For two near-simultaneous results it doesn't make much difference in what order they are played, but for two results far apart in time it makes a big difference.

2) There is none that I am aware of. The difficulty depends on how efficient and polished you want the implementation to be. If need be, it is not too difficult to implement a (very, very inefficient) WHR calculator in any spreadsheet program.

3) It is extremely simple to generalize the Bradley-Terry model to accommodate team games with just winrate predictions. Generalizing it to predict win margins as well as win rates would require a complete reworking of the underlying model. This can be as easy or as hard as you want it to be depending on how much you want your model to encapsulate.
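For instance, just for winrate predictions, a minimal sketch would be the following (taking a team's strength to be the average of its members' ratings and borrowing an Elo-like 400-point logistic scale; both are illustrative assumptions rather than part of WHR itself):

Code:
def team_win_probability(ratings, team_a, team_b, scale=400.0):
    """P(team_a beats team_b) under a team Bradley-Terry model.

    Team strength is taken as the average of member ratings; the
    400-point logistic scale mirrors Elo and is only an assumption.
    """
    r_a = sum(ratings[p] for p in team_a) / len(team_a)
    r_b = sum(ratings[p] for p in team_b) / len(team_b)
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))

ratings = {"A1": 1850, "A2": 1750, "B1": 1650, "B2": 1550}
print(team_win_probability(ratings, ["A1", "A2"], ["B1", "B2"]))  # ~0.76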

4) The convergence of a rating system for individuals competing in teams is harder to characterize than a rating system for individuals competing as individuals. For example, if one team's members exclusively play with that set, then the rating system cannot distinguish them even though it is possible that there are significant skill differences within the group. However, under ideal circumstances (players are well-mixed within teams, each player contributes some normally distributed amount to the relative score in the match, player strengths are unchanging, C approaches an infinite limit, but does so slowly enough that the sum of the reciprocals of each Ci approaches infinity as well) then I think it should converge to the correct relative strengths.
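To make that last condition concrete, compare two made-up schedules for C: with C_i = 10*sqrt(i) the per-game adjustment shrinks toward zero while its running total still grows without bound, whereas with C_i = 10*i^2 the total possible adjustment stays bounded, so an early rating error could never be corrected.

Code:
import math

def step_size_behaviour(C, n=100000):
    """Return the last per-game step 1/C_n and the running total sum(1/C_i)."""
    total = sum(1.0 / C(i) for i in range(1, n + 1))
    return 1.0 / C(n), total

# C_i = 10*sqrt(i): steps shrink toward 0, but their sum keeps growing without bound.
print(step_size_behaviour(lambda i: 10 * math.sqrt(i)))   # roughly (0.0003, 63)

# C_i = 10*i**2: steps shrink so fast that the total adjustment stays bounded (~0.16).
print(step_size_behaviour(lambda i: 10 * i ** 2))          # roughly (1e-11, 0.16)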

Title: Re: Player ratings in team games
Post by Fritzlein on Aug 3rd, 2013, 11:19pm

on 08/03/13 at 06:22:13, chessandgo wrote:
One of my friends hopes to use a WHR-type rating system (presumably using only win or lose results). Instead, I would think of having ratings which express the average margin of victory, i.e. a team rated 200 points higher would on average win by 200 points. Call these 200 points the handicap. The change of rating after a game would be proportional to the number of points scored above/under the handicap. For example, a team rated 1800 beating a team rated 1600 by 300 points would have performed 100 points above average, and would win 100 / C rating points, with C a constant (say 10). If the 1800 team had won by only 100 points, they would have performed 100 points under average, and would lose 100/10 = 10 points.

Before the USCF (United States Chess Federation) switched to Elo ratings, they used the Harkness system.  It was in essence a linear system like the one you have described, except with winning percentages playing the role of points.  Every ten rating points was one percentage point of winning probability.  If A is rated 200 points above B, then A's expected score when playing B is 70%.  If B is rated 200 points above C, then B's expected score when playing C is 70% and A's expected score when playing C is 90%.  This is just like points, except the only possible "points" you could score in a game were 100%, 50%, or 0%.

Unfortunately, a linear system doesn't mesh well with probabilities.  If C is rated 200 points above D, then A, being 600 points above D, would be expected to win 110% of the time when playing D.  This translated into guaranteed underperformance by A.  Even if A scored 100% against D, it was less than expected for A, so his rating would be adjusted downward.

The Harkness system introduced a hack to take care of this: your expected score was capped at 95%.  Of course the cap had the reverse side effect of making it very favorable to play someone rated waaay below you.  You would almost surely win and still get rewarded as if you had run a 5% risk of losing.
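As a sketch of that "linear plus cutoff" expectation (the 95% cap is from the description above; everything else is just illustration):

Code:
def harkness_expected_score(rating_diff):
    """Expected score under the linear-plus-cutoff rule described above:
    50% plus one percentage point per 10 rating points, capped at 5%/95%."""
    expected = 0.5 + rating_diff / 1000.0
    return min(0.95, max(0.05, expected))

print(harkness_expected_score(200))   # 0.70  (A vs B)
print(harkness_expected_score(400))   # 0.90  (A vs C)
print(harkness_expected_score(600))   # 0.95  (A vs D: the raw 110% is capped)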

The Elo system smoothed out the kink in the "linear plus cutoff" Harkness model, with generally good results.  In my judgement, the Bradley-Terry model underlying Elo ratings is almost certainly false, but to a lesser degree than the Harkness model.  Elo chess ratings have worked remarkably well in practice despite the false model, in part because the smooth curve makes discrepancies difficult to exploit, but mostly because players don't get to choose their opponents, and must instead play whoever they are paired against in a tournament.

Since I don't know how your game of interest is scored, I can't advise you on whether a linear model would be appropriate.  Ask yourself whether you expect transitivity to hold.  If Team A beats Team B by 200 points on average, and Team B beats Team C by 200 points on average, will Team A beat Team C by 400 points on average?  How many links can there be in this chain before the transitivity breaks down?  If there aren't too many levels of skill in your game, then a linear model might work out.  If there are lots of levels of skill, then very likely you don't want a linear model, and you want to apply some sort of squashing function to the rating difference to get the expected score.
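One possible squashing function of that kind predicts the expected margin from the rating difference; the 500-point ceiling and the tanh shape below are illustrative guesses, since I don't know how your game's scores behave:

Code:
import math

MAX_MARGIN = 500.0   # largest average margin the model will ever predict (illustrative)
SCALE = 500.0        # controls how quickly the curve bends away from linear (illustrative)

def expected_margin(rating_diff):
    """Squashed margin prediction: roughly linear for small rating differences,
    flattening out as the difference grows."""
    return MAX_MARGIN * math.tanh(rating_diff / SCALE)

for diff in (100, 200, 400, 800):
    print(diff, round(expected_margin(diff)))
# 100 -> ~99 and 200 -> ~190 (near-linear), but two 200-point jumps chain to ~332
# rather than 400, and the prediction never exceeds 500 however large the gap.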


Quote:
3) Do you think WHR can be modified to fit this team setting? Could it be modified to fit a numbered setting (ie not just win/lose, but win by 150 and such)?

Yes, WHR can be modified to meet your needs.  Simply use the logistic function as the squashing function for your expected score differences.  I'm not sure what would happen, though, if you fed in a score difference greater than the maximum predicted score difference, so you probably have to include a score cutoff, and report (for example) any victory of more than 500 points as a victory of exactly 500 points, similar to a win in chess being reported as a 1.
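A sketch of that cutoff, mapping a capped point margin onto the 0-to-1 result scale a WHR/logistic model expects (500 is the example cap above; the linear interpolation in between is just one simple choice):

Code:
def game_result_from_margin(score_a, score_b, cap=500.0):
    """Map a point margin onto the 0..1 result scale a WHR-style model expects.

    Margins at or beyond +/- cap count as a full win/loss (like 1 or 0 in
    chess); smaller margins become fractional results.
    """
    margin = max(-cap, min(cap, score_a - score_b))
    return 0.5 + margin / (2.0 * cap)

print(game_result_from_margin(950, 650))    # win by 300 -> 0.8
print(game_result_from_margin(700, 650))    # win by 50  -> 0.55
print(game_result_from_margin(1000, 100))   # win by 900 -> capped, reported as 1.0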

In vanilla WHR, when estimating the derivative to apply Newton's method, you have only opponents.  If anyone else you have played moves down, you go down with them, and if anyone else you have played moves up, you move up with them.  The addition of teammates adds a wrinkle: you move in the opposite direction of anyone who was on your team.  If one of your former teammates goes out and loses a bunch with different partners than you, that makes you look good, because you won even when teamed with a schmoe.  So you have some reversed signs.  Still, I believe the Newton update is in principle the same.

I highly recommend using a "best fit" method like WHR instead of an "update proportional to error" method such as your friend proposes, because a "best fit" method requires only a small number of games to get good discrimination among the ratings, whereas "update proportional to error" takes a huge amount of game data to get reasonable ratings established.
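To give a feel for the "best fit" idea, here is a crude stand-in: a plain gradient-ascent fit of a team Bradley-Terry model over the whole game history. Real WHR additionally models rating change over time and uses Newton's method with a proper prior; the small pull toward zero below is only a stand-in for that prior, and all names and constants are made up:

Code:
import math

def fit_team_ratings(games, players, iters=2000, lr=0.05):
    """Crude whole-history best fit of a team Bradley-Terry model.

    games: list of (team_a, team_b, a_won) where team_a/team_b are lists of
    player names and a_won is True if team_a won.  Ratings live on a
    natural-log scale (0.0 = average); multiply by ~173 for an Elo-like scale.
    """
    r = {p: 0.0 for p in players}
    for _ in range(iters):
        grad = {p: 0.0 for p in players}
        for team_a, team_b, a_won in games:
            diff = (sum(r[p] for p in team_a) / len(team_a)
                    - sum(r[p] for p in team_b) / len(team_b))
            p_win = 1.0 / (1.0 + math.exp(-diff))      # P(team_a wins)
            err = (1.0 if a_won else 0.0) - p_win
            for p in team_a:                           # teammates share the same sign...
                grad[p] += err / len(team_a)
            for p in team_b:                           # ...opponents get the opposite sign
                grad[p] -= err / len(team_b)
        for p in players:
            r[p] += lr * (grad[p] - 0.01 * r[p])       # tiny pull toward 0, stand-in for a prior
    return r

games = [(["J", "K"], ["L", "M"], True),
         (["J", "L"], ["K", "M"], True),
         (["J", "M"], ["K", "L"], False)]
print(fit_team_ratings(games, ["J", "K", "L", "M"]))   # M ends up clearly below the others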


Quote:
4) Any thought on the above sketch for a rating system based on numbered results?

Having individual ratings for a team game is essentially a doomed project.  Any team's performance will be greater than the sum of the individuals if they work well together, and less than the sum of the individuals if they don't mesh.  The "team compatibility" effect will introduce exactly the same sort of distortions one gets from being able to choose opponents (i.e. choose overrated opponents to boost yourself) but will be even worse.  In fact it will overwhelm all other errors, inaccuracies, and glitches in the system you devise.  That is to say, the way to inflate your rating will be to play only with people you mesh with, and to avoid ever playing with new and different team members.

That said, ratings are fun, and hopefully people won't care so much about their ratings that they refuse to play on teams with strangers, so the positive contribution of the ratings will be greater than the chilling effect on formation of new teams.

I hope this helps!

Title: Re: Player ratings in team games
Post by chessandgo on Aug 5th, 2013, 5:40am
Thanks clyring for your helpful answer!

Ricardo, that's funny, a good half of my playing group consists of former Quake players. I couldn't find details on those ratings, though.



on 08/03/13 at 23:19:17, Fritzlein wrote:
Since I don't know how your game of interest is scored, I can't advise you on whether a linear model would be appropriate.  Ask yourself whether you expect transitivity to hold.  If Team A beats Team B by 200 points on average, and Team B beats Team C by 200 points on average, will Team A beat Team C by 400 points on average?  How many links can there be in this chain before the transitivity breaks down?  If there aren't too many levels of skill in your game, then a linear model might work out.  If there are lots of levels of skill, then very likely you don't want a linear model, and you want to apply some sort of squashing function to the rating difference to get the expected score.


There's a given amount of points to be scored (say 1000), and scores are never below 0. So the most extreme result would be a victory by a margin of 1000.

Hmmm yeah, we have a transitivity problem in extreme cases (say, two huge jumps in strength). If that becomes a problem, I guess we could require, for each game, that the two teams be formed so that the rating difference between them is minimal.
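Something like this brute-force balancing would do for a typical night (assuming an even number of players; the names are placeholders):

Code:
from itertools import combinations

def most_balanced_split(ratings):
    """Split the night's players into two equal-size teams, minimising the
    gap between the two teams' average ratings (assumes an even player count)."""
    players = sorted(ratings)
    best = None
    for team_a in combinations(players, len(players) // 2):
        team_b = [p for p in players if p not in team_a]
        gap = abs(sum(ratings[p] for p in team_a) / len(team_a)
                  - sum(ratings[p] for p in team_b) / len(team_b))
        if best is None or gap < best[0]:
            best = (gap, list(team_a), team_b)
    return best

print(most_balanced_split({"J": 1900, "K": 1750, "L": 1700, "M": 1550}))
# (0.0, ['J', 'M'], ['K', 'L'])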


on 08/03/13 at 23:19:17, Fritzlein wrote:
I highly recommend using a "best fit" method like WHR instead of an "update proportional to error" method such as your friend proposes, because a "best fit" method requires only a small number of games to get good discrimination among the ratings, whereas "update proportional to error" takes a huge amount of game data to get reasonable ratings established.


I don't understand, what is an "update proportional to error" method?


on 08/03/13 at 23:19:17, Fritzlein wrote:
 That is to say, the way to inflate your rating will be to play only with people you mesh with, and to avoid ever playing with new and different team members.

That said, ratings are fun, and hopefully people won't care so much about their ratings that they refuse to play on teams with strangers, so the positive contribution of the ratings will be greater than the chilling effect on formation of new teams.


We have the option for everyone to select "random team" at the start of the game. This way people wouldn't be able to abuse the system, although maybe they could try to play only when they think the whole player line-up is favorable to ... nah, that's too far-fetched :)


on 08/03/13 at 23:19:17, Fritzlein wrote:
I hope this helps!


It does, a lot, thanks!

Title: Re: Player ratings in team games
Post by Boo on Aug 5th, 2013, 10:43am

Quote:
Does something similar exist already?


Yes. Starting points:
http://en.wikipedia.org/wiki/TrueSkill
http://research.microsoft.com/en-us/projects/trueskill/details.aspx
http://www.moserware.com/2010/03/computing-your-skill.html
http://ziggyny.blogspot.ca/2012/01/trueskill.html

I saw it successfully implemented in http://www.yucata.de
You can search yucata forums for "Trueskill" for lots of discussions about it.

Title: Re: Player ratings in team games
Post by chessandgo on Aug 5th, 2013, 11:08am
Thanks for the links, Boo. However, are you sure it's a numbered-result-based system? It seems to me it takes only win/loss into account (or finishing order for games with more than 2 players)? From your second link:

"If one is playing a point based game and the winner beats all the other players by a factor of ten, that player’s victory will be scored no differently than if they had only won by a single point."

Title: Re: Player ratings in team games
Post by Fritzlein on Aug 5th, 2013, 1:50pm

on 08/05/13 at 05:40:47, chessandgo wrote:
I don't understand, what is an "update proportional to error" method?

By "update proportional to error" I mean that after each game each player's rating is adjusted by K*(actual score - predicted score).  If your results are better than your rating predicts, you move up proportional to how much better.  If your results are worse than you rating predicts, you move down proportional to how much worse.  The constant of proportionality, K, determines how volatile the ratings are.

This is what FIDE does, with a K that is lower for top-level players and higher for lower-level players.  It is simple, comprehensible, and reasonably fair.  It also takes a lot longer to sort out the players than WHR takes, so there is a tradeoff between complexity and responsiveness.  
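In code, one such step might look like this (the 400-point logistic scale and K = 20 are the usual Elo-style conventions, used only for illustration):

Code:
def elo_style_update(rating_a, rating_b, score_a, k=20.0):
    """One "update proportional to error" step for side A.

    score_a is A's actual result in [0, 1] (1 = win, 0.5 = draw, 0 = loss).
    Returns A's rating change; B's change is the negative of it.
    """
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    return k * (score_a - expected_a)

print(elo_style_update(1800, 1600, 1.0))   # about +4.8: the favourite wins, small gain
print(elo_style_update(1800, 1600, 0.0))   # about -15.2: the favourite loses, bigger drop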

Title: Re: Player ratings in team games
Post by Boo on Aug 5th, 2013, 2:36pm

Quote:
However, are you sure it's a numbered result based system?


TrueSkill does not depend on the rules of the game, e.g. the system of awarding points.

From FAQ:
"Q: Does the TrueSkill ranking system reward individual players in a team game?

A: The only information the TrueSkill ranking system will process is:

       Which team won?
       Who were the members of the participating teams?

The TrueSkill ranking system takes neither the underlying exact scores (flag captures, kills, time etc.) for each team into account nor which particular team member performed how well. As a consequence, the only way players can influence their skill updates is by promoting the probability that their team wins. Hence, "ball pregnant dog es", "hill sleeper s", "flag fruits", "territory twits", and "bomb bastards" will hurt their individual TrueSkill ranks unless what they are doing helps their team. "

The question is, why do you need a "numbered result based system"; why is an "ordered player list" not sufficient?
E.g. in Arimaa, does it really matter how much material you are up when you goal a rabbit? If the system scored skill based on the "difference of numbered points", players could abuse it by grinding the game down to the last available point, but is that really skill?

EDIT - in my 3rd link skill is defined as "Skill = Probability of Winning", and TrueSkill measures exactly that.

Title: Re: Player ratings in team games
Post by chessandgo on Aug 6th, 2013, 3:45am

on 08/05/13 at 13:50:28, Fritzlein wrote:
By "update proportional to error" I mean that after each game each player's rating is adjusted by K*(actual score - predicted score).  If your results are better than your rating predicts, you move up proportional to how much better.  If your results are worse than you rating predicts, you move down proportional to how much worse.  The constant of proportionality, K, determines how volatile the ratings are.

This is what FIDE does, with a K that is lower for top-level players and higher for lower-level players.  It is simple, comprehensible, and reasonably fair.  It also takes a lot longer to sort out the players than WHR takes, so there is a tradeoff between complexity and responsiveness.  


Oh OK. Actually my friend is making a best-fit system while I was going for the easier update system. He's getting the better of me, fortunately :)

Title: Re: Player ratings in team games
Post by chessandgo on Aug 6th, 2013, 8:35am
Well, with L4D scores being in place and accurately describing performance in a quantified way, and the strategies for the objectives "maximize average score" and "maximize chance to win" being essentially identical except in very rare occurrences ... it feels helpful to consider the score.

Title: Re: Player ratings in team games
Post by hyperpape on Aug 7th, 2013, 9:08am
WHR in Python: http://www.lifein19x19.com/forum/viewtopic.php?f=10&t=4556

Title: Re: Player ratings in team games
Post by chessandgo on Aug 8th, 2013, 7:28am
interesting, thanks hyperpape!


