Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
Arimaa >> General Discussion >> Gold advantage impossible to measure?
(Message started by: Fritzlein on Nov 30th, 2005, 9:21pm)

Title: Gold advantage impossible to measure?
Post by Fritzlein on Nov 30th, 2005, 9:21pm
I just did another little query on the games database.  Over the last 400 rated games between humans both rated over 1600, Gold was expected from the ratings to win 180 games, but actually won 173 games.  This indicates that playing Silver is an advantage!  In order to make the expected number of wins also be 173, we need to add 19 rating points to each Silver player.

However, in those 400 games, the standard deviation is plus or minus 9.9 wins, which overwhelms the difference between actual and expected score.  That is to say, the figure of a 19 point rating advantage is statistically worthless.
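A minimal sketch of the arithmetic in the two paragraphs above. It assumes the ratings follow the standard Elo logistic curve on a 400-point scale. The per-game list of rating differences is hypothetical input, since the real game data is not in the thread, and the usage lines use made-up, evenly matched games. Each game contributes a variance of p(1-p), which is at most 0.25, so 400 games have a standard deviation of at most 10 wins. Mismatched games push it slightly below that, which is consistent with the 9.9 quoted above.

```python
import math


def elo_expected(diff: float) -> float:
    """Expected score for Gold when Gold is rated `diff` points above Silver (standard Elo curve, assumed)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def gold_expectation(diffs):
    """Expected Gold wins and their standard deviation over a list of games.

    diffs: hypothetical input, Gold rating minus Silver rating for each game.
    """
    probs = [elo_expected(d) for d in diffs]
    expected = sum(probs)
    # Each game is an independent win/loss trial, so the variances p(1-p) add up.
    sd = math.sqrt(sum(p * (1.0 - p) for p in probs))
    return expected, sd


def silver_bonus(diffs, actual_gold_wins, lo=-400.0, hi=400.0, iters=100):
    """Rating points to add to every Silver player so expected Gold wins equal actual Gold wins.

    Expected Gold wins fall as the bonus grows, so bisection suffices.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        expected, _ = gold_expectation([d - mid for d in diffs])
        if expected > actual_gold_wins:
            lo = mid  # Gold is still expected to win too often: Silver needs a bigger bonus
        else:
            hi = mid
    return (lo + hi) / 2.0


# Made-up usage: 400 evenly matched games in which Gold won 190.
diffs = [0.0] * 400
print(gold_expectation(diffs))   # (200.0, 10.0)
print(silver_bonus(diffs, 190))  # about 17.4
```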

Thus the results are invalidated even before we consider a possible source of bias: Stronger players may be more likely to invite weaker players to a game than vice versa, and the inviting player may be more likely to give himself Silver.  This would explain why Gold was expected to win fewer than half the games.  If in addition it happens that stronger players are likely to be underrated relative to weaker players, then the inaccuracies of the ratings rather than the color advantage would explain any discrepancies.

Does anyone have an idea how we are ever going to know whether Gold has an advantage, and if so, how much?  I don't think it helps at all to include human versus bot games, given all the bot-bashing that goes on.  For example, RyanCable bashed Bomb2005Blitz almost entirely from the Gold side when building up his pre-tournament rating.  If we included that data, it would look like playing Gold is an enormous advantage.

Should we therefore rely entirely on bot vs. bot games?  We could do another self-play experiment like we did with Clueless over 144 games, but as JDB pointed out, self-play would tend to exaggerate any advantage, because the bots are using the same evaluation function.

I can't think of any good methodology.  Perhaps we just won't be able to tell for some time into the future, and must keep randomizing color assignments on the theory that it might make a difference, even if we can't tell what that difference is.

Title: Re: Gold advantage impossible to measure?
Post by 99of9 on Dec 2nd, 2005, 12:40am

Quote:
a possible source of bias: Stronger players may be more likely to invite weaker players to a game than vice versa, and the inviting player may be more likely to give himself Silver

To eliminate this, between any pair of players, you should only include the same number of G-S games as you do S-G games.  (Or weight them so that the effective number of games with each colour is equal.)
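The weighting alternative might look like the sketch below. The (gold_player, silver_player, gold_won) tuple format and the function name are hypothetical. Within each pair of players, each game gets the weight min(n_here, n_reversed) / n_here. Here n_here counts games with the same color assignment and n_reversed counts games with the colors swapped, so both assignments carry equal total weight.

```python
from collections import Counter


def color_balanced_gold_share(games):
    """Gold's share of wins, reweighted so each pair of players has equal Gold and Silver weight.

    games: list of (gold_player, silver_player, gold_won) tuples (hypothetical format).
    """
    orientation = Counter((gold, silver) for gold, silver, _ in games)
    total_weight = 0.0
    gold_weight = 0.0
    for gold, silver, gold_won in games:
        n_here = orientation[(gold, silver)]
        n_reversed = orientation[(silver, gold)]  # Counter returns 0 for missing keys
        # The more common color assignment is scaled down to match the rarer one;
        # pairs of players who never swapped colors get weight 0.
        weight = min(n_here, n_reversed) / n_here
        total_weight += weight
        if gold_won:
            gold_weight += weight
    return gold_weight / total_weight if total_weight else float("nan")


games = [("A", "B", True)] * 3 + [("B", "A", False)]  # A wins all four games
print(color_balanced_gold_share(games))  # 0.5 (the unweighted Gold share is 0.75)
```

In the example, player A wins every game. The unweighted data makes Gold look like a 3:1 favorite only because A usually took Gold, and the weighting removes that artifact.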

I don't think human-bot games are at all useful at answering this question.

Bot-bot games are ok, but since they are weak players, they are not the best group to sample to determine intrinsic bias in a game.  For that you need to look at players as close to perfect as you have got.  At the moment, that is human-human games.

Title: Re: Gold advantage impossible to measure?
Post by 99of9 on Dec 2nd, 2005, 12:42am
Oh, and of course 400 is quite a small sample when trying to estimate a bias that could easily be less than 5%.
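To put a number on "quite a small sample", read a 5% bias as a win rate five percentage points away from 50%, and treat each game as roughly a coin flip. Then the standard error of the observed win rate after n games is about 0.5/sqrt(n). The sketch below is an illustrative framing, not a calculation from the thread. It finds the smallest n for which a given bias equals z standard errors.

```python
import math


def games_needed(bias: float, z: float = 1.96) -> int:
    """Smallest number of games for which a win-rate bias equals z standard errors.

    Assumes each game is close to a coin flip, so the standard error of the
    observed win rate after n games is sqrt(0.25 / n) = 0.5 / sqrt(n).
    """
    return math.ceil((z * 0.5 / bias) ** 2)


for bias in (0.05, 0.02, 0.01):
    print(f"bias {bias:.0%}: about {games_needed(bias)} games")
```

Under these assumptions, a 5-point bias only just reaches z = 1.96 at around 400 games, and a 1-point bias needs roughly 9,600 games.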

Title: Re: Gold advantage impossible to measure?
Post by Adanac on Dec 2nd, 2005, 10:15am
Suppose that 300 years ago there were a couple of dozen chess players that wanted to determine whether white had any advantage in chess.  Suppose also that those players considered 1. g4 to be white's strongest opening move.  They could play one another several hundred times to try to determine whether white has an advantage and...that might be a good parallel to the current state of arimaa opening theory.  I'm very interested to learn the size of gold's advantage in the opening (if any) but we're going to need a lot more players, a far larger database of games, and more importantly, a much better knowledge of the game.

I don't know the answer to the following question, but do chess grandmasters have a larger advantage with the white pieces against other grandmasters than, say, a 2000 player would have against another 2000 player?  If so, that might suggest that gold's advantage should increase as our best arimaa players increase in strength.

Title: Re: Gold advantage impossible to measure?
Post by Fritzlein on Dec 2nd, 2005, 11:11am

on 12/02/05 at 10:15:34, Adanac wrote:
[D]o chess grandmasters have a larger advantage with the white pieces against other grandmasters than, say, a 2000 player would have against another 2000 player?  If so, that might suggest that gold's advantage should increase as our best arimaa players increase in strength.

I have read that (astonishingly) the advantage for white seems to be constant all the way from beginner to grandmaster, with the difference being only in the number of draws.  I'm not sure of the source, but it may have been Elo's old book, and if so, it's a little shaky.  On the other hand, if it is true for chess, it might also be true for Arimaa, and therefore present data might be valuable even though we aren't very good yet.

Title: Re: Gold advantage impossible to measure?
Post by Fritzlein on Dec 2nd, 2005, 1:14pm

on 12/02/05 at 00:40:50, 99of9 wrote:
To eliminate this, between any pair of players, you should only include the same number of G-S games as you do S-G games.

Good methodology.  We can eliminate lots of biases by looking only at pairs of games with reversed colors between the same two players.  When I get around to it, I'll find as many such game pairs as possible among rated human vs. human games in the database going back to the very start.  And I guess as long as I'm doing it, it isn't much extra work to compile the numbers for bot vs. bot games and bot vs. human games as well, for whatever they're worth.

Title: Re: Gold advantage impossible to measure?
Post by Fritzlein on Dec 4th, 2005, 1:41pm
OK, I stretched my meagre programming abilities to pair up games based on reversed colors between the same opponents.  I tried to pair up the games that were closest in time, e.g. if two players had colors
G-S
G-S
G-S
G-S
S-G

then I counted the last two games as a pair while disregarding the first three.  Indeed, I was so worried about changing skill over time that if the games were
G-S
G-S
G-S
G-S
S-G
S-G
S-G
S-G

then I only counted the middle two games while disregarding the other six.  Oh, and I only counted rated games, and games ending in "b" or "w" (no draws or aborts).
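The pairing rule can be sketched as a greedy left-to-right scan over the games between one pair of players, in chronological order. This is one reading of the description that reproduces both examples above, not the query actually used. The "G"/"S" encoding is hypothetical, and the sketch assumes the games were already filtered to rated games ending in "b" or "w".

```python
def pair_reversed_games(colors):
    """Greedily pair consecutive games in which the two players swapped colors.

    colors: chronological list of "G"/"S", the color one fixed player had in each game
    between the same two opponents (hypothetical encoding).
    Returns index pairs (i, i + 1); unpaired games are disregarded.
    """
    pairs = []
    i = 0
    while i + 1 < len(colors):
        if colors[i] != colors[i + 1]:
            pairs.append((i, i + 1))
            i += 2  # both games are used up; a game never joins two pairs
        else:
            i += 1  # no reversed-color game right after this one, so skip it
    return pairs


print(pair_reversed_games(list("GGGGS")))     # [(3, 4)]
print(pair_reversed_games(list("GGGGSSSS")))  # [(3, 4)]
```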

The end result is that Gold won 4725 of the 9384 games in those 4692 pairs, i.e. 50.35% of the games.  This suggests a Gold advantage of 2.44 rating points.  That is to say, someone who is rated 2.44 points higher than his opponent should win 50.35% of the games.
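The 50.35% to 2.44 conversion is consistent with inverting the standard Elo expectation E(d) = 1 / (1 + 10^(-d/400)), which gives d = 400 * log10(p / (1 - p)). The post does not name the formula, so treat this as a sketch under that assumption.

```python
import math


def rating_edge(score: float) -> float:
    """Rating edge that yields expected score `score` on the standard Elo curve (assumed)."""
    return 400.0 * math.log10(score / (1.0 - score))


gold_wins, games = 4725, 2 * 4692
share = gold_wins / games
print(f"Gold share {share:.2%} -> edge of {rating_edge(share):.2f} rating points")
# Gold share 50.35% -> edge of 2.44 rating points
```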

However, this doesn't capture the effect of mismatches.  The more the players in a pair are mismatched, the more it masks the advantage of playing gold, because the stronger player is probably just going to win both games anyway.  The average rating difference between the players in those games was 189 points.  If two players are mismatched by 189 points, and they play games of alternating color, then Gold must have an advantage of 3.23 rating points to account for winning 50.35% of the games.
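The mismatch correction can be modeled as follows. The model again assumes the standard Elo curve, plus a single hypothetical Gold bonus g in rating points. In each color-reversed pair the favorite is m points stronger. When the favorite has Gold, Gold's edge is m + g; when the underdog has Gold, Gold's edge is g - m. Gold's expected wins per pair are therefore E(m + g) + E(g - m). That sum increases with g, so bisection finds the g that matches the observed Gold share.

```python
def elo_expected(diff: float) -> float:
    """Expected score for a player rated `diff` points above the opponent (standard Elo curve, assumed)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def gold_advantage(mismatch: float, gold_share: float,
                   lo: float = -200.0, hi: float = 200.0, iters: int = 100) -> float:
    """Gold bonus g (rating points) that explains Gold's share of wins over color-reversed pairs.

    In each pair the favorite is `mismatch` points stronger. When the favorite has Gold,
    Gold's edge is mismatch + g; when the underdog has Gold, Gold's edge is g - mismatch.
    """
    def gold_wins_per_pair(g: float) -> float:
        return elo_expected(mismatch + g) + elo_expected(g - mismatch)

    target = 2.0 * gold_share  # observed Gold wins per two-game pair
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if gold_wins_per_pair(mid) < target:
            lo = mid  # gold_wins_per_pair increases with g
        else:
            hi = mid
    return (lo + hi) / 2.0


# Rows from the table below: (label, pairs, Gold wins, average mismatch).
for label, pairs, wins, mismatch in [("ALL", 4692, 4725, 189), ("H v B", 3839, 3851, 192),
                                     ("B v B", 608, 630, 152), ("H v H", 245, 244, 237)]:
    print(label, round(gold_advantage(mismatch, wins / (2 * pairs)), 2))
```

With the raw counts, this model gives about 3.24 for the combined row, so the 3.23 above is within rounding. It reproduces the other rows of the table to the precision shown.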

A guesstimate of the error would be to suppose all 4692 pairs were played at a mismatch of 189 points, i.e. that each game was about 3:1 for the favorite; then the standard deviation of Gold's total wins would be 41.9 games.  For Gold to win 33 games more than expected represents 0.79 standard deviations, i.e. the Gold advantage is clearly statistically insignificant.
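The guesstimate can be checked directly. With no color advantage, Gold expects exactly one win per color-reversed pair: the favorite's p once and the underdog's 1 - p once. Each game has variance p(1 - p), so a pair contributes 2p(1 - p). The 3:1 figure comes from the post. Substituting the Elo expectation for the actual mismatch would give a slightly different standard deviation.

```python
import math


def gold_z_score(pairs: int, gold_wins: int, p_favorite: float = 0.75):
    """Standard deviation of Gold's win total and the z-score of the observed excess.

    With no color advantage Gold wins one game per color-reversed pair on average,
    and each pair contributes variance 2 * p * (1 - p).
    """
    sd = math.sqrt(2 * pairs * p_favorite * (1.0 - p_favorite))
    z = (gold_wins - pairs) / sd
    return sd, z


sd, z = gold_z_score(4692, 4725)
print(f"sd = {sd:.1f} games, excess = {z:.2f} standard deviations")
# sd = 41.9 games, excess = 0.79 standard deviations
```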

This was for all types of games combined.  If we repeat the calculation separately for each type of pairing, we have

Game Type  Pairs  Gold Wins  Mismatch (pts)  Gold Adv. (pts)  # Std. Dev.
---------  -----  ---------  --------------  ---------------  -----------
ALL         4692       4725             189             3.23         0.79
H v B       3839       3851             192             1.45         0.32
B v B        608        630             152            15.1          1.38
H v H        245        244             237            -2.19         0.12


Yes, that's right, over the human games, Silver actually has the advantage.  This doesn't matter, of course, because all the results are statistically insignificant.  We have essentially zero evidence that either side has an advantage.

Title: Re: Gold advantage impossible to measure?
Post by Ryan_Cable on Dec 6th, 2005, 5:27pm
Well, at least we can now say with 95% certainty that the color advantage is less than 10 points for games with one or more humans.  I think this is small enough to justify our current method of assigning gold, which would be a joke in a chess tournament.
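The thread does not show how the 10-point bound was derived. One way to put an approximate 95% interval on a row of Fritzlein's table is a normal approximation that linearizes the pair model around zero Gold bonus. The sketch below applies that method to the H v B row as an example, under the same standard-Elo assumption. It does not reproduce Ryan_Cable's calculation.

```python
import math

K = math.log(10) / 400.0  # slope constant of the standard Elo curve (assumed)


def elo_expected(diff: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def gold_advantage_interval(pairs: int, gold_wins: int, mismatch: float, z: float = 1.96):
    """Approximate (low, estimate, high) for the Gold bonus in rating points.

    Linearizes the pair model around zero bonus: each rating point of Gold bonus adds
    about 2 * K * p * (1 - p) expected Gold wins per pair, where p is the favorite's
    expected score at the given mismatch.
    """
    p = elo_expected(mismatch)
    wins_per_point = pairs * 2.0 * K * p * (1.0 - p)
    sd_wins = math.sqrt(2.0 * pairs * p * (1.0 - p))
    estimate = (gold_wins - pairs) / wins_per_point
    half_width = z * sd_wins / wins_per_point
    return estimate - half_width, estimate, estimate + half_width


low, est, high = gold_advantage_interval(3839, 3851, 192)  # H v B row of the table
print(f"estimate {est:.2f}, 95% interval ({low:.1f}, {high:.1f})")
```

Like the earlier guesstimate, this treats the average mismatch as if it applied to every pair. Using z = 1.645 instead of 1.96 gives a one-sided 95% upper bound.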

Title: Re: Gold advantage impossible to measure?
Post by Fritzlein on Dec 7th, 2005, 9:50am

on 12/06/05 at 17:27:41, Ryan_Cable wrote:
Well, at least we can now say with 95% certainty that the color advantage is less than 10 points for games with one or more humans.

That narrow range holds if you are lumping hvh games in with hvb games.  For the purists who only think hvh games are relevant, the 95% confidence interval is a bit wider.   I'm wavering in my own mind as to how convinced I am that the colors are equal.  The statistics are getting fairly strong, but maybe the data we have so far isn't as relevant as data that is yet to come.

Some day, if the evidence for color equality gets strong enough, one could justify assigning colors completely at random, rather than merely with-some-potential-for-imbalances as in the 2006 WC.  But I suppose that there will always be players who prefer a particular color regardless of what the statistics say, so there will always be an argument for attempting to equalize color assignments.

Title: Re: Gold advantage impossible to measure?
Post by acheron on Dec 9th, 2005, 8:14pm
Another reason gold is less intrinsically advantaged than white is that, unlike in chess, the opponent has the ability to respond to your setup.

So while the gold player must arrange his initial layout blind, the silver player can examine this arrangement and respond accordingly.  Against the bots for example, this can be a sizable advantage, positioning your camel away from the opposing elephant, and ensuring each board subsection is arranged to your advantage.

Title: Re: Gold advantage impossible to measure?
Post by robinson on Dec 12th, 2005, 4:56pm
wow... I just looked at my stats vs paulMertens... maybe that's the only way we can find out where the advantage is..
I have 8 to 12 with gold
and    12 to 2  with silver
knowing that not all of them can count because of some experiments.... ;D

Title: Re: Gold advantage impossible to measure?
Post by omar on Dec 13th, 2005, 1:21am

on 12/09/05 at 20:14:34, acheron wrote:
Another reason gold is less intrinsically advantaged than white is that, unlike in chess, the opponent has the ability to respond to your setup.

So while the gold player must arrange his initial layout blind, the silver player can examine this arrangement and respond accordingly.  Against the bots for example, this can be a sizable advantage, positioning your camel away from the opposing elephant, and ensuring each board subsection is arranged to your advantage.


Indeed, sometimes I wonder if this may actually give silver more of an advantage once we learn more about Arimaa openings.

Title: Re: Gold advantage impossible to measure?
Post by Fritzlein on Dec 13th, 2005, 3:01pm
Here's one way opening theory could eventually favor Silver: What if it turns out that opening with the camel on one flank is an attacking advantage, but it becomes a disadvantage if the other player lines up his elephant opposite to it?  Then Gold wouldn't be able to start with a flank camel, because Silver would put an elephant on the same flank, whereas Silver would still have the ability to start with a flank camel on whichever side is away from the gold elephant.

Title: Re: Gold advantage impossible to measure?
Post by Adanac on Dec 13th, 2005, 6:06pm

on 12/13/05 at 15:01:23, Fritzlein wrote:
Here's one way opening theory could eventually favor Silver: What if it turns out that opening with the camel on one flank is an attacking advantage, but it becomes a disadvantage if the other player lines up his elephant opposite to it?  Then Gold wouldn't be able to start with a flank camel, because Silver would put an elephant on the same flank, whereas Silver would still have the ability to start with a flank camel on whichever side is away from the gold elephant.


I tried that idea with silver once against Robinson including a rabbit on f7 to minimize the impact of a direct elephant charge up the middle (and it worked!) but I'm still not convinced it's a good idea due to the decentralization of the elephant.

http://arimaa.com/arimaa/gameroom/replayFlash.cgi?gid=21919&s=w&client=1

Had I known that I would meet Robinson in round 4 with the silver pieces, I would have waited a few weeks to use this idea   ;)  I've thought of a new idea, but it all depends upon how Robinson sets up  :-X

Title: Re: Gold advantage impossible to measure?
Post by 99of9 on Mar 3rd, 2007, 9:04pm
Nic's question in the bot forum prompted me to think about this again.

Fritz, do you have an easy way to tell if the two opening setups are symmetrical to each other?

If so, you could run these queries again and split them into games where silver responded passively (either a mirror image, or a rotation), and games where silver responded actively (asymmetry w.r.t. the opponent is sometimes indicative of an attempt to gain an advantage by a method similar to that outlined by Fritz and Adanac).

If symmetrical setups still give silver an advantage relative to gold, then I can only see 3 options (in order of likelihood as I see it):
1) This was an errant statistical fluctuation.
2) Our play is so suboptimal that we're actually using our gold initiative to our disadvantage!!
3) Gold has somehow been forced to set up in a zugzwang position!!!

NB: When I say "symmetrical setup", it's not quite the same as Fritz's previous definitions of symmetry, which were related to the symmetry of each player's own pieces with respect to each other.  What I'm talking about is when you can apply either a reflection or a rotation of the gold pieces, and get the silver pieces.

Title: Re: Gold advantage impossible to measure?
Post by Fritzlein on Mar 3rd, 2007, 9:25pm
Nice idea, 99of9.  I don't think I can do it with an Access query, but it would be interesting to segregate the data into games where Silver sets up similarly to Gold (which could hardly be a setup advantage), and games where Silver sets up differently (which could be a setup advantage).

I waver between thinking we are using the Gold advantage sub-optimally, and thinking that it is a statistical fluke.  The notion that Silver actually has an inherent advantage from moving second is a distant third on my list of hypotheses.

Title: Re: Gold advantage impossible to measure?
Post by aaaa on Apr 17th, 2007, 12:10pm
I have a hypothesis that the reason for the apparent slight advantage for Silver may be psychological in nature. Namely, that by having the first move, a player may subconsciously assume too great an advantage (perhaps by analogy with chess) and consequently feel the need to capitalize on it and play more brazenly than the nature of the game would justify, resulting in a shift of advantage to the defensive side. One possible way of testing this theory could be to classify the games based on the first move of Gold, with more elephant moves indicating a more aggressive opening.
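A first pass at this classification could count the Gold elephant's steps in Gold's opening move. The sketch makes several assumptions: moves are stored in Arimaa step notation with the move number already stripped, each step is written as piece letter, starting square and direction (for example "Ed2n"), and capture tokens end in "x". The sample moves are made up.

```python
def elephant_steps(move: str) -> int:
    """Number of steps taken by the Gold elephant ("E") in one move record (assumed notation)."""
    count = 0
    for step in move.split():
        if step.endswith("x"):
            continue  # capture token, not a step
        if step.startswith("E"):
            count += 1
    return count


first_moves = {"game 1": "Ed2n Ed3n Ed4e Hb2n", "game 2": "Hb2n Hg2n Dc2n Df2n"}
for game, move in first_moves.items():
    print(game, elephant_steps(move))
```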

Title: Re: Gold advantage impossible to measure?
Post by seanick on Apr 18th, 2007, 11:17am
I think the symmetry angle might be the reason that silver's advantage is not greater between HvH, and also the reason that the bots are the opposite.

of course, the numbers are still too small to make any real conclusions. but Karl's choice of games seems appropriate and likely to be optimal for the given amount of data.
<ripped off from Karl's post>
Game Type  Pairs  Gold Wins  Mismatch (pts)  Gold Adv. (pts)  # Std. Dev.
---------  -----  ---------  --------------  ---------------  -----------
ALL         4692       4725             189             3.23         0.79
H v B       3839       3851             192             1.45         0.32
B v B        608        630             152            15.1          1.38
H v H        245        244             237            -2.19         0.12
</jack move>
to attempt to explain my thoughts I'll separate them into some questions and my thoughts about their cause:
Q1. Why do bots (vs bots) have a large advantage as gold?
A1. The majority of bots don't respond with a silver position that neutralizes gold's advantage of going first

Q2. Why do humans have less of an advantage vs. bots (than bots vs bots) when playing as gold, and why do bots have less of an advantage over humans (than vs bots)? (these two are not separable by the above numbers, so I'll consider them both here as inseparable. mainly because I'm too lazy to mine that data myself.)
A2. Lower level players on the bot ladder are less knowledgeable on how to best respond with a good silver setup. But this skill is developing, hence there being less of an advantage to be gold in h vs bot than in bot vs. bot. Note that the knowledge of preparing an appropriate setup response is orthogonal to the ability to actually beat the opponent so the gold advantage is not entirely regained when gold is played by the human.

Q3. Why do hvh games appear to show an advantage to being silver? Even more than gold's advantage in hvb?
A3. Because silver can respond appropriately to gold's setup and it is more likely that gold be psychologically affected by silver's response than vs. a bot (which, in some ways, couldn't care less about the opening position.)

this is all very shaky in terms of a root cause analysis but as mentioned many times already, more understanding will come with time (provided games continue to be played).


