Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
Arimaa >> General Discussion >> First Move Advantage vs. Second Setup Advantage
(Message started by: Fritzlein on Nov 15th, 2006, 10:07pm)

Title: First Move Advantage vs. Second Setup Advantage
Post by Fritzlein on Nov 15th, 2006, 10:07pm
A year ago, in another thread, I tried to calculate the advantage of moving first.  I was (and am!) quite convinced that Gold has an advantage from moving first.  However, my naive method of measuring Gold's advantage showed that it was negligible.  I therefore did a much more careful study, using only pairs of games where the same two opponents matched up twice with reversed colors.  The results are as below:


on 12/04/05 at 13:41:49, Fritzlein wrote:

Game Type  Pairs  Gold Wins  Mismatch  Gold Adv.  # Std. Dev.
---------  -----  ---------  --------  ---------  -----------
ALL         4692       4725       189       3.23         0.79
H v B       3839       3851       192       1.45         0.32
B v B        608        630       152       15.1         1.38
H v H        245        244       237      -2.19         0.12

At that time, the only explanation I could think of was that there are too few games to make a judgment, especially in HvH games, so the slight bias to Silver must be a statistical fluke, which will work out in the long run as a bias to Gold.

When I was chatting with Omar today, however, a different explanation suddenly occurred to me.  Gold has the advantage of the first move, but Silver has the advantage of setting up second.  I have always thought of the setup advantage as being negligible compared to the first move advantage, but what if I am wrong?  What if setting up second is in fact more important?

I think my bias comes from the fact that I use the 99of9 setup no matter what.  All the asymmetrical setups recently, however, suggest that it would be very nice to be able to respond to the other guy's setup, either to defend where he is attacking, or to attack where he isn't defending.

If the second setup is actually a greater advantage than the first move, then you would expect that to show up only in human games, because none of the bots adjust their setup based on what the other player has done (except to keep elephants in different files).  By the same logic, the bot vs. bot games would measure purely the first move advantage.

I therefore quickly resurrected my naive query to determine Gold advantage.  The way it works is that I calculate over all the rated games how many games Gold was expected to win, and how many games Gold did win.  If Gold won more games than expected, I would add a few points to the ratings of all Gold players and calculate again, until their expected number of wins came up to their actual number of wins.  It turns out the magic number is 3.5 rating points.  If I add 3.5 rating points to every Gold player's rating in the database, then Gold's expected wins are equal to the actual wins.
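
As a rough illustration, the adjust-and-recalculate loop described above could be sketched in Python like this.  It is not the query actually used: it assumes the standard Elo expectation formula on a 400-point scale, a hypothetical (gold_rating, silver_rating, gold_won) tuple per game, and bisection in place of the manual "add a few points and try again".

```python
def expected_score(rating_diff):
    """Elo-style win probability for a player rated rating_diff points higher."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def gold_offset(games, lo=-400.0, hi=400.0, tol=0.01):
    """Find the bonus (in rating points) to add to every Gold player so that
    Gold's expected wins equal Gold's actual wins.

    games: iterable of (gold_rating, silver_rating, gold_won) tuples.
           This layout is an assumption, not the real database schema.
    """
    games = list(games)
    actual = sum(1 for _, _, gold_won in games if gold_won)

    def surplus(offset):
        # Expected Gold wins with the bonus applied, minus actual Gold wins.
        expected = sum(expected_score(g + offset - s) for g, s, _ in games)
        return expected - actual

    # surplus() grows with the offset, so bisection homes in on the root.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if surplus(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0
```

A positive return value means Gold won more often than the ratings predicted; a negative one means Silver did.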

Look, however, what happens when I break that down according to game type:


Game Type  Games  Gold Advantage
---------  -----  --------------
ALL        32680     3.5
B v B       4566    10.0
B v H      12830     8.4
H v B      13070     0.8
H v H       2214   -25.7
1600+ HvH   1391   -26.5


For HvH games, the Silver advantage is huge!  The effect is so pronounced, I am going to have to revisit my more careful methodology, and evaluate games only in pairs.

Before I jump to the conclusion that the setup advantage outweighs the first-move advantage, I should note that the BvH and HvB data somewhat contradict my theory.  If the bots have a setup that is not reactive, then humans playing Gold should have move advantage plus setup advantage, while bots playing Gold should have move advantage minus setup advantage.  Yet for some reason, there appears to be a greater advantage for bots playing Gold against humans than for humans playing Gold against bots.

So maybe I will eventually land back at my idea that humans doing better with Silver than with Gold is just a fluke.  More data is needed, especially given the trend towards asymmetrical setups and away from the 99of9 setup, before we can determine whether move advantage or setup advantage is greater.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Fritzlein on Nov 16th, 2006, 1:02am
Luckily, I had the old code lying around with which I did the paired-games experiment before.  I repeated it now on a larger set of data.  The results are as follows:


Game Type  Pairs  Gold Wins  Mismatch  Gold Adv.  # Std. Dev.
---------  -----  ---------  --------  ---------  -----------
ALL        11222      11231       212        0.4         0.14
B v B       1676       1684       190        2.2         0.32
H v B       8972     9001.5       213        1.6         0.53
H v H        574      545.5       258      -30.3          2.2


Here I did not calculate the Gold advantage by tweaking the ratings of players to match expected results to actual results.  Instead I calculated the Gold advantage from the number of extra rating points needed, at the size of the average mismatch, to tip the winning percentage as much as it was tipped.  Thus if the players had inaccurate ratings it won't throw off the result by much; the ratings were only used to calculate roughly how mismatched the players are.
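
One way to read the calculation above is as a first-order approximation, sketched below.  The Elo formula with a 400-point scale, the linearisation, and the function names are all assumptions made for illustration; the original code may have worked differently, so the output need not match every posted figure.

```python
import math


def win_prob(rating_diff):
    """Elo-style win probability for a player rated rating_diff points higher."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def gold_advantage(pairs, gold_wins, mismatch):
    """Approximate Gold's edge in rating points from paired-game totals.

    pairs:     number of colour-reversed game pairs (two games each)
    gold_wins: total Gold wins over those games
    mismatch:  average rating gap between the two players
    """
    p = win_prob(mismatch)
    # How much one extra rating point for Gold shifts the win probability
    # of a single game played at this mismatch (derivative of the Elo curve).
    slope = math.log(10) / 400.0 * p * (1.0 - p)
    # With no colour advantage Gold wins exactly one game per pair on average.
    extra_wins_per_game = (gold_wins - pairs) / (2.0 * pairs)
    return extra_wins_per_game / slope
```

For the ALL row (11222 pairs, 11231 Gold wins, average mismatch 212) this sketch gives roughly 0.4 rating points.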

I also used the number of game pairs to calculate how statistically significant the results are.  Generally one says that a result is statistically significant if there is less than a 5% probability that it could happen by chance.  Since the HvH result is 2.2 standard deviations from the mean, it has less than a 5% chance of being a fluke.  In all probability, Silver has an advantage over Gold in HvH games, and it seems to be worth about 30 rating points.  In HvB games, or BvB games, or in all games combined, the results are not statistically significant, and we might as well say the sides are even.

I don't believe that the second setup advantage is greater than the first move advantage, certainly not by 30 rating points, but that's what the data seems to be saying.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by RonWeasley on Nov 16th, 2006, 7:27am
When muggle players invite each other, does the inviter usually offer to play silver as a courtesy?  This is what I see, but I haven't played enough games to really know.  If this is true, do inviters tend to be winners?  Avid players gain more experience and tend to improve.  If an improving player tends to be silver, he would play better than his rating, possibly accounting for some of the advantage you are measuring.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Fritzlein on Nov 16th, 2006, 8:01am

on 11/16/06 at 07:27:07, RonWeasley wrote:
When muggle players invite each other, does the inviter usually offer to play silver as a courtesy?

It was my suspicion as well that the stronger player is more often Silver.  I personally have played as Silver in 54.4% of my games.  That is why I used the second methodology, in which a single game can't count by itself.  In this more careful methodology, I only count pairs of games in which the same two players compete with colors reversed in the second game.  If I play someone twice with Silver and once with Gold, one of my games with Silver is disregarded, and only the other two are counted.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Microbe on Nov 16th, 2006, 8:42am
I remember once reading that you believed Gold had an advantage Fritz, and I wondered if Silver had a way of counteracting and balancing it by setting up second. But I knew I could never prove it and that you probably had data or other ideas to back you up.

However, after seeing this data it's nice to see that I may have at least been a little right. Very interesting results there. Looking forward to further studies and possible results.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by PMertens on Nov 16th, 2006, 10:07am
That's a nice excuse to remind Fritzl that I have a 50% winning ratio against him ... when I play silver ...


Quote:
It was my suspicion as well that the stronger player is more often Silver.  I personally have played as Silver in 54.4% of my games

hmm .. I played silver 58.2% (HvH) ... (and won 70% of silver games ... while only 53% of gold games)

Counting all my opponents with a two-digit number of games, only against Adanac, 99of9 and Omar do I have a better ratio with gold than with silver.

But more questions to the actual question:

How can a purely defensive player gain advantage from first move ?
How can an aggressive gold setup (EHH, phant behind trap) reduce silver's options to pure reaction ?
(While only a 08/15 gold setup actually gives silver all options to choose and reduce gold's advantage)

I personally think that the advantage is very dependent on the players/styles and cannot safely be calculated as a sum over everything.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Fritzlein on Nov 16th, 2006, 10:24pm

on 11/16/06 at 10:07:42, PMertens wrote:
I personally think that the advantage is very dependent on the players/styles and cannot safely be calculated as a sum over everything.

Certainly the advantage of going first depends on the styles of the players.  If both players use the 99of9 setup every game, like I have a tendency to do, it seems to me the first player must have the advantage.

But for every player with a Gold advantage of 10 rating points, there must be a player with a Gold advantage of -70 rating points to make it average out.  The fact that the effect varies from player to player means that for some it is even more pronounced than it is for the average, which is even more startling to me.


Quote:
How can a purely defensive player gain advantage from first move ?

There are no purely defensive players.  I'm about the most defensive player there is, and I can gain an advantage from the first move by dragging a rabbit first.


Quote:
(While only a 08/15 gold setup actually gives silver all options to choose and reduce gold's advantage)

What is a 08/15 gold setup?

One possible explanation of the Silver advantage is that current opening theory recommends doing something bad.  For example, suppose it is bad to launch an EH attack, but everyone tries to do it because they think it is good.  Since Gold has the first move, Gold is more often successful getting an EH attack going, and therefore loses more often than Silver.  That could account for my results above where BvH games have an 8.4 point Gold rating advantage whereas HvB games have a 0.8 point Gold rating advantage: the humans are using their first move to do something counter-productive.

I like this explanation because it allows me to claim that Gold theoretically has the true first-move advantage and claim that rabbit-dragging is the only sound opening strategy, and claim that I am respecting the evidence rather than stubbornly clinging to old-fashioned ideas that have already been proven wrong.  ;-)

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by 99of9 on Nov 16th, 2006, 10:38pm

on 11/16/06 at 08:01:32, Fritzlein wrote:
It was my suspicion as well that the stronger player is more often Silver.  I personally have played as Silver in 54.4% of my games.  That is why I used the second methodology, in which a single game can't count by itself.  In this more careful methodology, I only count pairs of games in which the same two players compete with colors reversed in the second game.  If I play someone twice with Silver and once with Gold, one of my games with Silver is disregarded, and only the other two are counted.


What if I play silver 10 times against someone while they're learning.  Then we play 10:10 when they have become as strong a player as me.

(The first?) 10 silver games count, I have a very good record as silver.  10 gold games count, I have an even record as gold.

Therefore it is better to have the silver pieces than the gold?

I don't mean to cut you down on this, but removing statistical bias is a hard job!

[btw... this isn't just hypothetical, from memory I behaved very like this against some learners (Belbo springs to mind)]

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Fritzlein on Nov 16th, 2006, 11:36pm

on 11/16/06 at 22:38:10, 99of9 wrote:
What if I play silver 10 times against someone while they're learning.  Then we play 10:10 when they have become as strong a player as me.

(The first?) 10 silver games count, I have a very good record as silver.

Good question, but I did anticipate this.  I only count a game if the very next game between the two players has the colors reversed.  So if you play Silver ten times against someone, and then Gold ten times against the same person, only the middle two games count from all twenty games.  Here is a sample of which games I would count from a string of your games against a particular opponent, with only your color given:

S
S
S *
G *
S
S *
G *
G
G *
S *
G *
S *
G

where the asterisks indicate the paired games.
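
The selection rule illustrated above can be sketched as a single left-to-right scan.  This is a reconstruction from the sample rather than the original code; the function name is made up, and it assumes that a game used in one pair is never reused in another.

```python
def paired_indices(colors):
    """Return index pairs of consecutive games with reversed colours.

    colors: one player's colour in each game against a single opponent,
            in the order played, e.g. "SSSG" or ["S", "S", "S", "G"].
    """
    pairs = []
    i = 0
    while i + 1 < len(colors):
        if colors[i] != colors[i + 1]:
            # The very next game has the colours reversed: count both games.
            pairs.append((i, i + 1))
            i += 2  # both games are used up
        else:
            # Same colour again: this game is disregarded.
            i += 1
    return pairs
```

Applied to the thirteen-game sample above, the scan picks out the same eight games that carry asterisks, and ten Silver games followed by ten Gold games yield only the middle pair.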


Quote:
I don't mean to cut you down on this, but removing statistical bias is a hard job!

I would be grateful if you would find a flaw in my methodology, because I don't want to believe the result!

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by 99of9 on Nov 17th, 2006, 12:04am

on 11/16/06 at 23:36:29, Fritzlein wrote:
Good question, but I did anticipate this.  I only count a game if the very next game between the two players has the colors reversed.

Excellent, I'll have to think harder then.


Quote:
because I don't want to believe the result!

:-) neither do I.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by woh on Nov 17th, 2006, 9:11am
You might calculate the advantage of either side by giving each player two ratings, one for when he's playing gold and one for silver. Then you calculate their current ratings applying their gold or silver rating as appropriate. If more players end up with a higher gold rating then you might have an indication that the first move advantage is greater than the second setup advantage.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by 99of9 on Nov 17th, 2006, 3:18pm
that's a nice alternative woh

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by IdahoEv on Nov 17th, 2006, 4:31pm

on 11/17/06 at 09:11:07, woh wrote:
You might calculate the advantage of either side by giving each player two ratings, one for when he's playing gold and one for silver. Then you calculate their current ratings applying their gold or silver rating as appropriate. If more players end up with a higher gold rating then you might have an indication that the first move advantage is greater than the second setup advantage.


It would be interesting to chart how separate gold vs. silver ratings change over time.   It may be that the advantage has varied as Arimaa opening theory has evolved.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Fritzlein on Nov 17th, 2006, 11:32pm
I realized that I could calculate the standard deviation more accurately than by estimating it based on the average rating difference.  I simply summed the variance of each game and took the square root.  Unfortunately, for HvH games, the Silver advantage still comes out to 2.12 standard deviations, which means there is only a 3.4% chance of such an extreme result if Arimaa is really fair.
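
A minimal sketch of that per-game sum, assuming each game's outcome is treated as an independent coin flip whose bias comes from the Elo formula (400-point scale).  The function names and the list-of-rating-gaps input are illustrative assumptions.

```python
import math


def win_prob(rating_diff):
    """Elo-style win probability for a player rated rating_diff points higher."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def summed_std_dev(rating_gaps):
    """Standard deviation of the total number of Gold wins.

    rating_gaps: Gold's rating minus Silver's rating, one entry per game.
    Each game contributes a variance of p * (1 - p); variances of
    independent games add, and the square root of the total is the
    size of one standard deviation.
    """
    variance = sum(win_prob(d) * (1.0 - win_prob(d)) for d in rating_gaps)
    return math.sqrt(variance)
```

Evenly matched games contribute the most variance (0.25 each), while lopsided games contribute very little.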

I may calculate the Gold and Silver ratings of each player if the fancy strikes me, but that not only seems like a less accurate way to measure, it also seems almost certain to confirm some level of Silver advantage.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Fritzlein on Nov 17th, 2006, 11:37pm

on 11/17/06 at 16:31:29, IdahoEv wrote:
It would be interesting to chart how separate gold vs. silver ratings change over time.   It may be that the advantage has varied as Arimaa opening theory has evolved.

Yes, it seems that the Silver advantage has only recently materialized, whereas before it was more even.  That could roughly correspond to the emergence of unbalanced setups whereas before symmetrical (or at least balanced) setups predominated.  However, the smaller the subsets we break the data into, the less statistically significant the results on each piece.  I'm just happy we're getting enough HvH games to do statistics on the whole set.  (even though the results seem to say everything I know about Arimaa is wrong)

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by chessandgo on Nov 18th, 2006, 5:22am
Interesting statistics. I have no idea whether gold or silver should have an intrinsic advantage, but whichever way it can only be a VERY small one, at least with our current way of playing. As many tactical blunders and large strategic errors occur during the games, I don't see how the starting advantage could be relevant to the final result.

We might have to wait until we get a lot stronger before getting a precise statistical or metaphysical answer, don't you think ? Or does a large enough dataset show it in spite of the errors occurring in any game ?

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Fritzlein on Nov 18th, 2006, 9:25am

on 11/18/06 at 05:22:26, chessandgo wrote:
We might have to wait until we get a lot stronger before getting a precise statistical or metaphysical answer, don't you think ?

To get a precise metaphysical answer, we need to be sure we are playing well, which we aren't.


Quote:
Or does a large enough dataset show it in spite of the errors occurring in any game ?

It already is a large enough dataset to overcome the randomness from game to game, or at least that's what 2.12 standard deviations says.  A 30-point rating advantage is just about the smallest that can be detected at this point.  We would need four times as many games to detect half that advantage, one hundred times as many games to detect one-tenth that advantage, etc.
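
The scaling in the last sentence follows because the surplus of wins grows in proportion to the number of games, while one standard deviation grows only with its square root.  A tiny sketch (the function name is illustrative):

```python
def games_multiplier(advantage_fraction):
    """How many times more games are needed to detect an advantage that is
    advantage_fraction times the size of the one detectable now, at the
    same number of standard deviations.

    The surplus of wins scales like n * advantage and the standard deviation
    like sqrt(n), so the number of standard deviations scales like
    advantage * sqrt(n).  Keeping that constant means n must grow as
    1 / advantage_fraction ** 2.
    """
    return 1.0 / advantage_fraction ** 2
```

Halving the advantage gives a multiplier of 4, and cutting it to one-tenth gives a multiplier of about 100.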

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by chessandgo on Nov 21st, 2006, 12:44pm
Here is another attempt to explain this silver advantage. The player inviting usually takes silver, and if he/she invites, it means he/she's ready to play, even hungry to do so, not too tired, not immersed in doing something else, while the other player might not be under as good conditions of play, even though he/she accepts.
When someone challenges me I almost never decline, and I think it might explain the percentage difference between my gold and silver results (I don't know who has the advantage, but I'm really convinced that it's too small to really bias a game result, or even the results of ten thousand games).

[Casting a quick glance at some top-rated players' statistics, it appears that they indeed have better results with silver than gold, with the exceptions of yourself, Karl, and Adanac. Are you too never tired ?:) ]

Do you think it makes sense ?

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by chessandgo on Nov 21st, 2006, 12:44pm
*Are you two ...  :-X

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by 99of9 on Nov 21st, 2006, 3:06pm

on 11/21/06 at 12:44:21, chessandgo wrote:
Casting a quick glance at some top-rated players' statistics, it appears that they indeed have better results with silver than gold, with the exceptions of yourself, Karl, and Adanac.

If you limit my games to HvH, or if you only consider rated games, I have a big advantage with gold.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by chessandgo on Nov 21st, 2006, 3:35pm
Indeed 99, you have a good 3 extra percent with gold compared to silver ... I just saw that Omar has one extra % as well ... and I guess many other players should be in the same case ...

My theory crumbles ... :)

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Fritzlein on Mar 26th, 2008, 10:02am
All the talk about evening up the colors in tournament pairing and possibly changing the setup rules to equalize for Gold's advantage made me revisit this thread.  I remember now how most queries on the database to determine rating advantage are invalidated by the inaccuracy of the gameroom ratings and/or the fact that the stronger players tend to play Silver more often.  But there is one methodology I still haven't found any holes in, and that is using only pairs of games where the two players have reversed colors.  Rated games only, of course, to exclude timeouts that got unrated and exclude experiments.  When I ran that 16 months ago, the results were


on 11/16/06 at 01:02:00, Fritzlein wrote:

Game Type  Pairs  Gold Wins  Mismatch  Gold Adv.  # Std. Dev.
---------  -----  ---------  --------  ---------  -----------
ALL        11222      11231       212        0.4         0.14
B v B       1676       1684       190        2.2         0.32
H v B       8972     9001.5       213        1.6         0.53
H v H        574      545.5       258      -30.3          2.2

Note that I express the advantage of being Gold in rating points.  I decided to try this method again, this time only on game results after 11/15/06, in an attempt to detect any trends in game play.  Perhaps the way we used to play gave an advantage to Silver, but the times have changed and now the playing style gives an advantage to Gold.  The results according to the new data only:

Rated Pairs from 11/15/2006 to 3/23/2008

Game Type  Pairs  Gold Wins  Mismatch  Gold Adv.  # Std. Dev.
---------  -----  ---------  --------  ---------  -----------
ALL        10347      10392       251        2.4         0.79
B v B        143        151       128       22.2         0.95
H v B       9872       9896       251        1.4         0.43
H v H        332        345       309       27.5         1.33

Sadly, limiting the dataset to only games of the last sixteen months means that all of the results are statistically insignificant.  On just that data, the Gold advantage (if any) is too small to measure.  Most of the data is HvB, which shows a negligible advantage to Gold, whereas the BvB and HvH data, which shows a larger advantage to Gold, is on so few games that it could just be a fluke.  We are looking for 2 standard deviations but have at most 1.33.

The one thing this data does show to me, however, is that in November 2006 when I measured a "statistically significant" advantage for Silver in HvH games, it was probably a fluke after all, just accidentally among those 5% of times when the result is an outlier due to chance.

In an effort to achieve greater statistical significance, I gave up on trying to detect a recent trend.  (Craps players and stock pickers are always sure of finding short-term "trends".  :P)  Instead I ran the test on the entire database, i.e. all pairs of games since the beginning of time.


Rated Pairs from 11/22/2002 to 3/23/2008

Game Type  Pairs  Gold Wins  Mismatch  Gold Adv.  # Std. Dev.
---------  -----  ---------  --------  ---------  -----------
ALL        21926      21985       231        1.4         0.69
B v B       1872       1896       185        5.8         0.88
H v B      19112    19163.5       233        1.4         0.65
H v H        942      925.5       278      -10.9         1.05


Even taking all the data together, we have no statistical significance.  For all we know the advantage of setting up second exactly balances the advantage of moving first.

I persist in my belief that Gold probably has the advantage, but it appears to be so small that we won't be able to measure it until we have a game database five or ten times the size we have now.  I predict that in the long run, between evenly matched players, Gold will win less than 51%, which is to say playing Gold is worth less than 7 rating points.
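
The link between a 51% score and roughly 7 rating points comes from inverting the usual Elo expectation formula.  A small sketch, assuming the standard 400-point scale (the function names are illustrative):

```python
import math


def rating_points(win_fraction):
    """Rating edge that corresponds to a given expected score between
    otherwise equal players, on a 400-point Elo scale."""
    return 400.0 * math.log10(win_fraction / (1.0 - win_fraction))


def win_fraction(points):
    """Inverse of rating_points(): expected score for a given rating edge."""
    return 1.0 / (1.0 + 10.0 ** (-points / 400.0))
```

Under that assumption rating_points(0.51) comes out just under 7.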

There is a possibility that once we know what we are doing, the advantage of the first move will be magnified.  Maybe now our incompetence adds so much noise to the signal, we simply can't see the true worth of tempo.  Note, however, that this is not true for chess.  In amateur chess games white scores about 55% and in expert chess games white also scores about 55%.  The difference is just that the chess experts have more draws.  If Arimaa is similar, and the balance between the colors doesn't change with increasing skill, then we will perpetually have the luxury of not worrying too much about color balance in tournaments, or changing the setup rule to even things out.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by The_Jeh on Mar 26th, 2008, 12:08pm
Let me try to understand your math. Would it be correct to say that, if gold and silver were in fact completely even, the chance of seeing results as good or better for gold in the ALL category is 28.5%, which is far from statistically significant?

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Fritzlein on Mar 26th, 2008, 12:54pm

on 03/26/08 at 12:08:17, The_Jeh wrote:
Let me try to understand your math. Would it be correct to say that, if gold and silver were in fact completely even, the chance of seeing results as good or better for gold in the ALL category is 28.5%, which is far from statistically significant?

Well, my math is kind of fuzzier than that.  Let me walk through the all case step by step.

We have 21926 pairs of games, i.e. 43852 games total, of which Gold won 21985.  If Gold and Silver were completely equal, then it would be just like flipping a coin to test if the coin is fair.  If you do a large number of coin flips, the results will fall into a normal distribution (whether or not the coin is fair) with a standard deviation of sqrt(npq), where n is the number of trials, p the probability of heads, and q is the probability of tails.

Let's say we want to test the null hypothesis that the coin is fair.  If we flip a fair coin 43852 times, we expect heads 21926 times, with a standard deviation of sqrt(43852*(1/2)*(1/2)) = 104.7.  I'm sure you know how to convert standard deviations into probabilities, e.g. there is 68.3% chance of the actual number of heads falling within 104.7 of the mean of 21926.  Our actual number of Gold wins was only 59 away from the mean, i.e. only 0.56 standard deviations.  There is a 28.66% chance that Gold would do this well or better just due to random chance, which I think is the number you are referring to in your post.

However, the Gold and Silver players are usually not evenly matched, which affects the significance of the result.  Suppose the data comes from repeated games of players who are 400 rating points apart.  Usually the stronger player will win both as Gold and as Silver, since he has 10 to 1 odds if there is no color advantage.  This helps force the results towards being 50-50 between Gold and Silver, i.e. it conceals the color advantage.  If the observed results deviate from even, we should attribute more significance to them.  The standard deviation for this mismatched case is sqrt(43852*(10/11)*(1/11)) = 60.2, so an observed difference of 59 extra wins for Gold is almost a full standard deviation.

Unfortunately, the true skill gap between pairs of players is unknown, and only inaccurately measured by game room ratings.  Furthermore, the skill gap varies from game to game, so I really should sum up the variance introduced by each pair of games, but I was lazy and instead just calculated an average mismatch.  Over all games the stronger player is rated an average of 231 points higher than the weaker player.  So I did my calculation of the size of one standard deviation as if each game had a 231-point mismatch.

The last column in the row for all games is the number of standard deviations.  For all games, 0.69 standard deviations means that Gold would have scored 59 extra wins by chance with 24.5% probability.  But actually, since we don't know whether Gold or Silver has an advantage, we really should use a two-tailed statistic, and say that there was a 49% chance of a result at least this extreme.
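
The walk-through above can be retraced with a short script.  It is a sketch under the same simplifications the post describes: the normal approximation to the binomial, no continuity correction, and a single rating gap applied to every game via the Elo formula (400-point scale).  The helper names are made up for illustration.

```python
import math


def win_prob(rating_diff):
    """Elo-style win probability for a player rated rating_diff points higher."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def std_dev_of_wins(games, p):
    """Binomial standard deviation sqrt(n * p * q)."""
    return math.sqrt(games * p * (1.0 - p))


def upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


games = 43852                    # 21926 pairs of games
extra_gold_wins = 21985 - 21926  # 59 more Gold wins than an even split

for label, gap in [("even players", 0), ("400-point gap", 400), ("231-point gap", 231)]:
    sd = std_dev_of_wins(games, win_prob(gap))
    z = extra_gold_wins / sd
    print(f"{label}: sd={sd:.1f}  z={z:.2f}  "
          f"one-tailed={upper_tail(z):.3f}  two-tailed={2 * upper_tail(z):.3f}")
```

The first two cases give standard deviations of about 104.7 and 60.2, and the 231-point case gives about 0.69 standard deviations.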

Not only is that number not statistically significant, it is right in the middle of "average extremeness", so to speak.  If the actual game results had been exactly half Gold wins and half Silver wins, or only off by one or two, that would be a really spooky result in a different way from showing a color advantage.  It would be like strong evidence of cosmic Karma trying to even out wins.  I would have to double check my code for errors or look for reasons why the split was so exact and there wasn't more random variation.

All in all, the overall result of 0.69 standard deviations is about as non-informative as you can get.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Fritzlein on Mar 26th, 2008, 1:09pm

on 12/14/03 at 07:43:07, clauchau wrote:
The most elementary strategy consists in randomly picking up a valid step among the immediately available steps, making the selected step on the board, and repeating this four times for every move.

random stepper / random stepper

Gold won 50.3%,  Silver won 49.7%

The winner reached the goal 62.5%
The loser moved an opposing rabbit to the goal 26.0%
The loser was unable to move 11.5%
Loss by 3-times repetition: 0 (no such loss).

Shortest game: 11 half-moves
Mean length: 76.9 half-moves (standard deviation = 21.7)
Longest game: 194 half-moves

(figures based on 100,000 games run in 4 minutes)

The standard deviation for 100,000 games between equal players is 158 wins, so an actual result of 50,300 wins for Gold is almost two standard deviations, i.e. on the borderline of statistical significance.  Clauchau, would you be willing to run a larger test so we can say with greater confidence that moving first is an advantage to a random stepper?

Won't it be a hoot if we someday measure that among equally matched experts Gold has a 50.3% chance of winning?  That would be an advantage of 2 rating points to Gold :P.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by The_Jeh on Mar 26th, 2008, 1:44pm
The number I was referring to was the binomial probability if gold and silver are each equally likely to win. It's been a while since I took statistics and I had forgotten how to calculate what the standard deviation is. Thanks for the review.

Perhaps you could get some perfectly even games by pitting a bot against itself several times. The results might be confounded by the style of the bot itself, but the results could be interesting.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by mistre on Mar 26th, 2008, 3:12pm
Or we could just have Karl play a match against himself.  Whichever color wins has the advantage.  If it is a draw, then there is no advantage for either color.   ;)

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by The_Jeh on Mar 26th, 2008, 3:41pm
Let's say that eventually we find that gold wins 60% of the time and silver 40% in master-level games. That is significant and I'm sure we would want a rule change. Or would we? Suppose, hypothetically, that we solve the game and discover that despite gold's 60% rate of victory, it is actually silver who could force a win with perfect play. With this new knowledge, would we want to change the rules or not?

My guess is that we would. But why does gold win more often? It must be because it is easier to make the best moves as gold than it is as silver. But why? And how do we measure this "difficulty in choosing?" Is it that gold's move choices lead to a higher average outcome than silver, even though silver's highest forced outcome is better than gold's highest forced outcome? Or is it for some reason easier to spot a good move for gold than for silver?

When a game has the intuitive feel of being mechanically perfect, such as chess, I am superstitious of changing its rules just for the sake of statistics obtained from mortal players.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Fritzlein on Mar 26th, 2008, 3:43pm

on 03/26/08 at 13:44:18, The_Jeh wrote:
The number I was referring to was the binomial probability if gold and silver are each equally likely to win. It's been a while since I took statistics, and I had forgotten how to calculate the standard deviation. Thanks for the review.

Ah, that's another way in which I'm careless: I assume the normal curve as an approximation of the binomial.  Also I vaguely recall that when I use the continuous approximation instead of the exact discrete, I should have my boundary at 59.5 instead of 59.  Oh, well.  It's a good thing we got nearly the same answer!
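
The half-unit boundary shift mentioned here is the usual continuity correction. Below is a minimal sketch comparing the exact binomial tail with the normal approximation, with and without that shift. The sample size n = 100 is a hypothetical value chosen for illustration (the post only mentions the count 59), and the function names are made up for this example.

```python
import math

def exact_upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def normal_upper_tail(n: int, boundary: float, p: float = 0.5) -> float:
    """Normal approximation to the upper tail, cut at `boundary`."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    z = (boundary - mean) / sd
    # Upper tail of the standard normal via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))

n, k = 100, 59  # n is an assumed sample size, not taken from the thread

exact = exact_upper_tail(n, k)
plain = normal_upper_tail(n, k)            # boundary at k, no correction
corrected = normal_upper_tail(n, k - 0.5)  # boundary shifted by half a unit

print(f"exact={exact:.4f} plain={plain:.4f} corrected={corrected:.4f}")
```

Which way the boundary moves depends on the question being asked: for "at least k wins" it is k - 0.5, as above, while for "more than k wins" it would be k + 0.5.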

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by aaaa on Apr 13th, 2008, 3:01pm
Based on my calculations, Gold's winning percentage varies wildly depending on how the games are weighted by rating. In order for me to come to a more reliable number, Fritzlein and chessandgo would need to play a bunch of games against each other with alternating colors (cf. Karpov vs. Kasparov).

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by chessandgo on Apr 13th, 2008, 5:27pm

on 04/13/08 at 15:01:31, aaaa wrote:
Based on my calculations, Gold's winning percentage varies wildly depending on how the games are weighted by rating. In order for me to come to a more reliable number, Fritzlein and chessandgo would need to play a bunch of games against each other with alternating colors (cf. Karpov vs. Kasparov).


I fear it would be more like John Doe vs Jean Dupont, but I have nothing against playing a lot of games with Karl :)

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Fritzlein on Jun 10th, 2011, 11:15pm
Nowadays there is a persistent rumor that Silver has an advantage in Arimaa due to the second setup.  I suddenly realized that it had been over three years since my last statistical investigation of the subject.  What if modern playing styles have changed so much that the results have started to change as well?  So I have repeated my former experiment over the most recent three years of data using the same methodology described earlier in this thread.


Rated Pairs from 3/24/2008 to 5/31/2011

Game Type  Pairs  Gold Wins  Mismatch  Gold Adv.  # Std. Dev.
---------  -----  ---------  --------  ---------  -----------
ALL        35329    35451.5       231        1.8         1.13
B v B       1615       1672       203       16.9         1.94
H v B      32966    33019.5       233        0.9         0.51
H v H        748        760       216        8.0         0.74


The draw that appears is this game (http://arimaa.com/arimaa/games/jsShowGame.cgi?gid=74070&s=w) from a time when elimination wasn't always a victory.  None of the results are statistically significant, not even the BvB games where we are pretty sure Gold has an advantage because bots don't use the second setup to react.

To capture the largest dataset possible, I queried for all color-reversed pairs of games since the beginning of time.  Note that this includes 69% of all rated games ever played, which reflects a natural tendency of all participants to swap colors between rematches.  If color assignment were completely random, I would only be picking up 2/3 of the games.


Rated Pairs from 11/22/2002 to 5/31/2011

Game Type  Pairs  Gold Wins  Mismatch  Gold Adv.  # Std. Dev.
---------  -----  ---------  --------  ---------  -----------
ALL        57725    57914.5       231        1.7         1.37
B v B       3491       3571       193       10.7         2.22
H v B      52494      52602       233        1.1         0.82
H v H       1740     1741.5       251        0.5         0.06


At last, a statistically significant result!  In BvB games, Gold has a measurable advantage.  My best guess at the size of the advantage is about 11 Elo points.  But of course that is attributable to bots failing to use the second setup to gain tempi in any way.  Bot developers take note of the points you are giving away.
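
As a rough cross-check on what "number of standard deviations" means here, the sketch below computes a naive z-score under the null hypothesis that each game is a fair coin flip. This is a simplification and not the methodology used for the tables: it ignores the rating mismatch column and the fact that both games of a pair involve the same two opponents, so it will not reproduce the "# Std. Dev." column exactly. It also assumes each pair contributes two games, which is inferred from Gold Wins exceeding Pairs. The function name is hypothetical.

```python
import math

def naive_z_score(pairs: int, gold_wins: float) -> float:
    """Excess Gold wins over an even split, in units of the fair-coin std. dev."""
    games = 2 * pairs              # assumption: two games per color-reversed pair
    expected = games / 2           # even split under the null hypothesis
    sd = math.sqrt(games) / 2      # sqrt(n * 0.5 * 0.5)
    return (gold_wins - expected) / sd

# (pairs, gold wins) copied from the 11/22/2002 to 5/31/2011 table
rows = {"B v B": (3491, 3571), "H v H": (1740, 1741.5)}

for label, (pairs, gold_wins) in rows.items():
    print(f"{label}: {naive_z_score(pairs, gold_wins):.2f} standard deviations")
```

Even this crude version puts BvB near two standard deviations and HvH near zero, which is the same qualitative picture as the table.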

One objection to my methodology might be that I haven't looked at the best dataset, namely games between strong human players.  I doubt that it is the best dataset, because it is smaller and thus less likely to give a significant result.  Nevertheless, leaving no stone unturned, I looked.  I limited the games to both players being human and rated over 1900.  This might have a weird edge effect in that one game will count but the rematch won't count because one of the players fell below 1900 rating in the meantime.  That's the way my code works, so hopefully the effect is small.


Rated Good Game Pairs from 11/22/2002 to 5/31/2011

Game Type  Pairs  Gold Wins  Mismatch  Gold Adv.  # Std. Dev.
---------  -----  ---------  --------  ---------  -----------
good         524        529       181        4.3         0.35


We can't reject the null hypothesis, and not just because the data set is small.  Even in relative terms the colors are coming out even.  

As I reread what I wrote in years past, I was quite persistent in my belief that Gold had an inherent advantage.  Now I am less sure.  The BvB data can be neglected, because bots are dumb.  Otherwise the data are totally inconclusive.  The only thing I feel confident in saying at the moment is that the statistics give us no reason whatsoever to prefer one color over the other.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Saposhiente on Apr 3rd, 2013, 6:41pm
Included in your data is probably a lot of symmetric 99of9 vs 99of9 setups, where Silver does nothing to use his second setup advantage. Would it be possible to filter to only include games where one player uses a different setup than the other, varied by more than just swapping cats and dogs?

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Boo on Apr 4th, 2013, 1:48am
High-rated postal games have the fewest blunders, so long-term tendencies should be most visible there.

E.g., I took the 15 best players according to current postal ratings:
Player               Gold%  Silver%  Difference
-------------------  -----  -------  ----------
Fritzlein (2823)        92       90          +2
Alfons (2527)           76       96         -20
chessandgo (2493)       88       83          +5
Adanac (2420)           73       62         +11
99of9 (2377)            80       66         +14
Boo (2331)              70       89         -19
bot_briareus (2314)     80       63         +17
Hippo (2281)            84       70         +14
jdb (2280)              67       83         -16
RonWeasley (2225)       67       73          -6
browni3141 (2215)       52       91         -39
rabbits (2211)       too few games (4)        0
ChrisB (2209)           62       75         -13
robinson (2208)         50       54          -4
Asger (2200)           100      100           0
-------------------  -----  -------  ----------
Total difference                            -54

Of course it would be better to calculate by the number of games... Before making this table, my guess was that silver has a slight advantage (because I do better with silver ;) ). But most probably I am wrong, because the players preferring silver have played fewer games (I think).
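
To illustrate why calculating by number of games can matter, here is a small sketch contrasting the per-player average of differences (what the table above does) with a pooled, games-weighted figure. The records are made-up placeholders, not real postal results, since the thread gives only percentages and not game counts. Both function names are hypothetical.

```python
def unweighted_gold_minus_silver(records):
    """Average of per-player (Gold% - Silver%), every player counting equally."""
    diffs = [100 * gw / gg - 100 * sw / sg for gw, gg, sw, sg in records]
    return sum(diffs) / len(diffs)

def pooled_gold_minus_silver(records):
    """Gold% - Silver% after pooling all games, so busy players weigh more."""
    gold_wins = sum(r[0] for r in records)
    gold_games = sum(r[1] for r in records)
    silver_wins = sum(r[2] for r in records)
    silver_games = sum(r[3] for r in records)
    return 100 * gold_wins / gold_games - 100 * silver_wins / silver_games

# Hypothetical records: (gold_wins, gold_games, silver_wins, silver_games)
hypothetical = [
    (45, 50, 44, 50),  # many games, Gold slightly better
    (1, 4, 3, 4),      # very few games, Silver much better
]

print(f"unweighted: {unweighted_gold_minus_silver(hypothetical):+.1f}")
print(f"pooled:     {pooled_gold_minus_silver(hypothetical):+.1f}")
```

With these placeholder numbers, a player with only eight games drags the unweighted figure strongly toward Silver, while the pooled figure stays close to even.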



Title: Re: First Move Advantage vs. Second Setup Advantag
Post by browni3141 on Apr 4th, 2013, 12:57pm

on 04/04/13 at 01:48:53, Boo wrote:
(because I do better with silver ;) )

Well just look at that horrible gold setup :P. Yeah, yeah, I know I lost to it.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Ail on Feb 20th, 2014, 7:11pm
Is the statement that Bots would not make use of the second-setup-advantage still thought to be true?

I'm pretty sure that at least Sharp does it all the time, since I even imitated his way of doing it, which kinda boils down to:
Put the elephant on the line of the opponent's camel and place your own camel as far away as possible from the opponent's elephant.
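
The rule of thumb above can be sketched in a few lines. This is only one reading of it, not Sharp's actual code: "line" is taken to mean file (a to h), distance is measured in files only, the choice of rank is not modeled, and ties go to the file nearer the a-side. The function name is hypothetical.

```python
FILES = "abcdefgh"

def silver_response(gold_elephant_file: str, gold_camel_file: str) -> dict:
    """Pick files for Silver's elephant and camel in response to Gold's setup."""
    # Elephant goes on the same file as the opposing camel.
    elephant_file = gold_camel_file

    # Camel goes on the remaining file farthest from the opposing elephant.
    gold_e = FILES.index(gold_elephant_file)
    candidates = [f for f in FILES if f != elephant_file]
    camel_file = max(candidates, key=lambda f: abs(FILES.index(f) - gold_e))

    return {"elephant": elephant_file, "camel": camel_file}

# Gold elephant on d, Gold camel on e
print(silver_response("d", "e"))
```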

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by johncf1018 on Jan 21st, 2015, 3:58pm
A bit of thread necro but there is an easy theoretical solution to player advantage in Arimaa, consisting of slight modifications to the rules.  My idea (I'm sure others have had it before) is simply to start with an empty board.  Gold places any 2 gold pieces, then silver places any 4 silver pieces.  Players alternate placing pieces 4 at a time.  When a player has no more pieces to place s/he may start moving his pieces.  Play progresses as normal.  When I've suggested this in the past I've always met with the response that this modification isn't needed as there is no significant advantage for either player.  However, it seems like now the data suggests there could be an advantage.  Even if there isn't, the overhead of these rule changes is so minor that I can't see any reason NOT to adopt them aside from tradition - which is a bit of a moot point in a game this young.  In any case, this is how I play with my friends and we like it so I thought I'd perform a bit of thread necro and toss this idea out there again.  
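
The placement order in this proposal can be written out with a short sketch. It assumes the standard 16 pieces per side and that a player with fewer than 4 pieces left simply places what remains. Whether any leftover steps on that turn may be spent moving is not spelled out in the post and is not modeled here. The function name is hypothetical.

```python
def placement_schedule(pieces_per_side: int = 16, first_batch: int = 2, batch: int = 4):
    """Return the list of (player, pieces_placed) turns until both sides are set up."""
    remaining = {"gold": pieces_per_side, "silver": pieces_per_side}
    schedule = []
    player, size = "gold", first_batch  # Gold opens with the smaller batch

    while remaining["gold"] > 0 or remaining["silver"] > 0:
        placed = min(size, remaining[player])
        if placed > 0:
            schedule.append((player, placed))
            remaining[player] -= placed
        # Hand over to the other player; every later batch is the full size.
        player = "silver" if player == "gold" else "gold"
        size = batch

    return schedule

for turn, (player, placed) in enumerate(placement_schedule(), start=1):
    print(f"turn {turn}: {player} places {placed}")
```

Under these assumptions Silver finishes placing first, and Gold's final placement turn puts down only the last two pieces.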

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by aaaa on Jan 21st, 2015, 4:23pm

on 01/21/15 at 15:58:40, johncf1018 wrote:
A bit of thread necro but there is an easy theoretical solution to player advantage in Arimaa, consisting of slight modifications to the rules.  My idea (I'm sure others have had it before) is simply to start with an empty board.  Gold places any 2 gold pieces, then silver places any 4 silver pieces.  Players alternate placing pieces 4 at a time.  When a player has no more pieces to place s/he may start moving his pieces.  Play progresses as normal.  When I've suggested this in the past I've always met with the response that this modification isn't needed as there is no significant advantage for either player.  However, it seems like now the data suggests there could be an advantage.  Even if there isn't, the overhead of these rule changes is so minor that I can't see any reason NOT to adopt them aside from tradition - which is a bit of a moot point in a game this young.  In any case, this is how I play with my friends and we like it so I thought I'd perform a bit of thread necro and toss this idea out there again.  

This is actually pretty close to something Omar himself already proposed as a possible future rule change (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?action=display;board=talk;num=1206399208;start=0#4).

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by browni3141 on Jan 21st, 2015, 4:52pm

on 01/21/15 at 15:58:40, johncf1018 wrote:
A bit of thread necro but there is an easy theoretical solution to player advantage in Arimaa, consisting of slight modifications to the rules.  My idea (I'm sure others have had it before) is simply to start with an empty board.  Gold places any 2 gold pieces, then silver places any 4 silver pieces.  Players alternate placing pieces 4 at a time.  When a player has no more pieces to place s/he may start moving his pieces.  Play progresses as normal.  When I've suggested this in the past I've always met with the response that this modification isn't needed as there is no significant advantage for either player.  However, it seems like now the data suggests there could be an advantage.  Even if there isn't, the overhead of these rule changes is so minor that I can't see any reason NOT to adopt them aside from tradition - which is a bit of a moot point in a game this young.  In any case, this is how I play with my friends and we like it so I thought I'd perform a bit of thread necro and toss this idea out there again.  


I thought the data indicated no advantage whatsoever with any amount of reliability. There is also no good logical argument favoring one side or another. I personally prefer silver. Which side are you implying has the advantage? For me there is more than just tradition as a reason not to adopt a rule change. I don't see any advantage to your rule set over the current one. For all we know your change could increase advantage for one side :P
I don't think your idea would necessarily make Arimaa worse, but I can't see how it would make Arimaa better.

Finally, I personally don't much like this rule or rules like the pie rule. I think achieving equality through imbalance (like setup response advantage vs. first move advantage) is much more interesting than equality through clear balance, although the former is much harder to achieve.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Ail on Jan 22nd, 2015, 3:52am
I indeed see a disadvantage in that rule-change.
It would take more time to set the board up.

I'd rather just start quickly into the game with a standard-setup as gold and a slightly adapted setup as silver instead of spending several turns of setting up the board first.

Title: Re: First Move Advantage vs. Second Setup Advantag
Post by Fritzlein on Jan 22nd, 2015, 9:22am
Taking longer to start the game is a small nuisance, a small price to pay to fix a problem.  The problem itself, however, is absolutely minuscule, to the point that we don't even know which side has the advantage.  Thus the vanishingly small size of the problem makes the price of the solution way too high to be worth paying.


