
Title: Specific UCT advice sought.
Post by ddyer on Aug 30th, 2012, 4:07am
Ok, here's another one.  I noticed that my UCT bot prefers pushing rabbits a lot more than makes sense to me.  I discovered a position where an "elephant suicide" move is greatly preferred, much to my amazement.

After a few hours of tweaking the behavior to try to home in on the problem, I came up with this explanation:  Given an advanced rabbit with anything close to a clear path to the goal, making random moves is very likely to free the rabbit and let it win.  I first saw this in the playouts as an actual suicide move, where gold pulls or pushes the silver rabbit to the goal, but filtering out these moves isn't enough.

My question is: what kind of logic should I consider to bias against unnecessarily freeing the rabbit?

Title: Re: Specific UCT advice sought.
Post by Fritzlein on Aug 30th, 2012, 8:22am
If anyone has figured this out, I haven't heard about it.  My takeaway about UCT for Arimaa is that random playouts are extremely biased toward the player with advanced rabbits for the reason you give: random moves are likely to let a rabbit through.

But it is not just advanced rabbits that are favored by random playouts; it is also more rabbits.  Someone once let two random bots play each other, where one started without an elephant and the other started with one rabbit fewer, but all fifteen remaining pieces starting on the two home ranks.  The bot without an elephant won more often.  (Perhaps you would be interested in replicating the result that a rabbit is worth more than an elephant between two random players?)
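(For anyone who wants to try replicating it, a minimal sketch of such a harness in Python; the engine API here is entirely hypothetical, i.e. initial_position, legal_moves, apply, is_terminal, and winner are made-up names, not from any actual bot:)

import random

def random_playout(pos, max_turns=1000):
    # Both sides play uniformly random legal moves until someone goals.
    for _ in range(max_turns):
        if pos.is_terminal():
            return pos.winner()        # "gold" or "silver"
        pos = pos.apply(random.choice(pos.legal_moves()))
    return None                        # cut off with no result

def gold_win_rate(setup_gold, setup_silver, games=10000):
    # Fraction of decided games won by gold, e.g. gold missing its
    # elephant versus silver missing one rabbit.
    gold_wins = decided = 0
    for _ in range(games):
        winner = random_playout(initial_position(setup_gold, setup_silver))
        if winner is not None:
            decided += 1
            gold_wins += (winner == "gold")
    return gold_wins / decided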

The only known solution appears to be abandoning random playouts as a source of evaluation, and using a hand-coded evaluation at a cutoff depth instead.  Of course, re-introducing hand-coded evaluation removes one of the main reasons for doing UCT in the first place.  Then MCTS competes with alpha-beta only on grounds of effective search tree building, which is a hard sell due to the vastly greater overhead involved in MCTS.  Read here:
http://arimaa.com/arimaa/papers/TomasKozelekThesis/mt.pdf
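(To make the cutoff idea concrete, a minimal Python sketch; the Position API and the evaluate function are hypothetical stand-ins, not Kozelek's actual code:)

import random

PLAYOUT_CUTOFF = 8  # plies of random play before falling back to evaluation

def truncated_playout(pos, side_to_move):
    # Play random moves for a bounded number of plies only.
    for _ in range(PLAYOUT_CUTOFF):
        if pos.is_terminal():
            return pos.winner_score(side_to_move)   # +1 win, -1 loss
        pos = pos.apply(random.choice(pos.legal_moves()))
    # Cutoff reached: use a hand-coded evaluation instead of a playout
    # result, scaled to the same [-1, +1] range the tree expects.
    return evaluate(pos, side_to_move)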


Title: Re: Specific UCT advice sought.
Post by Manuel on Aug 30th, 2012, 9:15am
I was planning to try out the following idea myself some day:

Not only try to remove the biases in the random playouts, but also try to simply correct for them with an evaluation afterwards.
So, if random playouts favor a forward rabbit step by, e.g., 4 points too much, I would add a penalty of 4 points to each forward rabbit step. Then, if the random playouts tell me that a move with a forward rabbit step is worth 3.8 points more than another move, I would select the other move. If the rabbit move wins by 4.2 points, I would select the rabbit move.
This way you could also correct for other biases that may occur in your random playouts, and eventually a complete evaluation could be added to the outcome of the playouts. The advantage would be that the UCT part should recognize strategically good moves and the added evaluation would correct for the mistakes the UCT makes.
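(A minimal sketch of that correction step in Python; the feature name, the 4-point figure from the example above, and the move_features helper are all illustrative assumptions:)

# Per-feature corrections, tuned offline by measuring how far playout
# scores drift from a trusted evaluation on test positions.
BIAS_CORRECTION = {
    "forward_rabbit_step": -4.0,   # playouts overvalue these by ~4 points
}

def corrected_score(move, playout_score):
    # Subtract the known playout bias for each feature the move exhibits.
    score = playout_score
    for feature in move_features(move):   # hypothetical feature extractor
        score += BIAS_CORRECTION.get(feature, 0.0)
    return score

def select_move(candidates):
    # candidates: (move, playout_score) pairs produced by the UCT search
    return max(candidates, key=lambda c: corrected_score(c[0], c[1]))[0]

With the numbers above, a rabbit move scored 3.8 points ahead loses to the alternative, while one scored 4.2 points ahead still wins, matching the example.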

Of course, I have no idea whether this is enough to fix UCT for Arimaa, but I think it could be very interesting.
A slight problem may be that any change in the random bots will change the biases, and the whole evaluation would have to be retuned...

As I said, I was planning on trying this out myself, but since in practice I program on average about 1 line per month, my bot is probably never going to be finished, so I might as well give my idea away here... I would be very interested to see whether it works!


P.S. This idea came to my mind when I once read that in Go, UCT tends to slightly favor (or disfavor, I don't remember) moves at the edge of the board. In Go this is apparently not a big problem, but I still didn't understand why nobody got the idea to simply correct for it. (Or maybe people do correct for this and it just wasn't mentioned in the text I was reading.)
Anyway, in the case of Go, I would be very surprised if this way of correcting did not work.

Title: Re: Specific UCT advice sought.
Post by rbarreira on Aug 30th, 2012, 9:21am
I haven't tried these methods myself, but I am more surprised at the fact they work for Go than at the fact they don't work for Arimaa :)

Title: Re: Specific UCT advice sought.
Post by ddyer on Aug 30th, 2012, 11:26am
Having slept on the problem: The problem is that in the random playout phase, really bad moves that no one would actually make are on a par with the good moves. In the UCT/tree-building phase, the bad moves are punished and weeded out, but in the random phase they are not.

In my situation, if the "rabbit run" were imminent, it would be seen in the UCT tree and avoided. However, it's not: three or four enabling moves have to happen first, so the UCT tree never gets traction. The elephant suicide actually seems to be good, because it allows the opposing pieces to run away, clearing the path for the rabbit.

As my next experiment, I'm going to decrease the probability of a class of moves which release frozen rabbits or unblock blocked rabbits.

Title: Re: Specific UCT advice sought.
Post by Fritzlein on Aug 30th, 2012, 12:59pm

on 08/30/12 at 11:26:03, ddyer wrote:
The problem is that in the random playout phase, really bad moves that no one would actually make are on a par with the good moves.

That's an intuitive answer, and the one I would have come up with myself, but according to what I have read it is not true.  If selecting bad moves with equal preference to good moves were the problem, then UCT would not work for Go either.  But in fact UCT does work for Go, even when the random playouts consist mostly of atrocious moves on both sides.

There was a famous paper in which it was shown that making UCT Go playouts less random by increasing the quality of the moves (i.e. having good moves be more likely to be selected than bad moves) made UCT perform worse.  How is this result possible?  The conclusion of the authors was that it doesn't matter whether moves selected for playout are good or bad.  What matters is that they don't favor one side or the other on average.

It is possible to construct Go positions for which the side with a clear win will lose more often than not when the playouts are random, but such positions appear to be unusual.  It appears to be normal that letting each side make one random move tends to help/hurt both sides by about the same amount.  That's why random playouts give approximately reasonable evaluations for Go.

Trying to make the same thing true for Arimaa is a daunting task.  The starting point of random moves hugely favors the player with more rabbits, or more advanced rabbits.  From this terrible condition, you can't expect the problem to be corrected simply by choosing better moves more often, since we know from Go that this can actually make the problem worse.  Yet somehow you have to ensure that for a general, typical position, letting each side make a move by your chosen weighting tends to help/hurt both players equally on average.  Good luck!

Title: Re: Specific UCT advice sought.
Post by ddyer on Aug 30th, 2012, 2:04pm
I've implemented a very cheap test for "moving away from a rabbit", and I reject these moves with 50% probability. It seems to work so far.
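(Roughly what that filter looks like as rejection sampling, in Python; moves_away_from_rabbit is a stand-in for whatever cheap test you use:)

import random

REJECT_PROB = 0.5   # discard half of the rabbit-freeing moves

def pick_playout_move(pos):
    moves = pos.legal_moves()
    while True:
        move = random.choice(moves)
        # Re-roll (with 50% probability) any move that steps a
        # blockading piece away from an enemy rabbit.
        if moves_away_from_rabbit(pos, move) and random.random() < REJECT_PROB:
            continue
        return move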

I suppose there's a good argument that Arimaa is different from Go in this respect. In Arimaa there are two very different kinds of fight going on simultaneously: the short-term rabbit run, and the long-term slog to material superiority. Pulling material away from keeping the rabbits bottled up is a delicate judgement.

Title: Re: Specific UCT advice sought.
Post by jdb on Aug 30th, 2012, 7:45pm
Here's my two cents.

With random playouts, the side with the most advanced rabbits is winning. To fix this, the playouts need to be smart enough to deal with an advanced rabbit.

1) Assign a high probability to capture moves. This only needs to handle two-step captures; some simple static detection around the traps should work.

2) Bias the playouts to maintain piece distribution on the home files. Make sure each file has at least one piece on one of the first three ranks. So, for example, if gold has d1, d2, and d3 empty, bias the playouts to put a piece on one of those squares.

These two rules should be enough to stop an unassisted rabbit from goaling.
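(One way to express those two biases as playout move weights, sketched in Python with random.choices; the weight values and the two detector functions are illustrative guesses, not tuned numbers:)

import random

CAPTURE_WEIGHT = 20.0   # rule 1: strongly prefer two-step captures
DEFENSE_WEIGHT = 5.0    # rule 2: prefer refilling an empty home file
DEFAULT_WEIGHT = 1.0

def weighted_playout_move(pos):
    moves = pos.legal_moves()
    weights = []
    for m in moves:
        if is_two_step_capture(pos, m):          # hypothetical detector
            weights.append(CAPTURE_WEIGHT)
        elif refills_empty_home_file(pos, m):    # hypothetical detector
            weights.append(DEFENSE_WEIGHT)
        else:
            weights.append(DEFAULT_WEIGHT)
    return random.choices(moves, weights=weights, k=1)[0]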

Good luck.

Title: Re: Specific UCT advice sought.
Post by ddyer on Sep 1st, 2012, 12:31pm
I tried a few more UCT experiments, which I thought I'd report.

  • semi-heavy playouts: always choose two random moves, run the static evaluator on both, and use the better one (see the sketch after this list). This slowed the move rate a lot and didn't result in a net benefit.
  • punish rabbit moves: lower the probability of rabbit moves being chosen by x%. For small x this seems to help, but there's a tendency for the player who won't let the rabbits run to be blindsided by a successful run.
  • piece hysteresis: make moves 2-4 of a turn move the same piece as the previous step, with enhanced probability x%. This never seemed to help.
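(The semi-heavy playout from the first bullet, sketched in Python; evaluate stands in for the hand-coded static evaluator, and the Position API is hypothetical:)

import random

def semi_heavy_move(pos):
    # Draw two random candidates and keep whichever the static evaluator
    # prefers; this sits between uniform-random and fully greedy play.
    moves = pos.legal_moves()
    a, b = random.choice(moves), random.choice(moves)
    # evaluate() scores the resulting position from the mover's view.
    return a if evaluate(pos.apply(a)) >= evaluate(pos.apply(b)) else b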


In any case, the UCT Arimaa bot remains very weak, much weaker than the conventional alpha-beta bot. I got the impression from Fritzlein's post that no one has a successful bot based on UCT. Is that correct?

Title: Re: Specific UCT advice sought.
Post by Hippo on Sep 1st, 2012, 4:15pm
... bot_akimot won all 13 games against me ... ;)

Title: Re: Specific UCT advice sought.
Post by Fritzlein on Sep 1st, 2012, 6:08pm

on 09/01/12 at 16:15:14, Hippo wrote:
... bot_akimot won all 13 games against me ... ;)

Bot_akimot didn't use playout results for evaluation; it cut off the playouts and used a hard-coded evaluation instead. That makes the comparison of bot_akimot to alpha-beta searchers very problematic. Bot_akimot was certainly never competitive with the top alpha-beta searchers. You could say that it isn't fair to compare a new approach to an established one, because Kozelek didn't have as much time to enhance and fine-tune his basic MCTS as the authors of the top alpha-beta searchers had. That's a reasonable argument, but then when he actually compares bot_akimot to a relatively vanilla alpha-beta searcher, it is unfair in the other direction that bot_akimot got enhanced and tuned.

To summarize: What has been a total failure is using random playouts (weighted or not) for evaluation.  What is very much up for debate is whether semi-random, lopsided construction of a search tree can justify the necessary overhead compared to the faster (usually also lopsided) growth of a deterministic alpha-beta search tree, where both MCTS and alpha-beta use a hard-coded evaluation function.


