Arimaa Forum - Specific UCT advice sought.

Welcome, Guest. Please Login or Register.
Jul 1^st, 2025, 11:12pm

Home

Help

Members

Arimaa Forum « Specific UCT advice sought. »

   Arimaa Forum
   Arimaa
   Bot Development (Moderator: supersamu)
   Specific UCT advice sought.

« Previous topic | Next topic »

Pages: 1

Notify of replies

Send Topic

Author

Topic: Specific UCT advice sought. (Read 2073 times)

ddyer
Forum Guru

Gender:

Posts: 66

Specific UCT advice sought.
« on: Aug 30^th, 2012, 4:07am »

Quote

Modify

Ok, here's another one. I noticed that my UCT bot prefers pushing rabbits a lot more than makes sense to me. I discovered a position where an "elephant suicide" move is greatly preferred, much to my amazement.

After a few hours of tweaking the behavior to try to home in on the problem, I came up with this explanation: Given an advanced rabbit with anything close to a clear path to the goal, making random moves is very likely to free the rabbit and let it win. I first saw this in the playouts as an actual suicide move, where gold pulls or pushes the silver rabbit to the goal, but filtering out these moves isn't enough.

My question is what kind of logic should I consider to bias against unnecessarily freeing the rabbit.

IP Logged

visit my game site: http://www.boardspace.net/
free online abstract strategy games

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: Specific UCT advice sought.
« Reply #1 on: Aug 30^th, 2012, 8:22am »

Quote

Modify

If anyone has figured this out, I haven't heard about it. My takeaway about UCT for Arimaa is that random playouts are extremely biased toward the player with advanced rabbits for the reason you give: random moves are likely to let a rabbit through.

But it is not just advanced rabbits that are favored by random playouts; it is also more rabbits. Someone once let two random bots play each other, where one started without an elephant and the other started with one rabbit fewer, but all fifteen remaining pieces starting on the two home ranks. The bot without an elephant won more often. (Perhaps you would be interested in replicating the result that a rabbit is worth more than an elephant between two random players?)

The only known solution appears to be abandoning random playouts as a source of evaluation, and using a hand-coded evaluation at a cutoff depth instead. Of course, re-introducing hand-coded evaluation removes one of the main reasons for doing UCT in the first place. Then MCTS competes with alpha-beta only on grounds of effective search tree building, which is a hard sell due to the vastly greater overhead involved in MCTS. Read here:
http://arimaa.com/arimaa/papers/TomasKozelekThesis/mt.pdf

« Last Edit: Aug 30^th, 2012, 3:50pm by Fritzlein »

IP Logged

Manuel
Forum Guru

Arimaa player #4020

Gender: male

Posts: 58

Re: Specific UCT advice sought.
« Reply #2 on: Aug 30^th, 2012, 9:15am »

Quote

Modify

Myself I was planning to try out the following idea some day:

Not only try to remove the biases in the random playouts, but also try to simply correct for them with an evaluation afterwards.
So, if random playouts favor a forward rabbit step by e.g 4 points too much, I would add a negative bonus of 4 points to each forward rabbit step. Then, if the random playouts tell me that a move with a forward rabbit step is worth 3.8 points more than another move, I would select the other move. If the rabbit move wins by 4.2 points, I would select the rabbit move.
This way you could also correct for other biases that may occur in your random playouts and eventually a complete evaluation could be added to the outcome of the playouts. The advantage would be that the UCT part should recognize strategic good moves and the added evaluation would correct for the mistakes the UCT makes.

Of course, I have no idea whether this is enough to fix UCT for arimaa, but I think it can be very interesting.
A slight problem may be that any change in the random bots will change the biases and the whole evaluation would have to be retuned...

As I said, I was planning on trying this out myself, but as in practice I am programming on average about 1 line per month, my bot is probably never going to be finished, so I can just as well give my idea away here... I would be very interested to see whether my idea works!

P.S. This idea came to my mind when I read once that in Go UCT tends to slightly favor (or disfavor, i don't remember) moves at the edge of the board. In Go this is apparently not a big problem, but still I didn't understand why nobody got the idea to simply correct for it. (or maybe people do correct for this and this was just not mentioned in the text I was reading).
Anyways, in the case of Go, I would be very surprised if this way of correcting would not work.

IP Logged

rbarreira
Forum Guru

Arimaa player #1621

Gender: male

Posts: 605

Re: Specific UCT advice sought.
« Reply #3 on: Aug 30^th, 2012, 9:21am »

Quote

Modify

I haven't tried these methods myself, but I am more surprised at the fact they work for Go than at the fact they don't work for Arimaa

IP Logged

ddyer
Forum Guru

Gender:

Posts: 66

Re: Specific UCT advice sought.
« Reply #4 on: Aug 30^th, 2012, 11:26am »

Quote

Modify

Having sept on the problem; The problem is that in the random playout phase, really bad moves that no one would actually make are on a par with the good moves. In the UCT/Tree building phase, the bad moves are punished and weeded out, but in the random phase they are not.

In my situation, if the "rabbit run" were imminent, it would be seen in the UCT tree and avoided. However, it's not. 3 or 4 enabling moves have to happen first, so the UCT tree never gets traction. The elephant suicide actually seems to be good, because it allows the opposing pieces to run away, clearing the path for the rabbit.

As my next experiment, I'm going to decrease the probability of a class of moves which release frozen rabbits or unblock blocked rabbits.

IP Logged

visit my game site: http://www.boardspace.net/
free online abstract strategy games

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: Specific UCT advice sought.
« Reply #5 on: Aug 30^th, 2012, 12:59pm »

Quote

Modify

on Aug 30^th, 2012, 11:26am, ddyer wrote:

The problem is that in the random playout phase, really bad moves that no one would actually make are on a par with the good moves.

That's an intuitive answer, and the one I would have come up with myself, but according to what I have read it is not true. If selecting bad moves with equal preference to good moves were the problem, then UCT would not work for Go either. But in fact UCT does work for Go, even when the random playouts consist mostly of atrocious moves on both sides.

There was a famous paper in which it was shown that making UCT Go playouts less random by increasing the quality of the moves (i.e. having good moves be more likely to be selected than bad moves) made UCT perform worse. How is this result possible? The conclusion of the authors was that it doesn't matter whether moves selected for playout are good or bad. What matters is that they don't favor one side or the other on average.

It is possible to construct Go positions for which the side with a clear win will lose more often than not when the playouts are random, but such positions appear to be unusual. It appears to be normal that letting each side make one random move tends to help/hurt both sides by about the same amount. That's why random playouts give approximately reasonable evaluations for Go.

Trying to make the same thing true for Arimaa is a daunting task. The starting point of random moves hugely favors the player with more rabbits, or more advanced rabbits. From this terrible condition, you can't expect the problem to be corrected simply by choosing better moves more often, since we know from Go that this can actually make the problem worse. Yet somehow you have to ensure that for a general, typical position, letting each side make a move by your chosen weighting tends to help/hurt both players equally on average. Good luck!

« Last Edit: Aug 30^th, 2012, 1:00pm by Fritzlein »

IP Logged

ddyer
Forum Guru

Gender:

Posts: 66

Re: Specific UCT advice sought.
« Reply #6 on: Aug 30^th, 2012, 2:04pm »

Quote

Modify

I've implemented a very cheap test for "moving away from a rabbit" and reject these moves with 50% probability. It seems to work so far.

I suppose there's a good argument that Arimaa is different from Go in this respect. In Arimaa there are two very different kinds of fight going on simultaneously - the short term rabbit run, and the long term slog to material superiority. Pulling material away from keeping the rabbits bottled up is a delicate judgement.

IP Logged

visit my game site: http://www.boardspace.net/
free online abstract strategy games

jdb
Forum Guru

Arimaa player #214

Gender: male

Posts: 682

Re: Specific UCT advice sought.
« Reply #7 on: Aug 30^th, 2012, 7:45pm »

Quote

Modify

Here is my two cents.

Using random playouts the side who has the most advanced rabbits is winning. To fix this the playouts need to be smart enough to deal with an advanced rabbit.

1) Assign a high probability to capture moves. This only needs to handle two step captures. Some simple static detection around the traps should work.

2) Bias the playouts to maintain piece distribution on the home files. Make sure each file has at least a piece on one of the first three ranks. So for example, if gold has d1,d2 and d3 empty, bias the playouts to put a piece on one of those squares.

These two rules should be enough to stop an unassisted rabbit from goaling.

Good luck.

« Last Edit: Aug 30^th, 2012, 7:45pm by jdb »

IP Logged

ddyer
Forum Guru

Gender:

Posts: 66

Re: Specific UCT advice sought.
« Reply #8 on: Sep 1^st, 2012, 12:31pm »

Quote

Modify

I tried a few more UCT experiments, which I thought I'd report.

semi-heavy playouts: always choose 2 random moves, run the static evaluator on both, and use the better. This slowed the move rate a lot, and didn't result in a net benefit.
punish rabbit moves. Lower the probability of rabbit moves being chosen by x%. For small x this seems to help, but there's a tendency for the player who won't let the rabbits run to be blindsided by a successful run.
piece hysterysis. Make moves 2-4 of a turn move the same piece as the previous, with enhanced probability x%. This didn't ever seem to help.

In any case, the UCT arimaa bot remains very weak, much weaker than the concentional alpha-beta bot. I got the impression from Fritzlein's post that no one has a successful bot based on UCT. Is that correct?

« Last Edit: Sep 1^st, 2012, 12:33pm by ddyer »

IP Logged

visit my game site: http://www.boardspace.net/
free online abstract strategy games

Hippo
Forum Guru

Arimaa player #4450

Gender: male

Posts: 883

Re: Specific UCT advice sought.
« Reply #9 on: Sep 1^st, 2012, 4:15pm »

Quote

Modify

... bot_akimot won all 13 games against me ... Wink

IP Logged

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: Specific UCT advice sought.
« Reply #10 on: Sep 1^st, 2012, 6:08pm »

Quote

Modify

on Sep 1^st, 2012, 4:15pm, Hippo wrote:

... bot_akimot won all 13 games against me ... Wink

Bot_akimot didn't use playouts for evaluation. It cut off the playouts and used hard-coded evaluation. That makes the comparison of bot_akimot to alpha-beta searchers very problematic. Bot_akimot was certainly never competitive with the top alpha-beta searchers. You could say that it isn't fair to compare a new approach to an established approach, because Kozelek didn't have as much time to enhance and fine tune the basic MCTS compared to top alpha-beta searchers. That's a reasonable argument, but then when he actually compares bot_akimot to a relatively vanilla alpha-beta searcher, it is unfair in the other direction that bot_akimot got enhanced and tuned.

To summarize: What has been a total failure is using random playouts (weighted or not) for evaluation. What is very much up for debate is whether semi-random, lopsided construction of a search tree can justify the necessary overhead compared to the faster (usually also lopsided) growth of a deterministic alpha-beta search tree, where both MCTS and alpha-beta use a hard-coded evaluation function.

IP Logged

Pages: 1

Notify of replies

Send Topic


« Previous topic \| Next topic »