Arimaa Forum

Welcome, Guest. Please Login or Register.
May 16^th, 2024, 12:18am

Home

Help

Members

Arimaa Forum « UCT »

   Arimaa Forum
   Arimaa
   Bot Development (Moderator: supersamu)
   UCT

« Previous topic | Next topic »

Pages: 1

Notify of replies

Send Topic

Author

Topic: UCT (Read 1538 times)

nbarriga
Forum Guru

Almost retired Bot Developer

Gender: male

Posts: 119

UCT
« on: Aug 13^th, 2008, 9:51pm »

Quote

Modify

Sorry for bringing back an old topic. I read the old posts about UCT, but I would like some kind of update/abstract. I'm interested to know what the status is/was on the bots that used UCT.

Is there currently a bot using UCT or some similar Montecarlo approach?

What where the strengths and weaknesses of them?

What made you drop that approach(if you did...)?

Is there a consensus that alpha/beta plus optimizations is the way to go for winning the Computer Championship in the near future(2-3 years)?

What other options are you considering, I'm talking search here, not eval(I already ruled out for myself the genetic algorithm and tabu search approaches a few years ago)?

IP Logged

99of9
Forum Guru

Gnobby's creator (player #314)

Gender:

Posts: 1413

Re: UCT
« Reply #1 on: Aug 13^th, 2008, 10:24pm »

Quote

Modify

I have not tried any fancy search since 2004 (when I hacked together my own forward pruning alpha-beta). I am now concentrating on an eval rewrite, but I don't really hold hopes of winning the challenge.

IP Logged

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: UCT
« Reply #2 on: Aug 14^th, 2008, 6:16am »

Quote

Modify

on Aug 13^th, 2008, 9:51pm, nbarriga wrote:

What where the strengths and weaknesses of them?

JDB's UCT bot would push rabbits insanely, because a rabbit might get through against a random defense. It was far too concerned about advanced rabbits, and not nearly concerned enough about trap control.

We know from Janzert's experiment that a random bot would prefer an extra rabbit to an extra elephant. UCT isn't quite that bad, but some of the random mindset infects the UCT thinking. Arimaa seems different from Go in this respect: random playouts in Go give a noisy evaluation, whereas random playouts in Arimaa give a horrendously bad evaluation.

IP Logged

nbarriga
Forum Guru

Almost retired Bot Developer

Gender: male

Posts: 119

Re: UCT
« Reply #3 on: Aug 14^th, 2008, 8:22am »

Quote

Modify

on Aug 14^th, 2008, 6:16am, Fritzlein wrote:

We know from Janzert's experiment that a random bot would prefer an extra rabbit to an extra elephant. UCT isn't quite that bad, but some of the random mindset infects the UCT thinking. Arimaa seems different from Go in this respect: random playouts in Go give a noisy evaluation, whereas random playouts in Arimaa give a horrendously bad evaluation.

I would expect this to improve when increasing the number of playouts? With multicore computers increasingly common, I would expect easy parallelization of the random playouts in this algorithm. Or even through MPI in a small cluster.

And what about not having the playouts until the endgame, but have them of fixed length, let's say 12-16 moves, and use an eval function there?

IP Logged

Janzert
Forum Guru

Arimaa player #247

Gender: male

Posts: 1016

Re: UCT
« Reply #4 on: Aug 14^th, 2008, 11:26am »

Quote

Modify

on Aug 14^th, 2008, 8:22am, nbarriga wrote:

Not sure what exactly you're referring to that you expect to improve. If you mean a straight monte-carlo bot, one that does no search but merely playouts from the root moves and chooses the one that looks best, then no it will pretty much always prefer the rabbit to an elephant no matter how many playouts it performs. Monte-carlo go programs play a little more reasonably but do level off their performance at around 5000 playouts. If on the other hand you mean a bot that uses UCT (or more generally MCTS¹) then yes since these would eventually converge to perfect play they will (eventually) get better.

¹ Monte-Carlo Tree Search (MCTS) is a term recently coined on the computer-go list to describe tree search techniques that use randomized playouts at the leaves for evaluation. The reason for the term is because several (most?) of the top go programs no longer use actual UCT.

Quote:

And what about not having the playouts until the endgame, but have them of fixed length, let's say 12-16 moves, and use an eval function there?

Not sure I understand this correctly, but it sounds like something I have thought about trying but have not found the time yet.

The ultimate question of course is whether any of these techniques choose better moves than more traditional alpha-beta based searches. I believe the reason the MCTS techniques work so well in go is because the computational work to score accuracy ratio is favorable for random playouts compared to traditional eval.

Also regarding the actual tree search portion I've been thinking lately that at least at a conceptual level there isn't much difference between UCT (or at least the variants now in use) and a modern alpha-beta based searcher with all the extensions and reductions in use. Recently on the computer-go list the author of the Deep Sjeng (chess) and Leela (go) programs Gian-Carlo Pascutto stated this as well:
Quote:

It is not so obvious to me any more what the differences are between a state of the art UCT searcher (i.e. with nearly no exploration) and a state of the art alpha-beta searcher (i.e. with heavy late move reductions).

They both just hammer out the mainline deeper until the score drops a bit, or an alternative rises strongly.

Janzert

« Last Edit: Aug 14^th, 2008, 11:28am by Janzert »

IP Logged

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: UCT
« Reply #5 on: Aug 15^th, 2008, 11:25am »

Quote

Modify

on Aug 14^th, 2008, 11:26am, Janzert wrote:

Also regarding the actual tree search portion I've been thinking lately that at least at a conceptual level there isn't much difference between UCT (or at least the variants now in use) and a modern alpha-beta based searcher with all the extensions and reductions in use.

I agree. If you are going to do evaluation with a static function rather than with a complete playout, then why use UCT to build a tree at all? Why not use something deterministic to create depth extensions and depth reductions? Then the depth/breadth tradeoff philosophy that underlies UCT can be applied in a very game-specific way. For example, in Arimaa we know it is a good idea to explore threatened captures, so we deepen the search automatically. If I recall correctly, Fotland once said something like, "The better I understand Arimaa strategy, the more lopsided my search tree is going to become."

The main reason to use UCT as far as I can see is to avoid static evaluation. But random playouts give terrible Arimaa evaluations. Yes, the tree-building slowly trains the players not to play randomly, but the training is far too slow to overcome the bias for forward-flung rabbits.

A reason to not use UCT to build the tree is that the whole explored tree so far has to be saved, whereas alpha-beta searchers can throw away information as they go. However, there may be some convergence here too as alpha-beta searches save more and more nodes with information, while UCT searchers expand nodes selectively and throw away most of the tree because they can't save it all.

It will be interesting if the shape of search trees in alpha-beta searchers gradually converges to the shape of trees created by UCT searchers.

« Last Edit: Aug 15^th, 2008, 2:40pm by Fritzlein »

IP Logged

nbarriga
Forum Guru

Almost retired Bot Developer

Gender: male

Posts: 119

Re: UCT
« Reply #6 on: Aug 15^th, 2008, 11:43am »

Quote

Modify

And to reformulate my last question? Is anybody considering, or has ideas on new search algorithms other than UCT and alpha/beta?

IP Logged

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: UCT
« Reply #7 on: Aug 15^th, 2008, 2:44pm »

Quote

Modify

As I understand it, Rat has a totally different philosophy. It doesn't even generate all the moves, much less explore them all. Rat is only allowed to make moves that fall into certain patterns, so it builds a much narrower tree than either UCT or alpha/beta.

Haizhi's bot wasn't as selective as Rat, but it also threw away about 85% of all moves without even evaluating them. It wouldn't look at moves that had four independent steps (i.e. steps that could occur in any order) so a move with no push, pull, two steps with the same piece, or some other dependency in steps could not even be considered. The reasoning was that a massive speedup would be possible without eliminating any really good moves. (Incidentally that would make a good puzzle challenge: create a reasonable-looking position where the only winning move consists of four independent steps, and thus would be ignored by bot_haizhi.)

« Last Edit: Aug 15^th, 2008, 3:05pm by Fritzlein »

IP Logged

tize
Forum Guru

Arimaa player #3121

Gender: male

Posts: 118

Re: UCT
« Reply #8 on: Aug 24^th, 2008, 6:02am »

Quote

Modify

If anyone want to see just how bad a simple monte carlo bot is like I will have bot_carlos up for play sometime now.

It seemed like a to funny and different approach not to try... Wink

IP Logged

Fritzlein
Forum Guru

Arimaa player #706

Gender:

Posts: 5928

Re: UCT
« Reply #9 on: Aug 24^th, 2008, 1:03pm »

Quote

Modify

Is bot_carlos straight Monte Carlo (e.g. evaluate all moves at the root by random playout win percentage) or does it build some sort of tee like UCT? Or something else I'm not even thinking of?

IP Logged

tize
Forum Guru

Arimaa player #3121

Gender: male

Posts: 118

Re: UCT
« Reply #10 on: Aug 30^th, 2008, 1:59pm »

Quote

Modify

bot_carlos uses straight monte carlo, no tree building, just a lot of random games and takes the root moves with highest win percentage

IP Logged

Pages: 1

Notify of replies

Send Topic


« Previous topic \| Next topic »