||||||
Title: UCT Post by nbarriga on Aug 13th, 2008, 9:51pm Sorry for bringing back an old topic. I read the old posts about UCT, but I would like some kind of update/abstract. I'm interested to know what the status is/was of the bots that used UCT. Is there currently a bot using UCT or some similar Monte Carlo approach? What were their strengths and weaknesses? What made you drop that approach (if you did...)? Is there a consensus that alpha-beta plus optimizations is the way to go for winning the Computer Championship in the near future (2-3 years)? What other options are you considering? I'm talking search here, not eval (I already ruled out the genetic algorithm and tabu search approaches for myself a few years ago). |
||||||
Title: Re: UCT Post by 99of9 on Aug 13th, 2008, 10:24pm I have not tried any fancy search since 2004 (when I hacked together my own forward-pruning alpha-beta). I am now concentrating on an eval rewrite, but I don't really hold out hope of winning the challenge. |
||||||
Title: Re: UCT Post by Fritzlein on Aug 14th, 2008, 6:16am on 08/13/08 at 21:51:11, nbarriga wrote:
JDB's UCT bot would push rabbits insanely, because a rabbit might get through against a random defense. It was far too concerned about advanced rabbits, and not nearly concerned enough about trap control. We know from Janzert's experiment that a random bot would prefer an extra rabbit to an extra elephant. UCT isn't quite that bad, but some of the random mindset infects the UCT thinking. Arimaa seems different from Go in this respect: random playouts in Go give a noisy evaluation, whereas random playouts in Arimaa give a horrendously bad evaluation. |
||||||
Title: Re: UCT Post by nbarriga on Aug 14th, 2008, 8:22am on 08/14/08 at 06:16:23, Fritzlein wrote:
Wouldn't this improve with an increasing number of playouts? With multicore computers increasingly common, the random playouts in this algorithm should be easy to parallelize, or even to distribute via MPI across a small cluster. And what about not running the playouts all the way to the endgame, but instead making them a fixed length, say 12-16 moves, and using an eval function there? |
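A minimal sketch of that truncated-playout idea. The Position API (copy, legal_moves, make, is_terminal, winner) and the evaluate function are hypothetical, invented here for illustration; this is not any actual bot's code.

```python
import random

PLAYOUT_LENGTH = 14  # fixed cutoff, within the 12-16 moves suggested above

def truncated_playout(position, side_to_win, evaluate):
    """Play random moves for a fixed number of moves, then fall back to
    a static eval instead of playing to the end of the game."""
    pos = position.copy()
    for _ in range(PLAYOUT_LENGTH):
        if pos.is_terminal():
            return 1.0 if pos.winner() == side_to_win else 0.0
        pos.make(random.choice(pos.legal_moves()))
    if pos.is_terminal():
        return 1.0 if pos.winner() == side_to_win else 0.0
    # evaluate() is assumed to return an estimated win probability
    # for side_to_win in the reached (non-terminal) position
    return evaluate(pos, side_to_win)
```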
||||||
Title: Re: UCT Post by Janzert on Aug 14th, 2008, 11:26am on 08/14/08 at 08:22:43, nbarriga wrote:
Not sure what exactly you're referring to that you expect to improve. If you mean a straight Monte Carlo bot, one that does no search but merely runs playouts from the root moves and chooses the one that looks best, then no, it will pretty much always prefer the rabbit to the elephant no matter how many playouts it performs. Monte Carlo go programs play a little more reasonably, but their performance does level off at around 5000 playouts. If, on the other hand, you mean a bot that uses UCT (or more generally MCTS[1]), then yes: since these would eventually converge to perfect play, they will (eventually) get better. [1] Monte-Carlo Tree Search (MCTS) is a term recently coined on the computer-go list to describe tree search techniques that use randomized playouts at the leaves for evaluation. The term arose because several (most?) of the top go programs no longer use actual UCT. Quote:
Not sure I understand this correctly, but it sounds like something I have thought about trying but have not found the time for yet. The ultimate question, of course, is whether any of these techniques choose better moves than more traditional alpha-beta based searches. I believe the reason the MCTS techniques work so well in go is that the ratio of computational work to scoring accuracy is favorable for random playouts compared to a traditional eval. Also, regarding the actual tree search portion, I've been thinking lately that, at least at a conceptual level, there isn't much difference between UCT (or at least the variants now in use) and a modern alpha-beta based searcher with all the extensions and reductions in use. Recently on the computer-go list, Gian-Carlo Pascutto, the author of the Deep Sjeng (chess) and Leela (go) programs, stated this as well: Quote:
Janzert |
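For concreteness, here is what the UCT selection rule itself looks like: a minimal UCB1 sketch, with the node fields (visits, total_reward, children) invented for illustration.

```python
import math

EXPLORATION = 1.4  # roughly sqrt(2); tuned per game in practice

def uct_select(node):
    """Pick the child maximizing UCB1: average playout reward plus an
    exploration bonus that shrinks as a child accumulates visits."""
    def ucb1(child):
        if child.visits == 0:
            return float("inf")  # visit every child at least once
        exploit = child.total_reward / child.visits
        explore = EXPLORATION * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore
    return max(node.children, key=ucb1)
```

Descending the tree with this rule, running a random playout at the leaf, and backing the result up the visited path is the UCT cycle; the MCTS variants mentioned above swap in different selection or playout policies.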
||||||
Title: Re: UCT Post by Fritzlein on Aug 15th, 2008, 11:25am on 08/14/08 at 11:26:32, Janzert wrote:
I agree. If you are going to do evaluation with a static function rather than with a complete playout, then why use UCT to build a tree at all? Why not use something deterministic to create depth extensions and depth reductions? Then the depth/breadth tradeoff philosophy that underlies UCT can be applied in a very game-specific way. For example, in Arimaa we know it is a good idea to explore threatened captures, so we deepen the search automatically. If I recall correctly, Fotland once said something like, "The better I understand Arimaa strategy, the more lopsided my search tree is going to become." The main reason to use UCT, as far as I can see, is to avoid static evaluation. But random playouts give terrible Arimaa evaluations. Yes, the tree-building slowly trains the players not to play randomly, but the training is far too slow to overcome the bias for forward-flung rabbits. A reason not to use UCT to build the tree is that the whole explored tree has to be saved, whereas alpha-beta searchers can throw away information as they go. However, there may be some convergence here too, as alpha-beta searchers save more and more nodes with information, while UCT searchers expand nodes selectively and throw away most of the tree because they can't save it all. It will be interesting to see whether the shape of the search trees in alpha-beta searchers gradually converges to the shape of the trees created by UCT searchers. |
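A minimal negamax sketch of such a deterministic extension, assuming a hypothetical capture_threatened predicate and the same invented Position API as the earlier sketch; the extension budget is a crude guard against runaway deepening, not anything from an actual engine.

```python
def search(pos, depth, alpha, beta, evaluate, ext_budget=3):
    """Alpha-beta (negamax) with a game-specific depth extension: when
    the horizon is reached but a capture is threatened, search one ply
    deeper, skewing the tree toward tactically hot lines."""
    if pos.is_terminal():
        return evaluate(pos)  # evaluate() scores for the side to move
    if depth <= 0:
        if capture_threatened(pos) and ext_budget > 0:
            depth, ext_budget = 1, ext_budget - 1  # deterministic extension
        else:
            return evaluate(pos)
    for move in pos.legal_moves():
        child = pos.copy()
        child.make(move)
        score = -search(child, depth - 1, -beta, -alpha, evaluate, ext_budget)
        if score >= beta:
            return beta  # fail-hard cutoff
        alpha = max(alpha, score)
    return alpha
```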
||||||
Title: Re: UCT Post by nbarriga on Aug 15th, 2008, 11:43am And to reformulate my last question: is anybody considering, or does anybody have ideas about, new search algorithms other than UCT and alpha-beta? |
||||||
Title: Re: UCT Post by Fritzlein on Aug 15th, 2008, 2:44pm As I understand it, Rat has a totally different philosophy. It doesn't even generate all the moves, much less explore them all. Rat is only allowed to make moves that fall into certain patterns, so it builds a much narrower tree than either UCT or alpha-beta. Haizhi's bot wasn't as selective as Rat, but it also threw away about 85% of all moves without even evaluating them. It wouldn't look at moves consisting of four independent steps (i.e. steps that could occur in any order), so a move with no push, no pull, no two steps by the same piece, and no other dependency among its steps could not even be considered. The reasoning was that a massive speedup was possible without eliminating any really good moves. (Incidentally, that would make a good puzzle challenge: create a reasonable-looking position where the only winning move consists of four independent steps, and thus would be ignored by bot_haizhi.) |
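A simplified sketch of that kind of dependency filter. The move and step attributes (steps, piece, is_push_or_pull, origin, destination) are invented for illustration, and a real test would also have to catch positional interactions such as freezing and trap effects; this is not Haizhi's actual rule.

```python
def has_dependent_steps(move):
    """Keep a move only if some pair of its steps interacts; a
    four-step move whose steps are pairwise independent is pruned."""
    steps = move.steps
    if len(steps) < 4:
        return True  # only full four-step moves are candidates for pruning
    if any(s.is_push_or_pull for s in steps):
        return True  # pushes and pulls are inherently ordered
    pieces = [s.piece for s in steps]
    if len(set(pieces)) < len(pieces):
        return True  # the same piece moved more than once
    seen = set()
    for s in steps:
        if s.origin in seen or s.destination in seen:
            return True  # two steps touch the same square: order matters
        seen.update((s.origin, s.destination))
    return False

candidate_moves = [m for m in all_moves if has_dependent_steps(m)]
```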
||||||
Title: Re: UCT Post by tize on Aug 24th, 2008, 6:02am If anyone wants to see just how bad a simple Monte Carlo bot is, I will have bot_carlos up for play soon. It seemed like too funny and different an approach not to try... ;) |
||||||
Title: Re: UCT Post by Fritzlein on Aug 24th, 2008, 1:03pm Is bot_carlos straight Monte Carlo (e.g. evaluating all moves at the root by random playout win percentage), or does it build some sort of tree like UCT? Or something else I'm not even thinking of? |
||||||
Title: Re: UCT Post by tize on Aug 30th, 2008, 1:59pm bot_carlos uses straight Monte Carlo: no tree building, just a lot of random games, and it takes the root move with the highest win percentage. |
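A minimal sketch of straight Monte Carlo as described here: score every root move by raw playout win percentage, with no tree. The Position API (copy, legal_moves, make, is_terminal, winner, side_to_move) is the same invented one as in the earlier sketches.

```python
import random

def random_playout(position, side_to_win):
    """Play uniformly random moves to the end of the game."""
    pos = position.copy()
    while not pos.is_terminal():
        pos.make(random.choice(pos.legal_moves()))
    return 1.0 if pos.winner() == side_to_win else 0.0

def straight_monte_carlo(position, playouts_per_move=100):
    """Pick the root move whose random playouts win most often."""
    side = position.side_to_move()
    best_move, best_score = None, -1.0
    for move in position.legal_moves():
        child = position.copy()
        child.make(move)
        wins = sum(random_playout(child, side) for _ in range(playouts_per_move))
        if wins / playouts_per_move > best_score:
            best_move, best_score = move, wins / playouts_per_move
    return best_move
```

This is the scheme the thread says overvalues rabbits: against a random defense an advanced rabbit often wins playouts, so the playout percentages stay biased no matter how many playouts are run.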
||||||