Topic: UCT (Read 1586 times)
nbarriga (Forum Guru, Almost retired Bot Developer, Posts: 119)
Sorry for bringing back an old topic. I read the old posts about UCT, but I would like some kind of update/abstract. I'm interested to know what the status is/was of the bots that used UCT. Is there currently a bot using UCT or some similar Monte Carlo approach? What were the strengths and weaknesses of them? What made you drop that approach (if you did...)? Is there a consensus that alpha-beta plus optimizations is the way to go for winning the Computer Championship in the near future (2-3 years)? What other options are you considering? I'm talking search here, not eval (I already ruled out the genetic algorithm and tabu search approaches for myself a few years ago).
99of9 (Forum Guru, Gnobby's creator, player #314, Posts: 1413)
Re: UCT
« Reply #1 on: Aug 13th, 2008, 10:24pm »
I have not tried any fancy search since 2004 (when I hacked together my own forward-pruning alpha-beta). I am now concentrating on an eval rewrite, but I don't really hold out hope of winning the challenge.
Fritzlein (Forum Guru, Arimaa player #706, Posts: 5928)
Re: UCT
« Reply #2 on: Aug 14th, 2008, 6:16am »
on Aug 13th, 2008, 9:51pm, nbarriga wrote: What were the strengths and weaknesses of them?

JDB's UCT bot would push rabbits insanely, because a rabbit might get through against a random defense. It was far too concerned about advanced rabbits, and not nearly concerned enough about trap control. We know from Janzert's experiment that a random bot would prefer an extra rabbit to an extra elephant. UCT isn't quite that bad, but some of the random mindset infects the UCT thinking. Arimaa seems different from Go in this respect: random playouts in Go give a noisy evaluation, whereas random playouts in Arimaa give a horrendously bad evaluation.
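For readers unfamiliar with the technique being criticized here, this is a minimal sketch of playout-based evaluation, written against a toy counting game rather than Arimaa (the game and all names are illustrative, not taken from any of the bots mentioned):

```python
import random

def random_playout(state, legal_moves, apply_move, winner):
    """Play uniformly random moves until someone wins; return the winner."""
    while winner(state) is None:
        state = apply_move(state, random.choice(legal_moves(state)))
    return winner(state)

def playout_winrate(state, n, legal_moves, apply_move, winner, player):
    """Estimate a position's value for `player` as the fraction of n random
    playouts that `player` wins; this is the 'evaluation' whose quality the
    post above is questioning."""
    wins = sum(random_playout(state, legal_moves, apply_move, winner) == player
               for _ in range(n))
    return wins / n

# Toy stand-in for a real game: players alternately add 1, 2, or 3 to a
# counter; whoever brings it to exactly 21 wins.  State = (count, to_move).
def legal(state):
    count, _ = state
    return [m for m in (1, 2, 3) if count + m <= 21]

def apply_(state, move):
    count, p = state
    return (count + move, 1 - p)

def winner_(state):
    count, p = state
    return (1 - p) if count == 21 else None  # the player who just moved won

estimate = playout_winrate((0, 0), 1000, legal, apply_, winner_, player=0)
```

Even in this toy game the estimate is only a noisy proxy for the game-theoretic value (the first player wins with perfect play by moving to 1 and then holding the counter on 5, 9, 13, 17, 21); the post's point is that in Arimaa the proxy is not merely noisy but systematically biased.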
nbarriga
Re: UCT
« Reply #3 on: Aug 14th, 2008, 8:22am »
on Aug 14th, 2008, 6:16am, Fritzlein wrote: We know from Janzert's experiment that a random bot would prefer an extra rabbit to an extra elephant. UCT isn't quite that bad, but some of the random mindset infects the UCT thinking. Arimaa seems different from Go in this respect: random playouts in Go give a noisy evaluation, whereas random playouts in Arimaa give a horrendously bad evaluation.

Wouldn't this improve as the number of playouts increases? With multicore computers increasingly common, the random playouts in this algorithm should parallelize easily, or could even run through MPI on a small cluster. And what about not running the playouts all the way to the endgame, but giving them a fixed length, say 12-16 moves, and applying an eval function at that point?
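The fixed-length-playout idea can be sketched like this (a hypothetical illustration, not code from any bot; the game-interface functions are assumptions supplied by the caller):

```python
import random

def truncated_playout(state, depth, legal_moves, apply_move, evaluate):
    """Play `depth` uniformly random moves, then score the position with a
    static eval in [0, 1] instead of playing to the end of the game."""
    for _ in range(depth):
        moves = legal_moves(state)
        if not moves:                  # terminal position reached early
            break
        state = apply_move(state, random.choice(moves))
    return evaluate(state)

def averaged_estimate(state, n, depth, legal_moves, apply_move, evaluate):
    """Average n truncated playouts to cut the variance of a single sample."""
    return sum(truncated_playout(state, depth, legal_moves, apply_move, evaluate)
               for _ in range(n)) / n

# Deterministic toy demo: states are ints, the only move adds 1, and the
# eval saturates at 1.0 once the counter reaches 5.
demo = averaged_estimate(
    0, n=3, depth=2,
    legal_moves=lambda s: [] if s >= 5 else [1],
    apply_move=lambda s, m: s + m,
    evaluate=lambda s: min(s / 5.0, 1.0),
)
```

Because each playout is independent, the `n` samples are trivially parallel (one worker per batch on a multicore box, or MPI ranks on a cluster), exactly as suggested above.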
Janzert (Forum Guru, Arimaa player #247, Posts: 1016)
Re: UCT
« Reply #4 on: Aug 14th, 2008, 11:26am »
on Aug 14th, 2008, 8:22am, nbarriga wrote: I would expect this to improve when increasing the number of playouts? With multicore computers increasingly common, I would expect easy parallelization of the random playouts in this algorithm. Or even through MPI in a small cluster.

Not sure exactly what you're referring to that you expect to improve. If you mean a straight Monte Carlo bot, one that does no search but merely runs playouts from the root moves and chooses the one that looks best, then no, it will pretty much always prefer the rabbit to an elephant no matter how many playouts it performs. Monte Carlo Go programs play a little more reasonably, but their performance does level off at around 5000 playouts. If on the other hand you mean a bot that uses UCT (or more generally MCTS¹), then yes: since these eventually converge to perfect play, they will (eventually) get better.

¹ Monte-Carlo Tree Search (MCTS) is a term recently coined on the computer-go list to describe tree search techniques that use randomized playouts at the leaves for evaluation. The term is needed because several (most?) of the top Go programs no longer use actual UCT.

Quote: And what about not having the playouts until the endgame, but have them of fixed length, let's say 12-16 moves, and use an eval function there?

Not sure I understand this correctly, but it sounds like something I have thought about trying but have not found the time for yet. The ultimate question, of course, is whether any of these techniques choose better moves than more traditional alpha-beta based searches. I believe the reason the MCTS techniques work so well in Go is that the ratio of computational work to scoring accuracy is favorable for random playouts compared to a traditional eval.

Also, regarding the actual tree search portion, I've been thinking lately that, at least at a conceptual level, there isn't much difference between UCT (or at least the variants now in use) and a modern alpha-beta based searcher with all the extensions and reductions in use. Recently on the computer-go list, Gian-Carlo Pascutto, the author of the Deep Sjeng (chess) and Leela (go) programs, stated this as well:

Quote: It is not so obvious to me any more what the differences are between a state of the art UCT searcher (i.e. with nearly no exploration) and a state of the art alpha-beta searcher (i.e. with heavy late move reductions). They both just hammer out the mainline deeper until the score drops a bit, or an alternative rises strongly.

Janzert
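For concreteness, the selection rule at the heart of UCT (UCB1) can be sketched as follows; this is a generic illustration of the formula, not code from any of the programs discussed:

```python
import math

def uct_select(children, c=1.4):
    """UCB1 child selection for one node of a UCT/MCTS tree.

    `children` is a list of (wins, visits) pairs for the moves tried so far.
    Returns the index of the child maximising win rate plus an exploration
    bonus; shrinking `c` toward 0 gives the 'nearly no exploration'
    searchers Pascutto describes."""
    total_visits = sum(visits for _, visits in children)

    def ucb1(stats):
        wins, visits = stats
        if visits == 0:
            return float("inf")        # always try an unvisited child first
        return wins / visits + c * math.sqrt(math.log(total_visits) / visits)

    return max(range(len(children)), key=lambda i: ucb1(children[i]))
```

With a large `c` the search spreads out over siblings; with `c` near zero it hammers out the mainline until the score drops, which is the convergence with reduction-heavy alpha-beta that the post describes.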
« Last Edit: Aug 14th, 2008, 11:28am by Janzert »
Fritzlein
Re: UCT
« Reply #5 on: Aug 15th, 2008, 11:25am »
on Aug 14th, 2008, 11:26am, Janzert wrote: Also regarding the actual tree search portion I've been thinking lately that at least at a conceptual level there isn't much difference between UCT (or at least the variants now in use) and a modern alpha-beta based searcher with all the extensions and reductions in use.

I agree. If you are going to do evaluation with a static function rather than with a complete playout, then why use UCT to build a tree at all? Why not use something deterministic to create depth extensions and depth reductions? Then the depth/breadth tradeoff philosophy that underlies UCT can be applied in a very game-specific way. For example, in Arimaa we know it is a good idea to explore threatened captures, so we deepen the search automatically. If I recall correctly, Fotland once said something like, "The better I understand Arimaa strategy, the more lopsided my search tree is going to become."

The main reason to use UCT, as far as I can see, is to avoid static evaluation. But random playouts give terrible Arimaa evaluations. Yes, the tree-building slowly trains the players not to play randomly, but the training is far too slow to overcome the bias for forward-flung rabbits.

A reason not to use UCT to build the tree is that the whole explored tree has to be saved, whereas alpha-beta searchers can throw away information as they go. However, there may be some convergence here too, as alpha-beta searchers save more and more nodes with information, while UCT searchers expand nodes selectively and throw away most of the tree because they can't save it all. It will be interesting if the shape of search trees in alpha-beta searchers gradually converges to the shape of trees created by UCT searchers.
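The deterministic-extension idea can be sketched as a plain negamax alpha-beta in which positions flagged as tactical are searched deeper instead of being evaluated statically. The `is_tactical` predicate (e.g. "a capture is threatened") and the game-interface functions are assumptions supplied by the caller, not anyone's actual bot code:

```python
MAX_EXTENSION = 4  # hard floor so a chain of 'tactical' positions can't extend forever

def alphabeta(state, depth, alpha, beta, game):
    """Negamax alpha-beta with a game-specific depth extension: when depth
    runs out but `game['is_tactical']` flags the position, keep searching
    (up to MAX_EXTENSION extra plies) before calling the static eval.
    `game['evaluate']` must score from the side to move's point of view."""
    if depth <= 0 and (depth <= -MAX_EXTENSION or not game["is_tactical"](state)):
        return game["evaluate"](state)
    moves = game["legal_moves"](state)
    if not moves:                       # terminal node: nothing to try
        return game["evaluate"](state)
    best = -float("inf")
    for move in moves:
        score = -alphabeta(game["apply_move"](state, move),
                           depth - 1, -beta, -alpha, game)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:               # beta cutoff
            break
    return best
```

Swapping in different `is_tactical` triggers (threatened capture, rabbit near goal) deepens exactly the lines a game expert cares about, which is the "lopsided tree" Fotland is quoted as describing; reductions work the same way in the other direction.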
« Last Edit: Aug 15th, 2008, 2:40pm by Fritzlein »
nbarriga
Re: UCT
« Reply #6 on: Aug 15th, 2008, 11:43am »
And to reformulate my last question: is anybody considering, or does anyone have ideas on, new search algorithms other than UCT and alpha-beta?
Fritzlein
Re: UCT
« Reply #7 on: Aug 15th, 2008, 2:44pm »
As I understand it, Rat has a totally different philosophy. It doesn't even generate all the moves, much less explore them all. Rat is only allowed to make moves that fall into certain patterns, so it builds a much narrower tree than either UCT or alpha-beta.

Haizhi's bot wasn't as selective as Rat, but it also threw away about 85% of all moves without even evaluating them. It wouldn't look at moves consisting of four independent steps (i.e. steps that could occur in any order), so a move with no push or pull, no two steps by the same piece, and no other dependency among its steps could not even be considered. The reasoning was that a massive speedup was possible without eliminating any really good moves.

(Incidentally, that would make a good puzzle challenge: create a reasonable-looking position where the only winning move consists of four independent steps, and thus would be ignored by bot_haizhi.)
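A plausible sketch of the independence test follows. This is hypothetical: Haizhi's actual criterion may differ, and for simplicity it treats steps as plain piece movements (pushes and pulls pair two steps by construction, so they are dependent automatically):

```python
def steps_dependent(s1, s2):
    """Crude dependence test: two steps interact if they move the same
    piece or touch a common square.  A step is (piece_id, from_sq, to_sq)."""
    p1, f1, t1 = s1
    p2, f2, t2 = s2
    return p1 == p2 or bool({f1, t1} & {f2, t2})

def keep_move(steps):
    """Haizhi-style filter sketch: keep a move only if some pair of its
    steps is dependent, i.e. the step order matters somewhere; fully
    independent moves are pruned without being evaluated."""
    return any(steps_dependent(steps[i], steps[j])
               for i in range(len(steps)) for j in range(i + 1, len(steps)))
```

Pruning on a purely combinatorial property like this is what makes the speedup cheap: the test runs on the step list alone, before any board evaluation.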
« Last Edit: Aug 15th, 2008, 3:05pm by Fritzlein »
tize (Forum Guru, Arimaa player #3121, Posts: 118)
Re: UCT
« Reply #8 on: Aug 24th, 2008, 6:02am »
If anyone wants to see just how bad a simple Monte Carlo bot is, I will have bot_carlos up for play sometime soon. It seemed like too funny and different an approach not to try...
Fritzlein
Re: UCT
« Reply #9 on: Aug 24th, 2008, 1:03pm »
Is bot_carlos straight Monte Carlo (e.g. evaluate all moves at the root by random playout win percentage), or does it build some sort of tree like UCT? Or something else I'm not even thinking of?
tize
Re: UCT
« Reply #10 on: Aug 30th, 2008, 1:59pm »
bot_carlos uses straight Monte Carlo: no tree building, just a lot of random games, taking the root move with the highest win percentage.
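That scheme can be sketched as follows (an illustration of the description above, not bot_carlos's actual code; the playout function and game interface are assumptions supplied by the caller):

```python
def straight_monte_carlo(root, n_per_move, legal_moves, apply_move,
                         playout_winner, player):
    """Straight Monte Carlo move choice: no tree, just score every root
    move by the fraction of playouts `player` wins after making it, and
    return the best move with its win rate."""
    best_move, best_rate = None, -1.0
    for move in legal_moves(root):
        child = apply_move(root, move)
        wins = sum(playout_winner(child) == player for _ in range(n_per_move))
        rate = wins / n_per_move
        if rate > best_rate:
            best_move, best_rate = move, rate
    return best_move, best_rate

# Demo with a deterministic stand-in for the random playout: 'me' is
# deemed to win any playout started from a state >= 3.
demo = straight_monte_carlo(
    0, 10,
    legal_moves=lambda s: [1, 2, 3],
    apply_move=lambda s, m: s + m,
    playout_winner=lambda s: "me" if s >= 3 else "you",
    player="me",
)
```

Because it never re-searches below the root, every weakness of the playout policy (such as the rabbit-over-elephant bias discussed earlier in the thread) feeds straight into the move choice.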