Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
Arimaa >> Bot Development >> direction of development
(Message started by: ImranG on Sep 6th, 2003, 7:08pm)

Title: direction of development
Post by ImranG on Sep 6th, 2003, 7:08pm
At the moment development seems to be going in the direction of better evaluation functions and deeper game tree searches.

However, in the long term this strategy is going to run into problems due to the high branching factor. One path would be using more efficient minimax variants, but again that seems to be only a short-term solution.

The obvious alternative would be to have an evaluation function which, given a board and the number of moves left, can identify which piece should be moved next. However, I don't know how you'd go about designing such a function. The only thing that comes to mind is using a function approximator like a neural network, trained via pre-processed deep searches or temporal difference learning. Anyone have any alternative ideas?

Imran

Title: Re: direction of development
Post by 99of9 on Sep 7th, 2003, 10:02am
If you had a perfect evaluation function, you would only need one ply of lookahead, because you could then "identify which piece should be moved next" by testing all pieces and all moves, and choosing the best one.
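
In Python-ish terms the whole thing collapses to a one-ply loop like this (just a sketch; legal_moves, apply_move and evaluate are stand-ins for whatever hooks your engine provides):

  # With a perfect evaluate() this one-ply "search" would play perfectly.
  # The three helper functions are passed in to keep the sketch self-contained.
  def best_move(board, side, legal_moves, apply_move, evaluate):
      best, best_score = None, float("-inf")
      for move in legal_moves(board, side):
          score = evaluate(apply_move(board, move), side)
          if score > best_score:
              best, best_score = move, score
      return best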

So I'd suggest that improving the evaluation functions (and deepening the game tree - because we all know we'll never get a perfect evaluation function) IS a good way to go about things.

Regarding neural net evaluation functions:  worth a try if you're good with them, but they'll need very good representation in order to have a decent chance at being able to learn this game.  I've programmed one or two neural nets in various projects, and they would not be my first choice for Arimaa - but I'd love to see you prove me wrong!

99of9

Title: Re: direction of development
Post by ImranG on Sep 7th, 2003, 3:07pm
A problem with multi-layered neural networks is that they are too slow to allow deep searches in most games. However, with Arimaa's high branching factor, even a one-ply search might take too long.

Title: Re: direction of development
Post by omar on Sep 11th, 2003, 11:26pm
Right now it seems like most of the people working on bots are using the conventional AI approach. But it's good that they are, because we first have to see how well Arimaa will stand up against this method. Of course it was designed to be a brick wall for such a method at our current level of hardware, but any engineer knows that things don't always work the way they are designed :-) It's gotta be tested. So I think initially we will see the direction of bot development focused on the conventional AI approach. If Arimaa holds up against this, then we may see the direction of development shift more towards other approaches.

Omar

Title: Re: direction of development
Post by Janzert on Sep 11th, 2003, 11:55pm
Just as a data point. I've been playing around with a TD (temporal difference) trained neural net. So far it makes moves just barely better than random.  ;)

Right now it basically uses the TD-Gammon (http://www.research.ibm.com/massive/tdl.html) "formula": TD(0), no exploration, straightforward representation. If it ever actually gets to the point of being challenging for shallowblue, I'll put it online for you guys to play with.
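
For the curious, the training loop amounts to something like this (a rough Python sketch rather than my actual code; start_position, successors, to_move, game_over, result and encode are stand-ins for the engine side, and net.value/net.train_toward stand for the network's forward pass and a single backprop step toward a target):

  def self_play_game(net, alpha=0.1):
      pos = start_position()
      x = encode(pos)
      while not game_over(pos):
          # greedy move choice, no exploration: gold maximizes the net's
          # value estimate (read it as P(gold wins)), silver minimizes it
          pick = max if to_move(pos) == "gold" else min
          pos = pick(successors(pos), key=lambda p: net.value(encode(p)))
          x_next = encode(pos)
          # TD(0): nudge V(s) toward V(s'), or toward the real result at the end
          target = result(pos) if game_over(pos) else net.value(x_next)
          net.train_toward(x, target, alpha)
          x = x_next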

Janzert

Title: Re: direction of development
Post by ImranG on Sep 13th, 2003, 6:44pm

on 09/11/03 at 23:55:02, Janzert wrote:
Just as a data point. I've been playing around with a TD (temporal difference) trained neural net. So far it makes moves just barely better than random.  ;)


I think I've read most of the papers on the application of TD(lambda) to board game playing (about 40 of them), and there seems to be a problem in this area: researchers all seem to be starting from scratch rather than building upon previous work.

I've seen a dozen or so improvements to the technique used by Tesauro, but with every researcher using a different game/scoring comparison it's nearly impossible to see what real progress has been made. Most researchers start from Tesauro rather than trying any of the improved versions. (To my knowledge there have been at least 7 independent reimplementations of TD-Gammon.)

There are also some holes in the development of the technique: areas such as how to raw-encode the board and how to decide on the size of rewards have been virtually ignored.

Out of interest, how are you implementing the system? Are you using raw encoding or hand-made features?

Imran

Title: Re: direction of development
Post by Janzert on Sep 15th, 2003, 9:45am

Completely raw encoded: 12 input nodes for each board square, one for each possible piece, plus 2 nodes representing which side is to move and one for a repeated position. It's only shown end-of-turn positions, so there is no need to represent intermediate push/pull states.
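
In code the encoding amounts to something like this (a sketch; I'm writing the board as a dict from square index 0-63 to a piece letter, RCDHME/rcdhme for gold/silver rabbit through elephant):

  PIECES = "RCDHMErcdhme"   # one input node per piece type per square

  def encode(board, side_to_move, repeated):
      inputs = [0.0] * (64 * 12 + 3)            # 771 input nodes in total
      for square, piece in board.items():
          inputs[square * 12 + PIECES.index(piece)] = 1.0
      inputs[768] = 1.0 if side_to_move == "gold" else 0.0
      inputs[769] = 0.0 if side_to_move == "gold" else 1.0
      inputs[770] = 1.0 if repeated else 0.0
      return inputs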

There are some obvious trivial improvements that could be made: handle repeated positions manually, reverse the board so it always seems to be white's move, add an "isfrozen" node to each square. But I wanted to see what would happen with as direct an implementation as possible first.

Another obvious change, one that is probably necessary to get good results, is to add exploration to the training.

Janzert

Title: Re: direction of development
Post by fotland on Sep 19th, 2003, 1:48am
Speedy's evaluation function contains about 250 constants, and many values are pretty arbitrary.  Is there any good theory on how to machine-tune these constants?

Title: Re: direction of development
Post by 99of9 on Sep 19th, 2003, 3:31am
I'd go straight for a genetic algorithm, but you'd need to run many, many games between differently-genomed Speedys to get good improvement.
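
Roughly like this (a Python sketch; play_match is assumed to play one game between two weight vectors and report whether the first one won):

  import random

  def mutate(w, sigma=0.05):
      return [x + random.gauss(0, sigma) for x in w]

  def crossover(a, b):
      cut = random.randrange(1, len(a))         # single-point crossover
      return a[:cut] + b[cut:]

  def evolve(seed, play_match, generations=100, pop_size=16):
      # seed the population around Speedy's current constants rather than
      # starting from random genomes
      pop = [mutate(seed) for _ in range(pop_size)]
      for _ in range(generations):
          random.shuffle(pop)
          # pair the genomes off; game winners survive and breed
          winners = [a if play_match(a, b) else b
                     for a, b in zip(pop[0::2], pop[1::2])]
          pop = winners + [mutate(crossover(random.choice(winners),
                                            random.choice(winners)))
                           for _ in range(pop_size - len(winners))]
      return pop[0]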

Title: Re: direction of development
Post by ImranG on Sep 21st, 2003, 6:41pm

on 09/19/03 at 01:48:59, fotland wrote:
Speedy's evaluation function contains about 250 constants, and many values are pretty arbitrary.  Is there any good theory on how to machine-tune these constants?


It depends on what the constants are. Most of the work in this area has been done where the evaluation function is a weighted sum of features (so e(board) = a*f1 + b*f2 + c*f3 + ..., where the f's are features and a, b, c, ... are weights).

In these cases several techniques have been suggested, including genetic techniques, linear regression, and TD(lambda) learning. I'm not really familiar with genetic techniques, but linear regression has been used successfully in Othello by Michael Buro, and TD(lambda) in Chess, Backgammon, and Lines of Action.

If you could explain what kind of thing your constants represent, it might be easier to say what kind of approach would best suit Speedy.

But whichever method you choose, you'll need a large number of games (you could use self-play to generate them, but that may result in strange habits). These could be supplied by recorded games (is there an archive of games played on the site?), or alternatively by playing online, though with that method you might not get enough games (Baxter et al. tuned their chess program KnightCap online, and for their best result needed about 300 games to rise from a rating of 1650 to 2100).


Imran

Title: Re: direction of development
Post by fotland on Sep 22nd, 2003, 1:20am
Thanks. Speedy's constants are mostly weights for features, although there are a few nonlinear terms. There are a lot of symmetries, so the number of interesting features is much smaller, about 100 I think.

I'm most interested in TD, but I think the structure of this game might make it difficult to apply.  For example, material advantage usually leads to a win, but taking pieces while the opponent advances a rabbit is not good.  I've read papers on TD, but never implemented it.  Can you point me at a good description?

I'm afraid that genetic algorithms would take way too many games to converge.

Thanks,

David

Title: Re: direction of development
Post by ImranG on Sep 22nd, 2003, 5:52am

on 09/22/03 at 01:20:40, fotland wrote:
I'm most interested in TD, but I think the structure of this game might make it difficult to apply.  For example, material advantage usually leads to a win, but taking pieces while the opponent advances a rabbit is not good.


I don't think that will be a problem, as long as during training it sees games where this strategy is employed, so that it can recognize the importance of an advancing rabbit.


Quote:
I've read papers on TD, but never implemented it.  Can you point me at a good description?


The standard paper on the application of TD(lambda) to games is:

Tesauro, G., "Practical Issues in Temporal Difference Learning," Machine Learning 8, pp. 257-277 (1992).

You'll probably also want to look at:

Baxter, J., Tridgell, A. & Weaver, L., "Combining Temporal Difference Learning with Game-Tree Search."

It concentrates on chess, so it is probably more representative of what you want to do.

You'll need to decide whether you want just a linear combination of the features (in which case it shouldn't be too hard to implement) or a non-linear combination. A non-linear combination such as a multilayer neural network is better (it can represent any continuous function of the features, so it can more easily deal with issues such as advancing rabbits versus material value), but has the disadvantage that it's harder to implement and takes more games to learn.
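
For the linear case the update rule is simple enough to sketch in Python (features() is a stand-in for however you compute Speedy's feature values; positions is one game's sequence of positions and result its final score):

  # TD(lambda) for V(s) = w . f(s).  In the linear case the gradient of V
  # with respect to w is just the feature vector, which keeps things easy.
  def td_lambda_update(positions, result, w, alpha=0.01, lam=0.7):
      e = [0.0] * len(w)                        # eligibility traces
      for k, s in enumerate(positions):
          f = features(s)
          v = sum(wi * fi for wi, fi in zip(w, f))
          if k + 1 < len(positions):
              fn = features(positions[k + 1])
              v_next = sum(wi * fi for wi, fi in zip(w, fn))
          else:
              v_next = result                   # terminal: use the real outcome
          delta = v_next - v                    # the temporal-difference error
          e = [lam * ei + fi for ei, fi in zip(e, f)]   # decay and accumulate
          for i in range(len(w)):
              w[i] += alpha * delta * e[i]
      return w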

If you want to use a multilayer neural network, have a look at the paper:

Sutton, R.S. (1989) "Implementation details of the TD(lambda) procedure for the case of vector predictions and backpropagation," GTE Laboratories Technical Note TN87-509.1, May 1987, corrected August 1989.

Richard Sutton also has a pseudo-code implementation of the technique available from his website.

All of these papers are on the web, so you should be able to find them via Google.

Imran

Title: Re: direction of development
Post by 99of9 on Sep 22nd, 2003, 1:07pm

on 09/22/03 at 01:20:40, fotland wrote:
I'm afraid that genetic algorithms would take way too many games to converge.


Probably true if you started from scratch, but there are ways to seed it with the approximate values of the constants.

An alternative approach is simulated annealing, where you start with the constants you currently have and explore that region of phase space. SA is good at getting out of local minima too.
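
In sketch form (Python; score() is a stand-in for however you estimate a genome's strength, e.g. win rate over a batch of games against a fixed opponent):

  import math, random

  def anneal(w, score, steps=1000, t0=1.0, cooling=0.995, sigma=0.05):
      current, current_score = list(w), score(w)
      t = t0
      for _ in range(steps):
          candidate = [x + random.gauss(0, sigma) for x in current]
          s = score(candidate)
          # always accept improvements; occasionally accept a worse point,
          # which is what lets SA climb out of local minima
          if s > current_score or random.random() < math.exp((s - current_score) / t):
              current, current_score = candidate, s
          t *= cooling                          # cool down gradually
      return current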


