Arimaa Forum - direction of development

Welcome, Guest. Please Login or Register.
Jul 12^th, 2025, 5:00am

Home

Help

Members

Arimaa Forum « direction of development »

   Arimaa Forum
   Arimaa
   Bot Development (Moderator: supersamu)
   direction of development

« Previous topic | Next topic »

Pages: 1

Notify of replies

Send Topic

Author

Topic: direction of development (Read 2561 times)

ImranG
Forum Newbie

Arimaa player #345

Gender: male

Posts: 5

direction of development
« on: Sep 6^th, 2003, 7:08pm »

Quote

Modify

At the moment development seems to be going in the direction of better evaluation functions and deeper game tree searches.

However in the long term this strategy is going to run into problems due to the high branching factor. One path would be using more efficient minimax variants, but again that seems to be only a short term solution.

The obvious alternative would be to have an evaluation function which given a board and the number of moves left can identify which piece should be moved next. However I don't know how you'ld go about designing such a function. The only thing that comes to mind is using a function approximater like a neural network trained via pre-processed deep searches or temporal difference learning. Anyone have any alternative ideas ?

Imran

IP Logged

99of9
Forum Guru

Gnobby's creator (player #314)

Gender:

Posts: 1413

Re: direction of development
« Reply #1 on: Sep 7^th, 2003, 10:02am »

Quote

Modify

If you had a perfect evaluation function, you would only need one ply lookahead, because then you could then "identify which piece should be moved next" by testing all pieces and all moves, and choosing the best one.

So I'd suggest that improving the evaluation functions (and deepening the game tree - because we all know we'll never get a perfect evaluation function) IS a good way to go about things.

Regarding neural net evaluation functions: worth a try if you're good with them, but they'll need very good representation in order to have a decent chance at being able to learn this game. I've programmed one or two neural nets in various projects, and they would not be my first choice for Arimaa - but I'd love to see you prove me wrong!

99of9

IP Logged

ImranG
Forum Newbie

Arimaa player #345

Gender: male

Posts: 5

Re: direction of development
« Reply #2 on: Sep 7^th, 2003, 3:07pm »

Quote

Modify

A problem with multi-layered neural networks is that they are too slow to do deep searches in most games. However with Arimaa having such a high-branching factor it might mean that even doing a one-ply search could take too long.

IP Logged

omar
Forum Guru

Arimaa player #2

Gender: male

Posts: 1003

Re: direction of development
« Reply #3 on: Sep 11^th, 2003, 11:26pm »

Quote

Modify

Right now it seems like most of the people working on bots
are using the conventional AI approach. But it's good that they are because we have to first see how well Arimaa will stand up against this method. Of course it was designed to be a brick wall for such a method with our current level of hardware, but any engineer knows that things don't always work the way they are designed

It's gotta be tested. So I think initially we will see the direction of bot development focused towards the conventional AI approach. If Arimaa holds up against this then we may see the direction of development shift more towards other approaches.

Omar

IP Logged

Janzert
Forum Guru

Arimaa player #247

Gender: male

Posts: 1016

Re: direction of development
« Reply #4 on: Sep 11^th, 2003, 11:55pm »

Quote

Modify

Just as a data point. I've been playing around with a TD (temporal difference) trained neural net. So far it makes moves just barely better than random. Wink

Right now it basically uses the TD-Gammon "formula" of TD0, no exploration, straightforward representation. If it ever actually gets to the point of being challenging for shallowblue I'll put it online for you guys to play with.

Janzert

IP Logged

ImranG
Forum Newbie

Arimaa player #345

Gender: male

Posts: 5

Re: direction of development
« Reply #5 on: Sep 13^th, 2003, 6:44pm »

Quote

Modify

on Sep 11^th, 2003, 11:55pm, Janzert wrote:

Just as a data point. I've been playing around with a TD (temporal difference) trained neural net. So far it makes moves just barely better than random. Wink

I think I've read most of the paper on the application of TD(lambda) to board game playing (about 40 of them), and there seems to be a problem in this area that these researchers seeming to be all starting from scratch rather than building upon previous works.

I've seen a dozen or so improvements to the technique used by Tesauro, but with every researcher using a different game/scoring comparision it's nearly impossible to see what real progress has been made. Most of the researchers start from Tesauro, rather than trying any of the improved versions. (To my knowledge there have been at least 7 independent reimplementations of TD-Gammon)

There are also some holes in the development of the technique, some areas such as how to raw encode the board and how to decide on the size of rewards have been virtually ignored.

Out of interest, how are you implementing the system, are you using raw encoding or are you using hand made features ?

Imran

« Last Edit: Sep 13^th, 2003, 6:45pm by ImranG »

IP Logged

Janzert
Forum Guru

Arimaa player #247

Gender: male

Posts: 1016

Re: direction of development
« Reply #6 on: Sep 15^th, 2003, 9:45am »

Quote

Modify

Completely raw encoded. 12 input nodes for each board position, 1 for each possible piece. 2 nodes representing which side is to move and one for a repeated postion. It's only shown end state positions so there is no need to represent push/pull states.

There are some obvious trivial improvements that could be done. Handle repeated positions manually, reverse the board so it always seems to be white's move, add an "isfrozen" node to each position. But I wanted to see what would happen with as direct an implementation as possible first.

Another obvious change, that is probably necessary to get good results, is to add exploration into the training.

Janzert

IP Logged

fotland
Forum Guru

Arimaa player #211

Gender: male

Posts: 216

Re: direction of development
« Reply #7 on: Sep 19^th, 2003, 1:48am »

Quote

Modify

Speedy's evaluation function contains about 250 constants, and many values are pretty arbitrary. Is there any good theory on how to machine-tune these constants?

IP Logged

99of9
Forum Guru

Gnobby's creator (player #314)

Gender:

Posts: 1413

Re: direction of development
« Reply #8 on: Sep 19^th, 2003, 3:31am »

Quote

Modify

I'd go straight for a genetic algorithm, but you'd need to run many many games between differently genomed speedys to get good improvement.

IP Logged

ImranG
Forum Newbie

Arimaa player #345

Gender: male

Posts: 5

Re: direction of development
« Reply #9 on: Sep 21^st, 2003, 6:41pm »

Quote

Modify

on Sep 19^th, 2003, 1:48am, fotland wrote:

Speedy's evaluation function contains about 250 constants, and many values are pretty arbitrary. Is there any good theory on how to machine-tune these constants?

It depends what the constants are, most of the work in this area has been done where the evaluation function has been a weighted sum of features (so e(board)= a*f1+b*f2+c*f3, where the f's are features and a,b,c.. are weights).

In these cases several techniques have been suggested, including genetic techniques, linear regression, and TD() learning. Im not really familar with genetic techniques, but linear regression has been used sucessfully in Othello by Michael Buro and TD() in Chess, Backgammon and Lines of Action.

If you could explain what kind of thing your constant represent it might be easier to say what kind of approach would best suit speedy.

But whichever method you choose you'll need a large number of games (you could use self-play to generate games but they may results in strange habits). This could be supplied by a recorded games (is there an archive of games played on the site), or alternatively by playing online, however with this methods you might not get enough games (Baxter, et al. tuned their chess program KnightCap online and for their best result needed about 300 games to raise from a rating of 1650 to 2100)

Imran

« Last Edit: Sep 21^st, 2003, 6:46pm by ImranG »

IP Logged

fotland
Forum Guru

Arimaa player #211

Gender: male

Posts: 216

Re: direction of development
« Reply #10 on: Sep 22^nd, 2003, 1:20am »

Quote

Modify

Thanks. Speedy's constants are mostly weights for features, although there are a few nonlinear terms. There are a lot of symetries, so the number of interesting features is much smaller, about 100 I think.

I'm most interested in TD, but I think the structure of this game might make it difficult to apply. For example, material advantage usually leads to a win, but taking pieces while the opponent advances a rabbit is not good. I've read papers on TD, but never implemented it. Can you point me at a good description?

I'm afraid that genetic algorithms would take way too many games to converge.

Thanks,

David

IP Logged

ImranG
Forum Newbie

Arimaa player #345

Gender: male

Posts: 5

Re: direction of development
« Reply #11 on: Sep 22^nd, 2003, 5:52am »

Quote

Modify

on Sep 22^nd, 2003, 1:20am, fotland wrote:

I'm most interested in TD, but I think the structure of this game might make it difficult to apply. For example, material advantage usually leads to a win, but taking pieces while the opponent advances a rabbit is not good.

I don't think that will be a problem, as long as during the training it sees games where this strategy is employed so it can recognize the importance of an advancing rabbit.

Quote:

I've read papers on TD, but never implemented it. Can you point me at a good description?

The standard paper on the application of TD() to games is,

Tesauro, "Practical Issues in Temporal Difference Learning" in Machine Learning 8, pp.257-277 (1992)

You'll probably also want to look at,

Baxter, Jonathan. Andrew Tridgell & Lex Weaver. Combining Temporal Difference Learning with Game-Tree Search.

As it concentrates on chess so is probably more representitive of what you want to do.

You'll need to decide on whether you just want a linear combination of the features (in which case it shouldn't be to hard to implement) or a non-linear combination. A non-linear combinations such as a multilayer neural network is better (that is it can represent any continuous function of the features so it can more easily deal with issues such as advancing rabbit versus material value), but has the disadvantage that it's hard to implement and takes more games to learn.

If you want to use a multilayer neural network have a look at the paper,

Sutton, R.S. (1989) "Implementation details of the TD(lambda) procedure for the case of vector predictions and backpropagation," GTE Laboratories Technical Note TN87-509.1, May 1987, corrected August 1989.

Richard Sutton also has a pseudo-code implentation of the technique available from his website.

All of the papers are on the web so you should be able to find them via google.

Imran

IP Logged

99of9
Forum Guru

Gnobby's creator (player #314)

Gender:

Posts: 1413

Re: direction of development
« Reply #12 on: Sep 22^nd, 2003, 1:07pm »

Quote

Modify

on Sep 22^nd, 2003, 1:20am, fotland wrote:

I'm afraid that genetic algorithms would take way too many games to converge.

probably true if you started from scratch, but there are ways to seed it with the approximate values of the constants.

an alternative approach is simulated annealing, where you start with the constants you currently have in it, and explore that region of phase space. SA is good at getting out of local minima too.

IP Logged

Pages: 1

Notify of replies

Send Topic


« Previous topic \| Next topic »