Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
Arimaa >> Bot Development >> OpFor
(Message started by: Fritzlein on Dec 18th, 2007, 3:00pm)

Title: OpFor
Post by Fritzlein on Dec 18th, 2007, 3:00pm
Janzert, it is great to see bot_OpFor up and playing!  A bot in the gameroom is worth two on the drawing board.  Perhaps you were reluctant to talk much about your bot while it was still hypothetical, but I'd love to hear more about it now that it is real.

I note that OpFor is stronger than Occam out of the gate, or at least it appears so from its beating ShallowBlue with both averaging 0.10 minutes per move.  In the 2007 CC, Omar had Occam play just to make the field an even eight, so without going any further OpFor is already better filler material...

I'm quite curious what you are doing the same as, or differently from, the developers who came before you, if you have time to share between bouts of coding.

Title: Re: OpFor
Post by RonWeasley on Dec 18th, 2007, 4:03pm
I hope OpFor can play the Owl Tournament.  That is, only if I can beat it.

Title: Re: OpFor
Post by Janzert on Dec 18th, 2007, 4:37pm
OpFor is using pretty much just basic standard techniques. At the moment this means a straight-up iterative deepening alpha-beta search, a few simple positional evaluation terms (mostly implementing the easy-to-do parts of fotland's description of Bomb's eval), a very primitive goal search, and FAME for material eval. A trap search is already written but not yet integrated. The next thing on the todo list, though, is some real time management. Most of the current games have simply used P1 search (the very first one was a 6 step search with tournament rule game endings).
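
For concreteness, the overall shape of that core is roughly the following (a Python sketch for brevity; OpFor itself is written in D, and helper names like pos.moves() are illustrative stand-ins, not OpFor's actual API):


Code:
import time

INF = 10**9

def alphabeta(pos, depth, alpha, beta):
    # Negamax convention: evaluate() scores from the mover's viewpoint.
    if depth == 0 or pos.terminal():
        return pos.evaluate()
    for move in pos.moves():
        val = -alphabeta(pos.make(move), depth - 1, -beta, -alpha)
        if val >= beta:
            return beta              # fail-hard beta cutoff
        if val > alpha:
            alpha = val
    return alpha

def search(pos, max_depth, seconds):
    # Iterative deepening: redo the search one ply deeper each time,
    # always keeping the best move from the last completed depth.
    deadline = time.time() + seconds
    best = None
    for depth in range(1, max_depth + 1):
        best_val, depth_best = -INF, None
        for move in pos.moves():
            val = -alphabeta(pos.make(move), depth - 1, -INF, -best_val)
            if val > best_val:
                best_val, depth_best = val, move
        best = depth_best
        if time.time() >= deadline:
            break
    return best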

This is actually around the 5th implementation of the core engine for a bot that I've finished. The first one I wrote on my own was pretty horrible; the later ones have benefited greatly from my browsing of (mostly in historical order, I think) Don Dailey's sample bot, Jeff Bacher's Clueless, Ola Hansson's Fairy and David Fotland's descriptions here of Bomb. Thanks to all for releasing these, and I hope to do the same in the future.

My goal is to get OpFor somewhere in the vicinity of Clueless' or Bomb's strength and then, as the name suggests, use it as the opponent locally for bots based on more experimental techniques.

Janzert

Title: Re: OpFor
Post by Fritzlein on Dec 19th, 2007, 12:23pm
This is a reply to a game comment, but it seemed like more general discussion that should go in the bot developer thread, not in game comments that quickly scroll off.  How much discussion of the development of Bomb is effectively lost since it occurred in game comments that nobody can find any more?


Quote:
Right, after making the initial comment last night I realized this. So instead of changing goal threat evaluation any more I started adding a term for pieces on unsafe traps. But then came to the realization that the quiescent search should take care of these problems once added.

So I left things alone for now and started back in on time management where I should have been in the first place. :P

Yes, quiescence search until no one-move captures remain is surely more important than having the evaluation penalize pieces on trap squares.  In fact, 99of9 convinced me that sometimes pieces on trap squares make the trap safer!  I think Gnobot is the only bot that uses this strategy, though.

I would argue that quiescence search is more important than time management as well.  I have a hunch that the bot that eventually wins the Challenge will not do so by excellent evaluation, but rather by excellent selection of which lines need to be examined more deeply than others.  Capturing moves are an obvious place to start selective extensions, and one that can be done efficiently (with decision trees, as Fotland showed us).

Imagine, though, if bots could quickly choose 30 "moves that need to be looked at" from _every_ position, whether captures are available or not.  The branching factor problem disappears, and poof, computers dominate Arimaa.


Title: Re: OpFor
Post by Janzert on Dec 19th, 2007, 2:15pm
I'm probably miscommunicating what I mean by time management. ;) I don't mean anything fancy; I mean anything at all. Currently the bot will only search to a fixed depth (or infinite depth), and even that is a hardcoded number that requires a recompile to change. So to even enter the championship I'm pretty sure I have to get something done. No, it probably won't be very fancy, elegant, or good either.

Btw, when I said nascent I really do mean it is just barely developed and functioning. The first alpha-beta search (without even a transposition table) was only finished at the beginning of November.

Janzert

Title: Re: OpFor
Post by Janzert on Dec 20th, 2007, 7:11pm
An emergency "stop right now and send a move before timing out" is now implemented. :P

And yay, it lets OpFor just barely beat bombP1 when searching 10 steps deep (or as deep as time allows). ;)
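
(The mechanism is just a deadline check inside the search that unwinds to the best move from the last completed iteration; roughly, as a Python sketch with illustrative names:)


Code:
import time

class OutOfTime(Exception):
    pass

class SearchTimer:
    """Raises OutOfTime a safety margin before the clock expires."""
    def __init__(self, seconds, margin=0.5):
        self.deadline = time.time() + seconds - margin

    def check(self):
        # Called every few thousand nodes from inside the search.
        if time.time() >= self.deadline:
            raise OutOfTime()

def timed_move(pos, timer, deepening_search):
    # deepening_search yields the best move after each completed depth;
    # when the timer fires we send the last one instead of timing out.
    best = None
    try:
        for move in deepening_search(pos, timer):
            best = move
    except OutOfTime:
        pass
    return best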

Janzert

Title: Re: OpFor
Post by Janzert on Dec 21st, 2007, 1:37pm
Here is a slightly condensed commit log for my current iteration of bots.

Janzert

------------------------------------------------------------
revno: 1
timestamp: Fri 2007-08-31 16:23:29 -0400
message:
 Initial import
------------------------------------------------------------
revno: 10
timestamp: Wed 2007-09-19 12:51:09 -0400
message:
 add copy method to steps
 rename random move bot
 create random step bot
------------------------------------------------------------
revno: 15
timestamp: Thu 2007-09-27 13:48:58 -0400
message:
 Arimaa score p1 bot
------------------------------------------------------------
revno: 19
timestamp: Thu 2007-10-04 20:40:06 -0400
message:
 Added FAME scoring
 Various fixes. UCT still plays much better as gold.
------------------------------------------------------------
revno: 24
timestamp: Sat 2007-11-03 02:48:36 -0400
message:
 Initial alpha beta bot work
------------------------------------------------------------
revno: 25
timestamp: Sun 2007-11-04 23:31:24 -0500
message:
 Minimally working alpha beta bot.
 Fix some memory leaks.
------------------------------------------------------------
revno: 26
timestamp: Tue 2007-11-06 11:59:07 -0500
message:
 Add transposition table to alpha beta
 Create caching FAME eval fastFAME
------------------------------------------------------------
revno: 27
timestamp: Wed 2007-11-07 11:46:44 -0500
message:
 Add History heuristic.
 Do some top level move ordering.
 Make the search run incrementally.
------------------------------------------------------------
revno: 28
timestamp: Fri 2007-11-09 15:27:33 -0500
message:
 Some positional evaluation.
 More various stats recorded.
------------------------------------------------------------
revno: 29
timestamp: Mon 2007-11-12 17:20:10 -0500
message:
 More positional evaluation work.
 Infinite analysis support in alpha-beta bot.
------------------------------------------------------------
revno: 30
timestamp: Fri 2007-11-16 21:25:19 -0500
message:
 Beginnings of a goal search
------------------------------------------------------------
revno: 31
timestamp: Mon 2007-11-19 00:26:48 -0500
message:
 Add goal search to alpha beta bot
------------------------------------------------------------
revno: 32
timestamp: Fri 2007-11-23 12:02:27 -0500
message:
 Correct bug allowing pulls after a push step
------------------------------------------------------------
revno: 34
timestamp: Wed 2007-11-28 00:17:03 -0500
message:
 Initial capture move generator
------------------------------------------------------------
revno: 35
timestamp: Fri 2007-11-30 13:23:15 -0500
message:
 Split alphabeta search class out from bot_ab
------------------------------------------------------------
revno: 37
timestamp: Thu 2007-12-13 10:09:48 -0500
message:
 Encapsulate transposition table handling.
 Change transposition table replacement to greater depth only.
 Small bug fixes and cleanups.
------------------------------------------------------------
revno: 38
timestamp: Tue 2007-12-18 01:17:27 -0500
message:
 Rename bot_ab to bot_opfor
------------------------------------------------------------
revno: 39
timestamp: Tue 2007-12-18 17:50:06 -0500
message:
 Add support for casual play game end rules (no rabbit capture win)
 Make goal search evaluation better handle multi move threats
------------------------------------------------------------
revno: 40
timestamp: Wed 2007-12-19 15:29:22 -0500
message:
 aeibot.d:
 Add UnknownCommand exception to aeibot interface
 bot_opfor.d:
 Clean up the aei score reporting
 Adjust goal search weighting
 Reduce trap safety weight
------------------------------------------------------------
revno: 41
timestamp: Wed 2007-12-19 18:00:05 -0500
message:
 Handle go ponder and goal by ignoring them
------------------------------------------------------------
revno: 42
timestamp: Thu 2007-12-20 00:29:09 -0500
message:
 Add aei setoption command
 Get rid of several aei go command modifiers
 Add ability to set search depth through 'setoption depth'
------------------------------------------------------------
revno: 43
timestamp: Thu 2007-12-20 00:45:18 -0500
message:
 Fix bot_score to handle setoption
------------------------------------------------------------
revno: 44
timestamp: Fri 2007-12-21 00:15:43 -0500
message:
 Add aei stop command and bot_opfor handling of it
------------------------------------------------------------
revno: 45
timestamp: Fri 2007-12-21 04:29:37 -0500
message:
 Add timecontrol information to bot_opfor. Now just need to do something with it.
------------------------------------------------------------
revno: 46
timestamp: Fri 2007-12-21 14:08:31 -0500
message:
 bot_opfor: Switch dog and horse placement in opening setup

Title: Re: OpFor
Post by Fritzlein on Dec 21st, 2007, 2:26pm
Thanks for the interesting detail.

You had mentioned writing a learning bot long ago that bombed, but I didn't know you had also written a UCT bot.  Was your experience the same as jdb's, namely that it flung rabbits forward to their doom because a random defense might let them through?

I see you implemented FAME early on, despite its known shortcomings.  That reminds me to take another pass at improving FAME some day.

I note the dog/horse setup placement comment.  That makes me suspect that there would be utility in completely randomizing the setup while testing.  You are surely still at a stage of finding out in general terms about OpFor's abilities, which is hindered by always giving it the same puzzle to solve.  Different bugs might manifest under different stresses.  Giving OpFor an ideal setup seems like an improvement to maximize winning chances (e.g. for the tournament), rather than a good way to test/learn.

Of course, this comment reflects my own journey in gradually becoming willing to experiment in order to learn even if it hurts my winning chances.  My recent drubbings at chessandgo's hands have taught me more than I learned in twice as many "safe" victories in the past.

Title: Re: OpFor
Post by Janzert on Dec 22nd, 2007, 1:31pm

on 12/21/07 at 14:26:36, Fritzlein wrote:
Thanks for the interesting detail.

You had mentioned writing a learning bot long ago that bombed, but I didn't know you had also written a UCT bot.  Was your experience the same as jdb's, namely that it flung rabbits forward to their doom because a random defense might let them through?

I didn't actually get very far with the UCT bot, but what little I saw did seem to like advanced rabbits, yes. After playing with it for a little bit I decided I really needed to write a regular alpha-beta bot as an opponent, hence the development of bot_opfor. UCT is certainly something I want to go back and experiment with some more.


Quote:
I see you implemented FAME early on, despite its known shortcomings.  That reminds me to take another pass at improving FAME some day.

Well, it's still better than any static piece evaluation I'm likely to put together, and since I'd already implemented it twice (javascript and python) a third time was pretty easy.


Quote:
I note the dog/horse setup placement comment.  That makes me suspect that there would be utility in completely randomizing the setup while testing.  You are surely still at a stage of finding out in general terms about OpFor's abilities, which is hindered by always giving it the same puzzle to solve.  Different bugs might manifest under different stresses.  Giving OpFor an ideal setup seems like an improvement to maximize winning chances (e.g. for the tournament), rather than a good way to test/learn.

Random or at least semi-random setups would probably be a good idea for testing. Right now the setup is simply a predefined string, plus a check that the silver elephant doesn't end up straight across from the gold one. The impetus to flip-flop the horse and dog was just having watched a game where the horses ended up not being moved at all until the endgame, while the dogs almost immediately got moved up next to the trap and were used onward from there. So it seemed worthwhile to change 10 characters to encourage the use of the horses in the games. :)


Quote:
Of course, this comment reflects my own journey in gradually becoming willing to experiment in order to learn even if it hurts my winning chances.  My recent drubbings at chessandgo's hands have taught me more than I learned in twice as many "safe" victories in the past.

Yes, hopefully I don't develop a 'fear' of letting the bot lose games. That would certainly hurt development.

One thing I'm trying to be a bit careful of is not getting the bot to 'do the right thing' for the wrong underlying reason.

Thanks for the comments,
Janzert

Title: Re: OpFor
Post by Fritzlein on Dec 22nd, 2007, 2:09pm

on 12/22/07 at 13:31:35, Janzert wrote:
One thing I'm trying to be a bit careful of is not getting the bot to 'do the right thing' for the wrong underlying reason.

For example, you are being careful not to get OpFor to use its horses rather than its dogs (right thing) merely by having the horses conveniently located in the opening setup (wrong reason)?  ;-)


Quote:
Thanks for the comments

You are welcome.  I enjoy speaking at length on topics about which I know little.  Thank you for sharing your project progress so I have the opportunity.

Title: Re: OpFor
Post by Janzert on Dec 22nd, 2007, 2:34pm
Ok, guilty. :) My excuse is that at the moment I wasn't considering the setup to be part of the bot's 'reasoning' since it's just a static setup.

Janzert

Title: Re: OpFor
Post by Janzert on Dec 24th, 2007, 3:07pm
Basic time management that should work fairly decently with any time control is finished, although I'll need to do something more for postal games, where it shouldn't actually run the whole time. It also may not do particularly well with time controls that apply a low percentage of unused time to the reserve, but I don't think those have been used much anyway.
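
The budgeting amounts to something like this sketch (illustrative numbers and names, not the exact formula; Arimaa time controls give a fixed allotment per move plus a reserve that a percentage of unused time flows into):


Code:
def move_budget(per_move, reserve, reserve_fraction=0.25, margin=1.0):
    """Seconds to think this move: the regular per-move allotment plus
    a slice of the reserve, minus a safety margin for lag. If only a
    low percentage of unused time returns to the reserve, spending the
    reserve aggressively is risky, hence the small fraction."""
    return max(per_move + reserve * reserve_fraction - margin, 0.5)

# e.g. 30s per move with 120s in reserve -> 30 + 30 - 1 = 59 seconds
print(move_budget(30, 120))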

Janzert

Title: Re: OpFor
Post by The_Jeh on Dec 26th, 2007, 12:31am
How did you come up with the name "OpFor?" It is a programming term, I suppose?

That makes me wonder at the names of other bots. Here are my guesses, but please correct me if I'm wrong:

Bomb - perhaps a martial symbol of power or a pop-culture verb meaning "to fail abjectly," but more likely a programming term referring either to a system crash or a piece of code that is dormant until triggered (which would be appropriate given bot_Bomb's features)
Clueless - a humorous self-description
GnoBot - uses root of the word gnosis, meaning "special knowledge"
Aamira, Arimaalon, Arimaazon, Arimaazilla, Arimaanator - plays on Arimaa
ShallowBlue - parody of the chess computer Deep Blue
Loc - abbreviation for the computer term "lines of code"
Occam - name of a programming language

Title: Re: OpFor
Post by Janzert on Dec 26th, 2007, 1:02am

on 12/26/07 at 00:31:00, The_Jeh wrote:
How did you come up with the name "OpFor?" It is a programming term, I suppose?


Nah it's military in origin. See Opposing Force (http://en.wikipedia.org/wiki/OPFOR) on wikipedia.


Quote:
Occam - name of a programming language


Don't really know, but always thought this one came from Occam's razor (http://en.wikipedia.org/wiki/Occam%27s_razor)

Janzert

Title: Re: OpFor
Post by Fritzlein on Dec 26th, 2007, 7:07am

on 12/26/07 at 00:31:00, The_Jeh wrote:
Loc - abbreviation for the computer term "lines of code"

I was persuaded that Loc derived from "loco", meaning crazy, as indicated by its playing style.  BlackKnight, however, explained that it derived from "location", and had to do with the way moves were generated by piece location.

Title: Re: OpFor
Post by 99of9 on Dec 29th, 2007, 7:17am

on 12/26/07 at 00:31:00, The_Jeh wrote:
GnoBot - uses root of the word gnosis, meaning "special knowledge"

yes, and is filled out to rhyme with robot.

Title: Re: OpFor
Post by Janzert on Dec 30th, 2007, 4:24am
Basic quiescence search added.

Janzert

Title: Re: OpFor
Post by Fritzlein on Dec 30th, 2007, 10:25am
Does "basic" include all possible capture moves?  I'm curious what is basic about it...

Title: Re: OpFor
Post by Janzert on Dec 30th, 2007, 1:30pm
The current quiescence search is just the 'standard' chess version, searching all capture moves. Basically the following:


Code:
int Quiesce(int alpha, int beta)
{
  /* Stand pat: the static evaluation is a lower bound for the mover. */
  int val = Evaluate();
  if (val >= beta)
    return beta;

  if (val > alpha)
    alpha = val;

  GenerateAllCaptures();
  while (CapturesLeft()) {
    MakeNextCapture();

    val = -Quiesce(-beta, -alpha);

    UnmakeCapture();  /* assumed helper: undo before the next capture */

    if (val >= beta)
      return beta;

    if (val > alpha)
      alpha = val;
  }
  return alpha;
}


There are several problems with this for Arimaa. One of the largest and most obvious is that it will only play out until one side doesn't have any more captures. This leads to it still throwing minor pieces away to save a major piece, as you commented on in game 67398.

Janzert

Title: Re: OpFor
Post by Fritzlein on Dec 30th, 2007, 8:06pm

on 12/30/07 at 13:30:47, Janzert wrote:
There are several problems with this for Arimaa. One of the largest and most obvious is that it will only play out until one side doesn't have any more captures. This leads to it still throwing minor pieces away to save a major piece, as you commented on in game 67398.

Wait, I clearly don't understand something here.  The problem of throwing away a minor piece to save a major piece is what quiescence search is supposed to prevent, isn't it?

I remember that when I joined the game room, it seemed that any bot (except Bomb) could be beaten by getting a strong piece hostage.  The bot would then throw away a series of weaker pieces trying to save it, after which you captured the strong piece anyway.  It seemed to me that this horizon effect would be very difficult to resolve, and I was amazed that Bomb seldom succumbed to it while no other bot could avoid it.  I asked Fotland how he did it, and he said "quiescence search".

But now that I think about it, extending only on capture moves doesn't help.  The line where you throw away a dog will terminate at the same depth as the line where you let the horse be captured.  What have you gained?  The horizon effect is still in force; the dog sacrifice pushes the loss of the horse over the horizon.  What you really need is for the dog sacrifice to be searched to a greater depth than the horse loss.  The different depths would mean that you are comparing a horse loss to a dog plus horse loss, so it is clear which move is better.

How is that possible?  Does the quiescence search have to be extended on defensive moves as well as capture moves?

Title: Re: OpFor
Post by lightvector on Dec 30th, 2007, 9:20pm
I will drop out of my lurking for a moment, since this discussion interests me.  ;D

I'm reasonably sure that for Arimaa you do have to consider protective moves as well, since what you want is to extend the false protective move and reveal that it leads to a poor result, not just to extend the capture. I believe Fotland stated that his capture tree considers not only how pieces can be captured, but how they can be defended as well (though that seems to be done statically in the evaluation?). Possibly what you can do, in addition to generating captures, is to also run the capture tree for the opponent, and if the opponent can capture any piece, run a defensive move generator for that piece or that trap.

I see several differences in chess that make a captures-only search (perhaps with check threats as well) effective at quiescing the position. In chess, one common manifestation of the horizon effect is the capture of a well-defended piece that leads to a sequence of consecutive trades with an ultimately poor result. Captures-only quiescence search solves this. But this typically does not occur in Arimaa.

By contrast, in Arimaa, throwing pieces away to delay an inevitable capture is the most frequent manifestation of the horizon effect, and it is not solved by a captures-only search. This occurs far less frequently in chess: usually a program can simply retreat the piece, losing nothing more than some tempo, without having to sacrifice anything. And even when it does occur, chess programs can easily search deep enough, even to the point where the summed value of all the sacrificed pieces exceeds the value of the piece being protected. In Arimaa, search depth is not nearly great enough to do this.

Moreover, in chess there are fewer ways to just throw a piece in the way, given the limited ways that pieces can move (and the fact that there is only one step per move). After a few ply of sacrificing pieces in chess, there will likely not be any more pieces to throw in the way, and the search will discover the inevitable loss.

By contrast, in Arimaa, essentially any piece within a 4-step radius is close enough to provide false protection, which is a LOT more pieces on average.

All of this makes the quiescence problem a bit more tricky in Arimaa. :)

As an aside, I happen to be experimenting with my own bot. I have no idea how all you guys have implemented these capture move generators. The sheer number of cases is confounding, especially with all the special conditions that must be checked: off board, frozen, piece must move through another safe/unsafe trap, rabbits can't move backwards to unfreeze others, etc.  Aaargh. ???

Title: Re: OpFor
Post by Janzert on Dec 30th, 2007, 11:12pm
First, if you want to be picky about it, I'm not sure what Bomb does is technically quiescence search. It fills the same role but does it in a static manner that doesn't involve any search. I do seem to recall Fotland calling it quiescence search before, but in his series of posts on the details of Bomb, the post on search doesn't mention it and the evaluation components post (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=devTalk;action=display;num=1171240256) calls it static capture evaluation. At this point I'm not sure whether this static evaluation or the traditional search method is going to work out better for Arimaa. I decided to experiment with searching first, but it may very well turn out to be the wrong way to go.

One other point about Bomb's method: you'll notice that it generates not only all the captures but also how many steps it takes to save a piece from capture. I have a feeling that's probably a key reason it works so well for Bomb. But I'm not sure how to go about figuring out the steps to stop a capture.

Since I'm not sure how to get the steps that specifically prevent a capture, I think what I'll probably try is doing a full width search whenever the opponent can make a capture or goal in the next move. But this may very well cause the search tree to explode.
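
In quiescence terms the idea looks something like this (a Python sketch with illustrative helper names; the depth cap keeps the full width branch from recursing forever):


Code:
def quiesce(pos, alpha, beta, depth=8):
    val = pos.evaluate()
    if val >= beta:
        return beta
    if val > alpha:
        alpha = val
    if depth == 0:
        return alpha

    if pos.opponent_threatens_capture_or_goal():
        candidates = pos.moves()          # full width: any move may defend
    else:
        candidates = pos.capture_moves()  # ordinary captures-only search

    for move in candidates:
        val = -quiesce(pos.make(move), -beta, -alpha, depth - 1)
        if val >= beta:
            return beta
        if val > alpha:
            alpha = val
    return alpha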

Lightvector, my capture move generator is basically a 1200 line big ball of spaghetti code, written over a few days during which I could just barely keep the details paged into my head. So yeah, 'Aaargh' is pretty much the right description. :P I really should go back through and clean it up, but it seems to be working at the moment and I'd rather not poke any sticks into it just now. ;) I do have 18 board files with, I think, all the basic patterns of captures in 4 steps or less, although certainly not all the variations. I'd be willing to put these up somewhere if people were interested in them. Also be sure to read Fotland's post on trap evaluation (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=devTalk;action=display;num=1171242605); it lists most of the patterns that lead to captures. It seems I ran across one or two he missed in that post, but I may be remembering wrong.

Janzert

Title: Re: OpFor
Post by Janzert on Dec 31st, 2007, 9:45am
As suspected, trying a full width search whenever the opponent can capture causes the search to explode.

Janzert

Title: Re: OpFor
Post by Fritzlein on Dec 31st, 2007, 9:56am
I believe that Bomb's static trap evaluation doubles as a move generator for selective deeper search.  This was the solution to the puzzle that Haizhi couldn't solve in his bot: how is it possible to do quiescence search without it being too expensive?  Generating all possible moves and picking some to search deeper was simply too time consuming to be worth it in his opinion.  But if your static evaluation is incidentally generating goal and capture patterns anyway, well...

Also note that Bomb's static trap evaluation is still buggy.  I see from the output of my purchased version of Bomb that it often generates an illegal move in the static evaluation, and when it tries to search deeper on that move it can't, so it spits out an error.  Yes, lightvector, all the cases make it a nightmare to code.  But however ugly it looks, a mass of if-then-else statements is blindingly fast to run compared to general move generation, or so I understand.

Title: Re: OpFor
Post by IdahoEv on Jan 11th, 2008, 8:14pm

on 12/26/07 at 00:31:00, The_Jeh wrote:
How did you come up with the name "OpFor?" It is a programming term, I suppose?

That makes me wonder at the names of other bots.


Zombie -  I had/have envisioned writing a bot that uses neural networks heavily in the static evaluator, and planned to call it bot_Brain or bot_Cortex.   Zombie, being a fork of bot_Fairy, isn't that bot.

Thus, bot_Zombie is the bot without a brain...

Title: Re: OpFor
Post by fotland on Mar 6th, 2008, 2:14am
Bomb does have a real quiescence search, but it's very simple.  I guess I got lucky and the simple one works well.

The quiescence move generator generates:

- the step to complete a push
- the step that makes the biggest capture (recorded during evaluation in the static capture evaluation)
- the step that makes the biggest save (recorded during static trap evaluation)
- the step that pulls an enemy piece, if the last step could be the first half of a pull
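
In sketch form (Python with illustrative helper names; the capture and save steps are assumed to have been recorded during the static capture and trap evaluation, as described above, not found by a fresh search):


Code:
def quiescence_steps(pos):
    """The handful of candidate steps for Bomb-style quiescence."""
    steps = []
    if pos.push_in_progress():
        steps.append(pos.push_completion_step())  # forced by the rules
    cap = pos.biggest_capture_step()     # recorded during static eval
    if cap is not None:
        steps.append(cap)
    save = pos.biggest_save_step()       # recorded during trap eval
    if save is not None:
        steps.append(save)
    if pos.last_step_starts_pull():
        steps.append(pos.pull_completion_step())
    return steps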

Title: Re: OpFor
Post by fotland on Mar 6th, 2008, 2:17am
The static trap evaluator is 700 lines of if/then/else, and it still has bugs.  I didn't do much testing; I just fixed bugs as I noticed them in games.

Title: Re: OpFor
Post by fotland on Mar 6th, 2008, 2:22am
Bomb's static goal evaluation doesn't try to be exact to 4 steps.  Instead it calculates a lower bound on the number of steps to goal, up to 16 steps (including opponent moves).  I think the lower bound is exact up to 3 steps, and often exact for larger numbers.  It's about 500 lines of code.
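
The cheapest bound of that flavor is a rabbit's raw step distance to the goal rank: obstructions, freezing, and traps only ever add steps, so it never overestimates and is safe for pruning. A toy version (Bomb's 500 lines are of course far more refined):


Code:
def goal_steps_lower_bound(pos, side):
    """Minimum over side's rabbits of the raw distance to the goal
    rank. A rabbit gains at most one rank per step and cannot move
    backward, so the true step count can only be larger."""
    goal_rank = 7 if side == 'gold' else 0
    best = 99
    for rank, file in pos.rabbits(side):  # illustrative accessor
        best = min(best, abs(goal_rank - rank))
    return best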

Title: Re: OpFor
Post by fotland on Mar 6th, 2008, 2:24am
Bomb doesn't have goal moves in quiescence search.  The goal search is done with extensions and pruning in the main search: extensions if the lower bound on goal is reasonable, and pruning of all steps that are too far away to affect the goal.

Title: Re: OpFor
Post by fotland on Mar 6th, 2008, 2:31am
For the capture static evaluation I started with drawings of all the cases (and found more as I was coding), then looked for patterns to reduce the height of the decision tree.  I invented a compact notation to describe them that I could put in comments.  There are 26 cases, so it's not too bad.  I just looked at the code and there are several TODO comments, so it's incomplete.

Title: Re: OpFor
Post by Janzert on Mar 6th, 2008, 7:24am
Thanks for the clarification and extra information.

Janzert

Title: Re: OpFor
Post by Janzert on Mar 6th, 2008, 9:55am
An updated commit log is now at http://arimaa.janzert.com/opfor/commitlog.txt (http://arimaa.janzert.com/opfor/commitlog.txt)

This goes through to the version used in the 2008 CC.

Janzert

Title: Re: OpFor
Post by Janzert on Jul 24th, 2008, 12:54pm
Updated the commit log again; it is now at revno 184. The last major changes were at the beginning of the postal mixer. The CC version was at revno 162, so there have not been many changes since then.

Janzert

Title: Re: OpFor
Post by Fritzlein on Jul 24th, 2008, 3:32pm
Thanks very much for the update.  I'm particularly curious about this one:

"New small program that does random playouts from a specified handicap"

I think I said in some other thread that for random playouts from the beginning of the game, I expected a camel handicap to be less of a disadvantage than a rabbit handicap.  Am I correct to infer from that note in the change log that you can easily verify or refute my hypothesis?  I am dying to know the relative value of rabbits to pieces for two random players against each other.  If the results are upside-down as I suspect, it's another bit of evidence for the intractability of Arimaa for computers.

Title: Re: OpFor
Post by Janzert on Jul 25th, 2008, 11:55am
You're pretty much right about handicaps for random playout games, but it's actually worse than just R being more important than M.

My take on it, after looking at the results, is that the random playout result can basically be simulated with a two-factor eval: the major factor is the number of rabbits, with a minor factor for the number of pieces, although the two factors get closer together at large handicaps. There also seems to be a small bias against higher-ranked pieces in the single piece handicaps.

Below are the results for a single piece handicap; a single piece plus a rabbit left; and a constant 9 pieces, with the strongest remaining piece replaced by another rabbit after each round. Finally there is a round of C8R vs emhhddcc1r. All rounds were done with 1 million playouts.

If people were interested I could put the program up to play with, but the results don't seem terribly interesting. I'm also not sure why it has much bearing on the tractability, or lack thereof, of Arimaa being played well by computers.

Janzert

Single piece handicaps:
handicaptest E
Playing MHHDDCC8R vs emhhddcc8r
Win percentage for gold 48.06%

handicaptest M
Playing EHHDDCC8R vs emhhddcc8r
Win percentage for gold 47.93%

handicaptest H
Playing EMHDDCC8R vs emhhddcc8r
Win percentage for gold 47.91%

handicaptest D
Playing EMHHDCC8R vs emhhddcc8r
Win percentage for gold 47.68%

handicaptest C
Playing EMHHDDC8R vs emhhddcc8r
Win percentage for gold 47.57%

handicaptest R
Playing EMHHDDCC7R vs emhhddcc8r
Win percentage for gold 44.63%


Piece and a Rabbit:
handicaptest MHHDDCCRRRRRRR
Playing E1R vs emhhddcc8r
Win percentage for gold 1.21%

handicaptest EHHDDCCRRRRRRR
Playing M1R vs emhhddcc8r
Win percentage for gold 1.13%

handicaptest EMHDDCCRRRRRRR
Playing H1R vs emhhddcc8r
Win percentage for gold 1.09%

handicaptest EMHHDCCRRRRRRR
Playing D1R vs emhhddcc8r
Win percentage for gold 1.01%

handicaptest EMHHDDCRRRRRRR
Playing C1R vs emhhddcc8r
Win percentage for gold 0.93%

handicaptest EMHHDDCCRRRRRR
Playing 2R vs emhhddcc8r
Win percentage for gold 1.01%


Constant 9 pieces:
handicaptest RRRRRRR
Playing EMHHDDCC1R vs emhhddcc8r
Win percentage for gold 6.55%

handicaptest ERRRRRR
Playing MHHDDCC2R vs emhhddcc8r
Win percentage for gold 12.03%

handicaptest EMRRRRR
Playing HHDDCC3R vs emhhddcc8r
Win percentage for gold 16.65%

handicaptest EMHRRRR
Playing HDDCC4R vs emhhddcc8r
Win percentage for gold 20.68%

handicaptest EMHHRRR
Playing DDCC5R vs emhhddcc8r
Win percentage for gold 24.05%

handicaptest EMHHDRR
Playing DCC6R vs emhhddcc8r
Win percentage for gold 26.82%

handicaptest EMHHDDR
Playing CC7R vs emhhddcc8r
Win percentage for gold 29.01%

handicaptest EMHHDDC
Playing C8R vs emhhddcc8r
Win percentage for gold 30.92%


Finally, C8R vs emhhddcc1r:
handicaptest EMHHDDCrrrrrrr
Playing C8R vs emhhddcc1r
Win percentage for gold 79.06%

Title: Re: OpFor
Post by Fritzlein on Jul 25th, 2008, 2:37pm
Whee!  Thanks for the results, Janzert.

There have been many discussions about the relative values of the pieces, and occasional suggestions that computer playouts could help answer the question.  There is a notion that, for example, if Bomb played itself a million times with M vs HC as an opening trade, that would shed some light on which side is materially ahead.  I strongly disagree.

Knowing that a rabbit is worth more to a random bot than an elephant doesn't prove my point, but it sure helps explain how I feel.  Someone watching a random playout would say the result is meaningless because the side with the extra elephant isn't using its elephant.  By the same token I would say that Bomb playouts where one side lacks a camel are meaningless because Bomb doesn't use its camel.  Yes, Bomb isn't random, but to me its play looks so strategically aimless at times, it might as well be random.

Your experiment also makes for a good analogy when, say, IdahoEv's material state analysis shows that HR is worth more than M in an opening trade.  My impulse is entirely toward explaining why his data is wrong, which he can chalk up to a natural human tendency to ignore evidence and cling to preconceived beliefs.  But here we have "evidence" that it is better to start without a camel than without an elephant.  Even IdahoEv would immediately search for an explanation of why the data is bad rather than seriously entertain the notion that an elephant is worth less than a camel.  That will not, by itself, convince him that I'm right and his data is wrong, but it may give him a window into my feelings about his data if he considers how he feels about yours.

(Incidentally, how do we explain that it's better all down the line for a random bot to give a stronger single piece as a handicap rather than a weaker one?  Is it because having fewer other move options makes a rabbit push slightly more likely?  Is it therefore better to have eight cats and eight rabbits than the actual starting mix?  What is the optimal set of pieces to give to a random bot if it is trying to beat another random bot with the standard set?  Sixteen rabbits for offense, or some mix of cats and rabbits so that some defense is present?)

As far as demonstrating the intractability of Arimaa: there are some games which have been successfully approached by bots that factor out fallible human domain knowledge.  Backgammon neural networks were able to bootstrap from random play and learn to be quite strong players.  Go programs use random playouts, not to train an evaluation function, but to create a knowledge-less evaluation of a position in real time.  However, since random play in Arimaa so clearly overvalues rabbits, it seems highly unlikely to be either a feasible starting point for automated learning or a useful aid in real-time evaluation.  Strike a couple of techniques from the programmer's arsenal.

Of course, this doesn't mean that a bot training itself by self-play is out of the question.  Maybe if a bot gets good enough, then self-play can make it better.  However, Haizhi's failed attempt at self-play training is in the back of my mind.  He said that the values it produced were not reasonable.  I take that to mean his training made the bot's evaluation worse, which is what would happen here: starting from knowing nothing, the bot would "learn" that it is good to trade away an elephant to win a rabbit.  Perhaps the nature of Arimaa is such that there are many false starts and misleading backwaters to trip up learning by self-playing bots.

(Incidentally, what does random play say in the case of chess?  I'll bet a dollar it correctly orders the values of the one-piece handicaps.)

To some extent the results just confirm my intuition and jdb's experiments with UCT, but it's always nice to know rather than guess.  Furthermore, my intuition was wrong in some ways.  For example, I would have thought RR had a better chance of beating a random opponent than ER.  I wonder if the E's extra value is primarily offensive or defensive.

I guess I wouldn't be surprised if nobody is interested in the results but me.  In that case, extra thanks for running an experiment for an audience of one! :)

Title: Re: OpFor
Post by omar on Jul 25th, 2008, 9:07pm
Wow, thanks for posting these results Janzert. Very interesting. I would love to experiment with this program. Some questions about your program.

Is the setup random or fixed?

How is the random move generated? Randomly selected after generating all possible moves; after generating all possible unique moves; random stepwise; or something else?

Does the program include the recently added elimination win condition?


Title: Re: OpFor
Post by Janzert on Jul 25th, 2008, 9:31pm
I mentioned it in chat but for the record, in response to part of Fritz' comment above, I also tried tuning opfor's eval with self play and it led to much worse play in practice.


on 07/25/08 at 21:07:12, omar wrote:
Wow, thanks for posting these results Janzert. Very interesting. I would love to experiment with this program. Some questions about your program.

Is the setup random or fixed?


Setups are randomized, though of course staying within the appropriate setup area.


Quote:
How is the random move generated? Randomly selected after generating all possible moves; after generating all possible unique moves; random stepwise; or something else?


Random steps.


Quote:
Does the program include the recently added elimination win condition?


Yes. I was going to say that it should be following the old tournament rules for ending the game, but I realized that repetitions are not detected. It also doesn't check the strict legality of the "random step" move played.

[Edit: the program can now be downloaded from http://arimaa.janzert.com/handicaptest.zip. No instructions included. ;) Just run it with the first argument being the pieces to be handicapped and, optionally, the second being the number of playouts to run. It defaults to a million playouts, which can take a couple of minutes on my computer, so you might want to try a smaller number at first; the program gives no feedback while running, so otherwise you won't know whether it has locked up.]
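
Conceptually the whole program is just the following (a sketch; random_game() stands in for the actual setup-and-playout code, which places both armies at random and then plays uniformly random legal steps until a goal or elimination ends the game):


Code:
def estimate_winrate(gold_pieces, silver_pieces, playouts=1_000_000):
    """Monte Carlo estimate of gold's win percentage under random
    play, e.g. estimate_winrate("EHHDDCC8R", "emhhddcc8r") for the
    camel handicap line above."""
    gold_wins = 0
    for _ in range(playouts):
        if random_game(gold_pieces, silver_pieces) == 'g':
            gold_wins += 1
    return 100.0 * gold_wins / playouts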

Janzert

Title: Re: OpFor
Post by omar on Jul 26th, 2008, 1:02pm
Thanks for the link Janzert. It's pretty neat trying different combinations of handicaps.

Is the random setup using a shuffle or a random pick and place?

Also, is it always taking 4 steps per move?

Title: Re: OpFor
Post by Janzert on Jul 26th, 2008, 6:47pm
The setup uses pick and place: pick a random square and put the next piece there.

It will consider a pass to be a legal step after the 1st step has been made, unless it's in the midst of a push of course.
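
(Pick-and-place, with each pick uniform over the still-empty squares, gives the same distribution over setups as shuffling the piece list; it is just built one piece at a time. A runnable illustration for one side's sixteen pieces:)


Code:
import random

GOLD_PIECES = list("EMHHDDCCRRRRRRRR")  # the standard 16-piece army

def pick_and_place(pieces=GOLD_PIECES):
    """Pick a random empty setup square, put the next piece there."""
    squares = [(rank, file) for rank in (1, 2) for file in range(8)]
    setup = {}
    for piece in pieces:
        square = random.choice([s for s in squares if s not in setup])
        setup[square] = piece
    return setup

print(pick_and_place())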

Janzert

Title: Re: OpFor
Post by Janzert on Aug 26th, 2008, 7:47pm
The current version of opfor up for play is the version that played in the postal mixer. So anyone that wants to try it at live time controls, now is your chance. I'll probably leave it up for the next week.

Janzert

Title: Re: OpFor
Post by Janzert on Aug 28th, 2008, 4:40pm
There was a question about whether opfor might have a bug running under Linux (which it runs on now, as did the CC version, but not the postal mixer version) causing it to play worse. I ran an analysis on a few positions under both XP and Linux to see if there was any difference. The node count at a given depth does eventually diverge, but seemingly only by a few percent. In the few positions I checked, I only ever saw the score change at any depth by one centi-rabbit. So while there is something going on that shouldn't be happening, at this point I don't think it is having any significant effect on the level of play. I should still try to figure out what is going on though. :/

Janzert

p.s. the two games against bomb (CC and blitz) last night were run under xp and opfor lost both.

Title: Re: OpFor
Post by Janzert on Sep 21st, 2008, 11:35pm
OpFor in the gameroom has been updated to the latest version. It's mostly just infrastructure changes since the postal version, so I don't expect it to play significantly differently.

Janzert

Title: Re: OpFor
Post by 99of9 on Nov 25th, 2008, 7:49pm
Opfor with full static goal detection is a scary prospect!

Title: Re: OpFor
Post by Janzert on Nov 25th, 2008, 11:07pm
Well, the new full four step goal detection is in place now. But it's probably still not at all well balanced with the rest of the eval.

Also, when running a quick test just to make sure it didn't cause big problems, I was rather shocked that it currently changes the chosen move on over 80% of my test positions. Most of the positions are from the middle game, so I didn't expect that large a change.

Janzert

Title: Re: OpFor
Post by Fritzlein on Nov 26th, 2008, 8:01am
Wouldn't the static goal detection change the speed of search, faster if there are cutoffs and slower otherwise?  And apparently (as we learned from Bob Hyatt) any change in search speed is highly likely to change the outcome.

(which sadly means that tuning to beat Bomb2005CC in its present state may be thwarted by the new server being somewhat faster :()

Title: Re: OpFor
Post by Janzert on Nov 26th, 2008, 10:34am
Ahh, actually I found a bug in the way the eval was using the goal search value (basically the old goal search used 0 for 'not found' while the new goal search uses a large number). Now that I've fixed that, fewer than 25% of positions get a different move.

Janzert

Title: Re: OpFor
Post by 99of9 on Nov 26th, 2008, 1:21pm

on 11/26/08 at 08:01:14, Fritzlein wrote:
Wouldn't the static goal detection change the speed of search, faster if there are cutoffs and slower otherwise?  And apparently (as we learned from Bob Hyatt) any change in search speed is highly likely to change the outcome.

Yes, but I presume that when Janzert runs his test corpus, each position is searched to a fixed depth.  At least that's how I do it.  You can measure speed improvements by how long it takes to run the whole lot.
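
Something in this spirit, presumably (a Python sketch; search() stands in for the bot's fixed-depth search on a saved position):


Code:
import time

def run_corpus(positions, search, depth):
    # Search every test position to the same fixed depth; the wall
    # time over the whole corpus measures speed, while the chosen
    # moves characterize the change in play.
    start = time.time()
    moves = [search(pos, depth) for pos in positions]
    return moves, time.time() - start

def percent_changed(old_moves, new_moves):
    changed = sum(a != b for a, b in zip(old_moves, new_moves))
    return 100.0 * changed / len(old_moves)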



Quote:
(which sadly means that tuning to beat Bomb2005CC in its present state may be thwarted by the new server being somewhat faster :()

Certainly.  However, Omar is of the opinion that bomb is not multithreaded.  If he's right, since clock speeds have plateaued, bomb won't improve much until/unless David returns.

Title: Re: OpFor
Post by Janzert on Nov 26th, 2008, 4:16pm
Yes, when I run against test positions it's almost always to a fixed depth.

Janzert

Title: Re: OpFor
Post by fotland on Dec 31st, 2008, 5:40pm

on 12/26/07 at 00:31:00, The_Jeh wrote:
That makes me wonder at the names of other bots. Here are my guesses, but please correct me if I'm wrong:

Bomb - perhaps a martial symbol of power or a pop-culture verb meaning "to fail abjectly," but more likely a programming term referring either to a system crash or a piece of code that is dormant until triggered (which would be appropriate given bot_Bomb's features)


I picked it because it alliterates with Bot, and because I expected it to be strong tactically and weak strategically, like a bomb.  I didn't put a lot of thought into it.

David

Title: Re: OpFor
Post by Janzert on Feb 28th, 2009, 8:53pm
For anyone that might be interested I've updated the commit log for OpFor at http://arimaa.janzert.com/opfor/commitlog.txt to be current through to the beginning of the CC.

Janzert

Title: Re: OpFor
Post by Janzert on Mar 1st, 2009, 7:56pm
It's not at all documented, it's not written very cleanly, and it has a whole lot of bugs that I've fixed in the last year, but you can now find the source code for the 2008 CC version of OpFor on its page at http://arimaa.janzert.com/opfor/.

Janzert


