Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
(Message started by: unic on May 22nd, 2006, 6:08am)

Title: Suite of test positions
Post by unic on May 22nd, 2006, 6:08am
In a different thread, Swynndla wrote:

Quote:
Unic, would Fairy be able to see the winning move on move 47b?:
http://www.arimaa.com/arimaa/gameroom/comments.cgi?gid=27019

It would be useful to collect a number of these positions, where there is one right move which either leads to a win or to a definite advantage, and is significantly better than any other move.

In chess, plenty of these test suites exist (Nolot, Win at Chess, various others), and are often used to test programs' tactical search...

Does anybody know of more such positions in Arimaa?  I am sure there must be many among all the games that have been played... but I am not strong enough to find them (or to know that the right move is indeed right).

If we could collect a number of these positions (of varying difficulty), that would be a very useful tool for testing the tactical search of bots.

Title: Re: Suite of test positions
Post by Fritzlein on May 22nd, 2006, 10:05am
The deepest tactical forced win I've ever analyzed is
http://www.arimaa.com/arimaa/gameroom/comments.cgi?gid=8073
before move 52w.  Actually, it's enough moves that I'm not sure of my analysis.  Both 99of9 and I changed our minds about the position.  He thought there was a forced win, then decided there wasn't, whereas I thought there wasn't a forced win, then decided there was.  When bots start analyzing that position accurately, I will be shaking in my boots.

Title: Re: Suite of test positions
Post by unic on May 22nd, 2006, 10:16am
So, we have:

Position 1 (game 27019):

Code:
47b
+-----------------+
8| r         D r       | 8
7| r     r         |
6|   c X     X     |
5|           c     |
4| R r h     e     |
3|   D X d   r E   |
2|     C R R R d R |
1|               R |
+-----------------+
  a b c d e f g h

Key move: Rd2s dd3s rf3w ef4s

Position 2 (Game 8073):

Code:
52w
+-----------------+
8| r D           r |
7| R h c E   r r R |
6|     X   m X     |
5| R   h           |
4| r               |
3| H   X     X     |
2|         e     R |
1|   d R       R   |
+-----------------+
  a b c d e f g h

Key move: cc7s Ed7w hb7s Ec7w
(Is that correct, Fritzlein?)

Any more positions (maybe some easier ones to go along with the above supposedly very difficult one)?

How about some positions with forced significant win of material, instead of forced goal?

(Grr - why does the forum eat my whitespace, even when the board is defined as code?  And why only in some cases?)

Title: Re: Suite of test positions
Post by chessandgo on May 22nd, 2006, 11:19am
Nice position ! I agree with your first impression Fritz,
52w Ed7e rf7n Ee7e (and let's add Ra5n)
Now if silver is to save his camel he must as you said
52b cc7e me6n me7n rf8e
Taking the offered rabbit is logical
53w Ef7n rg7w rf7s rf7x Ef8s
Now you proposed to sacrifice the cat to get an extra tempo with the E
53b me8e cd7s cd6e ee2n
but here gold goals in one after
54w Ef7e Eg7w rg8s
I agree that instead, for 53b, saving the material by
53b hc5e hd5e he5e me8e
would also lose (well, would it ?), for instance before shifting side with the E gold can play the same idea
54w Ef7e Eg7w rg8s Ra6w
Now either black brings his horse to h6 and abandons his M (*), or he must play
54b cd7n cd8e mf8e x
where shifting side with
55w Ef7w Ee7w Ed7w looks winning.

In the end, it seems that silver can't save his M ;
so maybe it is better to give it on the first move instead of a rabbit (!!) with
52b cc7 e cd7 cd8e rf8e
or on move 54 (*) to bring the horse to h6 ; now the question is whether taking the M in these positions will allow gold to goal quickly on one side ... or maybe silver is still winning anyway ?? ... requires more thinking ...

That's a really interesting problem anyway, I'll go on thinking of it tonight

I fear there might be some obvious mistake in my rough analysis ; if you find so I'll be happy to hear where !

Have fun !

Jean

Title: Re: Suite of test positions
Post by Fritzlein on May 22nd, 2006, 12:36pm

on 05/22/06 at 10:16:11, unic wrote:
(Grr - why does the forum eat my whitespace, even when the board is defined as code?  And why only in some cases?)

For some reason, it seems there is a maximum of five spaces in a row before it collapses them.  I agree, it is annoying.  Perhaps if you use a dot to represent an unoccupied square of the board, the positions will display clearly.

Title: Re: Suite of test positions
Post by Fritzlein on May 22nd, 2006, 12:38pm

on 05/22/06 at 11:19:24, chessandgo wrote:
... or maybe silver is still winning anyway ?? ... requires more thinking ...

My recollection is that Silver can give up a camel and still be winning.  However, if my later analysis is correct, Gold has a move which not only forces the win of the camel, but actually forces goal.  That is of course the more critical line.  Do you agree that goal is forced, or have I made an error?

Title: Re: Suite of test positions
Post by unic on May 22nd, 2006, 12:45pm

on 05/22/06 at 12:36:04, Fritzlein wrote:
For some reason, it seems there is a maximum of five spaces in a row before it collapses them.  I agree, it is annoying.  Perhaps if you use a dot to represent an unoccupied square of the board, the positions will display clearly.

I wanted the notation to match the files that the server sends... that's why I used spaces.  Dots are probably a better idea - but that would mean bots might not be able to run the test suite without (very minor) modification.

Title: Re: Suite of test positions
Post by chessandgo on May 22nd, 2006, 3:20pm
I don't know if I have been clear in my post about pgn, but what I meant is partially that it would be great if we could just take the game, edit it with variants, add commentaries at each move, and it could be read somehow.


I have to admit that I haven't found the strength to read your extensive commentary, as plain text explanation is very hard and painful to follow ...


I guess you are right Fritz.

Title: Re: Suite of test positions
Post by Swynndla on May 22nd, 2006, 4:35pm
Forced goal in 3 moves (5 ply) maybe?? ...
http://www.arimaa.com/arimaa/gameroom/replayFlash.cgi?gid=28660&s=w&client=1
Move 27w

Title: Re: Suite of test positions
Post by 99of9 on May 22nd, 2006, 6:36pm

on 05/22/06 at 10:16:11, unic wrote:
Position 2 (Game 8073):
52w
+-----------------+
8|.r.D...........r.|
7|.R.h.c.E...r.r.R.|
6|.....X...m.X.....|
5|.R...h...........|
4|.r...............|
3|.H...X.....X.....|
2|.........e.....R.|
1|...d.R.......R...|
+-----------------+
  a b c d e f g h

Key move: cc7s Ed7w hb7s Ec7w
(Is that correct, Fritzlein?)

(Grr - why does the forum eat my whitespace, even when the board is defined as code?  And why only in some cases?)

One possible solution is to switch to the [ tt ] flag (I put dots in here also, but now I don't think they're necessary).
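
For bots that would rather read the dotted diagram directly, here is a minimal Python sketch of a parser. It assumes each rank line has the form '8|.r.D...|' with the eight squares at the odd offsets between the bars, and that '.', ' ' and the trap marker 'X' (or 'x') all mean an empty square. The name parse_diagram is made up for this example and is not part of any existing bot or of the server format.

```python
def parse_diagram(text):
    """Parse a board diagram into {square: piece}, e.g. {"a8": "r"}.

    Assumption: rank lines look like '8|.r.D...........r.|', with the
    squares a..h at offsets 1, 3, ..., 15 between the two bars.
    """
    board = {}
    for line in text.splitlines():
        line = line.strip()
        if len(line) < 2 or not line[0].isdigit() or line[1] != "|":
            continue  # skips the '52w' line, the borders and the file labels
        rank = line[0]
        cells = line[2:line.index("|", 2)]
        for i, file in enumerate("abcdefgh"):
            piece = cells[2 * i + 1]
            if piece not in ". Xx":  # empty square or empty trap
                board[file + rank] = piece
    return board


POSITION_2 = """52w
+-----------------+
8|.r.D...........r.|
7|.R.h.c.E...r.r.R.|
6|.....X...m.X.....|
5|.R...h...........|
4|.r...............|
3|.H...X.....X.....|
2|.........e.....R.|
1|...d.R.......R...|
+-----------------+
  a b c d e f g h
"""

board = parse_diagram(POSITION_2)
print(board["d7"], board["e2"], len(board))
```

Because the same function treats spaces as empty squares, it would also read the space-based diagrams, provided the whitespace has not already been collapsed by the forum.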

Title: Re: Suite of test positions
Post by Fritzlein on May 23rd, 2006, 12:16pm

on 05/22/06 at 16:35:55, Swynndla wrote:
Forced goal in 3 moves (5 ply) maybe?? ...
http://www.arimaa.com/arimaa/gameroom/replayFlash.cgi?gid=28660&s=w&client=1
Move 27w

Nice position, Swynndla.  It's always fun to see the side that is way behind materially force a goal.  Unfortunately, the key move is not unique, which may make it less suitable for a test suite.  Perhaps if the Gold horse were moved to d6 in the starting position, it would make the key move unique?

Title: Re: Suite of test positions
Post by chessandgo on May 26th, 2006, 1:08pm

on 05/22/06 at 12:38:54, Fritzlein wrote:
My recollection is that Silver can give up a camel and still be winning. However, if my later analysis is correct, Gold has a move which not only forces the win of the camel, but actually forces goal. That is of course the more critical line. Do you agree that goal is forced, or have I made an error?


Yes, your analysis is much sharper than mine ... very convincing ;)
Sorry for the bothering


Title: Re: Suite of test positions
Post by Fritzlein on Dec 8th, 2006, 9:44pm
I suppose test positions are normally asking, "Can the bot find the right move?" but for testing goal search specifically, it might also be interesting to ask, "How long until the bot announces mate?"  The terminal position in game 43858 would be such a position.  I'm going to let Bomb look at it overnight to see if that suffices.
[EDIT]
Looking overnight didn't even get Bomb to 16 steps depth on my underpowered machine.  I'm also puzzled by the output:

14(8-27)-S> -14.54  2.0B  4:29  Mg6s Rf6e Rg6n Mg5n Rg7e df7e MISS
15(8-28)=S> -16.59 1.4B  9:34  Mg6s Rf6e Rg6n Df2w Rg7s df7e hd5n ca4e De2w MISS

So the first report came after four and a half hours, the second after nine and a half hours.  The primary variations and the evaluations make sense.  What puzzles me is the number of nodes reported.  The 14-step search took 2.0 billion, and the 15-step search only 1.4 billion.  I thought that number could only go up.  Or maybe this is a long integer overflowing?
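
As a rough check on the overflow guess, here is a small Python sketch. It assumes, without any confirmation from Bomb's source, that the node counter is an unsigned 32-bit integer, which wraps around at 2^32 (about 4.29 billion). Under that assumption it lists the true node counts that would be displayed as 1.4B.

```python
WRAP = 2 ** 32  # assumed counter width: unsigned 32-bit (hypothetical)


def displayed(true_nodes):
    """Value an unsigned 32-bit counter would show for a given true count."""
    return true_nodes % WRAP


def candidates(shown, wraps=3):
    """True counts consistent with a displayed value after 0..wraps-1 wraparounds."""
    return [shown + k * WRAP for k in range(wraps)]


shown_15 = 1_400_000_000  # the "1.4B" reported for the 15-step search
for true_count in candidates(shown_15):
    print(f"{true_count / 1e9:.2f}B")
```

With one wraparound the true count would be roughly 5.7 billion, which is at least larger than the 2.0B reported for 14 steps. A signed 32-bit counter would instead turn negative past about 2.147 billion, so the reported figures do not fit that variant as neatly. Neither case can be settled from the output alone.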

Anyway, the suggested move for Gold is one I didn't have in my game comment analysis, but apparently it doesn't change anything; it is still a forced win for Silver in three moves.  If I enter the first eight steps of the main line, Bomb has to search to depth 13 to see the remaining 16-step forced goal.  This suggests that from the original position it would need to search to depth 21 before seeing the forced goal, despite Bomb's good goal extensions.  So using the position as a "test" is apparently only going to test whether or not you have a supercomputer.

Title: Re: Suite of test positions
Post by Fritzlein on Feb 20th, 2008, 6:33pm
New test position: game 47333 before move 37w

http://arimaa.com/arimaa/games/jsShowGame.cgi?gid=47333&s=w

If I am not mistaken, there is only one move which fends off the goal in two that Omar actually played and also fends off material loss: 37w Rg3e rf3e Hf4s Ef5s.  Bomb flunked the test live with two minutes per move, but I am surprised to discover Bomb still flunking on my home computer when given more than an hour.

[EDIT]

Whoops, it seems Bomb didn't "find" 37w Rg3e rf3e Hf4s Ef5s because that in fact allows goal in two with 37b Ce2e mf2e Rf1n dg1w.  The move Bomb actually settled on after completing depth 12 was 37w He3w rf3w Hf4s Ef5s, but it turns out this also allows goal in three after 37b Ce2e mf2e Rf1n dg1w because now the central rabbit is a goal threat as well.  So this position isn't a good "find the right move" test position, it's only a "how long until you realize you are doomed" test position.

Title: Re: Suite of test positions
Post by chessandgo on Feb 27th, 2008, 6:08am

on 02/20/08 at 18:33:06, Fritzlein wrote:
New test position: game 47333 before move 37w

http://arimaa.com/arimaa/games/jsShowGame.cgi?gid=47333&s=w

If I am not mistaken, there is only one move which fends off the goal in two that Omar actually played and also fends off material loss: 37w Rg3e rf3e Hf4s Ef5s. Bomb flunked the test live with two minutes per move, but I am surprised to discover Bomb still flunking on my home computer when given more than an hour.

[EDIT]

Whoops, it seems Bomb didn't "find" 37w Rg3e rf3e Hf4s Ef5s because that in fact allows goal in two with 37b Ce2e mf2e Rf1n dg1w. The move Bomb actually settled on after completing depth 12 was 37w He3w rf3w Hf4s Ef5s, but it turns out this also allows goal in three after 37b Ce2e mf2e Rf1n dg1w because now the central rabbit is a goal threat as well. So this position isn't a good "find the right move" test position, it's only a "how long until you realize you are doomed" test position.


Hmmm I had written down an analysis about this goal attack some time ago, not sure if it's accurate though. Wouldn't simply 37w E to h3 avoid goal in the foreseeable future ?
I had written that the solution (for silver on previous move) was 36b rh3s rh6s rh5s (with possibly a void step to complete the move), forcing goal in 3. But I haven't been able to check with Bomb ; do I have something wrong ?

Title: Re: Suite of test positions
Post by Fritzlein on Feb 28th, 2008, 11:30pm
Now that is just scary, Jean.  How many games do you have your own analysis notes on?  It's no wonder you are the reigning champ if you are constantly analyzing games, even games in which you didn't participate.

But what is even scarier is that your un-aided analysis was better than my analysis with Bomb's help.  Upon further review, I think you are right that there is one and only one move to stop goal, even though neither I nor Bomb found it.  I am officially demoralized in advance of our next encounter.  :(

Knowing that there is only one right answer re-instates this position as a good candidate for a test suite.  It is unlikely that a program will find the right move for the wrong reason, because Gold can't defend goal without giving up a horse.

Title: Re: Suite of test positions
Post by chessandgo on Mar 1st, 2008, 4:32am

on 02/28/08 at 23:30:35, Fritzlein wrote:
I am officially demoralized in advance of our next encounter. :(


... says the man with an 8-0 record :)

Title: Re: Suite of test positions
Post by Fritzlein on Mar 1st, 2008, 7:35am
Heheh.  I have a special ability to feel that all my wins are lucky flukes I didn't deserve.  

Title: Re: Suite of test positions
Post by Fritzlein on Oct 11th, 2009, 9:13am
Is anyone still interested in compiling public test positions?  The position before 60w in marwin vs. arimaa_master seems to be a good candidate for reasons discussed here:
http://arimaa.com/arimaa/gameroom/comments.cgi?gid=120131

Apparently there is an obvious move (taking a camel hostage) that loses the game, and otherwise one clear move candidate (pushing the horse into g3) to keep the game in balance.  The test question is how long it takes the bot to prefer the better move of the two.

Title: Re: Suite of test positions
Post by aaaa on Oct 11th, 2009, 10:34am
It takes my bot five minutes to find 60g He2e Hf2n rg3n Hf3e.

Title: Re: Suite of test positions
Post by Fritzlein on Oct 11th, 2009, 12:33pm

on 10/11/09 at 10:34:48, aaaa wrote:
It takes my bot five minutes to find 60g He2e Hf2n rg3n Hf3e.

Bomb2005, on my old, slow computer, needed an hour.

Title: Re: Suite of test positions
Post by Arimabuff on Oct 11th, 2009, 2:01pm

on 10/11/09 at 12:33:41, Fritzlein wrote:
Bomb2005, on my old, slow computer, needed an hour.

Which one is at fault? Bomb or your computer?

Title: Re: Suite of test positions
Post by Fritzlein on Oct 11th, 2009, 2:43pm

on 10/11/09 at 14:01:31, Arimabuff wrote:
Which one is at fault? Bomb or your computer?

That's hard to know without having my computer run a different bot, or running Bomb on a different computer.  Tize said that Marwin found it at depth 17, while Bomb found it at depth 16, so that makes Bomb look good, but maybe Bomb has so many extensions it takes longer to get to depth 16 than Marwin takes to get to depth 17.  Presumably quad found it at a lower depth, but doesn't search as deeply due to using tons of extensions.

I salute Omar for providing the Arimaa Challenge hardware, so that we are truly testing which software is better.  Taking one variable out of the equation gives us more insight into the other.

Title: Re: Suite of test positions
Post by jdb on Oct 11th, 2009, 4:18pm
It takes clueless 5 seconds to find this move, at depth 9. It stays with this move up to at least depth 18, in ten minutes.

Title: Re: Suite of test positions
Post by Fritzlein on Oct 12th, 2009, 4:35am
Wow.

Title: Re: Suite of test positions
Post by aaaa on Oct 12th, 2009, 6:05am

on 10/11/09 at 14:43:16, Fritzlein wrote:
Presumably quad found it at a lower depth, but doesn't search as deeply due to using tons of extensions.

Just barely, at 15 steps deep. Maybe that's because the null-move pruning is applied conservatively, allowing a full four-step move for the beneficiary of the null move to refute it.

Title: Re: Suite of test positions
Post by tize on Oct 14th, 2009, 1:22pm
Hats off to Clueless!


Quote:
Tize said that Marwin found it at depth 17, while Bomb found it at depth 16...

[pathetic rescue attempt]
You said, in a game comment, that Bomb didn't stick with the move when a depth 17 search was done. In that case Marwin finds the right move at depth 6. :)
[/pathetic rescue attempt]

It takes about 11 minutes for Marwin to stick with the right move.

It's interesting to see that Clueless needed 5 seconds to reach depth 9 while Marwin needs less than a second on this position, but after 10 minutes Clueless has made it through depth 18 while Marwin still struggles with depth 17.

Title: Re: Suite of test positions
Post by Fritzlein on Oct 14th, 2009, 6:12pm

on 10/14/09 at 13:22:43, tize wrote:
You said, in a game comment, that Bomb didn't stick with the move when a depth 17 search was done. In that case Marwin finds the right move at depth 6. :)

Good point.  Finding the right move doesn't count if one doesn't stick with it.  I wonder why Bomb realizes the camel hostage is no good at 16 steps but likes it again at 17.  Maybe curiosity will overcome my impatience, and I will let Bomb run overnight on this position.

Title: Re: Suite of test positions
Post by Fritzlein on Jan 18th, 2010, 3:24pm
I'm not sure if this link was posted long ago, and by now perhaps developers have each made their own, more extensive goal-finding suite, but for what it's worth Fotland made his suite public here: http://www.smart-games.com/mate12.ZIP

Title: Re: Suite of test positions
Post by doublep on Jan 20th, 2010, 2:28pm
I have a goal test suite of around 100 positions (positions + correct/incorrect solutions).  Most tests were collected with automated tools against errors of Badger.  However, all tests are written in a slightly modified GTP, like:

setup_board r-dr------r---r----------M-----rRhH-----H---C-----D--rEeRR-----R
620 find_one_move_win black
#? [true eh2n-eh3w-eg3w-rf2s-ef3x]

Is there any interest in it?
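
For anyone who wants to read records like the one above without a GTP engine, here is a minimal Python sketch. It infers the record layout from that single example: a setup_board line, a numbered command line, and a '#?' line carrying the expected result in brackets. The function name parse_tests and the dictionary keys are invented for illustration and are not part of Badger or of GTP itself.

```python
import re


def parse_tests(text):
    """Parse records of the form shown above into a list of dicts."""
    tests, position = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("setup_board "):
            position = line.split()[1]  # remembered for the commands that follow
        elif line.startswith("#?"):
            # Expected-result line, e.g. '#? [true eh2n-...]'; attach to last command.
            match = re.fullmatch(r"#\?\s*\[(.*)\]", line)
            if match and tests:
                tests[-1]["expected"] = match.group(1).split()
        elif line[0].isdigit():
            test_id, command, *args = line.split()
            tests.append({"id": int(test_id), "position": position,
                          "command": command, "args": args, "expected": None})
    return tests


SAMPLE = """\
setup_board r-dr------r---r----------M-----rRhH-----H---C-----D--rEeRR-----R
620 find_one_move_win black
#? [true eh2n-eh3w-eg3w-rf2s-ef3x]
"""

(test,) = parse_tests(SAMPLE)
print(test["id"], test["command"], test["args"], test["expected"])
```

The sketch keeps the expected result as raw tokens, since how 'true' and the move string should be interpreted depends on the command and is not spelled out here.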

Title: Re: Suite of test positions
Post by BlackKnight on Jan 21st, 2010, 11:00am

on 01/20/10 at 14:28:43, doublep wrote:
Is there any interest in it?

Yes, definitely! Thank you.

Title: Re: Suite of test positions
Post by doublep on Jan 21st, 2010, 12:43pm
Not sure how to share files best.  I used Google docs: https://docs.google.com/leaf?id=0B3GzxNlUxcpMMDEzZGExMjAtMDRiYi00YmViLWIwYTYtYWYzNWMwMDQ1Zjdk&hl=en

I included a short description.  If there is interest, I can try formalizing GTP as used by Badger more and share my regression and error-finding tools.  As tools depend on a GTP engine, they would be pointless without specification anyway, so I'm not publishing them at this point.

Title: Re: Suite of test positions
Post by froody on Jan 23rd, 2010, 4:16am
Cool. I'm working on some Arimaa bot stuff at the moment. So far all I've done is find goal moves by brute force. Hope to share some stuff soon. Maybe the Arimaa community should talk more about game formats, and try to standardise something that we can all use?

Title: Re: Suite of test positions
Post by Janzert on Jan 25th, 2010, 11:38am
There was a bit of discussion back in the fall in the site development forum. That thread can be found here (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=siteIssues;action=display;num=1252216594).

Personally for bot development I've pretty much found just the standard board notation (http://arimaa.com/arimaa/learn/notation.html) to be mostly enough and occasionally have used movelists.

Janzert

Title: Re: Suite of test positions
Post by doublep on Jan 25th, 2010, 12:21pm

on 01/25/10 at 11:38:24, Janzert wrote:
Personally for bot development I've pretty much found just the standard board notation (http://arimaa.com/arimaa/learn/notation.html) to be mostly enough and occasionally have used movelists.

I find standard board notation to be not very useful, because it spans several lines and contains a lot of "extra" information. Sure, it is much more human-readable than what I use with 'setup_board', but on the downside it is difficult to parse and copy around or send, e.g. in IRC or console. In fact, it is so cumbersome to parse that you are basically forced to use one position per file if you want any sanity in the command parser.

[edited for clarity]

Title: Re: Suite of test positions
Post by froody on Jan 26th, 2010, 5:44am
Do you think GTP is the ideal solution?

Title: Re: Suite of test positions
Post by Janzert on Jan 26th, 2010, 12:56pm
Yes, when using the full board representation I normally do only have one board per file. Although occasionally I'll have a file with a large number of boards and comments in between. For parsing I simply use the next line starting with numerals to signal the start of the next position. A dirty thing to do, but it works fine for what I need. My primary goal here though is generally ease of reading and editing, so a single line format is not something I want to deal with.

For communicating a position with an engine, in AEI I exclusively use the single line representation Omar uses in his scripts. This basically boils down to an opening [ followed by a space or piece letter for each square in order of column a through h and rank 8 through 1, closing with a ]. Which looks to be similar to what you are using except you leave out the brackets, replace spaces with dashes and I'm unsure what ordering you are using.

Janzert
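
The single-line representation described above can be sketched in a few lines of Python. The example string is the setup_board position from doublep's post, with dashes swapped for spaces; treating the two formats as sharing the same square order is an assumption here, though it is consistent with the solution move starting with eh2n. All function names are made up for this sketch.

```python
FILES = "abcdefgh"


def squares_from_bracket(position):
    """Map a '[...64 characters...]' string to {square: piece}."""
    if not (position.startswith("[") and position.endswith("]")):
        raise ValueError("expected a string wrapped in [ ]")
    cells = position[1:-1]
    if len(cells) != 64:
        raise ValueError(f"expected 64 squares, got {len(cells)}")
    board = {}
    for index, piece in enumerate(cells):
        if piece != " ":
            rank = 8 - index // 8  # the first 8 characters are rank 8
            board[FILES[index % 8] + str(rank)] = piece
    return board


def bracket_to_dashed(position):
    """Drop the brackets and write '-' for each empty square."""
    return position[1:-1].replace(" ", "-")


def dashed_to_bracket(position):
    """Inverse of bracket_to_dashed."""
    return "[" + position.replace("-", " ") + "]"


dashed = "r-dr------r---r----------M-----rRhH-----H---C-----D--rEeRR-----R"
bracket = dashed_to_bracket(dashed)
board = squares_from_bracket(bracket)
print(board["h2"], board["g2"], board["f2"])
```

The length check matters for the bracketed form in particular, because a single lost space shifts every later square.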

Title: Re: Suite of test positions
Post by doublep on Jan 27th, 2010, 3:03pm

on 01/26/10 at 05:44:03, froody wrote:
Do you think GTP is the ideal solution?


Certainly nothing is ideal.  I chose GTP because it was well-defined, I liked it and knew it well (I worked for some time on GNU Go several years back).

Basically, I wanted a simple human-readable language that I could issue commands to the bot in.  E.g. run it from the command line and tell it to do something, or just write commands to a file and tell it to read the file and execute them.  That's the way I perform regression testing and debug problems noticed in real games.  I have several commands ranging from full-blown 'genmove' (standard GTP for 'produce a move in given position') to more specific like 'capture_anything', 'evaluate' or 'read_branch'.

Title: Re: Suite of test positions
Post by doublep on Jan 27th, 2010, 3:13pm

on 01/26/10 at 12:56:33, Janzert wrote:
For communicating a position with an engine, in AEI I exclusively use the single line representation Omar uses in his scripts. This basically boils down to an opening [ followed by a space or piece letter for each square in order of column a through h and rank 8 through 1, closing with a ]. Which looks to be similar to what you are using except you leave out the brackets, replace spaces with dashes and I'm unsure what ordering you are using.

Yes, that sounds pretty much the same.  As I understood the order is actually the same.

I use hyphens a lot (e.g. I write moves as Ra1n-Ra2n, not Ra1n Ra2n) because that makes a move a single string from GTP perspective.  Then I can unambiguously write several moves or other things on one line when needed.  The same holds for condensed position representation.  Sure, in the latter case you could count the number of characters or use delimiters like those [..], but for GTP consistency I chose representation with which it is enough to chop a line into strings (separated with spaces) and then process each string individually.

From a more practical usability point of view, many consecutive spaces are hard to count and can easily get mangled when sent over some communication channels.  E.g. just embedding in HTML without a <pre>, or any text with word wrapping, will corrupt the position string.
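
A small Python sketch of both points, using made-up helper names. It mimics the whitespace collapsing that HTML applies outside a <pre> block on a single rank written with spaces and with dashes, then splits a line of hyphen-joined moves on whitespace alone. The sample rank and moves are illustrative values, not taken from a real game.

```python
import re


def collapse_whitespace(text):
    """Mimic HTML outside <pre>: each run of whitespace becomes one space."""
    return re.sub(r"\s+", " ", text)


def split_moves(line):
    """Split a line into moves (space-separated) and each move into steps."""
    return [token.split("-") for token in line.split()]


rank_spaced = "r dr    "  # eight squares, empty ones written as spaces
rank_dashed = "r-dr----"  # the same rank with '-' for empty squares

print(repr(collapse_whitespace(rank_spaced)))  # no longer eight characters
print(repr(collapse_whitespace(rank_dashed)))  # unchanged
print(split_moves("Ra1n-Ra2n Rb1n-Rb2n"))
```

The dashed rank survives because it contains no whitespace to collapse, and the move line needs nothing more than a plain split to separate one move from the next.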


