Title: Suite of test positions Post by unic on May 22nd, 2006, 6:08am In a different thread, Swynndla wrote: Quote:
It would be useful to collect a number of these positions, where there is one right move which either leads to a win or to a definite advantage, and is significantly better than any other move. In chess, plenty of these test suites exist (Nolot, Win at Chess, various others), and they are often used to test programs' tactical search... Does anybody know of more such positions in Arimaa? I am sure there must be many among all the games that have been played... but I am not strong enough to find them (or to know that the right move is indeed right). If we could collect a number of these positions (of varying difficulty), that would be a very useful tool for testing the tactical search of bots. |
||||
Title: Re: Suite of test positions Post by Fritzlein on May 22nd, 2006, 10:05am The deepest tactical forced win I've ever analyzed is http://www.arimaa.com/arimaa/gameroom/comments.cgi?gid=8073 before move 52w. Actually, the line is so long that I'm not sure of my analysis. Both 99of9 and I changed our minds about the position: he thought there was a forced win, then decided there wasn't, whereas I thought there wasn't a forced win, then decided there was. When bots start analyzing that position accurately, I will be shaking in my boots. |
||||
Title: Re: Suite of test positions Post by unic on May 22nd, 2006, 10:16am So, we have: Position 1 (game 27019): Code:
Key move: Rd2s dd3s rf3w ef4s Position 2 (Game 8073): Code:
Key move: cc7s Ed7w hb7s Ec7w (Is that correct, Fritzlein?) Any more positions (maybe some easier ones to go along with the above supposedly very difficult one)? How about some positions with forced significant win of material, instead of forced goal? (Grr - why does the forum eat my whitespace, even when the board is defined as code? And why only in some cases?) |
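The key moves above are written in Arimaa step notation: a piece letter (uppercase for Gold, lowercase for Silver), its starting square, and a direction (n, s, e, w, or x for a capture). If a test harness is to compare a bot's answer against these key moves, it first has to read them. Below is a minimal sketch in Python, assuming only plain four-character steps; setup placements such as "Ra1" and move-number prefixes such as "52w" are not handled, and the names Step, parse_step and parse_move are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

FILES = "abcdefgh"
PIECES = "EMHDCRemhdcr"  # uppercase = Gold, lowercase = Silver
# n/s/e/w move one square toward rank 8 / rank 1 / file h / file a; x marks a capture.
OFFSETS = {"n": (0, 1), "s": (0, -1), "e": (1, 0), "w": (-1, 0)}


@dataclass(frozen=True)
class Step:
    piece: str      # piece letter, e.g. "R" or "d"
    square: str     # starting square, e.g. "d2"
    direction: str  # one of "n", "s", "e", "w", "x"

    def destination(self) -> Optional[str]:
        """Square the piece ends on, or None when the step records a capture."""
        if self.direction == "x":
            return None
        df, dr = OFFSETS[self.direction]
        f = FILES.index(self.square[0]) + df
        r = int(self.square[1]) + dr
        if not (0 <= f < 8 and 1 <= r <= 8):
            raise ValueError(f"step leaves the board: {self}")
        return f"{FILES[f]}{r}"


def parse_step(token: str) -> Step:
    """Parse one step token such as 'Rd2s' or 'rf7x'."""
    if (len(token) != 4 or token[0] not in PIECES or token[1] not in FILES
            or token[2] not in "12345678" or token[3] not in "nsewx"):
        raise ValueError(f"not a step: {token!r}")
    return Step(token[0], token[1:3], token[3])


def parse_move(text: str) -> List[Step]:
    """Parse a space-separated move like 'Rd2s dd3s rf3w ef4s'."""
    return [parse_step(t) for t in text.split()]


if __name__ == "__main__":
    for step in parse_move("Rd2s dd3s rf3w ef4s"):
        print(step.piece, step.square, "->", step.destination())
```

For the key move of position 1 this reports the rabbit going d2 to d1, the dog d3 to d2, the rabbit f3 to e3 and the elephant f4 to f3. Checking legality (frozen pieces, push/pull pairing, traps) is deliberately left out of this sketch.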
||||
Title: Re: Suite of test positions Post by chessandgo on May 22nd, 2006, 11:19am Nice position! I agree with your first impression, Fritz: 52w Ed7e rf7n Ee7e (and let's add Ra5n). Now if Silver is to save his camel he must, as you said, play 52b cc7e me6n me7n rf8e. Taking the offered rabbit is logical: 53w Ef7n rg7w rf7s rf7x Ef8s. Now you proposed to sacrifice the cat to gain an extra tempo with the elephant: 53b me8e cd7s cd6e ee2n, but here Gold goals in one after 54w Ef7e Eg7w rg8s. I agree that instead, for 53b, saving the material by 53b hc5e hd5e he5e me8e would also lose (well, would it?); for instance, before switching sides with the elephant, Gold can play the same idea 54w Ef7e Eg7w rg8s Ra6w. Now either Silver brings his horse to h6 and abandons his camel (*), or he must play 54b cd7n cd8e mf8e x, where switching sides with 55w Ef7w Ee7w Ed7w looks winning. In the end, it seems that Silver can't save his camel, so maybe it is better to give it up on the first move instead of a rabbit (!!) with 52b cc7e cd7 cd8e rf8e, or on move 54 (*) to bring the horse to h6. Now the question is whether taking the camel in these positions will allow Gold to goal quickly on one side... or maybe Silver is still winning anyway?? This requires more thinking. That's a really interesting problem anyway; I'll go on thinking about it tonight. I fear there might be some obvious mistake in my rough analysis; if you find one, I'll be happy to hear where! Have fun! Jean |
||||
Title: Re: Suite of test positions Post by Fritzlein on May 22nd, 2006, 12:36pm on 05/22/06 at 10:16:11, unic wrote:
For some reason, it seems there is a maximum of five spaces in a row before it collapses them. I agree, it is annoying. Perhaps if you use a dot to represent an unoccupied square of the board, the positions will display clearly. |
||||
Title: Re: Suite of test positions Post by Fritzlein on May 22nd, 2006, 12:38pm on 05/22/06 at 11:19:24, chessandgo wrote:
My recollection is that Silver can give up a camel and still be winning. However, if my later analysis is correct, Gold has a move which not only forces the win of the camel, but actually forces goal. That is of course the more critical line. Do you agree that goal is forced, or have I made an error? |
||||
Title: Re: Suite of test positions Post by unic on May 22nd, 2006, 12:45pm on 05/22/06 at 12:36:04, Fritzlein wrote:
I wanted the notation to match the files that the server sends... that's why I used spaces. Dots are probably a better idea - but that would mean bots might not be able to run the test suite without (very minor) modification. |
||||
Title: Re: Suite of test positions Post by chessandgo on May 22nd, 2006, 3:20pm I don't know if I was clear in my post about PGN, but part of what I meant is that it would be great if we could just take the game, edit it with variations, add comments at each move, and have it be readable somehow. I have to admit that I haven't found the energy to read your extensive commentary, as plain-text explanation is very hard and painful to follow... I guess you are right, Fritz. |
||||
Title: Re: Suite of test positions Post by Swynndla on May 22nd, 2006, 4:35pm Forced goal in 3 moves (5 ply) maybe?? ... http://www.arimaa.com/arimaa/gameroom/replayFlash.cgi?gid=28660&s=w&client=1 Move 27w |
||||
Title: Re: Suite of test positions Post by 99of9 on May 22nd, 2006, 6:36pm on 05/22/06 at 10:16:11, unic wrote:
One possible solution is to switch to the [ tt ] flag (I put dots in here also, but now I don't think they're necessary). |
||||
Title: Re: Suite of test positions Post by Fritzlein on May 23rd, 2006, 12:16pm on 05/22/06 at 16:35:55, Swynndla wrote:
Nice position, Swynndla. It's always fun to see the side that is way behind materially force a goal. Unfortunately, the key move is not unique, which may make it less suitable for a test suite. Perhaps if the Gold horse were moved to d6 in the starting position, it would make the key move unique? |
||||
Title: Re: Suite of test positions Post by chessandgo on May 26th, 2006, 1:08pm on 05/22/06 at 12:38:54, Fritzlein wrote:
Yes, your analysis is much sharper than mine... very convincing ;) Sorry for the bother. |
||||
Title: Re: Suite of test positions Post by Fritzlein on Dec 8th, 2006, 9:44pm I suppose test positions normally ask, "Can the bot find the right move?", but for testing goal search specifically, it might also be interesting to ask, "How long until the bot announces mate?" The terminal position in game 43858 would be such a position. I'm going to let Bomb look at it overnight to see if that suffices. [EDIT] Looking at it overnight didn't even get Bomb to a depth of 16 steps on my underpowered machine. I'm also puzzled by the output:
14(8-27)-S> -14.54 2.0B 4:29 Mg6s Rf6e Rg6n Mg5n Rg7e df7e MISS
15(8-28)=S> -16.59 1.4B 9:34 Mg6s Rf6e Rg6n Df2w Rg7s df7e hd5n ca4e De2w MISS
So the first report came after four and a half hours, the second after nine and a half hours. The principal variations and the evaluations make sense. What puzzles me is the number of nodes reported: the 14-step search took 2.0 billion nodes, and the 15-step search only 1.4 billion. I thought that number could only go up. Or maybe this is a long integer overflowing? Anyway, the move Bomb suggests for Gold is one I didn't have in my game comment analysis, but apparently it doesn't change anything; it is still a forced win for Silver in three moves. If I enter the first eight steps of the main line, Bomb has to search to depth 13 to see the remaining 16-step forced goal. This suggests that from the original position it would need to search to depth 21 (8 + 13) before seeing the forced goal, despite Bomb's good goal extensions. So using the position as a "test" is apparently only going to test whether or not you have a supercomputer. |
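The overflow guess can at least be checked with arithmetic. A sketch, assuming (we don't actually know this) that Bomb keeps its node count in a fixed-width 32-bit integer: an unsigned counter wraps at 2^32 ≈ 4.29 billion, so a hypothetical true count of 5.7 billion would display as about 1.4 billion, while a signed counter tops out at 2^31 - 1 ≈ 2.15 billion, only just above the 2.0 billion reported at 14 steps. The function names are made up for illustration.

```python
# Hypothetical illustration: we do not know how Bomb actually stores its node count.
WRAP_32 = 1 << 32


def shown_as_uint32(true_count: int) -> int:
    """What a 32-bit unsigned counter would display after counting true_count nodes."""
    return true_count % WRAP_32


def shown_as_int32(true_count: int) -> int:
    """What a 32-bit signed (two's-complement) counter would display."""
    v = true_count % WRAP_32
    return v - WRAP_32 if v >= WRAP_32 // 2 else v


def candidate_true_counts(shown: int, max_wraps: int = 3) -> list:
    """True counts consistent with an unsigned 32-bit display that wrapped 0..max_wraps times."""
    return [shown + k * WRAP_32 for k in range(max_wraps + 1)]


if __name__ == "__main__":
    print(shown_as_uint32(2_000_000_000))  # 2000000000 (no wrap)
    print(shown_as_uint32(5_700_000_000))  # 1405032704 (wrapped once)
    print(candidate_true_counts(1_400_000_000, 1))
```

If the counter did wrap once, the 15-step search would really have taken roughly 5.7 billion nodes, which would fit the roughly doubled search time better than 1.4 billion does. That is a plausibility argument, not a diagnosis.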
||||
Title: Re: Suite of test positions Post by Fritzlein on Feb 20th, 2008, 6:33pm New test position: game 47333 before move 37w http://arimaa.com/arimaa/games/jsShowGame.cgi?gid=47333&s=w If I am not mistaken, there is only one move that both fends off the goal in two that Omar actually played and avoids material loss: 37w Rg3e rf3e Hf4s Ef5s. Bomb flunked the test live with two minutes per move, but I am surprised to discover Bomb still flunking on my home computer when given more than an hour. [EDIT] Whoops, it seems Bomb didn't "find" 37w Rg3e rf3e Hf4s Ef5s because that move in fact allows goal in two with 37b Ce2e mf2e Rf1n dg1w. The move Bomb actually settled on after completing depth 12 was 37w He3w rf3w Hf4s Ef5s, but it turns out this also allows goal in three after 37b Ce2e mf2e Rf1n dg1w, because now the central rabbit is a goal threat as well. So this position isn't a good "find the right move" test position; it's only a "how long until you realize you are doomed" test position. |
||||
Title: Re: Suite of test positions Post by chessandgo on Feb 27th, 2008, 6:08am on 02/20/08 at 18:33:06, Fritzlein wrote:
Hmmm, I had written down an analysis of this goal attack some time ago, though I'm not sure it's accurate. Wouldn't simply 37w E to h3 avoid goal for the foreseeable future? I had written that the solution (for Silver on the previous move) was 36b rh3s rh6s rh5s (with possibly a void step to complete the move), forcing goal in 3. But I haven't been able to check it with Bomb; am I getting something wrong? |
||||
Title: Re: Suite of test positions Post by Fritzlein on Feb 28th, 2008, 11:30pm Now that is just scary, Jean. How many games do you have your own analysis notes on? It's no wonder you are the reigning champ if you are constantly analyzing games, even games in which you didn't participate. But what is even scarier is that your un-aided analysis was better than my analysis with Bomb's help. Upon further review, I think you are right that there is one and only one move to stop goal, even though neither I nor Bomb found it. I am officially demoralized in advance of our next encounter. :( Knowing that there is only one right answer re-instates this position as a good candidate for a test suite. It is unlikely that a program will find the right move for the wrong reason, because Gold can't defend goal without giving up a horse. |
||||
Title: Re: Suite of test positions Post by chessandgo on Mar 1st, 2008, 4:32am on 02/28/08 at 23:30:35, Fritzlein wrote:
... says the man with an 8-0 record :) |
||||
Title: Re: Suite of test positions Post by Fritzlein on Mar 1st, 2008, 7:35am Heheh. I have a special ability to feel that all my wins are lucky flukes I didn't deserve. |
||||
Title: Re: Suite of test positions Post by Fritzlein on Oct 11th, 2009, 9:13am Is anyone still interested in compiling public test positions? The position before 60w in marwin vs. arimaa_master seems to be a good candidate for reasons discussed here: http://arimaa.com/arimaa/gameroom/comments.cgi?gid=120131 Apparently there is an obvious move (taking a camel hostage) that loses the game, and otherwise only one clear candidate move (pushing the horse into g3) that keeps the game in balance. The test question is how long it takes the bot to prefer the better move of the two. |
||||
Title: Re: Suite of test positions Post by aaaa on Oct 11th, 2009, 10:34am It takes my bot five minutes to find 60g He2e Hf2n rg3n Hf3e. |
||||
Title: Re: Suite of test positions Post by Fritzlein on Oct 11th, 2009, 12:33pm on 10/11/09 at 10:34:48, aaaa wrote:
Bomb2005, on my old, slow computer, needed an hour. |
||||
Title: Re: Suite of test positions Post by Arimabuff on Oct 11th, 2009, 2:01pm on 10/11/09 at 12:33:41, Fritzlein wrote:
Which one is at fault? Bomb or your computer? |
||||
Title: Re: Suite of test positions Post by Fritzlein on Oct 11th, 2009, 2:43pm on 10/11/09 at 14:01:31, Arimabuff wrote:
That's hard to know without having my computer run a different bot, or running Bomb on a different computer. Tize said that Marwin found it at depth 17, while Bomb found it at depth 16, so that makes Bomb look good, but maybe Bomb has so many extensions that it takes longer to get to depth 16 than Marwin takes to get to depth 17. Presumably quad found it at a lower depth, but doesn't search as deeply due to using tons of extensions. I salute Omar for providing the Arimaa Challenge hardware, so that we are truly testing which software is better. Taking one variable out of the equation gives us more insight into the other. |
||||
Title: Re: Suite of test positions Post by jdb on Oct 11th, 2009, 4:18pm It takes clueless 5 seconds to find this move, at depth 9. It stays with this move up to at least depth 18, in ten minutes. |
||||
Title: Re: Suite of test positions Post by Fritzlein on Oct 12th, 2009, 4:35am Wow. |
||||
Title: Re: Suite of test positions Post by aaaa on Oct 12th, 2009, 6:05am on 10/11/09 at 14:43:16, Fritzlein wrote:
Just barely, at 15 steps deep. Maybe that's because the null-move pruning is applied conservatively, allowing a full four-step move for the beneficiary of the null move to refute it. |
||||
Title: Re: Suite of test positions Post by tize on Oct 14th, 2009, 1:22pm Hats off to Clueless! Quote:
[pathetic rescue attempt] You said, in a game comment, that Bomb didn't stick with the move when a depth 17 search was done. In that case Marwin finds the right move at depth 6. :) [/pathetic rescue attempt] It takes about 11 minutes for Marwin to stick with the right move. It's interesting to see that Clueless needed 5 seconds to reach depth 9 while Marwin needs less than a second on this position, but after 10 minutes Clueless has made it through depth 18 while Marwin still struggles with depth 17. |
||||
Title: Re: Suite of test positions Post by Fritzlein on Oct 14th, 2009, 6:12pm on 10/14/09 at 13:22:43, tize wrote:
Good point. Finding the right move doesn't count if one doesn't stick with it. I wonder why Bomb realizes the camel hostage is no good at 16 steps but likes it again at 17. Maybe curiosity will overcome my impatience, and I will let Bomb run overnight on this position. |
||||
Title: Re: Suite of test positions Post by Fritzlein on Jan 18th, 2010, 3:24pm I'm not sure if this link was posted long ago, and by now perhaps developers have each made their own, more extensive goal-finding suite, but for what it's worth Fotland made his suite public here: http://www.smart-games.com/mate12.ZIP |
||||
Title: Re: Suite of test positions Post by doublep on Jan 20th, 2010, 2:28pm I have a goal test suite of around 100 positions (positions plus correct/incorrect solutions). Most tests were collected with automated tools that looked for Badger's errors. However, all tests are written in a slightly modified GTP, like:
setup_board r-dr------r---r----------M-----rRhH-----H---C-----D--rEeRR-----R 620
find_one_move_win black #? [true eh2n-eh3w-eg3w-rf2s-ef3x]
Is there any interest in it? |
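A sketch of reading the condensed setup_board string, assuming 64 characters with '-' for an empty square and squares running a8 to h8, then rank 7, and so on down to h1. With that ordering the example agrees with its expected answer: the silver elephant sits on h2, a silver rabbit on f2, and f1 and the f3 trap are empty, so eh2n-eh3w-eg3w-rf2s-ef3x is a goal. The trailing "620" token is not interpreted here, and parse_condensed is a hypothetical name, not part of Badger.

```python
FILES = "abcdefgh"
PIECES = set("EMHDCRemhdcr")  # uppercase = Gold, lowercase = Silver


def parse_condensed(board: str) -> dict:
    """Map square name -> piece letter; '-' marks an empty square.

    Assumed square order: rank 8 first, down to rank 1, files a..h within a rank.
    """
    if len(board) != 64:
        raise ValueError(f"expected 64 characters, got {len(board)}")
    squares = {}
    for i, ch in enumerate(board):
        if ch == "-":
            continue
        if ch not in PIECES:
            raise ValueError(f"unexpected character {ch!r} at index {i}")
        rank = 8 - i // 8
        squares[f"{FILES[i % 8]}{rank}"] = ch
    return squares


EXAMPLE = "r-dr------r---r----------M-----rRhH-----H---C-----D--rEeRR-----R"

if __name__ == "__main__":
    pos = parse_condensed(EXAMPLE)
    # The expected solution starts with eh2n and includes rf2s: check those squares.
    print(pos["h2"], pos["f2"], "f1" in pos)  # e r False
```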
||||
Title: Re: Suite of test positions Post by BlackKnight on Jan 21st, 2010, 11:00am on 01/20/10 at 14:28:43, doublep wrote:
Yes, definitely! Thank you. |
||||
Title: Re: Suite of test positions Post by doublep on Jan 21st, 2010, 12:43pm Not sure how best to share files. I used Google Docs: https://docs.google.com/leaf?id=0B3GzxNlUxcpMMDEzZGExMjAtMDRiYi00YmViLWIwYTYtYWYzNWMwMDQ1Zjdk&hl=en I included a short description. If there is interest, I can try to formalize the GTP dialect used by Badger and share my regression and error-finding tools. As the tools depend on a GTP engine, they would be pointless without a specification anyway, so I'm not publishing them at this point. |
||||
Title: Re: Suite of test positions Post by froody on Jan 23rd, 2010, 4:16am Cool. I'm working on some Arimaa bot stuff at the moment. So far all I've done is find goal moves by brute force. Hope to share some stuff soon. Maybe the Arimaa community should talk more about game formats, and try to standardise something that we can all use? |
||||
Title: Re: Suite of test positions Post by Janzert on Jan 25th, 2010, 11:38am There was a bit of discussion back in the fall in the site development forum. That thread can be found here (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=siteIssues;action=display;num=1252216594). Personally for bot development I've pretty much found just the standard board notation (http://arimaa.com/arimaa/learn/notation.html) to be mostly enough and occasionally have used movelists. Janzert |
||||
Title: Re: Suite of test positions Post by doublep on Jan 25th, 2010, 12:21pm on 01/25/10 at 11:38:24, Janzert wrote:
I find the standard board notation not very useful, because it spans several lines and contains a lot of "extra" information. Sure, it is much more human-readable than what I use with 'setup_board', but on the downside it is difficult to parse, copy around, or send, e.g. over IRC or in a console. In fact, it is so cumbersome to parse that you are basically forced to use one position per file if you want any sanity in the command parser. [edited for clarity] |
||||
Title: Re: Suite of test positions Post by froody on Jan 26th, 2010, 5:44am Do you think GTP is the ideal solution? |
||||
Title: Re: Suite of test positions Post by Janzert on Jan 26th, 2010, 12:56pm Yes, when using the full board representation I normally have only one board per file, although occasionally I'll have a file with a large number of boards and comments in between. For parsing I simply use the next line starting with numerals to signal the start of the next position. A dirty thing to do, but it works fine for what I need. My primary goal here, though, is generally ease of reading and editing, so a single-line format is not something I want to deal with. For communicating a position to an engine, in AEI I exclusively use the single-line representation Omar uses in his scripts. This basically boils down to an opening [ followed by a space or piece letter for each square, in order of column a through h and rank 8 through 1, closing with a ]. That looks similar to what you are using, except that you leave out the brackets and replace spaces with dashes, and I'm unsure what ordering you are using. Janzert |
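If both one-line forms really use the same square order (a8 through h8, then down to rank 1), converting between them only means swapping the empty-square marker and adding or stripping the brackets. A sketch under that assumption, with hypothetical function names; display_rows also renders a position with dots for empty squares, the forum-friendly workaround suggested earlier in the thread, since runs of spaces get collapsed.

```python
def dashes_to_aei(board: str) -> str:
    """64-char string with '-' for empty squares -> '[...]' form with spaces for empty squares."""
    if len(board) != 64:
        raise ValueError("expected 64 squares")
    return "[" + board.replace("-", " ") + "]"


def aei_to_dashes(text: str) -> str:
    """Inverse of dashes_to_aei: strip the brackets, turn spaces into dashes."""
    if len(text) != 66 or text[0] != "[" or text[-1] != "]":
        raise ValueError("expected '[' + 64 squares + ']'")
    return text[1:-1].replace(" ", "-")


def display_rows(board: str, empty: str = ".") -> list:
    """Eight human-readable rows, rank 8 first, with a visible empty-square marker."""
    return [f"{8 - r}| " + " ".join(board[8 * r:8 * r + 8]).replace("-", empty)
            for r in range(8)]
```

The dash form survives whitespace collapsing and splitting on spaces; the bracketed form needs its delimiters precisely because it contains spaces.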
||||
Title: Re: Suite of test positions Post by doublep on Jan 27th, 2010, 3:03pm on 01/26/10 at 05:44:03, froody wrote:
Certainly nothing is ideal. I chose GTP because it is well defined, and I like it and know it well (I worked on GNU Go for some time several years back). Basically, I wanted a simple human-readable language in which I could issue commands to the bot, e.g. run it from the command line and tell it to do something, or write commands to a file and tell it to read the file and execute them. That's how I perform regression testing and debug problems noticed in real games. I have several commands, ranging from the full-blown 'genmove' (standard GTP for 'produce a move in the given position') to more specific ones like 'capture_anything', 'evaluate' or 'read_branch'. |
||||
Title: Re: Suite of test positions Post by doublep on Jan 27th, 2010, 3:13pm on 01/26/10 at 12:56:33, Janzert wrote:
Yes, that sounds pretty much the same. As I understand it, the order is actually the same. I use hyphens a lot (e.g. I write moves as Ra1n-Ra2n, not Ra1n Ra2n) because that makes a move a single string from GTP's perspective. Then I can unambiguously write several moves or other things on one line when needed. The same holds for the condensed position representation. Sure, in the latter case you could count the number of characters or use delimiters like those [..], but for GTP consistency I chose a representation for which it is enough to chop a line into strings (separated by spaces) and then process each string individually. From a more practical usability point of view, many consecutive spaces are hard to count and can easily be mangled when sent over some communication channels. E.g. just embedding the string in HTML without a <pre>, or any text with word wrapping, will corrupt the position string. |
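The "every item is one whitespace-free token" rule keeps the reader trivial. A sketch, assuming the "#? [expected]" suffix in the earlier find_one_move_win example marks the expected answer, in the style of GNU Go's regression files; Badger's exact grammar is unpublished, so parse_test_line and move_steps are hypothetical names.

```python
from typing import List, Optional, Tuple


def parse_test_line(line: str) -> Tuple[List[str], Optional[List[str]]]:
    """Split 'command args #? [expected]' into command tokens and expected-result tokens.

    Assumption: modeled on GNU Go-style regression files, not on a published Badger spec.
    """
    command_part, sep, expected_part = line.partition("#?")
    command = command_part.split()  # whitespace-separated tokens only
    if not sep:
        return command, None
    expected = expected_part.strip()
    if not (expected.startswith("[") and expected.endswith("]")):
        raise ValueError(f"expected result must be in brackets: {expected!r}")
    return command, expected[1:-1].split()


def move_steps(token: str) -> List[str]:
    """A hyphen-joined move such as 'Ra1n-Ra2n' travels as one token; split it into steps."""
    return token.split("-")


if __name__ == "__main__":
    cmd, expected = parse_test_line("find_one_move_win black #? [true eh2n-eh3w-eg3w-rf2s-ef3x]")
    print(cmd)                      # ['find_one_move_win', 'black']
    print(expected)                 # ['true', 'eh2n-eh3w-eg3w-rf2s-ef3x']
    print(move_steps(expected[1]))  # ['eh2n', 'eh3w', 'eg3w', 'rf2s', 'ef3x']
```

Because the condensed position contains no spaces, a setup_board line splits into exactly three tokens (command, board, trailing number); a bracketed board with spaces for empty squares would need special-case delimiter handling instead.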
||||
Arimaa Forum » Powered by YaBB 1 Gold - SP 1.3.1! YaBB © 2000-2003. All Rights Reserved. |