Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
Arimaa >> Bot Development >> Changing your mind and managing your time
(Message started by: Fritzlein on Jan 12th, 2006, 12:09pm)

Title: Changing your mind and managing your time
Post by Fritzlein on Jan 12th, 2006, 12:09pm
In the comments of game 23257, 99of9 posted the evaluation according to Gnobot of the position before move 20b:

Quote:
2 0.3 -5046 e6-f6 e7-f7
3 0.4 -4992 e6-e5 f5-f4 e5-e6
4 0.4 -4976 e6-e5 f5-f4 e5-e6 g4-g3
5 0.5 -5021 e6-e5 f5-f4 g4-g3 f4-g4 h3-h2
6 0.6 -5083 e6-e5 d6-e6 f5-f6 f6-f7 g4-g3 g5-g4
7 1.0 -5243 e6-e5 f5-f4 e5-e6 g4-g3 c4-d4 b4-c4 c1-d1
8 38.9 -8028 e6-f6 d6-e6 f5-f4 f6-f7 h3-g3 g3-f3 f3-g3 f4-f3
9 41.9 -8034 e6-f6 d6-e6 f5-f4 f6-f7 h3-g3 g3-f3 f3-g3 f4-f3 b6-b7
10 45.9 -8028 e6-f6 d6-e6 f5-f4 f6-f7 h3-g3 g3-f3 f3-g3 f4-f3 d8-e8 e8-d8
11 77.1 -8016 e6-f6 d6-e6 f5-f4 f6-f7 h3-g3 g3-f3 f3-g3 f4-f3 e6-e5 e5-e4 e4-f4
12 263.4 -7931 e6-f6 e7-f7 f5-f4 f6-g6 h3-g3 g3-f3 f3-g3 f4-f3 f7-g7 g6-h6 h6-g6 h5-h6

At the moment, 8 ply takes 38.9 seconds on my laptop. Note that 7 ply only takes 1.0 second.

This is because it has to totally change plan when it sees your goal threat, and all the move ordering help provided by searching 7 ply is totally useless.

I have no experience coding a bot, but I'm interested in the effects of a bot changing its mind.  My understanding is that good move ordering is critical to the speed of alpha-beta searching, so most programs use iterative deepening.  Each level of search costs some time to complete, but saves even more time on the next level of search by providing good move suggestions.

In this case, is Gnobot's 7-step search literally "totally useless", i.e. the 8-step search would have taken exactly as long to complete from scratch?  Might the 7-step search have been somewhat useful?  Is it possible that it was actively detrimental, i.e. it provided move suggestions which were worse than random moves, and thus slowed down the 8-step search?

Since there is a huge time loss associated with a bot changing its mind at a new level of search depth, I would think that bot developers would want to prevent their bots from switching plans, for example by accepting a more time-consuming static evaluation that somehow anticipates what is to come.  Yet the article at
http://supertech.lcs.mit.edu/~heinz/dt/node50.html (already referenced in a couple of other threads) suggests that the increase in playing strength by searching an extra step is directly proportional to the probability of the bot changing its mind at that step.  As a consequence, the more likely a bot is to change its mind at a certain depth, the more important it is to search that depth.

A practical application might be this: If Gnobot changes its mind on step 8 (from what it thought on step 7) 50% of the time, but changes its mind on step 9 (from what it thought on step 8) only 10% of the time, then deepening to 9 steps provides only 1/5 of the stength boost that deepening to 8 steps did.  Therefore (perhaps) for time management purposes Gnobot should always complete the 8-step search unless it is out of time, and always forgo the 9-step search unless it has a full reserve, so that it can have enough reserve time for 8-step searches later.

In general, different bots might have different good breakpoints of search depth.  For example, if I remember correctly, bot_haizhi implemented 3-step static goal and trap evaluation, so completing a 5-step or a 9-step search might be an important breakpoint for bot_haizhi's time management.

A second question I have is whether iterative deepening in Arimaa should be applied only at the breakpoints where it is likely for a bot to change its mind.  If there is a fair probability that Gnobot's 7-step search will indeed be totally useless in searching to the eighth step, might it pay to instead do iterative deepening hitting only depths of 4n, i.e. do a 4-step search, then an 8-step search, then a 12-step search?  If that turns out to be too radical, then one could at least consider that two steps are necessary for any capture, and therefore do iterative deepening only at depths of 2n.

I apologize if these thoughts are way off base.  If they have any merit at all, though, I'm curious what real developers think about them.

Title: Re: Changing your mind and managing your time
Post by Ryan_Cable on Jan 12th, 2006, 10:47pm
Interesting, I always wondered why the fixed ply bots would have order of magnitude differences in calculation time for positions that looked qualitatively similar.

It seems to me that the whole point of searching deeper is to find a better move, so I am not sure how you could improve a bot by reducing its likelihood to switch lines.  The only thing I can think of is if the bot is repeatedly oscillating between lines, but that seems unlikely to cause nearly as large a delay as searching a completely new line.

Has anyone done any empirical research into how useful the non 4n evaluations are?  If the 4n+1 search causes a change, how likely is it to actually be a better move (I guess this would require a human opinion of the moves).  If the 4n+1 search causes a change, how likely is it that the search will keep the new move at 4n+4; how likely that it will go back to the old move?  Could there be any value in creating different but related evaluations for each of the step levels, or would this ruin the move orderings produced by the evaluations?

PS  If Gnobby saw the goal threat why didn’t it block it?  Did it just run out of time and move randomly?  Does Gnobby have any static goal evaluation?  Would it be of any value to run static goal evaluations at the very beginning of the move and see that Fritzlein has a goal threat and Gnobby doesn’t?  Then you could have some sort of mode that prioritizes goal defense moves in the search.

PPS  Is everyone just using standard Alpha-Beta with heuristics and iterative deepening, or does anyone use something more advanced like http://en.wikipedia.org/wiki/MTD-f ?

Title: Re: Changing your mind and managing your time
Post by Fritzlein on Jan 13th, 2006, 12:10pm

on 01/12/06 at 22:47:57, Ryan_Cable wrote:
PS  If Gnobby saw the goal threat why didn’t it block it?  Did it just run out of time and move randomly?  Does Gnobby have any static goal evaluation?


If I remember correctly, Gnobot ran its time down to 15 seconds, i.e. it did run out of time and give the best move it had discovered to that point.  I'm guessing that, given the server it was running on, and the load at the time, the 95 seconds weren't enough to complete an 8-step search or else it would certainly have blocked the goal.  I'm also guessing that Gnobot has no static goal evaluation whatsoever, not even 1-step, or else it would have seen the problem in the 7-step search.  

Title: Re: Changing your mind and managing your time
Post by ntroncos on Jan 24th, 2006, 9:13am

on 01/12/06 at 22:47:57, Ryan_Cable wrote:
PPS  Is everyone just using standard Alpha-Beta with heuristics and iterative deepening, or does anyone use something more advanced like http://en.wikipedia.org/wiki/MTD-f ?


Last year i tried to attend that issue in weiser.  We used an incomplete search function based on the hipotesis that much of the search is useless. Until know we have no out preformed the complete search bots (alpha-beta search).

In each step we putted a fixed search apart from the incomplete search.

weiser thinking:
alpha-beta search 2ply depth (very fast), if he wins or looses he move ocordingly. if niether he does the incomplete search.
Then he makes theincomplete search for aslong as he has time.

The fixed search helped so that he didnt loose stupidly, but the use of an incomplete search funtion exposes the posibility that he never sees that he will loose in the next ply. (or that in some bizzar way its good to through his elephant in a trap ;D)

The conclution of our develpoment were that even though alpha-beta search does aparent unusefull search it does fill in the evaluated move table. I think that a combined alpha-beta search for the first plys and the an incomplete search to go higher in depth would be better. Only incomplete search (at our present state of develpoment) never could out-rank a complete alpha-beta bot.

Title: Re: Changing your mind and managing your time
Post by Ryan_Cable on Jan 24th, 2006, 7:03pm

on 01/24/06 at 09:13:40, ntroncos wrote:
weiser thinking:
alpha-beta search 2ply depth (very fast), if he wins or looses he move ocordingly. if niether he does the incomplete search.
Then he makes theincomplete search for aslong as he has time.

This strikes me as a very good idea if one is going to do substantial pruning.  The one downside it has is that it creates an additional horizon effect.

Deep Blue did the first few ply of the search in software on a general purpose computer and then did the rest in specially designed hardware that had a less sophisticated evaluation function.  At one point in its development, the hardware didn’t include castling, which meant that if castling appeared particularly useful to its opponent, it would try to play a line that pushed the castling move beyond the software horizon, expecting it to be impossible to do later.  Of course, you have an advantage in that you are specifically pruning moves that are likely to be bad, whereas castling was left out because it was hard to do in hardware.

PS I hope we will be seeing more of you and weiser around in the future.

Title: Re: Changing your mind and managing your time
Post by fotland on Jan 24th, 2006, 11:56pm
Bomb does depth first alpha-beta with iterative deepening (1 step at a time) and transposition table (4 million entries, w victims).  It has 6 step static trap evaluation and approximate 16 step static goal evaluation (3 steps exact, otherwise lower bound).  It has a lot of selective deepening, and a quiescence search, so the deepest lines are often 4 to 8 steps deeper than the full width search.  It uses null move pruning, and prunes redundant passes and 2 step reverse moves.  It uses 2 killer moves and the history heuristic for move ordering.  It typically searches 12 steps full width, and 15 to 18 steps in the PV including extensions and quiescence (plus typically 3 steps in the static evaluation).  It usually find forced goals 16 to 20 steps away.  The evaluation and move generation are very fast, using bitboards.

Title: Re: Changing your mind and managing your time
Post by 99of9 on Jan 5th, 2009, 5:21pm

on 01/12/06 at 12:09:34, Fritzlein wrote:
A practical application might be this: If Gnobot changes its mind on step 8 (from what it thought on step 7) 50% of the time, but changes its mind on step 9 (from what it thought on step 8) only 10% of the time, then deepening to 9 steps provides only 1/5 of the stength boost that deepening to 8 steps did.  Therefore (perhaps) for time management purposes Gnobot should always complete the 8-step search unless it is out of time, and always forgo the 9-step search unless it has a full reserve, so that it can have enough reserve time for 8-step searches later.

There is nothing new under the sun!  I just ran across this old thread and am amazed that I did not contribute to it, perhaps I didn't even see it.  Fritz is of course right, and I subsequently implemented this exact time control feature (I'm not sure if I came to the conclusion independently or failed to acknowledge this helpful thread).

In fact, a few weeks ago, I was discussing my time control with Fritz in the chatroom, and had to convince him of the value of this very idea!


Quote:
I apologize if these thoughts are way off base.  If they have any merit at all, though, I'm curious what real developers think about them.


Now you know ;-).

However it is now starting to get a bit redundant, the 4n stuff wears off once you get static trap extensions in.  There is no typical search profile anymore, but here's an example of one of my test positions:


r - r - - r - r
- r r - c h d r
E c r h H R e M
m R - H C R R R
R - - - - - D C
- R - - - - - -
- - R - - - - -
- - - - - - - -
initial key 8809538553158063007

Depth   Elapsed      Knps    Eval  Main Line
-----  --------  --------  ------  ------------------------------------------
   2       0.3       1.1   -7772  d5-c5 c5-c4 c6-c5
   3       0.3      10.8   -6594  e7-d7 e6-e7 d7-d8 e7-d7
   4       0.3      58.4   -6594  e7-d7 e6-e7 d7-d8 e7-d7
   5       0.3     108.3  -10078  d5-c5 c5-c4 c6-c5 g4-g3 g7-g8
   6       0.4     162.8  -12041  d5-c5 c5-c4 c6-c5 g4-g3 g7-g8 a8-a7
   7       0.5     233.5  -12552  d5-c5 c5-d5 c6-c5 g4-g3 g7-g8 b7-a7 c7-b7
   8       0.7     318.6  -13116  d5-c5 c5-c4 c6-c5 g4-g3 g7-g8 a8-a7 c7-c6 c5-d5
   9       1.1     374.7   -9429  d5-c5 c5-c4 c6-c5 g4-g3 g5-g4 g6-g5 g5-g6 a8-a7 f5-g5
  10       1.6     419.0   -8160  d5-c5 c5-c4 c6-c5 g4-g3 a8-a7 d6-c6 c5-d5 c6-d6 b6-c6 a6-b6
  11       5.7     414.7   -7641  d5-c5 c5-c4 c6-c5 h4-h3 a8-a7 d6-c6 c5-d5 c6-d6 b6-c6 a6-b6 d5-d4 e5-d5
  12      21.0     563.3   -5458  b3-b4 b5-c5 a5-b5 a6-a5 b4-b3 b5-b4 b6-b5 b7-b6 a5-a6 b5-a5 b6-b5 a6-b6
  13     129.7     708.3  -10160  b3-a3 a4-b4 b4-c4 c4-d4 b5-b4 b6-b5 a5-a4 b5-b6 d5-c5 c5-c4 c6-c5 h4-h3 g7-g8
Total: time 129.7  nodes 91903914  evals 16519192 (18.0%) beta 912581 ( 1.0%) null 21384127 (23.3%)

Applying this move: Rb3w Ra4e Rb4e Rc4e

Title: Re: Changing your mind and managing your time
Post by 99of9 on Jan 5th, 2009, 6:07pm
On the other hand, the 4n problem still does show its head in some positions.  I reran the exact position you posted, and here are my current results:


r r r r - r - r
- - r - r - - -
- h - c c - - -
- - - - - d M R
- H E - - - h -
R R m e - - - H
- C R C D R R -
R - D - - - R -
initial key 4439579325424901324

Depth   Elapsed      Knps    Eval  Main Line
-----  --------  --------  ------  ------------------------------------------
   2       0.3       2.8  -24898  e6-f6 f6-g6
   3       0.3      36.7  -24998  e6-f6 e7-f7 f6-g6
   4       0.4     318.7  -23952  e6-e5 f5-f4 f8-g8 g4-g3
   5       0.5     484.4  -25178  e6-e5 f5-f4 f8-g8 g4-g3 h5-h6
   6       0.7     582.2  -26354  e6-e5 f8-g8 f5-f4 g4-g3 h5-h6 h6-g6
   7       1.1     589.5  -27392  e6-e5 f8-g8 f5-f4 g4-g3 h5-h6 g5-g4 g4-g5 g3-g4
   8       4.2     528.4  -27847  f8-g8 e6-f6 e7-f7 f5-f4 h3-g3 g3-f3 f3-g3 f4-f3
   9      10.4     507.9  -27323  e6-f6 e7-e6 f5-f4 f6-g6 h3-g3 g3-f3 f3-e3 f4-f3 f8-f7
  10      15.2     535.5  -25469  e6-e5 f5-f4 g4-g3 f8-g8 g5-g4 g2-h2 g3-g2 g4-g3 e7-f7 e5-e6
  11      37.4     541.5  -24924  e6-e5 f5-f4 f8-g8 g4-g3 g5-g4 g4-g5 g3-g4 h3-g3 e7-f7 e5-e4 e4-e3
  12     315.2     918.9  -24835  e6-e5 f8-g8 f5-f4 g4-g3 g5-g4 g2-h2 g3-g2 g4-g3 e7-f7 d6-e6 e5-e4 e4-e3
  13    1165.4     974.1  -25447  e6-e5 f8-g8 f5-f4 g4-g3 g5-g4 g2-h2 g3-g2 g4-g3 e7-f7 e5-e4 e4-e3 e3-f3 g3-g4
Total: time 1165.4  nodes 1135288779  evals 203153473 (17.9%) beta 4469561 ( 0.4%) null 245830473 (21.7%)

Applying this move: ce6s rf8e df5s hg4s


The sad thing is that this was done on a dual core pc using the search improvements and the better eval, and now it takes longer to finish 12 ply!!!

Title: Re: Changing your mind and managing your time
Post by Janzert on Jan 5th, 2009, 9:03pm
But presumably it's finding a much(?) better move in that 12 ply, at least that's the hope of course.

Janzert

Title: Re: Changing your mind and managing your time
Post by 99of9 on Jan 5th, 2009, 9:10pm
Yes, I hope so.  It's playing a riskier line, but if it gets away with it, then it will not lose the dog.

Title: Re: Changing your mind and managing your time
Post by jdb on Jan 5th, 2009, 11:07pm
I am not sure I understand the idea being presented here, so this could be missing the point, but here goes.

Clueless will not move until it has completed an 8 step search. (Except if its going to lose on time!) Typically this only takes a couple seconds, but if it takes a few minutes so be it. Its just way too dangerous to move without looking that far ahead.

Stopping the search early, if there is likely not enough time left to complete the next iteration, is questionable in my opinion. Lets say it just finished an 8 ply search and there is only 5 seconds until it is time to move. Might as well just stop searching and save the 5 seconds. But at ply 9, disaster and the move's score drops big time. If a move's score is going to drop huge on the next ply, the search of that move will complete very quickly, typically in under a second. Now extra time will be needed to see if a new move can be found.

So even if there is no hope of completing the 9 ply search, in effect that 5 seconds gives an extra ply of security on the move that is going to be played.

Title: Re: Changing your mind and managing your time
Post by Oystein on Jan 6th, 2009, 12:29am
Just a little observation:


on 01/05/09 at 18:07:35, 99of9 wrote:
Depth   Elapsed      Knps    Eval  Main Line
-----  --------  --------  ------  ------------------------------------------
   7       1.1     589.5  -27392  e6-e5 f8-g8 f5-f4 g4-g3 ...
...
  10      15.2     535.5  -25469  e6-e5 f5-f4 g4-g3 f8-g8 ...
  11      37.4     541.5  -24924  e6-e5 f5-f4 f8-g8 g4-g3 ...
  12     315.2     918.9  -24835  e6-e5 f8-g8 f5-f4 g4-g3 ...

Same first move, but different order of the steps. Normally this would not happen because of the transposition table. But here the position after 4 steps  is apparently overwritten, and also the position after 3 steps for the 12 step search.

Have you tried to change the Zobrist keys (or do you use something else to get hash values) to see the effect of the search time?


Title: Re: Changing your mind and managing your time
Post by Oystein on Jan 6th, 2009, 1:13am

on 01/05/09 at 23:07:35, jdb wrote:
Stopping the search early, if there is likely not enough time left to complete the next iteration, is questionable in my opinion.
I have hardly any experience in bot writing, but I give my opinion anyway.
I think it is best to strive to never do an incomplete iteration.


Quote:
Lets say it just finished an 8 ply search and there is only 5 seconds until it is time to move. Might as well just stop searching and save the 5 seconds. But at ply 9, disaster and the move's score drops big time. If a move's score is going to drop huge on the next ply, the search of that move will complete very quickly, typically in under a second. Now extra time will be needed to see if a new move can be found.
But because there are little time left, it is quite likely that we dont find the best substitute move. It is hard to make good use of an incomplete iteration.


Quote:
So even if there is no hope of completing the 9 ply search, in effect that 5 seconds gives an extra ply of security on the move that is going to be played.
The 5 seconds is better spent later when it is enough time to complete a iteration. Then it is possible to both discover a disaster and to find a good substitute move.

Title: Re: Changing your mind and managing your time
Post by 99of9 on Jan 6th, 2009, 1:20am

on 01/06/09 at 00:29:42, Oystein wrote:
Just a little observation:

Same first move, but different order of the steps. Normally this would not happen because of the transposition table. But here the position after 4 steps  is apparently overwritten, and also the position after 3 steps for the 12 step search.

Have you tried to change the Zobrist keys (or do you use something else to get hash values) to see the effect of the search time?

Thanks for picking that up Oystein, that is unusual.  I wonder if it has anything to do with the parallelization returning irregularly.  I'll check out your Zobrist suggestion too.

Title: Re: Changing your mind and managing your time
Post by 99of9 on Jan 6th, 2009, 1:25am

on 01/06/09 at 01:13:29, Oystein wrote:
It is hard to make good use of an incomplete iteration.

That's not quite true, Jeff is right that even incomplete iterations can be very useful.  Unfortunately Gnobot can only make use of complete iterations, hence the time control.  If it were able to use incomplete ones, then I agree with Jeff it would probably just be better to use the whole time.

Title: Re: Changing your mind and managing your time
Post by Oystein on Jan 6th, 2009, 4:06am

on 01/06/09 at 01:25:02, 99of9 wrote:
That's not quite true, Jeff is right that even incomplete iterations can be very useful.  Unfortunately Gnobot can only make use of complete iterations, hence the time control.  If it were able to use incomplete ones, then I agree with Jeff it would probably just be better to use the whole time.

If you already has spent time on an incomplete search, it is of course best to use that information. What I mean is that it is better to save the time until you have enough to do a complete iteration. I assume that 5 seconds used on a complete iteration is worth more than 5 seconds on an incomplete. If anyone has made a test to show otherwise, I would be pleased to know. Hopefully I would be able to test things like this myself, but currently my engine is not good enough to make any test worth the work.

In an incomplete iteration, we only know what the best move among the finished moves are. If this is the best move from previous iteration, we havent gained anything. If not, it is quite likely we havent got to the best move yet. This is why I think time spent on an incomplete search is worth less. Perhaps a hybrid depth would work: When it becomes clear that the current iteration cant be finished on time, reduce the search depth for the remaining moves.





Title: Re: Changing your mind and managing your time
Post by jdb on Jan 6th, 2009, 10:16am

Assumptions:

1) the first move searched on an iteration is the best move from the previous iteration.

2) there is a nominal time to search per move and a reserve time that is available for use in special situations.

In an incomplete iteration, in addition to knowing what the best move is for the iteration, we also know the score of that move. If the score has dropped a large amount, when compared to the score of the previous iteration, we already have enough information to know we are in big trouble and should use up some reserve time to try and find a better move.

Title: Re: Changing your mind and managing your time
Post by Fritzlein on Jan 6th, 2009, 4:07pm
I just re-read the link from my first post in this thread.  According to Newborn's hypothesis, the increase in playing strength that accrues from searching an additional ply is proportional the probability that the bot will change its mind at that ply.

To apply Newborn's hypothesis to time management, it would be convenient to know (A) how long the next iteration will take to complete, and (B) how likely is the bot to change its mind on the iteration.  Then B/A is the value of doing that iteration.

One application of this is the one I had in mind starting the thread, i.e. that perhaps B is very high for depths 4, 8, 12, etc., so those should be searched if possible.

A second application is jdb's insight that B is very high if searching the best move from the previous iteration produces a lower value.

A third application might be to have some measure of how sharp/volatile the position is, and suppose that B is higher in such positions, thus they should be searched more deeply.

Looking at it now, though, it strikes me that A, the time to search another iteration, seems to increase by a factor of two or three per step of depth.  Another iteration won't be worth it unless B is two or three times higher than normal, and two more iterations won't be worth it unless B is four or nine times higher than normal.

Looking at the bottom half of the ratio, namely the time invested, makes it seem like the best policy is essentially going to be using the same amount of time on every move.

Title: Re: Changing your mind and managing your time
Post by 99of9 on Jan 6th, 2009, 9:03pm

on 01/06/09 at 00:29:42, Oystein wrote:
Just a little observation:

Same first move, but different order of the steps. Normally this would not happen because of the transposition table.

Thanks to your little observation, I have found numerous bugs that I introduced when I originally wrote the move ordering code.  On this position I now get to 12 ply in 173 sec (45% saving), and on my overall test database I get a 26% time saving.  Pleasingly I now have to renew my test database, because it is now running too quickly to give tournament-relevant results!!

Thanks again for spotting it.



Arimaa Forum » Powered by YaBB 1 Gold - SP 1.3.1!
YaBB © 2000-2003. All Rights Reserved.