Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
Arimaa >> Bot Development >> Sharp
(Message started by: Swynndla on Apr 18th, 2015, 7:12pm)

Title: Sharp
Post by Swynndla on Apr 18th, 2015, 7:12pm
Hey lightvector ... in the chatroom you were describing some of the things that made Sharp so strong this year.  What you said was fascinating.  You said you'd answer more questions tomorrow.

In the chat you said:

Code:
by the way, if anyone is interested about the specific breakdown of the +400, here's the rough breakdown:
careful changes to the qsearch +40
improved win-in-2 patterns +20 (2014 already had some of these, but 2015's are a lot better)
relevant-tactics movegen +80
speedup and optimization +80
piece alignment eval +80
imbalance eval +30
changes to hostage eval +30
anti-swindle eval +20
measurement noise and other improvements +20


I'm wondering if you can describe some of these things further, eg:
What is "piece alignment eval", and what is "imbalance eval"?

Also how did you change hostage eval ... to be worth more or less etc?

BTW - 400+ rating points in self-play is amazing!

Some other interesting things you said in the chatroom:


Code:
lightvector Some things: special move generation for "relevant tactics" - handcrafted code to rapidly generate about 95% of "good enough moves" while only generating about 2% of the legal moves



Code:
lightvector another thing: goal-in-2 patterns - these are used to extend the search

lightvector 3 ply (what people usually mean when they say forced win in 2)

lightvector the goal-in-2 patterns are *not* directly trusted, but are verified with extensions, so it's okay for the patterns to have false positives and such, so it's easy to cover wide swaths of patterns by being a little too inclusive



Code:
lightvector the single most important term to add was piece alignment (mostly absent in 2014)



Code:
lightvector also, I dealt with the "reckless play while ahead" detail by making it so that the opponent's goal threats count for more in the eval when you're ahead

Title: Re: Sharp
Post by lightvector on Apr 19th, 2015, 6:20pm
Thanks for the questions!

* Piece alignment eval is the part of the eval that adds bonuses for your camels being near the opponent's horses, your horses being near their dogs, etc. Sharp2014 had a fairly poor version of this - it mostly only evaluated the "threat" to a piece independently of the strength of the piece doing the threatening.
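
As a very rough sketch of what such a term can look like (the piece strengths, distance cutoff, and bonus sizes below are made up for illustration, not Sharp's actual values):

Code:
# Illustrative piece-alignment term: reward having your strong pieces near
# weaker enemy pieces they dominate, scaled by the strength gap and proximity.
STRENGTH = {'E': 6, 'M': 5, 'H': 4, 'D': 3, 'C': 2, 'R': 1}

def chebyshev(a, b):
    """Distance in "king moves" between two squares given as (file, rank)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def alignment_bonus(my_pieces, their_pieces):
    """Pieces are (letter, (file, rank)). Returns a bonus in arbitrary eval
    units for each of my pieces sitting close to a weaker enemy piece."""
    total = 0
    for mp, msq in my_pieces:
        for tp, tsq in their_pieces:
            gap = STRENGTH[mp] - STRENGTH[tp]
            if gap <= 0 or tp == 'R':
                continue  # only strength-on-weaker-piece alignments count here
            d = chebyshev(msq, tsq)
            if d <= 3:
                total += gap * (4 - d) * 10  # closer + bigger gap = bigger bonus
    return total

# A camel two squares away from an enemy horse:
print(alignment_bonus([('M', (2, 4))], [('H', (3, 6))]))  # 20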

* Imbalance eval is the part of the eval that penalizes having your pieces, particularly the strong ones, all bunched together. It's completely new in 2015. Its absence became blatantly obvious after piece alignment eval was added first, because the alignment eval would make Sharp try to switch its strong pieces from side to side far more frequently, often leaving the army imbalanced.
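
Again just as an illustrative sketch (the wing split and penalty size are invented, not Sharp's formulation):

Code:
# Illustrative imbalance term: penalize having all of your strong pieces
# committed to the same wing of the board.
STRONG = {'E': 6, 'M': 5, 'H': 4}

def imbalance_penalty(my_pieces):
    """my_pieces: list of (letter, (file, rank)) with files 0..7 (a..h).
    Penalty grows as the strong-piece weight gets lopsided between the
    west wing (files a-d) and the east wing (files e-h)."""
    west = sum(STRONG[p] for p, (f, _) in my_pieces if p in STRONG and f < 4)
    east = sum(STRONG[p] for p, (f, _) in my_pieces if p in STRONG and f >= 4)
    return -8 * abs(west - east)

# Elephant, camel, and one horse all on the west wing, one horse east:
print(imbalance_penalty([('E', (2, 3)), ('M', (1, 4)), ('H', (3, 5)), ('H', (6, 4))]))  # -88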

* Hostage eval was changed quite significantly - it's less fragile now (Eb3 ma2 is still considered a hostage, for example, whereas it's not in Sharp2014), and it now takes into account which pieces are buried, double hostages, etc. This year's games also suggest some easy tweaks worth testing to improve it further (a slightly lower M hostage value, and larger coefficients on the terms for the penalty/effect of buried pieces and on swarm advancement status).
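
To make the shape of that concrete (only the overall structure follows the description above; every constant and the exact meaning of the inputs are invented for illustration):

Code:
# Illustrative hostage term: a base value for the hostage adjusted by how many
# of the holder's own pieces are tied down (buried) and how far the holder's
# swarm on the other wing has advanced toward cashing the hostage in.
BASE_HOSTAGE_VALUE = {'M': 250, 'H': 150, 'D': 80, 'C': 60}  # millirabbits, made up

def hostage_score(hostage_piece, buried_count, swarm_advancement):
    """hostage_piece: letter of the held piece. buried_count: how many of the
    holder's pieces are buried maintaining the hostage. swarm_advancement:
    0.0-1.0 progress of the holder's swarm on the other wing."""
    value = BASE_HOSTAGE_VALUE.get(hostage_piece, 0)
    value -= 60 * buried_count               # burying your own pieces costs something
    value += int(120 * swarm_advancement)    # hostage pays off as the swarm advances
    return value

print(hostage_score('M', buried_count=1, swarm_advancement=0.5))  # 250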

* The goal-in-2 patterns are not new to 2015, but they're improved. SharpP1 is now capable of finding some goals in 3 due to these patterns and extensions! And almost all goals in 3 are found within seconds in practice. I think it should be feasible to do the same with patterns for common goals in 3, which would likewise be used for controlling extensions in the search.
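
The way patterns like these plug into the search is simply as a trigger for extensions, along the lines of the skeleton below (positions, move generation, eval, and the pattern matcher are all stand-ins passed in as parameters, and the extension budget is just there to stop false positives from extending forever):

Code:
# Skeleton of pattern-triggered extensions in a negamax search. Moves are
# treated as whole turns here for simplicity; the pattern matcher is allowed
# to be over-inclusive because it only buys extra depth, it is never trusted
# as the final answer.
def search(pos, depth, alpha, beta, ext_budget,
           evaluate, gen_moves, matches_goal_pattern):
    if depth <= 0:
        return evaluate(pos)
    best = -10**9
    for move in gen_moves(pos):
        child = pos.make(move)
        # Cheap "this looks like a goal in 2" check: if it fires, search one
        # ply deeper so real goals get verified and false alarms just cost time.
        ext = 1 if ext_budget > 0 and matches_goal_pattern(child) else 0
        score = -search(child, depth - 1 + ext, -beta, -alpha, ext_budget - ext,
                        evaluate, gen_moves, matches_goal_pattern)
        if score > best:
            best = score
        if score > alpha:
            alpha = score
        if alpha >= beta:
            break  # beta cutoff
    return best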

For comparison, I have found loss-in-X patterns to be relatively valueless. Loss in 1 can sometimes be done statically, but an unstoppable goal threat is always extremely fast to verify as unstoppable by a search with goal-relevance pruning, and loss in 2 would be impossible to make accurate enough to trust. Win-in-X patterns, on the other hand, don't need to be perfect to be useful: you just have them trigger an extension. Even in the cases where a pattern is wrong, the move it flags might still be good, and it's worth extending anyway because it might lead to other gains afterwards.

* Anti-swindle eval is done by converting the eval through a sigmoid function into a probability-of-win P. Additionally, the part of the eval that evaluates scary advanced rabbits for each player is mapped from a normal eval value to the odds of a "short-term goal" for each side, G and S. I then compute the overall probability of win (in this case for Gold) as: (G*1 + 1*P + S*0) / (G + 1 + S). That is, you have odds G of winning right away, odds S of losing right away, and odds 1 of the game lasting a long time, in which case your normal eval (e.g. material advantage) gives you a probability P of winning. I then apply an inverse sigmoid to convert it back to the normal form of the eval in millirabbits.
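
In code, the conversion looks roughly like this (the sigmoid scale is an arbitrary made-up constant, and G and S are taken here as already-computed odds; only the combining formula itself comes from the description above):

Code:
import math

SCALE = 600.0  # millirabbits per logistic unit -- illustrative only

def to_prob(eval_mr):
    """Normal eval (millirabbits, Gold's view) -> probability of win P."""
    return 1.0 / (1.0 + math.exp(-eval_mr / SCALE))

def to_eval(prob):
    """Inverse sigmoid: probability of win -> eval in millirabbits."""
    return SCALE * math.log(prob / (1.0 - prob))

def anti_swindle_eval(base_eval_mr, G, S):
    """G = odds of Gold goaling right away, S = odds of Silver goaling right
    away, both derived elsewhere from the advanced-rabbit parts of the eval."""
    P = to_prob(base_eval_mr)
    win_prob = (G * 1.0 + 1.0 * P + S * 0.0) / (G + 1.0 + S)
    return to_eval(win_prob)

# With a big material lead (P ~ 0.98), trimming the opponent's short-term goal
# odds S is worth far more than squeezing out a little extra material:
print(anti_swindle_eval(2400, G=0.05, S=0.30))  # careless: ~706
print(anti_swindle_eval(2400, G=0.05, S=0.05))  # careful:  ~1632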

The result is that if P is close to 1 (such as when you are massively up on material), you care a lot about minimizing S, because the dominant contribution is the odds of losing right away. You care less about maximizing P further, since P is already very close to 1. And on the flip side, if P is very small (such as when you are down on material), you care about maximizing G. You care less about increasing P, because a few more material captures still leave P close to 0.

What's cool is that this also behaves reasonably in goal races once search is applied. In the event that making goal threats of your own is the best way to stop the opponent from having time to make progress, a bot with this sort of eval and an accurate search will still pursue goal attacks when ahead.


