Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)

(Message started by: RightfulChip on Oct 24th, 2019, 11:50pm)

Title: Arimaa Zero
Post by RightfulChip on Oct 24th, 2019, 11:50pm

Hey Arimaa fans,

I am excited to announce a bot that I have been working on for Arimaa. It is based on the AlphaZero approach by DeepMind. This means that the bot will not be taught in the traditional sense: no tactics or strategy are given, only the rules of the game. Instead, the bot will learn on its own, completely from scratch!

Personally, I can't wait to see how well this concept will hold up against the expert Arimaa players, as well as the masterful hand-crafted bots like sharp.

I set out a few months ago on a personal project to learn the Rust programming language, as well as to better understand the concepts behind AZ and the chess engine Lc0. Working with the comprehensive knowledge of the Lc0 community, I have created a self-learning program that can take any deterministic turn-based game and learn how to play it from scratch. So far, it has learned how to master Connect4 and has shed light on a deep and complex game called Quoridor. The results are quite remarkable.

Once I discovered Arimaa, I knew that it was next. The engine for Arimaa has been created and is learning as we speak. It is concurrently playing thousands of games against itself as it learns to unlock the secrets of Arimaa!

Although it is currently very weak, as it is just starting to learn, I will be working on making this engine available as a bot to play against. The bot will progress over time, so be sure to check back in every couple of days and challenge it as it learns with you. Before long it may be quite the formidable opponent.

I plan on making the repository public once I determine the proper license. In the meantime, I would love to call on the Arimaa community and collaborate with anyone interested. So please post here or email me for more information.

Title: Re: Arimaa Zero
Post by omar on Oct 30th, 2019, 8:28am

Cool. Looking forward to seeing your bot online. There are some other Arimaa players working on similar self-play learning bots.

Title: Re: Arimaa Zero
Post by RightfulChip on Nov 12th, 2019, 3:01pm

Hey all,

I am continuing down the road of producing a tabula rasa bot: an AI that will learn on its own, completely from scratch, with no game-specific features outside of the rules. This has highlighted some very difficult challenges.

Learning
Getting the bot to learn is a challenge in and of itself. Since we can't directly teach it anything, we must wait for it to discover ideas on its own, for example, that moving a rabbit to the other side of the board leads to a win. The good news is that the AI seems to know that getting a rabbit to the other side of the board is good. Unfortunately, it still hasn't learned about trapping pieces. The AI has a long way to go in discovering these ideas for itself.

Outputting Moves
One of the other challenges is how to get a neural network's outputs to represent moves. From what I understand, the existing approach for ML Arimaa bots is to generate all, or a subset, of the possible moves, then loop through each of these moves and have the model evaluate the post-move state. This would be ideal for Arimaa since it limits the number of permutations that need to be calculated. Unfortunately, this does not integrate well with the value and policy heads of AZ's two-headed network. If someone has an idea about how to do this, I would love the feedback.
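
For readers unfamiliar with that generate-and-evaluate loop, here is a minimal sketch in Rust. It is not taken from any existing Arimaa bot: the function name `best_by_afterstate` and the `apply` and `evaluate` callbacks are hypothetical stand-ins for a real move applier and a learned evaluation model.

```rust
/// Picks the index of the move whose resulting position scores highest.
/// `apply` and `evaluate` are hypothetical callbacks: a real bot would
/// plug in its own move application and its own evaluation model.
fn best_by_afterstate<S, M>(
    state: &S,
    moves: &[M],
    apply: impl Fn(&S, &M) -> S,
    evaluate: impl Fn(&S) -> f32,
) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, mv) in moves.iter().enumerate() {
        // Score the position reached after the move, not the move itself.
        let score = evaluate(&apply(state, mv));
        if best.map_or(true, |(_, b)| score > b) {
            best = Some((i, score));
        }
    }
    // `None` means there were no candidate moves to evaluate.
    best.map(|(i, _)| i)
}
```

Note that this loop only yields one score per candidate move. It produces no probability distribution over a fixed set of actions, which is what the AZ policy head expects.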

Instead, the approach taken will be to analyze each move as a series of steps, with each step moving a single piece. The downsides are that individual steps are disjointed from the moves they make up, and that steps can create a large number of transpositions. Let's hope that the network's policy is accurate enough to avoid this. The other issue is that now, instead of 1 action per move, up to 4 actions are evaluated per move. That's up to 4x longer game generation and training.
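
One way a step-based policy output could be laid out is sketched below. This is an assumption for illustration, not the encoding the bot actually uses: it gives every (square, direction) pair one policy slot, plus a single slot for passing the rest of the turn, and all names (`Dir`, `Action`, `encode`, `decode`, `POLICY_SIZE`) are made up here.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Dir { North, East, South, West }

#[derive(Clone, Copy, Debug, PartialEq)]
enum Action {
    /// Move the piece on `square` (0..64) one square in `dir`.
    Step { square: u8, dir: Dir },
    /// End the turn early.
    Pass,
}

const NUM_SQUARES: usize = 64;
const NUM_DIRS: usize = 4;
/// Assumed policy size: one slot per (square, direction) plus one for Pass.
const POLICY_SIZE: usize = NUM_SQUARES * NUM_DIRS + 1;

/// Maps an action to its policy index. Assumes `square < 64`.
fn encode(action: Action) -> usize {
    match action {
        Action::Step { square, dir } => square as usize * NUM_DIRS + dir as usize,
        Action::Pass => NUM_SQUARES * NUM_DIRS,
    }
}

/// Inverse of `encode`; returns `None` for an out-of-range index.
fn decode(index: usize) -> Option<Action> {
    if index >= POLICY_SIZE {
        return None;
    }
    if index == NUM_SQUARES * NUM_DIRS {
        return Some(Action::Pass);
    }
    let dir = match index % NUM_DIRS {
        0 => Dir::North,
        1 => Dir::East,
        2 => Dir::South,
        _ => Dir::West,
    };
    Some(Action::Step { square: (index / NUM_DIRS) as u8, dir })
}
```

Under this layout the policy head stays small (257 outputs), at the cost of the search having to chain several of these actions together to form one full move. Illegal steps would still need to be masked out by the rules engine.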

Alternating Turns
Since we are using the step-based approach, one subtle novelty that Arimaa introduces is that the player to move no longer alternates with every action. Instead, a player can take 2-4 actions per turn. This requires a small modification to the search tree. Instead of inverting the value at each depth, a value is stored for each player, and the value used is the one from the perspective of the player to move.
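
The backup change described above could look roughly like the following. This is a simplified sketch with hypothetical names (`Node`, `backup`, `child_value_for_parent`), not the engine's actual code: each node keeps a value sum per player, and the parent reads a child's value from the perspective of whoever is choosing at the parent.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Player { Gold, Silver }

impl Player {
    fn index(self) -> usize {
        match self { Player::Gold => 0, Player::Silver => 1 }
    }
}

#[derive(Debug)]
struct Node {
    /// Player who picks the next action at this node.
    to_move: Player,
    visits: u32,
    /// Accumulated value per player: [Gold, Silver].
    value_sum: [f32; 2],
}

impl Node {
    fn new(to_move: Player) -> Self {
        Node { to_move, visits: 0, value_sum: [0.0; 2] }
    }

    /// Mean value of this node from `player`'s perspective.
    fn q(&self, player: Player) -> f32 {
        if self.visits == 0 {
            0.0
        } else {
            self.value_sum[player.index()] / self.visits as f32
        }
    }
}

/// Adds the leaf's per-player values to every node on the path.
/// No sign flip per depth: the same player may act several times in a row.
fn backup(path: &mut [Node], leaf_values: [f32; 2]) {
    for node in path.iter_mut() {
        node.visits += 1;
        node.value_sum[0] += leaf_values[0];
        node.value_sum[1] += leaf_values[1];
    }
}

/// Value used during selection: the child's value as seen by the
/// player who is choosing at the parent.
fn child_value_for_parent(parent: &Node, child: &Node) -> f32 {
    child.q(parent.to_move)
}
```

With this layout, two consecutive Gold nodes both read the Gold value without any negation, which is exactly the case that breaks the usual "negate at every ply" backup.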

Setup Phase
One unique thing, which isn't present in Go, Shogi, or traditional Chess, is that each player chooses where to place their pieces. In the games listed, the starting position is the same for every game. This presents a unique challenge which isn't addressed by AZ.

Natural Progression
Another factor that challenges learning is that the natural progression of Arimaa takes some level of knowledge. For example, in a game like Connect4, you can drop pieces randomly, but every drop advances the game, up to a maximum of 42 moves. With chess, random play leads to an exchange of pieces.

For Arimaa, the AI so far is really good at blocking the end goal spaces, but really bad at trapping pieces or clearing a path. This leads to the generated games being very long and stale with no sense of progression. This is currently the biggest hurdle that the AI will have to overcome, if it ever can...

Game Length
For the reasons stated above, the self-play generated games run upwards of 4k actions! This is far longer than typical games of Go, chess, and shogi.

Zero
The Zero part in AlphaZero is the most important and fascinating. It states that the AI is

...trained solely by reinforcement learning from
games of self-play... without any additional domain
knowledge except the rules of the game, demonstrating that a general-purpose reinforcement
learning algorithm can achieve, tabula rasa, superhuman performance across many challenging
domains.

This is the challenge in reproducing AZ. We could bootstrap the AI with existing games to solve most, if not all, of the problems stated above; however, it would be a direct violation of the zero principle.

Bot
I added a bot called bot_rusty_zero_alpha. He is using the latest net (it's terribly bad). Feel free to look out for him and send him a challenge.

Title: Re: Arimaa Zero
Post by odin73 on Nov 15th, 2019, 11:55am

Hi. Nice bot! I played it a few times.

Since I don't know very much about this network and AI stuff, my simple question is: Can it learn from the opponent?

I always played it in a similar manner, doing some EH, ED, or DH attack and sending a rabbit to goal after the trap wing was depleted. So it could learn both: first, that losing a wing/trap is bad, but also that placing its own pieces around the trap is winning.

Title: Re: Arimaa Zero
Post by deep_blue on Nov 16th, 2019, 8:02am

on 11/15/19 at 11:55:01, odin73 wrote:

Since I don't know very much about this network and AI stuff, my simple question is: Can it learn from the opponent?

It does not. The idea of this zero approach is to not learn from anyone else, but just by yourself, so to speak. That is a lot slower since it starts out with basically random play, but the idea is that in the long run it is not held back by anything bad and/or biased learned from other players (and of course it does not need other players: you need many thousands of games to train, and no human could play that many against it ;) ).