Arimaa Forum - Print Page


    
      
        Arimaa Forum
        (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
      

        Arimaa >> Bot Development >> Sharp is Parallel
        
(Message started by: lightvector on Jun 15^th, 2011, 12:11am)

Title: Sharp is Parallel
Post by lightvector on Jun 15^th, 2011, 12:11am

Sharp is parallel now!

I've spend some time on-and-off last couple of weeks fixing bugs, speeding up the search, and rewriting everything to support parallelization.

I'm reasonably sure the current version is free of any major bugs. Despite the large portions of the main search more or less being rewritten and all the data structures being changed, it precisely reproduces the behavior of the old version if run with 1 thread. Exact matching node counts, hash cuts, beta cuts, evaluations, etc.

Also I fixed several bugs and optimized a little, so that even running with only 1 thread, it's 1.3x faster than the old version.

Using 3 threads currently only gives a 2x speedup versus 1 thread. It's probably some combination of the current synchronization protocol being much too restrictive and the policy for choosing when to parallelize being much too simple and causing wasted work during beta cutoffs. But I decided to keep it simple and get it working first. It should be easy to tweak these things now.

After this is done, it will be time to revisit move pruning, and then on to some new experimental ideas for strategic evaluation. After I fix some of the holes in the current eval, I'll see if I can put sharp up to play.

Yay! :)

Title: Re: Sharp is Parallel
Post by Fritzlein on Jun 15^th, 2011, 1:02am

Congrats on your continued progress! It's a bit scary to think that you are in a position to make changes that are guaranteed to result in extra rating points. I will be in the line (probably a long one) waiting to play sharp when you are ready to put it back on line.

Title: Re: Sharp is Parallel
Post by Hippo on Jun 15^th, 2011, 1:44am

Yep, I am more and more scared ... you improve it that fast.

Title: Re: Sharp is Parallel
Post by Swynndla on Jun 15^th, 2011, 3:23am

"some time on-and-off last couple of weeks" and you did all that? - well done!

Fritz - is it too late to take back your $1000 pledge? ;)

Title: Re: Sharp is Parallel
Post by rbarreira on Jun 15^th, 2011, 4:45am

Nice!

How did you choose to do the splitting of work among threads? At the root or inside the tree?

Title: Re: Sharp is Parallel
Post by Nazgand on Jun 15^th, 2011, 8:02am

Maybe you should get one of these (http://arstechnica.com/business/news/2009/12/intel-demos-48-core-cloud-datacenter-on-a-chip.ars). Intel is giving them out free to researchers. It would no doubt make a formidable opponent.

Also, you said a 2x speedup with 3 threads? Is it because you have only 2 cores?

Title: Re: Sharp is Parallel
Post by lightvector on Jun 15^th, 2011, 11:35am

Fritzlein: Thanks!

Swynndla: Well, as a proportion of the time, it wasn't overwhelming large. But I'm home this summer, so I have a lot of free time to begin with.

Nazgand: I have 4 cores, so there's definitely some overhead that needs to be reduced. Also, the current implementation does depend very heavily on shared memory, so I'm not sure what I'd do with the 48 core cluster =).

rbarreira:

It's sort of a DTS (dynamic tree splitting) algorithm. I changed the search from recursive to iterative, so that in terms of the call stack, you're never more than one level deep. This allows the threads freedom to hop around anywhere they like in the search tree, because they're not restricted by their stack to return up the way they came.

Whenever a thread wants, it can "publicize" a node by adding it to a global list, and then any threads that are looking for work can join in. When a thread runs out of work, it can consult the global list to get a new node to work on, which could be in an entirely different spot in the search tree.

Probably the gap between nominal computing power and actual speedup, 1.6x with 2 threads, 2x with 3 threads, comes from one or both of the following (not sure which yet):
1. Most of the tree operations are synchronized under under a single global lock. I had it this way for simplicity to get it working first. I'll be changing this now to see if I get an improvement.
2. The choice of when to parallelize isn't optimal. Right now, I allow splitting whenever the Young-Brothers-Wait criterion is satisfied. But with no regards to depth (splitting at deeper nodes gets less work done per amount of overhead for splitting itself), or the likelihood of things like beta cutoffs. Even after the first move is searched, all else equal, you still want to prefer splitting at likely alpha nodes, rather than PV nodes (where the alpha bound might improve, causing your earlier work to be less efficient), or beta nodes (where the other thread might find a cutoff and cause your work to be entirely wasted). But I have complete freedom to change the splitting policy to whatever I want now, so this should be easy to improve.