Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
Arimaa >> Bot Development >> Usage of 2, 4, 8 cores?
(Message started by: arimaa_master on Mar 12th, 2008, 6:19am)

Title: Usage of 2, 4, 8 cores?
Post by arimaa_master on Mar 12th, 2008, 6:19am
I wonder If the bot_bomb (or others) is capable of using more than one core? I am asking coz I know that it is known issue in computer chess where engines usually come out in two different versions.

The question is in place especially due to running computer championship on dual core (intel core 2 duo).

I suspect that without optimalization for 2+ cores usage every bot is using only one core (or two cores at 50 percent etc.)
Thus it seems useless to have dual core (or quad next year) for this tournament coz it will not gain any more from it.

But I understand that the usefullness is high afterwards (coz after the replacement of main server - it will for sure make full use of all cores).


Title: Re: Usage of 2, 4, 8 cores?
Post by Janzert on Mar 12th, 2008, 11:06am
As far as I know none of the current bots use any form of parallel search.

Janzert

Title: Re: Usage of 2, 4, 8 cores?
Post by arimaa_master on Mar 12th, 2008, 3:06pm
Thx for the Answer, that Is exactly what I thought.

So parallel search would help very much (if it gains 1.7x speedup, so
it will be 2.89x faster at quad).

I guess the same situation is in 32 bit versus 64 bit.

Isn´t time to use some sort of bitbases to significantly improve bot's play?




Title: Re: Usage of 2, 4, 8 cores?
Post by nbarriga on Mar 12th, 2008, 3:43pm

on 03/12/08 at 15:06:37, arimaa_master wrote:
Thx for the Answer, that Is exactly what I thought.

So parallel search would help very much (if it gains 1.7x speedup, so
it will be 2.89x faster at quad).

Due to the large branching factor I don't think the added complexity of a multithreaded bot is worth it. It would be the same as having a longer thinking time, and some postal games by bots have shown(If I remember correctly) that the extra time doesn't help very much.

Maybe an obsenely high number of processor(cluster?) could make a difference.


on 03/12/08 at 15:06:37, arimaa_master wrote:
I guess the same situation is in 32 bit versus 64 bit.

I tested this a while ago, and yes, it is a totally free spees boost. The results and discussions are in: http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=devTalk;action=display;num=1129496970;start=15

Title: Re: Usage of 2, 4, 8 cores?
Post by arimaa_master on Mar 13th, 2008, 5:33am

on 03/12/08 at 15:43:54, nbarriga wrote:
Due to the large branching factor I don't think the added complexity of a multithreaded bot is worth it. It would be the same as having a longer thinking time, and some postal games by bots have shown(If I remember correctly) that the extra time doesn't help very much.

Maybe an obsenely high number of processor(cluster?) could make a difference.

I tested this a while ago, and yes, it is a totally free spees boost. The results and discussions are in: http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=devTalk;action=display;num=1129496970;start=15



Yeah it will be useful then to put paralell bot in some sort of supercomputer and see how much improvement in real play it will show.


And thx for that link to 32bit x 64bit. I didn´t know about it (mainly because this thread started before I joined to the arimaa community :)).

Title: Re: Usage of 2, 4, 8 cores?
Post by 99of9 on Sep 21st, 2008, 9:36pm
I don't know why I didn't notice this thread when it was active, but I asked the same question of Fritz a few weeks ago.  He was surprised that Gnobot does not utilize multiple cores, and was pretty certain that bomb does.  Jeff also mentioned that clueless does.

The reason I raised the topic was that like arimaa_master, I think it is going to be increasingly important to the bot championship (and eventually the challenge).

One afternoon I tried to parallelize Gnobot.  Unfortunately I have much to learn, because while it utilized both cores, it actually slowed down.  I was going parallel at the leaf nodes (in the evaluation function), but it seems I need to do it higher up the tree.

Does anyone know an easily implemented parallel search algorithm?  Even better, would anyone care to parallelize Don Dailey's sample bot?

Title: Re: Usage of 2, 4, 8 cores?
Post by Janzert on Sep 21st, 2008, 9:52pm
This is something I'm wanting to get into opfor as well. Last week I took a quick look around the internet for information. While I didn't come up with anything really great the chess programming wiki (http://chessprogramming.wikispaces.com/Parallel+Search) seems to have a fairly good overview of current common techniques at least.

Janzert

Title: Re: Usage of 2, 4, 8 cores?
Post by Fritzlein on Sep 22nd, 2008, 8:20am
Thanks for the link, Janzert.  I'm wondering whether sharing a hash table, which can be a huge issue is some applications, is a non-issue here.

Now that I think about it, I don't know how folks deal with running out of hash table space in the first place.  Obviously the number of nodes searched is going to exceed the number of hash table slots available at some point.  My naive thought was that, since we can't save everything anyway, it doesn't make sense to probe or chain: when there is a collision one should simply overwrite the older information.  Is that what people do?

Title: Re: Usage of 2, 4, 8 cores?
Post by Janzert on Sep 22nd, 2008, 10:32am
Yes, the number of table entries is almost always quite a bit smaller than the number of nodes actually searched. In order to verify that the information in an entry actually corresponds to the given position being looked up a 64 bit hash is usually used. When storing a position that collides with another position's entry there are a number of different strategies to decide which entry to keep. The most common strategies are always replace, replace if the new entry has been searched to a greater depth or some combination of the two (two separate tables one using each of the above is generally considered one of the best methods).

As far as the hash table use in parallel search in particular in an SMP shared memory computing model (as opposed to say a cluster where the memory access between nodes is orders of magnitude slower then the memory on the node) typically one shared main hash table is used. How access to the table is shared though varies.

The slowest but safest approach is to have a processor lock an entry before reading or writing to it. This prevents one processor from reading data that is in the midst of being written by another processor. But it can be very slow to get a lock even when not needing to wait for another processor to release it.

Another approach is to use "lockless hashing (http://www.cis.uab.edu/hyatt/hashing.html)" developed by Robert Hyatt. This takes advantage of certain memory writes and reads being atomic operations in order to catch if the data read is corrupt.

The approach that seems to be gaining the most favor currently though, is to simply almost ignore the problem. Basically just write and read the entries as if there was no parallel access occurring. The reason for the "almost" qualifier though is that before actually using certain parts of the entry it has to be verified that it is actually valid for the current position. Normally this just consists of checking that the stored bestmove is in fact a legal move. The primary downside is that occasionally an incorrect score will be returned from a hash hit, this turns out to not be a problem in practice though. Hyatt and Cozzie did a study (http://www.cis.uab.edu/hyatt/collisions.html) showing that getting incorrect data even as much as 1 in every 10000 hash probes is unlikely to cause a problem.

Janzert

Title: Re: Usage of 2, 4, 8 cores?
Post by Fritzlein on Sep 22nd, 2008, 7:41pm
Thanks for the clear explanation, Janzert.  You are generous and patient.

It finally sunk in with me that there are two types of collisions, the rare collisions when two different positions have the same key, and the common collisions when two different keys want to occupy the same space in memory.  The common one is no problem, if you can detect a corrupt read, or do hashing with no locks.  (Actually, I had already heard of non-blocking hash tables (http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html), although I didn't quite get the point.)

It's clever to detect the collision of identical keys via a legal move check, but for the ones that get through, I'm surprised that using the wrong information isn't more catastrophic.

Title: Re: Usage of 2, 4, 8 cores?
Post by Janzert on Sep 22nd, 2008, 11:23pm
Hmm, actually I wasn't completely clear. There are really 3 types of collisions in what I mentioned.

First up is the situation where two different positions hash to the same 64 bit value. This type of collision is of course very rare, barring bugs or improper hash method. In fact they are rare enough that trying to deal with or detect them is usually simply ignored. The next collision type is when two different positions want to occupy the same slot (or entry) in the hash table. Since the space available in the hash table is usually smaller than the number of positions examined in a search these collisions are a common occurance and one of the methods I mentioned in the first paragraph of my last post are used to deal with them. These first two collision types are inherent in normal hash table operation (whether used in parallel or not).

The third collision type doesn't really have anything to do with hash tables in particular and everything to do with parallel use of the data. Since in general there is no guarantee that another processor won't overwrite the data you are trying to read (or write) while you are in the middle of doing it you have to either prevent concurrent access (locks), detect it (lockless hashing) or be able to deal with any corrupt data you get.

And yes it is pretty amazing how resilient alpha-beta search is when simply given the corrupt data. The results given in the Hyatt, Cozzie paper are certainly something I would never have expected.

Janzert

p.s. I should qualify that all of my comments here are meant to apply to hash tables as used by in game search programs not the general computer science hash table (or hash map) (http://en.wikipedia.org/wiki/Hash_table) data structure. While they have much the same basic operation there are significant differences. Not the least of which is they are generally meant to expand and hold whatever data is put in them, not 'arbitrarily' decide which pieces to throw away that won't fit. ;)

Title: Re: Usage of 2, 4, 8 cores?
Post by Fritzlein on Sep 23rd, 2008, 7:05am

on 09/22/08 at 23:23:43, Janzert wrote:
The third collision type doesn't really have anything to do with hash tables in particular and everything to do with parallel use of the data. Since in general there is no guarantee that another processor won't overwrite the data you are trying to read (or write) while you are in the middle of doing it you have to either prevent concurrent access (locks), detect it (lockless hashing) or be able to deal with any corrupt data you get.

Maybe I am not understanding, but this third type of collision seems like a subset of two different keys wanting the same space in memory.  If instead a read thread and a write thread have the same key, the write thread is presumably writing the same data that the read thread is trying to read, so it doesn't matter whether the reader or writer goes first.

Now if the keys are different, and the reading thread has "verified" that the memory space has the correct key, but reads a wrong value for that key because a different thread has over-written the value belonging to a different key, this behaves like a key collision, right?  The two different values didn't actually have the same key, but to the reader it appears they did.  So dealing with it in the same way as a key collision makes sense, no?

Perhaps I don't get what is happening at a low level, and what corrupt data means in this context.

Thanks again for explaining.

Title: Re: Usage of 2, 4, 8 cores?
Post by Janzert on Sep 23rd, 2008, 3:49pm
Hmm, I hadn't thought of it in that way before but you are correct the effects of key collisions and access collisions are very similar. Thinking about it for a bit I think the results from access collisions are a superset of the results from key collisions. The difference being that the result from a key collision will always be a valid entry just data for the wrong position. Results from an access collision can be essentially any random combination of words from two or more separate position entries (although the most likely occurrence is for the first part to be one entry and the last another with the split at some random word). So yes looking at the effects of both are going to be pretty much the same in any one instance. The differences come when looking at what situations they occur in and how to prevent them from happening.

Key collisions can't actually be prevented but they can be made to occur as infrequently as desired (at the cost of larger hash key size). Access collisions can of course be prevented (locks) or detected (lockless hashing) if desired.

A key collision will occur between two essentially random positions but will always occur for those two positions. An access collision will occur at basically random times between random positions. It's because of this that I think key collisions would cause a larger problem in a search than access collisions (i.e. consistently getting the wrong answer for one specific position out of a thousand is probably worse than getting the wrong answer randomly one time out of a thousand).

Janzert

Title: Re: Usage of 2, 4, 8 cores?
Post by 99of9 on Sep 23rd, 2008, 4:10pm

on 09/23/08 at 07:05:08, Fritzlein wrote:
Maybe I am not understanding, but this third type of collision seems like a subset of two different keys wanting the same space in memory.  If instead a read thread and a write thread have the same key, the write thread is presumably writing the same data that the read thread is trying to read, so it doesn't matter whether the reader or writer goes first.

I don't really understand Janzert's answer, so I'll have a quick shot at this comment.  When the key is the same, it's not necessarily writing the same data that the read thread is trying to read.  For example, you may have searched the same position to a different depth, and want to record the new deeper evaluation.

Title: Re: Usage of 2, 4, 8 cores?
Post by Janzert on Sep 23rd, 2008, 5:12pm
Oops thanks 99of9, I forgot to explain that part explicitly. Maybe expanding your example out step by step will make the it clearer what exactly can happen.

Say we have a program running with multiple threads and a shared hash table with no locking. Amazingly the computer it's running on uses regular english alpha-numerics as its internal storage and has a word size of a single letter. :) The hash table currently has an entry for position A that looks like this:


Code:
Hash key |depth|score|best step
positionA|00007|+0120|Ca5n


Thread 1 is about to search position A to a depth of 8 and wants to check the hash table to make sure this hasn't already been done or barring that if there was any previous search to a lesser depth what the best step to try first is.

Thread 2 has just finished a search on position A to a depth of 8 and is ready to write the results back to the hash table.

As luck would have it Thread 2 starts writing first, the entry it is writing looks like this:


Code:
Hash key |depth|score|best step
positionA|00008|-0010|Ca5s


Unfortunately after having written 2/3rds of the entry Thread 1 reads the entire entry. The entry Thread 1 sees is this:


Code:
Hash key |depth|score|best step
positionA|00008|-0120|Ca5n


Oops, Thread 1 will now not search position A but simply return that its score is -120, a score that it has never actually had.

Hopefully that helps clear up some of the mechanics of what can happen.

Janzert

Title: Re: Usage of 2, 4, 8 cores?
Post by Fritzlein on Sep 23rd, 2008, 6:13pm
Thanks for spelling it out.  The example of one thread reading what another has partially written is illuminating, and incidentally shows how the legal move test wouldn't help.

Title: Re: Usage of 2, 4, 8 cores?
Post by Janzert on Sep 23rd, 2008, 9:37pm
Right, the illegal move check doesn't guard against invalid data. It's just the part adds the almost to the "almost ignore the problem" method. ;) It just guards against using data that is likely to have a catastrophic effect on the program's operation.

Janzert

Title: Re: Usage of 2, 4, 8 cores?
Post by nbarriga on Sep 24th, 2008, 11:36am

on 09/21/08 at 21:36:03, 99of9 wrote:
One afternoon I tried to parallelize Gnobot.  Unfortunately I have much to learn, because while it utilized both cores, it actually slowed down.  I was going parallel at the leaf nodes (in the evaluation function), but it seems I need to do it higher up the tree.

Does anyone know an easily implemented parallel search algorithm?  Even better, would anyone care to parallelize Don Dailey's sample bot?


I think the easiest way to do it(and i think it would work pretty well) is just do a normal 1 step search first, and then assign 1/N of the generated new positions to each of the N processors. Basically having an independent alpha-beta search for each processor. I would try first using independent hash tables also, for simplicity.

Title: Re: Usage of 2, 4, 8 cores?
Post by Janzert on Sep 24th, 2008, 3:10pm
Young Brothers Wait should be about as easy to implement and is suppose to have quite a bit better performance.

Janzert

Title: Re: Usage of 2, 4, 8 cores?
Post by fotland on Dec 14th, 2008, 11:29pm
Bomb only uses one core.  When I wrote it I only ahd a single core machine, and parallel alpha-beta is more complex.

David


on 09/21/08 at 21:36:03, 99of9 wrote:
I don't know why I didn't notice this thread when it was active, but I asked the same question of Fritz a few weeks ago.  He was surprised that Gnobot does not utilize multiple cores, and was pretty certain that bomb does.  Jeff also mentioned that clueless does.

The reason I raised the topic was that like arimaa_master, I think it is going to be increasingly important to the bot championship (and eventually the challenge).

One afternoon I tried to parallelize Gnobot.  Unfortunately I have much to learn, because while it utilized both cores, it actually slowed down.  I was going parallel at the leaf nodes (in the evaluation function), but it seems I need to do it higher up the tree.

Does anyone know an easily implemented parallel search algorithm?  Even better, would anyone care to parallelize Don Dailey's sample bot?


Title: Re: Usage of 2, 4, 8 cores?
Post by arimaa_master on Dec 15th, 2008, 2:48am

on 12/14/08 at 23:29:43, fotland wrote:
Bomb only uses one core.  When I wrote it I only ahd a single core machine, and parallel alpha-beta is more complex.

David


Wow, so clueless is actually the ONLY ONE using multiple cores? If so - It will have the more and more important hardware advantage over his competition (maybe this year if quad will become in 1000 bucks horizon).

Title: Re: Usage of 2, 4, 8 cores?
Post by 99of9 on Dec 15th, 2008, 5:33am
As of today, Gnobot is parallel!  So far it's only giving me about a 20% gain (on a dual core), and I don't have any collision protection, but it's a start.

Title: Re: Usage of 2, 4, 8 cores?
Post by Fritzlein on Dec 15th, 2008, 6:47am
Very cool, 99of9.  What speedup do you expect on four cores?  I guess you will have a chance to test once you get an account on the tournament machines.  (assuming quad core comes in the price range this year)

Fotland, thanks for clarifying that Bomb is not parallel.  I must have dreamed that up myself.  I apologize for misleading the other developers about that.

Title: Re: Usage of 2, 4, 8 cores?
Post by arimaa_master on Dec 15th, 2008, 7:48am

on 12/15/08 at 05:33:44, 99of9 wrote:
As of today, Gnobot is parallel!  So far it's only giving me about a 20% gain (on a dual core), and I don't have any collision protection, but it's a start.


Congrats Toby!

Title: Re: Usage of 2, 4, 8 cores?
Post by aaaa on Dec 15th, 2008, 8:13am

on 12/15/08 at 02:48:04, arimaa_master wrote:
Wow, so clueless is actually the ONLY ONE using multiple cores?

Actually, if I am remembering this correctly, it isn't Clueless that is parallel, but Jeff's endgame database generator, so we just have our first now.

Title: Re: Usage of 2, 4, 8 cores?
Post by Janzert on Dec 15th, 2008, 12:43pm
Right, I think I was the one misleading people about clueless being parallel. :-/

Janzert

Title: Re: Usage of 2, 4, 8 cores?
Post by 99of9 on Dec 15th, 2008, 2:07pm

on 12/15/08 at 06:47:55, Fritzlein wrote:
What speedup do you expect on four cores?  I guess you will have a chance to test once you get an account on the tournament machines.  (assuming quad core comes in the price range this year)

No idea sorry Fritz.  There are improvements I need to make before I can do any serious performance testing.

Title: Re: Usage of 2, 4, 8 cores?
Post by Fritzlein on Dec 15th, 2008, 2:33pm

on 12/15/08 at 14:07:46, 99of9 wrote:
No idea sorry Fritz.  There are improvements I need to make before I can do any serious performance testing.

Congrats anyway on being the first to have this feature.  The only downside is that if you win the Computer Championship, everyone will think it is due to parallelism, rather than being due to your excellent evaluation function.

Title: Re: Usage of 2, 4, 8 cores?
Post by 99of9 on Dec 15th, 2008, 2:49pm
Maybe they'll be right!  However, I expect both opfor and clueless to be parallel by then - i wanted to start putting the pressure on so that all 3 will be humming louder than bomb.

Title: Re: Usage of 2, 4, 8 cores?
Post by omar on Dec 15th, 2008, 9:09pm
Toby,

Congratulations on paralleling GnoBot. I am anxious to see how it does against Bomb2005. What's the specs of the machine are you running it on locally?

This years system will most likely also be a Intel Core 2 Duo, but faster clock speed than last year.

Title: Re: Usage of 2, 4, 8 cores?
Post by 99of9 on Dec 15th, 2008, 9:48pm

on 12/15/08 at 21:09:33, omar wrote:
Congratulations on paralleling GnoBot. I am anxious to see how it does against Bomb2005. What's the specs of the machine are you running it on locally?

Thanks, I'm also keen to see how it stacks up. When I run games online, I only run on a single core, 32bit, laptop (1.7GHz).  So we won't see the effects of parallelization until the tourney.  I've been testing the parallelization on a 64 bit, dual core desktop, (2.4 GHz), but I can't use this one on the network for arimaa.  Obviously they're quite different to one another, even without parallel the desktop runs about twice as fast as the laptop.  With parallel, I can currently get an additional speedup of around 1.38 on average (now using Young Brothers Wait).


Quote:
This years system will most likely also be a Intel Core 2 Duo, but faster clock speed than last year.

That would be unfortunate, I had assumed a quad core was very likely (which is why I focussed on parallelizing this year).  In Australia all the suppliers I use are offering complete general purpose systems with Core 2 Quad for under USD $1000, some up to 2.5GHz.  In fact they used to be under AUD $1000, but our dollar has since taken a hammering!

Title: Re: Usage of 2, 4, 8 cores?
Post by Fritzlein on Dec 16th, 2008, 7:24am
Assuming a gain of 70 Elo points for a twofold speedup, and assuming the gain is log-linear, you should get 33 Elo points for a speedup factor of 1.38.  In a tight race everything helps!

I wish it were a quad core this year, though.  That would maximize the inactive developer penalty.  :P

Title: Re: Usage of 2, 4, 8 cores?
Post by jdb on Dec 16th, 2008, 10:14am
Quad core with monitor for $800 Canadian, that would be about $650 US.

http://www.futureshop.ca/catalog/proddetail.asp?logon=&langid=EN&sku_id=BDL10003194&catid=&test_cookie=1

In all honesty, a dual core machine is at least 2 year old technology. By next Christmas, dual core desktops will not be sold.


Title: Re: Usage of 2, 4, 8 cores?
Post by Janzert on Dec 16th, 2008, 11:11am
Plenty of single core machines are still sold and I'll be quite surprised if they aren't still around next year. My perspective is that while the top end is continually accelerating its rate of change the low end moves much slower. The extreme example being embedded processors where an 8bit processor isn't that uncommon even now.

Also I'm fairly sure hosting companies generally lag a bit behind in the cost for renting a dedicated server versus the comparable price to buy it.

Title: Re: Usage of 2, 4, 8 cores?
Post by omar on Dec 21st, 2008, 5:14pm
Thanks for that link JDB; I haven't bought a desktop system in a while so I had no idea they were getting that low in price. For the bot tournament and challenge match I rent two dedicated servers for about $100 USD per month each (one for 3 months and another for 1 month). As Janzert mentioned hosting companies charge more for renting a dedicated server versus the comparable price to buy it. I think part of the extra cost is because they also supply the network connection and uninterpretable power as well as warranty in case of hardware failures. But I am sure by next year the quad cores will fall in this range.

However, even if price were not a consideration, I would prefer to see the improvements in the bots coming from better evaluations rather than better use of hardware. GnoBot will certainly have an advantage this year over bots that are not multiprocessor ready. If it performs really well the other developers will not be able to ignore paralleling their bots. I hope after a year or two all the bots will be multiprocessor ready and the advantage one bot has over another will be due to a better evaluation or other algorithmic improvement.

Title: Re: Usage of 2, 4, 8 cores?
Post by 99of9 on Jan 6th, 2009, 10:36pm

on 12/15/08 at 21:48:40, 99of9 wrote:
With parallel, I can currently get an additional speedup of around 1.38 on average (now using Young Brothers Wait).

On my new test set (where most positions are tested at around blitz speed), I get a speedup due to parallelization of 1.77.

Title: Re: Usage of 2, 4, 8 cores?
Post by Fritzlein on Jan 7th, 2009, 8:00am
Sweet.

Title: Re: Usage of 2, 4, 8 cores?
Post by tize on Jan 7th, 2009, 8:02am
Congrats on the speedup 99of9! 1.77, you're at 85% efficiency now.

The new parallel gnobot will almost play blitz games as strong as the serial play fast games.

Title: Re: Usage of 2, 4, 8 cores?
Post by Paranoid on Jan 7th, 2009, 7:28pm
How about using Amazon EC2 for running the Computer Championship? $0.80/hour for the 8-core instance

http://aws.amazon.com/ec2/instance-types/

Title: Re: Usage of 2, 4, 8 cores?
Post by Janzert on Jan 7th, 2009, 8:21pm
I've considered using EC2 for engine testing, although I have yet to do so. But I'm not sure it would work out well for the CC. My understanding is that EC2 is using virtualized servers (Xen (http://www.xen.org/) specifically). So while the overall effect should be fairly small you could still be effected by other instances running on the same physical hardware you are, also there's no guarantee that two separate instances are actually on identical hardware.

Janzert

Title: Re: Usage of 2, 4, 8 cores?
Post by Paranoid on Jan 7th, 2009, 8:39pm
Since EC2 allocates whole CPUs to your instance (except for the smallest instance type) and running bots is mostly about CPU load, I am guessing the performance will be quite stable. (and if you use the largest instance, presumably you would get most, if not all the resources of the physical machine)

Amazon does specify that all the instances should be with a similar spec (2.5 compute units/core), and you can always check.

Of course, that will not be a problem if you run all bots on one instance.

Title: Re: Usage of 2, 4, 8 cores?
Post by 99of9 on Jan 15th, 2009, 7:14pm
I've just discovered a problem that has been plaguing gnobby since I parallelized (causing timeouts against woh and clueless for example).  I think it might be relevant to anyone else wanting to parallelize:

  • When multithreaded applications run, each thread gets given a limited stack size.
  • Recursive calls (like tree search calls in alpha beta) mean that the same function can be on the stack many times (the deeper the program is searching, the more it fills the stack).
  • If you have a few local arrays of information related to the (up to 128) branching moves from each position, in each search call, you can end up exceeding the default stack size for a thread (on my system it was 8Mb).
  • You can change the stack size limit in the terminal you run your bot from ("ulimit -s 100000" gives 100000 bytes).  However I can't see a way to do this within the software, or make this increase permanent.


I'm sure some of you knew all this, and will have a trivial solution for me.  Especially, how do you recommend I make sure that Gnobot has a sufficient stack size in the CC, since I won't be personally opening the terminal window?

Title: Re: Usage of 2, 4, 8 cores?
Post by Paranoid on Jan 17th, 2009, 2:31am
If you're using POSIX threads, try pthread_attr_setstacksize() (before you do pthread_create()). If you're not exceeding ulimit, it should work, otherwise you can increase it with setrlimit(), up to the hard limit.

http://linux.die.net/man/2/setrlimit
http://opengroup.org/onlinepubs/007908775/xsh/pthread_attr_setstacksize.html
http://opengroup.org/onlinepubs/007908775/xsh/pthread_create.html

Title: Re: Usage of 2, 4, 8 cores?
Post by 99of9 on Jan 17th, 2009, 5:10am
Janzert also pointed to posix. Unfortunately I'm using OMP, is there a nice method for that?

Title: Re: Usage of 2, 4, 8 cores?
Post by Paranoid on Jan 18th, 2009, 9:44pm
Seems like the only option is an environment variable:

http://gcc.gnu.org/onlinedocs/gcc-4.3.2/libgomp/GOMP_005fSTACKSIZE.html#GOMP_005fSTACKSIZE

Maybe it's best to use a wrapper script then as an entry point, instead of the main executable:


Code:
#!/bin/sh
ulimit -s 10000000
export GOMP_STACKSIZE=8000
exec ./main_executable "$@"


(the exec replaces the wrapper process with the main executable and the "$@" forwards command-line arguments)

Title: Re: Usage of 2, 4, 8 cores?
Post by 99of9 on Jan 18th, 2009, 9:51pm
Perfect, thanks Paranoid.  Janzert also recommended a wrapper, and it's nice to have one written out for me!

Title: Re: Usage of 2, 4, 8 cores?
Post by 99of9 on Feb 23rd, 2009, 5:05pm
I notice that badger is running parallel... you're an impressively fast programmer doublep!

Title: Re: Usage of 2, 4, 8 cores?
Post by 99of9 on Feb 24th, 2009, 9:34pm

on 01/06/09 at 22:36:40, 99of9 wrote:
On my new test set (where most positions are tested at around blitz speed), I get a speedup due to parallelization of 1.77.

And now that I have access to quad-core, this ends up giving a speedup of 2.52, not quite such a big jump, but handy nonetheless.

Title: Re: Usage of 2, 4, 8 cores?
Post by omar on Feb 25th, 2009, 12:01pm
The bot developers will definitely have to make their programs multi-threaded to take advantage of future hardware. It's getting difficult for the chip makers to squeeze more cycles per second and they are having to use multiple cores to continue providing more performance. By 2002 the chip makers had already reached the 3GHz mark, but have not yet hit 4Ghz.

Title: Re: Usage of 2, 4, 8 cores?
Post by doublep on Feb 28th, 2009, 2:31pm

on 02/23/09 at 17:05:44, 99of9 wrote:
I notice that badger is running parallel... you're an impressively fast programmer doublep!


Thanks :)  In my "defense", I kept parallellization in mind from the start (since the start was just around the New Year --- not so long ago).  So, when I finally got to it (February 22nd, according to VCS logs), I only had to add proper locking to the reader.  Everything else was already prepared, e.g. board was an object, not a global structure, etc.



Arimaa Forum » Powered by YaBB 1 Gold - SP 1.3.1!
YaBB © 2000-2003. All Rights Reserved.