Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
(Message started by: aaaa on Apr 30th, 2011, 5:49pm)

Title: Go starting to show cracks as an AI challenge?
Post by aaaa on Apr 30th, 2011, 5:49pm
Just recently, Crazy Stone (under the handle "bonobot") was able to maintain a whopping 5-dan rating on KGS for a while, running on a 24-core machine. Although this achievement should be qualified somewhat by the fact that the games it has been playing were at 15 seconds per move, it's still mighty impressive.

http://www.mail-archive.com/computer-go@dvandva.org/msg03018.html
http://gosensations.com/?id=2&server_id=1&new_id=1093

Title: Re: Go starting to show cracks as an AI challenge?
Post by omar on May 10th, 2011, 7:31pm
Wow, it sounds like Remi Coulom, working with his PhD student Aja, has hit upon some more breakthroughs that significantly improved the performance of Crazy Stone. Everyone seems to want to know what they are.

Title: Re: Go starting to show cracks as an AI challenge?
Post by Fritzlein on May 11th, 2011, 12:02am
I love the way strong Go players, including professionals, routinely put their reputations on the line in man vs. machine games, both formal and informal.  The attitude of the Go world puts the professional shogi association to shame.  Shogi may be only a few years from falling to computers, or not, but we can't get an accurate reading because professionals are forbidden from publicly playing against computers.  With Go, we can get a much clearer sense of the state of the art, and incidentally it is somewhere around one dan for tournament Go, for all the buzz bonobot's rating engendered.

Title: Re: Go starting to show cracks as an AI challenge?
Post by Janzert on May 11th, 2011, 11:33am
I think the computers are probably now getting very close to 2 dan, if they aren't already there. Besides the Crazy Stone rating on KGS, Pachi has had some nice wins against highly ranked players.

(links to sgf files)
H7 over 9p (http://files.gokgs.com/games/2011/4/14/nutngo1-pachi2.sgf)*
H7 over 9d (http://files.gokgs.com/games/2011/4/10/bigbadwolf-pachi2.sgf)
H6 over 8d (http://files.gokgs.com/games/2011/4/10/Tien-pachi2.sgf)
H6 over 7d (http://files.gokgs.com/games/2011/4/10/Cornel-pachi2.sgf)
H6 over 7d (http://files.gokgs.com/games/2011/4/10/ThunderGod-pachi2.sgf)

Also, along with Bonobot's recent 5d, ManyFaces (2d), pachi2 (3d), and Zen19 (4d) are all now above 1d on KGS.

On the other hand though, in the same event where Pachi won against the 9p it also lost an H6 game against a 5p, and most of the KGS rankings have been attained primarily through blitz time controls.

I also think John Tromp was either very prescient or got very lucky and picked just about the perfect time limit for his Go bet (http://dcook.org/gobet/).

Janzert

* Short report on event with the win against 9p (http://teytaud.over-blog.com/article-blind-go-random-go-13x13-go-rengo-73149193.html)

Title: Re: Go starting to show cracks as an AI challenge?
Post by Fritzlein on May 11th, 2011, 4:43pm

on 05/11/11 at 11:33:24, Janzert wrote:
On the other hand though, in the same event where Pachi won against the 9p it also lost an H6 game against a 5p, and most of the KGS rankings have been attained primarily through blitz time controls.

You make the case against yourself well.  The embarrassing computer losses don't make the headlines, but there are plenty of such losses to offset the surprising wins and the high server rankings at blitz speed.

An additional argument against the most notable machine victories is that the tradition of Go rankings is based on handicap stones, which exaggerates differences between beginners and conceals differences between experts.  At a high level a six-stone handicap is overwhelming, indicative of more than six ranks if ranks were calculated by winning percentages as Elo does.

Matches without handicap, like the Tromp challenge, seem more telling to me than handicap matches.  Handicaps in Go are similar to handicaps in Arimaa; they count for more near the top of the scale.  If two Arimaa beginners win 50-50 at camel handicap, then the weaker player might only be 100 Elo behind at even games.  In contrast, someone who is 50-50 against chessandgo given a camel handicap is probably 600 Elo behind chessandgo at even games.
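
To put rough numbers on that (this is just the standard Elo expectation formula, not anything measured; the 100 and 600 figures above are my guesses in the first place):

Code:
# Expected even-game score under the standard Elo logistic formula,
# E = 1 / (1 + 10^(-diff/400)).  The 100 and 600 Elo gaps are the
# illustrative guesses from the paragraph above, nothing measured.
def elo_expected_score(diff):
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

for diff in (100, 600):
    print(f"{diff} Elo ahead -> expected even-game score {elo_expected_score(diff):.2f}")
# 100 Elo ahead -> expected even-game score 0.64
# 600 Elo ahead -> expected even-game score 0.97

So the same "50-50 at a camel handicap" can hide either a modest gap or an enormous one, depending on where on the scale the two players sit.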


Quote:
I also think John Tromp was either very prescient or got very lucky and picked just about the perfect time limit for his Go bet (http://dcook.org/gobet/).

This particular even match was a blowout at 4-0.  Do you put the computer close to 2 dan because the games themselves were close despite the lopsided final score?  Or are you saying it was close temporally, because computers are advancing so fast that Tromp would lose in another couple of years despite the fact that presently top computers might only be 1 dan?  Or did Tromp "get lucky" because he was no better than the computer but won anyway just as one might flip four heads in a row on a fair coin?

I guess if I say "around shodan" and you say "around two dan" we aren't disagreeing that much.  Our difference in perception is probably less than the difference between European shodan, American shodan, Japanese shodan, Korean shodan, etc. :)

Title: Re: Go starting to show cracks as an AI challenge?
Post by Janzert on May 11th, 2011, 10:07pm

on 05/11/11 at 16:43:03, Fritzlein wrote:
You make the case against yourself well.  The embarrassing computer losses don't make the headlines, but there are plenty of such losses to offset the surprising wins and the high server rankings at blitz speed.


:)
Well, without the pessimistic evidence you'd have to say the computers are at least 3 if not 4 dan already. After all, there are 2 bots with a 4d or higher rating and 4 with 3d or higher (I forgot to put EricaBot at 3d in the list earlier). These ratings are also derived cumulatively from hundreds if not thousands of games, so it's not as if the bots are just using an unexpected trick to get these wins.


Quote:
An additional argument against the most notable machine victories is that the tradition of Go rankings is based on handicap stones, which exaggerates differences between beginners and conceals differences between experts.  At a high level a six-stone handicap is overwhelming, indicative of more than six ranks if ranks were calculated by winning percentages as Elo does.


Even worse, professional rank is a lifetime achievement ranking, not a measure of current ability. So unlike, say, a top chess player, a Go professional will never drop in rank. So yep, I'm in complete agreement that the Go ranking system is much coarser and less accurate than something like the chess Elo system (not that the chess system is without flaws either).


Quote:
Matches without handicap, like the Tromp challenge, seem more telling to me than handicap matches.  Handicaps in Go are similar to handicaps in Arimaa; they count for more near the top of the scale.  If two Arimaa beginners win 50-50 at camel handicap, then the weaker player might only be 100 Elo behind at even games.  In contrast, someone who is 50-50 against chessandgo given a camel handicap is probably 600 Elo behind chessandgo at even games.


Yep, even though Go does have a much better handicapping system than most games, even matches are certainly the best way to measure strength.


Quote:
This particular even match was a blowout at 4-0.  Do you put the computer close to 2 dan because the games themselves were close despite the lopsided final score?  Or are you saying it was close temporally, because computers are advancing so fast that Tromp would lose in another couple of years despite the fact that presently top computers might only be 1 dan?  Or did Tromp "get lucky" because he was no better than the computer but won anyway just as one might flip four heads in a row on a fair coin?


I meant that I think Tromp got the last year in which he had a better-than-even chance of winning the bet. Despite the overwhelming final score, during the games themselves the sentiment was that things were much closer. In at least one of the games the spectators were sure the computer had a completely won position but blundered it away. Since December ManyFaces has gained about half a stone in rank on KGS. It's also a bit unfortunate that Zen couldn't be used for the tournament, as it was (and is) pretty widely considered stronger than ManyFaces. In fact, since December it has won every monthly KGS bot tournament (although it didn't participate in the extra "slow game" tournament). Also of note regarding the pinnacle strength of current bots is that the bet restricted the hardware that could be used. Most (all?) of the top bots in competition can use clusters now and extract a nice little boost in strength from them.


Quote:
I guess if I say "around shodan" and you say "around two dan" we aren't disagreeing that much.  Our difference in perception is probably less than the difference between European shodan, American shodan, Japanese shodan, Korean shodan, etc. :)


Yeah, the fact that the actual strength meant by a given rank depends on where it is measured is certainly another problem in discussing Go ranks. My personal frame of reference is the KGS server, which seems to be where most of the computer Go activity takes place. According to Sensei's Library's rank comparisons (http://senseis.xmp.net/?RankWorldwideComparison), KGS 1-2d is somewhere between 2k in Korea and 3d in Japanese or AGA ranks. So given that I'm American, I'll now declare that computer Go is currently clearly at least 3 dan. :P

Janzert

Title: Re: Go starting to show cracks as an AI challenge?
Post by Fritzlein on May 12th, 2011, 2:18am

on 05/11/11 at 22:07:45, Janzert wrote:
Well, without the pessimistic evidence you'd have to say the computers are at least 3 if not 4 dan already.

You are eloquent on both sides. :)


Quote:
After all, there are 2 bots with a 4d or higher rating and 4 with 3d or higher (I forgot to put EricaBot at 3d in the list earlier). These ratings are also derived cumulatively from hundreds if not thousands of games, so it's not as if the bots are just using an unexpected trick to get these wins.

What is the effect of the time control?  I tend to guesstimate that Arimaa bots gain 150 Elo relative to humans when playing at 15s/move instead of 2m/move.

Also, my experience on both ICC and the Arimaa game room is that bots either have their ratings driven far lower than they should be by folks beating them repeatedly according to formula, or else have their ratings driven far higher than they should be by players who don't know how to win and who lose repeatedly to a bot that never gets tired and never overlooks simple tactics.  Does KGS suffer the same phenomenon of bots being significantly underrated or overrated depending on whether their recent opponents have been better or worse?

Finally, are the KGS games involving bots mostly played at handicap?  I read up on the rating system at one point, including the attempt they made to quantify the tradeoff between handicap stones and winning percentages.  My recollection is that they didn't give the higher-ranked player nearly enough winning chances at even games.  This would mean, for example, that if a 4dan and a 1dan were breaking even at a three-stone handicap, and then they switched to playing even games, the 4dan would win an even higher percentage of games than the server predicts, and thus gain rating from the even games.

Or, to put it another way, if the KGS bots are getting their 4dan ratings by winning 50% of the time against 1kyu human players at a handicap of four stones, then I will have to give those bot ratings a lot more respect than I have been.  That would be very impressive indeed.

Sharp2011Blitz attained a rating of 2342, but I'm wary of using that to say the Arimaa Challenge is just about to fall, and that's even with a rating system that is more reliable (less of a hack job) than Go rating systems tend to be.  But perhaps I'm underestimating the power of the KGS ratings.  I'm willing to be educated.  (And thanks for all the information so far!)


Quote:
I meant that I think Tromp got the last year in which he had a better-than-even chance of winning the bet. Despite the overwhelming final score, during the games themselves the sentiment was that things were much closer. In at least one of the games the spectators were sure the computer had a completely won position but blundered it away.

Interesting.  The fact that the games were dramatic indicates in itself that the players were close in skill.  On the other hand, blundering away a clearly won position cuts both ways; why should we think well of the computer for getting a lead and not think poorly of it for blowing that lead?  The sum of its strength and its weakness was still a loss.  ;)


Quote:
Since December ManyFaces has gained about half a stone in rank on KGS.

OK, so your comment about Tromp getting lucky does have to do with the current rate of progress in Go AI.  Are computers gaining a rank every year?  It used to be a much slower pace of progress, if I remember correctly.


Quote:
Most (all?) of the top bots in competition can use clusters now and extract a nice little boost in strength from them.

What is the current conventional wisdom on the value of hardware to Go AI's?  Is there a formula analogous to chess engines gaining 50-100 Elo per doubling of hardware?

Title: Re: Go starting to show cracks as an AI challenge?
Post by lightvector on May 12th, 2011, 12:13pm
I don't know how well KGS maps its underlying numeric model to handicap-stone differences or Go rank differences, and I wouldn't be surprised if it isn't that great. I also don't know about winning chances between players of different ranks (Sensei's Library had a page with stats on this, but I can't find it).

But my personal experience is that KGS is quite internally consistent above around 10k or so. If you play someone one rank stronger, you are moderately more likely to lose; two ranks stronger, very likely. If you take two players at different ranks, a fair game tends to occur at a handicap equal to the rank difference (after accounting for the fact that Go's handicapping system is half a stone off), plus or minus half a stone if the rank difference is small, and plus or minus a stone if it is large.

In terms of the numeric model itself, I know that KGS uses some sort of maximum-likelihood or Bayesian method (similar to Bayeselo), with decaying weights for older games. I generally trust such a model a little more than a typical Elo system, although it is certainly still open to ratings abuse and such.
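
If anyone is curious what "maximum likelihood with decaying weights" looks like in miniature, here is a throwaway sketch of the general idea (my own toy, not the actual KGS or Bayeselo code; the half-life and the small prior are made up):

Code:
import math
from collections import defaultdict

# Toy decayed-weight maximum-likelihood (Bradley-Terry) rating fit.
# Not the real KGS implementation: the 45-day half-life and the 0.01
# prior are invented just to make the sketch run.
HALF_LIFE_DAYS = 45.0

def fit_ratings(games, iterations=500):
    """games: list of (winner, loser, age_in_days).  Returns Elo-like numbers."""
    strength, wins, pair_games = {}, defaultdict(float), defaultdict(float)
    for winner, loser, age in games:
        w = 0.5 ** (age / HALF_LIFE_DAYS)          # older games count for less
        wins[winner] += w
        pair_games[tuple(sorted((winner, loser)))] += w
        strength.setdefault(winner, 1.0)
        strength.setdefault(loser, 1.0)

    for _ in range(iterations):
        # Standard minorization-maximization update for Bradley-Terry:
        # s_i <- (weighted wins of i) / sum_j (weighted games ij / (s_i + s_j))
        new = {}
        for p in strength:
            denom = 0.0
            for (a, b), w in pair_games.items():
                if p in (a, b):
                    opp = b if p == a else a
                    denom += w / (strength[p] + strength[opp])
            new[p] = (wins[p] + 0.01) / denom      # tiny prior keeps winless players finite
        # ratings are only defined up to an offset, so re-center each pass
        g = math.exp(sum(math.log(s) for s in new.values()) / len(new))
        strength = {p: s / g for p, s in new.items()}

    return {p: round(400.0 * math.log10(s)) for p, s in strength.items()}

# Two recent wins outweigh one old loss here because of the decay:
print(fit_ratings([("bot", "human", 2), ("human", "bot", 200), ("bot", "human", 5)]))

The real system no doubt has a proper prior, handicap handling, and a smarter decay curve, but the basic shape (weight the games by age, then maximize the likelihood) is the same general idea.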

Games against these bots typically come from a good mix of players (because no single player tends to play for hours on end against the same bot, and the bots are in demand, often remaining open for less than 30 seconds before the next person starts a game). Informally looking at the game records, the rank distribution of the opponents tends to center around the rank of the bot, with a very heavy tail on the weaker side and a very light tail on the stronger side.



on 05/12/11 at 02:18:31, Fritzlein wrote:
 Are computers gaining a rank every year?  It used to be a much slower pace of progress, if I remember correctly.


My memory is a bit vague, but a rank every year sounds approximately right. After the massive initial jump in strength following the discovery of MCTS in 2006-2007, I think I recall bots around 4k-1k in 2008 on high-end hardware. In 2009 or so, I think they got to something as high as 1d-3d on massive clusters, but maybe only 2k-1d at best otherwise. And currently, I am highly confident that the best bots are at least 2d on high-end hardware (e.g. 8-32 core machines), probably even at time controls somewhat slower than blitz. These are all KGS ranks.

Prior to MCTS, strength was static. Programs were too fragile, and any improvements in one area would simply worsen them in other areas. Any amount of additional running time did almost nothing to help because the algorithms didn't scale. Now with MCTS, research and improvements are continuing at a good pace, and the algorithm shows clear gain with more running time.

Title: Re: Go starting to show cracks as an AI challenge?
Post by Fritzlein on May 12th, 2011, 5:12pm
This conversation has piqued my interest.  I found the description of KGS ratings on Sensei's Library.  I applaud the fact that ranks are now converted into win percentages differently depending on whether the players are beginners or experts:

 rank    expected
 diff    win rate
  0.0     50%
  0.5     60%
  1.0     70%    (k=.85, for 30k-5k)
  1.5     78%
  2.0     85%
  2.5     89%

  0.0     50%
  0.5     66%
  1.0     79%
  1.5     88%    (k=1.30, for 2d+)
  2.0     93%
  2.5     96%

This is an improvement, although I expect the win percentages are still overstated for beginners and still understated for top players.  At the upper end, where bonobot was getting its ranking, understated win percentages mean that in games played with a handicap unequal to the rank difference, the favored player will get an inflated rating relative to what it would earn if all games were played at the appropriate handicap.
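
As a sanity check, those percentages appear to be nothing more than a logistic curve in the rank difference; at least, plugging the quoted k values into P = 1/(1 + e^(-k*d)) reproduces the table exactly (this is my reconstruction, and the real KGS formula may differ in its details):

Code:
import math

# Reconstruction of the table above, assuming the expected win rate is a
# logistic function of rank difference: P = 1 / (1 + exp(-k * diff)).
# The k values (0.85 for 30k-5k, 1.30 for 2d+) are the ones quoted in the
# table; the actual KGS formula may differ in detail.
def expected_win_rate(rank_diff, k):
    return 1.0 / (1.0 + math.exp(-k * rank_diff))

for k, label in ((0.85, "30k-5k"), (1.30, "2d+")):
    print(f"k={k} ({label})")
    for diff in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
        print(f"  diff {diff:.1f}: {expected_win_rate(diff, k):.0%}")
# Prints 50/60/70/78/85/89% and 50/66/79/88/93/96%, matching the table.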

I figured out how to see the list of bonobot's games used to attain the 5d rank.  The majority of games were played at an appropriate handicap, which makes me respect bonobot's rating more.  It was doing as I said, giving five stones to 1k's and holding its own.  Impressive.  However, of the significant number of games not played at the appropriate handicap, nearly all left bonobot at an advantage relative to its rank, which would tend to inflate its rating somewhat, as described above.  So the effect that I was afraid of is probably still there, albeit at a much reduced level compared to what I was talking about in my previous post.

I also see that the rating spanned about a hundred rated games against a variety of opponents; I didn't see evidence of a single weaker player inflating bonobot's rating by being unwilling to quit despite losing every game.  Any such effect would be collective, not individual.  I wonder if Go bots display the same general phenomenon that Arimaa bots do, namely that a bot loses more than expected against higher-rated players and wins more than expected against lower-rated players, as if AIs have a tighter performance distribution than humans do.

Also I saw a player "zen" ranked at 1d.  Surely this isn't the computer that is winning all the computer tournaments?

The remaining questions which I am not sure how to research myself are the relative effects of playing faster, and the value in ranks of doubled computing power.

Gradually I am becoming convinced that Go AI has advanced further than I thought.  How intriguing!

Title: Re: Go starting to show cracks as an AI challenge?
Post by Janzert on May 13th, 2011, 12:46am
Sorry, I've run out of the time this conversation deserves. But I'll try to throw in a few quick things anyway. :)


on 05/12/11 at 17:12:56, Fritzlein wrote:
Also I saw a player "zen" ranked at 1d.  Surely this isn't the computer that is winning all the computer tournaments?


Zen19 is the username you want (similarly it uses Zen13 and Zen9 for 13x13 and 9x9).


Quote:
The remaining questions which I am not sure how to research myself are the relative effects of playing faster, and the value in ranks of doubled computing power.


The best study I've seen on the effect of doubling time for MCTS Go bots was done by Don Dailey a few years ago (it would be interesting to see another one with modern bots). He covered 12 doublings using two MCTS bots: a relatively weak one of his own, Lazarus, and one of the strongest ones at the time, MoGo. Unfortunately I'm not sure he ever published any sort of permanent final report. All I've been able to find is an email thread (http://www.mail-archive.com/computer-go@computer-go.org/msg02555.html) discussing some intermediate results. If I'm remembering right, somewhere in that thread he also listed later if not final results. Most unfortunately, the actual results were given in a graph he made, and it seems to no longer be hosted where the email messages point. :( One post mentions 200 Elo per doubling, though. But I also saw another message from much later on, referring back to the study, that said around 90 Elo per doubling.

Janzert

Title: Re: Go starting to show cracks as an AI challenge?
Post by Fritzlein on May 13th, 2011, 1:35am
Thanks for the link to Dailey's thread.  It is good to know that performance is indeed linear in log of computing power, but unfortunate that we don't know the slope.   :'(

The scale used by KGS implies that every rank above 2dan is 226 Elo.  If current bots are 7 ranks below the top humans (what are the top KGS ratings?), that would translate to 1580 Elo.  At 90 Elo per hardware doubling, we just need 17.5 doublings, i.e. hardware that is 194,000 times faster.
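
Spelling out that back-of-the-envelope arithmetic (the 226 appears to be just the Elo equivalent of the 79%-per-rank figure at k=1.30 from the table earlier; the 7-rank gap and the 90 Elo per doubling are the rough guesses above, so treat the result accordingly):

Code:
import math

# Back-of-the-envelope version of the estimate above.  The 7-rank gap and
# the 90 Elo per doubling are rough figures from this thread, not data.
k = 1.30                                    # KGS model parameter for 2d+
p_one_rank = 1.0 / (1.0 + math.exp(-k))     # ~79% win rate per rank
elo_per_rank = 400.0 * math.log10(p_one_rank / (1.0 - p_one_rank))
print(f"Elo per rank above 2d: {elo_per_rank:.0f}")      # ~226

ranks_to_top = 7
elo_gap = ranks_to_top * elo_per_rank
doublings = elo_gap / 90.0                  # assuming 90 Elo per hardware doubling
print(f"Elo gap: {elo_gap:.0f}, doublings needed: {doublings:.1f}")
print(f"Hardware factor: {2 ** doublings:,.0f}x")        # ~194,000x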

Projecting a continued gain of one rank per year is much more optimistic than my above estimate.  Can the Go programming community continue to make smashing algorithmic improvements instead of waiting 35 years for Moore's law to bring home the victory?

Title: Re: Go starting to show cracks as an AI challenge?
Post by aaaa on May 25th, 2011, 4:50pm
Although the values of the rating system used by the European Go Federation (http://www.europeangodatabase.eu/EGD/EGF_rating_system.php) can't actually be perfectly converted to Elo equivalents (as the former doesn't follow the Bradley-Terry model), it might be interesting to know that, by its reckoning, the expected outcome of an even game between an amateur (European) 1 dan (who would apparently be about 2 dan on KGS) and a professional 9 dan would correspond to a rating difference of about 1459 Elo points.

Title: Re: Go starting to show cracks as an AI challenge?
Post by aaaa on Jun 17th, 2011, 12:34pm
For a few days now, Zen, as Zen19S, has been able to maintain a 4-dan rating with games played with an overtime of 30 seconds per move. This incarnation had also won the most recent KGS slow bot tournament held a month ago. Hardware:

Quote:
mini-cluster of 6 pcs (a 6-core Xeon W5680/4 GHz, two 4-core i7 920/3.2 GHz, and three 4-core Core2Quad/3 GHz) connected via a GbE LAN

Title: Re: Go starting to show cracks as an AI challenge?
Post by Fritzlein on Jun 17th, 2011, 2:08pm

on 05/13/11 at 01:35:09, Fritzlein wrote:
Thanks for the link to Dailey's thread.  It is good to know that performance is indeed linear in log of computing power, but unfortunate that we don't know the slope.   :'(

So we now have confirmation both that performance is linear in the log of the number of playouts and that the slope is 100 Elo per doubling: http://dvandva.org/pipermail/computer-go/2011-June/003498.html

But, just as with the chess study we discussed in another thread, when you look at the data itself instead of reading the summary, it sure doesn't look linear:
http://dvandva.org/pipermail/computer-go/attachments/20110617/491c8b58/attachment-0001.gif

And if it isn't linear, then the concept of "Elo points per doubling" shouldn't be used to extrapolate way out on the curve, e.g. to guess how many doublings are needed to reach 9 dan, as I did in my previous post.

Title: Re: Go starting to show cracks as an AI challenge?
Post by Valueinvestor on Aug 7th, 2011, 11:16pm
Zen recently beat a professional (Kozo Hayashi 6p) with a 5 stone handicap: http://files.gokgs.com/games/2011/8/3/ZensGuest-Zen19S-3.sgf

Title: Re: Go starting to show cracks as an AI challenge?
Post by Janzert on Jan 17th, 2012, 2:32pm

on 05/11/11 at 11:33:24, Janzert wrote:
I also think John Tromp was either very prescient or got very lucky and picked just about the perfect time limit for his Go bet (http://dcook.org/gobet/).


In the initial Go bet tournament, a best of 7, Tromp won 4-0. One year later there was a repeat tournament, best 3 out of 5, and this year Tromp lost 1-3. So I think my feeling last year has been validated: the bet was right at the limit of what Tromp was able to win.

Janzert

Title: Re: Go starting to show cracks as an AI challenge?
Post by Fritzlein on Jan 17th, 2012, 7:46pm
Nice, thanks for the update.  I trust results of even games under tournament conditions much more than the results of handicap pickup games.  How far is it from 3 dan to World Champion?

Is this result another sign that I am too optimistic about humanity defending the Arimaa Challenge?  I have $1000 on the line, as much as Tromp did.  Unfortunately, I won't win $1000 if the computers lose.  :P  I wonder whether I will still be among the top three human players when crunch time comes.  Will I ever be in a position of defending my own money?

Title: Re: Go starting to show cracks as an AI challenge?
Post by robinz on Jan 21st, 2012, 6:24am

on 01/17/12 at 19:46:09, Fritzlein wrote:
How far is it from 3 dan to World Champion?


I'm only a very mediocre go player - I know there are stronger players on here who may have a more precise view. But I do know that the answer to this question is "miles and miles"  :)

The strongest amateur rank is, I believe, 7 dan. Such a player would be expected to be able to give a 3 dan a 4 stone handicap and still have an even chance of winning. How big the gap is between the best amateurs and the world's top players I am less sure of, but I'm sure it must be at least a couple of stones. So, the gap between amateur 3d and the world's top players is at least 6 stones, if my guess holds up. (And I think that is being conservative.)

(For comparison, before I virtually stopped playing last summer, I was about 8k, which is a whole 10 stones worse than a 3d. Yet I was still capable of easily winning at a 5 or 6-stone handicap against players who, though obviously weak, were by no means beginners. The range of possible strengths in go is really quite amazing.)

Title: Re: Go starting to show cracks as an AI challenge?
Post by hyperpape on Feb 10th, 2012, 10:49pm
My (anecdotal) reaction to the numbers presented earlier is that they are not understatements. A 2 dan having a 7% chance to beat a 4 dan seems quite reasonable to me. And I think that persists up to higher levels. You see some of the best American players (who are just about professional strength; in some cases they are in fact professionals) losing to players who are one or one and a half stones weaker.

One complicating factor is that the bot used in the 2010 challenge, Many Faces of Go (made by David Fotland), is a good bot, but still not quite on a par with Zen or Crazy Stone.

But the real question is how sustainable this progress will be. Right now, a naive extrapolation puts bots ahead of humans by a good margin in 2020 (it's more like a stone and a half of progress per year these days). But you never know.

Also, regarding the number of levels: you can quantify it. The Europeans use an Elo-style system (http://europeangodatabase.eu/EGD/EGF_rating_system.php). There are players who have competed in tournaments who are rated 100. Against a typical player who has played only four or five games, they might give a five- to nine-stone handicap. And there are active players in Europe who are rated 2800. There is no good recent statistical work comparing European and Asian professionals, but they might be anywhere from 2900 to over 3000.

So every game is different, but whenever I read someone saying Arimaa has a lot of room for human improvement, my gut always says that's right.

(Note: I'm KGS 2kyu, and I read about computer go, but don't do any computer go programming, fwiw).


