Arimaa Forum
(http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)

(Message started by: Fritzlein on Sep 10th, 2005, 11:12pm)

Title: Widening gap?
Post by Fritzlein on Sep 10th, 2005, 11:12pm

I just noticed that there are now 7 humans rated ahead of the strongest bot, the first time this has happened since speedy came online. This certainly validates the concept of a team of three humans defending the challenge, because even if a few top players decline the honor, there is still plenty of talent and knowledge to see the humans through.

Also I notice that there are five humans rated above 2000, which I think is a first. This is especially impressive because I believe that a 2000 rating represents a higher level of achievement now than it used to. A 2000 player today would probably have been rated 2100 a year ago, and 2200 two years ago. The level of human play is steadily rising, and the bots have a lot of work to do this fall/winter just to catch up to where they were last year.

Title: Re: Widening gap?
Post by PMertens on Sep 11th, 2005, 1:23pm

Actually it is now 8 players ahead if we count only active bots :-)
Oh, and 6 players at 2k+ :-P (Go Belbo, go ...)

(Speedy is overrated anyway ;-))

Are you trying to explain why some top players lost about 100 rating points over the last year? ::)
I agree that the 2k+ players are probably better than last year (because of newfound strategies), but I also think that it became far easier to reach such heights.

Title: Re: Widening gap?
Post by Arimanator on Sep 11th, 2005, 3:39pm

Speedy wouldn't be so overrated if some people didn't feed it points back as soon as its rating is low. I've always expected blitz to go under the 1700s sooner or later but for some reason each time it gets close to that people lose against it (in droves apparently) and it's right back up over 1900 and sometimes even close to 2000. This bot works like the stock market. ;)

Title: Re: Widening gap?
Post by Fritzlein on Sep 11th, 2005, 11:29pm

on 09/11/05 at 13:23:15, PMertens wrote:

Are you trying to explain why some top players lost about 100 rating points over the last year? ::)

Hehe, in my case it would be a 100-rating-point decline over the past couple of months, and I don't think deflation is working quite that fast. ;-) On the other hand, Arimanator, Robinson, and Adanac learned how to play, and the points they won had to be taken from somewhere. Also look at blue22: he's rated only 1800 but at his strength he certainly would have been rated 2000 only a year ago.

Quote:

I agree that the 2k+ players are probably better than last year (because of newfound strategies), but I also think that it became far easier to reach such heights.

In what sense is it easier to reach a good playing strength? If you mean it is easier because there is a Wikipedia article, and there are more humans to learn from, and more different bots to learn from, I agree that all of these factors help players get good at Arimaa faster.

These aids notwithstanding, in some sense 2000 is still harder to reach than it used to be because the competition you have to climb over has gotten tougher. I got almost to 2000 myself just by beating bot_Arimaanator 80%-90% of the time, but when blue22 did the same it only got him to 1800 because bot_Arimaanator's rating had deflated 200 points in the meantime. And there are plenty of other examples. Speedy is about the only bot whose rating hasn't deflated compared to last year, and that is just because it hasn't been playing. If speedy were online all the time, I expect it would lose about 100 points too, and bounce around 1875 (a bit higher than BombFast) for its average rating. After that stabilization, despite the fact that speedy is a stronger bot than it used to be, one would have to be able to beat it a higher percentage of the time to get a 2000 rating, etc.
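The arithmetic behind that comparison can be sketched with the textbook Elo model. This is an illustration under assumptions, not the gameroom's actual formula: it assumes a plain logistic Elo curve on a 400-point scale (the Arimaa server also tracks rating uncertainty, RU, which this sketch ignores), and the 1700 and 1500 bot ratings are hypothetical values chosen only because they are 200 points apart. `expected_score` and `rating_for_win_rate` are illustrative helper names.

```python
import math


def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expected score of player A against player B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_for_win_rate(opponent_rating: float, win_rate: float) -> float:
    """Rating at which the Elo model predicts `win_rate` against `opponent_rating`.

    This inverts expected_score: R = R_opp + 400 * log10(p / (1 - p)).
    """
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return opponent_rating + 400.0 * math.log10(win_rate / (1.0 - win_rate))


# Hypothetical bot ratings before and after 200 points of deflation.
for bot_rating in (1700, 1500):
    for p in (0.80, 0.90):
        print(bot_rating, p, round(rating_for_win_rate(bot_rating, p)))
```

Under these assumptions, an 80% to 90% score against a 1700-rated opponent corresponds to roughly 1941 to 2082, while the same score against a 1500-rated opponent gives roughly 1741 to 1882: the win rate is unchanged, but the rating it earns drops by exactly the amount the opponent's rating deflated.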

Title: Re: Widening gap?
Post by Fritzlein on Sep 11th, 2005, 11:40pm

on 09/11/05 at 15:39:20, Arimanator wrote:

I've always expected blitz to go under the 1700s sooner or later but for some reason each time it gets close to that people lose against it (in droves apparently)

I think this is at least partly attributable to people who play experimentally despite known methods of beating Bomb. The prime example is Naveed. I'm sure he could beat BombBlitz every single game in a number of ways, but that's not how he likes to play. And even after he discovers a new method of winning, like that spectacular E+H and M+H on the second trap with a rabbit wiggling through, he abandons it after a couple of games to resume his losing ways.

Some of the time I find it embarrassing that some bots still have ratings as high as they do, and I wonder why humanity hasn't beaten them down yet, but other times I'm grateful for the people who don't give a hoot about ratings and play for the joy of it. Fortunately the world is big enough for people like me to earn inflated ratings, and for freer spirits to play however it suits them.

Title: Re: Widening gap?
Post by PMertens on Sep 12th, 2005, 10:03am

Quote:

It is certainly easier to become better than the frozen-in-time players of one year ago.
But by reaching heights I meant points, not playing strength.
To reach 2k points by playing a blitz bot is not really harder than playing 'nator a year ago, just faster :-P
(of course it is a stronger bot, but we know its weaknesses)

Quote:

Fortunately the world is big enough for people like me to earn inflated ratings

If I remember correctly, you manage to beat humans once in a while ... so maybe your rating is not that inflated :-P

Title: Re: Widening gap?
Post by PMertens on Sep 13th, 2005, 9:38am

I just noticed that there are now 9 humans rated ahead of the strongest (active) bot ... and 1 human is only 1 point behind ...

8)

Title: Re: Widening gap?
Post by Fritzlein on Sep 13th, 2005, 10:32am

on 09/13/05 at 09:38:16, PMertens wrote:

I just noticed that there are now 9 humans rated ahead of the strongest (active) bot ... and 1 human is only 1 point behind ...

8)

Well, OK, but if you are only counting active bots, shouldn't you only count active humans? Still, any way you slice it, apparently a whole troop of humans could defend the challenge this year, when last year we were wondering if there were four of us.

Title: Re: Widening gap?
Post by PMertens on Sep 13th, 2005, 11:29am

Both mouse and bleitner were last seen less than 30 days ago ;-)

(I hope they join the ranks of online players for the championship)

Title: Re: Widening gap?
Post by Arimanator on Sep 13th, 2005, 3:08pm

on 09/13/05 at 09:38:16, PMertens wrote:

I just noticed that there are now 9 humans rated ahead of the strongest (active) bot ... and 1 human is only 1 point behind ...

8)

Maybe the fact that blitz recently took quite a beating has something to do with that? ;)

Title: Re: Widening gap?
Post by Fritzlein on Sep 14th, 2005, 7:03am

OK, now Belbo made it back over 2000 too, for a record 7 players over 2000 all at the same time. Who's next? If mouse doesn't come out of retirement soon, my bet is on blue22, just because he is giving me fits in our postal game.

Title: Re: Widening gap?
Post by PMertens on Sep 14th, 2005, 7:43am

probably ... after all, he is in the top 5 of players with RU 30 ;-)

(and in the top 7 with RU 31)

Title: Re: Widening gap?
Post by Fritzlein on Sep 23rd, 2005, 11:39pm

Wow, suddenly the top active bot (where active means RU <= 50) is only rated 1800 flat, which means there are eleven active humans ahead of it. Hopefully the developers will show us some serious upgrades this fall, and bot_haizhi and bot_weiser will use new methods to reach new heights. We wouldn't want the challenge matches to start getting boring, would we?

Title: Re: Widening gap?
Post by Fritzlein on Oct 6th, 2005, 6:28pm

For fun I decided to create a graphical view of the evolution of the strength of humans relative to computers. I went back through the database and listed the top eight players at the end of each week, then charted them with blue for humans and pink for bots. I allowed different versions of the same bot, e.g. both Occam and Arimaazilla earlier, and then all the versions of Bomb later. I excluded OmarFast, though, and would have excluded other duplicate human accounts if I had known of any.

http://math.umn.edu/~juhn0008/HistoricalTopRatings.png

It looks like bots and humans were running about even through mid-2003, but then humans started to pull away. Speedy has occasionally made a run at the top spot, but has always been beaten back. As of the last week shown, namely late September, only speedy is representing silicon in the top eight, and perhaps only due to inactivity.

Title: Re: Widening gap?
Post by PMertens on Oct 6th, 2005, 9:14pm

nice graph ... would be a nice addition to the rating inflation topic as well :-)

Title: Re: Widening gap?
Post by omar on Oct 12th, 2005, 3:46pm

Looks like the bots have stabilized around the 1900s. It will be interesting to see this graph next year after the 2006 bots have been added.

Karl, perhaps you could create an "Arimaa Stats" page in the Wiki and have pages off of that page for the various stats you've looked at and post the graphs there. There have been times when I wanted to look at one of the graphs and hunting for it in the forum was not easy. Posting these in the Wiki would make them very organized and easy to find.

Title: Re: Widening gap?
Post by Fritzlein on Oct 12th, 2005, 7:05pm

on 10/12/05 at 15:46:32, omar wrote:

Looks like the bots have stabilized around the 1900s.

I doubt the bots will ever "stabilize" with our current rating system. BombFast and BombBlitz were both under 1800, and now they're both almost to 2000, but I'm sure that isn't the end of them alternately nosediving and rocketing upward.

Quote:

Karl, perhaps you could create an "Arimaa Stats" page in the Wiki and have pages off of that page for the various stats you've looked at and post the graphs there. There have been times when I wanted to look at one of the graphs and hunting for it in the forum was not easy. Posting these in the Wiki would make them very organized and easy to find.

Thanks for that idea. That would be good especially if I can put the images directly onto your Wiki server, because I have no idea how long the University of Minnesota will let me have a Web page now that I'm a dropout. I'll go see what I can do.

[EDIT] Hey, it's really easy to upload images directly. I'll add more later, but I might as well update the ones I want to update before uploading them. Do you have any specific requests for graphs that I should generate?

Title: Re: Widening gap?
Post by omar on Oct 15th, 2005, 9:47am

The stats page in the wiki looks great. Thanks for creating that. Makes it very nice and easy to find the graphs now. Couple suggestions:

Maybe "discussion" links from each graph page back to the forum topic page(s) that discuss that particular stat would be nice. That would allow the discussion about the stats to be found easily.

Perhaps having separate pages for each of the stat topics (Ratings, Server Activity, etc.) would allow a lot more to be said about each topic and avoid one very long page as the number of topics grows. The main "Statistical Graphs" page would just link to these pages.

Title: Re: Widening gap?
Post by Fritzlein on Dec 10th, 2005, 3:34pm

Just to update, there are now four humans rated over 2100 at the same time, for the first time ever. This is at the same time that the highest rating of an active bot is 1816. If we define "active" as having RU < 50, then there are still eleven active humans rated higher than the top active bot.

Title: Re: Widening gap?
Post by Ryan_Cable on Dec 11th, 2005, 5:51am

You inspired me to make it 5 above 2100. :-)

I think all of the top 11 active players could beat any bot >90% of the time if they used all of the information available to humanity. Even as ridiculously overrated as I am, I think I am underrated relative to many of the bots. I don’t think there is any bot that deserves to be rated above 1700 right now. Bashing all of the bots below 1700 risks a repetitive stress injury to my brain though. ;-) Maybe rick or some other low-rated player will help finish the job.

Title: Re: Widening gap?
Post by PMertens on Dec 11th, 2005, 11:31am

Quote:

I think all of the top 11 active players could beat any bot >90% of the time if they used all of the information available to humanity.

Put in 100% if you are talking about all information ... but I guess 100% of the top 100 players would not like to simply replay an already-played game.

As long as bots do not play either "randomly" or change (or start to play much much better than they do now), it will always be a question of time to find just the right moves.
(Some of the rather extravagant botbashings are proof of that.)

I am sure that botbashing did help us to understand the bots far better than we did just one year ago.
While it was possible to outsmart speedy, we are now at a level where we actually know exactly what (for example) bomb will move even before we press send.
I think that can be called a really wide gap :-)

Title: Re: Widening gap?
Post by 99of9 on Dec 11th, 2005, 5:18pm

on 12/11/05 at 11:31:25, PMertens wrote:

As long as bots do not play either "randomly" or change (or start to play much much better than they do now) it will always be a question of time to find just the right moves.

Oh, but they do!

(Nevertheless, probably not enough to alter the truth of your other statements.)

Title: Re: Widening gap?
Post by Fritzlein on Dec 11th, 2005, 8:04pm

It's true that we can beat most bots essentially formulaically, which could be exploited to make the ratings gap arbitrarily large. Once bots become more adaptable and/or better randomized, the ratings will be a better indication of the true gap in skill. Right now we can mostly just guess that the gap is large and growing, without being able to quantify it very accurately.

Title: Re: Widening gap?
Post by Ryan_Cable on Dec 31st, 2005, 9:56pm

Fritzlein from http://arimaa.com/arimaa/gameroom/comments.cgi?gid=22958

Quote:

I believe there is some rating deflation, maybe 100 points from a year ago and 200 points from two years ago. Even so, I'm not sure Speedy's true strength today is only 1750. I expect that a human who joined today and played only against other humans until attaining a rating of 1750 would lose a large majority to BombBlitz2005.

We have a strange situation (as you pointed out elsewhere) that there aren't many currently active players in the 1750 to 1950 rating range. If there were more such players, I believe the ratings scale would spread out more than it has at present, that top players would all gain 100 points from where they are now, and that BombBlitz's rating would bounce back unless it were determinedly bashed by known anti-bot methods.

On the other hand, I could be wrong about how much deflation is going on. Certainly we are learning tons about the game each year. I guess rating inflation/deflation is ultimately a matter of how fast we discover new ideas versus how fast new players join the pool.

I think that the ratings should be spread out more. I don't know how it is in chess, but with the number of new players Arimaa has coming in at 1500, it is almost inevitable that anyone that has been active for 6 months has a rating >1500. We seem to have three main groups of humans: noobs many of whom won't stay; those who can reliably beat noobs and most of the ladder bots but not the top bots/fast bots; and those who can reliably beat the top bots and people in the second group. If the second group sits ~1600, Bomb/Clueless Fast/Blitz then would go ~1800. Probably there is room for more spread in the ~1600 group, but usually people break out of this group at about the same time they get good enough to beat all the bots. Within the third group, I think there is enough skill space for at least three classes of humans at ~2000, ~2200, and ~2400.

I would put myself in the ~2000 group: I have a 20-game performance rating against humans of ~1825, and I think I can beat all of the Fast bots 80%+ using only E+H. Yet I have only won 1 game against a human rated >1750. Personally, I feel like I have advanced at least 3 classes from when I was ~1400, struggling to beat 'nator.

Title: Re: Widening gap?
Post by Fritzlein on Jan 1st, 2006, 10:47am

I expected that new players would start learning the game by reading and studying expert games, rather than by experimenting individually. If it had happened the way I imagined, there would have been much more serious rating deflation by now, with the lobby bots being driven well below their historical levels.

But instead almost everyone seems to prefer learning by doing, and the ratings of the lobby bots haven't changed at all (as far as I can tell). This provides a higher floor to keep the upper levels of ratings from deflating as much as they otherwise would. For every new player that works up the ladder, stealing points from everyone else on the way, there are half a dozen players that lose a few games and then never come back, injecting points into the system at the bottom.

Inflation comes in at the bottom of the system, but deflation usually comes closer to the top. It comes from players learning more about the game, and taking points away from players who have only stayed the same in skill. My hunch is that because inflation and deflation come from different ends of the spectrum, the ratings scale has compressed somewhat.

So to tie this post back into the topic thread, the gap between humans in general and bots in general is probably a little bit wider than the ratings indicate, although specific bots (in particular BombBlitz2005) are way underrated IMHO.

Title: Re: Widening gap?
Post by Fritzlein on Jan 20th, 2006, 11:14am

Welcome Blue22 above the 2000 rating watermark. That makes eight humans (albeit just seven if Arimanator is considered inactive), while bots languish around 1800 or below. Unless Bomb2006 was pumped up to master strength in secret, humanity has pulled away by roughly 200 points of playing strength between challenge matches.

Title: Re: Widening gap?
Post by Ryan_Cable on Jan 29th, 2006, 6:34pm

There are now no bots rated >=1800 available to play. :-) There are currently 9 humans with RU<=50 rated above the top available bot. There are only 6 available bots rated >=1700, all of which are versions of Bomb or Clueless. With enough effort it is definitely possible to knock all of these bots down to below 1700.

Title: Re: Widening gap?
Post by Fritzlein on Jan 29th, 2006, 7:54pm

It is interesting to me that ratings of the lobby bots haven't budged much over the same time period the CC bots have been beaten down. Perhaps new players will always perform about as well as they do now initially, no matter how much the state of the art advances, no matter how much we write about the game, or post puzzles in the Wiki, or comment newbie games, or otherwise make help available. Maybe Arimaazilla will permanently bounce between 1400 and 1550, without ever being kept down by well-armed newbies.

The gap in strength between the lobby bots and the 2005CC bots is presumably as large as it always was, thus the CC bots now by and large have deflated ratings relative to anyone coming to play them after mastering the lobby. Yet those same CC bots will regularly bleed points to any of the top ten humans, so one could argue that the CC bots are overrated.

Before I gave this phenomenon as evidence that the scale of ratings is too compressed. Upon further reflection, however, I am not sure that a ladder of bots makes a good scale for ratings. The problem is that the HvB learning curve is much steeper than the HvH learning curve. If there is a bot you can't beat, you will hardly ever be able to beat it, and if there is a bot that you do know how to beat, you will beat it almost every game. The transition zone is very narrow. This is very different from, say, Robinson passing me in skill. There may be a longer time we are about 50-50, and even as he pulls away I'll probably be dangerous to him for a long time, and win a significant percentage as an underdog.

So maybe in purely human terms the current rating scale isn't so bad, and it just looks compressed because of all the HvB games.

Title: Re: Widening gap?
Post by Ryan_Cable on Mar 27th, 2006, 6:53pm

There are currently 6 players 2100+, with another 2 or 3 active players 2000+. There are 11 humans with RU<=50 rated above the top available bot, and there are 3+ other active humans rated above the top active bot.

Title: Re: Widening gap?
Post by Fritzlein on Mar 28th, 2006, 11:50am

That reminds me that Bomb2006, presumably the strongest bot at present, isn't available for play. Of course, Bomb2006 isn't a whole lot better than Bomb2005, but there were a few tweaks and bugfixes.

Unless some bot makes some huge strides this year, we'll probably be able to have the next Challenge defended by the 20th-ranked active human.

Title: Re: Widening gap?
Post by omar on Mar 30th, 2006, 12:19pm

Since the challenge match is over now I guess I can break the news that Bomb2006 was actually Bomb2005. Any difference we might have noticed in its play was due to the faster CPU and more memory (dual P4 2.8 GHz, 512 MB RAM vs. dual P4 3.0 GHz, 1024 MB RAM).

Title: Re: Widening gap?
Post by Fritzlein on Mar 30th, 2006, 1:12pm

on 03/30/06 at 12:19:11, omar wrote:

Since the challenge match is over now I guess I can break the news that Bomb2006 was actually Bomb2005.

That's hilarious. It tells you something about my psychology that, because I expected to see differences, I kept thinking I was seeing differences. That should be a cautionary tale to me not to leap to conclusions based on a small number of games.

Did you use Bomb2005 simply because Fotland didn't have time to get Bomb2006 working? I know he made some changes to bot_speedy after last year's tournament, and I'm surprised he wouldn't want at least those changes to be included in whatever version of his bot was contesting the championship.

Anyway, I guess we can expect the humans to extend their advantage over computers if the best bot's software stayed exactly the same. Faster hardware seems to have had a small effect in making the nickel-and-dime harder to implement, but advances in human knowledge of the game are (so far) greatly outpacing hardware speedups, and can be expected to do so for some time. When Arimaa theory is a bit more established and agreed upon, then maybe speed alone will start closing the gap, but until then software improvements should be necessary for bots to keep pace.

Title: Re: Widening gap?
Post by omar on Mar 30th, 2006, 2:33pm

David didn't get much time to work on Bomb in 2005, and a disk crash caused him to lose the stable version. Also, the most current version was buggy. So David decided to stick with the older 2005 version. I just hope he gets some time to work on it this year.

Title: Re: Widening gap?
Post by PMertens on Mar 30th, 2006, 4:40pm

You must be kidding me? :o

I was kinda certain that suiciding rabbits was not in 2k5's repertoire ... not even with much more time :-/

Honestly ... before, I could kill a cat without bomb letting that rabbit burn ... and now it moves away just for a position??

Title: Re: Widening gap?
Post by omar on Apr 1st, 2006, 6:10pm

I'm sure it was not changed. I don't know how to explain these observations, though.

Title: Re: Widening gap?
Post by Fritzlein on Jul 21st, 2006, 12:51pm

Two years ago, when I first discovered Arimaa, I read about how difficult it was for computers to play well. Then I clicked on the gameroom link "Top Rated Players", and discovered that only three people (99of9, Omar, and Belbo) were ranked above the top computer (bot_speedy). It was quite disappointing to me, after all the rhetoric I had read, that only three humans in the whole world stood between the top bot and the Challenge Prize.

I later learned that it wasn't as close as the ratings made it seem, in part because bot_speedy was playing at 30 seconds per move while bot_bomb had to play the Challenge games at 120 seconds per move, but even so that first impression affected me greatly.

Nowadays I know that the "Top Rated Players" link is useless (Who cares about the rating of OmarFast, inactive for 22 months?), whereas the "Established Players" link conveys all the information I want, but whichever link a newcomer clicks, they will get a very different first impression than I did. The top bot is ranked #23 on the former list, and #20 on the latter.

In a previous post in this thread, I said that the 20th-ranked human would probably be able to defend the 2007 Arimaa challenge. With every passing month, that prediction looks more likely to come true.