Arimaa Forum (http://arimaa.com/arimaa/forum/cgi/YaBB.cgi)
Arimaa >> Site Discussion >> Game archive irregularities
(Message started by: Janzert on Jan 19th, 2006, 5:27pm)

Title: Game archive irregularities
Post by Janzert on Jan 19th, 2006, 5:27pm
While this isn't really about bot development, I figured it is more relevant here than in the general forum, although everyone probably reads both anyway. ;)

While parsing through the game archives recently, I've run into a number of places where either the formatting or the data is incorrect. I thought I would post them here so they can be corrected, or at least so anyone trying to do the same thing in the future has a heads-up. For some of them I'm not sure what the correct fix would be. Anyway, here are the things I've found so far. Note that I've only gone through games up to the end of 2005.

The last 4 games of allgames2004.txt do not have both players present until 2005. In one of them, one of the players (a bot) was seated in 2004; the rest don't have any events at all until 2005.

The last game in allgames2004.txt (game id 10807) is repeated as the first game in allgames2005.txt.

allgames2003.txt, line 22, game id 86, has an extra \n after the last move in the move list.

The following lines in allgames2005.txt do not have a \n after the last event. In fact, they all appear to end with only a partially written event.

Line    Game id
1337    12142
1492    12297
1621    12428
1685    12492
1824    12631
6197    17005
6549    17357
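For anyone parsing these, here is a minimal Python sketch for spotting that kind of truncation. It assumes the events field has already been pulled out of the game's line and that events are separated by the literal two characters \n (backslash, n), as in the excerpts in this post. `split_events` is just a hypothetical helper name.

```python
def split_events(events_field):
    """Split an events field into complete events and a possible partial tail.

    Assumes events are separated by the literal two characters backslash + 'n'
    (not a real newline), as in the archive excerpts. Returns a tuple
    (complete_events, partial_tail); partial_tail is None when the field
    ends cleanly with a separator.
    """
    sep = "\\n"  # backslash followed by 'n'
    parts = events_field.split(sep)
    # A well-formed field ends with the separator, so the last piece is empty.
    tail = parts.pop()
    complete = [p for p in parts if p]
    return complete, (tail or None)
```

Any record that comes back with a non-None tail is one like those listed above.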

The following games have some events missing the raw integer timestamp and the brackets around the long form timestamp.

File                Line    Game id
allgames2003.txt    3071    3137
allgames2003.txt    3072    3138
allgames2003.txt    3073    3139
allgames2003.txt    3074    3140
allgames2003.txt    3075    3141
allgames2003.txt    3451    3518
allgames2004.txt    3294    8082
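A parser that tolerates these can make both the raw timestamp and the brackets optional. This is a sketch that assumes the long form is always the ctime-style `Sat Oct  8 00:35:02 2005` seen in the excerpts here; `parse_event` and `EVENT_RE` are made-up names.

```python
import re

# Optional raw Unix timestamp, then a ctime-style date with optional brackets,
# then the event text. The day of month may be space-padded ("Oct  8").
EVENT_RE = re.compile(
    r"^(?:(?P<raw>\d+) )?"
    r"\[?(?P<long>\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4})\]?"
    r" (?P<text>.*)$"
)


def parse_event(event):
    """Return (raw_timestamp_or_None, long_timestamp, text), or None if unparseable."""
    m = EVENT_RE.match(event)
    if m is None:
        return None
    raw = int(m.group("raw")) if m.group("raw") else None
    return raw, m.group("long"), m.group("text")
```

Events without a raw timestamp come back with None in that slot. I wouldn't try to rebuild it from the long form without knowing which time zone the server logged in.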

Paul Lefert's rating is set to zero for one game (11128). In the game before, it is 1703, and in the game after, it is 1687.

Game 20310 has two moves for 11w recorded. The first one is actually 10b recorded as 11w.

10b ec5e ed5e Md4n rd8s\n
11w ec5e ed5e Md4n rd8s\n
11w Ee6e Ef6s cf7s cf6x Re3n\n
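Both this and the extra \n in game 86 can be caught with a quick consistency check on the move list. Below is a minimal sketch, assuming moves are separated by a literal \n and each one starts with a label like `11w`. It flags labels that break the w/b alternation and moves whose steps exactly repeat the previous move's steps; empty entries (from a stray extra \n) are simply skipped. `check_moves` is a hypothetical name.

```python
import re

LABEL_RE = re.compile(r"(\d+)([wb])")


def check_moves(movelist_field):
    """Return (index, label, problem) tuples for suspicious moves in a move list.

    Assumes moves are separated by the literal two characters backslash + 'n'.
    Empty entries, e.g. from an extra separator, are skipped.
    """
    problems = []
    expected = None  # (number, side) of the next label we expect
    prev_label = prev_steps = None
    moves = [m.strip() for m in movelist_field.split("\\n") if m.strip()]
    for i, move in enumerate(moves):
        label, _, steps = move.partition(" ")
        m = LABEL_RE.fullmatch(label)
        if m is None:
            problems.append((i, label, "unparseable move label"))
            continue
        num, side = int(m.group(1)), m.group(2)
        if expected is not None and (num, side) != expected:
            problems.append((i, label, f"expected {expected[0]}{expected[1]}"))
        if steps and steps == prev_steps:
            problems.append((i, label, f"same steps as {prev_label}"))
        # Continue from the label actually seen so one glitch isn't reported repeatedly.
        expected = (num, "b") if side == "w" else (num + 1, "w")
        prev_label, prev_steps = label, steps
    return problems
```

On the three lines above it reports the first 11w as a copy of 10b and the second 11w as out of sequence; the first flag points at the record that should actually be dropped.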

Some of the events might give an idea as to what happened.

1128731702 [Sat Oct  8 00:35:02 2005] move 10w received from w\n
1128731838 [Sat Oct  8 00:37:18 2005] move 10b received from b\n
1128731838 [Sat Oct  8 00:37:18 2005] move 10b received from b\n
1128731840 [Sat Oct  8 00:37:20 2005] move 11w received from b\n
1128731840 [Sat Oct  8 00:37:20 2005] gameserver rejected move; posted by the wrong side\n
1128731840 [Sat Oct  8 00:37:20 2005] move 11w received from b\n
1128731840 [Sat Oct  8 00:37:20 2005] gameserver rejected move; posted by the wrong side\n
1128731849 [Sat Oct  8 00:37:29 2005] move 11w received from w\n
1128731864 [Sat Oct  8 00:37:44 2005] move 11b received from b\n
1128731949 [Sat Oct  8 00:39:09 2005] b player has left\n
1128731959 [Sat Oct  8 00:39:19 2005] b player joining\n
1128731964 [Sat Oct  8 00:39:24 2005] b player present\n
1128731999 [Sat Oct  8 00:39:59 2005] move 11b received from b\n
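To look for the same pattern across all games, one could count "move ... received" events per move label, ignoring submissions the server immediately rejected. This is a rough sketch: it assumes the events have already been split into individual strings and that a rejection is always logged as the very next event, as it is above. `repeated_submissions` is a hypothetical helper. It only produces candidates for a manual look; on the events above it flags 10b and also the 11b that was re-sent after the reconnect.

```python
import re
from collections import Counter

RECEIVED_RE = re.compile(r"move (\d+[wb]) received from [wb]")


def repeated_submissions(events):
    """Return move labels received (and not rejected) more than once, in first-seen order.

    A 'received' event immediately followed by a 'gameserver rejected move'
    event is not counted.
    """
    counts = Counter()
    for i, event in enumerate(events):
        m = RECEIVED_RE.search(event)
        if m is None:
            continue
        following = events[i + 1] if i + 1 < len(events) else ""
        if "gameserver rejected move" in following:
            continue
        counts[m.group(1)] += 1
    return [label for label, n in counts.items() if n > 1]
```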

Janzert

Title: Re: Game archive irregularities
Post by 99of9 on Jan 19th, 2006, 5:35pm
Wow, thanks for your detailed checking Janzert...

I'm surprised none of my database reading programs crashed on the formatting issues.

Title: Re: Game archive irregularities
Post by Janzert on Jan 19th, 2006, 5:44pm
Yours are probably more resilient than mine.

Actually as long as someone isn't parsing out individual moves and events they probably won't run into any problems.

Janzert

Title: Re: Game archive irregularities
Post by 99of9 on Jan 19th, 2006, 5:58pm

on 01/19/06 at 17:44:23, Janzert wrote:
Actually as long as someone isn't parsing out individual moves and events they probably won't run into any problems.


I do parse the moves (but not the events) ... to construct an opening book (and some other tables a while ago).  I think I must've just been lucky on the resilience front - I certainly planned to only accept the correct format.

Title: Re: Game archive irregularities
Post by Janzert on Jan 26th, 2006, 10:54pm
Btw, found a couple more problems. Not really serious but just in case it helps someone else avoid problems by knowing they exist.

Game 46: at 21b silver wins by goal, but the game continues.

There also appear to have been some bugs in the immobilization detection code early on.

Some games were recorded as time (t) wins instead of immobilization (m) wins.

m termination recorded as t, Game 253 (253), 24b
m termination recorded as t, Game 1333 (1330), 42b
m termination recorded as t, Game 1986 (1983), 19w
m termination recorded as t, Game 4892 (4888), 71b
m termination recorded as t, Game 5028 (5024), 62b
m termination recorded as t, Game 5162 (5158), 37b
m termination recorded as t, Game 5758 (5754), 44w

and several games were terminated when a side moved into immobilization before allowing the other side to make a final move.

Last move of immo game found with replies, Game 90 (90), 16b
Last move of immo game found with replies, Game 249 (249), 15b
Last move of immo game found with replies, Game 250 (250), 16b
Last move of immo game found with replies, Game 258 (258), 21w
Last move of immo game found with replies, Game 259 (259), 31b
Last move of immo game found with replies, Game 271 (271), 23b
Last move of immo game found with replies, Game 281 (281), 18w
Last move of immo game found with replies, Game 337 (337), 21b
Last move of immo game found with replies, Game 343 (343), 17b
Last move of immo game found with replies, Game 361 (361), 20w
Last move of immo game found with replies, Game 362 (362), 25b
Last move of immo game found with replies, Game 380 (380), 39b
Last move of immo game found with replies, Game 419 (419), 21w
Last move of immo game found with replies, Game 421 (421), 19b
Last move of immo game found with replies, Game 438 (438), 18b
Last move of immo game found with replies, Game 482 (482), 26b
Last move of immo game found with replies, Game 483 (483), 19b
Last move of immo game found with replies, Game 484 (484), 30b
Last move of immo game found with replies, Game 559 (557), 18b
Last move of immo game found with replies, Game 571 (569), 19w
Last move of immo game found with replies, Game 894 (891), 65b
Last move of immo game found with replies, Game 1153 (1150), 19w

Janzert

Title: Re: Game archive irregularities
Post by Janzert on Mar 1st, 2006, 4:07pm
Two more slight gotchas I ran into.

The plycount field in the database actually counts moves, not plies. I'm not sure if it records the move number when the game ends or the maximum move number in the game (these could differ because of takebacks). Takebacks that occur in the middle of the game do not appear to affect it.
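If you need an actual ply count, one option is to derive it from a move label instead. A minimal sketch, assuming each label is a move number followed by w or b and that the setup moves 1w and 1b count as plies 1 and 2; `label_to_ply` is a made-up name.

```python
def label_to_ply(label):
    """Convert a move label such as '11w' or '30b' into a 1-based ply index.

    Assumes 1w and 1b (the setup moves) are plies 1 and 2.
    """
    number, side = int(label[:-1]), label[-1]
    if side not in ("w", "b") or number < 1:
        raise ValueError(f"bad move label: {label!r}")
    return 2 * (number - 1) + (1 if side == "w" else 2)
```

So if plycount holds the number of the final move, a plycount of 30 means ply 59 or 60, depending on which side moved last.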

Game #3136 is unique in that it has no moves and also because it has a termination type of f.

Janzert

Title: Re: Game archive irregularities
Post by 99of9 on Mar 1st, 2006, 6:06pm
Thanks for keeping up this record.  I will certainly consult it next time I need to auto-process the database.

Title: Re: Game archive irregularities
Post by Janzert on Mar 16th, 2006, 8:50pm
The recent game (#25959) between bot_Clueless2006P1 and bot_Clueless2006Fast lasting 600 moves not only set the record for longest game, but also unfortunately added another instance of not all events being recorded.

The last whole event recorded is move 504w. The next event is cut off in the middle of the timestamp.

Janzert

Title: Re: Game archive irregularities
Post by Janzert on Mar 17th, 2006, 7:49am
Game 25990 is another game with a double move recorded. As with #20310 the first move recorded is actually a repeat of the previous move.

30w Hh6n Hh7s rh8s\n
30b Hh6n Hh7s rh8s\n
30b df6s df5s eg3w rh3w\n

The events show two moves submitted for 30w. Maybe a race condition in the side-to-play checking code?

1142014934 [Fri Mar 10 18:22:14 2006] move 30w received from w\n
1142014934 [Fri Mar 10 18:22:14 2006] move 30w received from w\n
1142014943 [Fri Mar 10 18:22:23 2006] move 30b received from b\n
1142015007 [Fri Mar 10 18:23:27 2006] move 31w received from w\n

Janzert

Title: Re: Game archive irregularities
Post by Janzert on Mar 23rd, 2006, 1:26pm
Games that have a termination type of 'a' used to have a result of 'u' (games 1990, 1991, 7955, etc.). Recently though they are given no result at all (games 23634, 24143).
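Until the old and new records agree, a reader can normalize this on input. A trivial sketch, assuming you've already split out the termination and result fields as strings, with an empty string meaning no result; the function name is hypothetical.

```python
def normalized_result(termination, result):
    """Treat a missing result on a termination-'a' game as 'u', matching older records."""
    if termination == "a" and not result.strip():
        return "u"
    return result
```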

Janzert

Title: Re: Game archive irregularities
Post by omar on Mar 25th, 2006, 11:11am

on 03/16/06 at 20:50:37, Janzert wrote:
The recent game (#25959) between bot_Clueless2006P1 and bot_Clueless2006Fast lasting 600 moves not only set the record for longest game, but also unfortunately added another instance of not all events being recorded.

The last whole event recorded is move 504w. The next event is cut off in the middle of the timestamp.

Janzert


The 'event' field could only store 64K, which is why it got chopped off. I've now changed it to store up to 16M.



Title: Re: Game archive irregularities
Post by omar on Mar 25th, 2006, 11:15am

on 03/17/06 at 07:49:13, Janzert wrote:
Game 25990 is another game with a double move recorded. As with #20310 the first move recorded is actually a repeat of the previous move.

30w Hh6n Hh7s rh8s\n
30b Hh6n Hh7s rh8s\n
30b df6s df5s eg3w rh3w\n

The events show two moves submitted for 30w. Maybe a race condition in the side-to-play checking code?

1142014934 [Fri Mar 10 18:22:14 2006] move 30w received from w\n
1142014934 [Fri Mar 10 18:22:14 2006] move 30w received from w\n
1142014943 [Fri Mar 10 18:22:23 2006] move 30b received from b\n
1142015007 [Fri Mar 10 18:23:27 2006] move 31w received from w\n

Janzert


This happened in my postal game against Derek (dtj) also. I was finally able to trace this problem and fix it. It was being caused by the send button being clicked twice and two processes getting started on the server to add the move.
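For anyone running a similar server, one common guard is to check, atomically, that the submitted move is the one the game expects next before storing it, so a second identical submission is turned away. The sketch below is hypothetical and not the Arimaa server's code: it uses an in-memory object and a `threading.Lock`, which only protects threads within one process. Since here the duplicate came from two separate server processes, a real fix would need the equivalent check inside the database (for example a transaction or a uniqueness constraint on game id plus move label).

```python
import threading


class GameState:
    """Hypothetical in-memory game record used to illustrate the check."""

    def __init__(self):
        self.moves = []             # list of (label, steps)
        self.next_label = (1, "w")  # (move number, side) expected next
        self._lock = threading.Lock()

    def add_move(self, label, steps):
        """Store the move only if it is the expected next one; return True if stored."""
        number, side = int(label[:-1]), label[-1]
        with self._lock:  # check and append as one atomic step
            if (number, side) != self.next_label:
                return False  # duplicate or out-of-turn submission
            self.moves.append((label, steps))
            self.next_label = (number, "b") if side == "w" else (number + 1, "w")
            return True
```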

Title: Re: Game archive irregularities
Post by omar on Mar 25th, 2006, 11:40am

on 03/23/06 at 13:26:37, Janzert wrote:
Games that have a termination type of 'a' used to have a result of 'u' (games 1990, 1991, 7955, etc.). Recently though they are given no result at all (games 23634, 24143).

Janzert


Changed these games to have a result='u'. Modified the code which was most likely causing this.

Title: Re: Game archive irregularities
Post by Janzert on Mar 25th, 2006, 5:37pm
Thanks Omar.

Janzert

Title: Re: Game archive irregularities
Post by omar on Mar 27th, 2006, 10:28am
Thanks for reporting these, Brian.

Title: Re: Game archive irregularities
Post by Fritzlein on Mar 31st, 2006, 3:20pm
This is more of a philosophical irregularity than an issue of data integrity: how is it that in the very first game Omar has a rating of 1516 and illz has a rating of 1595?

Perhaps Omar put the rating system into place before starting to archive the games.  Approximately how many games were played over what time span before archiving began?  I'm just curious.

Title: Re: Game archive irregularities
Post by Fritzlein on Apr 30th, 2006, 4:28pm
I just discovered 9 duplicate games in my copy of the games database.  (Most have been there for a long time, but discovering the most recent made me look for the rest.)

3913=3914
12535=12536
13047=13048
17769=17770
18262=18263
18274=18275
18513=18514
26280=26281
28600=28601

Maybe some of the duplicates have already been removed from Omar's official database, but they were still in mine, so I thought I would post FYI.
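A quick way to look for more of these in a local copy is to group records by their content with the game id left out. This sketch assumes each game has already been parsed into a tuple of whichever fields you consider identifying (for example players and move list); if duplicates differ in something like a timestamp, leave that field out. `find_duplicate_games` is a hypothetical name.

```python
from collections import defaultdict


def find_duplicate_games(games):
    """Group game ids whose records are identical apart from the id.

    `games` maps game id -> tuple of the fields to compare (id excluded).
    Returns a sorted list of sorted id lists, one per group of duplicates.
    """
    groups = defaultdict(list)
    for game_id, fields in games.items():
        groups[fields].append(game_id)
    return sorted(sorted(ids) for ids in groups.values() if len(ids) > 1)
```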

Title: Re: Game archive irregularities
Post by Fritzlein on Jun 18th, 2006, 10:27pm
Omar, this isn't a big deal, but the 2006 game archive is starting to become a large download for me. Would you be willing to make a new archive for the second half of 2006, starting July 1? And maybe in 2007 (if Arimaa continues to grow exponentially in popularity) break down the archive month by month?

Title: Re: Game archive irregularities
Post by omar on Jun 20th, 2006, 10:08pm
Thanks for that suggestion, Karl. The games file for 2006 was already over 14M, which is pretty close to the 16M for all of 2005. I split the file up by months now, so it should be easier to download.

I've also deleted the duplicate games from the database.

The reason for my rating being higher is because I had played some rated test games while testing the server before releasing it. I cleared the games database, but forgot to reset the ratings.

Title: Re: Game archive irregularities
Post by Fritzlein on Jun 21st, 2006, 9:21am
Thanks for taking care of this, Omar.  I know you are busy and have a million things on the todo list (partly because I keep making suggestions).  It's great that you keep improving an already great service.

Title: Re: Game archive irregularities
Post by Fritzlein on Jul 11th, 2006, 7:56am

on 06/20/06 at 22:08:02, omar wrote:
I split the file up by months now, so it should be easier to download.

It has been much easier to download the updates since the files were split by month. There seems to be a small bug, though: the games for June keep getting updated each week, and the games for July are not yet appearing. Take care of it when you can. It's not a big deal.

Title: Re: Game archive irregularities
Post by omar on Jul 20th, 2006, 10:09pm

on 07/11/06 at 07:56:26, Fritzlein wrote:
It has been much easier to download the updates since the files were split by month. There seems to be a small bug, though: the games for June keep getting updated each week, and the games for July are not yet appearing. Take care of it when you can. It's not a big deal.


Yes, that's intentional. The process to generate the logs runs once a week, on Sunday. So for the first 7 days of the month it needs to generate the previous month's logs (but I do it for the first 10 days just to be safe). After that it will start generating the current month's logs.
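The month-selection rule described here is easy to express in code. A minimal sketch with a hypothetical function name; the 10-day window is the one mentioned above.

```python
from datetime import date


def month_to_generate(run_date, grace_days=10):
    """Return (year, month) of the log to regenerate on a given run date.

    During the first `grace_days` days of a month the previous month is
    regenerated, so games from its last days are not missed by the weekly run.
    """
    if run_date.day <= grace_days:
        if run_date.month == 1:
            return run_date.year - 1, 12
        return run_date.year, run_date.month - 1
    return run_date.year, run_date.month
```

With a weekly run, the current month's file first appears on the first Sunday after the 10th, which fits the timeline in this thread.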

Title: Re: Game archive irregularities
Post by Fritzlein on Jul 20th, 2006, 10:45pm
Oh, OK.  I had noticed that the July logs started appearing, and I assumed you had fixed it.  It's even better to know it wasn't broken.


