Hacking Chess: Data Munging

This is a supplement to “Hacking Chess with the MongoDB Pipeline.” It has instructions for rolling your own data sets from chess games.

Download a collection of chess games you like. I’m using the “1132 wins in less than 10 moves” collection, but any of them should work.

These files are in a format called portable game notation (.PGN), which is a human-readable notation for chess games. For example, the first game in TEN.PGN (helloooo 80s filenames) looks like:

[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "Gedult D"]
[Black "Kohn V"]
[Result "1-0"]
[ECO "B33/09"]

1.e4 c5 2.Nf3 Nc6 3.d4 cxd4 4.Nxd4 Nf6
5.Nc3 e5 6.Ndb5 d6 7.Nd5 Nxd5 8.exd5 Ne7
9.c4 a6 10.Qa4  1-0

This represents a 10-turn win for White (the “1-0” result) at an unknown event. The “ECO” field shows which opening was used (a Sicilian in the game above).
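If you want to poke at the tags before converting anything, plain grep and sed go a long way. This is only a rough sketch: pgn_tag is a helper name I made up, the temp file stands in for TEN.PGN, and it assumes every game starts with an [Event ...] tag and keeps each tag on its own line, as in the sample above.

```shell
# Stand-in for TEN.PGN: a temp file holding the single game quoted above.
pgn=$(mktemp)
cat > "$pgn" <<'EOF'
[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "Gedult D"]
[Black "Kohn V"]
[Result "1-0"]
[ECO "B33/09"]

1.e4 c5 2.Nf3 Nc6 3.d4 cxd4 4.Nxd4 Nf6
5.Nc3 e5 6.Ndb5 d6 7.Nd5 Nxd5 8.exd5 Ne7
9.c4 a6 10.Qa4  1-0
EOF

# pgn_tag NAME FILE: print the value of tag NAME for every game in FILE.
# (Hypothetical helper.) tr drops carriage returns in case the file
# has DOS line endings.
pgn_tag() {
    tr -d '\r' < "$2" | sed -n "s/^\[$1 \"\(.*\)\"\]\$/\1/p"
}

# Assumption: each game begins with an [Event ...] tag, so counting
# those lines counts the games.
game_count=$(tr -d '\r' < "$pgn" | grep -c '^\[Event ')

echo "games: $game_count"
echo "white: $(pgn_tag White "$pgn")"
echo "eco:   $(pgn_tag ECO "$pgn")"
```

Point pgn at the real TEN.PGN instead of the temp file to get the same summary for the whole collection; piping pgn_tag ECO through sort | uniq -c gives a quick tally of openings.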

Unfortunately for us, MongoDB doesn’t import PGNs in their native format, so we’ll need to convert them to JSON. I found a PGN->JSON converter in PHP, part of the DHTML Chess scripts on dhtmlgoodies.com, that did the job. Scroll down to the “download” section of its page to get the .zip.

It’s one of those zips that vomits its contents into whatever directory you unzip it in, so create a new directory for it.

So far, we have:

$ mkdir chess
$ cd chess
$ ftp ftp://ftp.pitt.edu/group/student-activities/chess/PGN/Collections/ten-pg.zip ./
$ unzip ten-pg.zip
$ wget http://www.dhtmlgoodies.com/scripts/dhtml-chess/dhtml-chess.zip
$ unzip dhtml-chess.zip

Now, create a simple script, say parse.php, to run through the chess matches and output them in JSON, one per line:

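<?php
// Assumes the unzipped dhtml-chess files define PgnParser in this file; adjust the path to your copy.
require_once "PgnParser.class.php";
$parser = new PgnParser("/path/to/chess/TEN.PGN");
$total = $parser->getNumberOfGames();
// Print each game as JSON, one per line.
for ($i = 0; $i < $total; $i++) {
    echo $parser->getGameDetailsAsJson($i) . "\n";
}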

Run parse.php and dump the results into a file:

$ php parse.php > games.json

Now you’re ready to import games.json.
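Before importing, it can be worth a quick check that the file really is one JSON document per line, since that is what mongoimport reads by default. The snippet below is a sketch rather than a real validator: it only checks that each line starts with { and ends with }. The two sample documents and their field names are made up for illustration (the converter's actual output will look different), and the database and collection names in the commented-out import are placeholders.

```shell
# Stand-in for games.json: two made-up one-line JSON documents.
json=$(mktemp)
printf '%s\n' \
    '{"white":"Gedult D","result":"1-0"}' \
    '{"white":"Example A","result":"0-1"}' > "$json"

# Count lines that do not look like a single JSON object on one line.
# grep exits non-zero when the count is 0, hence the "|| true".
bad_lines=$(grep -cv '^{.*}$' "$json" || true)
total_lines=$(grep -c '' "$json")
echo "$total_lines lines, $bad_lines suspicious"

# If nothing looks off, import the real file. "chess" and "games" are
# placeholder database and collection names; pick your own.
# mongoimport --db chess --collection games --file games.json
```

If the suspicious count is not zero, the usual culprit is a PHP warning or notice that got mixed into the redirected output.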

Back to the original “hacking” post

kristina chodorow's blog