Hacking Chess: Data Munging

This is a supplement to Hacking Chess with the MongoDB Pipeline. This post has instructions for rolling your own data sets from chess games.

Download a collection of chess games you like. I’m using 1132 wins in less than 10 moves, but any of them should work.

These files are in a format called portable game notation (.PGN), which is a human-readable notation for chess games. For example, the first game in TEN.PGN (helloooo 80s filenames) looks like:

[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "Gedult D"]
[Black "Kohn V"]
[Result "1-0"]
[ECO "B33/09"]

1.e4 c5 2.Nf3 Nc6 3.d4 cxd4 4.Nxd4 Nf6
5.Nc3 e5 6.Ndb5 d6 7.Nd5 Nxd5 8.exd5 Ne7
9.c4 a6 10.Qa4  1-0

This represents a 10-move win for White (the “1-0” result) at an unknown event. The “ECO” field is a code for the opening that was played (a Sicilian in the game above).
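To make the header format concrete, here is a minimal sketch that pulls the tag pairs out of a single game with a regular expression. The function name parsePgnTags is made up for this example and is not part of the converter package used below; it also assumes each tag sits on its own line, as in the game above.

```php
<?php
// Hypothetical helper, not part of any library used in this post:
// collect the [Key "Value"] tag pairs from the header of one PGN game.
function parsePgnTags(string $pgn): array
{
    $tags = [];
    // Each header line looks like: [White "Gedult D"]
    preg_match_all('/^\[(\w+)\s+"([^"]*)"\]\s*$/m', $pgn, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $tags[$m[1]] = $m[2];
    }
    return $tags;
}

// The first game from TEN.PGN, copied from the listing above.
$game = <<<'PGN'
[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "Gedult D"]
[Black "Kohn V"]
[Result "1-0"]
[ECO "B33/09"]

1.e4 c5 2.Nf3 Nc6 3.d4 cxd4 4.Nxd4 Nf6
5.Nc3 e5 6.Ndb5 d6 7.Nd5 Nxd5 8.exd5 Ne7
9.c4 a6 10.Qa4  1-0
PGN;

$tags = parsePgnTags($game);
// Prints: Gedult D vs. Kohn V: 1-0
echo $tags['White'], ' vs. ', $tags['Black'], ': ', $tags['Result'], "\n";
```

This only reads the header. The move text needs more care (comments, variations, moves wrapped across lines), which is why a real parser is worth using for the conversion itself.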

Unfortunately for us, MongoDB doesn’t import PGNs in their native format, so we’ll need to convert them to JSON. I found a PGN->JSON converter in PHP that did the job here. Scroll down to the “download” section to get the .zip.

It’s one of those zips that vomits its contents into whatever directory you unzip it in, so create a new directory for it.

So far, we have:

$ mkdir chess
$ cd chess
$
$ ftp ftp://ftp.pitt.edu/group/student-activities/chess/PGN/Collections/ten-pg.zip ./
$ unzip ten-pg.zip
$
$ wget http://www.dhtmlgoodies.com/scripts/dhtml-chess/dhtml-chess.zip
$ unzip dhtml-chess.zip

Now, create a simple script, say parse.php, to run through the chess matches and output them in JSON, one per line:

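<?php

// PgnParser.class.php comes from the dhtml-chess.zip unpacked above.
require("PgnParser.class.php");

$parser = new PgnParser("/path/to/chess/TEN.PGN");

// Print each game as a JSON document on its own line.
$total = $parser->getNumberOfGames();
for ($i = 0; $i < $total; $i++) {
    echo $parser->getGameDetailsAsJson($i)."\n";
}

?>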

Run parse.php and dump the results into a file:

$ php parse.php > games.json
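If you have a whole directory of PGN files instead of just one, the same loop can be wrapped in a function. This is only a sketch: convertDirectory and pgnParserConvert are names I made up, and the converter is passed in as a callable so the directory handling doesn't depend on the parser. The only PgnParser methods it relies on are the two used in parse.php.

```php
<?php
// Sketch: write a .json file next to every .pgn/.PGN file in $dir.
// $convert takes a PGN path and returns an array of JSON strings, one per game.
// Returns the list of .json paths that were written.
function convertDirectory(string $dir, callable $convert): array
{
    $written = [];
    // Match both lowercase and uppercase extensions (TEN.PGN is uppercase).
    $files = array_merge(glob("$dir/*.pgn") ?: [], glob("$dir/*.PGN") ?: []);
    foreach (array_unique($files) as $pgnPath) {
        $jsonPath = preg_replace('/\.pgn$/i', '.json', $pgnPath);
        // One game per line, same as the output of parse.php.
        file_put_contents($jsonPath, implode("\n", $convert($pgnPath)) . "\n");
        $written[] = $jsonPath;
    }
    return $written;
}

// Hypothetical converter built on the same PgnParser calls as parse.php.
function pgnParserConvert(string $pgnPath): array
{
    $parser = new PgnParser($pgnPath);
    $games = [];
    $total = $parser->getNumberOfGames();
    for ($i = 0; $i < $total; $i++) {
        $games[] = $parser->getGameDetailsAsJson($i);
    }
    return $games;
}

// Usage, assuming PgnParser.class.php is in the current directory:
//   require "PgnParser.class.php";
//   convertDirectory("/path/to/chess", 'pgnParserConvert');
```

Each output file keeps the name of its source, so TEN.PGN would become TEN.json.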

Now you’re ready to import games.json.
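Before importing, it can be worth checking that every line of the file really is valid JSON, since one malformed line can trip up an import. Here is a rough sketch of such a check; invalidJsonLines is a hypothetical helper name, and it assumes the one-document-per-line layout that parse.php produces.

```php
<?php
// Hypothetical helper: return the 1-based numbers of lines that are not valid JSON.
function invalidJsonLines(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $bad = [];
    $lineNo = 0;
    while (($line = fgets($handle)) !== false) {
        $lineNo++;
        $line = trim($line);
        if ($line === '') {
            continue; // blank lines are skipped, not reported
        }
        json_decode($line);
        if (json_last_error() !== JSON_ERROR_NONE) {
            $bad[] = $lineNo;
        }
    }
    fclose($handle);
    return $bad;
}

// Usage: check games.json if it exists in the current directory.
if (is_file('games.json')) {
    $bad = invalidJsonLines('games.json');
    echo $bad
        ? 'Invalid JSON on lines: ' . implode(', ', $bad) . "\n"
        : "All lines are valid JSON\n";
}
```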

Back to the original “hacking” post

  • Christer Nilsson

    700 lines in PHP
    20 lines in Coffeescript:

    game = game.replace(/"/g,"")
    lines = game.split("\n")
    assert 10,lines.length
    hash = {}
    for i in [0..7]
      s = lines[i].replace("[","").replace("]","")
      arr = s.split(" ")
      hash[arr[0]]=arr[1]
    moves = lines[8].split(" ")
    list = []
    for move,i in moves
      if i%2==0
        white=move.split(".")[1]
      else
        black=move
        list.push [white,black]
        white=""
    if white != ""
      list.push [white]
    hash["moves"]=list
    alert JSON.stringify(hash)

  • Anonymous

    Cool! I’m not familiar with Coffeescript, can you do file IO with it?  (And no dumping on PHP. That package wasn’t great, but it does have a lot more functionality than I’m using above.)

  • Christer Nilsson

    CoffeeScript transpiles into JavaScript and has exactly the same features.
    Check out http://jashkenas.github.com/coffee-script/
    and the PGN2JSON code here http://tinkerbin.com/iY2VCcDF
    (change language to Coffeescript before running)
    I think CS and MongoDB is a perfect match!

  • Anonymous

    Nice 🙂

  • jsjohnst

    Since I can do the same thing in PHP:

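    $game = str_replace('"', '', $game);
    $lines = explode("\n", $game);
    $hash = array();
    for($i=0;$i<8;$i++) {
    	$arr = explode(" ", str_replace(array("[", "]"), "", $lines[$i]));
    	$hash[$arr[0]] = $arr[1];
    }
    $moves = explode(" ", $lines[8]);
    $list = array();
    foreach($moves as $turn=>$move) 
    	if($turn % 2 == 0) {
    		list($junk, $white) = explode(".", $move);
    	} else {
    		$list[] = array($white, $move);
    		$white = null;
    	}
    if($white) 
    	$list[] = array($white);
    $hash["moves"] = $list;
    print(json_encode($hash));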

    in the same number of lines, what do I win? 😉 Maybe the knowledge / experience to understand that every language is just a tool in a developer’s toolbox and that being arrogant about one’s preferred choice of language is rather juvenile… 🙂

  • Rolph

    Hi,
    I need to convert about 80 pgn files to json. If you are still on this project, can you tell me how to change the parser.php so that it converts all the pgn files in a folder to .json files of the same names? Also how to run this parser.php?
    It is probably a simple “for” or “while” loop, but I am not a programmer and understand PHP only as part of WordPress coding.

    Thanks

kristina chodorow's blog