Sharing Programming

I’m going to be volunteer teaching AP computer science this fall at a NYC high school! Aside from actually prepping them for the AP exam, I’ve been thinking about how to share the programming culture I love with the students. Off the top of my head, I’d like to tell them about:

Stuff you can do to program for fun:

  • Hackathons
  • Game jams
  • Project Euler

Where programmers hang out:

  • Github
  • StackOverflow
  • HackerNews
  • IRC

Programming culture and history:

  • Basic security: anyone can come up with a scheme that even they, themselves cannot break.
  • How the internet/websites work.
  • Notables in the field: stories about Stallman, Knuth, Linus.
  • Cartoons: XKCD… there must be others.
  • Obfuscated C (and other languages) contests.
  • Read Joel Spolsky’s blog.

I’m sure there’s loads of stuff I’m missing. Any other ideas?

I will gladly write a test Tuesday for a program today

When I started at Google last year, I was really impressed by their testing. Every C++ class had three files: a <classsname>.h file, a <classsname>.cc, and a <classname> Every time something new is implemented, it has to be tested. The code review tool even warns you if you add a new .h without an accompanying

The upside to this is that I am very sure that my code does what I want. There are, of course, still bugs, but generally they’re of the “I hadn’t thought of that case” rather than the “I didn’t implement it the way I meant to” variety.

assert(IsZebraPrint()) hits an edge case.

assert(IsZebraPrint()) hits an edge case.

A side effect is that writing tests forces a decent separation of concerns. If you’re throwing around singletons and hiding twenty layers of functionality in a class’s privates, you’re going to have a bad time. Conversely, if you’re making things testable, each class essentially becomes a wrapper for the resource below it: “I take a database connection and add some query logic,” “I take a storage wrapper and add some app-specific logic,” “I take app responses and present it to the user.” The whole application falls into beautiful, simple layers like a mille-feuille cake.

The downside is that writing tests is so. slow. It often takes me three times as long to write a test than it did to write the code. I think that, if you’re working at a startup, it’s actually probably not a good idea to have a culture of testing like this because it will slow down your coding so much. For most startups, getting something to market in 33% of the time that 90% works is much more important than getting it to 99%. In fact, Google hasn’t always had this culture. If you look a the dark corners of the code base, there are tons of old, untested classes.

I count myself as lucky to know the guy who actually inspired the testing culture that Google has now: Mike Bland. He’s been writing a series of articles on testing for Martin Fowler’s site. If you’re interested in testing, I recommend reading them.

I would gladly pay you Tuesday for a hamburger today.

Innards of Tar

The La Brea Carpets

I’ve been working with tar files a lot lately and I haven’t been able to find a good example of what a tar file looks like, byte-by-byte. The specification is the best reference I’ve found for how tar files are structured, but it isn’t exactly friendly. Here’s an interactive breakdown of what tar files look like on the inside.

First, we’ll make a directory and some files:

$ mkdir tar_test
$ cd tar_test
~/tar_test$ mkdir subdir0 subdir1 subdir2
~/tar_test$ echo content > file0
~/tar_test$ echo content > subdir1/file0
~/tar_test$ echo content > subdir2/file0

Feel free to put whatever files you want in here, it’s a pretty easy-to-understand format. If you’re feeling frisky, add some symlinks.

Now tar them up:

~/tar_test$ tar cvvf tar_test.tar *
-rw-r----- k/k     6 2014-05-15 16:29 file0
drwxr-x--- k/k     0 2014-05-15 16:29 subdir0/
drwxr-x--- k/k     0 2014-05-15 16:30 subdir1/
-rw-r----- k/k     6 2014-05-15 16:30 subdir1/file0
drwxr-x--- k/k     0 2014-05-15 16:30 subdir2/
-rw-r----- k/k     6 2014-05-15 16:30 subdir2/file0

And check out your tar file to make sure everything looks alright:

~/tar_test$ tar tf tar_test.tar

Tar files are organized into blocks of 512 bytes. Basically, the format of a tar file is:

Block # Description
0 Header
1 Content
2 Header
3 Content

If the content is longer than one block, it’ll be rounded up (so if you have a 1300-byte file, the tar entry will look like Header-Content-Content-Content). If an entry has no content (e.g., a directory or symbolic link) it only takes up one block. So, our tar file looks like:

Block # Description
0 Header for file0
1 Content of file0
2 Header for subdir0
3 Header for subdir1
4 Header for subdir1/file0
5 Content of subdir1/file0
6 Header for subdir2
7 Header for subdir2/file0
8 Content of subdir2/file0

Eight 512-byte blocks adds up to 4KB, but if we ls -lh the .tar, we get something bigger:

~/tar_test$ ls -lh tar_test.tar 
-rw-r----- 1 k k 10K May 16 15:19 tar_test.tar

There’s always an extra 1KB of 0s tacked onto the end of a .tar’s content as a footer, and there’s an implementation-dependent size tars are blocked up into (called the blocksize, which is different than the blocks discussed above). On my Linux machine, tar creates the 10KB archive shown above, on my OS X machine, it’s only 5.5KB.

Now we’re going to really look at the contents of the tar file, using hexdump. 512 bytes is 0×200 in hexidecimal, so each 200 is a new block in the archive.

~/tar_test$ hexdump -C tar_test.tar | more

You can see that the archive starts with the first entry’s filename:

00000000  66 69 6c 65 30 00 00 00  00 00 00 00 00 00 00 00  |file0...........|

Hexdump elides all-zero portions of the file, so the next interesting bit is the rest of the header:

00000060  00 00 00 00 30 30 30 30  36 34 30 00 30 36 30 31  |....0000640.0601|
00000070  34 35 34 00 30 30 31 31  36 31 30 00 30 30 30 30  |454.0011610.0000|
00000080  30 30 30 30 30 30 36 00  31 32 33 33 35 32 32 31  |0000008.12335221|
00000090  36 36 35 00 30 31 31 33  33 32 00 20 30 00 00 00  |665.011332. 0...|

Here are what the numbers are you’re seeing (you can look up these fields in the pax spec):

Mode (note that these are ASCII numbers: the byte values of ’0′ is 30)

Typeflag is the most interesting field here: it indicates the type of file (0 for normal files, 5 for directories). It can also b “x” to indicate an “extended header.” Extended headers are used to define your own fields or override fields in the header. For example, the header said that the mtime was 12335221665, but we could override that in an extended header with mtime=12345678901. If you have an extended header, the entry ends up taking an extra kilobyte of storage: one block for the extended header, and one block for a “normal” header which is identical to the initial header except contains the actual file type instead of “x”. So you’d have:

Block # Description
0 Header for file0 (typeflag=x)
1 Extended header of key=value pairs of attributes for file0
2 Header for file0 (typeflag=0)
3 Content of file0

The next part of the header is for links, so it’s all 0 for these normal files and directories. Then you finish up the header with:

00000900  00 75 73 74 61 72 20 20  00 6b 00 00 00 00 00 00  |.ustar  .k......|
00000910  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000920  00 00 00 00 00 00 00 00  00 6b 00 00 00 00 00 00  |.........k......|
00000930  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

“ustar” is a “magic” string that gives the tar format. The “k”s are my username and group name.

At 0×200 is the actual file content:

00000200  66 69 6c 65 30 0a 00 00  00 00 00 00 00 00 00 00  |content.........|

Then at 0×400, then next block (subdir0′s header) starts:

00000400  73 75 62 64 69 72 30 2f  00 00 00 00 00 00 00 00  |subdir0/........|

This is what tar looks like “under the covers.” It’s a lot more sparse than I thought it’d be, but I guess that’s where gzip comes in.

TEALS – Teaching CS on your way to work, part 2

If you’re in NYC and thinking about volunteering, there is another TEALS information session tonight.

After my last post on TEALS, Dan Goldin generously offered to answer some questions about his experience teaching students in Kentucky (remotely from NYC).

What class are you teaching? What are they learning now?

I’m currently teaching AP Computer Science at Lee County High School in Beatyville, KY. We’ve just finished covering the material that will be on the test and are having some fun with graphics and going over some practice AP questions.


How is it teaching remotely?

It’s challenging. I expected it to be tough on the technology front with the internet and screen sharing not always working but it’s surprisingly difficult to do the administrative work online. Since it’s remote, students will be submitting homework, quizzes, and tests online or via fax and then email. This makes grading and jotting down notes more difficult than it would be with paper. Another challenge is keeping everyone engaged which requires more effort remotely than in person. At the same time, we visited the school and it was great meeting the students and the teachers. Many people have been saying that schools and colleges will start teaching remotely and this is a great opportunity to see how it actually works and what challenges can arise.

Have there been any things that surprised you?

When I first started teaching the class I approached from a college lecturer angle but quickly discovered that that approach didn’t work with high school students. With high school students it’s important to make sure everyone is engaged with requires knowing what topics the students will have trouble with and multiple ways of presenting that information. The other surprise was how different students have different learning approaches. For some, just hearing an overview is enough while others need to visualize it to understand while others need to try it out and play with it in code before they get it.

A big surprise was how much school administration time takes up. There are field trips and club meetings that will take some students out of the classroom which makes it difficult to keep everyone on the same page since different students will be missing different topics.

Anything tougher/less tough than you anticipated?

The toughest problem has been figuring out lesson plans that will appeal to different types of students and making sure each of the students are moving at the same pace. Some students will get concepts quickly while others need a bit of reinforcement. In that situation you have to balance keeping the advanced student interested while other students may need more help. Especially in computer science where concepts build on top of one another, it’s easy to get behind so it’s dangerous to move too quickly.

I expected that we’d run into a ton of technical difficulties but for the most part we’ve been pretty successful. We’ve been using Microsoft’s Lync web conferencing software that makes it easy for us to both share our screens as well as log in to the students’ sessions so we can provide one-on-one feedback. Even with the remoteness it doesn’t feel as if we’re that far apart.

What kind of time commitment is it?

In addition to the full time teacher at the school there are 4 volunteers. Two are the main teachers and two are the teaching assistants so the work gets distributed. In our case, my coteacher, Gabe, and I alternate teaching days so we only have to prep for 2.5 days a week. In the beginning when we were ramping up we spent a lot more time ramping up and doing the administrative work but now that we’re comfortable I would say that each week involves a pretty even split between teaching and the administrative side. I would say when I started I spent around 6 hours a week on the class and now it’s closer to 4.

What to you do for a living?

I work as a data scientist/engineer at a startup in New York called TripleLift.


What would you tell a coworker who was interested in volunteering?

Give it a shot! I think it’s a great way to give back and get people interested in computer science. I know I’ve gotten lucky with my schooling that led me to where I am and it’s awesome being able to provide that experience to others.

Anything else you’d like to share about it?

Teaching remotely is only a small part of the TEALS program and most are done locally at nearby schools before the work day starts. Right now the TEALS program is looking to expand so if you have any interest in volunteering definitely attend an info session or reach out to me if you have any questions. Editor’s note: if you leave a comment, Dan will make sure to get back to you.


Thank you so much, Dan! And, I have to say, this is the first time I’ve heard someone not complain about a video conference system, so props to Microsoft Lync.

Mestre Boneco’s Training Sequences

I’ve been doing capoeira for about a year now. It’s very fun and a great martial art for geeks (singing! dancing! friendly people!). I really recommend it if you’re trying to get into shape or build strength/flexibility.

Different capoeira schools use different sequences of moves, and I’ve never been able to find the ones my class uses online. You can find the videos on YouTube, but as far as I know, they aren’t written down anywhere.

Until now!

First Sequence

Meia lua de frente, acaba atraz e ginga (finish back and ginga)
Desce trocando (go down and change) + meia lua de chão (from the floor)
Esquiva (2) da Ginga + Queixada
Esquiva (3) lateral + passada pras costas (walk from behind)

Second Sequence

Meia lua de frente, acaba atraz e ginga (finish back and ginga)
Desce trocando (go down and change) + meia lua de chão (from the floor)
Esquiva (2) da Ginga + Meia lua de compasso de tráz
Esquiva (walk from behind)

Third Sequence

Meia lua de frente, acaba atraz e ginga (finish back and ginga)
Esquiva (2) da Ginga e troca a base (switch base standing) + Queixada dire
Esquiva (2) da Ginga + queixada pras costas (walk from behind)
Puxeta + Meia lua de compasso contra (against)
Esquiva (2) da Ginga + Queixada
Esquiva (walk from behind)

Fourth Sequence

Armada entrando (with entrance)
Esquiva (3) Lateral + Queixada
Esquiva (2) da Ginga + Queixada
Martelo de tráz
Rasteira em Pé + Vingativa

Fifth Sequence

Armada entrando (with entrance)
Esquiva (2) da Ginga com troca de base (switch base from the floor) + Queixada direto (de tráz)
Esquiva (2) da Ginga com troca de base (switch base from the floor) + Meialua de compasso de tráz
Esquiva (3) Lateral + passada pras costas (walk from behind) + Meia lua de compasso contra (against) com Vontra (against)
Esquiva (3) lateral + passada pras costas (walk from behind)

Sixth Sequence

Armada entrando (with entrance)
Esquiva (2) da ginga + Queixada
Puxeta + Martelo cingativa
Meia lua de compasso de tráz
Passada pras costas

(Someone texted these sequences to me, so let me know if anything is incorrect in the comments.)

TEALS – Teaching CS on your way to work

Last night, I went to an information session about TEALS, a volunteer program where software engineers teach CS to high school students on their way to work. Basically, the schools schedule the CS class for first period so that the engineers can make it into work by 9:30. There are four programmer volunteers per class and one teacher from the school who, over the course of four years, learns how to teach CS.

TEALS is starting to expand into more New York City schools and had some inspiring stories. This semester, one group of volunteers is teaching at a school that is 100% English-as-a-second-language students. This took some experimentation, but they found that if they explain things in English and then break the class into groups, then the students who are more fluent in English can explain the concepts to the other students in Spanish.

This class will represent 12% of all hispanic students taking the AP CS test this year.

The school’s IT guy is super supportive of the program and stays after school almost every day to tutor the students and comes in on Saturday to open up the computer lab (many of the students don’t have computers at home, in fact, two are homeless).

If this made your heart grow three sizes, you can apply here or attend an information session.

Hello, Digital Ocean!

I recently switched this blog to using Digital Ocean for hosting, so please let me know if you notice anything broken.

On a side note, Digital Ocean is amazing and I highly recommend it to anyone who needs web hosting. You know how Github makes working with Git repos so pleasant? Digital Ocean does the same thing for web hosting. Down to step-by-step documentation for everything you need to do.

If you’re looking for hosting, try Digital Ocean. (They aren’t paying me for this or anything, I’ve just never had such an positive experience with hosting provider. Or, really, any positive experience with a hosting provider.)

On the flip side: Hostmonster is terrible. Never use Hostmonster. They verify your identity over the phone by asking for your password (!), autogenerate several .html files in your document root that you cannot delete, and nickle-and-dime you for stupid things.

Makin’ Mazes

After my previous post on the subtleties of CSS subpixel rendering, Andrew pointed out that readers might be more interested in how to dynamically generate mazes. It sounded crazy, but here we are.

First of all, if you’re interested in this stuff, there’s a great slideshow on maze generation here and more resources on the procedural content generation wiki.

Basically, your maze is made up of boxes:


You create the maze by making a grid of these boxes, and then Kool-Aid-maning your way through certain walls.

Ohhh yeah.

Ohhh yeah.

How do you choose which walls? That’s where the algorithm comes in.

First, you choose a square to start the maze at. You add this square to an “active set,” which is the set of squares that you’re currently enmazing:


Then you choose one of the square’s neighbors at random. If that square isn’t part of the maze yet, you connect it to the starting square and add it to the active set. If it is already part of the maze, you choose a different neighbor. If all of the neighbors are already part of the maze, you remove the starting square from the active set.

var new_node = startNode.getNeighbor();
if (new_node) {
} else {
    goog.array.remove(this.active_set_, startNode);

Now choose a member from the active set and repeat until the active set is empty.

Putting this all together, this looks like the following:

Maze.prototype.generate = function() {
    while (this.active_set_.length > 0) {
Maze.prototype.run_ = function() {
    var node = this.pickActiveNode();
    var new_node = node.getNeighbor();
    if (new_node) {
    } else {
        goog.array.remove(this.active_set_, node);

The actual full code for this is a couple hundred lines long and you can get it from my Github repo. However, the two functions above are the “meat,” the rest is just details.

Screen Shot 2014-01-18 at 12.22.27 PM

The most interesting part of the code above is the var node = this.pickActiveNode(); line. Depending on how you pick nodes from the active set, you get different “shaped” mazes. For example, here’s random:


30% choosing the first square in the active set:


60% choosing the first square in the active set:


Each maze has a different “character.” Pretty cool!

Update Your Feeds

Some RSS housekeeping:

If you haven’t subscribed, subscribe over RSS or email!

If you subscribed within the last couple of years, thank you and you don’t need to do anything.

If you subscribed more than three years ago, please make sure you’re subscribed to, not This site has been permanently redirecting to for years, but there are still a few subscribers on I’m going to let expire in about a month, so please update your subscription!

Thanks for reading, everyone!

Fixing CSS antialiasing on Chrome

I’ve been working on creating mazes with LimeJs and I ran into a problem:

Screen Shot 2014-01-18 at 12.42.13 PM

As you can see, this maze looks like it was rendered by a dying printer: there are horizontal white lines all over the place. I was going crazy trying to track down the cause: the squares were perfectly sized/placed and the padding and margins were all zero pixels. It looked fine in Firefox, just Chrome had the issue. Finally, Andrew figured out that it was antialiasing on pixel borders and, if I offset the images by less than a pixel, they wouldn’t be “rounded off.”

I tried offsetting everything by .5 pixels:

Screen Shot 2014-01-18 at 12.41.57 PM

Better, but not perfect. However, with .1 pixels:

Screen Shot 2014-01-18 at 12.41.34 PM

Good enough!

I made a patch for this but I haven’t made it work for masks so I’m not going to do a pull request for it (yet). However, you can cherry-pick it from my fork of LimeJs and use it by adding the following line to your start() function:

lime.Renderer.DOM.ANTIALIAS_OFFSET.y = .1;

If anyone knows anything about why Chrome antialiases stuff like this, I’d be interested in knowing (I found lots of sites complaining that it was weird, but none explaining it).

Happy hacking!

kristina chodorow's blog