API changes with extra cheese, hold the fear

Rihanna's dress

When you make a change, how do you know what tests to run? If you’re lucky, no one else depends on your code so you can just run your own tests and you’re done. However, if you’re writing useful code, other people are probably going to start depending on it. Once that happens, it becomes difficult to make changes without breaking them. Bazel can make this easier, by letting you figure out all of the targets that are depending on your code.

Suppose we are working on the pizza library and we need some cheese, so we create a cheese library and depend on it from pizza. If we look at our build graph, it will look something like this:

graph

//italian:pizza is depending on //ingredients:cheese, as expected.

A few weeks later, the macaroni team discovers that it could also use cheese, so it starts depending on our library. Now our build graph looks like this:

graph

Both our team’s pizza target and the macaroni team’s mac_lib target are depending on //ingredients:cheese. However, Team Macaroni never told us that they’re depending on cheese, so as far as we know, we’re still its only users. Suppose we decide to make a backwards-incompatible change (e.g., make Cheese::setMilkfat() private). We make our change, run all of the pizza- and cheese-related tests, submit it… and break //american:mac_and_cheese as well as a dozen other projects that were calling setMilkfat() (that we didn’t know about).

If we had known that other people were depending on our code, we could have let them know that they needed to update their API usage. But how could we find out? With Bazel, we can query for everyone depending on our library:

$ bazel query 'rdeps(//..., //ingredients:cheese)'

This means: “query for every target in our workspace that depends on //ingredients:cheese.”

Now we can check that everything in our code base still builds with our cheese changes by running:

$ bazel build $(bazel query 'rdeps(//..., //ingredients:cheese)')

Just because they built doesn’t mean they work correctly! We can then find all of the tests that depend on cheese and run them:

$ bazel test $(bazel query 'kind(test, rdeps(//..., //ingredients:cheese))')

Unpacking that from the innermost parentheses, that means: “find the targets depending on //ingredients:cheese (rdeps(...)), search those for targets that are tests (kind(test, ...)), and run all of those targets (bazel test ...).”
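
If you’re curious what rdeps is doing conceptually, here is a minimal Python sketch of a reverse-dependency search over a hand-written graph. It is not how Bazel implements queries, and the edges for the macaroni team’s targets (//american:mac_lib and what depends on it) are assumptions made up for illustration.

```python
from collections import deque

# Hypothetical build graph: each target maps to the targets it depends on.
DEPS = {
    "//italian:pizza": ["//ingredients:cheese"],
    "//american:mac_lib": ["//ingredients:cheese"],
    "//american:mac_and_cheese": ["//american:mac_lib"],
    "//ingredients:cheese": [],
}


def rdeps(deps, target):
    """Return target plus everything that depends on it, directly or not."""
    # Invert the edges so we can walk from a target to its users.
    users = {name: [] for name in deps}
    for name, children in deps.items():
        for child in children:
            users.setdefault(child, []).append(name)

    # Breadth-first walk over the inverted edges.
    seen = {target}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for user in users.get(current, []):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen


affected = rdeps(DEPS, "//ingredients:cheese")
print(sorted(affected))
```

In this sketch the result includes the starting target itself, so building everything in the set also rebuilds cheese.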

Running that set of builds and tests is a pretty good check that everything that depends on cheese still works. I mean, if they didn’t write a test for it, it can’t matter too much, right?

macandcheese1

Right.

Have you ever looked at your build? I mean, really looked at your build?

Bazel has a feature that lets you see a graph of your build dependencies. It could help you debug things, but honestly it’s just really cool to see what your build is doing.

To try it out, you’ll need a project that uses Bazel to build. If you don’t have one handy, here’s a tiny workspace you can use:

$ git clone https://github.com/kchodorow/tiny-workspace.git
$ cd tiny-workspace

Make sure you’ve downloaded and installed Bazel and add the following line to your ~/.bazelrc:

query --package_path %workspace%:[path to bazel]/base_workspace

There should already be a line in your ~/.bazelrc that is almost identical to this, but starts with “build”. So, when you’re done, it’ll look something like:

build --package_path %workspace%:/home/k/gitroot/bazel/base_workspace
query --package_path %workspace%:/home/k/gitroot/bazel/base_workspace

(except your username probably isn’t “k”).

Now run bazel query in your tiny-workspace/ directory, asking it to search for all dependencies of //:main and format the output as a graph:

$ bazel query 'deps(//:main)' --output graph > graph.in

This creates a file called graph.in, which is a text representation of the build graph. You can use dot (install with sudo apt-get install graphviz) to create a png from this:

$ dot -Tpng < graph.in > graph.png

If you open up graph.png, you should see something like this:

graph

You can see //:main depends on one file (//:main.cc) and four targets (//:x, //tools/cpp:stl, //tools/default:crosstool, and //tools/cpp:malloc). All of the //tools targets are implicit dependencies of any C++ target: every C++ build you do needs the right compiler, flags, and libraries available, but they crowd your result graph. You can exclude these implicit dependencies by removing them from your query results:

$ bazel query --noimplicit_deps 'deps(//:main)' --output graph > simplified_graph.in

Now the resulting graph is just:

graph

Much neater!
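
The graph files are plain Graphviz dot text, so you can get a feel for them with a few lines of Python. This sketch hand-writes the edges described above and emits a dot digraph, optionally dropping anything under //tools. The prefix filter is only a stand-in for illustration: it is not how --noimplicit_deps decides what to hide, and the exact text Bazel emits may differ.

```python
# Edges as described above for //:main (hand-copied, not read from Bazel).
EDGES = [
    ("//:main", "//:main.cc"),
    ("//:main", "//:x"),
    ("//:main", "//tools/cpp:stl"),
    ("//:main", "//tools/default:crosstool"),
    ("//:main", "//tools/cpp:malloc"),
]


def to_dot(edges, hide_prefix=None):
    """Render (source, destination) pairs as Graphviz dot text."""
    lines = ["digraph build {"]
    for src, dst in edges:
        # Skip any edge that touches a label under the hidden prefix.
        if hide_prefix and (src.startswith(hide_prefix) or dst.startswith(hide_prefix)):
            continue
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


full_graph = to_dot(EDGES)
simplified_graph = to_dot(EDGES, hide_prefix="//tools")
print(simplified_graph)
```

Saving the printed text to a file and running it through dot -Tpng should give a picture much like the simplified graph.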

If you’re interested in further refining your query, check out the docs on querying.

Hello, Bazel

Bazel Logo

Yesterday, my team open-sourced Bazel, the build system Google uses for most of its software. We have been working on open-sourcing Bazel for over a year, extricating dependencies, renaming and refactoring, and jumping through legal and political hoops. We were still missing a lot of stuff we wanted to add, but we thought it would be useful to get a less complete project out there and start getting some feedback from “friends and family.” So we hit the “make public” button on GitHub and IM-ed some friends. “We on Hacker News yet?” someone joked. We checked. We were. Over the next half-hour, we rose to #1 on Hacker News and stayed there all day. Twitter exploded with hundreds of tweets about Bazel and we started getting a constant stream of issues and pull requests. Our “press the button on GitHub” meeting turned into an all-day war room, responding to users and fixing documentation and setup issues.

It was exhilarating and amazing. I knew that a lot of people were excited to try Bazel, but this response has exceeded all of my expectations.

I hope that everyone will bear with us as we work the kinks out. Hugely important missing pieces that I can think of off the top of my head:

  • No binaries available – you have to compile from source.
  • No externally available continuous integration – no guarantee the code is actually compiling.
  • Terrible setup process – you have to manually add a WORKSPACE file and symlink the tools/ directory to make your project buildable with Bazel.

Please do give Bazel a try if you’re interested, give us feedback, and, if you hate it now, give it another try in a couple of months when we actually launch!

Making wedding rings

This weekend, Andrew and I made our own wedding rings. We’ve been married for several years, but we never got around to getting rings. We found out about a guy in NYC who does ring-making workshops: you come to his studio and spend a day making personalized, custom rings. It was fun, and now we have very special rings!

Here’s what we started with at 10am:

startingpoint

We made each other’s bands, so Andrew used the thin piece and I used the thick piece. The long strip was for the side rails.

We were using palladium, which we had to anneal (make hot) to make it bendy. When palladium gets hot it turns purple, which is interesting:

annealed

Once it was bendy, we used pliers to bend it into something roughly ring-shaped:

grawr

It was more D-shaped, but the point was just to connect the ends, which we soldered together:

soldering

After some shaping, we had to hammer the bands so that we could get the “beaten” appearance we wanted:

hammering

We had to re-anneal the rings several times while beating them, so they wouldn’t get brittle:

annealing

Then we had to create two more “rings” for rails and stick everything together with solder:

separate

Finally we polished the things:

polishing

It took us twelve hours, but I think they came out pretty good:

finished

I highly recommend it as a fun and romantic way to get wedding rings.

Laptops are getting smaller all the time

As a “thank you” for hosting an intern this summer, Google gave me a little Android figurine. When I took it out of its box, a little backpack fell out, too. The backpack actually zipped and unzipped, but it didn’t have anything in it. So I decided to make a MacBook Air for it.

First, I made the Apple logo at a reasonable size as a small tube:

IMG_0178

Then I rolled out the tube to miniaturize the logo:

IMG_0180

I didn’t roll it out quite evenly enough, so I lost the leaf. However, the apple’s shape came out pretty well:

IMG_0181

I sliced off a piece of my “Apple tube” and dropped it into a grey rectangle for the MacBook body. Then I had to add back in the leaf. The logo was so tiny at this point that even the tip of a pin was a little big for the amount of clay I was working with:

IMG_0182

Finally, I baked it and put it in the backpack!

14 - 1

Teaching CS

I taught my first AP CS class on Thursday. I was wearing a Google T-shirt (it was a “nice” one, have to dress up for the first day of school) so the first thing the students asked me was, “Do you work for Google?” Then: “Can we visit Google?” And: “Will this help us get an internship there?”

I started out with a little “why learn programming/why programming is cool” spiel. I showed them Abundant Music and let them try out my Cardboard, both of which seemed to impress them. Next week, we’re going to discuss Net Neutrality!

cardboard

Sharing Programming

I’m going to be volunteer teaching AP computer science this fall at a NYC high school! Aside from actually prepping them for the AP exam, I’ve been thinking about how to share the programming culture I love with the students. Off the top of my head, I’d like to tell them about:

Stuff you can do to program for fun:

  • Hackathons
  • Game jams
  • Project Euler

Where programmers hang out:

  • GitHub
  • Stack Overflow
  • Hacker News
  • IRC

Programming culture and history:

  • Basic security: anyone can come up with a scheme that they themselves cannot break.
  • How the internet/websites work.
  • Notables in the field: stories about Stallman, Knuth, Linus.
  • Cartoons: XKCD… there must be others.
  • Obfuscated C (and other languages) contests.
  • Read Joel Spolsky’s blog.

I’m sure there’s loads of stuff I’m missing. Any other ideas?

I will gladly write a test Tuesday for a program today

When I started at Google last year, I was really impressed by their testing. Every C++ class had three files: a <classname>.h file, a <classname>.cc, and a <classname>_test.cc. Every time something new is implemented, it has to be tested. The code review tool even warns you if you add a new .h without an accompanying _test.cc.

The upside to this is that I am very sure that my code does what I want. There are, of course, still bugs, but generally they’re of the “I hadn’t thought of that case” rather than the “I didn’t implement it the way I meant to” variety.

assert(IsZebraPrint()) hits an edge case.

A side effect is that writing tests forces a decent separation of concerns. If you’re throwing around singletons and hiding twenty layers of functionality in a class’s privates, you’re going to have a bad time. Conversely, if you’re making things testable, each class essentially becomes a wrapper for the resource below it: “I take a database connection and add some query logic,” “I take a storage wrapper and add some app-specific logic,” “I take app responses and present them to the user.” The whole application falls into beautiful, simple layers like a mille-feuille cake.
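
To make the layering concrete, here is a small sketch of the idea in Python (not the C++ from work). Every class and method name here is invented for illustration. The point is only that each layer receives the layer below it as a constructor argument, so a test can hand in a fake.

```python
class FakeConnection:
    """Test double standing in for a real database connection."""

    def __init__(self, rows):
        self.rows = rows

    def execute(self, query):
        # A real connection would run the query; the fake returns canned rows.
        return self.rows


class UserStore:
    """Takes a database connection and adds some query logic."""

    def __init__(self, connection):
        self.connection = connection

    def names(self):
        rows = self.connection.execute("SELECT name FROM users")
        return [row["name"] for row in rows]


class Greeter:
    """Takes a storage wrapper and adds some app-specific logic."""

    def __init__(self, store):
        self.store = store

    def greetings(self):
        return [f"Hello, {name}!" for name in self.store.names()]


# In a test, the bottom layer is swapped for the fake.
greeter = Greeter(UserStore(FakeConnection([{"name": "Andrew"}])))
print(greeter.greetings())
```

Because nothing reaches for a singleton, the test never needs a real database.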

The downside is that writing tests is so. slow. It often takes me three times as long to write a test as it did to write the code. I think that, if you’re working at a startup, it’s actually probably not a good idea to have a culture of testing like this because it will slow down your coding so much. For most startups, getting something that 90% works to market in 33% of the time is much more important than getting it to 99%. In fact, Google hasn’t always had this culture. If you look at the dark corners of the code base, there are tons of old, untested classes.

I count myself as lucky to know the guy who actually inspired the testing culture that Google has now: Mike Bland. He’s been writing a series of articles on testing for Martin Fowler’s site. If you’re interested in testing, I recommend reading them.

I would gladly pay you Tuesday for a hamburger today.

Innards of Tar

The La Brea Carpets

I’ve been working with tar files a lot lately and I haven’t been able to find a good example of what a tar file looks like, byte-by-byte. The specification is the best reference I’ve found for how tar files are structured, but it isn’t exactly friendly. Here’s an interactive breakdown of what tar files look like on the inside.

First, we’ll make a directory and some files:

$ mkdir tar_test
$ cd tar_test
~/tar_test$ mkdir subdir0 subdir1 subdir2
~/tar_test$ echo file0 > file0
~/tar_test$ echo file0 > subdir1/file0
~/tar_test$ echo file0 > subdir2/file0

Feel free to put whatever files you want in here; it’s a pretty easy-to-understand format. If you’re feeling frisky, add some symlinks.

Now tar them up:

~/tar_test$ tar cvvf tar_test.tar *
-rw-r----- k/k     6 2014-05-15 16:29 file0
drwxr-x--- k/k     0 2014-05-15 16:29 subdir0/
drwxr-x--- k/k     0 2014-05-15 16:30 subdir1/
-rw-r----- k/k     6 2014-05-15 16:30 subdir1/file0
drwxr-x--- k/k     0 2014-05-15 16:30 subdir2/
-rw-r----- k/k     6 2014-05-15 16:30 subdir2/file0

And check out your tar file to make sure everything looks alright:

~/tar_test$ tar tf tar_test.tar
file0
subdir0/
subdir1/
subdir1/file0
subdir2/
subdir2/file0

Tar files are organized into blocks of 512 bytes. Basically, the format of a tar file is:

Block # Description
0 Header
1 Content
2 Header
3 Content

If the content is longer than one block, it’ll be rounded up (so if you have a 1300-byte file, the tar entry will look like Header-Content-Content-Content). If an entry has no content (e.g., a directory or symbolic link) it only takes up one block. So, our tar file looks like:

Block # Description
0 Header for file0
1 Content of file0
2 Header for subdir0
3 Header for subdir1
4 Header for subdir1/file0
5 Content of subdir1/file0
6 Header for subdir2
7 Header for subdir2/file0
8 Content of subdir2/file0
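
The block counting above can be sketched in a few lines of Python. The entry sizes are the ones tar printed in its listing; blocks_for_entry is just a name made up for this sketch.

```python
BLOCK_SIZE = 512


def blocks_for_entry(content_size):
    """One header block plus enough content blocks to hold content_size bytes."""
    content_blocks = (content_size + BLOCK_SIZE - 1) // BLOCK_SIZE  # round up
    return 1 + content_blocks


# (name, size in bytes) from the tar listing; directories have no content.
entries = [
    ("file0", 6),
    ("subdir0/", 0),
    ("subdir1/", 0),
    ("subdir1/file0", 6),
    ("subdir2/", 0),
    ("subdir2/file0", 6),
]

total_blocks = sum(blocks_for_entry(size) for _, size in entries)
print(total_blocks)
print(blocks_for_entry(1300))  # header plus three content blocks
```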

Nine 512-byte blocks add up to 4.5KB, but if we ls -lh the .tar, we get something bigger:

~/tar_test$ ls -lh tar_test.tar 
-rw-r----- 1 k k 10K May 16 15:19 tar_test.tar

There’s always an extra 1KB of 0s tacked onto the end of a .tar’s content as a footer, and tar also pads the archive out to a multiple of an implementation-dependent size (called the blocksize, which is different from the 512-byte blocks discussed above). On my Linux machine, tar creates the 10KB archive shown above; on my OS X machine, it’s only 5.5KB.
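
Here is a rough sketch of that arithmetic. The nine entry blocks come from the table above, and the footer is the 1KB of zeros (two blocks). The padding sizes are assumptions: 10240 bytes (20 blocks) is GNU tar’s default, and 512 bytes for OS X is only a guess that happens to reproduce the 5.5KB figure.

```python
BLOCK_SIZE = 512
FOOTER_BLOCKS = 2  # the 1KB of zeros at the end of the archive


def archive_size(entry_blocks, pad_to):
    """Size in bytes once the footer is added and the file is padded to pad_to."""
    raw = (entry_blocks + FOOTER_BLOCKS) * BLOCK_SIZE
    chunks = (raw + pad_to - 1) // pad_to  # round up to a whole number of chunks
    return chunks * pad_to


linux_size = archive_size(9, 20 * BLOCK_SIZE)  # assumed GNU tar default padding
osx_size = archive_size(9, BLOCK_SIZE)         # assumed: no extra padding
print(linux_size, osx_size)
```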

Now we’re going to really look at the contents of the tar file, using hexdump. 512 bytes is 0x200 in hexadecimal, so each multiple of 0x200 is the start of a new block in the archive.

~/tar_test$ hexdump -C tar_test.tar | more

You can see that the archive starts with the first entry’s filename:

00000000  66 69 6c 65 30 00 00 00  00 00 00 00 00 00 00 00  |file0...........|

Hexdump elides all-zero portions of the file, so the next interesting bit is the rest of the header:

00000060  00 00 00 00 30 30 30 30  36 34 30 00 30 36 30 31  |....0000640.0601|
00000070  34 35 34 00 30 30 31 31  36 31 30 00 30 30 30 30  |454.0011610.0000|
00000080  30 30 30 30 30 30 36 00  31 32 33 33 35 32 32 31  |0000006.12335221|
00000090  36 36 35 00 30 31 31 33  33 32 00 20 30 00 00 00  |665.011332. 0...|

Here’s what the numbers you’re seeing mean (you can look up these fields in the pax spec):

Value Field
0000640 Mode (note that these are octal numbers written as ASCII digits: the byte value of ‘0’ is 0x30)
0601454 UID
0011610 GID
00000000006 Size
12335221665 mtime
011332 chksum
0 typeflag
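
If you’d rather not decode the octal by hand, here is a small Python sketch that does it for the 64 bytes shown in the hexdump above. The field offsets and lengths are the ones the ustar header layout gives. The helper name octal_field is made up for this sketch. Note that the hex bytes give a size of 6, matching the listing tar printed.

```python
import stat
from datetime import datetime, timezone

# Bytes 0x60-0x9f of the archive, copied from the hexdump above.
CHUNK_START = 0x60
chunk = bytes.fromhex(
    "00 00 00 00 30 30 30 30 36 34 30 00 30 36 30 31 "
    "34 35 34 00 30 30 31 31 36 31 30 00 30 30 30 30 "
    "30 30 30 30 30 30 36 00 31 32 33 33 35 32 32 31 "
    "36 36 35 00 30 31 31 33 33 32 00 20 30 00 00 00"
)

# (offset within the header, length) for each numeric field.
NUMERIC_FIELDS = {
    "mode": (100, 8),
    "uid": (108, 8),
    "gid": (116, 8),
    "size": (124, 12),
    "mtime": (136, 12),
    "chksum": (148, 8),
}


def octal_field(offset, length):
    """Decode one NUL/space-terminated ASCII octal field from the chunk."""
    start = offset - CHUNK_START
    raw = chunk[start : start + length]
    return int(raw.rstrip(b"\0 ").decode("ascii"), 8)


fields = {name: octal_field(*spec) for name, spec in NUMERIC_FIELDS.items()}
typeflag = chunk[156 - CHUNK_START : 157 - CHUNK_START].decode("ascii")

print(stat.filemode(stat.S_IFREG | fields["mode"]))
print(fields["size"], "bytes")
print(datetime.fromtimestamp(fields["mtime"], tz=timezone.utc))
```

The mode comes out as -rw-r----- and the size as 6 bytes, the same as in the tar cvvf output.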

Typeflag is the most interesting field here: it indicates the type of file (0 for normal files, 5 for directories). It can also be “x” to indicate an “extended header.” Extended headers are used to define your own fields or override fields in the header. For example, the header said that the mtime was 12335221665, but we could override that in an extended header with mtime=12345678901. If you have an extended header, the entry ends up taking an extra kilobyte of storage: one block for the extended header, and one block for a “normal” header, which is identical to the initial header except that it contains the actual file type instead of “x”. So you’d have:

Block # Description
0 Header for file0 (typeflag=x)
1 Extended header of key=value pairs of attributes for file0
2 Header for file0 (typeflag=0)
3 Content of file0

The next part of the header is for links, so it’s all 0 for these normal files and directories. Then you finish up the header with:

00000100  00 75 73 74 61 72 20 20  00 6b 00 00 00 00 00 00  |.ustar  .k......|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000120  00 00 00 00 00 00 00 00  00 6b 00 00 00 00 00 00  |.........k......|
00000130  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

“ustar” is a “magic” string that gives the tar format. The “k”s are my username and group name.

At 0x200 is the actual file content:

00000200  66 69 6c 65 30 0a 00 00  00 00 00 00 00 00 00 00  |file0...........|

Then at 0x400, the next block (subdir0’s header) starts:

00000400  73 75 62 64 69 72 30 2f  00 00 00 00 00 00 00 00  |subdir0/........|

This is what tar looks like “under the covers.” It’s a lot more sparse than I thought it’d be, but I guess that’s where gzip comes in.

TEALS – Teaching CS on your way to work, part 2

If you’re in NYC and thinking about volunteering, there is another TEALS information session tonight.

After my last post on TEALS, Dan Goldin generously offered to answer some questions about his experience teaching students in Kentucky (remotely from NYC).

What class are you teaching? What are they learning now?

I’m currently teaching AP Computer Science at Lee County High School in Beattyville, KY. We’ve just finished covering the material that will be on the test and are having some fun with graphics and going over some practice AP questions.

beattyville

How is it teaching remotely?

It’s challenging. I expected it to be tough on the technology front with the internet and screen sharing not always working but it’s surprisingly difficult to do the administrative work online. Since it’s remote, students will be submitting homework, quizzes, and tests online or via fax and then email. This makes grading and jotting down notes more difficult than it would be with paper. Another challenge is keeping everyone engaged which requires more effort remotely than in person. At the same time, we visited the school and it was great meeting the students and the teachers. Many people have been saying that schools and colleges will start teaching remotely and this is a great opportunity to see how it actually works and what challenges can arise.

Have there been any things that surprised you?

When I first started teaching the class I approached it from a college lecturer angle but quickly discovered that that approach didn’t work with high school students. With high school students it’s important to make sure everyone is engaged, which requires knowing what topics the students will have trouble with and having multiple ways of presenting that information. The other surprise was how different students have different learning approaches. For some, just hearing an overview is enough, while others need to visualize it to understand, and still others need to try it out and play with it in code before they get it.

A big surprise was how much school administration time takes up. There are field trips and club meetings that will take some students out of the classroom which makes it difficult to keep everyone on the same page since different students will be missing different topics.

Anything tougher/less tough than you anticipated?

The toughest problem has been figuring out lesson plans that will appeal to different types of students and making sure each of the students is moving at the same pace. Some students will get concepts quickly while others need a bit of reinforcement. In that situation you have to balance keeping the advanced student interested while other students may need more help. Especially in computer science, where concepts build on top of one another, it’s easy to get behind, so it’s dangerous to move too quickly.

I expected that we’d run into a ton of technical difficulties but for the most part we’ve been pretty successful. We’ve been using Microsoft’s Lync web conferencing software that makes it easy for us to both share our screens as well as log in to the students’ sessions so we can provide one-on-one feedback. Even with the remoteness it doesn’t feel as if we’re that far apart.

What kind of time commitment is it?

In addition to the full-time teacher at the school there are 4 volunteers. Two are the main teachers and two are the teaching assistants, so the work gets distributed. In our case, my coteacher, Gabe, and I alternate teaching days so we only have to prep for 2.5 days a week. In the beginning, while we were ramping up, we spent a lot more time getting set up and doing the administrative work, but now that we’re comfortable I would say that each week involves a pretty even split between teaching and the administrative side. I would say when I started I spent around 6 hours a week on the class and now it’s closer to 4.

What do you do for a living?

I work as a data scientist/engineer at a startup in New York called TripleLift.

triplelift

What would you tell a coworker who was interested in volunteering?

Give it a shot! I think it’s a great way to give back and get people interested in computer science. I know I’ve gotten lucky with my schooling that led me to where I am and it’s awesome being able to provide that experience to others.

Anything else you’d like to share about it?

Teaching remotely is only a small part of the TEALS program; most classes are taught locally at nearby schools before the work day starts. Right now the TEALS program is looking to expand, so if you have any interest in volunteering, definitely attend an info session or reach out to me if you have any questions. Editor’s note: if you leave a comment, Dan will make sure to get back to you.



——————–

Thank you so much, Dan! And, I have to say, this is the first time I’ve heard someone not complain about a video conference system, so props to Microsoft Lync.

kristina chodorow's blog