Low-fat Skylark rules – saving memory with depsets

In my previous post on aspects, I used a Bazel aspect to generate a simple Makefile for a project. In particular, I passed a list of .o files up the tree like so:

  dotos = [ctx.label.name + ".o"]
  for dep in ctx.rule.attr.deps:
    # Create a new array by concatenating this .o with all previous .o's.
    dotos += dep.dotos
 
  return struct(dotos = dotos)

In a toy example, this works fine. However, in a real project, we might have tens of thousands of .o files across the build tree. Every cc_library would create a new array and copy every .o file into it, only to move up the tree and make another copy. It’s very inefficient.

Enter nested sets. Basically, you can create a set with pointers to other sets, which isn’t inflated until its needed. Thus, you can build up a set of dependencies using minimal memory.

To use nested sets instead of arrays in the previous example, replace the lists in the code with depset and |:

  dotos = depset([ctx.label.name + ".o"])
  for dep in ctx.rule.attr.deps:
    dotos = dotos | dep.dotos

Nested sets use | for union-ing two sets together.

“Set” isn’t a great name for this structure (IMO), since they’re actually trees and, if you think of them as sets, you’ll be very confused about their ordering if you try to iterate over them.

For example, let’s say you have the following macro in a .bzl file:

def order_test():
  srcs = depset(["src1", "src2"])
  first_deps = depset(["dep1", "dep2"])
  second_deps = depset(["dep3", "dep4"])
  src_and_deps = srcs | first_deps
  everything = second_deps | src_and_deps
 
  for item in everything:
    print(item)

Now call this from a BUILD file:

load('//:playground.bzl', 'order_test')
order_test()

And “build” the BUILD file to run the function:

$ bazel build //:BUILD
WARNING: /usr/local/google/home/kchodorow/test/a/playground.bzl:7:5: dep1.
WARNING: /usr/local/google/home/kchodorow/test/a/playground.bzl:7:5: dep2.
WARNING: /usr/local/google/home/kchodorow/test/a/playground.bzl:7:5: src1.
WARNING: /usr/local/google/home/kchodorow/test/a/playground.bzl:7:5: src2.
WARNING: /usr/local/google/home/kchodorow/test/a/playground.bzl:7:5: dep3.
WARNING: /usr/local/google/home/kchodorow/test/a/playground.bzl:7:5: dep4.

How did that code end up generating that ordering? We start off with one set containing src1 and src2:

Add the first deps:

And then create a deps set and add the tree we previously created to it:

Then the iterator does a postorder traversal.

This is just the default ordering, you can specify a different ordering. See the docs for more info on depset.

kristina chodorow's blog