A Neat C Preprocessor Trick

I’ve been looking at Clang and they define lexer tokens in a way that I thought was clever.

The challenge is: how do you keep a single list of language tokens but use them as both an enum and a list of strings?

Clang defines C token types in a file, TokenKinds.def, with all of the names of the different C language tokens (pretend C only has four tokens for now):

#ifndef TOK
#define TOK(X)
#endif
 
TOK(comment)
TOK(identifier)
TOK(string_literal)
TOK(char_constant)
 
#undef TOK

If you just #include this file, TOK isn’t defined yet, so the file defines TOK(X) to expand to nothing and the whole thing becomes an empty file.

However! When they want a declaration of all possible tokens that could be used, they make an enum of this list like this:

enum TokenKind {
#define TOK(X) X,
#include "clang/Basic/TokenKinds.def"
    NUM_TOKENS
};

Because TOK is defined when TokenKinds.def is included, the preprocessor will spit out something like:

enum TokenKind {
    comment,
    identifier,
    string_literal,
    char_constant,
    NUM_TOKENS
};

This has the nice property that you can check if a token kind is valid by making sure that it is less than NUM_TOKENS. But if we’re going to put the tokens into that enum, wouldn’t it be clearer just to put them there, instead of in a separate file? Maybe, but doing it this way gives them a nice way to get a string representation of the kinds, too. In another file, they do:

const char* const TokNames[] = {
#define TOK(X) #X,
#include "clang/Basic/TokenKinds.def"
    0
};

“#X” is the preprocessor’s stringizing operator: it substitutes the macro argument for X and surrounds it in quotes, so that turns into:

const char* const TokNames[] = {
    "comment",
    "identifier",
    "string_literal",
    "char_constant",
    0
};
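
One detail about # that is easy to trip over: it quotes the argument exactly as it was spelled, without expanding it first. Here is a minimal sketch of that behavior. STR, XSTR and LIMIT are hypothetical names made up for illustration, not anything from Clang:

```c
#define STR(X)  #X        /* stringize the argument exactly as written */
#define XSTR(X) STR(X)    /* expand the argument first, then stringize */
#define LIMIT 4           /* hypothetical macro, for illustration only */

/* The argument is a macro, but # does not expand it. */
const char *const spelled = STR(LIMIT);            /* "LIMIT" */

/* The extra level of indirection lets LIMIT expand before # is applied. */
const char *const expanded = XSTR(LIMIT);          /* "4" */

/* A plain identifier, as in the token list, is simply quoted. */
const char *const token = STR(string_literal);     /* "string_literal" */
```

For a token list this is usually what you want, since the names are plain identifiers and get quoted as-is.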

Now if they have a token, they can say TokNames[token.kind] to get the string name of that token. It lets them use the token types efficiently, print them out nicely for debugging, and not have to maintain multiple lists of tokens.
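
To try the whole idea in one file, here is a rough sketch. It is a variation, not what Clang does: instead of a separate .def file, the list is a macro (TOKEN_LIST, a name I made up) that takes the macro to apply as its parameter. AS_ENUM, AS_NAME, token_name and token_kind are hypothetical helpers for illustration too:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical single-file stand-in for TokenKinds.def: the list takes the
   macro to apply as a parameter instead of being #included. */
#define TOKEN_LIST(TOK) \
    TOK(comment)        \
    TOK(identifier)     \
    TOK(string_literal) \
    TOK(char_constant)

/* First use: each entry becomes an enumerator. */
#define AS_ENUM(X) X,
enum TokenKind {
    TOKEN_LIST(AS_ENUM)
    NUM_TOKENS
};
#undef AS_ENUM

/* Second use: each entry becomes a string, in the same order. */
#define AS_NAME(X) #X,
static const char *const TokNames[] = {
    TOKEN_LIST(AS_NAME)
    0
};
#undef AS_NAME

/* Return the name for a kind, or NULL when kind is out of range. */
const char *token_name(int kind)
{
    if (kind < 0 || kind >= NUM_TOKENS)
        return NULL;
    return TokNames[kind];
}

/* Reverse lookup by linear scan; returns NUM_TOKENS when not found. */
enum TokenKind token_kind(const char *name)
{
    int i;
    for (i = 0; i < NUM_TOKENS; i++)
        if (strcmp(TokNames[i], name) == 0)
            return (enum TokenKind)i;
    return NUM_TOKENS;
}
```

Since both the enum and the array come from the same list, adding a token means touching one line, and token_name(identifier) keeps returning "identifier" no matter where the new entry lands.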

  • David Schneider

    Using Boost’s preprocessor library you can have it even more flexible. No separate file needed: http://pastebin.com/TKG9W9TL
    PS. example compiles with boost 1.44

  • Boostfag

    Are you aware of ugly fuck you just linked ? Good luck boosting your shit.
    FYI:
    http://harmful.cat-v.org/software/c++/linus

  • kristina1

    Dude, don’t be a dick.

  • kristina1

    Cool! I like learning different ways of implementing this kind of thing, it’s a neat mental exercise.

kristina chodorow's blog