I imagine that most people think of the C preprocessor as a very basic tool which supports defining symbols for constant values, writing simple macros, and including other source or header files.  But how many would consider the C preprocessor capable of becoming a programmable language in itself?

By using some preprocessor magic, it is possible to evaluate conditionals with the preprocessor at compile time and generate code, in a manner similar to how values are evaluated in template metaprogramming.  Of course, Boost has its own preprocessor library, but if you’re interested in learning how it works internally without having your eyes glaze over while reading through the headers, then read on!

Joining Strings

The magic key necessary to begin our (ab)use of the C preprocessor is the join operator ##, which simply joins two tokens together into a single one:

#define JOIN(x,y)		x##y

As an example, we can use this macro to join two name fragments into the name of a variable in our code, like so:

const char *foobar = "hello";
const char *str = JOIN(foo, bar);

Which, when preprocessed, converts to the following:

const char *foobar = "hello";
const char *str = foobar;

You can see how the content of the JOIN() macro was evaluated, and the result was inserted into the source.

Now what would be really neat is if we could use JOIN() to combine the values of two preprocessor symbols.  Unfortunately, with our JOIN() macro as it is, a symbol passed as a parameter is joined by name: the preprocessor does not expand arguments that sit directly next to ##, so the symbol's value is never substituted.  This can be worked around by wrapping the macro in a second one, so that the arguments are expanded on their way through the outer macro before the inner one joins them:

#define JOIN(x,y)		JOIN2(x,y)
#define JOIN2(x,y)		x##y
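
To see what the extra level buys us, here is a minimal sketch comparing a direct paste with the two-level JOIN().  The names PASTE, PREFIX, SUFFIX, the variables, and check_join() are all made up for this illustration:

```cpp
#include <cassert>

// Direct paste (hypothetical name): arguments next to ## are not expanded.
#define PASTE(x,y)  x##y

// Two-level join from the text: arguments are expanded, then pasted.
#define JOIN(x,y)   JOIN2(x,y)
#define JOIN2(x,y)  x##y

// Hypothetical symbols, used only for this demonstration.
#define PREFIX  foo
#define SUFFIX  bar

static const int foobar = 1;        // the name JOIN(PREFIX, SUFFIX) produces
static const int PREFIXSUFFIX = 2;  // the name PASTE(PREFIX, SUFFIX) produces

static const int joined = JOIN(PREFIX, SUFFIX);   // expands to foobar
static const int pasted = PASTE(PREFIX, SUFFIX);  // expands to PREFIXSUFFIX

// Runtime self-check; call it from main() when trying this out.
static void check_join() {
    assert(joined == 1);
    assert(pasted == 2);
}
```

Only the two-level version sees through PREFIX and SUFFIX to the tokens they stand for, which is the behavior the rest of the macros rely on.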

Now that we can generate symbols properly, let’s start building some logic into the preprocessor!

Binary Logic

Since our preprocessor language will use binary logic for evaluation, we need a method to convert integers (which we’ll be using in our code) to a boolean value.  This can be done fairly easily by creating a simple table:

#define TO_BOOL_0		0
#define TO_BOOL_1		1
#define TO_BOOL_2		1
#define TO_BOOL_3		1
#define TO_BOOL_4		1
#define TO_BOOL_5		1
#define TO_BOOL_6		1
#define TO_BOOL_7		1
#define TO_BOOL_8		1

We can then define our conversion function to be:

#define TO_BOOL(x)		JOIN(TO_BOOL_, x)

And as an example, you can see how this will be evaluated by the preprocessor:

int zero = TO_BOOL(0);		// TO_BOOL_0 -> 0
int one = TO_BOOL(5);		// TO_BOOL_5 -> 1

I’ve stopped at eight for brevity in the table above, but feel free to add more as required.  Let’s now put this to use by creating a logical-not operator, which works on the boolean produced by TO_BOOL():

#define OP_NOT_0		1
#define OP_NOT_1		0

Then the macro to perform the logical-not would become:

#define OP_NOT(x)		JOIN(OP_NOT_, TO_BOOL(x))

And once again, an example:

int notzero = OP_NOT(0);	// OP_NOT_(TO_BOOL_0) -> OP_NOT_0 -> 1
int notone = OP_NOT(5);		// OP_NOT_(TO_BOOL_5) -> OP_NOT_1 -> 0
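
Because the result is an ordinary integer token, it can be checked at compile time.  The following is a small sketch along those lines; the table is shortened to 0..3, and branch_taken and check_not() are names invented for the example:

```cpp
#include <cassert>

#define JOIN(x,y)   JOIN2(x,y)
#define JOIN2(x,y)  x##y

// Table shortened to 0..3 for this sketch.
#define TO_BOOL_0   0
#define TO_BOOL_1   1
#define TO_BOOL_2   1
#define TO_BOOL_3   1
#define TO_BOOL(x)  JOIN(TO_BOOL_, x)

#define OP_NOT_0    1
#define OP_NOT_1    0
#define OP_NOT(x)   JOIN(OP_NOT_, TO_BOOL(x))

// The results are plain integer tokens, so they work in constant expressions.
static_assert(OP_NOT(0) == 1, "not 0 is 1");
static_assert(OP_NOT(3) == 0, "not 3 is 0");
static_assert(OP_NOT(OP_NOT(2)) == 1, "double negation normalizes 2 to 1");

// They can also steer #if directly.
#if OP_NOT(0)
static const bool branch_taken = true;
#else
static const bool branch_taken = false;
#endif

// Runtime self-check; call it from main() when trying this out.
static void check_not() {
    assert(branch_taken);
}
```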

You can see that by joining multiple symbol fragments together, the result can be computed by having the preprocessor simply evaluate the joined #define.  Let’s take this one step further and add logical-or and logical-and functions, which need to take two parameters:

#define OP_OR_00		0
#define OP_OR_01		1
#define OP_OR_10		1
#define OP_OR_11		1

#define OP_AND_00		0
#define OP_AND_01		0
#define OP_AND_10		0
#define OP_AND_11		1

#define OP_OR(x,y)		JOIN( JOIN(OP_OR_, TO_BOOL(x)), TO_BOOL(y) )
#define OP_AND(x,y)		JOIN( JOIN(OP_AND_, TO_BOOL(x)), TO_BOOL(y) )

And an example:

int one = OP_OR(0, 2);		// OP_OR_(TO_BOOL_0)(TO_BOOL_2) -> OP_OR_01 -> 1
int zero = OP_AND(5, 0);	// OP_AND_(TO_BOOL_5)(TO_BOOL_0) -> OP_AND_10 -> 0
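
As a sanity check, the sketch below runs both operators over all four boolean input pairs and then tries a nested call.  The array and variable names are invented for the example, and the TO_BOOL table is shortened to 0..3:

```cpp
#include <cassert>

#define JOIN(x,y)    JOIN2(x,y)
#define JOIN2(x,y)   x##y

// Table shortened to 0..3 for this sketch.
#define TO_BOOL_0    0
#define TO_BOOL_1    1
#define TO_BOOL_2    1
#define TO_BOOL_3    1
#define TO_BOOL(x)   JOIN(TO_BOOL_, x)

#define OP_OR_00     0
#define OP_OR_01     1
#define OP_OR_10     1
#define OP_OR_11     1

#define OP_AND_00    0
#define OP_AND_01    0
#define OP_AND_10    0
#define OP_AND_11    1

#define OP_OR(x,y)   JOIN( JOIN(OP_OR_, TO_BOOL(x)), TO_BOOL(y) )
#define OP_AND(x,y)  JOIN( JOIN(OP_AND_, TO_BOOL(x)), TO_BOOL(y) )

// Each operator evaluated on all four boolean input pairs.
static const int or_table[4]  = { OP_OR(0,0),  OP_OR(0,1),  OP_OR(1,0),  OP_OR(1,1)  };
static const int and_table[4] = { OP_AND(0,0), OP_AND(0,1), OP_AND(1,0), OP_AND(1,1) };

// Operands are normalized by TO_BOOL first, and calls can be nested.
static const int mixed  = OP_OR(0, 3);             // OP_OR_01 -> 1
static const int nested = OP_AND(OP_OR(0, 2), 0);  // OP_AND_10 -> 0

// Runtime self-check; call it from main() when trying this out.
static void check_logic() {
    assert(or_table[0] == 0 && or_table[3] == 1);
    assert(and_table[0] == 0 && and_table[3] == 1);
}
```

Nesting works because the inner call is fully expanded to 0 or 1 while the outer macro's arguments are being prepared, so the outer join only ever sees a plain digit.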

This logic can be extended further by adding support for comparison evaluations, such as equal/not-equal/greater-than/less-than, and even elementary math functions such as addition and subtraction.  The only requirements are that the parameters be non-negative integers, and that you have all of the #defines necessary to represent every possible result.
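
As a rough sketch of what such an extension might look like, here is an increment table and an equality table built the same way.  OP_INC and OP_EQ are names I made up for this example, and the tables are deliberately tiny (0..3 and 0..2):

```cpp
#include <cassert>

#define JOIN(x,y)   JOIN2(x,y)
#define JOIN2(x,y)  x##y

// Hypothetical successor table: OP_INC(n) -> n + 1, defined for 0..3 only.
#define OP_INC_0    1
#define OP_INC_1    2
#define OP_INC_2    3
#define OP_INC_3    4
#define OP_INC(x)   JOIN(OP_INC_, x)

// Hypothetical equality table for the operands 0..2.
#define OP_EQ_00    1
#define OP_EQ_01    0
#define OP_EQ_02    0
#define OP_EQ_10    0
#define OP_EQ_11    1
#define OP_EQ_12    0
#define OP_EQ_20    0
#define OP_EQ_21    0
#define OP_EQ_22    1
#define OP_EQ(x,y)  JOIN( JOIN(OP_EQ_, x), y )

static_assert(OP_INC(2) == 3, "2 + 1");
static_assert(OP_INC(OP_INC(1)) == 3, "1 + 1 + 1");
static_assert(OP_EQ(1, 1) == 1, "1 equals 1");
static_assert(OP_EQ(2, 0) == 0, "2 does not equal 0");
static_assert(OP_EQ(OP_INC(0), 1) == 1, "0 + 1 equals 1");

// Runtime self-check; call it from main() when trying this out.
static void check_math() {
    assert(OP_INC(0) == 1);
    assert(OP_EQ(0, 2) == 0);
}
```

Two-operand tables grow with the square of the supported range, and once operands have more than one digit the joined names become ambiguous (OP_EQ_110 could mean 1 and 10, or 11 and 0), so a larger table would need some kind of separator in the names.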

Conditionals

In order to build a conditional, the joined symbol needs to name a macro which itself takes one or more parameters.  A simple if-then function can then be implemented like so:

#define OP_IF_0(x)
#define OP_IF_1(x)		x
#define OP_IF(cond,x)		JOIN(OP_IF_, TO_BOOL(cond))(x)

If the cond value is false, then the result will simply be nothing; otherwise, the value x will be returned.  An if-then-else function is just a slight extension of the above:

#define OP_IF_ELSE_0(x,y)	y
#define OP_IF_ELSE_1(x,y)	x
#define OP_IF_ELSE(cond,x,y)	JOIN(OP_IF_ELSE_, TO_BOOL(cond))(x,y)

And here’s how the preprocessor would expand it:

int a = OP_IF_ELSE(0, 5, 2);	// OP_IF_ELSE_(TO_BOOL_0)(5,2) -> OP_IF_ELSE_0(5,2) -> 2
int b = OP_IF_ELSE(7, 5, 2);	// OP_IF_ELSE_(TO_BOOL_7)(5,2) -> OP_IF_ELSE_1(5,2) -> 5

Of course, the parameters are not limited to integers.  You can pass strings, or any other sequence of tokens, as well:

const char *foo = OP_IF_ELSE(1, "foo", "bar");	// OP_IF_ELSE_(TO_BOOL_1)("foo","bar") -> OP_IF_ELSE_1("foo","bar") -> "foo"
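
Since the arguments are just tokens, the same macros can select a type or keep or drop a whole statement.  Here is a minimal sketch, assuming two made-up build switches (ENABLE_TRACE and USE_WIDE_COUNTER) and a TO_BOOL table cut down to 0 and 1:

```cpp
#include <cassert>
#include <type_traits>

#define JOIN(x,y)   JOIN2(x,y)
#define JOIN2(x,y)  x##y

// Table shortened to 0..1 for this sketch.
#define TO_BOOL_0   0
#define TO_BOOL_1   1
#define TO_BOOL(x)  JOIN(TO_BOOL_, x)

#define OP_IF_0(x)
#define OP_IF_1(x)             x
#define OP_IF(cond,x)          JOIN(OP_IF_, TO_BOOL(cond))(x)

#define OP_IF_ELSE_0(x,y)      y
#define OP_IF_ELSE_1(x,y)      x
#define OP_IF_ELSE(cond,x,y)   JOIN(OP_IF_ELSE_, TO_BOOL(cond))(x,y)

// Hypothetical build switches for this sketch.
#define ENABLE_TRACE      1
#define USE_WIDE_COUNTER  0

// OP_IF_ELSE picks one of two types.
typedef OP_IF_ELSE(USE_WIDE_COUNTER, long long, int) counter_t;
static_assert(std::is_same<counter_t, int>::value, "switch is 0, so int is chosen");

static counter_t trace_calls = 0;

// OP_IF keeps or drops a whole statement.
static int traced_square(int v) {
    OP_IF(ENABLE_TRACE, ++trace_calls;)
    return v * v;
}

// Runtime self-check; call it from main() when trying this out.
static void check_if() {
    assert(traced_square(2) == 4);
    assert(trace_calls == 1);
}
```

One caveat: an argument containing a comma that is not wrapped in parentheses will be split into two arguments, so something like a two-parameter template type cannot be passed directly.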

One other useful function is the ability to conditionally emit a comma based on a value.  Unfortunately, a comma cannot be used directly as an argument to a macro, as the preprocessor interprets the comma as an argument separator.  We can work around this by declaring a specific macro for it, though:

#define COMMA_MARK_0
#define COMMA_MARK_1		,
#define COMMA_IF(cond)		JOIN(COMMA_MARK_, TO_BOOL(cond))
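
As a small illustration, COMMA_IF() can be paired with OP_IF() to add an optional trailing element to an initializer list.  The array names and check_comma() below are invented for the example, and the TO_BOOL table is shortened to 0 and 1:

```cpp
#include <cassert>

#define JOIN(x,y)       JOIN2(x,y)
#define JOIN2(x,y)      x##y

// Table shortened to 0..1 for this sketch.
#define TO_BOOL_0       0
#define TO_BOOL_1       1
#define TO_BOOL(x)      JOIN(TO_BOOL_, x)

#define OP_IF_0(x)
#define OP_IF_1(x)      x
#define OP_IF(cond,x)   JOIN(OP_IF_, TO_BOOL(cond))(x)

#define COMMA_MARK_0
#define COMMA_MARK_1    ,
#define COMMA_IF(cond)  JOIN(COMMA_MARK_, TO_BOOL(cond))

// The comma and the extra element appear together or not at all.
static const int with_extra[]    = { 10 COMMA_IF(1) OP_IF(1, 20) };  // { 10 , 20 }
static const int without_extra[] = { 10 COMMA_IF(0) OP_IF(0, 20) };  // { 10 }

static const int with_count    = sizeof(with_extra) / sizeof(with_extra[0]);
static const int without_count = sizeof(without_extra) / sizeof(without_extra[0]);

// Runtime self-check; call it from main() when trying this out.
static void check_comma() {
    assert(with_count == 2);
    assert(without_count == 1);
}
```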

Building a List

Now that we have some basic primitives implemented, it would be nice to be able to generate a comma-separated list of names.  This could then be used to declare variables, or be passed directly as a parameter set to a function.  This can be implemented by utilizing the join operator and chaining the macros together, with each level expanding the level below it and then appending one more numbered name:

#define LIST_0(x)
#define LIST_1(x)		x##1
#define LIST_2(x)		LIST_1(x), x##2
#define LIST_3(x)		LIST_2(x), x##3
#define LIST_4(x)		LIST_3(x), x##4
#define LIST_5(x)		LIST_4(x), x##5
#define LIST_6(x)		LIST_5(x), x##6
#define LIST_7(x)		LIST_6(x), x##7
#define LIST_8(x)		LIST_7(x), x##8

#define LIST(cnt,x)		JOIN(LIST_, cnt)(x)

As an example, we can now declare three integers each with an incrementing name, like so:

int LIST(3, X);			// int X1, X2, X3;
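
The same list works for function parameters and call arguments, because only the last token of the argument is joined with the number.  A minimal sketch, with sum3() and call_sum3() being names made up for the example and the table shortened to three entries:

```cpp
#include <cassert>

#define JOIN(x,y)    JOIN2(x,y)
#define JOIN2(x,y)   x##y

// Table shortened to 0..3 for this sketch.
#define LIST_0(x)
#define LIST_1(x)    x##1
#define LIST_2(x)    LIST_1(x), x##2
#define LIST_3(x)    LIST_2(x), x##3

#define LIST(cnt,x)  JOIN(LIST_, cnt)(x)

// Expands to: static int sum3(int a1, int a2, int a3)
static int sum3(LIST(3, int a)) {
    return a1 + a2 + a3;
}

static int call_sum3() {
    int LIST(3, v);            // int v1, v2, v3;
    v1 = 1; v2 = 2; v3 = 3;
    return sum3(LIST(3, v));   // sum3(v1, v2, v3)
}

// Runtime self-check; call it from main() when trying this out.
static void check_list() {
    assert(call_sum3() == 6);
}
```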

Putting It All Together

So now that we have all of these language-like preprocessor macros, what good are they?  Well, they’re actually very good at generating C++ templates that need to take a variable number of template arguments!

Let’s consider a simple delegate interface.  We need a separate typename for the return value, and for each parameter of the function the delegate wraps.  Such a template class would typically need to be declared like so:

template <typename R>
class Delegate<R (*)()> { ... };

template <typename R, typename P1>
class Delegate<R (*)(P1)> { ... };

template <typename R, typename P1, typename P2>
class Delegate<R (*)(P1, P2)> { ... };

…and so on and so forth for as many parameters as are required, duplicating the same code in all of the classes for each parameter count implementation. This is not only ugly, but it is very bad for code maintenance. If we should come across a bug, or if we want to add a feature, we will need to change every single implementation and make sure we didn’t make any mistakes.

Instead, we can consolidate the repetitive parts using our preprocessor language. It should be easy to spot the first change we can make, by using a list:

template <typename R>
class Delegate<R (*)()> { ... };

template <typename R, LIST(1, typename P)>
class Delegate<R (*)(LIST(1, P))> { ... };

template <typename R, LIST(2, typename P)>
class Delegate<R (*)(LIST(2, P))> { ... };

The bottom two templates are now identical code (aside from the count), but the first template, which doesn’t take any parameters, is different.  We can make all of these the same by passing zero to the list generator, which if you recall from earlier, expands to nothing:

template <typename R, LIST(0, typename P)>
class Delegate<R (*)(LIST(0, P))> { ... };

However, this won’t compile, because the expanded template <typename R,> is a syntax error.  We can work around this by using our COMMA_IF() macro from earlier:

template <typename R COMMA_IF(0) LIST(0, typename P)>
class Delegate<R (*)(LIST(0, P))> { ... };

template <typename R COMMA_IF(1) LIST(1, typename P)>
class Delegate<R (*)(LIST(1, P))> { ... };

template <typename R COMMA_IF(2) LIST(2, typename P)>
class Delegate<R (*)(LIST(2, P))> { ... };
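
To check that these three declarations really do compile, here is a self-contained sketch.  The primary template declaration and the arity member are my own assumptions, added only so there is something to specialize and something to test; the real class body is left out, as it is above:

```cpp
#include <cassert>

#define JOIN(x,y)       JOIN2(x,y)
#define JOIN2(x,y)      x##y

// Tables shortened to 0..2 for this sketch.
#define TO_BOOL_0       0
#define TO_BOOL_1       1
#define TO_BOOL_2       1
#define TO_BOOL(x)      JOIN(TO_BOOL_, x)

#define COMMA_MARK_0
#define COMMA_MARK_1    ,
#define COMMA_IF(cond)  JOIN(COMMA_MARK_, TO_BOOL(cond))

#define LIST_0(x)
#define LIST_1(x)       x##1
#define LIST_2(x)       LIST_1(x), x##2
#define LIST(cnt,x)     JOIN(LIST_, cnt)(x)

// Assumed primary template: declared only, every usable form is a specialization.
template <typename Signature>
class Delegate;

// template <typename R> class Delegate<R (*)()>
template <typename R COMMA_IF(0) LIST(0, typename P)>
class Delegate<R (*)(LIST(0, P))> { public: enum { arity = 0 }; };  // arity is hypothetical

// template <typename R, typename P1> class Delegate<R (*)(P1)>
template <typename R COMMA_IF(1) LIST(1, typename P)>
class Delegate<R (*)(LIST(1, P))> { public: enum { arity = 1 }; };

// template <typename R, typename P1, typename P2> class Delegate<R (*)(P1, P2)>
template <typename R COMMA_IF(2) LIST(2, typename P)>
class Delegate<R (*)(LIST(2, P))> { public: enum { arity = 2 }; };

static_assert(Delegate<int (*)()>::arity == 0, "no parameters");
static_assert(Delegate<void (*)(int)>::arity == 1, "one parameter");
static_assert(Delegate<long (*)(int, char)>::arity == 2, "two parameters");

// Runtime self-check; call it from main() when trying this out.
static void check_delegate() {
    assert((Delegate<int (*)()>::arity) == 0);
}
```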

And now that all three of our classes have identical code, we can eliminate all but one by placing the class in a header file by itself, and #including it multiple times with a different counter for each time:

#define COUNT 0
#include "Delegate.h"
#undef COUNT

#define COUNT 1
#include "Delegate.h"
#undef COUNT

#define COUNT 2
#include "Delegate.h"
#undef COUNT

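And the final Delegate.h header would simply look something like the following, using the COUNT #define instead of hard-coded integers.  Note that this header must not carry an include guard, since the whole point is to include it more than once: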

template <typename R COMMA_IF(COUNT) LIST(COUNT, typename P)>
class Delegate<R (*)(LIST(COUNT, P))> { ... };

Conclusion

There’s really much more that can be done with the C preprocessor than what I’ve shown here in this post.  Boost can do some pretty interesting things, and even has the ability to consolidate the #include "Delegate.h" loop above into just a single call, completely avoiding the need to redefine the count for each instance.  If you are interested in diving in further, I recommend checking it out.

So the next time you need to generate source code, don’t count out the C preprocessor!