Upgrading assert() using the preprocessor


Today we will see how the preprocessor and some lesser-used C++ features can be used to enhance the standard assert functionality that ships with the C runtime.

assert() is a very useful tool for ensuring that pre- and post-conditions, as well as invariants, are met upon calling or exiting a function. If you have never used assertions before, start using them now – they will help you find bugs in your own code, and quickly highlight when code is not used in the way originally intended.

What to improve?

There are a few bits missing in the standard assert(). Specifically, the features I miss are the following:

No way to output a (formatted) message to the user.
When a debugger is connected, assert() does not halt execution on the line where the assert fired, but rather somewhere inside assert.c. The new assert should trigger a breakpoint at the exact location where it fired.
No way to show the values of individual variables used in the condition expression; only the whole condition is shown.

Consider the following example, which demonstrates the above:

// somewhere in a FIFO implementation, the original assert:
assert(GetSize() < GetCapacity());

// an improved assert:
ME_ASSERT(GetSize() < GetCapacity(), "Cannot push another value into an already full FIFO.")(GetSize(), m_end, m_read, m_fillCount);

Using the standard assert(), all you get is the message “Assertion GetSize() < GetCapacity() failed.” If the assert fired on anybody else’s PC, this is not really helpful. How large was the FIFO? What did GetSize() yield? Was the FIFO really full, or were the internal pointers messed up because of, e.g., a memory stomp? All questions unanswered.

What we would like is an improved assert() which is able to fill in those gaps, and provide answers to exactly these questions.

Improved syntax

In order to fulfill the requirements stated above, one possible syntax for the new assert could be the following:

ME_ASSERT(condition, message, optional comma-separated list of message parameters)(optional comma-separated list of variables/values);

As we will see later, this syntax allows us to log variables and their values, trigger a debugger breakpoint on the line where the ME_ASSERT macro was written, and generate no instructions at all in a retail build. Let’s take a look at a few examples to see how we would like to use our assertion macro:

ME_ASSERT(m_start + i < m_end, "Item %d cannot be accessed. Subscript out of range.", i)(m_start, m_end, m_allocEnd);

ME_ASSERT(from >= std::numeric_limits::min(), "Number to cast exceeds numeric limits.")(from);

ME_ASSERT(false, "Key %s could not be found.", key.c_str())();

Before we start worrying about how to put that into a macro, let’s come up with the equivalent C++ code first. What we need is a mechanism which allows us to log a formatted message, followed by an optional, unlimited number of variables. Logging itself can be done in any way you want, but the unlimited number of variables needs special treatment. There’s a well-known C++ idiom which achieves something similar: the Named Parameter Idiom, which can be used to initialize (an unlimited number of) class members in any order. In the same vein, we can create a temporary class instance and call member functions that return a reference to *this, which nicely fits the bill for our purposes:

class Assert
{
public:
  // logs a formatted message internally
  Assert(const char* file, int line, const char* format, ...);

  Assert& Variable(const char* const name, bool var);
  Assert& Variable(const char* const name, char var);
  Assert& Variable(const char* const name, short var);
  Assert& Variable(const char* const name, int var);
  // more overloads for built-in types...

  // generic template
  template <typename T>
  Assert& Variable(const char* const name, const T& value);
};

// example usage
Assert(__FILE__, __LINE__, "Item %d cannot be accessed. Subscript out of range.", i).Variable("m_start", m_start).Variable("m_end", m_end);

In the example above, a temporary instance of the Assert class is created on the stack, and its Variable() member function can then be used to chain an unlimited number of Variable() calls together. With this system in place, we can output both a variable’s name and its value for built-in types as well as user-defined types (e.g. strings, vectors, etc.). The former use one of the existing overloads; the latter use the member function template, which assumes that the given type offers either a c_str() method or a << stream operator.
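A minimal, compilable sketch of such a class may help. It is not the article's implementation: it assumes that logging simply appends to a string (the hypothetical static member s_log) instead of writing to a real log, it implements only the int and bool overloads, and its generic template relies on operator<< only, not on c_str(). The message and every variable are logged immediately, in the constructor and in Variable() respectively.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <sstream>
#include <string>

// Sketch of a chainable Assert logger. Output is collected in the
// hypothetical static member s_log instead of a real log.
class Assert
{
public:
  // Formats and logs the message right away, in the constructor.
  Assert(const char* file, int line, const char* format, ...)
    : m_file(file), m_line(line)
  {
    char buffer[512];
    va_list args;
    va_start(args, format);
    std::vsnprintf(buffer, sizeof(buffer), format, args);
    va_end(args);
    Log(buffer);
  }

  // Overloads for built-in types (only two shown here).
  Assert& Variable(const char* const name, int var)
  {
    return LogVariable(name, var, "int");
  }

  Assert& Variable(const char* const name, bool var)
  {
    return LogVariable(name, var ? "true" : "false", "bool");
  }

  // Generic template for everything else; assumes an operator<< exists.
  template <typename T>
  Assert& Variable(const char* const name, const T& value)
  {
    return LogVariable(name, value, "generic");
  }

  static std::string s_log;

private:
  template <typename T>
  Assert& LogVariable(const char* name, const T& value, const char* kind)
  {
    std::ostringstream stream;
    stream << "  o Variable " << name << " = " << value << " (" << kind << ")";
    Log(stream.str().c_str());
    return *this;  // returning a reference to self enables chaining
  }

  void Log(const char* text)
  {
    std::ostringstream stream;
    stream << m_file << "(" << m_line << "): [Assert] " << text << "\n";
    s_log += stream.str();
  }

  const char* m_file;
  int m_line;
};

std::string Assert::s_log;
```

For an int or bool, overload resolution prefers the non-template overload on an exact match; any other type, such as std::string, falls through to the template.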

Debugger break

Remember that we wanted to trigger a debugger breakpoint on the same line where the assert fired, after the states of all variables have been logged. Triggering a breakpoint is not too hard by itself:

#if !ME_MASTER
#  define ME_BREAKPOINT (IsDebuggerConnected() ? __debugbreak() : ME_UNUSED(true))
#else
#  define ME_BREAKPOINT ME_UNUSED(true)
#endif

The ME_BREAKPOINT macro triggers a breakpoint if a debugger is connected, and does nothing otherwise. IsDebuggerConnected() is a platform-dependent function returning whether a debugger is attached (e.g. implemented using IsDebuggerPresent() on Windows). Furthermore, in retail (ME_MASTER) builds the macro expands to ME_UNUSED(true), which does not generate any instructions (as we will see later).
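A sketch of how ME_BREAKPOINT could be made compilable beyond Visual Studio. Everything here beyond the macro's shape is an assumption, not the article's code: the hypothetical ME_DEBUGBREAK helper, the use of Clang's __builtin_debugtrap() and of raise(SIGTRAP) on other POSIX compilers, and an IsDebuggerConnected() stub that always returns false so the sketch runs without a debugger. Both branches of the conditional must have type void, which is why the result of raise() is cast away.

```cpp
#include <cassert>
#include <csignal>

#ifndef ME_MASTER
#  define ME_MASTER 0
#endif

// Unevaluated operand: no code is generated for expr.
#define ME_UNUSED(expr) (void)sizeof(expr)

// Hypothetical stub; a real version would query the platform,
// e.g. via IsDebuggerPresent() on Windows.
inline bool IsDebuggerConnected() { return false; }

#if defined(_MSC_VER)
#  define ME_DEBUGBREAK() __debugbreak()
#elif defined(__clang__)
#  define ME_DEBUGBREAK() __builtin_debugtrap()
#else
#  define ME_DEBUGBREAK() (void)std::raise(SIGTRAP)  // POSIX only
#endif

#if !ME_MASTER
#  define ME_BREAKPOINT (IsDebuggerConnected() ? ME_DEBUGBREAK() : ME_UNUSED(true))
#else
#  define ME_BREAKPOINT ME_UNUSED(true)
#endif

// Returns normally, because the stub reports that no debugger is connected.
int BreakIfDebugging(int value)
{
  ME_BREAKPOINT;
  return value;
}
```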

Using the comma operator, we can use the above macro to trigger a breakpoint whenever an assert fires:

Assert(__FILE__, __LINE__, "Item %d cannot be accessed. Subscript out of range.", i).Variable("m_start", m_start).Variable("m_end", m_end), ME_BREAKPOINT;

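The comma operator guarantees that its left operand (the Assert constructor and all chained Variable() calls) is fully evaluated before the breakpoint is triggered, hence both the formatted message and the variables (and their values) will have been logged already. Note that the temporary Assert itself lives until the end of the full-expression, so its destructor runs only after the breakpoint; this is why the logging has to happen in the constructor and in Variable(), not in the destructor. Of course, we also need to make sure that the temporary Assert instance is created and the breakpoint triggered only if the assert’s condition is not met. This can simply be done by using the conditional (?:) operator, like in the following example: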

(m_start + i < m_end) ? (void)true : (Assert(__FILE__, __LINE__, "Item %d cannot be accessed. Subscript out of range.", i).Variable("m_start", m_start).Variable("m_end", m_end), ME_BREAKPOINT);

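Note the extra parentheses before the Assert and after the breakpoint! Without them, ME_BREAKPOINT would not be part of the third operand of the conditional operator. Because the comma operator has the lowest precedence of all operators, the expression would instead be parsed as (condition ? ... : Assert(...)...), ME_BREAKPOINT, meaning that a breakpoint would always be triggered, no matter whether the condition was met or not. (In this particular case, the compiler would actually reject the code, because the two branches of the conditional would then have the types void and Assert&.)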
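The following stand-alone sketch makes both points observable. A hypothetical Logger type records its construction, its Variable() call and its destruction in a string, and FAKE_BREAKPOINT (also hypothetical) merely records that it was reached. The unparenthesized variant casts the Logger call to void, because otherwise the two branches would have different types and it would not compile at all.

```cpp
#include <cassert>
#include <string>

std::string g_trace;  // records the order of events

struct Logger
{
  Logger() { g_trace += "ctor "; }
  Logger& Variable() { g_trace += "var "; return *this; }
  ~Logger() { g_trace += "dtor "; }
};

// Stand-in for ME_BREAKPOINT that only records that it was reached.
#define FAKE_BREAKPOINT (void)(g_trace += "break ")

std::string Run(bool condition, bool parenthesized)
{
  g_trace.clear();
  if (parenthesized)
  {
    // The breakpoint belongs to the third operand of ?:.
    condition ? (void)true : ((void)Logger().Variable(), FAKE_BREAKPOINT);
  }
  else
  {
    // Parsed as (condition ? ... : ...), FAKE_BREAKPOINT: always breaks.
    condition ? (void)true : (void)Logger().Variable(), FAKE_BREAKPOINT;
  }
  return g_trace;
}
```

Run(false, true) yields "ctor var break dtor ": the breakpoint is reached after the logging calls but before the temporary is destroyed. Run(true, true) yields an empty trace, whereas Run(true, false) yields "break " even though the condition holds.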

Of course we don’t want users of the new assert to write code like this: it is much too error-prone and lengthy. Furthermore, we cannot eliminate the unnecessary code in retail builds without manual user intervention. Macros to the rescue!

Stuffing everything into a macro

Putting everything into a macro requires quite a bit of preprocessor magic, which we’ll tackle now. First, let’s stuff the left part of the example above into a macro:

#define ME_ASSERT(condition, format, ...) (condition) ? (void)true : Assert(__FILE__, __LINE__, "Assertion \"" #condition "\" failed. " format, __VA_ARGS__)

This expands into the following:

// macro
ME_ASSERT(m_start + i < m_end, "Item %d cannot be accessed. Subscript out of range.", i);

// expansion (after string literal concatenation)
(m_start + i < m_end) ? (void)true : Assert(__FILE__, __LINE__, "Assertion \"m_start + i < m_end\" failed. Item %d cannot be accessed. Subscript out of range.", i);

Stuffing the optional list of variables into a macro is admittedly somewhat harder. Because it supports printf-style formatted messages, our macro is already variadic (note the … as macro parameter), so how can we offer an additional, variable number of arguments? The following clearly doesn’t work:

#define ME_ASSERT(condition, format, ..., ...) // huh?

There’s no way to distinguish where one list of arguments ends and where the next starts, which is why a variable number of arguments (…) must always be the last parameter of a macro. But by introducing an additional pair of parentheses, we can “start” the expansion of another (variadic) macro: the result of a macro expansion is rescanned together with the source tokens that follow it, so a function-like macro name at the end of an expansion is invoked if the next token is an opening parenthesis. You can think of this as the preprocessor running multiple passes as long as new function macros are detected. A simple example will show what I’m talking about:

#define SIMPLE_MACRO(format, ...) Assert(format, __VA_ARGS__) ANOTHER_MACRO
#define ANOTHER_MACRO(...) // doing something with __VA_ARGS__

Putting on our preprocessor hat, we can see that the following happens:

In the first pass, SIMPLE_MACRO() gets expanded into some source code, ending in ANOTHER_MACRO. So if you write “SIMPLE_MACRO(foo, bar, a, b, c)()” (note the parentheses), it will get expanded into “Assert(foo, bar, a, b, c) ANOTHER_MACRO()”.
In the second pass, the preprocessor finds ANOTHER_MACRO() and recognizes it as another function macro. Hence, you can put a variable number of arguments into the parentheses, and they will be the arguments to the ANOTHER_MACRO macro.
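A tiny experiment that compiles on its own shows the effect. The two macros here are variations of the ones above that stringize their arguments into two hypothetical global strings instead of calling Assert, so the result of each expansion can be inspected.

```cpp
#include <cassert>
#include <string>

std::string g_first;
std::string g_second;

// The replacement ends in the name of another function-like macro...
#define SIMPLE_MACRO(format, ...) (g_first = #format " | " #__VA_ARGS__), ANOTHER_MACRO
// ...which picks up the parenthesized list following the SIMPLE_MACRO call.
#define ANOTHER_MACRO(...) (g_second = #__VA_ARGS__)

void Expand()
{
  // Expands to: (g_first = "foo" " | " "bar, a, b, c"), (g_second = "x, y");
  SIMPLE_MACRO(foo, bar, a, b, c)(x, y);
}
```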

This way, we can have a variable number of arguments to the formatted message, and a variable number of arguments for our list of variables. But still, we need a macro which expands every single argument of a __VA_ARGS__ parameter into whatever we want. Specifically, we want to turn

(a, b, c)

into

.Variable("a", a).Variable("b", b).Variable("c", c)

This would allow us to expand the list of variables into .Variable()-calls on the temporary Assert instance.

Excursion: A journey into preprocessor land

First, we need a macro which applies a given operation to each of its arguments. Something like the following does the trick:

#define ME_PP_EXPAND_ARGS_1(op, a1) op(a1)
#define ME_PP_EXPAND_ARGS_2(op, a1, a2) op(a1) op(a2)
#define ME_PP_EXPAND_ARGS_3(op, a1, a2, a3) op(a1) op(a2) op(a3)
#define ME_PP_EXPAND_ARGS_4(op, a1, a2, a3, a4) op(a1) op(a2) op(a3) op(a4)
// other macros omitted...

This works, but assumes that the caller already knows the number of arguments provided. What we need is a variadic macro which dispatches to one of the above macros depending on the number of arguments given. This can be done by first counting the number of arguments (using a separate macro), and then combining the result with the name of the macro. An example will make it clear:

// variadic macro "dispatching" the arguments to the correct macro.
#define ME_PP_EXPAND_ARGS(op, ...) ME_PP_JOIN(ME_PP_EXPAND_ARGS_, ME_PP_NUM_ARGS(__VA_ARGS__)) ME_PP_PASS_ARGS(op, __VA_ARGS__)

OK, let us decipher the above step by step, assuming that the macro is used like this: ME_PP_EXPAND_ARGS(op, a, b)

ME_PP_NUM_ARGS is the macro which “counts” the number of arguments given, and yields 2 in this case. An example of how such a macro can be implemented can be found here.
ME_PP_JOIN is a simple macro which joins two arguments, by “stitching” them together. In the example above, the result would be ME_PP_EXPAND_ARGS_2 (note the underscore at the end of the first argument), which in turn is a macro itself!
ME_PP_PASS_ARGS is used to pass the given macro arguments to the ME_PP_EXPAND_ARGS_2 macro (working around preprocessor bugs in Visual Studio).
Finally, the resulting ME_PP_EXPAND_ARGS_2(op, a, b) is expanded once again by the preprocessor, yielding op(a) op(b).
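For readers on GCC or Clang, here is a sketch of the same idea, limited to one to four arguments. It differs from the article in one assumption-laden way: the parenthesized argument list is placed directly after ME_PP_JOIN(...) instead of going through ME_PP_PASS_ARGS. This is a common technique for standard-conforming preprocessors (GCC, Clang, or MSVC with /Zc:preprocessor), not what the article does for Visual Studio's traditional preprocessor. The Recorder type is a hypothetical stand-in for Assert, and this ME_PP_NUM_ARGS reports 1 for an empty argument list.

```cpp
#include <cassert>
#include <string>

#define ME_PP_JOIN_IMPL(a, b) a##b
#define ME_PP_JOIN(a, b) ME_PP_JOIN_IMPL(a, b)  // expands a and b before pasting

// The real arguments shift the number list to the right; N lands on the count.
#define ME_PP_NUM_ARGS_IMPL(_1, _2, _3, _4, N, ...) N
#define ME_PP_NUM_ARGS(...) ME_PP_NUM_ARGS_IMPL(__VA_ARGS__, 4, 3, 2, 1, 0)

#define ME_PP_EXPAND_ARGS_1(op, a1) op(a1)
#define ME_PP_EXPAND_ARGS_2(op, a1, a2) op(a1) op(a2)
#define ME_PP_EXPAND_ARGS_3(op, a1, a2, a3) op(a1) op(a2) op(a3)
#define ME_PP_EXPAND_ARGS_4(op, a1, a2, a3, a4) op(a1) op(a2) op(a3) op(a4)

// ME_PP_JOIN yields e.g. ME_PP_EXPAND_ARGS_3, which is then invoked with the
// parenthesized list that directly follows it.
#define ME_PP_EXPAND_ARGS(op, ...) \
  ME_PP_JOIN(ME_PP_EXPAND_ARGS_, ME_PP_NUM_ARGS(__VA_ARGS__))(op, __VA_ARGS__)

// The operation from the article: turns x into .Variable("x", x).
#define ME_ASSERT_IMPL_VAR(variable) .Variable(#variable, variable)

// Hypothetical recorder standing in for the Assert class.
struct Recorder
{
  std::string log;
  Recorder& Variable(const char* name, int value)
  {
    log += std::string(name) + "=" + std::to_string(value) + " ";
    return *this;
  }
};

std::string Demo()
{
  int a = 1, b = 2, c = 3;
  Recorder recorder;
  // Expands to: recorder .Variable("a", a) .Variable("b", b) .Variable("c", c);
  recorder ME_PP_EXPAND_ARGS(ME_ASSERT_IMPL_VAR, a, b, c);
  return recorder.log;
}
```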

Note that the argument op can be a macro itself! With the above in place, we can turn the arguments “(a, b, c)” into “.Variable("a", a).Variable("b", b).Variable("c", c)” using the following:

#define ME_ASSERT(condition, format, ...) (condition) ? ME_UNUSED(true) : (Assert(__FILE__, __LINE__, "Assertion \"" #condition "\" failed. " format, __VA_ARGS__) ME_ASSERT_IMPL_VARS

#define ME_ASSERT_IMPL_VARS(...) ME_PP_EXPAND_ARGS ME_PP_PASS_ARGS(ME_ASSERT_IMPL_VAR, __VA_ARGS__), ME_BREAKPOINT)

#define ME_ASSERT_IMPL_VAR(variable) .Variable(#variable, variable)

Again, ME_ASSERT() is a variadic macro ending in ME_ASSERT_IMPL_VARS, which in turn is a variadic macro itself. ME_ASSERT_IMPL_VARS expands and forwards the given arguments to ME_ASSERT_IMPL_VAR(), which is exactly what we use for automatically expanding our list of variables into Variable()-calls on the temporary Assert instance.
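Putting the pieces together, the following is a compact, compilable sketch of the debug-build ME_ASSERT for GCC or Clang. It rests on several assumptions that differ from the article: a conforming preprocessor (so ME_PP_EXPAND_ARGS is invoked directly rather than through ME_PP_PASS_ARGS), at most four variables, a breakpoint stand-in that merely counts, and output collected in a hypothetical global string. It also requires at least one format argument and at least one variable; handling empty lists is what helpers such as ME_PP_IS_EMPTY are for.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <sstream>
#include <string>

// Preprocessor helpers (conforming preprocessor, 1 to 4 arguments).
#define ME_PP_JOIN_IMPL(a, b) a##b
#define ME_PP_JOIN(a, b) ME_PP_JOIN_IMPL(a, b)
#define ME_PP_NUM_ARGS_IMPL(_1, _2, _3, _4, N, ...) N
#define ME_PP_NUM_ARGS(...) ME_PP_NUM_ARGS_IMPL(__VA_ARGS__, 4, 3, 2, 1, 0)
#define ME_PP_EXPAND_ARGS_1(op, a1) op(a1)
#define ME_PP_EXPAND_ARGS_2(op, a1, a2) op(a1) op(a2)
#define ME_PP_EXPAND_ARGS_3(op, a1, a2, a3) op(a1) op(a2) op(a3)
#define ME_PP_EXPAND_ARGS_4(op, a1, a2, a3, a4) op(a1) op(a2) op(a3) op(a4)
#define ME_PP_EXPAND_ARGS(op, ...) \
  ME_PP_JOIN(ME_PP_EXPAND_ARGS_, ME_PP_NUM_ARGS(__VA_ARGS__))(op, __VA_ARGS__)

#define ME_UNUSED(expr) (void)sizeof(expr)

// Stand-in breakpoint: counts instead of trapping, so the sketch runs anywhere.
int g_breakpoints = 0;
#define ME_BREAKPOINT (void)++g_breakpoints

std::string g_log;  // collected assert output

class Assert
{
public:
  Assert(const char* file, int line, const char* format, ...) : m_file(file), m_line(line)
  {
    char buffer[256];
    va_list args;
    va_start(args, format);
    std::vsnprintf(buffer, sizeof(buffer), format, args);
    va_end(args);
    Log(buffer);
  }

  template <typename T>
  Assert& Variable(const char* name, const T& value)
  {
    std::ostringstream stream;
    stream << "  o Variable " << name << " = " << value;
    Log(stream.str().c_str());
    return *this;
  }

private:
  void Log(const char* text)
  {
    std::ostringstream stream;
    stream << m_file << "(" << m_line << "): [Assert] " << text << "\n";
    g_log += stream.str();
  }

  const char* m_file;
  int m_line;
};

#define ME_ASSERT(condition, format, ...) \
  (condition) ? ME_UNUSED(true) \
              : (Assert(__FILE__, __LINE__, "Assertion \"" #condition "\" failed. " format, __VA_ARGS__) ME_ASSERT_IMPL_VARS
#define ME_ASSERT_IMPL_VARS(...) ME_PP_EXPAND_ARGS(ME_ASSERT_IMPL_VAR, __VA_ARGS__), ME_BREAKPOINT)
#define ME_ASSERT_IMPL_VAR(variable) .Variable(#variable, variable)

// Returns how many breakpoints were triggered by the assert below.
int Check(int i, int count)
{
  g_log.clear();
  g_breakpoints = 0;
  ME_ASSERT(i < count, "Item %d cannot be accessed. Subscript out of range.", i)(i, count);
  return g_breakpoints;
}
```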

Finally, we can use our improved assertion facility. The following shows the output of an assertion which fired:

ME_ASSERT(m_start + i < m_end, "Item %d cannot be accessed. Subscript out of range.", i)(m_start, m_end, m_allocEnd);

Array.inl(146): [Assert] Assertion "m_start + i < m_end" failed. Item 10 cannot be accessed. Subscript out of range.
Array.inl(146): [Assert]   o Variable m_start = 0x2260608 (pointer)
Array.inl(146): [Assert]   o Variable m_end = 0x2260608 (pointer)
Array.inl(146): [Assert]   o Variable m_allocEnd = 0x2260630 (pointer)
molecule_core_d.exe has triggered a breakpoint

And a debugger breakpoint will conveniently halt execution at Array.inl(146).

Retail builds

The last piece of the puzzle still missing is how to get rid of all the extra code in retail builds. Most C/C++ programmers know the following trick, which is used, e.g., for silencing compiler warnings about unused variables:

// i is unused
int i = 10;
(void)i;

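You can combine this trick with the sizeof() operator to have the compiler accept almost any legal C++ expression without generating any instructions at all: the operand of sizeof() is never evaluated, only its type is determined at compile time. (Expressions of type void and bit-fields are the notable exceptions, because sizeof() cannot be applied to them.) This means that something like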

(void)sizeof(a < b);

will never generate any instructions in the executable. (Ab)using the sizeof() operator this way, we can simply change our ME_ASSERT macro to behave as follows in retail builds:

#define ME_ASSERT(condition, format, ...)	ME_UNUSED(condition), ME_UNUSED(format), ME_UNUSED(__VA_ARGS__), ME_UNUSED

ME_UNUSED is a macro employing the trick stated above, turning each given argument into an unevaluated (void)sizeof() expression. Hence, in retail builds, all arguments to the assert as well as the list of variables expand into pure “nothingness”:

// macro
ME_ASSERT(m_start + i < m_end, "Item %d cannot be accessed. Subscript out of range.", i)(m_start, m_end, m_allocEnd);

// expanded
(void)sizeof(m_start + i < m_end), (void)sizeof("Item %d cannot be accessed. Subscript out of range."), (void)sizeof(i), (void)sizeof(m_start), (void)sizeof(m_end), (void)sizeof(m_allocEnd);

This also makes sure that the compiler will not warn about any unused local variables or similar, which is a nice side effect.
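A small sketch of the retail variant that compiles on its own. To stay short, it swaps in a simpler variadic ME_UNUSED than the article's: instead of expanding every argument into its own (void)sizeof(...), it puts the whole argument list into a single sizeof as one comma expression, which likewise evaluates nothing. Expensive() and g_calls are hypothetical helpers that make the absence of evaluation observable; like the other sketches, it needs at least one format argument and one variable.

```cpp
#include <cassert>

// Simplified variadic ME_UNUSED: all arguments form one unevaluated comma
// expression (needs at least one argument).
#define ME_UNUSED(...) (void)sizeof(__VA_ARGS__)

// Retail ME_ASSERT as described above: every part becomes an unevaluated operand.
#define ME_ASSERT(condition, format, ...) \
  ME_UNUSED(condition), ME_UNUSED(format), ME_UNUSED(__VA_ARGS__), ME_UNUSED

int g_calls = 0;
int Expensive() { return ++g_calls; }

int RetailDemo()
{
  g_calls = 0;
  int i = 10;
  // Expands to (void)sizeof(Expensive() < 0), (void)sizeof("..."),
  // (void)sizeof(i), (void)sizeof(Expensive(), i);
  ME_ASSERT(Expensive() < 0, "Item %d cannot be accessed.", i)(Expensive(), i);
  return g_calls;  // still 0, because nothing inside sizeof is evaluated
}
```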

Performance

One thing we haven’t talked about yet is performance. Debug builds have a tendency to become slower during development anyway, so we should make sure that our improved assert doesn’t totally kill performance.

Each ME_ASSERT can only have two outcomes: either the assert fires, or it doesn’t. We’re not too concerned about performance in the former case, because when an assert fires, the number of cycles/instructions it needs doesn’t really matter. But we should take a closer look at the performance of asserts that do not fire.

Non-optimized, debug builds

Again, let’s work through an example, using one of the heavier asserts introduced before:

ME_ASSERT(m_start + i < m_end, "Item %d cannot be accessed. Subscript out of range.", i)(m_start, m_end, m_allocEnd);

The generated assembly (Visual Studio 2010 SP1, x86) will look something like the following:

012F7B50  mov         eax,dword ptr [this]
012F7B53  mov         ecx,dword ptr [eax+4]
012F7B56  mov         edx,dword ptr [i]
012F7B59  lea         eax,[ecx+edx*4]
012F7B5C  mov         ecx,dword ptr [this]
012F7B5F  cmp         eax,dword ptr [ecx+8]
012F7B62  jae         core::Array::operator[]+39h (12F7B69h)
012F7B64  jmp         core::Array::operator[]+0DBh (12F7C0Bh)
012F7B69
  lots of instructions omitted:
    setting up parameters
    pushing offsets to strings (e.g. "Assertion "m_start + i < m_end" failed.") onto the stack
    calling the Assert constructor
    calling Assert::Variable
    calling IsDebuggerConnected
012F7C0A  int         3
012F7C0B  mov         edx,dword ptr [this]
  rest of code

Don’t worry, there’s no need to completely understand all of the above. The important things are the following:

The first instructions load m_start and m_end from memory (those are member variables).
The cmp and subsequent jae/jmp instructions carry out the comparison (m_start + i < m_end) and branch on its result.
If the condition is not true, the next instruction to be executed is at address 012F7B69.
If the condition is true, the next instruction to be executed is at address 012F7C0B.

So when the assert does not fire, there are a few loads, a compare, and two jumps (a jae that is not taken, followed by a jmp) to execute. That’s not much instruction-wise, but the code has to jump over a lot of instructions that are not executed (about 160 bytes), which can cause instruction-cache misses. If that causes too big a performance hit, one possible solution is to use more light-weight asserts deep inside your codebase, e.g. in leaf functions which get called a lot.

A possible implementation of such a light-weight assert could be the following:

// both branches must have type void, because __debugbreak() returns void
#define ME_LIGHTWEIGHT_ASSERT(cond)   ((cond) ? (void)0 : __debugbreak())

This is about as simple as it gets, and it will only ever do a comparison (and any loads needed), and a jump. The pattern to look for in the assembly code would then look like this:

  cmp         this,that
  jne         fire
  jmp         nofire
fire:
  int         3
nofire:
  rest of code

__debugbreak is an intrinsic function available in Visual Studio which generates the int 3 instruction on x86, which in turn triggers a breakpoint inside the debugger. Similar functions/instructions exist on PowerPC, so such a light-weight assert can be built for console platforms as well.
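A sketch of the light-weight assert that also compiles outside Visual Studio. The hypothetical ME_DEBUGBREAK helper and the choice of Clang's __builtin_debugtrap() or raise(SIGTRAP) (POSIX systems only) are assumptions, not part of the article. Without a debugger attached, SIGTRAP terminates the process by default. SumFirst is a hypothetical leaf function.

```cpp
#include <cassert>
#include <csignal>

#if defined(_MSC_VER)
#  define ME_DEBUGBREAK() __debugbreak()
#elif defined(__clang__)
#  define ME_DEBUGBREAK() __builtin_debugtrap()
#else
#  define ME_DEBUGBREAK() (void)std::raise(SIGTRAP)  // POSIX only
#endif

// Typically just a comparison and a conditional jump around the break.
// Both branches are void, and the outer parentheses keep it a single operand.
#define ME_LIGHTWEIGHT_ASSERT(cond) ((cond) ? (void)0 : ME_DEBUGBREAK())

// Hypothetical leaf function that is called a lot.
int SumFirst(const int* values, int size, int count)
{
  ME_LIGHTWEIGHT_ASSERT(count <= size);
  int sum = 0;
  for (int i = 0; i < count; ++i)
    sum += values[i];
  return sum;
}
```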

Optimized builds

Even though the compiler is smarter in optimized builds and manages to eliminate some unnecessary memory loads, the code for setting up parameters, calling the Assert constructor, and so on still takes about 100 bytes, so instruction-cache misses are likely in those builds as well. Again, the light-weight assert can help here: in optimized builds, the compiler is smart enough to get rid of one of the jmp instructions, generating even less code.

Static asserts

Another thing worth mentioning is that this post only covers run-time asserts. There’s another type of assertion, so-called static or compile-time asserts. The new C++11 standard defines the static_assert keyword for that purpose, and Visual Studio 2010 and GCC already support it.

On other compilers, you can build your own static asserts using some template magic (as found in the book “Modern C++ Design” by Andrei Alexandrescu), or compiler-dependent features.
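For illustration, a sketch of both variants. The first assertion uses the C++11 keyword. ME_STATIC_ASSERT and StaticAssertionFailure are hypothetical names for a pre-C++11 fallback in the spirit of Alexandrescu's compile-time checks: only the true specialization is defined, so a false condition asks for the size of an incomplete type and fails to compile.

```cpp
#include <cassert>

// C++11: built into the language.
static_assert(sizeof(long) >= sizeof(int), "long must be at least as wide as int");

// Pre-C++11 fallback: StaticAssertionFailure<false> is declared but never defined.
template <bool> struct StaticAssertionFailure;
template <> struct StaticAssertionFailure<true> { enum { value = 1 }; };

#define ME_STATIC_ASSERT_JOIN_IMPL(a, b) a##b
#define ME_STATIC_ASSERT_JOIN(a, b) ME_STATIC_ASSERT_JOIN_IMPL(a, b)

// sizeof() of an incomplete type is a compile error; the line number makes the
// enumerator name unique (so at most one use per line).
#define ME_STATIC_ASSERT(cond) \
  enum { ME_STATIC_ASSERT_JOIN(me_static_assert_line_, __LINE__) = \
           sizeof(StaticAssertionFailure<static_cast<bool>(cond)>) }

ME_STATIC_ASSERT(sizeof(int) >= 2);
// ME_STATIC_ASSERT(sizeof(char) == 2);  // would not compile
```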

Download

The complete assert implementation can be downloaded (zlib license) and comes with the following:

General preprocessor stuff like ME_PP_IF and ME_PP_TO_BOOL.
ME_JOIN macro to concatenate two tokens, even when the tokens are macros themselves.
ME_PP_IS_EMPTY macro for checking whether a variadic macro had an empty argument list or not.
ME_PP_NUM_ARGS macro for counting the number of arguments to a variadic macro.
ME_PP_EXPAND_ARGS macro for expanding an arbitrary number of arguments into an operation, called with the correct arguments.
ME_UNUSED macro for turning any legal C++ expression into nothing, not generating any instructions.
ME_BREAKPOINT macro for triggering a breakpoint when a debugger is attached.
ME_ASSERT macro as described above, along with a simple implementation of the Assert class.

Word of warning: most of the preprocessor magic is not for the faint of heart.

(This post is a more detailed cross-post of this blog entry).

#AltDevBlog

Stefan Reinalter