The C99 standard officially introduced flexible array members, which you may also know as unsized arrays. They are a useful tool for any game programmer working in C or C++ who has to care about memory usage. Despite being over ten years old, and widely supported by C and C++ compilers, they are little used and even less talked about. In this article I intend to introduce the reader to their proper use in C++.

To begin, consider this toy example class, which implements a Pascal-style string. Note: I have cut many C++ stylistic corners for the sake of brevity:

struct PascalString
{
    size_t mLength;
    char* mBytes;

    PascalString(const char* c_string)
        : mLength( strlen(c_string) )
        , mBytes( new char[mLength] )
    {
        memcpy(mBytes, c_string, mLength );
    }

    ~PascalString() {
        delete[] mBytes;
    }

    // more stuff ...
};

The essence of this class is that it is a data structure that stores a pointer to a block of memory, and a count of how many elements there are in that block. It also has construction and destruction behaviour implemented, so that the bookkeeping work of acquiring and releasing the memory is handled automatically for users of the class.

An example of how this type might be used follows:

ostream& operator<< (ostream& out, const PascalString& s)
{
    for (size_t i = 0; i != s.mLength; ++i)
        out << s.mBytes[i];
    return out;
}

int main (int argc, const char * argv[])
{
    PascalString* p = new PascalString("Hello, World!\n");

    std::cout << *p;

    delete p;
    return 0;
}

Note that the object is created on the heap, and destroyed manually, with the “new” and “delete” operators. While for such a simple case p could have been a value object placed on the stack, the use pattern shown is more typical of production code, since the lifetime of game objects is usually more complex than this, and the allocations made are entirely relevant to the discussion!

So what’s wrong with this code? Nothing, really. It’s perfectly fine, if you are happy with the trade-offs that have been made, and with the compromise implicit in the design.

This class trades space and time for simplicity of use and flexibility. This basic structure can be fleshed out to be a fully fledged value type. We could, from this starting point, implement assignment, concatenation, and so on and so forth.

These are perfectly fine trade-offs, until they’re not. Optimisation is often about fitting the implemented functionality as tightly to the requirements as is feasible. If your data-holding object is only created and interrogated, not modified (a situation that is vastly more common in games development than is typically assumed), and if the amount of data is known at the time the object is created, then the flexible array feature of C99 gives us a language-level tool for creating code that more tightly fits that requirement.

Section 6.7.2.1, paragraph 16 of the C99 standard describes flexible arrays as follows:

As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.

So instead of having a pointer as the last member of a struct, we can declare a flexible array member, like so:

struct PascalString
{
    size_t mLength;
    char mBytes[];
    // ...
The array at the end of the structure allows you to allocate a variable-sized block of memory in which to store the struct, and the array elements, thus avoiding the run-time execution cost of an extra pointer dereference.

If you apply the sizeof operator to this structure, the array adds nothing to its size (alignment padding notwithstanding). To get the true size of an object of this type, you need to consider the array size as well.
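A small illustration (a sketch only; the exact figures assume a typical 64-bit target, and trailing padding could change them on yours):

#include <cstring>
#include <iostream>

int main()
{
    // sizeof ignores the flexible array member, so on a typical 64-bit
    // target this prints 8: just mLength, with no room reserved for mBytes.
    std::cout << "sizeof(PascalString) = " << sizeof(PascalString) << "\n";

    // The memory an object really occupies is the struct itself plus
    // however many elements are allocated for the flexible array:
    size_t length = strlen("Hello, World!\n");                               // 14
    std::cout << "bytes needed = " << sizeof(PascalString) + length << "\n"; // 8 + 14 = 22 here
    return 0;
}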

So now to complete the picture, we need to provide a new way to create and destroy objects of this type. Since the amount of memory required to allocate the struct itself now depends on the size of the data stored, and the compiler can’t rely on sizeof semantics, some of our construction logic has to be pulled outside the constructor, and the use of “new” and “delete” is no longer appropriate (you can make them private, to ensure they are never used “accidentally” – see the sketch after the destruction example below):

    // ...
    static PascalString* Create(const char* c_string) {
        // measure:
        size_t length = strlen(c_string);
        // allocate enough memory for the "struct" and the data it manages, in one block:
        PascalString* result = (PascalString*)malloc(sizeof(PascalString) + length);
        // initialise the members
        result->mLength = length;
        memcpy(result->mBytes, c_string, result->mLength );
        return result;
    }

    static void Destroy(PascalString* p_string) {
        free(p_string);
    }
};

Thus creation is changed for the user like so:

    // PascalString* p = new PascalString("Hello, World!\n");
    PascalString* p = PascalString::Create("Hello, World!\n");

And destruction becomes:

    // delete p;
    PascalString::Destroy( p );
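As mentioned earlier, you can hide operator new and operator delete so that Create and Destroy become the only way to manage these objects. A minimal sketch of the idea (the declarations below are my own illustration, not code from the original class; in C++11 you could mark them "= delete" instead of leaving them private and undefined):

struct PascalString
{
    size_t mLength;
    char mBytes[];

    static PascalString* Create(const char* c_string);
    static void Destroy(PascalString* p_string);

private:
    // Declared but never defined: a stray "new PascalString(...)" or
    // "delete p" outside the class now fails to compile (or link),
    // rather than silently allocating too little memory for the object.
    void* operator new(size_t size);
    void operator delete(void* p);
};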

If you already hide your object lifetime management behind some kind of factory interface then there should be very little code to change. If, on the other hand, your game is littered with calls to new and delete, then it’ll be more work to apply this optimisation.

So what benefits can we expect to see from a change like this? To demonstrate, I created a simple allocation tracker. For the trivial Hello World example, I got the following results:

Unoptimised, cumulative & peak: 2 allocations, 32 bytes.
With flexible array optimisation, cumulative & peak: 1 allocation, 32 bytes.

So as expected, we have halved the number of allocations required to create objects of this type on the heap. Given sizeof(PascalString) is 16 bytes unoptimised, and 8 bytes with flexible array, and that the string “Hello, World!\n” is 14 bytes long, we have also learnt that the allocator for my operating system appears to allocate in 16-byte blocks: the unoptimised version makes two allocations of 16 and 14 bytes, each rounded up to 16, while the optimised version makes a single allocation of 8 + 14 = 22 bytes, rounded up to 32.

In order to give you a clearer picture I also wrote a test function that creates 1000 objects of various sizes and puts them into an array. In this test the unoptimised version used 1032192 bytes, and the flexible array version used 1016192, a saving of roughly 2%. This measurement doesn’t include any savings in memory from the reduced allocation count, which depends very much on how your memory manager is implemented. Allocator bookkeeping can be significant when working with small objects. Given the per-allocation overhead can be anywhere between 2 – 16 bytes (or more, but let’s be reasonable here), the true savings in memory use are closer to 3%, or 32k.

Now just think what you could do with that extra 32k!

Measuring the cumulative time savings from the reduced allocation count, and the benefit of the improved data locality from keeping the count adjacent to the data in memory, is left as an exercise for the reader.
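If you want to take measurements like these in your own code base, a minimal allocation tracker along the following lines is a reasonable starting point. This is only a sketch, not the tracker used for the numbers above: it counts calls and requested bytes by replacing the global allocation operators, it does not track the peak, and the malloc inside Create would need to be routed through the same counters to compare the two versions fairly.

#include <cstdio>
#include <cstdlib>
#include <new>

// Cumulative counters. Tracking the peak as well needs the block size at
// free time (for example via a small header per block), omitted for brevity.
static std::size_t gAllocationCount = 0;
static std::size_t gAllocatedBytes  = 0;

void* operator new(std::size_t size)
{
    ++gAllocationCount;
    gAllocatedBytes += size;
    if (void* p = std::malloc(size))
        return p;
    throw std::bad_alloc();
}

void* operator new[](std::size_t size)   { return operator new(size); }

void operator delete(void* p) noexcept   { std::free(p); }
void operator delete[](void* p) noexcept { operator delete(p); }

void ReportAllocations()
{
    std::printf("%zu allocations, %zu bytes requested\n",
                gAllocationCount, gAllocatedBytes);
}

Call ReportAllocations() wherever you want a snapshot of the totals, for example at the end of main.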

Disclaimer: While the technique described in this article has been successfully applied as an optimisation in a production environment, and was a critical part of getting a game running inside the fixed memory limits of a console platform, please keep in mind the effectiveness of this approach is entirely dependent on the existing code base and data set. Your mileage may vary.