Instapaper Text

Implementing a True realloc() in C++

Introduction

As anyone who’s programmed using C++ would know, the language only provides two operators for memory management: new and delete. No realloc()-like operator or function is available. One can certainly use std::vector or other similar container which wraps the management of the array, handling the dynamic resizing within itself, however the resize implementation is usually rather “dumb”. Let’s take a quick look at what steps a typical std::vector resize implementation would perform:

Allocate a new array 50% larger than the existing array size (operator new[]),
Copy all of the array’s contents into the new array (operator =),
Free the old array (operator delete[]),
Additionally, if the type has any constructors or destructors, these are also called for every element during relocation.

Admittedly, given the fact that only the new[] and delete[] operators are available in C++, this is the best that can be accomplished. While this implementation is fine for most general use cases, game development is typically not general, and there are also real-life cases with data sets where this implementation can be extremely inefficient and sometimes even prohibitively expensive.

Comparing std::vector To realloc()

As stated above, when resizing an array with std::vector, there must be enough available free memory to store a complete copy of the array, plus an extra 50% in order to be able to expand it. On embedded systems with very little total memory, this can be a serious issue. If a 512KB sized array needed to be expanded, even for only a few extra elements, a contiguous free block of 768KB would be necessary, temporarily requiring a total of 1.25MB of memory. This could be rather difficult on a system that has, for example, only 4MB of memory available.

The copying of the data from the old array into the new one can also potentially be a slow operation, depending on the size of the data set. If the array is hundreds of megabytes large, the time required to copy it is not minuscule, and could actually be quite slow if the OS has paged out any portion of that memory out to the swap file. It is due to this copy that storing pointers directly to elements inside an array is not possible with std::vector because the array contents end up relocated elsewhere in memory after a resize.

Additionally, after the array contents have been relocated and the older array has been freed, a hole is created in the heap causing increased memory fragmentation. This hole creation can be confirmed with the following test code, which pushes values into a std::vector until it resizes:

#include <iostream>
 
  #include <vector>
 
  using namespace std;
 
   
 
  int main(void)
 
  {
 
  	size_t cap;
 
  	vector<int> foo;
 
   
 
  	// push one element into the vector to allocate it
 
  	foo.push_back(1);
 
  	cap = foo.capacity();
 
  	cout << "ptr=" << &foo[0] << ", capacity=" << cap << endl;
 
   
 
  	// continue to push elements until the capacity is exceeded, forcing a resize
 
  	for (int t = foo.size(); t <= cap; t++)
 
  		foo.push_back(t);
 
   
 
  	cap = foo.capacity();
 
  	cout << "ptr=" << &foo[0] << ", capacity=" << cap << endl;
 
  	return 0;
 
  }

Upon execution, the output shows:

ptr=0x2192010, capacity=1 ptr=0x2192030, capacity=2

The difference between the pointers show that a hole of 32 bytes has been created at the original location of the vector. While 32 bytes by itself is not much, multiple resizes will result in the size of the hole increasing, as the array held by the vector moves upward in memory.

The C library has provided the realloc() function for resizing memory blocks for decades. It allows a block previously allocated by malloc() to be expanded or shrunk in place, without relocation of the block contents, as long as there is enough free space available after the block to expand itself. If there is not enough free space available after the block to perform the expansion, only then is the block relocated to a different location in memory.

Unfortunately realloc() cannot be directly used on arrays allocated with operator new[] in C++ as realloc() only handles low level memory allocation, and does not call any constructors or destructors that the type may contain. It is possible to call constructors and destructors manually though, and this effectively gives us the ability to create a wrapper around realloc() which can work as expected with C++ arrays.

However, there is one remaining issue of the array count variable itself which is maintained internally by the compiler-generated code at runtime. Since this is the most complicated part of this article, let’s start with it first.

The Magic Counter

Behind the scenes, the compiler stores a value containing the number of elements in the array which is placed directly before the actual pointer returned by operator new[] when that type has a non-trivial destructor. This value is used to determine the number of elements to call the destructor on when the array is freed with operator delete[].[1] That last sentence is important — if the type being allocated has a trivial destructor, then the compiler does not need to call it on delete[], and hence no count is ever stored for that type. Whether the type has a trivial constructor or not is irrelevant, because the number of constructors to call is known to the compiler as the size is directly specified by the parameter to operator new[].

Let’s break this down into a table so it’s a bit easier to understand:

Trivial Constructor	Trivial Destructor	Has Array Counter
 
  -----------------------	-----------------------	-----------------------
 
  Yes			Yes			No
 
  No			Yes			No
 
  Yes			No			Yes
 
  No			No			Yes

If you aren’t familiar with the differences between a trivial and a non-trivial constructor/destructor, you might want to read my previous post on the subject.

So what exactly is this array counter? Unfortunately, there is no standard definition and it is left to however the compiler implements it. While Microsoft Visual C++ uses an int type (32 bits) for 32-bit and 64-bit architectures, both gcc and clang use the size_t type which changes size based on the architecture. By simply modifying this array count value, we can directly affect how many elements the compiler generated code should destroy when operator delete[] is called.

Let’s start by first writing a class which can access the array count for those types which have a non-trivial destructor. Given an arbitrary pointer to a memory block allocated by operator new[], we should allow the array count to be retrieved and modified. By routing all accesses through this class, any future changes that might need to be made to our implementation will be kept here in one place.

template <typename T>
 
  class ArrayRealloc
 
  {
 
  public:
 
  // handle differences between compiler implementations
 
  #ifdef _MSC_VER
 
  	typedef int		CountT;
 
  #else
 
  	typedef size_t		CountT;
 
  #endif
 
   
 
  public:
 
  	ArrayRealloc(T *ptr)
 
  	{
 
  		// calculate pointer to base of memory
 
  		base = reinterpret_cast<CountT *>(ptr) - 1;
 
  	}
 
   
 
  	size_t count(void) const
 
  	{
 
  		// return count of items in array
 
  		return (size_t)*base;
 
  	}
 
   
 
  	void set(size_t newcnt)
 
  	{
 
  		// change count of items in array
 
  		*base = (CountT)newcnt;
 
  	}
 
   
 
  private:
 
  	CountT *	base;
 
  };

Next, let’s follow up with some simple test code which utilizes our new class:

#ifndef __MACH__
 
   #include <malloc.h>
 
  #else
 
   #include <objc/malloc.h>
 
  #endif
 
  #include <cstdlib>
 
  #include <iostream>
 
  using namespace std;
 
   
 
  static int val = 0;
 
  class foocd
 
  {
 
  public:
 
  	int id;
 
  	foocd(void) : id(val++) { cout << "ctor: id=" << id << endl; }
 
  	~foocd(void) { cout << "dtor: id=" << id << endl; }
 
  };
 
   
 
  int main(void)
 
  {
 
  	foocd *cd = new foocd[5];
 
  	ArrayRealloc<foocd> arr(cd);
 
  	cout << "count=" << arr.count() << endl;
 
  	arr.set(3);
 
  	cout << "count=" << arr.count() << endl;
 
  	delete []cd;
 
  	return 0;
 
  }

The above test code allocates five elements of foocd, then uses our ArrayRealloc class to directly access the count of elements in the array. We then call count() to obtain the count of elements, display it, change the number of elements in the array to three, and display the count again. Finally, the array is deleted using the standard C++ operator delete[]. If you compile and run this code, you will get the following output:

ctor: id=0 ctor: id=1 ctor: id=2 ctor: id=3 ctor: id=4 count=5 count=3 dtor: id=2 dtor: id=1 dtor: id=0

Since we changed the count to three, the compiler-generated code for operator delete[] only deletes the first three elements in the array. While this is not a true reallocation by itself, it does show that we can successfully modify the array count value held by the compiler.

Performing the Reallocation

In order to actually perform a reallocation on a memory block, we need to be able to request the memory manager or OS to reallocate it for us. We will use the platform independent C library realloc() here since it is portable, but on Windows the HeapReAlloc() function is also available. Of course, if you have your own memory manager, you will most likely want to use that instead. Regardless of the API you use, it is important to remember that pointers should not be passed across different memory managers — for example, do not attempt to use a pointer that was obtained from malloc() with HeapReAlloc() as it won’t work!

So to be able to use realloc() on an array allocated with operator new[] and delete[], we will first need to override them with the C library’s malloc() and free() functions:

// override operator new[] and delete[] so they use the same memory manager as realloc()
 
  void *operator new[](size_t size) throw(bad_alloc)
 
  {
 
  	return ::malloc(size);
 
  }
 
   
 
  void operator delete[](void *addr) throw()
 
  {
 
  	::free(addr);
 
  }

Now we can revisit our ArrayRealloc class again, this time adding a resize() function to it:

template <typename T>
 
  class ArrayRealloc
 
  {
 
  public:
 
  	// ...
 
   
 
  	T *user(void) const
 
  	{
 
  		// return pointer to user memory
 
  		return reinterpret_cast<T *>(base + 1);
 
  	}
 
   
 
  	T *resize(size_t newcnt)
 
  	{
 
  		// reallocate the array to the specified size, plus the space needed for the array count
 
  		void *newptr = ::realloc(base, sizeof(CountT) + (newcnt * sizeof(T)));
 
   
 
  		// update our base pointer as realloc might have moved the memory block
 
  		base = static_cast<CountT *>(newptr);
 
   
 
  		// update the array count, and return the pointer to reallocated user memory
 
  		set(newcnt);
 
  		return user();
 
  	}
 
   
 
  	// ...
 
  };

Note that the pointer passed to realloc() is the true base of the memory block, already adjusted by the ArrayRealloc constructor. The size passed also accounts for the extra size required for the array count value itself, and multiplies the specified count by the size of the type so we can specify the new count in elements rather than bytes.

Generic Object Construction and Destruction

Now that we can reallocate an array and control the number of elements in it, we need to be able to construct and destruct individual objects at will. Although we can’t call the constructor for an object directly, it is possible to construct an object at a specific memory location by using placement new. To destruct an object, C++ allows us to simply call the destructor directly.

As was noticeable from our test earlier, when the constructors for an array of objects are called with operator new[], they are called in order of increasing addresses. When these objects are destroyed, operator delete[] calls them in reverse order, so we need to ensure we replicate that same functionality:

class GenericObject
 
  {
 
  public:
 
  	template <typename T>
 
  	static void construct(T *base, size_t count)
 
  	{
 
  		// create a pointer to the end of memory to begin construction
 
  		T *end = base + count;
 
   
 
  		// call the default constructor for each object using placement new, in increasing order
 
  		while (base < end)
 
  		{
 
  			new (static_cast<void *>(base)) T();
 
  			base++;
 
  		}
 
  	}
 
   
 
  	template <typename T>
 
  	static void destruct(T *base, size_t count)
 
  	{
 
  		// create a pointer to last object to destroy
 
  		T *end = base + (count - 1);
 
   
 
  		// call the destructor for each object directly, in reverse order
 
  		while (end >= base)
 
  		{
 
  			end->~T();
 
  			end--;
 
  		}
 
  	}
 
  };

Putting It All Together

Now that we have all of the components necessary to perform a proper reallocation of a C++ array, we lastly need to deal with how to detect if a given type has a trivial constructor and/or destructor or not.

The compiler-supplied type trait functions __has_trivial_constructor() and __has_trivial_destructor() have been available since Visual C++ 2005 and gcc-4.3/clang-2, and pretty much do exactly what their names say they do. Both of these functions return boolean values and are available at compile time, so we can use them as arguments to a template:

1 2	template <typename T, bool has_constructor = __has_trivial_constructor(T), bool has_destructor = __has_trivial_destructor(T)> class Reallocate;

This allows us to separately specialize each of the four possibilities that were described in our table earlier. Let’s start with the easiest specialization, where a type has both a trivial constructor and destructor. Since this type has no array count, it can be simply reallocated with a single call to realloc():

template <typename T>
 
  class Reallocate<T, true, true>
 
  {
 
  public:
 
  	static T *reallocate(T *ptr, size_t newcnt)
 
  	{
 
  		// both trivial constructor and destructor, so just call realloc directly
 
  		return static_cast<T *>(::realloc(ptr, newcnt * sizeof(T)));
 
  	}
 
  };

Next, let’s tackle the case where a type has both a non-trivial constructor and destructor. Since this case contains an array count, it will need to utilize our ArrayRealloc class, as well as ensure both the constructors are called when expanding and the destructors are called when shrinking:

template <typename T>
 
  class Reallocate<T, false, false>
 
  {
 
  public:
 
  	static T *reallocate(T *ptr, size_t newcnt)
 
  	{
 
  		ArrayRealloc<T> arr(ptr);
 
   
 
  		// do nothing if reallocating to the same size
 
  		if (arr.count() == newcnt)
 
  			return ptr;
 
   
 
  		// check if we are shrinking
 
  		if (newcnt < arr.count())
 
  		{
 
  			// destruct objects, then resize
 
  			GenericObject::destruct(ptr + newcnt, arr.count() - newcnt);
 
  			return arr.resize(newcnt);
 
  		}
 
  		else
 
  		{
 
  			// resize first
 
  			size_t oldcnt = arr.count();
 
  			arr.resize(newcnt);
 
   
 
  			// then construct the new objects
 
  			GenericObject::construct(arr.user() + oldcnt, arr.count() - oldcnt);
 
  			return arr.user();
 
  		}
 
  	}
 
  };

Now let’s handle the mixed cases. Next up is when a type has a trivial destructor, but a non-trivial constructor. Although this case does not require us to call any destructors on shrinking, we must call new constructors when expanding. Because there are no destructors to call, and the compiler does not supply an array count in this case, we will need to somehow obtain the current size of the memory block to determine how many elements there exist in the array.

With Microsoft Visual C++, the function _msize() allows us to properly determine the size of a block allocated by malloc(). Unfortunately with the GNU C library and on OSX, no such function exists. Both the functions malloc_usable_size() on GNU, and malloc_size() on OSX return the size of the allocated block plus any extra padding that may have been allocated by the memory manager. This throws off the calculation of the number of elements, and cannot be used to properly call constructors on an expansion.

If you are using the GNU C library or OSX with malloc(), a simple solution to this problem would be to store the size of the allocated block inside the memory block itself during the malloc() operation, then reference that size when performing the reallocation. Another solution would be to use a completely different memory manager other than malloc() and realloc(), but that is outside of the scope of this article. A somewhat non-solution is to avoid this single specialization entirely, and never attempt to reallocate an array with a type that has a constructor but no destructor.

In any case, I will be using malloc_size() and malloc_usable_size() in the implementation below:

template <typename T>
 
  class Reallocate<T, false, true>
 
  {
 
  public:
 
  	static T *reallocate(T *ptr, size_t newcnt)
 
  	{
 
  		// first determine current array count (no count is stored since the destructor is trivial)
 
  #ifdef _MSC_VER
 
  		size_t cnt = _msize(ptr) / sizeof(T);
 
  #elif defined(__MACH__)
 
  		size_t cnt = malloc_size(ptr) / sizeof(T);
 
  #else
 
  		size_t cnt = malloc_usable_size(ptr) / sizeof(T);
 
  #endif
 
   
 
  		// do nothing if reallocating to the same size
 
  		if (cnt == newcnt)
 
  			return ptr;
 
   
 
  		// realloc to new size (there are no destructors to call)
 
  		T *newptr = static_cast<T *>(::realloc(ptr, newcnt * sizeof(T)));
 
   
 
  		// if we expanded, call constructors for those elements
 
  		if (newcnt > cnt)
 
  			GenericObject::construct(newptr + cnt, newcnt - cnt);
 
   
 
  		return newptr;
 
  	}
 
  };

The last mixed case is when a type has a trivial constructor but a non-trivial destructor. No constructor needs to be called on expansion, but the destructors will need to be called on a shrink operation. Since the compiler supplies an array count for this type, we can utilize our ArrayRealloc class:

template <typename T>
 
  class Reallocate<T, true, false>
 
  {
 
  public:
 
  	static T *reallocate(T *ptr, size_t newcnt)
 
  	{
 
  		ArrayRealloc<T> arr(ptr);
 
   
 
  		// do nothing if reallocating to the same size
 
  		if (arr.count() == newcnt)
 
  			return ptr;
 
   
 
  		// check if we are shrinking
 
  		if (newcnt < arr.count())
 
  		{
 
  			// destruct objects, then resize
 
  			GenericObject::destruct(ptr + newcnt, arr.count() - newcnt);
 
  			return arr.resize(newcnt);
 
  		}
 
  		else
 
  		{
 
  			// simply resize (no constructor to call)
 
  			return arr.resize(newcnt);
 
  		}
 
  	}
 
  };

Now we can wrap our template up with an easy to use one-line function to automatically detect the type when a pointer is passed as a parameter:

template <typename T>
 
  T *reallocate(T *ptr, size_t newcnt)
 
  {
 
  	return Reallocate<T>::reallocate(ptr, newcnt);
 
  }

And that’s it — we have a completed implementation of a C++ operator realloc[]-like function.

Testing It

Now that we have all of the code written, it only seems appropriate to ensure it works properly with a test. We’ll add a simple function to return the count of elements in an array first, utilizing the ArrayRealloc class:

template <typename T>
 
  size_t count(T *ptr)
 
  {
 
  	ArrayRealloc<T> arr(ptr);
 
  	return arr.count();
 
  }

We’ll also utilize our foocd class from earlier, but this time with a different main() function that calls our new reallocate() function above:

int main(void)
 
  {
 
  	foocd *cd = new foocd[4];
 
  	cout << "ptr=" << cd << ", count=" << count(cd) << endl;
 
  	cd = reallocate(cd, 3);
 
  	cout << "ptr=" << cd << ", count=" << count(cd) << endl;
 
  	cd = reallocate(cd, 6);
 
  	cout << "ptr=" << cd << ", count=" << count(cd) << endl;
 
  	delete []cd;
 
  	return 0;
 
  }

When run, the following output will be displayed (of course, your pointers are likely to be different than the ones shown here):

ctor: id=0 ctor: id=1 ctor: id=2 ctor: id=3 ptr=0x20d7018, count=4 dtor: id=3 ptr=0x20d7018, count=3 ctor: id=4 ctor: id=5 ctor: id=6 ptr=0x20d7018, count=6 dtor: id=6 dtor: id=5 dtor: id=4 dtor: id=2 dtor: id=1 dtor: id=0

This test shows a C++ array being resized dynamically at runtime, properly calling the constructors and destructors for the class at each reallocation. It can also be noted that unlike std::vector, the array was never relocated in memory as the pointer for the array did not change after any call to reallocate().

Of course the other three code paths should be tested as well, but instead of cluttering up this already long article with more test code, I will simply attach the complete code, tests and all, to the bottom of this post. You are more than welcome to download it and test it for yourself.

Caveats

When realloc() cannot expand the memory block and performs a relocation, operator = is not called on the objects when they are moved. If this is a problem for your code, check the section below on how this issue can be worked around.

It is also obvious from this implementation that it assumes that the count for the array is stored at the memory address located in the space directly before the array contents. While it is completely possible that a future version of some compiler might do something completely different — for example, storing the counts for all arrays in a separate memory location not accessible by the user — in my opinion this is unlikely to ever happen. Storing a tiny four- or eight-byte value in a different location means extra overhead in maintaining that memory for the compiler-generated code, and this will have an adverse impact on the application itself which is not in the interest of compiler designers.

Future Ideas and Possibilities

First you will most likely want to wrap this code into a container of your own to use as a replacement for std::vector. I won’t go into that here, but if you would like me to cover this in a future post, please let me know in the comments.

Although it is not possible with the C library realloc() function, the HeapReAlloc() function available on Windows platforms allows an optional HEAP_REALLOC_IN_PLACE_ONLY flag to be specified, ensuring that a memory block is never relocated. If there is not enough space to expand a block, HeapReAlloc() instead returns a failure code and leaves the block untouched. By making use of this flag, it opens up a couple of possibilities:

A container can be created allowing permanently fixed blocks, so maintaining pointers to the internal memory of the container remains safe, and iterators would never become invalid. When more space is needed, the block is expanded as usual. When it can no longer expand, an additional block is allocated and then the blocks are internally chained together. As more items are added, the block on the end continues to expand, until it also reaches a maximum size. The process is then repeated. This container could provide performance similar to std::vector, but would be less wasteful on memory and cause less fragmentation than compared to std::list, with the added benefit that objects retain their fixed locations once they are inserted into the container.
As described in the caveats above, operator = is not called on objects when using realloc(). When HeapReAlloc() returns a failure on expansion, a new block can be allocated manually, then objects can be assigned with operator =. This would effectively provide an identical implementation as std::vector.

Conclusion

It is rather unfortunate that C++ never designed a true realloc() operator into the language itself, but by utilizing the code presented here, it is possible to work around this limitation. Common sense alone should dictate that it can be orders of magnitudes faster to simply update a few words in memory to expand a memory block, rather than to copy the entire block to a different location in memory. Perhaps one day the designers of the C++ language will notice that it is important to be as efficient with memory and performance as possible, and provide us a better mechanism to do so in a future revision.

As promised, here is the .zip containing a single .cpp file which contains all of the source posted in this article, plus the extra tests. It will compile unmodified with Microsoft Visual C++ 2005/2008/2010, gcc-4.3 or higher, clang-2 or higher, and run as expected on Windows, Linux and OSX (and most likely other platforms as well!): ArrayRealloc.zip

References

“[38.7] How do compilers use ‘over-allocation’ to remember the number of elements in an allocated array?” — C++ FAQ Lite

#AltDevBlogADay

Michael Tedder
Follow @bpoint42

Implementing a True realloc() in C++

Introduction

Comparing std::vector To realloc()

The Magic Counter

Performing the Reallocation

Generic Object Construction and Destruction

Putting It All Together

Testing It

Caveats

Future Ideas and Possibilities

Conclusion

References

#AltDevBlogADay

Michael Tedder Follow @bpoint42

Implementing a True realloc() in C++

Introduction

Comparing std::vector To realloc()

The Magic Counter

Performing the Reallocation

Generic Object Construction and Destruction

Putting It All Together

Testing It

Caveats

Future Ideas and Possibilities

Conclusion

References

Michael Tedder
Follow @bpoint42