(Also posted to a series of posts about Vectors and Vector based containers.)
This post is essentially a response to feedback on this previous post.
In that post I talked about a change we made to the initialisation semantics for PathEngine’s custom vector class, and described a specific use case where this can make a difference, with that use case involving calls to the vector resize() method.
In the comments for that post, Herb Sutter says:
Thomas, I think the point is that “reserve + push_back/emplace_back” is the recommended style and the one people do use. Really resize() is not used often in the code I’ve seen; you use resize() only when you want that many extra default-constructed elements, since that’s what it’s for.
In this post I’ll be looking into our use case in a bit more detail, and in particular whether or not a resize() call is actually required.
Avoiding unnecessary initialisation
As a result of reader feedback on previous posts (including Herb’s comments) I’ve realised that I’ve got a bit of a bad habit of sizing vectors directly, either in the vector constructor or with calls to vector resize(), where in many cases it’s actually more efficient (or better coding style) to use reserve().
We can see an example of this in the first post in this series, where I posted some code that constructs a vector with a fixed size and then immediately clears the vector, as a way of avoiding memory allocations:
    std::vector<cFace> openQueueBuffer(20);
    openQueueBuffer.clear();
Memory allocations are bad news for performance, and avoiding unnecessary allocations should be the first priority in many situations. But there’s also a secondary semantic issue here, related to element initialisation, and it’s better general purpose vector code to use the reserve() method for this, as follows:
    std::vector<cFace> openQueueBuffer;
    openQueueBuffer.reserve(20);
The difference is that the first version asks for initialisation of 20 cFace objects (with what actually happens depending on the way cFace constructors are set up), whereas the second version doesn’t require any initialisation (independently of how cFace is defined).
Note that this issue doesn’t just apply to vector construction – exactly the same issue applies to vector resize() operations where vector size is increasing.
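To make the difference concrete, here is a minimal sketch that counts constructions for each version. CountedElement and the two helper functions are hypothetical names made up for this illustration (they are not part of PathEngine), and the counter includes copy constructions because older standard library versions fill a sized vector by copying a single default-constructed value.

```cpp
#include <vector>

// Hypothetical element type (not PathEngine's cFace) that counts how many
// objects get constructed, by default construction or by copying.
struct CountedElement
{
    static int constructions;
    int value;
    CountedElement() : value(0) { ++constructions; }
    CountedElement(const CountedElement& rhs) : value(rhs.value) { ++constructions; }
};
int CountedElement::constructions = 0;

// Sized construction followed by clear(): at least 20 elements are constructed
// (and then destroyed again by the clear).
int ConstructionsForSizedThenClear()
{
    CountedElement::constructions = 0;
    std::vector<CountedElement> v(20);
    v.clear();
    return CountedElement::constructions;
}

// reserve(): memory is allocated, but no elements are constructed.
int ConstructionsForReserve()
{
    CountedElement::constructions = 0;
    std::vector<CountedElement> v;
    v.reserve(20);
    return CountedElement::constructions;
}
```

With an element type like this the first function should report at least 20 constructions and the second should report none, whereas for a POD type the cost of the first version depends on how the vector implementation handles initialisation.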
In PathEngine, because cFace is a POD class, but also because of the way initialisation is implemented in PathEngine’s vector class, both versions actually work out as doing the same thing, i.e. no initialisation is performed in either case. But could we change the PathEngine source code to avoid calling resize() and so avoid the need for non-standard vector initialisation semantics?
Resize benchmark
In my previous post I posted the following minimal benchmark to show how initialisation semantics can make a difference to timings.
    template <class tVector>
    static int ResizeBenchmark()
    {
        int sum = 0;
        for(int i = 0; i != 400; ++i)
        {
            tVector v;
            v.resize(100000);
            v.back() += 1;
            sum += v.back();
        }
        return sum;
    }
In the comments Herb points out that we shouldn’t actually use code like this in many situations, but rather something like the following (avoiding the resize call and resulting element initialisation):
    template <class tVector>
    int RealResizeBenchmark()
    {
        int sum = 0;
        for(int i = 0; i != 400; ++i)
        {
            tVector v;
            v.reserve(100000);
            for( int j = 0; j <100000; ++j )
                v.emplace_back( GetNextInt() );
            // or replace above loop with "v.assign( src, src+srclen );"
            // if your cVector code would do a bulk memcpy
            v.back() += 1;
            sum += v.back();
        }
        return sum;
    }
Let’s look at the buffer copy version, specifically (i.e. with the call to v.assign()), and set up an updated version of our benchmark to use this construction method.
We’ll use std::vector to set up some source data, initially, for the buffer copy:
    int sourceDataLength = 100000;
    std::vector<int> source;
    source.reserve(sourceDataLength);
    for(int i = 0; i != sourceDataLength; ++i)
    {
        source.push_back(i);
    }
And then the benchmark can be rewritten as follows:
    template <class tVector>
    static int ResizeBenchmark(const int* sourceData, int sourceDataLength)
    {
        int sum = 0;
        for(int i = 0; i != 1000; ++i)
        {
            tVector v;
            v.reserve(sourceDataLength);
            v.assign(sourceData, sourceData + sourceDataLength);
            sum += v.back();
        }
        return sum;
    }
This could be written more concisely with direct construction from iterators but reserve and assign are closer to what we would do in our original use case where we’re actually reusing an existing vector.
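For reference, the more concise form mentioned here might look something like the following sketch. ConstructFromRangeBenchmark is a name made up for this illustration, and it assumes a vector type that supports construction from an iterator range, as std::vector does.

```cpp
#include <vector>

// Same per-iteration work as the reserve() + assign() version, but building
// each vector directly from the source range.
template <class tVector>
int ConstructFromRangeBenchmark(const int* sourceData, int sourceDataLength)
{
    int sum = 0;
    for(int i = 0; i != 1000; ++i)
    {
        // range constructor: sizes the vector and copies the data in one step
        tVector v(sourceData, sourceData + sourceDataLength);
        sum += v.back();
    }
    return sum;
}
```

The reserve() and assign() version above remains the one used for the timings below.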
There’s a problem, however, when we try to apply this benchmark to PathEngine’s cVector. cVector only provides a subset of the std::vector interface, and doesn’t provide an assign() method or construction from iterators (illustrating one of the potential disadvantages of rolling your own vector, btw!), so we end up putting resize() back in for the cVector version of the benchmark:
    template <class tVector>
    static int ResizeBenchmark_NoAssign(const int* sourceData, int sourceDataLength)
    {
        int sum = 0;
        for(int i = 0; i != 1000; ++i)
        {
            tVector v;
            v.resize(sourceDataLength);
            memcpy(&v[0], sourceData, sourceDataLength * sizeof(v[0]));
            sum += v.back();
        }
        return sum;
    }
I ran this quickly on my main desktop machine (Linux, Clang 3.0, libstdc++ 6) and got the following results:
container type | build | time | sum |
---|---|---|---|
std::vector | release | 0.0246 seconds | 99999000 |
cVector (default initialisation) | release | 0.0237 seconds | 99999000 |
I’m not going to go into more benchmarking specifics because (as with the original benchmark) the point is just to show whether or not there is an issue, and the exact timing values obtained aren’t really very meaningful.
But, yes, with this version of the benchmark (construction as buffer copy), and with std::vector used in this way it’s fairly clear that there’s basically nothing in it, and therefore no longer any advantage to modified initialisation semantics.
But what if...
But what if we want to load this data directly from a file?
With resize(), we can do something like the following (with good old low level file IO):
    void LoadFromFile(const char* fileName, cVector<char>& buffer)
    {
        FILE* fp = fopen(fileName, "rb");
        // error checking omitted for simplicity
        fseek(fp, 0, SEEK_END);
        buffer.resize(ftell(fp));
        if(!buffer.empty())
        {
            fseek(fp, 0, SEEK_SET);
            fread(&buffer.front(), 1, buffer.size(), fp);
        }
        fclose(fp);
    }
At first glance this seems a bit harder to rewrite ‘the right way’ (i.e. with reserve() instead of resize()), without doing something like loading each individual element separately, and without explicitly creating a separate buffer to load into. After a short search, however, it turns out we can do this by using stream iterators, as described in this answer on stackoverflow.
Let’s implement another version of LoadFromFile() then, without resize(), as follows:
    void LoadFromFile(const char* fileName, std::vector<char>& buffer)
    {
        std::ifstream source(fileName, std::ios::binary);
        std::vector<char> toSwap(
            (std::istreambuf_iterator<char>(source)),
            std::istreambuf_iterator<char>());
        buffer.swap(toSwap);
    }
I used the following code to knock up a quick benchmark for this:
    template <class tVector>
    int FileSum(const char* fileName)
    {
        tVector buffer;
        LoadFromFile(fileName, buffer);
        int sum = 0;
        for(int i = 0; i != buffer.size(); ++i)
        {
            sum += buffer[i];
        }
        return sum;
    }
Calling this for a largish file (on my main desktop with Linux, Clang 3.0, libstdc++ 6, release build) gave me the following:
container type | method | time |
---|---|---|
std::vector | iterators | 0.1231 seconds |
std::vector | resize+fread | 0.0273 seconds |
cVector | resize+fread | 0.0241 seconds |
So, on this machine (and build etc.) stream iterators work out to be a lot more expensive than ‘simple’ direct file IO to a buffer.
Note that I threw in a call to the old style file IO version for std::vector as well, for completeness, and it looks like the overhead for extra buffer initialisation in resize() is actually insignificant in comparison to the cost of using these stream iterators.
I guess this could depend on a lot of things. I haven’t checked this across different machines or different standard library implementations, and I don’t have any practical experience to draw on with optimising C++ stream iterator use. But this does show quite clearly that it’s nice to have an option for direct low level buffer access in some situations where performance is critical.
Full disclosure
In fact, for portability reasons, the PathEngine runtime doesn’t actually include any direct filesystem accesses like those shown in the code above. Loading from persistence is handled by client side code loading data into a buffer in memory (using the relevant operating system file loading methods) and then passing this buffer into PathEngine load calls.
So we could theoretically switch to using std::vector assign() and get rid of this specific use case for changing vector initialisation semantics.
But there are some other reasons why I prefer not to do this.
Dependencies and low level interfaces
The use case I showed in my last post involved the following methods on a custom PathEngine stream object:
    void openNextElementAsArray_GetSize(tUnsigned32& arraySize);
    void finishReadElementAsArray(char* destinationBuffer);
This is low level, works with raw pointers, and is not really modern C++ coding style, but I actually prefer this in many ways to passing in a std::vector reference or templated iterator arguments. It’s nice to avoid dependencies on a vector class, for example, or the additional complexity of templating, and it’s nice to be able to use these methods with buffers that are not managed as vectors.
Some of PathEngine is written as quite low level code, and I think that one of the great things about the STL is how easy it is to interface STL components with lower level code. Being able to take pointers into the buffer of an STL vector for low level buffer access is one example of this, but sometimes this kind of low level buffer access implies direct vector resizing, and resizing vectors without element initialisation can then often be desirable.
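As a rough sketch of how a two-call interface like this fits together with a vector: cExampleStream below is a hypothetical in-memory stand-in written for this illustration (only the two method signatures come from the code above), the tUnsigned32 typedef is an assumption, and ReadArrayIntoVector is a made-up helper name.

```cpp
#include <cstring>
#include <vector>

typedef unsigned int tUnsigned32; // assumption: stands in for PathEngine's typedef

// Hypothetical in-memory stream holding a single array element.
class cExampleStream
{
    const char* _data;
    tUnsigned32 _size;
public:
    cExampleStream(const char* data, tUnsigned32 size) : _data(data), _size(size) {}

    void openNextElementAsArray_GetSize(tUnsigned32& arraySize)
    {
        arraySize = _size;
    }
    void finishReadElementAsArray(char* destinationBuffer)
    {
        if(_size != 0)
        {
            std::memcpy(destinationBuffer, _data, _size);
        }
    }
};

// Query the size, resize the vector, then let the stream write directly
// into the vector's buffer through a raw pointer.
void ReadArrayIntoVector(cExampleStream& stream, std::vector<char>& result)
{
    tUnsigned32 arraySize;
    stream.openNextElementAsArray_GetSize(arraySize);
    result.resize(arraySize); // std::vector zero-fills any new elements here
    stream.finishReadElementAsArray(result.empty() ? 0 : &result[0]);
}
```

The stream side knows nothing about vectors, so the same two calls work equally well with a plain array or any other buffer. The resize() call in the middle is where the vector's initialisation semantics come into play, since every element is about to be overwritten anyway.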
Working with partially initialised buffers
Another use case we come across sometimes in PathEngine is when we want a buffer to store some kind of data corresponding with a given set of index values, but don’t need to set values for all elements, and we have information available somewhere else which tells us which elements have actually got meaningful values set.
One example could be where we want to form a sequence of elements from a large set of potential elements, and so we set ‘next’ values for each element currently included in our sequence, but we don’t need to initialise next values for elements not included in the sequence at all.
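A minimal sketch of this kind of setup follows. The class and method names are hypothetical, and a plain default-initialised array (C++11 std::unique_ptr<int[]>) stands in for a vector that skips element initialisation on resize.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Linked sequence over a subset of the indices in [0, setSize).
// 'next' values are written only for indices added to the sequence; all other
// entries are left uninitialised and are never read.
class cIndexSequence
{
    std::unique_ptr<int[]> _next;
    int _first;
    int _last;
public:
    explicit cIndexSequence(std::size_t setSize) :
        _next(new int[setSize]), // default-initialised, so no zero fill
        _first(-1),
        _last(-1)
    {
    }

    // Appends an index to the sequence (each index may be added at most once).
    void pushBack(int index)
    {
        _next[index] = -1;
        if(_last == -1)
        {
            _first = index;
        }
        else
        {
            _next[_last] = index;
        }
        _last = index;
    }

    // Walks the sequence, touching only entries that were explicitly set.
    std::vector<int> elementsInOrder() const
    {
        std::vector<int> result;
        for(int i = _first; i != -1; i = _next[i])
        {
            result.push_back(i);
        }
        return result;
    }
};
```

Here the first index and the chain of next values are the information that tells us which entries are meaningful, so the cost of building a short sequence doesn't need to include a pass over the whole buffer.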
This is perhaps a less common use case, however, and I’m not sure how much difference this actually makes to timings in practice.
Wrapping up
It’s good to know the right way to do things in modern C++. With vectors the right way usually means preferring reserve() over resize(), and this will avoid unnecessary initialisation in many cases (and the need to worry about details of vector initialisation semantics).
In the case of PathEngine however, I quite like being able to mix low and high level code. We still find it useful in some cases to resize() vectors without explicit element initialisation and changing the initialisation semantics in our custom vector class helps to ensure this is as efficient as possible!