As I moved further towards programming in a data-oriented fashion, I found that I kept needing a ‘blob’ structure. I got sick of writing pointer offset code and casting void pointers to the correct type. Currently, I have four blob abstractions in my engine, and I thought I would share them along with the motivation behind each one. I’m hoping that, in turn, you will share your blob abstractions and I will get to steal your great ideas!
Some of you may be asking, ‘So what is a blob?’ A blob is just a chunk of memory. It does not care about the type of data stored in it, whether or not that data is homogeneous, and so on. I differentiate between blobs by the access patterns they offer. This is my first blob, the basic blob:
```cpp
struct BasicBlob {
  uintptr_t buffer;
  size_t buffer_size;
};
```
That is all the data it stores. It’s either 8 or 16 bytes depending on the pointer size. Aside from initialization, my basic blob only has two methods:
```cpp
size_t GetBufferSize();
template <typename T> T* GetPtr(size_t byte_offset);
```
The purpose of the first method is obvious. The second, GetPtr<T>, does the pointer offset computation and returns a pointer of the requested type.
The most important feature of the basic blob is that it does not own the memory. It only stores an address and a size. This moves the memory allocation policy outside of the blob, where it belongs. As a side effect, the basic blob is a POD type, meaning that instances can be passed by value and re-initialized with different addresses and sizes without side effects. I use this blob everywhere I need to pass a pointer and a size.
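A minimal sketch of the basic blob might look like the following. The `Init` method and the debug-only bounds check in `GetPtr` are assumptions for illustration, not part of the interface described above. The two `static_assert`s check the POD-like properties the text relies on: the blob is trivially copyable and has standard layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Non-owning view of memory: an address and a size, nothing else.
struct BasicBlob {
  uintptr_t buffer;
  size_t buffer_size;

  // Hypothetical initializer: points the blob at memory owned by someone else.
  void Init(void* memory, size_t size) {
    buffer = reinterpret_cast<uintptr_t>(memory);
    buffer_size = size;
  }

  size_t GetBufferSize() const { return buffer_size; }

  // Pointer offset computation plus cast. The assert is a debug-only bounds check;
  // keeping byte_offset suitably aligned for T is left to the caller.
  template <typename T>
  T* GetPtr(size_t byte_offset) const {
    assert(byte_offset <= buffer_size && sizeof(T) <= buffer_size - byte_offset);
    return reinterpret_cast<T*>(buffer + byte_offset);
  }
};

// Copying a BasicBlob is a plain memberwise copy of two words.
static_assert(std::is_trivially_copyable<BasicBlob>::value, "must be trivially copyable");
static_assert(std::is_standard_layout<BasicBlob>::value, "must be standard layout");
```

Because the blob never frees anything, the same memory can be viewed through several blobs at once, and a blob pointing at a stack array is just as valid as one pointing at a heap allocation.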
The second blob I use is the growing memory blob. Unlike the basic blob and the remaining two, this blob does own the memory. It is initialized with an allocator, has a (dynamic) capacity and a (dynamic) size. This blob could and should be used in your dynamic array and dynamic string classes, which are just dynamically sized contiguous blobs of memory anyway. Also, my file system abstraction relies on this blob when returning the contents of a file on disk.
Like the basic blob, the growing blob offers the GetPtr access method. It adds an Append operation:
```cpp
template <typename T> int Append(const T* items, int item_count);
```
There are a couple of variations on this definition, but you get the picture. It appends item_count items of type T (item_count * sizeof(T) bytes) to the buffer. If there is not enough room, the capacity is increased first and then the append takes place. I primarily use this blob when I am building larger blobs to be written to disk and later read back in as a read-only resource (a basic blob).
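A sketch of a growing blob, under a few stated assumptions: the `Allocator` struct of two function pointers and the `MallocAllocator` helper are hypothetical stand-ins for "initialized with an allocator", capacity doubles on growth, and `Append` returns `item_count` on success or -1 if the allocator fails.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical allocator interface; the blob only needs realloc and free.
struct Allocator {
  void* (*realloc_fn)(void* ptr, size_t new_size);
  void (*free_fn)(void* ptr);
};

// Hypothetical default allocator backed by the C heap.
inline Allocator MallocAllocator() {
  return Allocator{[](void* p, size_t n) { return std::realloc(p, n); },
                   [](void* p) { std::free(p); }};
}

// Owns its memory and grows (doubling, an assumed policy) when an Append does not fit.
class GrowingBlob {
 public:
  explicit GrowingBlob(Allocator alloc) : alloc_(alloc) {}
  ~GrowingBlob() { alloc_.free_fn(buffer_); }
  GrowingBlob(const GrowingBlob&) = delete;
  GrowingBlob& operator=(const GrowingBlob&) = delete;

  size_t GetBufferSize() const { return size_; }
  size_t GetCapacity() const { return capacity_; }

  template <typename T>
  T* GetPtr(size_t byte_offset) const {
    assert(byte_offset <= size_ && sizeof(T) <= size_ - byte_offset);
    return reinterpret_cast<T*>(buffer_ + byte_offset);
  }

  // Copies item_count Ts (item_count * sizeof(T) bytes) to the end of the buffer.
  template <typename T>
  int Append(const T* items, int item_count) {
    static_assert(std::is_trivially_copyable<T>::value, "blobs hold raw bytes");
    assert(item_count >= 0);
    size_t bytes = sizeof(T) * static_cast<size_t>(item_count);
    if (size_ + bytes > capacity_ && !Grow(size_ + bytes)) return -1;
    if (bytes != 0) std::memcpy(buffer_ + size_, items, bytes);
    size_ += bytes;
    return item_count;
  }

 private:
  bool Grow(size_t min_capacity) {
    size_t new_capacity = capacity_ ? capacity_ : 64;
    while (new_capacity < min_capacity) new_capacity *= 2;
    void* p = alloc_.realloc_fn(buffer_, new_capacity);
    if (p == nullptr) return false;  // old buffer is still valid
    buffer_ = static_cast<unsigned char*>(p);
    capacity_ = new_capacity;
    return true;
  }

  Allocator alloc_;
  unsigned char* buffer_ = nullptr;
  size_t size_ = 0;
  size_t capacity_ = 0;
};
```

One consequence of growing through realloc: any pointer obtained from `GetPtr` may be invalidated by a later `Append`, so byte offsets are the safer thing to hold on to while the blob is still being built.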
My third, and most complex, blob is a ring buffer blob. This blob does no memory allocation and does not own the memory, but it keeps track of a read pointer and a write pointer inside the buffer. It has the same Append method as the growing blob and adds an analogous Consume routine.
A ring buffer, while simple, is not as simple as a linear chunk of memory. Both appending and consuming must handle the case where a pointer wraps from the end of the buffer back to the beginning. There are also many choices in how to implement a circular buffer, for example how to tell a full buffer from an empty one when the read and write pointers are equal.
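One way to handle the wrap is sketched below. The design choices are assumptions: read and write are ever-increasing byte counters (so `write - read` is always the number of used bytes and full is never confused with empty), the capacity must be a power of two (so `counter % capacity` stays continuous when a counter overflows), and `Append`/`Consume` are all-or-nothing, returning 0 when the request does not fit. `Init`, `UsedBytes` and `FreeBytes` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Non-owning ring buffer over caller-provided memory (sketch).
struct RingBlob {
  unsigned char* buffer;
  size_t capacity;
  size_t read;   // total bytes consumed so far
  size_t write;  // total bytes appended so far

  // Hypothetical initializer; capacity must be a non-zero power of two.
  void Init(void* memory, size_t size) {
    assert(size > 0 && (size & (size - 1)) == 0);
    buffer = static_cast<unsigned char*>(memory);
    capacity = size;
    read = write = 0;
  }
  size_t UsedBytes() const { return write - read; }
  size_t FreeBytes() const { return capacity - UsedBytes(); }

  template <typename T>
  int Append(const T* items, int item_count) {
    static_assert(std::is_trivially_copyable<T>::value, "blobs hold raw bytes");
    size_t bytes = sizeof(T) * static_cast<size_t>(item_count);
    if (bytes > FreeBytes()) return 0;
    size_t pos = write % capacity;
    size_t first = bytes < capacity - pos ? bytes : capacity - pos;  // bytes before the end
    const unsigned char* src = reinterpret_cast<const unsigned char*>(items);
    std::memcpy(buffer + pos, src, first);
    std::memcpy(buffer, src + first, bytes - first);  // wrapped remainder, possibly 0 bytes
    write += bytes;
    return item_count;
  }

  // Mirror image of Append: copies out and advances the read counter.
  template <typename T>
  int Consume(T* items, int item_count) {
    static_assert(std::is_trivially_copyable<T>::value, "blobs hold raw bytes");
    size_t bytes = sizeof(T) * static_cast<size_t>(item_count);
    if (bytes > UsedBytes()) return 0;
    size_t pos = read % capacity;
    size_t first = bytes < capacity - pos ? bytes : capacity - pos;
    unsigned char* dst = reinterpret_cast<unsigned char*>(items);
    std::memcpy(dst, buffer + pos, first);
    std::memcpy(dst + first, buffer, bytes - first);
    read += bytes;
    return item_count;
  }
};
```

Splitting each copy into "up to the end" and "from the start" is the whole trick; everything else is bookkeeping on the two counters.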
This blob is starting to feel kind of heavy, isn’t it? What if you want to read some data without removing it from the blob, or you want to append in small chunks and then atomically adjust the write pointer? A good abstraction always gets out of the way and lets you, the one with the brain, the programmer, dig into its guts. My ring blob offers that in spades. You can access the read and write pointers through a template function like GetPtr<T> above. But what about the discontinuity at the end of the buffer? You can query how many contiguous bytes each pointer has before it wraps back to the beginning, which lets you split your direct reads and writes around the end of the buffer. You can Peek into the buffer and later Skip those bytes. You can write into the buffer directly and then MoveWriterPointer over what you wrote.
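The direct-access side of a ring blob could be sketched like this. `Peek`, `Skip` and `MoveWriterPointer` follow the names in the text; `ContiguousReadBytes`, `ContiguousWriteBytes`, `GetReadPtr`, `GetWritePtr`, `UsedBytes`, `FreeBytes` and `Init` are hypothetical names. The sketch assumes ever-increasing read/write byte counters and a power-of-two capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Ring blob with direct access to its internals (sketch).
struct RingBlob {
  unsigned char* buffer;
  size_t capacity;
  size_t read;   // total bytes consumed (ever-increasing)
  size_t write;  // total bytes written (ever-increasing)

  void Init(void* memory, size_t size) {
    assert(size > 0 && (size & (size - 1)) == 0);  // power of two
    buffer = static_cast<unsigned char*>(memory);
    capacity = size;
    read = write = 0;
  }
  size_t UsedBytes() const { return write - read; }
  size_t FreeBytes() const { return capacity - UsedBytes(); }

  // Bytes reachable through the raw pointers before they would run off the end.
  size_t ContiguousReadBytes() const {
    size_t to_end = capacity - read % capacity;
    return UsedBytes() < to_end ? UsedBytes() : to_end;
  }
  size_t ContiguousWriteBytes() const {
    size_t to_end = capacity - write % capacity;
    return FreeBytes() < to_end ? FreeBytes() : to_end;
  }

  template <typename T> T* GetReadPtr() const { return reinterpret_cast<T*>(buffer + read % capacity); }
  template <typename T> T* GetWritePtr() const { return reinterpret_cast<T*>(buffer + write % capacity); }

  // Copy bytes out without consuming them, handling the wrap. False if not enough data.
  bool Peek(void* dst, size_t bytes) const {
    if (bytes > UsedBytes()) return false;
    size_t pos = read % capacity;
    size_t first = bytes < capacity - pos ? bytes : capacity - pos;
    std::memcpy(dst, buffer + pos, first);
    std::memcpy(static_cast<unsigned char*>(dst) + first, buffer, bytes - first);
    return true;
  }
  // Drop bytes already examined through Peek or GetReadPtr.
  void Skip(size_t bytes) { assert(bytes <= UsedBytes()); read += bytes; }
  // Publish bytes written through GetWritePtr with a single counter update.
  void MoveWriterPointer(size_t bytes) { assert(bytes <= FreeBytes()); write += bytes; }
};
```

A writer fills at most `ContiguousWriteBytes()` through `GetWritePtr`, calls `MoveWriterPointer`, and repeats once more if the data straddles the end; readers do the same with `ContiguousReadBytes`, `GetReadPtr` and `Skip`.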
I use the ring blob primarily as a message passing pipe. One part of my system writes messages with the following header:
```cpp
struct MsgHeader {
  uint8_t op_code;
  uint16_t payload_length;
};
```
Another part of the system consumes those messages and reacts to each op code and payload. This avoids coupling the two modules together: either one can be completely replaced without impacting the rest of the code base, so long as the replacement still responds to the same messages.
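The framing can be sketched as below. To keep the example self-contained, a `std::vector` stands in for the ring blob, and `Pipe`, `WriteMsg`, `ReadMsgs` and `kHeaderBytes` are hypothetical names. Header fields are written one at a time (3 bytes) rather than copying the struct, because `sizeof(MsgHeader)` usually includes a padding byte; the length is stored in host byte order, which is fine for an in-process pipe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct MsgHeader {
  uint8_t op_code;
  uint16_t payload_length;
};

// Hypothetical stand-in for the ring blob.
using Pipe = std::vector<unsigned char>;

// 1-byte op code + 2-byte length; struct padding never enters the stream.
constexpr size_t kHeaderBytes = sizeof(uint8_t) + sizeof(uint16_t);

// Producer side: header followed by the payload bytes.
void WriteMsg(Pipe& pipe, uint8_t op_code, const void* payload, uint16_t length) {
  assert(payload != nullptr || length == 0);
  unsigned char header[kHeaderBytes];
  header[0] = op_code;
  std::memcpy(header + 1, &length, sizeof(length));
  pipe.insert(pipe.end(), header, header + kHeaderBytes);
  const unsigned char* p = static_cast<const unsigned char*>(payload);
  pipe.insert(pipe.end(), p, p + length);
}

// Consumer side: hands each complete message to handle(op_code, payload, length).
// Stops at a partial message. Returns the number of messages handled.
template <typename Handler>
size_t ReadMsgs(const Pipe& pipe, Handler&& handle) {
  size_t offset = 0, count = 0;
  while (pipe.size() - offset >= kHeaderBytes) {
    MsgHeader header;
    header.op_code = pipe[offset];
    std::memcpy(&header.payload_length, &pipe[offset + 1], sizeof(header.payload_length));
    size_t body = offset + kHeaderBytes;
    if (pipe.size() - body < header.payload_length) break;  // wait for more data
    handle(header.op_code, pipe.data() + body, header.payload_length);
    offset = body + header.payload_length;
    ++count;
  }
  return count;
}
```

The consumer only needs to know the op codes it handles; the length field lets it step over every message, including ones it does not understand.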
The ring blob is great, but it is heavy, and I found I wanted something simpler. Also, the ring blob did not work well when dealing with network streams and native socket APIs. So the final blob I use is what I call the ‘Append & Chop’ buffer. This blob keeps track of an append pointer and offers the Append method described above. Its Chop method could also be thought of as a shift operation: when you’re done with the beginning of the buffer, you Chop it off and shift the remaining memory down. Because this blob’s data is always contiguous, it works better with network stream APIs and is generally simpler.
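A sketch of an Append & Chop blob over caller-provided memory, assuming (like the ring blob) that it does not own its memory and that `Append` is all-or-nothing. `Init`, `GetAppendPtr`, `GetFreeBytes` and `CommitAppend` are hypothetical helpers for APIs that want a plain pointer and length to fill.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Valid data always lives in [buffer, buffer + size): one contiguous region.
struct AppendChopBlob {
  unsigned char* buffer;
  size_t capacity;
  size_t size;  // the append position

  void Init(void* memory, size_t bytes) {  // hypothetical initializer
    buffer = static_cast<unsigned char*>(memory);
    capacity = bytes;
    size = 0;
  }

  template <typename T>
  int Append(const T* items, int item_count) {
    static_assert(std::is_trivially_copyable<T>::value, "blobs hold raw bytes");
    size_t bytes = sizeof(T) * static_cast<size_t>(item_count);
    if (bytes > capacity - size) return 0;  // does not fit
    if (bytes != 0) std::memcpy(buffer + size, items, bytes);
    size += bytes;
    return item_count;
  }

  // Drop the first `bytes` bytes and shift the rest down; the ranges overlap, hence memmove.
  void Chop(size_t bytes) {
    assert(bytes <= size);
    std::memmove(buffer, buffer + bytes, size - bytes);
    size -= bytes;
  }

  // Hypothetical helpers: expose the free tail, then record how much was written into it.
  unsigned char* GetAppendPtr() const { return buffer + size; }
  size_t GetFreeBytes() const { return capacity - size; }
  void CommitAppend(size_t bytes) { assert(bytes <= GetFreeBytes()); size += bytes; }
};
```

With POSIX sockets, for example, `recv(fd, blob.GetAppendPtr(), blob.GetFreeBytes(), 0)` can fill the tail directly, followed by `CommitAppend` with the returned byte count when it is positive. Chop costs a copy of the remaining bytes, which stays cheap as long as consumers chop often and the unconsumed tail is small.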
I want to say something about implementing blobs that also applies to programming in general. Separate your memory allocation policy from your algorithm code. A higher-level module should glue the two together in whatever way is best for that particular use. It always troubles me to see modules that allocate their own memory, which takes away the programmer’s freedom to pass in a pointer into a blob or to memory on the stack. This is another form of coupling, and coupling sucks.
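To make the principle concrete, here is a hypothetical example (a toy run-length encoder, not something from my engine) whose algorithm never allocates. It writes into whatever memory the caller supplies and always returns the number of bytes the full output needs, so the caller can size a buffer first. `RunLengthEncode` and `EncodeToHeap` are illustrative names; the allocation policy lives entirely in the glue function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical algorithm: encodes `in` as (count, byte) pairs into caller-owned memory.
// Writes only pairs that fit in out_capacity; returns the bytes the full encoding needs.
size_t RunLengthEncode(const unsigned char* in, size_t in_size,
                       unsigned char* out, size_t out_capacity) {
  assert(in != nullptr || in_size == 0);
  size_t needed = 0;
  for (size_t i = 0; i < in_size;) {
    unsigned char value = in[i];
    size_t run = 1;
    while (i + run < in_size && in[i + run] == value && run < 255) ++run;
    if (needed + 2 <= out_capacity) {
      out[needed] = static_cast<unsigned char>(run);
      out[needed + 1] = value;
    }
    needed += 2;
    i += run;
  }
  return needed;
}

// Hypothetical glue code that owns the policy: query the size, allocate once, encode.
// The caller frees the result with std::free.
unsigned char* EncodeToHeap(const unsigned char* in, size_t in_size, size_t* out_size) {
  *out_size = RunLengthEncode(in, in_size, nullptr, 0);
  unsigned char* out = static_cast<unsigned char*>(std::malloc(*out_size ? *out_size : 1));
  if (out != nullptr) RunLengthEncode(in, in_size, out, *out_size);
  return out;
}
```

The same `RunLengthEncode` call works unchanged with a stack array, a region inside a larger blob, or a heap buffer, because it never decides where its output lives.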
Another implementation note: although these blobs share subsets of each other’s APIs, I don’t use inheritance or virtual methods. This avoids all of that unnecessary overhead. But because the method-call syntax is shared across blob types, I can still swap out the type of blob being used with minimal impact on existing code.
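Because the blobs share method names rather than a base class, generic code can be written as templates and the choice of blob is resolved at compile time with no vtable. A sketch with two hypothetical, stripped-down blob types (`FixedBlob`, `VectorBlob`) and a hypothetical `Record` payload:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical non-owning blob: fails the append when full.
struct FixedBlob {
  unsigned char* buffer;
  size_t capacity;
  size_t size;
  template <typename T>
  int Append(const T* items, int item_count) {
    assert(item_count >= 0);
    size_t bytes = sizeof(T) * static_cast<size_t>(item_count);
    if (bytes > capacity - size) return 0;
    if (bytes != 0) std::memcpy(buffer + size, items, bytes);
    size += bytes;
    return item_count;
  }
};

// Hypothetical owning blob: grows as needed.
struct VectorBlob {
  std::vector<unsigned char> bytes;
  template <typename T>
  int Append(const T* items, int item_count) {
    assert(item_count >= 0);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(items);
    bytes.insert(bytes.end(), p, p + sizeof(T) * static_cast<size_t>(item_count));
    return item_count;
  }
};

struct Record { int id; float value; };  // hypothetical payload

// Compiles against any type with a matching Append; no inheritance involved.
template <typename Blob>
bool AppendRecord(Blob& blob, const Record& r) {
  return blob.Append(&r, 1) == 1;
}
```

Swapping `FixedBlob` for `VectorBlob` at a call site changes nothing else in the calling code; a type missing `Append` is rejected at compile time rather than at run time.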
I also have functions that read from one blob and write into another.
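As an illustration of such a function, here is a sketch that drains a ring blob into a linear blob in at most two contiguous chunks, one before the wrap point and one after it. The trimmed `RingBlob` and `LinearBlob` types and the `TransferBytes` name are hypothetical; the ring assumes ever-increasing read/write counters and a power-of-two capacity, and bytes are only consumed from the source once the destination has accepted them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Trimmed ring blob: only the read-side accessors the transfer needs.
struct RingBlob {
  unsigned char* buffer;
  size_t capacity;  // assumed power of two
  size_t read;      // ever-increasing byte counters; position = counter % capacity
  size_t write;
  size_t UsedBytes() const { return write - read; }
  size_t ContiguousReadBytes() const {
    size_t to_end = capacity - read % capacity;
    return UsedBytes() < to_end ? UsedBytes() : to_end;
  }
  const unsigned char* GetReadPtr() const { return buffer + read % capacity; }
  void Skip(size_t bytes) { assert(bytes <= UsedBytes()); read += bytes; }
};

// Trimmed linear blob with an all-or-nothing Append.
struct LinearBlob {
  unsigned char* buffer;
  size_t capacity;
  size_t size;
  template <typename T>
  int Append(const T* items, int item_count) {
    size_t bytes = sizeof(T) * static_cast<size_t>(item_count);
    if (bytes > capacity - size) return 0;
    std::memcpy(buffer + size, items, bytes);
    size += bytes;
    return item_count;
  }
};

// Moves up to max_bytes from src to dst. Stops early if dst cannot take the next chunk.
// Returns the number of bytes moved.
template <typename Src, typename Dst>
size_t TransferBytes(Src& src, Dst& dst, size_t max_bytes) {
  size_t moved = 0;
  while (moved < max_bytes && src.UsedBytes() > 0) {
    size_t chunk = src.ContiguousReadBytes();
    if (chunk > max_bytes - moved) chunk = max_bytes - moved;
    if (dst.Append(src.GetReadPtr(), static_cast<int>(chunk)) == 0) break;
    src.Skip(chunk);
    moved += chunk;
  }
  return moved;
}
```

No intermediate copy is needed: each chunk goes straight from the ring’s memory into the destination, which is exactly the kind of access the ring blob’s contiguous-bytes query makes possible.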
I love my blobs. They’ve become a core part of my technology, and not being tied to C/C++’s type system and structure layout rules is very liberating. By including small descriptive headers (see MsgHeader), you can describe the data inside a blob for the module consuming it. This gives us a very simple type system. What do your blob abstractions look like? Have you experimented with a more sophisticated blob type system than an op (or type) code followed by a length field?