This is the second part of my API design post, and part three in my series of posts about building a profiling library. at github (released to the public domain). The code for the final implementation will be pushed to this repository tomorrow.
This post is very code-oriented, but I promise to have more pretty pictures next time! I would have liked to go into more details about the implementation, but my time for coding/writing has been severely limited by my one-month old daughter.
To recap, these are the API design guidelines I think are important to keep in mind during the implementation:
- Orthogonal. A function should not have any side effects, and there should be only one way to perform an operation in the system.
- Contained. Following from the rant in my previous post, we should avoid third party dependencies and prefer to use primitive or well-defined data types.
- Specialized. A function in an API should perform a single task. Don’t create functions that do completely different unrelated tasks depending on the contents of the variables passed to the function.
typedef struct _profile_block_data profile_block_data; typedef struct _profile_block profile_block; struct _profile_block_data { uint32_t id; uint32_t parentid; uint32_t processor; uint32_t thread; uint64_t start; uint64_t end; char name[20]; }; //sizeof( profile_block_data ) == 52 struct _profile_block { profile_block_data data; uint32_t previous; uint32_t sibling; uint32_t child; }; //sizeof( profile_block ) == 64
void profile_initialize( char* identifier, void* buffer, profilesize_t size ) { profile_block* block = buffer; profilesize_t num_blocks = size / sizeof( profile_block ); profilesize_t i; for( i = 0; i < ( num_blocks - 1 ); ++i, ++block ) block->child = ( i + 1 ); block->child = 0; //[...] }
static void _profile_put_root_block( profile_block* block ) { uint32_t self = (uint32_t)((uintptr_t)( block - _profile_root )); do { block->sibling = _profile_root.child; } while( !_atomic_cas_32( &_profile_root.child, self, block->sibling ) ); }
Each block can also be a subblock inside another block. To push such a block we implement a thread-local variable storing the current active block, and can perform a swap of pointers without worrying about thread safety:
static void _profile_put_simple_block( profile_block* block ) { //Add to current block, or if no current add to array profile_block* masterblock = _thread_data(); if( masterblock ) { uint32_t self = (uint32_t)((uintptr_t)( block - _profile_root )); block->previous = masterblock; block->sibling = masterblock->child; if( masterblock->child ) masterblock->child->previous = self; masterblock->child = self; } else _profile_put_root_block( block ); }
Once we want to flush the queued blocks we simply iterate the list of queued blocks in the root block, and process each tree of blocks. Again, we can get away without a lock by doing a atomic compare-and-swap on the single child pointer of the root block:
do { block = _profile_root.child; } while( !_atomic_cas_ptr( &_profile_root.child, 0, block ) );
static profile_block* _process_profile_block( profile_block* block ) { profile_block* leaf = block; //[...] process block and write to output stream if( block->child ) leaf = _process_profile_block( _profile_blocks[ block->child ] ); if( block->sibling && !block->child ) { block->child = block->sibling; block->sibling = 0; leaf = _process_profile_block( _profile_blocks + block->child ); } if( block->sibling ) { profile_block* subleaf = _process_profile_block( _profile_blocks + block->sibling ); subleaf->child = block->child; block->child = block->sibling; } return leaf; }
In the next post I’ll present a simple UI for analyzing the data.