I have just finished rewriting our “worker thread” execution subsystem, fixing some obscure race conditions along the way, and want to share the solution in a “step by step” approach.
Primary Aim: (education)
To provide practical knowledge to those who have recently entered the “system parallelization” field. I will describe the multithreading-relevant process that went into the design and development of Novaleaf’s new WorkerThread subsystem.
First step, what’s the goal? (use case scenario)
It’s very important for any development (not just multithreading) to fully understand your “functional objectives” before you start to architect, and then to architect your system (high-level design) before you code.
Here is my scenario:
Synchronization between a “main engine thread” and a reusable “worker thread” subsystem. The main and worker threads should execute loops at a 1:1 frequency, with the start/end of the worker thread’s loop tied to a specific part of the engine’s loop code.
High level design:
- “engine wait for worker”: need main thread sync-point to wait for worker to finish
- “worker wait for engine”: need worker to wait until main thread resumes worker (starts next loop)
- “others wait for worker”: other systems also need to be able to wait for the worker to finish. An example is a shared resource that multiple threads access.
- Worker owns its own thread: the worker runs in an infinite loop, pausing when waiting for the engine (Design point #2).
- Keep it stupid simple! There are many ways I could sexify (complicate) this setup, but there are three reasons why I do not:
  - Educational purposes
  - Complex thread synchronization has bugs (race conditions)
  - “I use this in real life” (it works)
Implementation Phase 1: (identify existing solutions)
Design point #2,
“worker wait for engine”: to fulfill this, only 1 thread (the worker) needs to wait for the main engine thread. In this case, the AutoResetEvent works nicely because it allows a single waiter (the worker) through when the main thread signals that it is ready.
//**worker loop, (full workflow)**
//wait for engine to allow this worker to start its next loop
//do fancy asynchronous work here!
//worker informs engine that it is finished
Above is the worker thread’s loop. It waits for the engine to signal it to proceed, does its work, then informs the engine that it’s finished.
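The worker loop described above could be sketched roughly as follows. This is a minimal sketch, not the original Novaleaf code: the class `WorkerLoopSketch`, the field `waitForEngine_AutoResetEvent`, the `keepRunning` flag, and the `loopsCompleted` counter are hypothetical names chosen for illustration; only `waitForWorker_ManualResetEvent` is a name this article actually uses. `RunOnce` plays the engine's part for a single loop so the sketch can be run on its own.

```csharp
using System;
using System.Threading;

sealed class WorkerLoopSketch
{
    // Hypothetical name: the engine signals this to release the worker for one loop.
    readonly AutoResetEvent waitForEngine_AutoResetEvent = new AutoResetEvent(false);
    // Name from the article: the worker signals this when its loop is finished.
    readonly ManualResetEvent waitForWorker_ManualResetEvent = new ManualResetEvent(false);

    volatile bool keepRunning = true; // hypothetical shutdown flag
    int loopsCompleted;               // hypothetical stand-in for real work

    void WorkerLoop()
    {
        while (true)
        {
            // wait for engine to allow this worker to start its next loop
            waitForEngine_AutoResetEvent.WaitOne();
            if (!keepRunning) return;

            // do fancy asynchronous work here!
            loopsCompleted++;

            // worker informs engine that it is finished
            waitForWorker_ManualResetEvent.Set();
        }
    }

    // Drives exactly one worker loop and returns how many loops ran.
    public static int RunOnce()
    {
        var sketch = new WorkerLoopSketch();
        var thread = new Thread(sketch.WorkerLoop) { IsBackground = true };
        thread.Start();

        sketch.waitForEngine_AutoResetEvent.Set();       // engine: start one loop
        sketch.waitForWorker_ManualResetEvent.WaitOne(); // engine: wait for it to finish

        sketch.keepRunning = false;                      // ask the worker to exit...
        sketch.waitForEngine_AutoResetEvent.Set();       // ...and wake it so it sees the flag
        thread.Join();
        return sketch.loopsCompleted;
    }

    static void Main()
    {
        Console.WriteLine(RunOnce()); // prints 1
    }
}
```

The shutdown flag is an addition for the sake of a runnable example; the loop described in this article simply runs forever.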
Now for what the engine’s code (partially) looks like:
//allow 1 waiting thread through,
//or if none are waiting, exactly 1 future caller through
//then automatically (instantly) resets to blocking.
We cannot use ManualResetEvent for this signal, because we would have to call .Set() and .Reset() as two separate steps, and the worker can miss the signal in between:
//**engine thread with race condition (bad)**
//allow waiting code through
//set to blocking
This doesn’t work in the situation where the worker has called waitForWorker_ManualResetEvent.Set(); but has not yet looped back and made its .WaitOne() call. The worker’s execution may be suspended between those two lines, and thus the engine may itself .Set() and .Reset() before the worker even gets to the next line. The worker then blocks on a signal that has already come and gone. It sounds unlikely, and it is, but it is not safe (it may fail 1 out of 1 million times).
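The lost signal can be reproduced deterministically by replaying the unlucky interleaving on a single thread. The sketch below is only an illustration: the class `LostSignalSketch`, its method names, and the local `startWorker` are hypothetical. It issues the engine's calls first and then performs the "late" worker wait with a zero timeout, so the return value shows whether the signal survived.

```csharp
using System;
using System.Threading;

static class LostSignalSketch
{
    // ManualResetEvent: the engine's Set() + Reset() pair completes before
    // the (suspended) worker reaches its wait, so the signal is lost.
    public static bool ManualResetKeepsSignal()
    {
        using (var startWorker = new ManualResetEvent(false))
        {
            startWorker.Set();             // engine: allow waiting code through
            startWorker.Reset();           // engine: set to blocking
            return startWorker.WaitOne(0); // worker arrives late: nothing to see
        }
    }

    // AutoResetEvent: with nobody waiting, Set() leaves the event signaled
    // until exactly one later caller consumes it.
    public static bool AutoResetKeepsSignal()
    {
        using (var startWorker = new AutoResetEvent(false))
        {
            startWorker.Set();             // engine: signal once
            return startWorker.WaitOne(0); // worker arrives late: still let through
        }
    }

    static void Main()
    {
        Console.WriteLine(ManualResetKeepsSignal()); // prints False
        Console.WriteLine(AutoResetKeepsSignal());   // prints True
    }
}
```

In the real two-thread setup the worker would call `.WaitOne()` without a timeout, so the `False` case corresponds to a worker that never wakes up again.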
Design point #1,
“engine wait for worker”: we could get by with the Design point #2 solution (AutoResetEvent) described above; however, we need to consider Design point #3.
Design point #3,
“others wait for worker”: because of this, the AutoResetEvent solution doesn’t work.
//fails because only the first waiter will be allowed through
//multiple systems may wait on the worker thread completing.
However, we also cannot use the ManualResetEvent, because of the problem already outlined above in Design point #2.
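To make the “only the first waiter” behaviour concrete, here is a small sketch with hypothetical names (`MultipleWaitersSketch`, `CountReleased`, `workerFinished`). Several threads stand in for the “other systems” and wait, with a timeout, on an event that is signaled exactly once. The ManualResetEvent run is included only as a contrast for the waiter count; it does not address the Set/Reset race from Design point #2.

```csharp
using System;
using System.Threading;

static class MultipleWaitersSketch
{
    // Starts waiterCount threads that wait on the event, signals it once,
    // and returns how many of the waiters were let through.
    public static int CountReleased(EventWaitHandle workerFinished, int waiterCount)
    {
        int released = 0;
        var waiters = new Thread[waiterCount];
        for (int i = 0; i < waiterCount; i++)
        {
            waiters[i] = new Thread(() =>
            {
                // Each "system" waits up to 500 ms for the worker to finish.
                if (workerFinished.WaitOne(500))
                    Interlocked.Increment(ref released);
            });
            waiters[i].Start();
        }

        workerFinished.Set(); // the worker signals "finished" exactly once
        foreach (var waiter in waiters) waiter.Join();
        return released;
    }

    static void Main()
    {
        using (var auto = new AutoResetEvent(false))
        using (var manual = new ManualResetEvent(false))
        {
            Console.WriteLine(CountReleased(auto, 2));   // prints 1
            Console.WriteLine(CountReleased(manual, 2)); // prints 2
        }
    }
}
```

With the AutoResetEvent, the second waiter simply times out; without a timeout it would block until some later, unrelated signal.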
If you look at the other classes in the .NET BCL System.Threading namespace, none of them seems to be a good fit for this problem. So it looks like we need to write our own synchronization primitive!
Implementation Phase 2: (rethinking the problem)
To recap, we need a synchronization primitive similar to AutoResetEvent, but one that only resets (blocks future callers) after the main engine thread finishes waiting. All threads, including the engine, should unblock once the worker thread’s loop completes.
After thinking through the problem, it turns out we actually DO NOT need anything more complex than ManualResetEvent for the engine-wait-on-worker. Here’s why:
- if our worker calls waitForWorker_ManualResetEvent.Set() then all waiting threads will clear, and all future waits will clear
- if the engine calls waitForWorker_ManualResetEvent.Reset() after waiting for the worker to finish but before launching the worker’s next loop, then the engine’s target workflow is preserved.
- Here’s the kicker: if other threads wait on the worker, they will suffer race conditions (example: worker’s next loop starts before the other thread’s call to .WaitOne() even finishes). Thus this workflow is unsupported. Other threads must synchronize with main, not with workers.
Implementation Phase 3: (Final take away)
Please pay special attention to my “Here’s the kicker” list-item above. This shows that while my initial thoughts were to allow any “other thread” to synchronize with the worker thread, this requirement would have resulted in the construction of a custom (and complex) synchronization primitive. Also, on closer inspection this seems to be a significantly troubling anti-pattern which should consciously be avoided.
//**main thread, (full workflow)**
void Engine_UpdateTheWorker(float time)
//wait for worker's previous loop to finish
//reset our worker's waitEvent back to blocking
//do data synchronization
//signal our worker to start its next loop
Above is the full implementation of the engine’s synchronization code. This and the worker’s loop code further above are copy-paste-programmer friendly for those looking to quickly utilize the architecture I have provided.
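Putting the engine workflow and the worker loop together, a runnable sketch might look like the following. The names `Engine_UpdateTheWorker` and `waitForWorker_ManualResetEvent` come from this article; everything else (the class `EngineWorkerSketch`, `waitForEngine_AutoResetEvent`, `shutdown`, `sharedTime`, `workerLoops`, `Run`) is hypothetical. The sketch also assumes that the ManualResetEvent starts out signaled so that the engine's very first wait does not block; the original code may handle the first loop differently.

```csharp
using System;
using System.Threading;

sealed class EngineWorkerSketch
{
    // Hypothetical name: engine -> worker "start your next loop" signal.
    readonly AutoResetEvent waitForEngine_AutoResetEvent = new AutoResetEvent(false);
    // Name from the article: worker -> engine "loop finished" signal.
    // Assumption: starts signaled so the first engine wait passes straight through.
    readonly ManualResetEvent waitForWorker_ManualResetEvent = new ManualResetEvent(true);

    volatile bool shutdown; // hypothetical shutdown flag
    float sharedTime;       // hypothetical data handed from engine to worker
    int workerLoops;        // hypothetical stand-in for real work

    void WorkerLoop()
    {
        while (true)
        {
            waitForEngine_AutoResetEvent.WaitOne(); // wait for the engine
            if (shutdown) return;
            workerLoops++;                          // "work" (would read sharedTime)
            waitForWorker_ManualResetEvent.Set();   // tell the engine we are done
        }
    }

    void Engine_UpdateTheWorker(float time)
    {
        // wait for worker's previous loop to finish
        waitForWorker_ManualResetEvent.WaitOne();
        // reset our worker's waitEvent back to blocking
        waitForWorker_ManualResetEvent.Reset();
        // do data synchronization (safe: the worker is idle at this point)
        sharedTime = time;
        // signal our worker to start its next loop
        waitForEngine_AutoResetEvent.Set();
    }

    // Runs the engine for the given number of frames; returns the worker's loop count.
    public static int Run(int frames)
    {
        var sketch = new EngineWorkerSketch();
        var worker = new Thread(sketch.WorkerLoop) { IsBackground = true };
        worker.Start();

        for (int frame = 0; frame < frames; frame++)
            sketch.Engine_UpdateTheWorker(frame / 60f);

        sketch.waitForWorker_ManualResetEvent.WaitOne(); // let the last loop finish
        sketch.shutdown = true;
        sketch.waitForEngine_AutoResetEvent.Set();       // wake the worker so it can exit
        worker.Join();
        return sketch.workerLoops;
    }

    static void Main()
    {
        Console.WriteLine(Run(1000)); // prints 1000: one worker loop per engine frame
    }
}
```

Because the engine never signals the next loop until the previous one has reported completion, the worker can never receive two start signals for one frame, which is what keeps the loops at a 1:1 frequency.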
Conclusion: really really try not to re-invent the wheel.
There certainly will be times where the base synchronization primitives will not be enough to get the job done, but with the .NET 4.0 BCL System.Threading additions, those will (should) be few and far between. I hope that this step-by-step on the process I went through for re-designing Novaleaf’s WorkerThread subsystem helps illustrate this point.