Over the past ten years or so, the games industry as a whole has seen a fairly major paradigm shift with regard to parallel processing, from the PS1 era, where you were lucky to have one CPU (if you were unlucky you had two, but that was because you were working on the Saturn), through to today, where the concept of using only one core on a modern console is simply ridiculous. And, as the paradigm of choice for parallel programming in most cases, pretty much everyone is familiar with threads.

Since talking about something that everyone is familiar with is (a) pointless and (b) liable to expose you to a wide audience of people who know more about the topic than you do, I’m going to talk about fibers instead.

Fibers are very lightweight threads with no pre-emptive switching (the only way to change fiber is by asking to). As a basic concept, they have been around for quite some time, under a variety of different names – co-routines, co-operative threads, and even co-operative processes on many older OSes are basically fibers or variants thereof. They are threads reduced to the barest minimum – the key being this concept:

The current state of execution is encapsulated in the stack contents and register file. Save/restore those, and you can swap in and out from any point in your program.

So, basically, all you need to implement fibers is a way to save that execution state, a way to restore it, and some sort of loop that does those two things to a list of active tasks. The processor will run one block of code in normal linear fashion until it hits a call which saves the current state and restores another (known as a context switch, or more informally simply “yielding”), and then something else will run until it does the same. Lather, rinse, repeat. So, why is this useful, especially in a world where we have threads?

Basically, what’s nice about fibers boils down to this – you are in control. The execution flow will never jump somewhere else, or allow another piece of code to barge in, unless you explicitly ask it to – unlike threads, where you may be pre-empted at pretty much any moment. Fibers mean that you can do parallel programming on your own terms. This is really great for dealing with problems that threads are good at solving, without incurring the overheads (in terms of code complexity and bug-count) of actually using threads themselves. For example, take the case where you want to load a file inside your game. If you do this:

  U8* mem = (U8*)malloc(size);
  ReadFile(my_file, mem, size);
  ...Do stuff with mem...

In a single-threaded, fiber-less program the rest of your game will be sitting idle whilst the disc I/O happens. This is bad, but equally, pushing the loading onto a thread means making every init routine thread-safe, which is quite a daunting task unless you’re very careful. Another option is to use callbacks, or big state machines, or any one of a hundred C++ coding tricks to allow you to drop out of your function and return to the main loop so that you can check a bit later to see if the read has finished and continue… basically, most of those are trying to achieve the same thing – suspend the state of execution in the read call, do something else, and come back when the data is ready.

And that’s exactly what fibers let you do – you simply make the ReadFile() call internally yield until the disc operation completes (obviously this presumes you have asynchronous I/O operations in the first place, but if you don’t then no amount of fiber cleverness is going to help). And that’s it – other bits of your game can get on with doing their thing (even if “their thing” is just rendering a pretty loading screen), and there’s no need to mess up the structure of the code with lots of cruft just for one read call.
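To make that concrete, here is a minimal sketch of what such a ReadFile() might look like internally. The asynchronous calls here (BeginAsyncRead() and IsAsyncReadComplete()) are hypothetical placeholders rather than any real API, and yield() is the fiber switch described later in this article:

  // Sketch only - BeginAsyncRead()/IsAsyncReadComplete() are made-up
  // stand-ins for whatever asynchronous I/O your platform actually provides.
  void ReadFile(File* my_file, void* mem, u32 size)
  {
      AsyncHandle handle = BeginAsyncRead(my_file, mem, size);
      while (!IsAsyncReadComplete(handle))
      {
          yield();    // let other fibers run until the disc catches up
      }
  }

From the caller’s point of view this still looks like a plain blocking read – all the “do something else in the meantime” behaviour is hidden inside the yield loop.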

Another case where fibers are extremely useful is in writing behavioural code such as enemy AI. Take, for example, the following:

  void runEnemy()
  {
      while (1)
      {
          m_facing = right;
          for (int steps = 0; steps < 100; steps++)
          {
              m_position.x += 1.0f;
              yieldForOneFrame();
          }
          fireBullet();

          m_facing = left;
          for (int steps = 0; steps < 100; steps++)
          {
              m_position.x -= 1.0f;
              yieldForOneFrame();
          }
          fireBullet();
      }
  }

…by yielding once per frame, we have created an enemy behaviour routine that is entirely self-contained – since we never leave the function, there’s no need to keep track of how far we have moved as a member variable of the class, because local variables “just work”. Similarly, the direction we are currently moving in is implicitly stored by the current point we are at in the code. Compared to the traditional approach to this problem – storing all transient state in the object itself and having an update() or similar function called every frame – the result is both much neater and more readable.

This example also demonstrates one of my favourite features of fibers – you can write small, self-contained blocks of code which perform a specific task and nothing else. A lot of tasks become so much easier when you can simply have an infinite loop that checks for some condition, deals with it, and loops, happily oblivious of everything else going on around it.
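As a hypothetical illustration of that shape of code (checkpointReached() and saveGame() are made-up names, not anything from a real engine):

  // One fiber, one job: watch for a condition, deal with it, repeat.
  void autosaveFiber()
  {
      while (1)
      {
          if (checkpointReached())
          {
              saveGame();
          }
          yieldForOneFrame();    // give everything else its turn
      }
  }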

So, how hard is it to actually implement fibers? Actually, it’s really very, very simple. About this simple, on ARM for example:

  struct FiberContext
  {
      u32 m_register[13];
  };

  void asm SwitchFiber(FiberContext* old_context, FiberContext* new_context)
  {
      STMIA R0!, {R2-R14}
      LDMIA R1!, {R2-R14}
      BX    R14
  }

(It should be noted that this code was written off the top of my head, and hence, er, may well not actually work as such… but it should be close. You can also generally avoid saving quite a few of the registers if you read the ABI for your chosen platform carefully – anything defined as caller-saved is fair game, since whoever called SwitchFiber() cannot expect it to survive the call and will have saved it themselves if they needed it. Speaking of which, you can also make FiberContext a single value if you so desire, by simply pushing the registers to be saved onto the stack, and then only saving the stack pointer.)

…and that’s all you need. SwitchFiber() will save the current context into old_context, and then restore a new one from new_context. You’ll note that we don’t actually touch the contents of the stack, because simply saving and restoring the stack pointer (R13 in this example) is enough – we can prepare a new area of stack space for each new fiber. To go back to old_context, just call SwitchFiber() again.
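For what it’s worth, preparing that new stack area is just a matter of filling in the context before the first switch. Here is a sketch, assuming the m_register[] array above holds R2–R14 in order (so R13 sits at index 11 and R14 at index 12) – the InitFiber() name and parameters are mine, not part of the listing above:

  void InitFiber(FiberContext* ctx, void (*entry)(void), u8* stack, u32 stack_size)
  {
      ctx->m_register[11] = (u32)(stack + stack_size);    // R13: stack grows downwards
      ctx->m_register[12] = (u32)entry;                   // R14: target of the BX R14
  }

The first SwitchFiber() into this context will branch straight into entry(), running on the freshly allocated stack – which also means entry() must never actually return.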

So, with this, you just need to make a fiber list somewhere, write a loop that calls SwitchFiber() on each one in turn, and a yield() function which does a SwitchFiber() back to the “main” fiber (the one you started in, which has the fiber loop in it), and you’re done. For bonus points, get rid of the loop and do the selection of the next fiber inside the yield() function, saving you the overhead of an extra context switch back to the main fiber each time round.
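As a rough sketch of that management code (the Fiber struct, MAX_FIBERS and the g_ names are assumptions of mine, built on top of the SwitchFiber() above):

  #define MAX_FIBERS 64

  struct Fiber
  {
      FiberContext m_context;
      bool         m_active;
  };

  static FiberContext g_mainContext;           // saved state of the "main" fiber
  static Fiber        g_fibers[MAX_FIBERS];    // the fiber list
  static int          g_currentFiber = -1;

  void runFibers()
  {
      // Give each active fiber one slice; it runs until it calls yield().
      for (int i = 0; i < MAX_FIBERS; i++)
      {
          if (g_fibers[i].m_active)
          {
              g_currentFiber = i;
              SwitchFiber(&g_mainContext, &g_fibers[i].m_context);
          }
      }
  }

  void yield()
  {
      // Hand control back to the main fiber; runFibers() will resume us later.
      SwitchFiber(&g_fibers[g_currentFiber].m_context, &g_mainContext);
  }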

As you can see from the example, too, fibers are really, really lightweight – the context switch is not considerably different to making a virtual function call or two on most architectures. This means that as long as your management code isn’t horrific, you don’t have to worry too deeply about how many fiber switches you are doing (very good for one-fiber-per-object game logic models). And, because fiber switches only happen when you ask them to, you don’t need to worry too much about nasty parallel-execution corner cases – which cuts down the amount of time you spend fiddling with mutexes, semaphores and the like (not to say that there aren’t cases where synchronisation primitives are necessary, of course, but only having to worry about other fibers where you perform a yield() call and not “what if something preempts us between these two random instructions?” makes life a lot simpler).

 

So, fibers are great. Fibers are wonderful. What are the downsides?

 

  • Probably the biggest is stack space – whilst fiber context switches are lightweight, the space needed for their stacks can very quickly balloon, as you need to allocate enough for the worst case usage of every individual fiber. There are ways to mitigate this (for example, copying the stack in and out of a larger “working buffer” on each context switch, or using MMU page trickery to allocate stack space on demand), but they add overhead and complexity.
  • Another is that debugging can become quite tiresome – virtually no existing debuggers really understand fibers, and hence the tools you would normally expect (like being able to see the currently executing fiber list, or inspect the saved stack contents) generally don’t exist. Now, one advantage of fibers being an entirely userland construct is that you can write such tools yourself… but it’s still a pain.
  • On some architectures, fibers can run into issues with restrictions on manipulating the stack pointer (in particular, stack overflow checks may well fire if you do) – depending on the system, this may be something that can be worked around by turning those checks off, or by using an official fibers library (Windows provides one of these in the platform SDK, for example – there’s a small usage sketch after this list).
  • On multi-core systems, depending on your usage patterns, you may wish to restrict fibers to a single core – if you don’t, bear in mind that whilst fibers on the same core will never pre-empt each other, fibers on different cores obviously will, and hence need to be treated in the same way as full-fat threads for synchronisation purposes. Also, even on one core some tasks inherently require real threads (for example, if you need to call a library function which blocks internally).
  • Not so much a downside as a gotcha – make sure to save FPU register state and any applicable CPU flags that are hanging around when you switch contexts, otherwise you’ll have some really amusing bugs to track down…
  • Finally, depending on how you use them, inter-fiber ordering can be an issue. This is much the same type of problem that arises with threads, but if you intend to use fibers extensively then be prepared to put some development time into writing fiber-centric implementations of things like alarm timers, signals, and priorities, otherwise it is very easy to end up with bugs caused by relative ordering or timing changing unexpectedly.
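
For reference, here is roughly what the Win32 fiber API mentioned above looks like in use – a minimal sketch with error handling omitted:

  #include <windows.h>

  static LPVOID g_mainFiber;

  VOID CALLBACK workerFiber(LPVOID param)
  {
      (void)param;
      for (;;)
      {
          // ...do one slice of work...
          SwitchToFiber(g_mainFiber);    // yield back to the main fiber
      }
  }

  int main()
  {
      g_mainFiber = ConvertThreadToFiber(NULL);                    // main thread becomes a fiber
      LPVOID worker = CreateFiber(64 * 1024, workerFiber, NULL);   // 64KB of stack
      SwitchToFiber(worker);                                       // run the worker until it yields
      DeleteFiber(worker);
      return 0;
  }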

That all said, though, I’ve personally found fibers to be an incredibly useful tool for simplifying a lot of common tasks that come up in engine and gameplay code, as well as providing a means for better separating discrete systems from each other.