Some compilers have an interesting property: they support thread local storage (TLS) variables through a keyword such as __thread. Operating systems expose TLS through an API (for example pthread_getspecific and pthread_setspecific on POSIX systems), but using compiler-provided TLS variables is very comfy. As you may know, Mojito is available on several platforms, some of which do not provide this kind of variable. Even where they are provided, these variables are limited to POD (Plain Old Data) types. So we looked for a solution that comes close to what the compiler can offer.

The solution presented here provides a means to create such cosy variables, with a bonus: support for non-POD data.
To emulate the __thread specifier, the requirements are:
- The TLS variable should behave like a normal variable.
- Memory should not be allocated unless necessary.
- Objects must be destroyed properly.
- The declaration should not be cryptic.
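For comparison, here is the behavior we want to emulate, sketched with the C++11 thread_local keyword. It was standardized after the compilers this post targets, so treat it purely as an illustration; g_NativeCounter, WriteAndRead and WriteAndReadOnOtherThread are hypothetical names. Every thread sees its own copy of the variable.

```cpp
#include <cassert>
#include <thread>

// Each thread gets its own copy of this variable (C++11 thread_local).
thread_local int g_NativeCounter = 0;

// Writes a value into the calling thread's copy and reads it back.
int WriteAndRead(int value)
{
    g_NativeCounter = value;
    return g_NativeCounter;
}

// Runs WriteAndRead on a separate thread and returns what that thread saw.
int WriteAndReadOnOtherThread(int value)
{
    int seen = -1;
    std::thread worker([&seen, value] { seen = WriteAndRead(value); });
    worker.join();
    return seen;
}
```

Writing 5678 on another thread leaves the main thread's copy untouched, which is exactly the property the ThreadLocalStorage template has to reproduce on platforms without such a keyword.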

The solution: declare the variable as ThreadLocalStorage<MyClass> g_SomeThreadLocalStorageVariable;

The template class should have the following interface to conform to the specification:

template <typename T>
class ThreadLocalStorage
{
public:
    ThreadLocalStorage( const T & ); // Used to initialize the variable with a default value
    operator const T&() const;
    ThreadLocalStorage & operator=( const T & );
};

The major drawback of this technique: it only works in C++, since it relies on class operator overloading.
With this class, both operators can be implemented as getting and setting the value contained in the OS TLS. A TLS slot generally holds 4 to 8 bytes (the size of a pointer), meant to store pointers or basic types. This post covers POD types; complex types will be covered in the next post.

The code needed to fetch and store TLS is the same for each type of variable, so it makes sense to move it into a base class for this template.

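class ThreadLocalStorageBase
{
protected:
    ThreadLocalStorageBase()
    {
        m_index = m_instance_count;
        ++m_instance_count; // Not thread safe
    }

    // Returns the address of this variable's slot in the calling thread's table
    void * GetThreadLocalStorage() const
    {
        void ** local_storage_table = reinterpret_cast<void **>( TLSGet( m_tls_identifier ) );
        return &local_storage_table[ m_index ];
    }

    int m_index;                      // Slot of this variable in the per-thread table
    static int m_instance_count;      // Number of ThreadLocalStorage variables created so far
    static TLSIndex m_tls_identifier; // The single OS TLS index holding the table
};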

The sample code uses a single OS TLS slot to store all of our home-made TLS variables. The reason is simple: the number of TLS slots an OS provides varies greatly and can be as low as 2. In Mojito we decided that maintaining several versions of this code is not useful until we know the current implementation causes a performance or memory problem, so we kept this code for all platforms. To adapt it to your platform, you only have to provide 4 functions: TLSCreate, TLSGet, TLSSet and TLSDestroy.
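As an illustration, here is a minimal sketch of those four functions on POSIX. It assumes they are free functions with the signatures implied by the calls in this post, with TLSIndex being the platform's key type; it is not the sample code's implementation. On Win32 the equivalents would be TlsAlloc, TlsGetValue, TlsSetValue and TlsFree.

```cpp
#include <cassert>
#include <pthread.h>

// Assumed: the OS TLS index type is the POSIX key type.
typedef pthread_key_t TLSIndex;

TLSIndex TLSCreate()
{
    TLSIndex key;
    // No destructor registered: the table is freed explicitly on thread exit.
    int result = pthread_key_create(&key, nullptr);
    assert(result == 0);
    (void)result;
    return key;
}

void * TLSGet(TLSIndex index)
{
    // Returns nullptr if the calling thread has not stored anything yet.
    return pthread_getspecific(index);
}

void TLSSet(TLSIndex index, void * value)
{
    int result = pthread_setspecific(index, value);
    assert(result == 0);
    (void)result;
}

void TLSDestroy(TLSIndex index)
{
    pthread_key_delete(index);
}
```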

You may have noticed the m_tls_identifier variable. This is where we store the OS TLS index. To initialize it, you must call ThreadLocalStorageBase::Initialize before your first thread starts. ThreadLocalStorageBase::Finalize must be called after all threads have returned.

class ThreadLocalStorageBase
{
public:
    static void Initialize()
    {
        // Must be called before any thread is started (not thread safe at all)
        m_tls_identifier = TLSCreate();
    }

    static void Finalize()
    {
        // Must be called after all threads have been stopped (not thread safe at all)
        TLSDestroy( m_tls_identifier );
    }
};

The next question is how the table stored in the OS TLS gets initialized. The first way, not shown here, is to check the value returned by the OS TLS: if it equals zero, the table has not been initialized yet. This is lazy initialization, but it comes at the price of a test every time you access the variable. I decided to pre-initialize it instead. Our threads are encapsulated in a class, so it's easy to enforce a call to a function before handing control to the user's thread function. The same reasoning applies to freeing the memory: not all systems allow registering a destructor the way pthread_key_create does, and again, using a class makes it easy to call a finalization function.
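The lazy approach we did not take could look roughly like this on POSIX. This is a sketch under assumptions: kInstanceCount, g_TableKey, FreeTable, InitializeLazyTls and GetTableLazily are hypothetical names, and the table size is fixed for simplicity. It also shows the per-thread destructor registration that pthread_key_create offers and that not every platform has.

```cpp
#include <cassert>
#include <cstdlib>
#include <pthread.h>

// Hypothetical fixed number of emulated TLS variables.
const int kInstanceCount = 4;

pthread_key_t g_TableKey;

// Per-thread destructor: POSIX calls it with the thread's non-null value
// when a thread exits (not when the main thread returns from main).
void FreeTable(void * table)
{
    std::free(table);
}

// Call once before any thread starts.
void InitializeLazyTls()
{
    int result = pthread_key_create(&g_TableKey, FreeTable);
    assert(result == 0);
    (void)result;
}

// Lazy initialization: the cost is a null test on every access.
void ** GetTableLazily()
{
    void ** table = static_cast<void **>(pthread_getspecific(g_TableKey));
    if (table == nullptr)
    {
        // calloc zeroes the table, so every slot starts as zero.
        table = static_cast<void **>(std::calloc(kInstanceCount, sizeof(void *)));
        pthread_setspecific(g_TableKey, table);
    }
    return table;
}
```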

class ThreadLocalStorageBase
{
public:
    static void StartupOnThreadEnter()
    {
        // One pointer-sized slot per ThreadLocalStorage variable, zero-initialized
        void * local_storage_table = malloc( m_instance_count * sizeof( void* ) );
        memset( local_storage_table, 0, m_instance_count * sizeof( void* ) );

        TLSSet( m_tls_identifier, local_storage_table );
    }

    static void CleanupOnThreadExit()
    {
        free( TLSGet( m_tls_identifier ) );
    }
};
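Because our threads are wrapped in a class, the startup and cleanup calls can be enforced around the user's function. Here is a minimal sketch of such a wrapper, assuming std::thread; the class name ManagedThread is hypothetical, and OnThreadEnter/OnThreadExit are stand-ins that only count calls, in place of StartupOnThreadEnter and CleanupOnThreadExit, so the block compiles on its own.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

// Stand-ins for ThreadLocalStorageBase::StartupOnThreadEnter and
// CleanupOnThreadExit; they only count calls.
std::atomic<int> g_EnterCount(0);
std::atomic<int> g_ExitCount(0);
void OnThreadEnter() { ++g_EnterCount; }
void OnThreadExit()  { ++g_ExitCount; }

// Hypothetical thread wrapper: the user's function never runs without the
// per-thread setup before it and the cleanup after it.
class ManagedThread
{
public:
    explicit ManagedThread(std::function<void()> user_function)
        : m_thread([user_function]
          {
              OnThreadEnter();
              user_function();
              OnThreadExit();
          })
    {
    }

    // Must be called before destruction, as with std::thread.
    void Join() { m_thread.join(); }

private:
    std::thread m_thread;
};
```

The user only hands over a function, so forgetting the cleanup call is not possible, which matters on platforms that cannot register a per-thread destructor.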

The ThreadLocalStorage class is shown below, even if I can already hear someone shouting about strict aliasing rules ;-)

template <typename T>
class ThreadLocalStorage : public ThreadLocalStorageBase
{
public:
    static_assert( sizeof( T ) <= sizeof( void* ), "T must fit in a pointer-sized TLS slot" );

    operator const T&() const
    {
        void * thread_local_storage = GetThreadLocalStorage();
        return *reinterpret_cast<const T*>( thread_local_storage );
    }

    ThreadLocalStorage & operator=( const T & value )
    {
        void * thread_local_storage = GetThreadLocalStorage();
        *reinterpret_cast<T*>( thread_local_storage ) = value;
        return *this;
    }
};
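For readers worried about those aliasing rules, one common workaround (not what the code above does) is to copy the bytes with std::memcpy instead of dereferencing a reinterpret_cast pointer. The sketch below applies it to a plain void* slot; SlotStore and SlotLoad are hypothetical helper names, and std::is_trivially_copyable requires C++11. Note that the getter returns a copy rather than a const reference.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Copies a small POD value into a pointer-sized slot without type punning.
template <typename T>
void SlotStore(void * slot, const T & value)
{
    static_assert(sizeof(T) <= sizeof(void *), "T must fit in a TLS slot");
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    std::memcpy(slot, &value, sizeof(T));
}

// Reads the value back by copying the bytes out of the slot.
template <typename T>
T SlotLoad(const void * slot)
{
    static_assert(sizeof(T) <= sizeof(void *), "T must fit in a TLS slot");
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    T value;
    std::memcpy(&value, slot, sizeof(T));
    return value;
}
```

Compilers typically turn such small fixed-size memcpy calls into a plain load or store, so the extra safety should cost little.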

The system is ready to use. The sample code does not spawn threads, but the code below shows how the system is meant to be used:

ThreadLocalStorage<int> g_TestSmall;
// ThreadLocalStorage g_TestBig; Will be possible in the next post

int main()
{
    ThreadLocalStorageBase::Initialize();
    ThreadLocalStorageBase::StartupOnThreadEnter();

    g_TestSmall = 1234;
    int a = g_TestSmall;

    // Do some stuff, such as spawning threads

    ThreadLocalStorageBase::CleanupOnThreadExit();
    ThreadLocalStorageBase::Finalize();
    return 0;
}

void thread_function() // Signature depends on the OS
{
    ThreadLocalStorageBase::StartupOnThreadEnter();

    g_TestSmall = 5678;
    int a = g_TestSmall;

    ThreadLocalStorageBase::CleanupOnThreadExit();
}
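Since the sample does not spawn threads, here is a condensed single-file sketch that puts the pieces together and runs the thread function on a real thread. It assumes C++11 and POSIX: the four platform functions are backed by pthread keys, the reinterpret_cast dereference is replaced by std::memcpy, CleanupOnThreadExit also clears the slot, and RunWorker is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <pthread.h>
#include <thread>

// Assumed POSIX backing for the platform functions (error handling omitted).
typedef pthread_key_t TLSIndex;
TLSIndex TLSCreate() { TLSIndex key; pthread_key_create(&key, nullptr); return key; }
void * TLSGet(TLSIndex index) { return pthread_getspecific(index); }
void TLSSet(TLSIndex index, void * value) { pthread_setspecific(index, value); }
void TLSDestroy(TLSIndex index) { pthread_key_delete(index); }

class ThreadLocalStorageBase
{
public:
    static void Initialize() { m_tls_identifier = TLSCreate(); }
    static void Finalize() { TLSDestroy(m_tls_identifier); }

    static void StartupOnThreadEnter()
    {
        // One zeroed pointer-sized slot per ThreadLocalStorage instance.
        TLSSet(m_tls_identifier, std::calloc(m_instance_count, sizeof(void *)));
    }

    static void CleanupOnThreadExit()
    {
        std::free(TLSGet(m_tls_identifier));
        TLSSet(m_tls_identifier, nullptr);
    }

protected:
    ThreadLocalStorageBase() : m_index(m_instance_count++) {} // Not thread safe

    void * GetThreadLocalStorage() const
    {
        void ** table = static_cast<void **>(TLSGet(m_tls_identifier));
        return &table[m_index];
    }

    int m_index;
    static int m_instance_count;
    static TLSIndex m_tls_identifier;
};

int ThreadLocalStorageBase::m_instance_count = 0;
TLSIndex ThreadLocalStorageBase::m_tls_identifier;

template <typename T>
class ThreadLocalStorage : public ThreadLocalStorageBase
{
public:
    static_assert(sizeof(T) <= sizeof(void *), "T must fit in a pointer-sized slot");

    // Returns a copy of this thread's value (memcpy avoids type punning).
    operator T() const
    {
        T value;
        std::memcpy(&value, GetThreadLocalStorage(), sizeof(T));
        return value;
    }

    ThreadLocalStorage & operator=(const T & value)
    {
        std::memcpy(GetThreadLocalStorage(), &value, sizeof(T));
        return *this;
    }
};

ThreadLocalStorage<int> g_TestSmall;

// Runs the post's thread_function body on a std::thread and returns what it read.
int RunWorker(int value)
{
    int seen = 0;
    std::thread worker([&seen, value]
    {
        ThreadLocalStorageBase::StartupOnThreadEnter();
        g_TestSmall = value;
        seen = g_TestSmall;
        ThreadLocalStorageBase::CleanupOnThreadExit();
    });
    worker.join();
    return seen;
}
```

With the main thread set to 1234 and a worker writing 5678, each thread reads back its own value.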

In the next post, I'll explain how to handle complex classes (or at least non-POD types). This can be useful to store smart pointers, for example. It will also cover default values (other than zero, of course).

While this technique should certainly not be used in the hot spots of your code, it provides a simple way to get thread local variables: for example, to store the last error that occurred or a non-thread-safe logger. Most of our uses are in debug code or for storing a thread local memory allocator.

The sample code includes implementations for Win32 and POSIX, with a VC2005 project and an Xcode project. Our own version of this code also runs on all consoles. Feel free to comment.