Instapaper Text

Link time sorting

Link-time sorting

Here’s a simple technique to make certain objects, declared in arbitrary translation units, contiguous and sorted in final program memory without having to rely on runtime code to find or copy them. It uses a feature available in most linkers – named sections. I’m using it in conjunction with generated code although with a little effort it’s generally applicable.

Some background

I have some structures, called Priuses, defined in a separate data definition language (DDL), which generates corresponding C++ classes (one per Prius definition). The definitions are organized such that there can be an multiple Prius definitions in an arbitrary number of files. Assume about four DDL files and five Prius definitions in each. Here’s an example of a simple Prius data definition:

struct CratePrius
 
  {
 
    float Damage;
 
  }

The generated C++ code is a bit too big to include here – it’s augmented with one new field and about 50 functions – but the layout is essentially what you’d expect. Here’s a simplified version:

namespace CratePrius
 
  {
 
    enum
 
    {
 
      kNameHash = 0xA50152DDU, // "CratePrius"
 
    };
 
  
 
    struct Instance
 
    {
 
      float m_Damage;
 
  
 
      Instance();
 
  
 
      float GetDamage() const;
 
      void  SetDamage(float Damage);
 
  
 
      void ApplyJson(const JsonValue* json_object);
 
    }
 
  
 
    void* CreateInstance (void* buffer);
 
    void  ApplyJson (void* inst, const JsonValue* json_object)
 
  }

My first implementation used generated C++ to register the different Prius types to the piece of code which was responsible for them. In addition to the C++ Prius definition, each Prius type got a global unique static C++ object which relied on normal C++ static object initialization.

//
 
  // Struct CratePrius
 
  //
 
  extern void PreInitPriusRegister (uint32_t class_name, size_t size,
 
                                    void* (*CreateInstance)(void* buffer),
 
                                    void (*ApplyJson)(void* inst, const JsonValue* json_object));
 
  
 
  struct staticRegCratePrius
 
  {
 
    staticRegCratePrius()
 
    {
 
      PreInitPriusRegister(CratePrius::kNameHash, sizeof(CratePrius::Instance),
 
                           CratePrius::CreateInstance, CratePrius::ApplyJson);
 
    }
 
  };
 
  
 
  static staticRegCratePrius s_StaticCratePriusReg;

At program initialization time (pre-main), s_StaticCratePriusReg and all its ilk would have their constructors invoked, informing the Prius system about a new type available. Internally these calls filled out a structure called a StaticRegistryElement. This structure contains the Prius type name to be instantiated, its size, and some pointers to functions which will create and initialize an instance from a Json instance.

struct StaticRegistryElement
 
  {
 
    uint32_t    m_NameHash;
 
    uint32_t    m_Size;
 
    void*     (*m_CreateInstance)(void* mem);
 
    void      (*m_ApplyJson)     (void* mem, const struct JsonValue* json_object);
 
  };

Using C++ static object initialization had a couple of disadvantages. First, I hate using C++ static object initialization for anything. For trivial applications it’s tolerable, but there’s little ability to robustly handle errors, and since initialization order is arbitrary, resource acquisition lends itself to some ugly design patterns. Second, we had to defer sorting the elements until runtime.

Link section sorting

Really, I wanted that data in usable, image ready format before the program started. The approach I settled on was to fill out the structures I needed from the generated code and convince the linker to aggregate them and put them in sort order.

All linkers we work with have the option to put data in named sections. For gcc this is available with the keyword __attribute__((section("name"))) on the variable. In MSVC, this is available via __declspec(allocate("name")). When MSVC’s PE/COFF linker encounters object section names in the form section$key, it will merge the sections in the final executable in the order determined by key (our PS3 linker requires a custom link script to accomplish the same effect, but the approach is largely compatible). Here’s an example of the generated struct:

#pragma section("prius_seg$KFABFCNN")
 
  
 
  __declspec(allocate("prius_seg$KFABFCNN"))
 
  StaticRegistryElement CratePrius::Instance::kStaticRegistryElem =
 
  {
 
    CratePrius::kNameHash,        /* m_NameHash       */
 
    sizeof(CratePrius::Instance), /* m_Size           */
 
    CratePrius::CreateInstance,   /* m_CreateInstance */
 
    CratePrius::ApplyJson,        /* m_ApplyJson      */
 
  };

MSVC requires we declare the section before first use, which is what the #pragma is doing. The declspec puts this variable in a section with a name based on the Prius hash (here, 0xA50152DD). The linker uses lexical sort order whereas we’ll need numeric sort order at runtime, so we map the ascii representation of the hex value of the hash when generating the section key.

// 0123456789abcdef src
 
  // abcdefghijklmnop dest

By putting each element in the same root section with a special group name (the key), the linker does the work of making them contiguous no matter what translation unit they appear in and putting them in sort order in the final executable. There’s a little shenanigans required to locate the elements:

__declspec(allocate("prius_seg$00000000")) uint32_t prius_seg_start[4];
 
  __declspec(allocate("prius_seg$ZZZZZZZZ")) uint32_t prius_seg_end;
 
  
 
  StaticRegistryElement* elem_array = (StaticRegistryElement*) &prius_seg_start[4];
 
  uint32_t elem_cnt = (uint32_t) ((StaticRegistryElement*) &prius_seg_end - m_Array);

Here we force two symbols to bookend the StaticRegistryElements by setting their section name keys to 00000000 and ZZZZZZZZ. Using them, we point elem_array to the first element in the block and we’ve got our contiguous, sorted array (and its count).

Followup

That’s pretty much the technique. Future plans include a simplification of the code for getting access to the data at runtime and a reduction of the number of unique sections required. Also, it’s possible having so many unique sections will introduce some pathologies with link-time or other build problems at production scale which will need to be addressed. I hope this proves useful to someone else. Please let me know if you have any questions.

PE and COFF specification:

#AltDevBlogADay

Joe Valenzuela