Previously on #AltDevBlogADay…

My last post — How is constant formed? — introduced a method for starting to understand the assembly languages of the x86_64, PPC, SPU and ARM architectures. In it, I described a very, very simple first operation — that of loading a small constant value into a register. In this post I move on to the next semi-logical step: loading a slightly larger constant into a register!

But why is this even a problem? Doesn’t every architecture have a simple, unsurprising Put The Arbitrary Constant In The Register instruction? And if not, why not?

Seasonal varietals

Some architectures have instructions that vary in size, depending on what they need to do; x86 is like this. (Check out this explanation of the x86 instruction format and some of its quirks.)

Here’s a specific example using the three() function from the previous post:

  $ cat const.c
  int three() {
      return 3;
  }
  $ gcc -O -c const.c
  $ objdump -d -Mintel const.o

  const.o:     file format elf64-x86-64

  Disassembly of section .text:

  0000000000000000 <three>:
     0:   b8 03 00 00 00          mov    eax,0x3
     5:   c3                      ret

Some hints for understanding the above commands and their output:

  • const.c is the source file, containing the function three()
  • the -c option for gcc performs compilation and assembly, but does not attempt to link the program; the result is an object file, in this case with the (default) name const.o
  • objdump -d disassembles the specified object file, showing the instructions contained therein — as a hexadecimal representation of each instruction, with mnemonics and operands
  • -Mintel specifies that objdump should display Intel assembly syntax

For the mov instruction (that — in this case — moves the value 3 into register eax), you can see that it is encoded as five bytes: b8 03 00 00 00. The instruction opcode (that part of the instruction that specifies the action) is encoded into the first byte (b8) along with the destination register, and the 32 bit value 3 is encoded in the following four bytes (03 00 00 00) in little-endian format — least-significant-byte-first.
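To make that byte layout concrete, here is a minimal sketch in C that pulls the register number and the constant back out of those five bytes. The struct and function names are hypothetical (they are not part of any toolchain), and the sketch assumes the one-byte b8..bf form of mov shown in the disassembly above, with register number 0 meaning eax.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two fields packed into the 5-byte
 * "mov r32, imm32" encoding. */
struct mov_imm32 {
    unsigned reg;    /* destination register number: 0 = eax, 1 = ecx, ... */
    uint32_t value;  /* the 32 bit constant */
};

/* Decode five bytes of the b8..bf form of mov (illustrative only). */
static struct mov_imm32 decode_mov_imm32(const uint8_t code[5])
{
    struct mov_imm32 m;

    /* The top five bits of the first byte select the instruction... */
    assert((code[0] & 0xf8) == 0xb8);
    /* ...and the low three bits select the destination register. */
    m.reg = code[0] & 0x07;

    /* The constant follows, least significant byte first. */
    m.value = (uint32_t)code[1]
            | (uint32_t)code[2] << 8
            | (uint32_t)code[3] << 16
            | (uint32_t)code[4] << 24;
    return m;
}
```

Feeding it the bytes b8 03 00 00 00 gives register 0 (eax) and the value 3, matching the disassembly.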

By comparison, the ret instruction is encoded as a single byte (c3). It needs no operand, because the address of the instruction to return to is stored on the top of the stack.

What we can see is that the x86 instruction set does include a Put The Arbitrary Constant In The Register instruction. Hooray!

Fixed length, broken promises

What about the other architectures that I looked at previously? Let’s take a quick look at some disassembled object code:

  powerpc-unknown-linux-gnu-const.o:     file format elf32-powerpc

  Disassembly of section .text:

  00000000 <three>:
     0:   38 60 00 03     li      r3,3
     4:   4e 80 00 20     blr

  -----------------------------------------------------------------

  spu-elf-const.o:     file format elf32-spu

  Disassembly of section .text:

  00000000 <three>:
     0:   40 80 01 83     il      $3,3
     4:   35 00 00 00     bi      $0

  -----------------------------------------------------------------

  arm-linux-gnueabi-const.o:     file format elf32-littlearm

  Disassembly of section .text:

  00000000 <three>:
      0:   e3a00003        mov     r0, #3
      4:   e12fff1e        bx      lr

  -----------------------------------------------------------------

  mips-unknown-linux-gnu-const.o:     file format elf32-tradbigmips

  Disassembly of section .text:

  00000000 <three>:
      0:   03e00008        jr      ra
      4:   24020003        li      v0,3         ...

Some things that may (or may not) stand out…

  • mips-unknown-what-the-triplet? It’s a neat little architecture that has been used in one or two popular gaming devices, as well as some other applications — follow him on twitter) brought it to my attention.
  • With regard to MIPS, notice that the jump register (jr) instruction is before the load immediate (li) instruction. Any right-thinking person will know that this means the function returns before the return value is set, so what is the compiler smoking? It turns out that any chemical-alteration of the compiler is entirely appropriate, and the return value is set correctly: MIPS executes the instruction that follows a jump or branch (in what is called the branch delay slot) before the jump takes effect. It’s a feature of the architecture.
  • All instructions generated for these architectures are 32 bits wide — in fact, every instruction for these architectures is 32 bits in size (with the exception of ARM’s Thumb instructions). Fixed length instructions make instruction decoding far simpler than is the case for variable length instructions.
  • In the case of PowerPC, ARM and MIPS, the value 3 is apparent in the instruction encoding (03, with some extra zeros in front), but in no case is there a full 32 bit representation of 0x00000003.
  • For SPU, the value is present, but has been offset so is less clear when displayed as hexadecimal. Shifting the encoded instruction left by one bit makes the presence of the value clearer: (0x40800183<<1) = 0x81000306 (the 03 is the constant; the trailing 06 is the register number 3, shifted along with it).
  • Overall, there’s a lot that is the same across these architectures. Understanding one can help in understanding others.
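The observations above about where the 3 lives in each encoding can be sketched in C. This is only an illustration: the function names are hypothetical, and the field positions are my reading of the four encodings in the listings (low 16 bits for PowerPC li and MIPS li, 16 bits above a 7 bit register field for SPU il, and an 8 bit value with a 4 bit rotate count for ARM mov). The functions return the raw field; they do not sign-extend it.

```c
#include <stdint.h>

/* PowerPC "li" and MIPS "li": the constant is the low 16 bits of the word. */
static uint32_t low16_imm(uint32_t insn)
{
    return insn & 0xffffu;
}

/* SPU "il": a 7 bit register number sits below the 16 bit constant,
 * which is why the value looks offset when printed as hexadecimal. */
static uint32_t spu_il_imm(uint32_t insn)
{
    return (insn >> 7) & 0xffffu;
}

/* ARM "mov rd, #imm": an 8 bit value, rotated right by twice the
 * 4 bit count stored in bits 8..11. */
static uint32_t arm_mov_imm(uint32_t insn)
{
    uint32_t imm8 = insn & 0xffu;
    unsigned rot = ((insn >> 8) & 0xfu) * 2u;

    /* Guard rot == 0 to avoid shifting a 32 bit value by 32. */
    return rot ? (imm8 >> rot) | (imm8 << (32u - rot)) : imm8;
}
```

Passing in the instruction words from the listings (0x38600003, 0x24020003, 0x40800183 and 0xe3a00003) should hand back 3 in every case.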

It should be clear that it is not possible to load a 32 bit register with an arbitrary 32 bit value using a single 32 bit instruction. Some part of the instruction must be used as an opcode (specifying the particular action) and some part must specify the destination register, so in a single instruction there are fewer than 32 bits available to specify the value to be loaded into the register.

Each of these architectures has instructions that allow the loading of a 16 bit value into the lower 16 bits of a register. (Additionally, the ARM move immediate instruction may be used to load smaller constants to offset locations within a register, and SPU allows an 18 bit value to be loaded via its immediate load address instruction.)
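As a rough way of asking "will this constant fit in one instruction?", here is a sketch of two checks in C. The function names are hypothetical. The ARM check assumes the classic mov immediate form, where the constant must be an 8 bit value rotated right by an even number of bits; the 16 bit check treats the field as unsigned, which is a simplification (some of these instructions sign-extend it).

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32 bit value left by n bits (n is reduced modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* True if value fits a plain, unsigned 16 bit immediate field. */
static bool fits_imm16(uint32_t value)
{
    return value <= 0xffffu;
}

/* True if value is some 8 bit constant rotated right by an even amount,
 * which is the shape an ARM mov immediate is assumed to encode here. */
static bool arm_encodable_imm(uint32_t value)
{
    /* Undo each possible even rotation and see if 8 bits remain. */
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if (rotl32(value, rot) <= 0xffu) {
            return true;
        }
    }
    return false;
}
```

By these checks 3 and 0xff000000 are fine for a single ARM mov, while something like 0x12345678 fits neither test and needs more than one instruction.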

16 bits is not enough

ARM has 32 bit general purpose registers (GPRs), PowerPC and MIPS are available in variants with 32 and 64 bit GPRs and SPU has 128 bit registers. If there is no single Put The Arbitrary Constant In The Register instruction for these architectures (which there clearly isn’t), how can you set the remaining parts of a register?

To find out the answer (to this and other questions you may not be asking), write a simple function that sets a larger value, compile it and take a look at how your compiler solves the problem — try to understand why it has produced the assembly that it has, and see if you can come up with something better :) I’ll do the same and write about it sometime in the future.
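If you want a starting point, something like the following will do. The file name big.c, the function name big() and the particular constant are all arbitrary choices of mine; the constant just needs more than 16 bits, and having every byte different makes the pieces easier to spot in a disassembly.

```c
#include <stdint.h>

/* Returns a constant that is too wide for a 16 bit immediate field.
 * Each byte is distinct, so it is easy to find in the object code. */
uint32_t big(void)
{
    return 0x12345678;
}
```

Compile and disassemble it the same way as const.c above (gcc -O -c big.c, then objdump -d big.o), swapping in whichever cross-compiler you have for the architecture you are curious about.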

Until next time…

[Photos by Karen Adamczewski]