A couple of days ago, Neil Henning asked a question:

SPU gurus of twitter unite, want a vector unsigned int with {1, 2, 3, 4} in each slot, without putting it in as elf constant, any ideas?

There were a few different responses, including a great article on ways to build constants on the SPU. Neil’s question got me thinking about how numerical constants are handled on various architectures at the machine level, so I did some investigating.

In this post, I talk about one of the simplest operations a computer can do: putting a small constant number into a register, which — while simple — gives an insight into the architecture and raises plenty of questions. I also describe a method for learning to read and understand assembly language.

I look at the way of doing this on x86/x86_64, PowerPC, SPU and ARM architectures because those are the architectures for which I have working toolchains.

Let the compiler be your guide

My method for learning assembly programming goes something like this:

  1. compile some higher-level code
  2. look at the assembly language generated by the compiler
  3. try to understand it (using a suitable reference)
  4. change the code and/or the compiler options
  5. goto 1.

When you’re starting out, it’s a slow process: there are lots of subtle details that won’t be readily apparent. Like many things, though, persist and you’ll get better at it. Also, ask the internet.

The process that I go through here uses the compiler as a teacher. Realistically, the compiler doesn’t know everything there is to know about assembly programming or instruction sets, and there will come a time when you know more than it does and must defeat it in hand-to-hand combat. For now, we learn.

Learn you a ‘ssembly

So here’s a program fragment written in that a-little-higher-than-assembly-level language, C:

    int three() {
        return 3;
    }

A function (three) that returns a constant value (three). I’ve saved those three lines to a file called const.c. Really simple. I can then issue the command

    gcc const.c -O -S -o-

which runs GCC, with the arguments:

  • const.c — the name of the source file
  • -O —  to perform optimisation (usually makes the generated code simpler and easier to understand)
  • -S — to stop after compiling – do not assemble or link (causes the compiler to output assembly)
  • -o- — to write the output (assembly) to stdout (straight to the terminal, not to disk)
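
As a rough sketch of step 4 of the method (change the code), a few more functions could be added to const.c and fed through the same command. The function names below are made up for illustration, and what the compiler emits for each one depends on the target and the GCC version, so no output is shown here.

```c
/* Hypothetical additions to const.c: more constants to show the compiler. */

/* Zero is often handled specially by compilers. */
int zero(void) { return 0; }

/* A negative constant, to see how the sign is represented. */
int minus_one(void) { return -1; }

/* A value too large to fit in a 16-bit immediate field. */
int big(void) { return 0x12345678; }

/* A constant that needs more than 32 bits. */
long long wide(void) { return 0x123456789abcdefLL; }
```

Comparing the assembly for each function against the simplest case is a quick way to find instructions worth looking up in the reference.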

(If you know nothing about assembly programming, it’s probably a good idea to start with a simpler architecture — for me, my first was SPU, which I think is one of the easier architectures to start with. x86/x86_64 is not a good place to start)

To start, let’s look at what my x86_64-targeting GCC produces:
(note that I’ve added one extra option to the command line above: -masm=intel, which tells GCC to output the easier-to-understand Intel assembly syntax instead of GCC’s default AT&T syntax)

            .file   "const.c"
            .intel_syntax noprefix
            .text
    .globl three
            .type   three, @function
    three:
    .LFB0:
            .cfi_startproc
            mov     eax, 3
            ret
            .cfi_endproc
    .LFE0:
            .size   three, .-three
            .ident  "GCC: (Gentoo 4.5.2 p1.0, pie-0.4.5) 4.5.2"
            .section        .note.GNU-stack,"",@progbits

That’s a lot of … stuff … for a simple one-line function. Most of the output is information for the assembler that isn’t all that important for what we’re interested in.

How I’m deciding what is important:

  • Words followed by a : (colon) are labels, e.g. three: and .LFE0:. Any label that is referenced somewhere is interesting, the rest are not.
  • Lines that start with a . (period) and are not labels (e.g. .globl, .size) aren’t interesting to us — they’re there for the benefit of the assembler.

Throw away the uninteresting pieces and what’s left?

    three:
            mov    eax, 3
            ret

Hopefully, you can see some connection between this and the C program above — there’s a three: label, the number 3 is there, and ret is probably short for return…

So how do we work out what it really means?

Grab a copy of the two Intel Instruction Set Reference PDFs, open the second one and look for ret. There are 12 pages describing this instruction (actually mnemonic, not instruction — more on that in a moment), but the first of those is enough to see that it tells the processor to return to the calling procedure — so it is our ‘return’. (How it knows where to return to is a question for another time)

Regarding the other line, we see a mnemonic (mov) and two operands (eax and 3). To be clear about terms, operands are those things that the instruction operates upon, and (from the Instruction Set Reference):

A mnemonic is a reserved name for a class of instruction opcodes which have the same function

This means that mov does not represent one particular instruction but instead indicates that the assembler should generate an appropriate instruction for the mnemonic and operands provided. You can find mov (Move) in the Instruction Set Reference. It is used to move data from one place to another.

The operands for mov here are eax and 3 — one is a register name, the other is a literal value. In this case, eax is the register that is used to store the function’s return value (I’m not going to say more about x86 registers for now, except to link to a descriptive image). The 3 is the number being put in the eax register.
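
To make the step from mnemonic to instruction a little more concrete, here is a minimal sketch of one of the encodings that mov can stand for: the form with opcode byte B8 plus the register number, followed by a 32-bit immediate. The function name and its interface are invented for illustration, and a real assembler chooses among many more forms than this one.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode "mov r32, imm32" using the B8+rd opcode form, one of several
 * encodings covered by the mov mnemonic.
 * reg: register number 0..7 (eax is 0). out: room for at least 5 bytes.
 * Returns the number of bytes written.
 * (Hypothetical helper, not part of any real assembler.)
 */
size_t encode_mov_r32_imm32(unsigned reg, uint32_t imm, uint8_t *out)
{
    out[0] = (uint8_t)(0xB8 + (reg & 7)); /* opcode with the register folded in */
    out[1] = (uint8_t)(imm);              /* immediate, least significant byte first */
    out[2] = (uint8_t)(imm >> 8);
    out[3] = (uint8_t)(imm >> 16);
    out[4] = (uint8_t)(imm >> 24);
    return 5;
}
```

Called with register 0 and the value 3, this writes the bytes B8 03 00 00 00.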

To summarise:

mov eax, 3 moves the value 3 into the eax register.

We learned a thing!

Equal but different

There’s a lot in common between assembly languages for different architectures — once you know the basics of the structure (labels, instructions, assembler directives) it’s just a matter of working out the details. Just.

Here’s what GCC generates for a simple immediate load for some other architectures:

PowerPC

            # load the immediate value 3 into general purpose register 3
            li 3,3

Immediate value of 3 into register 3? Which is the register, and which is the immediate? Unlike x86, registers are specified by number, not name, so you need to know a bit about the architecture to be able to interpret the assembly language. Fortunately the operand order is the same as for x86 Intel syntax: destination, source. You could think of li 3,3 as something like register(3) = 3.

If you read through the PowerPC Architecture Book (in three volumes), you’ll see (if you can stay awake) that there is actually no li instruction. In this case, li is an extended mnemonic, a shorthand form of an instruction intended to make assembler language programs “easier to write and easier to understand.” Under the hood, li is actually addi, Add Immediate (the details of that are also something for another time).
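
As a sketch of how li maps onto addi, the following packs the fields of an addi instruction word by hand: a 6-bit primary opcode (14 for addi), a 5-bit destination register, a 5-bit source register and a 16-bit signed immediate. The function names are made up for illustration. A source register field of 0 in addi means "use the value zero" rather than "use register 0", which is what lets it act as a load immediate.

```c
#include <stdint.h>

/*
 * Pack an addi instruction word:
 * opcode (6 bits) | rD (5 bits) | rA (5 bits) | SIMM (16 bits).
 * (Hypothetical helper for illustration.)
 */
uint32_t encode_addi(unsigned rd, unsigned ra, int16_t simm)
{
    return (14u << 26)            /* primary opcode of addi */
         | ((rd & 31u) << 21)     /* destination register */
         | ((ra & 31u) << 16)     /* source register, 0 means literal zero */
         | (uint16_t)simm;        /* 16-bit signed immediate */
}

/* li rD, value is shorthand for addi rD, 0, value. */
uint32_t encode_li(unsigned rd, int16_t value)
{
    return encode_addi(rd, 0, value);
}
```

With these definitions, encode_li(3, 3) yields the word 0x38600003.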

SPU

            # immediate load the value 3 into register 3
            il $3,3

You can find il in the SPU Instruction Set Architecture document. Again, the operands are ordered as destination, source — in this case, the register is specified with a $ prefix, making it slightly easier to differentiate between the two.

As an extra bonus, the il instruction puts the number three in a register four times: four 32-bit values stored in a 128-bit register.
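
The effect of il can be modelled in plain C. This is only a sketch: the spu_reg type and the il function are invented here to stand in for a 128-bit register and the instruction, and are not part of any SPU toolchain.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit SPU register as four 32-bit slots. */
typedef struct {
    int32_t slot[4];
} spu_reg;

/*
 * Model of il: the 16-bit immediate is sign-extended to 32 bits
 * and the same value is written to every slot.
 */
spu_reg il(int16_t imm)
{
    spu_reg r;
    for (int i = 0; i < 4; i++)
        r.slot[i] = imm; /* the int16_t to int32_t conversion sign-extends */
    return r;
}
```

In this model, il(3) produces a register holding {3, 3, 3, 3}.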

Hopefully you’re noticing some patterns by now: the destination and source operand ordering, the slight notational variations between architectures to keep you on your toes, the seemingly random and inexplicable mnemonic names. This is assembly — it’s great!

ARM

            @ move the value 3 into register r0
            mov r0, #3

You’ll find mov in the ARM Architecture Reference Manual — as was the case for x86, it’s a mnemonic that covers a class of instruction opcodes. To make it even more interesting, the manual lists five different bit-sequences that may be used to encode the Move Immediate instruction — ARM is weird like that (yet again, something for another time).
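
As a taste of one of those encodings, here is a minimal sketch of the classic 32-bit ARM form (called A1 in the manual) with the "always" condition. In that form the immediate is an 8-bit value rotated right by an even number of bits, so not every 32-bit constant fits. The function names are made up for illustration, and the other encodings are not covered.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n in 0..31). */
static uint32_t rol32(uint32_t v, unsigned n)
{
    return n ? (v << n) | (v >> (32 - n)) : v;
}

/*
 * Try to encode "mov rd, #value" in the A1 form with condition AL.
 * The 12-bit immediate field holds a 4-bit rotation and an 8-bit value;
 * the constant is the 8-bit value rotated right by twice the rotation.
 * Returns 1 and stores the instruction word in *out on success,
 * or 0 if the constant cannot be expressed this way.
 * (Hypothetical helper for illustration.)
 */
int encode_arm_mov_imm(unsigned rd, uint32_t value, uint32_t *out)
{
    for (unsigned rot = 0; rot < 16; rot++) {
        /* Undo the right rotation to recover a candidate 8-bit value. */
        uint32_t imm8 = rol32(value, 2 * rot);
        if (imm8 <= 0xFF) {
            *out = 0xE3A00000u | ((rd & 15u) << 12) | (rot << 8) | imm8;
            return 1;
        }
    }
    return 0;
}
```

For r0 and the value 3 this gives the word 0xE3A00003, while a constant such as 0x101 is rejected because it does not fit in eight bits at any even rotation.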

There are minor differences: constants are prefixed with #, and the register is called r0. And comments start with @, which makes me uncomfortable…

Summing up

Four out of four instruction set architectures examined can be used to load a value into a register! Send money now!

Hopefully, I’ve managed to convey the basics of reading and starting to dissect and understand the assembly language generated by your compiler. There’s obviously a lot more to it than loading data into a register, but we see even from this trivial operation that there’s a lot in common between assembly languages across different architectures (at least, between those presented here). This doesn’t answer the question about how to construct the constant as requested by Neil, or even why that process would be more difficult than the single instructions described above. I hope to answer (or get closer to answering) these questions (and more!) in later episodes.

(Please let me know if you found this useful, informative, dull, too simplistic, containing too many assumptions, is too narrow or verbose. Leave a comment below, or message (and follow) me on twitter. Thanks :D)

[Photos by Karen Adamczewski]