
C/C++ Low Level Curriculum Part 8: looking at optimised assembly

It’s that time again where I have managed to find a few spare hours to squeeze out an article for the Low Level Curriculum. This is the 8th post in this series, which is not in any way significant except that I like the number 8. As well as being a power of two, it is also the maximum number of unarmed people who can simultaneously get close enough to attack you (according to a martial arts book I once read).

This post covers how to set up Visual Studio to allow you to easily look at the optimised assembly code generated for simple code snippets like the ones we deal with in this series. If you wonder why I feel this is worth a post of its own, here’s the reason – optimising compilers are good, and given code with constants as input and no external output (like the snippets I give as examples in this series) the compiler will generally optimise the code away to nothing – which I find makes it pretty hard to look at. Having this set up should prove immensely useful, both to refer back to, and for your own experimentation.

Here are the backlinks for preceding articles in the series in case you want to refer back to any of them (warning: the first few are quite long):

/2011/11/09/a-low-level-curriculum-for-c-and-c/
/2011/11/24/c-c-low-level-curriculum-part-2-data-types/
/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/
/2011/12/24/c-c-low-level-curriculum-part-4-more-stack/
/2012/02/07/c-c-low-level-curriculum-part-5-even-more-stack/
/2012/03/07/c-c-low-level-curriculum-part-6-conditionals/

Once you have clicked OK just click “Finish” on the next stage of the wizard – in case you’re wondering, the options available when you click next don’t matter for our purposes (and un-checking the “Precompiled header” check box makes no difference, it still generates a console app that uses a precompiled header…).

Changing the Project Properties

The next step is to use the menu to select “Project -> <YourProjectName> Properties”, which will bring up the properties dialog for the project.

When the properties dialog appears (see image below):

select “All Configurations” from the Configuration drop list
select “Configuration Properties -> General” in the tree view at the left of the window
in the main pane change “Whole Program Optimisation” to “No Whole Program Optimisation”.

Next, in the tree view (see image below):

in the tree view, navigate to “C/C++ -> Code Generation”
in the main pane, change “Basic Runtime Checks” to “Default” (i.e. off)

Finally (see image below):

in the tree view, go to “C/C++ -> Output Files”
in the main pane change “Assembler Output” to “Assembly With Source Code (/FAs)”
once you’ve done that click “OK”

Now, when you compile, the Visual Studio compiler will generate an .asm file as well as an .exe file. The .asm file will contain the intermediate assembly code generated by the compiler, with the source code inserted into it inline as comments.

You could alternatively choose the “Assembly, Machine Code and Source (/FAcs)” option if you like – this will generate a .cod file that contains the machine code as well as the asm and source.

I prefer the regular .asm because it’s less visually noisy and the assembler mnemonics are all aligned on the same column, so that’s what I’ll assume you’re using if you’re following the article, but the .cod file is fine.

So, what did we do there?

Well, first we turned off link time code generation. Amongst other things, this will prevent the linker stripping the .asm generated for functions that are compiled but not called anywhere.

Secondly, we turned off the basic runtime checks (which are already off in Release). These checks make the function prologues and epilogues generated do significant amounts of (basically unnecessary) extra work, causing a worst case 5x slowdown (see this post by Bruce Dawson on his personal blog for an in depth explanation).

Finally, we asked the compiler not to throw away the assembly code it generates for our program. This data is produced by the compilation process whenever you compile but is usually thrown away; we’re just asking Visual Studio to write it into an .asm file so we can take a look at it.

Since we made these changes for “All Configurations” this means we will have access to .asm files containing the assembly code generated by both the Debug and Release build configurations.

Let’s try it out

So in the spirit of discovery, let’s try it out (for the sake of familiarity) with a language feature we looked at last time – the conditional operator:

#include "stdafx.h"

int ConditionalTest( bool bFlag, int iOnTrue, int iOnFalse )
{
    return ( bFlag ? iOnTrue : iOnFalse );
}

int main(int argc, char* argv[])
{
    int a = 1, b = 2;
    bool bFlag = false;
    int c = ConditionalTest( bFlag, a, b );
    return 0;
}

The question you have in your head at this moment should be “why have we put the code into a function?”. Rest assured that this will become apparent soon enough.

Now we have to build the code and look in the .asm files generated to see what the compiler has been up to…

First build the Debug build configuration – this should already be selected in the solution configuration drop-down (at the top of your Visual Studio window unless you’ve moved it).

Next build the Release configuration.

Now we need to open the .asm files. Unless you have messed with project settings that I didn’t tell you to, these will be in the following paths:

<path where you put the project>/Debug/<projectName>.asm

<path where you put the project>/Release/<projectName>.asm

.asm files

I’m not going to go into any significant detail about how .asm files are laid out here; if you want to find out more, here’s a link to the Microsoft documentation for their assembler.

The main thing you should note is that we can find the C/C++ functions in the .asm file by looking for their names; and that – once we find them – the mixture of source code and assembly code looks basically the same as it does in the disassembly view of Visual Studio in the debugger.

main()

Let’s look at main() first. This is where I explain why the code snippet we wanted to look at was put in a function. I can tell you’re excited.

Here’s main() from the Debug .asm (I’ve reformatted it slightly to make it take up less vertical space):

_TEXT    SEGMENT
_c$ = -16                        ; size = 4
_bFlag$ = -9                     ; size = 1
_b$ = -8                         ; size = 4
_a$ = -4                         ; size = 4
_argc$ = 8                       ; size = 4
_argv$ = 12                      ; size = 4
_main    PROC                    ; COMDAT
; 9    : {
    push    ebp
    mov     ebp, esp
    sub     esp, 80                    ; 00000050H
    push    ebx
    push    esi
    push    edi
; 10   :     int a = 1, b = 2;
    mov     DWORD PTR _a$[ebp], 1
    mov     DWORD PTR _b$[ebp], 2
; 11   :     bool bFlag = false;
    mov     BYTE PTR _bFlag$[ebp], 0
; 12   :     int c = ConditionalTest( bFlag, a, b );
    mov     eax, DWORD PTR _b$[ebp]
    push    eax
    mov     ecx, DWORD PTR _a$[ebp]
    push    ecx
    movzx   edx, BYTE PTR _bFlag$[ebp]
    push    edx
    call    ?ConditionalTest@@YAH_NHH@Z        ; ConditionalTest
    add     esp, 12                    ; 0000000cH
    mov     DWORD PTR _c$[ebp], eax
; 13   :     return 0;
    xor     eax, eax
; 14   : }
    pop     edi
    pop     esi
    pop     ebx
    mov     esp, ebp
    pop     ebp
    ret     0
_main    ENDP
_TEXT    ENDS

As long as you’ve read the previous posts, this should mostly look pretty familiar.

It breaks down as follows:

lines 1-8: these lines define the offsets of the various Stack variables from [ebp] within main()’s Stack Frame
lines 10-15: function prologue of main()
lines 17-20: initialise the Stack variables
lines 22-30: push the parameters to ConditionalTest() into the Stack, call it, and assign its return value
line 32: sets up main()’s return value
lines 34-38: function epilogue of main()
line 39: return from main()

Nothing unexpected there really, the only new thing to take in is the declarations of the Stack variable offsets from [ebp].

I feel these tend to make the assembly code easier to follow than the code in the disassembly window in the Visual Studio debugger.

And, for comparison, here’s main() for the Release .asm:

_TEXT    SEGMENT
_argc$ = 8                       ; size = 4
_argv$ = 12                      ; size = 4
_main    PROC                    ; COMDAT
; 10   :     int a = 1, b = 2;
; 11   :     bool bFlag = false;
; 12   :     int c = ConditionalTest( bFlag, a, b );
; 13   :     return 0;
    xor     eax, eax
; 14   : }
    ret     0
_main    ENDP
_TEXT    ENDS

The astute amongst you will have noticed that the Release assembly code is significantly smaller than the Debug.

In fact, it’s clearly doing nothing at all other than returning 0. Good optimising! High five!

As I alluded to earlier, the optimising compiler is great at spotting code that evaluates to a compile time constant and will happily replace any code it can with the equivalent constant.
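To make the “compile time constant” point a little more concrete, here is a minimal sketch, assuming a compiler that supports C++11 constexpr and static_assert. ConditionalTestConstexpr is a hypothetical name that doesn’t appear anywhere in the project above; marking it constexpr is just a way of asking the compiler to prove that the call made in main() has a value it can work out without running anything. It isn’t a description of how the Visual Studio optimiser reaches the same conclusion internally.

```cpp
// Hypothetical constexpr twin of ConditionalTest(), for illustration only.
constexpr int ConditionalTestConstexpr( bool bFlag, int iOnTrue, int iOnFalse )
{
    return ( bFlag ? iOnTrue : iOnFalse );
}

// Same inputs as main() above: a = 1, b = 2, bFlag = false.
// If this compiles, the result was worked out entirely at compile time.
static_assert( ConditionalTestConstexpr( false, 1, 2 ) == 2,
               "the call evaluates to the constant 2" );
```

Since c is never used for anything in main(), the optimiser doesn’t even need to keep that constant around, which fits with the Release main() containing nothing but the code to return 0.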

So that’s why we put the code snippet in a function

It should hopefully be relatively clear by this point why we might have put the code snippet into a function, and then asked the linker not to remove code for functions that aren’t called.

Even if it can optimise away calls to a function, the compiler can’t optimise away the function before link time because some code outside of the object file it exists in might call it. Incidentally, the same effect usually keeps variables defined at global scope from being optimised away before linkage.

I’m going to call this Schrödinger linkage (catchy, right?). If we want our simple code snippet to stay around after optimising we only need to make sure that it takes advantage of Schrödinger linkage to cheat the optimiser.

If the compiler can’t tell whether the function will be called, then it certainly can’t tell what the values of its parameters will be during one of these potential calls, or what its return value might be used for and so it can’t optimise away any code that relies on those inputs or contributes to the output either.

The upshot of this is that if we put our code snippet in a function, make sure that it uses the function parameters as inputs, and that its output is returned from the function then it should survive optimisation.
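As a rough sketch of that recipe applied to a different snippet (adding a few numbers together), something like the following should do. SumTest and its parameter names are made up for this illustration, and it assumes the project settings described earlier so that the function isn’t stripped for being uncalled.

```cpp
// Hypothetical snippet-under-test: the inputs arrive as parameters and the
// result leaves via the return value, so the compiler cannot assume values
// for either when it compiles this function on its own.
int SumTest( int iA, int iB, int iC )
{
    return ( iA + iB + iC );
}
```

Calling SumTest( 1, 2, 3 ) from main() with constants may well still get optimised away at the call site, just as the call to ConditionalTest() was, but the body of SumTest() itself should still be there to look at in the Release .asm.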

It’s really a testament to all the compiler programmers over the years that it takes so much effort to get at the optimised assembly code generated by a simple code snippet – compiler programmers we salute you!

ConditionalTest()

So, here’s the Debug .asm for ConditionalTest() (ignoring the prologue / epilogue):

; 5    :     return( bFlag ? iOnTrue : iOnFalse );
    movzx   eax, BYTE PTR _bFlag$[ebp]
    test    eax, eax
    je      SHORT $LN3@Conditiona
    mov     ecx, DWORD PTR _iOnTrue$[ebp]
    mov     DWORD PTR tv66[ebp], ecx
    jmp     SHORT $LN4@Conditiona
$LN3@Conditiona:
    mov     edx, DWORD PTR _iOnFalse$[ebp]
    mov     DWORD PTR tv66[ebp], edx
$LN4@Conditiona:
    mov     eax, DWORD PTR tv66[ebp]
; 6    : }

As you should be able to see, this is doing basically the same thing as the code we looked at in the Debug disassembly in the previous article:

branching based on the result of testing the value of bFlag (the mnemonic test does a bitwise logical AND)
both branches set a Stack variable at an offset of tv66 from [ebp]
and both branches then execute the last line which copies the content of that address into eax

Again, the assembly code is arguably easier to follow than the corresponding disassembly because the jmp mnemonic jumps to labels visibly defined in the code, whereas in the disassembly view in Visual Studio you generally have to cross reference the operand to jmp with the memory addresses in the disassembly view to see where it’s jumping to…
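If it helps to see that control flow in C++ terms, here is a hand-written sketch that follows the Debug assembly step by step. The function name ConditionalTestDebugShape is invented for this illustration; the local tv66 and the labels LN3 and LN4 are simply named after their counterparts in the .asm. This is my own transliteration, not something the compiler produces.

```cpp
// Hand-written C++ that mirrors the control flow of the Debug assembly.
// Illustration only: the name is hypothetical and the comments show which
// instructions each statement roughly corresponds to.
int ConditionalTestDebugShape( bool bFlag, int iOnTrue, int iOnFalse )
{
    int tv66;                   // the compiler-generated temporary on the Stack

    if( !bFlag ) goto LN3;      // movzx / test / je $LN3
    tv66 = iOnTrue;             // mov ecx, iOnTrue ; mov tv66, ecx
    goto LN4;                   // jmp $LN4
LN3:
    tv66 = iOnFalse;            // mov edx, iOnFalse ; mov tv66, edx
LN4:
    return tv66;                // mov eax, tv66
}
```

Both paths write the result into the temporary in memory first, and only the final statement copies it into the register used for the return value.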

Let’s compare this with the Release assembler (again not showing the function prologue or epilogue):

; 5    :     return( bFlag ? iOnTrue : iOnFalse );
    cmp     BYTE PTR _bFlag$[ebp], 0
    mov     eax, DWORD PTR _iOnTrue$[ebp]
    jne     SHORT $LN4@Conditiona
    mov     eax, DWORD PTR _iOnFalse$[ebp]
$LN4@Conditiona:
; 6    : }

You will note that the work of this function is now done in 4 instructions as opposed to 9 in the Debug:

it compares the value of bFlag against 0
unconditionally moves the value of iOnTrue into eax
if the value of bFlag was not equal to 0 (i.e. it was true) it jumps past the next instruction…
…otherwise this moves the value of iOnFalse into eax
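Written back out as C++, those four steps look roughly like the sketch below. ConditionalTestReleaseShape and iResult are names I’ve made up for the illustration, with iResult standing in for eax; it is a hand transliteration of the listing rather than anything the compiler emits.

```cpp
// Hand-written C++ that mirrors the shape of the Release assembly.
// Illustration only: iResult plays the part of the eax register.
int ConditionalTestReleaseShape( bool bFlag, int iOnTrue, int iOnFalse )
{
    int iResult = iOnTrue;      // mov eax, iOnTrue (done unconditionally)
    if( !bFlag )                // cmp bFlag, 0 ; jne skips the next line when true
    {
        iResult = iOnFalse;     // mov eax, iOnFalse
    }
    return iResult;             // the result is already sitting in eax
}
```

In the assembly the cmp comes before the unconditional mov; that works because mov doesn’t alter the CPU flags, so the result of the comparison is still available when the jne is reached.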

As I’ve stated before I’m not an assembly code programmer and I’m not an optimisation expert. Consequently, I’m not going to offer my opinion on the significance of the ordering of the instructions in this Release assembly code.

I am, however, prepared to go out on a limb and say it’s a pretty safe bet that the Release version with 4 instructions is going to execute significantly faster than the Debug version with 9.

So, why such a big difference between Debug and Release for something that when debugging at source level is a single-step?

Essentially this is because the unoptimised assembly code generated by the compiler must be amenable to single-step debugging at the source level:

it almost always does the exact logical equivalent of what the high level code asked it to do and, specifically, in the same order
it also has to frequently write values from CPU registers back into memory so that the debugger can show them updating

Summary

What’s the main point I’d like you to take away from this article? Optimising compilers are feisty!

You have to know how to stop them optimising away your isolated C/C++ code snippets if you want to easily be able to see the optimised assembly code they generate.

This article shows a simple boilerplate way to short-circuit the Visual Studio optimising compiler – mileage will vary on other platforms.

There are other strategies to stop the optimiser optimising away your code, but they basically all come down to utilising the Schrödinger linkage effect; in general:

use global variables, function parameters, or function call results as inputs to the code
use global variables, function return values, or function call parameters as outputs from the code
if you’re not using Visual Studio’s compiler you may also need to turn off inlining
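As a small sketch of the global variable flavour of this, assuming the same project settings as above, the snippet below reads its inputs from globals and writes its output to a global. All of the names (g_iInputA, g_iInputB, g_iOutput, GlobalSumTest) are invented for the illustration.

```cpp
// Hypothetical example: globals with external linkage used as the inputs
// and the output of the code we want to survive optimisation.
int g_iInputA = 3;
int g_iInputB = 4;
int g_iOutput = 0;

void GlobalSumTest()
{
    // Code in another object file could change the inputs or read the
    // output, so without whole program optimisation the compiler generally
    // has to do the loads, the add, and the store.
    g_iOutput = g_iInputA + g_iInputB;
}
```

The initial values here only exist so the example does something visible when run; the point is that the compiler can’t rely on them still holding when GlobalSumTest() is called.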

A final extreme method I have been told about is to insert nop instructions via inline assembly around / within the code you want to isolate. Note that you should use this approach with caution, as it interferes directly with the optimiser and can easily affect the output to the point where it is no longer representative.

Epilogue

So, I hope you found this interesting – I certainly expect you will find it useful :)

The next article (as promised last time!) is about looping, which is another reason why it seemed like a good time to cover getting at optimised assembly code for simple C/C++ snippets.

I will be referring back to this in future articles in situations where looking at the optimised assembly code is particularly relevant.

If you’re wondering what you should look at first to see how Debug and Release code differ, and want to get some practice at beating the optimiser, I’d suggest starting with something straightforward like adding a few numbers together.

Lastly, but by no means leastly, thanks to Rich, Ted, and Bruce for their input and proof reading; and Bruce for supplying me with the tip that made this post possible.

#AltDevBlog

Alex Darby
Follow @darbotron
