C/C++ Low Level Curriculum Part 11: Inheritance


Hello, and welcome to the 11th part of the C / C++ low level curriculum. About time? Definitely!

Last time we looked at the basics of User Defined Types: how structs, classes, and unions are laid out in memory; and (some of) the implications of memory alignment on this picture.

In part 11 we’re going to look at how inheritance affects this picture, in particular the implications for memory layout of derived types and also for their behaviour during construction and destruction (note: we’re leaving multiple inheritance and the keyword virtual out of this picture to start with).

Before We Begin

I will assume that you have already read the previous posts in the series, but I will also put in-line links to any important terms or concepts that you might need to know about to make sense of what you’re reading. I’m helpful like that.

Another big assumption I’m going to make is that you’re already very familiar with the C++ language and comfortable using the language features we’re discussing, as well as the accepted usage limitations of those features etc. If I need to demonstrate anything out of the ordinary I’ll explain it – or at least link to an explanation.

In this series I discuss what happens with vanilla unoptimised win32 debug code generated by the VS 2010 compiler. Whilst the specifics will differ on other platforms (and probably with other compilers), the general sweep of the code should be basically the same, because it’s assembly that has been generated by a C++ compiler. Following the examples given here with a source / disassembly debugger on your platform of choice should therefore provide you with the same insights we get here.

With this in mind, in case you missed them, here are the backlinks to the previous posts in the series:

I won’t lie – it’s not light reading :)

Class vs. Struct: a Gentle Reminder

The C++ keywords struct and class define types that are identical in implementational detail and what you can do with them (the only difference being at the language level: the default access specifier if none is specified is private for class, and public for struct).

So, whilst I will be using the keyword class throughout this article please take it as read that anything we talk about here applies equally to types defined using the keyword struct.
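
If you’d like to convince yourself of this, here is a minimal sketch. It is not part of the projects for this post: SPair, CPair and the helper functions are names made up purely for illustration, and the size comparison reflects what mainstream compilers do rather than anything I’m quoting from the standard.

```cpp
#include <cassert>

// Hypothetical types for this sketch; they are not in the article's projects.
struct SPair
{
    int _iA;    // public, because struct defaults to public access
    int _iB;
};

class CPair
{
public:         // without this line both members would be private
    int _iA;
    int _iB;
};

// The same function body works for either type, since the only
// difference between them is the default access specifier.
template <typename TPair>
int SumPair(const TPair& rPair)
{
    return rPair._iA + rPair._iB;
}

// Compares the sizes of the two spellings of the same type.
bool StructAndClassHaveSameSize()
{
    return sizeof(SPair) == sizeof(CPair);
}

// Fills one object of each type with the same values and sums them.
int SumBothSpellings(int iA, int iB)
{
    SPair sPair = { iA, iB };
    CPair cPair = { iA, iB };
    assert(SumPair(sPair) == SumPair(cPair));
    return SumPair(sPair);
}
```

Remove the public: line from CPair and SumPair will no longer compile for it, which is the whole of the difference between the two keywords.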

What happens when we derive from another type?

So, what does happen when you derive a user defined type from another non built-in type?

Clearly the data members you specify in the declarations have to go somewhere, and so do all those specified in the type(s) you are deriving from.

At the level of C++ there is nothing other than the standard to tell you how this works – and nothing other than looking at what happens with the code generated by the compiler you are using will tell you for definite.

As in the last post, we will be relying heavily on the frankly awesome secret 007 compiler flag /d1reportSingleClassLayout in order to tell us exactly how the (Visual Studio 2010 win32 x86) compiler has decided to lay our example structures out in memory.

It’s about time to look at some example code, so, rather than have you go through the usual rigmarole of setting up your project I have kindly set one up for you.

The zip file in this link contains a VS2010 solution with a single project and .cpp file lovingly set up to run the code shown below, which is in 00_Inheritance.cpp

  class CTestBase
  {
  public:
      int _iA;
      int _iB;
  };

  class CTestDerived
  : public CTestBase
  {
  public:
      int _iC;
      int _iD;
  };

  int main(int argc, char* argv[])
  {
      return 0;
  }

When you compile this project you should get the following in your “Build” output window (the magic of /d1reportSingleClassLayout!) :

  1> class CTestBase size(8):
  1>   +---
  1> 0 | _iA
  1> 4 | _iB
  1>   +---
  1>
  1> class CTestDerived size(16):
  1>    +---
  1>    | +--- (base class CTestBase)
  1>  0 | | _iA
  1>  4 | | _iB
  1>    | +---
  1>  8 | _iC
  1> 12 | _iD
  1>    +---

Looking at this, it should be fairly obvious that the data members of CTestDerived have just been concatenated onto the end of the memory layout of CTestBase - and, more importantly, that the memory layout of CTestBase within CTestDerived is identical to that when it’s not a base class.

It’s that simple! (for certain definitions of ‘it’ and ‘simple’…)
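
If you’d rather check the layout from inside the program than read it off the build output, here is a minimal sketch of one way to do it. It is not part of the zipped project: OffsetOf and DerivedMembersFollowBase are hypothetical helpers written for this illustration, and the checks assume (as is the case on mainstream compilers) that there is no padding between the int members.

```cpp
#include <cassert>
#include <cstddef>

class CTestBase
{
public:
    int _iA;
    int _iB;
};

class CTestDerived
: public CTestBase
{
public:
    int _iC;
    int _iD;
};

// Hypothetical helper: byte distance from the start of an object
// to one of its members, measured on a live instance.
template <typename TObject, typename TMember>
std::size_t OffsetOf(const TObject& rObject, const TMember& rMember)
{
    const char* pObject = reinterpret_cast<const char*>(&rObject);
    const char* pMember = reinterpret_cast<const char*>(&rMember);
    assert(pMember >= pObject);
    return static_cast<std::size_t>(pMember - pObject);
}

// Returns true if the base members keep their offsets and the derived
// members start exactly where a standalone CTestBase would end.
bool DerivedMembersFollowBase()
{
    CTestDerived cDerived = CTestDerived();
    return OffsetOf(cDerived, cDerived._iA) == 0
        && OffsetOf(cDerived, cDerived._iB) == sizeof(int)
        && OffsetOf(cDerived, cDerived._iC) == sizeof(CTestBase)
        && OffsetOf(cDerived, cDerived._iD) == sizeof(CTestBase) + sizeof(int);
}
```

With the layout reported above I would expect DerivedMembersFollowBase() to return true, but treat it as a check on your own compiler rather than a guarantee.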

Armed with this information from the last post:

Stack Overflow for more detail of the wording).”

it is obvious that – since CTestDerived inherits all of the members of CTestBase - its members must appear after those of CTestBase in memory.

I remember when I had this explained to me - not long after having started my first job in the industry as a fresh faced graduate – I did the internal equivalent of a double-take, because the information I had just received was so bleedingly obvious that I couldn’t believe I’d ever not known it.

If it’s that easy, why post about it?

Good question!

The fact that the memory layout of a type is identical in all situations is required by the standard – and also by logic – let’s see why…

First, download and open the second zipped VS2010 project file - this contains the code below in 01_InheritanceWithFunctions.cpp:

  class CTestBase
  {
  public:
      int _iA;
      int _iB;

      CTestBase( int iA, int iB )
      : _iA( iA )
      , _iB( iB )
      {}

      int SumBase( void )
      {
          return _iA + _iB;
      }
  };

  class CTestDerived
  : public CTestBase
  {
  public:
      int _iC;
      int _iD;

      CTestDerived( int iA, int iB, int iC, int iD )
      : CTestBase ( iA, iB )
      , _iC ( iC )
      , _iD ( iD )
      {}

      int SumDerived( void )
      {
          return _iA + _iB + _iC + _iD;
      }
  };

  int main(int argc, char* argv[])
  {
      CTestBase       cTestBase   ( argc, argc + 1 );
      CTestDerived    cTestDerived( argc, argc + 1, argc + 2, argc + 3 );

      return cTestBase.SumBase() + cTestDerived.SumBase() + cTestDerived.SumDerived();
  }

Put a breakpoint on the return statement from main, and then compile and run the release build configuration.

The first thing to note is that the memory layouts printed to the output window during the build are unaffected by the addition of these functions.

This is what you would expect, as we know that non-virtual member function calls are resolved at compile time just like regular non-member and static member functions.
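
To see the same thing without the build output, here is a small sketch. CDataOnly, CDataWithFunctions and the helper functions are hypothetical names made up for this illustration, and the size comparison describes what mainstream compilers do for non-virtual functions.

```cpp
#include <cassert>

// Hypothetical types for this sketch: the same data, with and without functions.
class CDataOnly
{
public:
    int _iA;
    int _iB;
};

class CDataWithFunctions
{
public:
    int _iA;
    int _iB;

    CDataWithFunctions( int iA, int iB )
    : _iA( iA )
    , _iB( iB )
    {}

    int Sum() const
    {
        return _iA + _iB;
    }
};

// A non-member function doing the same job as Sum(); the object it
// works on is passed explicitly rather than as a hidden 'this'.
int SumFree( const CDataWithFunctions& rData )
{
    return rData._iA + rData._iB;
}

// Non-virtual member functions live in the code, not in the object,
// so they should not change the size of an instance.
bool FunctionsAddNoSize()
{
    return sizeof( CDataOnly ) == sizeof( CDataWithFunctions );
}

// Calls the member and the non-member version on the same object.
int SumBothWays( int iA, int iB )
{
    CDataWithFunctions cData( iA, iB );
    assert( cData.Sum() == SumFree( cData ) );
    return cData.Sum();
}
```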

Since CTestDerived is derived from CTestBase, we know from our high level knowledge about C++ that we can call both of these functions on an instance of CTestDerived – what we’re looking at right now is how this is implemented.

When the breakpoint is hit, right click and choose “Go To Disassembly”.

I’ve pasted the part I’d like to discuss below…

(N.B. to get the same disassembly as this you should have the following Viewing Options checked in the disassembly window: ‘Show source code’, ‘Show line numbers’, ‘Show address’, and ‘Show symbol names’)

      44:     return cTestBase.SumBase() + cTestDerived.SumBase() + cTestDerived.SumDerived();
  0129109A  lea         ecx,[cTestDerived]
  0129109D  call        CTestDerived::SumDerived (1291060h)
  012910A2  lea         ecx,[cTestDerived]
  012910A5  mov         esi,eax
  012910A7  call        CTestBase::SumBase (1291020h)
  012910AC  lea         ecx,[cTestBase]
  012910AF  add         esi,eax
  012910B1  call        CTestBase::SumBase (1291020h)
  012910B6  pop         edi
  012910B7  add         eax,esi

We’ve previously covered that the win32 calling convention for member functions (‘thiscall’) passes this to member functions in the ecx register.

Correspondingly, you’ll notice that the addresses of cTestBase and cTestDerived are loaded into ecx using lea (‘load effective address’) immediately before their member functions are called.

Specifically, note that the address of cTestDerived is passed un-tampered with in ecx when calling the base class function CTestBase::SumBase. Remember this for later (and for the next post!).
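
The same observation can be made from C++ rather than from the disassembly. The sketch below is not from the zipped project (the two helper functions are hypothetical names), and it relies on the single, non-virtual inheritance we are limiting ourselves to in this post: converting a CTestDerived* to a CTestBase* should leave the address untouched.

```cpp
#include <cassert>

class CTestBase
{
public:
    int _iA;
    int _iB;

    int SumBase( void )
    {
        return _iA + _iB;
    }
};

class CTestDerived
: public CTestBase
{
public:
    int _iC;
    int _iD;
};

// Returns true if converting a CTestDerived* to a CTestBase*
// produces exactly the same address.
bool BasePointerEqualsDerivedPointer()
{
    CTestDerived cDerived = CTestDerived();
    CTestBase*   pAsBase  = &cDerived;   // implicit derived-to-base conversion

    const void* pDerivedAddress = &cDerived;
    const void* pBaseAddress    = pAsBase;
    return pDerivedAddress == pBaseAddress;
}

// Calls the base class function through a base pointer, which is
// effectively what the compiler arranges when it loads ecx above.
int SumBaseViaBasePointer( CTestDerived& rDerived )
{
    CTestBase* pAsBase = &rDerived;
    assert( static_cast<void*>( pAsBase ) == static_cast<void*>( &rDerived ) );
    return pAsBase->SumBase();
}
```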

So, let’s look at the disassembly for CTestBase::SumBase and CTestDerived::SumDerived - I tend to single step the disassembly and step into them, but putting breakpoints in them is more reliable :)

CTestBase::SumBase

      14:     int SumBase( void )
      15:     {
      16:         return _iA + _iB;
  01291020  mov         eax,dword ptr [ecx+4]
  01291023  add         eax,dword ptr [ecx]
      17:     }
  01291025  ret

CTestDerived::SumDerived

      33:     int SumDerived( void )
      34:     {
      35:         return _iA + _iB + _iC + _iD;
  01291060  mov         eax,dword ptr [ecx+0Ch]
  01291063  add         eax,dword ptr [ecx+8]
  01291066  add         eax,dword ptr [ecx+4]
  01291069  add         eax,dword ptr [ecx]
      36:     }
  0129106B  ret

We can see that all offsets from ecx used in both functions correspond to the memory layouts we have in the build output for the type that the function belongs to.

Since _iA and _iB are at the same offset within both CTestBase and CTestDerived (i.e. 0 and 4 bytes respectively), CTestBase::SumBase can safely be called on instances of CTestDerived.

We already know that this is possible from our high level understanding of C++, but now we know the implementational detail that makes it possible.

Whilst the specifics of the disassembly will probably differ from platform to platform, the principles underlying its operation should not.
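
To make that concrete, here is a hand-written sketch of roughly what the compiled functions are doing, with this passed as a plain address and the members read at fixed byte offsets. ReadIntAt, SumBaseByOffsets and SumDerivedByOffsets are hypothetical stand-ins rather than anything the compiler generates, and the offsets assume the layout printed in the build output (no padding between the int members).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class CTestBase
{
public:
    int _iA;
    int _iB;
};

class CTestDerived
: public CTestBase
{
public:
    int _iC;
    int _iD;
};

// Reads the int stored uByteOffset bytes past pThis, in the spirit of
// 'dword ptr [ecx+N]' in the disassembly.
int ReadIntAt( const void* pThis, std::size_t uByteOffset )
{
    assert( pThis != 0 );
    int iValue;
    std::memcpy( &iValue,
                 static_cast<const unsigned char*>( pThis ) + uByteOffset,
                 sizeof( iValue ) );
    return iValue;
}

// Stand-in for CTestBase::SumBase: offsets 0 and 4 when int is 4 bytes.
int SumBaseByOffsets( const void* pThis )
{
    return ReadIntAt( pThis, 0 )
         + ReadIntAt( pThis, sizeof( int ) );
}

// Stand-in for CTestDerived::SumDerived: offsets 0, 4, 8 and 12 (0Ch).
int SumDerivedByOffsets( const void* pThis )
{
    return ReadIntAt( pThis, 0 )
         + ReadIntAt( pThis, sizeof( int ) )
         + ReadIntAt( pThis, 2 * sizeof( int ) )
         + ReadIntAt( pThis, 3 * sizeof( int ) );
}
```

Note that SumBaseByOffsets neither knows nor cares whether the address it is given belongs to a CTestBase or a CTestDerived; it only needs _iA and _iB to be where it expects them.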

Summary

To summarise what we’ve established so far:

1) in member functions, member data of a class is accessed via specific offsets from the this pointer

2) these offsets are constants at compile time and are baked into the assembly code for the member functions

3) this means that the memory layout of the members of a given class must always be identical or the member functions won’t work

If we follow this logic through, we can see that:

4) the memory layout of a class B that inherits from another class A must contain class A’s members in the same memory layout as class A

5) the memory layout of any given class A is identical regardless of whether it is an instance of A, or it is included in the memory of some type derived from A.

6) note: this behaviour is required by the standard, and (more significantly) by logic.

Finally, it follows that (because each member of a struct must have a higher address than those declared before it):

7) the extra memory required by derived class B will be concatenated onto the end of the memory layout of its base class A
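
As a rough way of poking at points 4 and 5, the sketch below compares the bytes of a standalone CTestBase with the first sizeof(CTestBase) bytes of a CTestDerived holding the same values. BasePartMatchesStandaloneBase is a hypothetical helper, and the comparison assumes there are no padding bytes inside CTestBase (true for two ints on mainstream compilers).

```cpp
#include <cassert>
#include <cstring>

class CTestBase
{
public:
    int _iA;
    int _iB;
};

class CTestDerived
: public CTestBase
{
public:
    int _iC;
    int _iD;
};

// Returns true if a standalone CTestBase and the CTestBase part at the
// start of a CTestDerived are byte-for-byte identical when they hold
// the same values.
bool BasePartMatchesStandaloneBase( int iA, int iB )
{
    CTestBase cBase = CTestBase();
    cBase._iA = iA;
    cBase._iB = iB;

    CTestDerived cDerived = CTestDerived();
    cDerived._iA = iA;
    cDerived._iB = iB;
    cDerived._iC = -1;   // arbitrary values; these bytes sit after the
    cDerived._iD = -2;   // base part and are not compared

    assert( sizeof( CTestDerived ) >= sizeof( CTestBase ) );
    return std::memcmp( &cBase, &cDerived, sizeof( CTestBase ) ) == 0;
}
```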

That’s all for now – next time we’ll look at how multiple inheritance affects this picture.

I know it’s pretty short, but this just means the next one will get here more quickly :)

Epilogue – for those who wondered what I changed in the project settings

There are quite a few changes to the default VS2010 win32 console app project properties in the projects I’ve zipped up for this post.

The changes are there to make the optimised release build configuration leave the code structure alone (i.e. not strip out or ‘fold’ functions to save exe size, and not inline functions), and to prevent extraneous ‘debug checking’ code being inserted (which makes function calls slower, and the code less easy to follow in disassembly):

turning off ‘Whole Program Optimisation’ (Configuration Properties -> General)
turning off ‘Inline Function Expansion’ (Configuration Properties -> C/C++ -> Optimisation)
turning off ‘Basic Runtime Checks’ (Configuration Properties -> C/C++ -> Code Generation)
getting rid of pre-compiled headers to streamline the number of files (Configuration Properties -> C/C++ -> Precompiled Headers)
turning off ‘Enable COMDAT folding’ (Configuration Properties -> Linker -> Optimization)

Essentially, this makes the Release configuration assembly have the same structure as the Debug one with respect to function calls.

Also, I use the argc parameter to main as input to the code, and return a value computed from it, so that the optimiser can’t assume constant input or output values.

If you use constant inputs, or don’t output a value computed from the inputs, then it’s pretty hard to convince the optimiser not to optimise the entire .exe to ‘return 0;’… ;)
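
Here is a small sketch of the difference. The function names are hypothetical, and what the optimiser actually does with ResultFromConstants is an assumption on my part that will depend on your compiler and settings.

```cpp
#include <cassert>

// The work we want to be able to see in the disassembly.
int SumOfFour( int iFirst )
{
    return iFirst + ( iFirst + 1 ) + ( iFirst + 2 ) + ( iFirst + 3 );
}

// Everything here is known at compile time, so an optimiser is free to
// replace the whole call with a constant.
int ResultFromConstants()
{
    return SumOfFour( 1 );
}

// iArgc stands in for main's argc: its value is only known at run time,
// so the arithmetic has to survive into the generated code.
int ResultFromRuntimeInput( int iArgc )
{
    assert( iArgc >= 0 );   // argc is never negative
    return SumOfFour( iArgc );
}
```

In a real project main would simply be ‘return ResultFromRuntimeInput( argc );’, which is the pattern used in the code for this post.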

Shout out

Thanks (again) to Bruce – king (or at the very least duke) of advice and peer review.

#AltDevBlog

Alex Darby