void main()
{
    int* p = 0;
    *p = 4919;
}

*CRASH*

First-chance exception at 0x00161015 in test.exe: 0xC0000005: Access violation writing location 0x00000000.
Unhandled exception at 0x00161015 in test.exe: 0xC0000005: Access violation writing location 0x00000000.

Have you ever stopped to wonder what those 2 lines mean?

  • What is a “First-chance exception?”
  • What is an “Access violation?”
  • What is an “Unhandled exception?”
  • What are those 3 hex values?

Exceptions

To answer the “First-chance exception” and “Unhandled exception” questions, I’ll point you to a nice little article written by David Kline that happens to be at the top of the Google search hits. For the lazy: when your program encounters an “exceptional” circumstance, an exception is raised and the debugger gets the first chance to look at it. That notification is the “first-chance” exception. This allows your debugger to do something (e.g. stop execution) before the exception is passed off to any exception handling routine you have set up (for example, try/catch blocks). Breaking on a first-chance exception is disabled by default but can be turned on in Visual Studio (2005/2008) through Debug->Exceptions… by checking all the “Thrown” boxes. If the debugger decides to do nothing with the exception, the program gets to handle it. If the program is unable to handle the exception, the debugger is notified a second time, now of an “unhandled” exception, and your program will stop.
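To make the first-chance/unhandled distinction concrete, here is a minimal sketch using ordinary C++ exceptions. The helper names parse_digit and parse_digit_or are made up for this illustration. Run under the Visual Studio debugger, the throw should show up as a first-chance exception, and because the catch block handles it, no “Unhandled exception” line should follow.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts the first character of `text` to a digit
// and throws if there is nothing to convert.
int parse_digit(const std::string& text)
{
    if (text.empty())
        throw std::invalid_argument("empty input");  // reported as first-chance
    return text[0] - '0';
}

// The catch block handles the exception, so it never becomes unhandled.
int parse_digit_or(const std::string& text, int fallback)
{
    try {
        return parse_digit(text);
    } catch (const std::invalid_argument&) {
        return fallback;
    }
}

// Usage: call this from main() to exercise both paths.
void check_parse_digit()
{
    assert(parse_digit_or("7", -1) == 7);   // nothing is thrown
    assert(parse_digit_or("", -1) == -1);   // thrown, then handled
}
```

Remove the try/catch and let the exception escape main, and the same throw would be reported twice: first-chance, then unhandled.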

Violations

An “Access violation” is raised when the processor cannot complete a memory access. Meaning, you just tried to read/write memory that does not exist (nothing is mapped at that address), or you tried to read/write memory that the operating system has decided you may not touch.

HEX?

Finally we get to the strangest and, in my humble opinion, the most enlightening question of all: what do those hex values mean? Let’s enter hex’s favorite playland: assembly. (Note: to see this for yourself, right-click on your crashed cpp file and select “Go To Disassembly”.)

void main()
{
    int* p = 0;
0016100B  mov         dword ptr [p],0
    *p = 4919;
00161012  mov         eax,dword ptr [p]
00161015  mov         dword ptr [eax],1337h
}

Take a look at the line you expect to crash. Notice anything about the numbers on the left around it compared to our 2 exception messages?

That’s right! The last instruction in the listing sits at 00161015, the same number as the 0x00161015 from the exception messages! The debugger is telling us that we crashed attempting to execute that line of assembly. Now that we know where it crashed, let’s try to figure out why we got an access violation.

mov

Let’s take a quick look at line 00161015 and the line above it. On 00161012 we are setting the value of register eax to the value of our pointer. Most of the work done in a processor (like add, multiply, etc…) is done on registers. Normally values are moved from variables into registers, then some operation is performed, and finally the result of that operation is written back out to the variable so the processor can free up the register to do more work. On the line we crashed on, we are attempting to write a literal value of 1337h (aka 4919 in decimal) to the location pointed to by eax. “[eax]” means the memory at the address held in eax. Since eax holds 0, we are attempting to write data to location 0x00000000, and that is why we get “Access violation writing location 0x00000000”.
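If the jump between 4919 and 1337h feels like magic, it is just base conversion: 1337h = 1*4096 + 3*256 + 3*16 + 7 = 4919. The sketch below formats a number the way the disassembly view and the exception message print it. The helper names are hypothetical, not anything Visual Studio provides.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats a value like an immediate in the disassembly view:
// uppercase hex digits followed by an 'h' suffix.
std::string to_disassembly_hex(unsigned value)
{
    char buffer[16];
    std::snprintf(buffer, sizeof(buffer), "%Xh", value);
    return std::string(buffer);
}

// Formats a value like an address in the exception message:
// "0x" followed by eight zero-padded hex digits.
std::string to_address_hex(unsigned value)
{
    char buffer[16];
    std::snprintf(buffer, sizeof(buffer), "0x%08X", value);
    return std::string(buffer);
}

// Usage: call this from main() to confirm the conversions.
void check_hex()
{
    assert(1 * 4096 + 3 * 256 + 3 * 16 + 7 == 4919);
    assert(to_disassembly_hex(4919) == "1337h");
    assert(to_address_hex(0) == "0x00000000");
}
```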

That’s a lot of stuff that I just threw at you. Let’s go through some more examples to see how helpful understanding these 2 lines can be!

Structures

struct test
{
    int foo;
    int bar;
};

void main()
{
    test* t = 0;
    t->bar = 4919;
}

*CRASH*

First-chance exception at 0x00101015 in test.exe: 0xC0000005: Access violation writing location 0x00000004.
Unhandled exception at 0x00101015 in test.exe: 0xC0000005: Access violation writing location 0x00000004.

void main()
{
    test* t = 0;
0010100B  mov         dword ptr [t],0
    t->bar = 4919;
00101012  mov         eax,dword ptr [t]
00101015  mov         dword ptr [eax+4],1337h
}

Once again we are dereferencing a NULL pointer and once again our program is crashing, but this time we are trying to set a value in a structure. So what’s changed between our exceptions? Obviously the address of the assembly line we crashed on has changed. What else?
Do you see some 4s showing up in some places?

  • …writing location 0x00000004
  • dword ptr [eax+4],1337h

Where is that coming from? Well, we are trying to access the variable bar in our test structure. foo and bar are ints, and sizeof(int) in 32-bit programs is 4 bytes (32 bits). So our program is taking the address of our structure in memory and offsetting that address to get to our variable inside the structure: 0x00000000 + sizeof(int) = 0x00000004. As you can see from the “[eax+4]” in the assembly instruction that we crashed on, we are attempting to assign some data to the variable in our structure, which is at (pointer to our structure) + (offset to our variable in the structure): 0 + 4. Cool.
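The “pointer + offset” arithmetic can be checked without crashing anything. This is a small sketch, assuming the same two-int layout as the example above. predicted_bar_address is a hypothetical helper that only does the addition the processor would do, it never dereferences anything.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct test
{
    int foo;
    int bar;
};

// Address the processor would touch for p->bar:
// the pointer value plus the offset of bar inside the structure.
std::uintptr_t predicted_bar_address(std::uintptr_t base)
{
    return base + offsetof(test, bar);
}

// Usage: call this from main() to confirm the layout.
void check_offsets()
{
    assert(offsetof(test, foo) == 0);            // first member starts at 0
    assert(offsetof(test, bar) == sizeof(int));  // bar sits right after foo
    assert(predicted_bar_address(0) == sizeof(int));
}
```

With a 4-byte int, a null base gives 0 + 4, which is exactly the 0x00000004 in the exception message.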

Basics Complete. Now onto struct/class functions.

#include <cstdio>

struct test
{
    void print()
    {
        printf("hello!\n");
    }
};

void main()
{
    test* p = 0;
    p->print();
}

Now what do you expect to happen? A crash right? …

hello!

Wait, what!?

Let’s take a look at the assembly.

struct test
{
    void print()
    {
        printf("hello!\n");
012D103E  push        offset string "hello!\n" (12E81CCh)
012D1043  call        printf (12D11F7h)
    }
};

void main()
{
    test* p = 0;
012D100B  mov         dword ptr [p],0
    p->print();
012D1012  mov         ecx,dword ptr [p]
012D1015  call        test::print (12D1030h)
}

Take a close look at ecx, follow its use. Did you notice ecx is only ever set to 0? We never use ecx anywhere! Nowhere in our generated code do we have “[ecx]”!! Let that sink in a bit. Try to understand that the compiler did everything right; it did everything you asked it to.

Ever wonder why sometimes your code crashes multiple levels deep in functions in your class when the real problem was that your pointer was NULL? Here is the reason why. The compiler has no reason, whatsoever, to touch any data in your class unless you tell it to.
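One way to picture this: a non-virtual member function behaves roughly like a plain function that takes the object pointer as an extra argument. The sketch below models that with a made-up free function, print_model. It is only a mental model, not what the compiler literally emits. Keep in mind that the C++ standard calls a member function call through a null pointer undefined behavior, so the “hello!” above is simply what this compiler happened to generate.

```cpp
#include <cassert>
#include <cstdio>

struct test
{
};

// Rough model of test::print(): the object pointer arrives as an ordinary
// parameter (the listings above pass it in ecx).
int print_model(test* self)
{
    (void)self;  // never dereferenced, so a null value goes unnoticed
    return std::printf("hello!\n");  // number of characters written
}

// Usage: call this from main(); it prints hello! even with a null pointer.
void check_print_model()
{
    assert(print_model(nullptr) == 7);  // "hello!\n" is 7 characters
}
```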

Let’s cement this

#include <cstdio>

struct test
{
    int foo;
    int bar;
    void crash()
    {
        printf("hello!\n");
        bar = 4919;
    }
};

void main()
{
    test* p = 0;
    p->crash();
}

hello!
First-chance exception at 0x0003104e in test.exe: 0xC0000005: Access violation writing location 0x00000004.
Unhandled exception at 0x0003104e in test.exe: 0xC0000005: Access violation writing location 0x00000004.

#include <cstdio>

struct test
{
    int foo;
    int bar;
    void crash()
    {
        printf("hello!\n");
0003103E  push        offset string "hello!\n" (481CCh)
00031043  call        printf (31201h)
        bar = 4919;
0003104B  mov         eax,dword ptr [this]
0003104E  mov         dword ptr [eax+4],1337h
    }
};

void main()
{
    test* p = 0;
0003100B  mov         dword ptr [p],0
    p->crash();
00031012  mov         ecx,dword ptr [p]
00031015  call        test::crash (31030h)
}

Now, a little “gotcha” before I continue. You may notice that we are using this instead of ecx on line 0003104B. Time for me to be honest: I’m hiding some assembly from you. My goal in this article is not to teach you assembly but to help you understand how and why programs crash and, more importantly, what you can glean from crashes to aid you in debugging and bug-fixing! Here, this is Visual Studio’s name for the local slot that holds our struct/class pointer. The pointer arrives in ecx, and the function setup code I hid copies it into that slot.

*WHEW*
Hopefully, from the previous examples you now understand 2 things:

  • Why “hello!\n” is printed
  • Why we crashed on 0003104E
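Since the crash only happens at the first real memory access, a null check right where the pointer is used keeps the failure close to its cause. This is a minimal sketch of that idea. try_set_bar is a hypothetical helper, not something from the example above.

```cpp
#include <cassert>

struct test
{
    int foo;
    int bar;
};

// Hypothetical guarded write: refuses to touch the object when p is null.
bool try_set_bar(test* p, int value)
{
    if (p == nullptr)
        return false;   // report the bad pointer where it is first seen
    p->bar = value;     // the write that faulted at [eax+4] in the listing
    return true;
}

// Usage: call this from main() to exercise both paths.
void check_try_set_bar()
{
    test object = { 0, 0 };
    assert(try_set_bar(&object, 4919) && object.bar == 4919);
    assert(!try_set_bar(nullptr, 4919));
}
```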

VIRTUAL FUNCTIONS

#include <cstdio>

struct test
{
    virtual void crash()
    {
        printf("hello!\n");
    }
};

void main()
{
    test* p = 0;
    p->crash();
}

Now what should happen? You may be surprised by the answer…

First-chance exception at 0x00261016 in test.exe: 0xC0000005: Access violation reading location 0x00000000.
Unhandled exception at 0x00261016 in test.exe: 0xC0000005: Access violation reading location 0x00000000.

Some of you might be thinking: “So wait, if I don’t make that function virtual then it doesn’t crash? But when I make it virtual it crashes!? WTF is going on? It’s not like I’m touching any data inside of my structure!”

“It’s not like I’m touching any data inside of my structure!” I’m sorry to break it to you good sir/ma’am, but you are touching data. To answer why that’s the case, let’s take a look at the assembly!

//... removed for clarity
{
    test* p = 0;
0026100C  mov         dword ptr [p],0
    p->crash();
00261013  mov         eax,dword ptr [p]
00261016  mov         edx,dword ptr [eax]
00261018  mov         esi,esp
0026101A  mov         ecx,dword ptr [p]
0026101D  mov         eax,dword ptr [edx]
0026101F  call        eax
}

Our exception says: “Access violation reading location 0x00000000”. Notice the difference? We are reading data here instead of writing data. The instruction at 00261016 is mov edx,dword ptr [eax]. Simply put, the value of eax is 0 and that is not a valid location to read from. But how did eax get set to 0? Take a look at the line above! eax is 0 because our pointer p is zero. Why are we trying to read from the address in eax? Because we are calling a virtual function. Whenever a struct/class has virtual functions, the compiler must do 2 things. First: it must make room in your struct/class for a pointer to a virtual function table, or vftbl for short. Second: it must generate code to figure out which function to call at runtime. So the line we crashed on is just doing the first bit of work to figure out which function to call. It has to read the first variable of our structure (because that is where the vftbl pointer is stored) to figure that out.
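The lookup can be written out by hand to see where the read happens. The sketch below is a hand-rolled model, assuming a single virtual function and a table pointer stored as the first member. The names object, virtual_fn, and call_crash are made up for the illustration, and real compilers generate this machinery for you.

```cpp
#include <cassert>

struct object;  // forward declaration so the function type can mention it

// One table slot per virtual function; this model has a single slot.
using virtual_fn = int (*)(object* self);

struct object
{
    const virtual_fn* vftbl;  // the hidden first member a compiler would add
    int foo;
};

int crash_impl(object* self)
{
    (void)self;  // the body itself never touches the object
    return 4919;
}

const virtual_fn object_vftbl[] = { crash_impl };

object make_object()
{
    return object{ object_vftbl, 0 };
}

// Hand-written equivalent of p->crash() for a virtual function.
int call_crash(object* p)
{
    const virtual_fn* table = p->vftbl;  // mov edx,[eax]: reads *p, faults if p is null
    virtual_fn target = table[0];        // mov eax,[edx]: first slot of the table
    return target(p);                    // call eax, with p passed along as this
}

// Usage: call this from main() with a valid object.
void check_call_crash()
{
    object valid = make_object();
    assert(call_crash(&valid) == 4919);
}
```

The very first line of call_crash is the one that reads through the pointer, which is why the virtual call faults before the function body is ever reached.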

POP QUIZ

What do you expect the code below to do? (Hint: the code will crash, try to guess what line it will crash on and what location the Access Violation will occur at)

#include <cstdio>

struct test
{
    int foo, bar;
    void print()
    {
        printf("print\n");
    }
    void touch()
    {
        printf("touch\n");
        bar = 4919;
    }
    virtual void crash()
    {
        printf("crash\n");
    }
};

void main()
{
    test* p = 0;
    p->print();
    p->touch();
    p->crash();
}

#include <cstdio>

struct test
{
    int foo, bar;
    void print()
    {
        printf("print\n");
0035105E  push        offset string "print\n" (3681CCh)
00351063  call        printf (351251h)
    }
    void touch()
    {
        printf("touch\n");
0035108E  push        offset string "touch\n" (3681D4h)
00351093  call        printf (351251h)
        bar = 4919;
0035109B  mov         eax,dword ptr [this]
0035109E  mov         dword ptr [eax+8],1337h
    }
    virtual void crash()
    {
        printf("crash\n");
    }
};

void main()
{
    test* p = 0;
0035100C  mov         dword ptr [p],0
    p->print();
00351013  mov         ecx,dword ptr [p]
00351016  call        test::print (351050h)
    p->touch();
0035101B  mov         ecx,dword ptr [p]
0035101E  call        test::touch (351080h)
    p->crash();
00351023  mov         eax,dword ptr [p]
00351026  mov         edx,dword ptr [eax]
00351028  mov         esi,esp
0035102A  mov         ecx,dword ptr [p]
0035102D  mov         eax,dword ptr [edx]
0035102F  call        eax
}

print
touch
First-chance exception at 0x0035109e in test.exe: 0xC0000005: Access violation writing location 0x00000008.
Unhandled exception at 0x0035109e in test.exe: 0xC0000005: Access violation writing location 0x00000008.

We crash on 0x0035109e because that is the first place in our program’s execution that touches invalid memory. Here’s the ordering laid out:

0035100C  mov         dword ptr [p],0 ;p = 0
00351013  mov         ecx,dword ptr [p] ;store p (in case the function call needs a this pointer)
00351016  call        test::print (351050h)
;enter test::print
0035105E  push        offset string "print\n" (3681CCh)
00351063  call        printf (351251h)
;leave test::print and return to main
0035101B  mov         ecx,dword ptr [p] ;store p yet again (code was compiled without optimizations)
0035101E  call        test::touch (351080h)
;enter test::touch
0035108E  push        offset string "touch\n" (3681D4h)
00351093  call        printf (351251h)
0035109B  mov         eax,dword ptr [this] ;move ecx (aka this) into eax
0035109E  mov         dword ptr [eax+8],1337h ;*CRASH* writing to [0 + 8] = 0x00000008.

Why did the compiler output [eax+8]? Because bar now sits behind both the vftbl pointer and foo: sizeof(vftbl pointer) + sizeof(test::foo) = 4 + 4 = 8.

If you wish to see the value of 8 for yourself you can use offsetof.
Also in the Visual Studio watch window you could write:

&((test*)0)->bar
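If you would rather check the offset from code, here is a small sketch that measures it on a real object, so nothing null is ever dereferenced. It assumes the table pointer is stored ahead of the data members, which is what the listing above shows. member_offset, foo_offset, and bar_offset are hypothetical helpers. In a 32-bit build like the one in this article the result should be 8. A 64-bit build will report a larger number because pointers are wider there.

```cpp
#include <cassert>
#include <cstddef>

struct test
{
    int foo, bar;
    virtual void crash() {}
};

// Byte distance from the start of an object to one of its members.
std::size_t member_offset(const test& object, const int& member)
{
    const char* base = reinterpret_cast<const char*>(&object);
    const char* field = reinterpret_cast<const char*>(&member);
    return static_cast<std::size_t>(field - base);
}

std::size_t foo_offset()
{
    test object;  // only addresses are taken, the members are never read
    return member_offset(object, object.foo);
}

std::size_t bar_offset()
{
    test object;
    return member_offset(object, object.bar);
}

// Usage: call this from main() to confirm the layout.
void check_virtual_layout()
{
    assert(foo_offset() >= sizeof(void*));             // room for the table pointer
    assert(bar_offset() == foo_offset() + sizeof(int)); // bar follows foo
}
```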

So there ya go. A whole lot of null dereferences, a whole lot of assembly and, hopefully, a better understanding of what happens when your computer crashes and what initial steps you can take to understand and solve the problem. Happy Debugging!

----- UPDATE September 28, 2011 -----
Stefan Reinalter posted a link to an excellent article by Elan Ruskin that goes much, much more in-depth into this stuff. I highly recommend working through the “forensic debugging” slide-show!

http://assemblyrequired.crashworks.org/2011/03/08/annotated-slides-for-gdc11-forensic-debugging/

http://assemblyrequired.crashworks.org/gdc-2011-crash-analysis-and-forensic-debugging/