I have never written a blog or tweeted. For the most part I keep to myself and my close friends who exist beyond the Internet. However, I am trying to get more involved with the game development community and other software developers in general. That’s the reason I decided to join this blog when Mike started it.
When I came to Insomniac I was a typical naive programmer. I had taken a couple of classes that focused on writing code for embedded systems and taught the basics of assembly, but most of my classes, and my experience, had focused on high-level paradigms like object-oriented programming. As I have switched over into the world of writing engine code, I have been forced to think more about performance than I had to at previous jobs. Fortunately some people I work with are very passionate, and somewhat vocal, on the subject and have been more than willing to share their philosophies with me. I didn’t realize how much my views had changed until a conversation I had with another programmer this week. As part of my efforts to get involved in communities I also joined a couple of IRC channels that focus on programming. In one channel someone came in and asked a basic question: how do you tell whether a number is odd or even? This is a question that anyone new to programming will face at some point. Immediately the channel expert replied with:
if( x % 2 ) { odd } else { even }
Again, this is a pretty common answer. Heck, if you do a Google search for “c number odd even” the first result is answering it for C#, but the second result goes to a page with this piece of code:
#include <iostream>
using namespace std;

int main(){
    int num;
    cin >> num;
    // num % 2 computes the remainder when num is divided by 2
    if ( num % 2 == 0 )
    {
        cout << num << " is even ";
    }
    return 0;
}

From an academic point of view there is nothing inherently wrong with this code… except that it’s inefficient. At this point I decided to get into the conversation, so I suggested that even though performance probably was not his top priority, he should consider it, and for such a simple problem use the bitwise solution “x & 1” instead. The resident expert pointed out that some compilers will optimize the modulus down to a bitwise and, so both solutions end up exactly the same; in fact he ran both through gcc -O2 and they came out to the same number of instructions. However, that’s not really the point I was trying to make. If you know exactly what your compiler is doing then it’s fine to make those kinds of choices. Chances are, though, that the person who asked the original question has no idea what his compiler is doing, so he should be aware that while two lines of code may give the exact same output, they may not be doing the same thing. To demonstrate, I put both operations through the snc compiler and got the following:

x % 2
00025C 34010082 lqd r002,0x0040(r001)
000260 04000103 lr r003,r002 05 (0000025C) REG
000264 4CFFC182 cgti r002,r003,-0x0001 EVN
000268 09208102 nor r002,r002,r002 01 (00000264) REG
00026C 48208183 xor r003,r003,r002 01 (00000268) REG EVN
000270 0800C103 sf r003,r002,r003 01 (0000026C) REG
000274 14004183 andi r003,r003,0x0001 01 (00000270) REG EVN
000278 48208183 xor r003,r003,r002 01 (00000274) REG
00027C 0800C104 sf r004,r002,r003 01 (00000278) REG EVN
000280 3400C082 lqd r002,0x0030(r001) ODD
000284 3EC00083 cwd r003,0x00(r001)
000288 B0408203 shufb r002,r004,r002,r003 04 (00000280) REG ODD
00028C 2400C082 stqd r002,0x0030(r001) 03 (00000288) REG

x & 1
000290 34010082 lqd r002,0x0040(r001) ODD
000294 14004104 andi r004,r002,0x0001 05 (00000290) REG EVN
000298 34008082 lqd r002,0x0020(r001) ODD
00029C 3EC00083 cwd r003,0x00(r001)
0002A0 B0408203 shufb r002,r004,r002,r003 04 (00000298) REG ODD
0002A4 24008082 stqd r002,0x0020(r001) 03 (000002A0) REG
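As you can see, on my platform the modulus takes 13 instructions while a bitwise and does it in only 6. The best part of my conversation came next, when the channel expert argued that the bitwise version is not portable, since it would not work on a machine that uses 1’s complement.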
I don’t always shy from confrontation, but this was a battle I didn’t feel like fighting right then, so I conceded. Yes, if you are programming for a machine that has been around for decades and you know it uses 1’s complement then you should use a modulus. But in reality, that is the most ridiculous argument ever. In the name of coding for “portability” you are going to shy away from using an operation that is potentially more than twice as fast because you want to make sure your code runs on an outdated architecture? Can anyone tell me an architecture that actually uses 1’s complement? I honestly don’t know of any. In fact, I told a couple of co-workers about this exchange and they kept saying “don’t you mean 2’s complement?” Understand that these are not people who are new to programming; some of them are very senior people whose level of knowledge I aspire to reach. Yet, while they understood what 1’s complement is when I explained it to them, they had never used an architecture that uses 1’s complement for signed number representation.
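To make the 1’s complement objection concrete, here is a minimal sketch that models 8-bit encodings by hand, since real 1’s complement hardware is hard to come by. The helper names (twos_complement_bits, ones_complement_bits, low_bit_set, remainder_says_odd) are made up for illustration and are not from the IRC conversation. Under those assumptions it shows that testing the low bit agrees with the remainder test under 2’s complement but gives the wrong answer for negative numbers under 1’s complement.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only. They model 8-bit encodings
// of values in the range [-127, 127].

// 2's complement: converting to an unsigned type wraps modulo 256,
// which is exactly the 2's complement bit pattern.
constexpr std::uint8_t twos_complement_bits(int value) {
    return static_cast<std::uint8_t>(value);
}

// 1's complement: a negative value is the bitwise NOT of its magnitude.
constexpr std::uint8_t ones_complement_bits(int value) {
    return value >= 0
        ? static_cast<std::uint8_t>(value)
        : static_cast<std::uint8_t>(~static_cast<unsigned>(-value));
}

// The "x & 1" test, applied to a raw bit pattern.
constexpr bool low_bit_set(std::uint8_t bits) {
    return (bits & 1u) != 0;
}

// The "x % 2" test, which does not depend on the representation.
constexpr bool remainder_says_odd(int value) {
    return value % 2 != 0;
}

// -3 is odd, and the remainder test says so.
static_assert(remainder_says_odd(-3), "-3 is odd");

// 2's complement: -3 is 11111101, so the low bit agrees.
static_assert(low_bit_set(twos_complement_bits(-3)), "low bit set for -3");

// 1's complement: -3 is 11111100, so the low bit wrongly reports even.
static_assert(!low_bit_set(ones_complement_bits(-3)), "low bit clear for -3");

// 1's complement: -2 is 11111101, so the low bit wrongly reports odd.
static_assert(low_bit_set(ones_complement_bits(-2)), "low bit set for -2");

// Non-negative values look the same in both encodings.
static_assert(low_bit_set(ones_complement_bits(3)), "low bit set for 3");
```

The static_assert lines are checked at compile time, so compiling the file with any C++11 compiler is enough to confirm them.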
At this point I was fairly disappointed with this new group of peers, but then another user decided to chime in. “Why would you look at freakish asm-outputs?” he quipped. As I was discussing the merits of knowing the platform you are writing for and optimizing for that platform (something the channel expert disagreed with, instead promoting writing portable code), this other user chimed in again. Frankly I wasn’t sure if what he said was a joke, so I replied “compilers will never do as good a job at optimizing as people”, after which all I got for a reply was “yeah right”.
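For anyone who does want to look at the “freakish asm-outputs” on their own platform, a small pair of functions like the following is enough to start comparing. This is only a sketch: the file name parity.cpp and the function names are made up for illustration, and the output will differ from compiler to compiler. It also hints at why a compiler may need extra work for the modulus on signed integers: the remainder of a negative odd number is -1, not 1, so the sign has to be carried through unless the result is only compared against zero.

```cpp
// Hypothetical file name: parity.cpp
// With GCC, one way to get the assembly without linking is:
//   g++ -O2 -S parity.cpp
// Other compilers have their own switches for emitting assembly.

// Only asks whether the remainder is non-zero, so a compiler is free
// to reduce this to a test of the low bit.
bool is_odd_mod(int x) {
    return x % 2 != 0;
}

// Tests the low bit directly.
bool is_odd_and(int x) {
    return (x & 1) != 0;
}

// Returns the remainder itself. Since C++11 the result takes the sign
// of the dividend, so -3 % 2 is -1 and the sign must be preserved.
int remainder_mod(int x) {
    return x % 2;
}

// Returns the low bit, which is 0 or 1 on a 2's complement machine
// regardless of the sign of x.
int low_bit(int x) {
    return x & 1;
}
```

On a 2's complement machine is_odd_mod and is_odd_and agree for every int, while remainder_mod(-3) and low_bit(-3) return -1 and 1 respectively. That difference appears to be what the longer x % 2 listing above is paying for.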
After this I decided to go about my business and didn’t get involved in any more discussions with them. Since then I have been thinking about the mentality on display in that IRC channel, and unfortunately I think it’s taught in school and promoted through the use of “abstract programming”. My first programming job was doing web development. I mostly wrote PHP code, with a bit of C# for my intermediate layer that handled communication between the UI and the database. During that time I never thought about optimizing instructions. Realistically, when writing web apps it doesn’t make sense to worry about a couple of instructions when the network is often your biggest bottleneck. However, even in that kind of programming environment we should still be aware of what our compiler (or interpreter in some cases) is doing and what that means for the architecture that our code will run on.
“Coding for portability” is a terrible excuse. Almost no one is writing code that they expect to run on a 286 as well as a Core 2 Duo. Often you will be targeting multiple platforms (Xbox 360 and PS3 is obviously a very common pairing), but even then you have a fixed problem space and you write your code to those platforms. You will never be targeting some unknown number of platforms, so we need to drop that mindset.
This is a problem that needs to be addressed early on in a programmer’s career. I would love to see a college class that focuses on these topics, having students look at different compilers and analyze their output. It’s something that I believe would benefit all programmers regardless of what industry they end up in after college, and I can say from personal experience that it would make the transition into game development a lot easier for any programmers who went that route.