Comments on: Defining a SIMD Interface

As it happens, I wrote both a class and typedef version for the PC and xbox360 and studied the assembly. The xbox360 version turned off most optimisations and repeatedly failed to keep and pass vectors in vector registers. It kept putting things back on the stack. On the xbox360 this is very inefficient.

In the end I went with the typedef approach, which is undoubtedly the most efficient and actually works out rather well to be honest. It’s really not as bad as you think. I don’t miss the class wrapping these days.

Sadly, wrapping basic types in classes does tend to confuse the compiler on platforms game devs care about.

ta,
Sam
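To make the two approaches in the comment above concrete, here is a minimal sketch of a typedef-style interface next to a class wrapper. The names `vec4`, `vec4_set`, `vec4_add`, `vec4_store` and `Vec4Class` are hypothetical and not from the article; the SSE branch assumes an x86 target, and the plain-float fallback exists only so the sketch compiles elsewhere.

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>

// Typedef approach: vec4 IS the native register type, so the compiler
// sees nothing but the built-in vector type when passing it by value.
typedef __m128 vec4;

inline vec4 vec4_set(float x, float y, float z, float w) {
    return _mm_set_ps(w, z, y, x);  // _mm_set_ps takes the highest lane first
}
inline vec4 vec4_add(vec4 a, vec4 b) { return _mm_add_ps(a, b); }
inline void vec4_store(float* out, vec4 v) {
    assert(out != nullptr);
    _mm_storeu_ps(out, v);  // unaligned store, lane 0 goes to out[0]
}
#else
// Fallback (assumption): four plain floats standing in for a SIMD register.
struct vec4 { float f[4]; };

inline vec4 vec4_set(float x, float y, float z, float w) {
    vec4 v = {{x, y, z, w}};
    return v;
}
inline vec4 vec4_add(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < 4; ++i) r.f[i] = a.f[i] + b.f[i];
    return r;
}
inline void vec4_store(float* out, vec4 v) {
    assert(out != nullptr);
    for (int i = 0; i < 4; ++i) out[i] = v.f[i];
}
#endif

// Class approach, for contrast: the same register hidden behind a wrapper.
// Whether this stays in registers depends entirely on the compiler.
class Vec4Class {
public:
    explicit Vec4Class(vec4 v) : v_(v) {}
    Vec4Class operator+(const Vec4Class& o) const {
        return Vec4Class(vec4_add(v_, o.v_));
    }
    vec4 raw() const { return v_; }
private:
    vec4 v_;
};
```

Both versions compute the same thing; the only way to tell whether the wrapper costs anything on a given compiler is to do what the comment describes and read the generated assembly.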

]]>
By: Don Olmstead/2011/04/29/defining-an-simd-interface/#comment-3437 Don Olmstead Sun, 01 May 2011 22:19:50 +0000

Thanks for the great article Don. I’m happy to see other people out there so interested in optimization. I come from a compiler background so I’m extremely interested in knowing what compilers can do to help with optimization, whether it’s better advice/warnings or something else that hasn’t been done well on all the consoles yet, like decent auto-vectorisation.

For some related reading, check out this blog post from a fellow altdev author: http://domipheus.wordpress.com/2010/11/26/migrating-scalar-vec4-classes-to-simd/

Yea I really suppose it depends on the compilers a person has been exposed to. For the most part VS seems to do a pretty good job and that’s what I primarily work in. I did do some embedded development and the compiler there wasn’t ideal but the project wasn’t pushing the hardware too much. So I definitely understand the distrust of the compiler.

I tend to try and make everything as safe and idiot proof as possible. I’ve also seen this create a better idiot =P. In my next iteration I will definitely have hooks to the underlying SIMD type but have a big old warning in the doxygen about it.

]]>
By: Don Olmstead/2011/04/29/defining-an-simd-interface/#comment-3420 Don Olmstead Sat, 30 Apr 2011 23:19:09 +0000

Great article! More of the same for your next one would be ace :)

]]>
By: Jasper Bekkers/2011/04/29/defining-an-simd-interface/#comment-3416 Jasper Bekkers Sat, 30 Apr 2011 21:14:21 +0000

Thanks for the tips Jasper!

The Gamasutra article mentioned staying away from operator overloading as well. In the author’s tests VS only seemed to choke on fairly long expressions. The Intel compiler had no problem with the overloaded operators. I’m not sure how other compilers would fare, as he only tested VS 2008 and Intel’s compiler (I didn’t see a particular version mentioned).

When initially toying with these ideas I modified some of that author’s code to use my own code, as the author’s C++ version was not optimal (his class was sans copy constructor so VS was creating its own and not inlining it, hence the hit), and found performance to be similar to the numbers the Intel compiler was spitting out. I might have to run some performance numbers of my own using that code and Visual Studio 2010.

Definitely agree about keeping everything in the registers. If I were including more code in this article you’d see a scalar class, where all 4 values are the same, so you’d see the dot product returning a scalar, or the length returning a scalar.

Could you elaborate on how a typedef allows for easy platform specific optimizations?

I can see how allowing access to the raw values would be useful. My concern was breaking the contracts on the types. For example, if someone grabbed a scalar and modified it so it was (4, 3, 3, 4), and later down the line it was used as if it were a scalar, hilarity would ensue. I’ve been using friendship between the classes to keep the contracts valid.
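A rough sketch of the contract described above, assuming hypothetical `Scalar` and `Vector4` classes (the article's actual classes are not shown here). Plain float arrays stand in for the SIMD register so the example stays portable; the point is only that nothing outside the two classes can write a `Scalar`'s lanes.

```cpp
#include <cassert>

class Vector4;

// Scalar: a 4-wide value whose lanes are all equal. That equality is the
// contract, and the only way to build one is from a single float.
class Scalar {
public:
    explicit Scalar(float s) {
        for (int i = 0; i < 4; ++i) lanes_[i] = s;
    }
    float value() const { return lanes_[0]; }
    bool contract_holds() const {
        return lanes_[0] == lanes_[1] && lanes_[1] == lanes_[2] &&
               lanes_[2] == lanes_[3];
    }
private:
    float lanes_[4];        // stand-in for a SIMD register (assumption)
    friend class Vector4;   // only Vector4 may read the raw lanes
};

class Vector4 {
public:
    Vector4(float x, float y, float z, float w) {
        lanes_[0] = x; lanes_[1] = y; lanes_[2] = z; lanes_[3] = w;
    }
    // The dot product returns a Scalar, so the result can stay 4-wide.
    Scalar dot(const Vector4& o) const {
        float sum = 0.0f;
        for (int i = 0; i < 4; ++i) sum += lanes_[i] * o.lanes_[i];
        return Scalar(sum);
    }
    // Uses friendship to multiply lane by lane without exposing the lanes.
    Vector4 scaled(const Scalar& s) const {
        assert(s.contract_holds());
        return Vector4(lanes_[0] * s.lanes_[0], lanes_[1] * s.lanes_[1],
                       lanes_[2] * s.lanes_[2], lanes_[3] * s.lanes_[3]);
    }
    float x() const { return lanes_[0]; }
    float w() const { return lanes_[3]; }
private:
    float lanes_[4];
};
```

Because `Scalar` has no public mutator, a value like (4, 3, 3, 4) cannot be produced by client code; the trade-off is that any new class needing lane access has to be added as a friend.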

Definitely agree on the last points.

I’m planning on migrating my library to GitHub once I have some free time. Once that happens I’d like to send you a link and see if you have any additional thoughts.

Again thanks for the tips!

]]>
By: Jasper Bekkers/2011/04/29/defining-an-simd-interface/#comment-3390 Jasper Bekkers Fri, 29 Apr 2011 08:49:59 +0000
struct VecList
{
	float *x;
	float *y;
	float *z;
};

- Aligned allocation of data
- Always allocate a multiple of the width of the SIMD registers (4 floats on most platforms).
- Freeform functions to do bulk operations on the VecList (e.g. clear, add, dot etc.)
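The three points above could be sketched roughly as follows. This is an illustration under assumptions, not the commenter's code: the helper names (`round_up_to_simd_width`, `alloc_lane`, `veclist_create`, `veclist_add`, `veclist_dot` and so on), the 16-byte alignment and the separate `padded` count are all hypothetical, the loops are scalar stand-ins for SIMD code, and the aligned `operator new` requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

const std::size_t kSimdWidth = 4;   // floats per register (assumption)
const std::size_t kAlignment = 16;  // bytes (assumption, matches 128-bit SIMD)

struct VecList {
    float* x;
    float* y;
    float* z;
};

// Pad the element count so bulk loops never need a scalar tail.
inline std::size_t round_up_to_simd_width(std::size_t n) {
    return (n + kSimdWidth - 1) / kSimdWidth * kSimdWidth;
}

inline bool is_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kAlignment == 0;
}

// C++17 aligned allocation; must be released with the matching delete.
inline float* alloc_lane(std::size_t padded) {
    return static_cast<float*>(
        ::operator new(padded * sizeof(float), std::align_val_t(kAlignment)));
}
inline void free_lane(float* p) {
    ::operator delete(p, std::align_val_t(kAlignment));
}

inline void veclist_clear(VecList& v, std::size_t padded) {
    assert(padded % kSimdWidth == 0);
    for (std::size_t i = 0; i < padded; ++i) v.x[i] = v.y[i] = v.z[i] = 0.0f;
}

inline VecList veclist_create(std::size_t padded) {
    VecList v = {alloc_lane(padded), alloc_lane(padded), alloc_lane(padded)};
    veclist_clear(v, padded);
    return v;
}

inline void veclist_destroy(VecList& v) {
    free_lane(v.x); free_lane(v.y); free_lane(v.z);
    v.x = v.y = v.z = nullptr;
}

// out[i] = a[i] + b[i], component by component.
inline void veclist_add(VecList& out, const VecList& a, const VecList& b,
                        std::size_t padded) {
    assert(padded % kSimdWidth == 0);
    for (std::size_t i = 0; i < padded; ++i) {
        out.x[i] = a.x[i] + b.x[i];
        out.y[i] = a.y[i] + b.y[i];
        out.z[i] = a.z[i] + b.z[i];
    }
}

// out[i] = dot(a[i], b[i]); with this layout no horizontal add is needed.
inline void veclist_dot(float* out, const VecList& a, const VecList& b,
                        std::size_t padded) {
    assert(padded % kSimdWidth == 0);
    for (std::size_t i = 0; i < padded; ++i)
        out[i] = a.x[i] * b.x[i] + a.y[i] * b.y[i] + a.z[i] * b.z[i];
}
```

A caller would round the desired element count up once, create the lists with that padded count, and pass the same count to every bulk function; the padding elements are zeroed, so they contribute nothing to the results.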

]]>