Comments on: Old Tricks, New Dogs: Removing Branches Code coverage can be improved, though branch free code doesn't completely solve the problem. It is still possible to calculate an incorrect value, but never select it. Just as it is also possible to get something wrong in the select criteria. So still need to check different inputs that test the "logical branches". A real world example that springs to mind here is a bug i wrote in code to calculate the length of a three dimensional vector. Pseudo code along the lines of <pre>vec len3( vec v ) { vec len2 = dot3( v, v ); vec inv_len = reciprocal_sqrt_1_newton_raphson( len2 ); vec mask = mask_set_if_eq( len2, zero_vec() ); vec len = len2 * inv_len; return len & ~ mask; }</pre> If len2 was zero, then the inv_len would be positive infinity. Multiplying these together would make len NaN. So mask is used to catch this case and correctly return zero instead of NaN. The bug was that i accidentally used an integer rather than floating point comparison in calculating the mask. In theory this should have been fine with denormals flushed to zero, but for a hardware bug where a denormal was actually being generated by one of these instructions (won't say what hardware this was on, reported it and never got any response). So here the bug wasn't immediately found because even though every instruction was always executed, there was a flaw in the code that selected between the two logical branches. Code coverage can be improved, though branch free code doesn’t completely solve the problem. It is still possible to calculate an incorrect value, but never select it. Just as it is also possible to get something wrong in the select criteria. So still need to check different inputs that test the “logical branches”.

A real world example that springs to mind here is a bug i wrote in code to calculate the length of a three dimensional vector. Pseudo code along the lines of

vec len3( vec v ) {
 
    vec len2 = dot3( v, v );
 
    vec inv_len = reciprocal_sqrt_1_newton_raphson( len2 );
 
    vec mask = mask_set_if_eq( len2, zero_vec() );
 
    vec len = len2 * inv_len;
 
    return len & ~ mask;
 
  }

If len2 was zero, then the inv_len would be positive infinity. Multiplying these together would make len NaN. So mask is used to catch this case and correctly return zero instead of NaN.

The bug was that i accidentally used an integer rather than floating point comparison in calculating the mask. In theory this should have been fine with denormals flushed to zero, but for a hardware bug where a denormal was actually being generated by one of these instructions (won’t say what hardware this was on, reported it and never got any response).

So here the bug wasn’t immediately found because even though every instruction was always executed, there was a flaw in the code that selected between the two logical branches.

]]>
By: Mike Acton/2011/06/24/old-tricks-new-dogs-removing-branches/#comment-6316 Mike Acton Sat, 25 Jun 2011 06:53:42 +0000 Cool! For even more fun with removing branches, GPU programming. In CUDA there are big penalties for warp divergence, leads to lots of fun branch-free code. GPUs do have predication that the compiler uses automatically, but good tricks still make a difference. Cool! For even more fun with removing branches, GPU programming. In CUDA there are big penalties for warp divergence, leads to lots of fun branch-free code. GPUs do have predication that the compiler uses automatically, but good tricks still make a difference.

]]>
By: Ian Callaghan/2011/06/24/old-tricks-new-dogs-removing-branches/#comment-6267 Ian Callaghan Fri, 24 Jun 2011 01:19:55 +0000