Comments on: Mobile graphics API wishlist: performance

Not sure I agree with “helper” APIs (like immediate mode / fixed function) being part of a core. As long as there is a good core API, everything else can be built on top. IHVs should not be wasting their time implementing a glBegin API, I can do whatever I need myself there thank you.

By: Aras Pranckevičius/2011/03/04/mobile-graphics-api-wishlist-performance/#comment-1758 Aras Pranckevičius Sat, 19 Mar 2011 14:02:26 +0000

Right.. I see it now. It’s about not making drivers more complex than they have to be… If that really is the problem, just forget what I said about that system. :D

By: Promit Roy/2011/03/04/mobile-graphics-api-wishlist-performance/#comment-1237 Promit Roy Sat, 05 Mar 2011 01:16:47 +0000

Error checking without killing performance? It happens in OpenGL (if you call glGetError after every GL command); I have never used GLES.
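One common way to keep the cost out of shipping builds is to wrap calls in a macro that only polls for errors in debug builds. The following is a rough sketch of that idea, not anything from the original post: `GL_CHECK`, `GFX_DEBUG`, `g_get_error`, `report_errors` and `fake_get_error` are all made-up names, and the function-pointer hook stands in for `glGetError` so the sketch compiles without GL headers.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical hook: in a real GL/GLES build, point this at glGetError. */
typedef unsigned int (*get_error_fn)(void);
static get_error_fn g_get_error = NULL;
static int g_error_count = 0;            /* total errors reported so far */

/* Drain the error queue after a call and log every flag found. */
static void report_errors(const char *expr, const char *file, int line)
{
    unsigned int err;
    assert(expr != NULL && file != NULL);
    if (g_get_error == NULL)
        return;
    /* 0 is GL_NO_ERROR; loop because more than one error flag may be set. */
    while ((err = g_get_error()) != 0) {
        fprintf(stderr, "%s:%d: %s -> GL error 0x%04X\n", file, line, expr, err);
        ++g_error_count;
    }
}

/* Debug builds check after every call; release builds pay nothing extra. */
#ifdef GFX_DEBUG
#define GL_CHECK(call) do { call; report_errors(#call, __FILE__, __LINE__); } while (0)
#else
#define GL_CHECK(call) do { call; } while (0)
#endif

/* Demo stand-in only: reports one error (0x0502, GL_INVALID_OPERATION), then none. */
static unsigned int fake_pending = 0x0502;
static unsigned int fake_get_error(void)
{
    unsigned int e = fake_pending;
    fake_pending = 0;
    return e;
}
```

Usage would look like `GL_CHECK(glDrawArrays(GL_TRIANGLES, 0, 3));` with `g_get_error = glGetError;` set once at startup. This only moves the cost into debug builds; it does not make per-call checking itself any cheaper.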

By: snake5/2011/03/04/mobile-graphics-api-wishlist-performance/#comment-1221 snake5 Fri, 04 Mar 2011 12:03:45 +0000

snake5: It’s not about client-side API. It’s about how information is delivered to driver/GPU. Everything can be wrapped. But it won’t gain you performance.

By: Tomasz Dąbrowski/2011/03/04/mobile-graphics-api-wishlist-performance/#comment-1219 Tomasz Dąbrowski Fri, 04 Mar 2011 09:56:59 +0000

About those vertex declarations… the code doesn’t have to be a mess. My code is like this:

    M_SetMaterial( Mtl );
    //...setting uniforms here...
    glSetVertexDecl( SMeshVtxDecl );
    glSetBufferData( 0, VB, ptr );
    glSetIndexBuffer( IB );
    glDrawIndexedPrimitives( GL_TRIANGLES, VertexCount, TriangleCount * 3, GL_UNSIGNED_INT, 0 );

That’s just because I made a system that wraps all those attribute-setting neck-breaking calls (http://cragegames.com/blog/vertex-declarations-opengl-23). I hope it’s a temporary solution and that GL will really have something like this in the specification though.
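For readers wondering what such a wrapper might hold, here is a minimal sketch of a table-driven vertex declaration. It is not the commenter's actual system: `VtxDecl`, `VtxElement`, `VtxType` and `vtxdecl_add` are hypothetical names, and the layout is assumed to be tightly packed in declaration order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component types; a real wrapper would map these to GL_FLOAT etc. */
typedef enum { VT_FLOAT32, VT_UINT8 } VtxType;

typedef struct {
    int     attrib;      /* shader attribute index */
    VtxType type;        /* component type */
    int     count;       /* components per attribute, 1..4 */
    int     normalized;  /* nonzero: integer data is normalized on fetch */
    size_t  offset;      /* byte offset inside one vertex, filled in by vtxdecl_add */
} VtxElement;

typedef struct {
    VtxElement elems[8];
    int        num_elems;
    size_t     stride;   /* size of one vertex in bytes */
} VtxDecl;

static size_t vtx_type_size(VtxType t)
{
    return t == VT_FLOAT32 ? 4 : 1;
}

/* Append one attribute; offsets and stride are derived, never typed by hand. */
static void vtxdecl_add(VtxDecl *d, int attrib, VtxType type, int count, int normalized)
{
    VtxElement *e;
    assert(d != NULL && d->num_elems < 8 && count >= 1 && count <= 4);
    e = &d->elems[d->num_elems++];
    e->attrib     = attrib;
    e->type       = type;
    e->count      = count;
    e->normalized = normalized;
    e->offset     = d->stride;   /* assumed: tightly packed, declaration order */
    d->stride    += vtx_type_size(type) * (size_t)count;
}
```

A "set declaration" call would then loop over `elems` and issue one `glEnableVertexAttribArray` / `glVertexAttribPointer` pair per element, passing `stride` and `offset`, so the per-attribute calls live in one place instead of at every draw site.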

By: Mykhailo Parfeniuk/2011/03/04/mobile-graphics-api-wishlist-performance/#comment-1216 Mykhailo Parfeniuk Fri, 04 Mar 2011 08:20:13 +0000

Fully offline shader compilation is a myth. At best you’ll get offline tokenisation followed by a slightly cheaper runtime compile. There’s just far too many runtime circumstances that can warrant a recompile for the programmer to predict.
You can stamp your foot and say “well they shouldn’t do that” and they’ll nod and agree and shrug and do it anyway because there’s a significant real-world penalty to not doing it. Joules and milliseconds beat elegance and debuggability every time. I assure you it sucks for people on both sides of the API.

Just accept runtime recompile, and work to make it less agonising. Work with IHVs to “prefetch” the compiled state as efficiently as possible. Is this hard to predict? Yes. Is it hard to debug? Yes. Can you get something less wacky? No. Stop wishing for it.

And of course… you’re working on mobile phones right? So why do you demand more predictability from your GPU than you get from some of your CPUs? :-)

> Something like D3D9 shader assembly is probably too low level (it assumes a vector4-based GPU, limited number of registers and so on)

As low-level assembly goes, there’s not a lot wrong with the D3D9 version (or rather, the DWORD-token stream it becomes inside the driver). The vec4 thing is trivially mapped to a bunch of scalar regs – it’s a completely benign legacy. The number of registers is IIRC 4096 (*vec4) these days, which is a practically unlimited number. Of course the Microsoft compiler went to absurd lengths to use as few as possible, but that is irrelevant and actually harmful: if the MS compiler reuses r0 in two different clauses, the driver has to work hard to figure out that the two uses are not dependent. If the MS compiler aggressively used new registers whenever possible, reuse would be very rare and there would be very little need to disambiguate. Fix that and it’s a pretty good asm language – I like it.
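To make the register-reuse point concrete, here is a small sketch of the kind of renaming pass a driver would need in order to undo aggressive reuse. It is an illustration under made-up assumptions, not how any real driver or the D3D9 token stream works: `Instr` is a hypothetical three-address form holding register indices only, and `MAX_REGS` is an arbitrary limit.

```c
#include <assert.h>

#define MAX_REGS 16

/* Hypothetical three-address instruction: dst = op(src0, src1). */
typedef struct { int dst, src0, src1; } Instr;

/* Rewrite code in place so that every write targets a fresh register.
   Returns the number of registers in use after renaming. */
static int rename_registers(Instr *code, int n)
{
    int current[MAX_REGS];   /* original register -> latest renamed register */
    int next = 0;
    int i, s, r;

    for (r = 0; r < MAX_REGS; ++r)
        current[r] = -1;

    for (i = 0; i < n; ++i) {
        int *srcs[2];
        srcs[0] = &code[i].src0;
        srcs[1] = &code[i].src1;
        assert(code[i].dst >= 0 && code[i].dst < MAX_REGS);

        /* Reads use the most recent value. A register that is read before
           any write is treated as a shader input and named on first use. */
        for (s = 0; s < 2; ++s) {
            r = *srcs[s];
            assert(r >= 0 && r < MAX_REGS);
            if (current[r] < 0)
                current[r] = next++;
            *srcs[s] = current[r];
        }

        /* Every write gets a new name, so reuse of r0 no longer looks
           like a dependency on the earlier value of r0. */
        r = code[i].dst;
        current[r] = next++;
        code[i].dst = current[r];
    }
    return next;
}
```

After this pass two unrelated writes to `r0` end up in different registers, which is the disambiguation the comment says the driver has to do when the compiler packs registers tightly. Straight-line code keeps the sketch short; real shaders with branches and loops would need more than this.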

However, reducing code to any assembly-level representation does remove a lot of very useful information that the GPU’s compiler then has to reconstruct (…and again with the bugs and runtime cost). On the whole I don’t really recommend this path. Direct-from-HLSL has its issues, but they’re honestly less gnarly than the alternatives. I would like to see APIs add ways to control where the recompile happens (and whether it can happen asynchronously!) rather than try to pretend it doesn’t happen, or that you can shuffle it offline.

OGL(ES) vertex declarations are indeed a disaster. They’ll get it sorted just as DX moves away from the whole concept entirely and just has a bunch of untyped bit-buffers.

> to guarantee there would be no shader patching

Wait – you have a guarantee of that? Really? Totally sure? Would you bet a few million dollars of IHV revenue on that? :-)

Shader uniforms – having them per-program is a misrepresentation of what really goes on, but so is having them global across programs. The fact is, sometimes the driver will reorder your globals (because the access pattern is more efficient that way), and sometimes it won’t (because the copy overhead is too high). However, for the purposes of an API definition, it’s probably more harmful to *force* a reorder/reupload on shader change than it is to *hope* that one isn’t necessary. Ideally an API would allow the app to specify the third state – “don’t care”. It is frequently difficult to tell by inspection of the shader code which globals the programmer expects to be valid from a previous call and which can be any old junk because the programmer thinks (asserts?) they shouldn’t ever be accessed.
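As a purely hypothetical illustration of the three states described above, an API could let the app state its expectation at shader-change time. No existing API is being quoted here: `UniformPolicy`, its enumerators and `driver_must_copy` are invented for this sketch.

```c
#include <assert.h>

/* Hypothetical: what the app expects to happen to uniform values on shader change. */
typedef enum {
    UNIFORMS_FORCE_REUPLOAD,   /* API mandates a reorder/reupload every time */
    UNIFORMS_KEEP_IF_POSSIBLE, /* values must survive; copy only if the layout moved */
    UNIFORMS_DONT_CARE         /* app promises to set everything the shader reads */
} UniformPolicy;

/* Returns nonzero when the driver has to copy or re-pack old uniform values.
   layout_differs: nonzero if the new shader wants the globals in another order. */
static int driver_must_copy(UniformPolicy policy, int layout_differs)
{
    switch (policy) {
    case UNIFORMS_FORCE_REUPLOAD:   return 1;
    case UNIFORMS_KEEP_IF_POSSIBLE: return layout_differs != 0;
    case UNIFORMS_DONT_CARE:        return 0;
    }
    assert(!"unknown UniformPolicy");
    return 1;
}
```

The point of the third value is that the driver never has to guess from the shader code which globals are expected to stay valid; the app says so explicitly and the copy can be skipped.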
