I am not writing this to bash Microsoft, as much as people love to do that. I use products, platforms, and operating systems from every Microsoft competitor, but many of Microsoft’s products are still my favorites – particularly Windows 7 and Visual Studio. Rather, I am writing this in the hopelessly desperate hope that someone, somewhere high up at Microsoft, will read this and give a damn.
Let us start with DirectX. I remember programming on DirectX when it was still a fixed-function pipeline, and I thought it was a brilliant move when programmable shaders were introduced, or at least once they matured a bit with Shader Model 3.0 in DirectX 9.0c. Obviously, this was not solely Microsoft’s innovation, but a joint (even if not directly cooperative) effort between MS, the Khronos Group, and various hardware manufacturers. That was August 2004.
Fast forward seven years. Most titles are still shipping based on DirectX 9.0c code or the OpenGL equivalent (except in mobile applications, obviously). Why? Because it’s still the lowest reasonable common denominator, especially with the number of consoles that support this hardware level.
To be honest, DirectX 9.0c is not that bad, but it is 7 years old. The real problem is the mess of APIs Microsoft has introduced since 2004.
DirectX 10 introduced geometry shaders, new texture formats, and slight performance improvements in some areas, but required new hardware over 9.0c cards. Geometry shaders (at least from what I have perceived) were poorly received by developers. 10.1 introduced more texture formats, blend modes, and enhanced MSAA support, but required a new set of hardware over 10.0 cards. DirectX 11 introduced tessellation (which some people viewed as “geometry shaders, but done right”), native multi-threading support, and DirectCompute (similar to CUDA/OpenCL). I have done a lot of experimentation with tessellation (see pic below), and while it is a very cool feature, I have to say the hardware is not quite there yet in terms of justifying the performance cost. Also, the documentation is horrible at best – at the time I programmed with it, there was literally no documentation that properly explained what each stage of the tessellation pipeline does. DirectX 11 is not a bad product overall, but the timing of its release was absolutely horrible.
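For anyone who has not touched the new stages, here is a minimal sketch (my own illustration, with error handling omitted, and assuming vsBlob/hsBlob/dsBlob/psBlob hold already-compiled shader bytecode) of how the Direct3D 11 tessellation stages are wired up from the CPU side. The hull and domain shaders slot in around the fixed-function tessellator, between the vertex and pixel stages:

```cpp
// Minimal sketch: binding the Direct3D 11 tessellation stages.
// vsBlob/hsBlob/dsBlob/psBlob are hypothetical names for compiled shader blobs.
#include <d3d11.h>

void BindTessellationPipeline(ID3D11Device* device, ID3D11DeviceContext* ctx,
                              ID3DBlob* vsBlob, ID3DBlob* hsBlob,
                              ID3DBlob* dsBlob, ID3DBlob* psBlob)
{
    ID3D11VertexShader* vs = nullptr;
    ID3D11HullShader*   hs = nullptr; // per-patch work: control points + tessellation factors
    ID3D11DomainShader* ds = nullptr; // evaluates the surface at each point the tessellator emits
    ID3D11PixelShader*  ps = nullptr;

    device->CreateVertexShader(vsBlob->GetBufferPointer(), vsBlob->GetBufferSize(), nullptr, &vs);
    device->CreateHullShader(hsBlob->GetBufferPointer(), hsBlob->GetBufferSize(), nullptr, &hs);
    device->CreateDomainShader(dsBlob->GetBufferPointer(), dsBlob->GetBufferSize(), nullptr, &ds);
    device->CreatePixelShader(psBlob->GetBufferPointer(), psBlob->GetBufferSize(), nullptr, &ps);

    // The input assembler is fed patches, not triangles; the fixed-function
    // tessellator sits between the hull and domain shader stages.
    ctx->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_3_CONTROL_POINT_PATCHLIST);
    ctx->VSSetShader(vs, nullptr, 0);
    ctx->HSSetShader(hs, nullptr, 0);
    ctx->DSSetShader(ds, nullptr, 0);
    ctx->PSSetShader(ps, nullptr, 0);
}
```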
Since DirectX 9.0c, each subsequent version (10.0, 10.1, and 11.0) has required a new level of hardware, three new levels in total. Over a span of 7 years, that is a rough average of 2.3 years per hardware cycle. It takes more time than that to create a single AAA title! But the absolute worst part of it all is the consoles of 2005 to present. Don’t get me wrong, they are all great consoles, but they obliterated the practicality of programming for 10.0 and above hardware.
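To make the fragmentation concrete, here is a rough sketch (mine, with error handling trimmed) of what it looks like to target all of those hardware generations through Direct3D 11’s feature-level mechanism. Even with a single API entry point, the application still has to discover which level it actually got and maintain a separate render path for each:

```cpp
// Minimal sketch: creating a D3D11 device across the hardware levels discussed above.
#include <d3d11.h>

ID3D11Device* CreateBestDevice(ID3D11DeviceContext** context, D3D_FEATURE_LEVEL* obtained)
{
    const D3D_FEATURE_LEVEL requested[] = {
        D3D_FEATURE_LEVEL_11_0, // tessellation, DirectCompute (cs_5_0)
        D3D_FEATURE_LEVEL_10_1, // extra formats, improved MSAA
        D3D_FEATURE_LEVEL_10_0, // geometry shaders
        D3D_FEATURE_LEVEL_9_3,  // roughly the 9.0c baseline most titles still target
    };

    ID3D11Device* device = nullptr;
    HRESULT hr = D3D11CreateDevice(
        nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
        requested, sizeof(requested) / sizeof(requested[0]),
        D3D11_SDK_VERSION, &device, obtained, context);
    if (FAILED(hr))
        return nullptr;

    if (*obtained < D3D_FEATURE_LEVEL_11_0) {
        // No tessellation or compute on this machine: fall back to the
        // 9.0c-style render path, i.e. the lowest-common-denominator problem.
    }
    return device;
}
```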
Rightfully so, consumers have become jaded about purchasing new graphics hardware, because so far the payoff has been increasingly slim. Microsoft did not give DirectX 9 enough time to mature; 10, 10.1, and 11 only constitute the beginnings of a significant new version IMHO – I felt like they were rushed out the door to promote Vista (10 and 10.1) and Windows 7 (11). The lack of drive to develop on these new APIs has sent the wrong message to every party – that we don’t care about advancing technology and might as well continue catering to the old APIs. Instead of a console refresh, we have Kinect (I actually think the product is great for certain applications, but I hope MS is developing a new console relatively soon as well).
In reality, there are many parties responsible, but I would argue Microsoft has the most influence in the world when it comes to determining graphics APIs, given they have the most popular operating systems and a remarkably popular console. Indeed, it would appear that the past releases of OpenGL have simply followed in the wake of DirectX features. Microsoft should have held off on a new version of DirectX until the next console generation – I hope they have learned this lesson.
But it does not stop at DirectX. XNA Game Studio seems to be exacerbating the problem of hardware/platform fragmentation. In a rather questionable move, the most recent release of XNA (4.0) has been dumbed down to support Windows Phone 7, although it lets you override this by targeting different graphics profiles (i.e. “Reach” and “HiDef”) – but this fragmentation leads to confusion and an inability to target all platforms with a decent level of hardware. XNA also has a few strange limitations on PC because it is forced to support Xbox hardware. Windows Phone 7 also supports only very limited shaders, not even 9.0c level – almost fixed-function-like.
So, I am going to ask it once: one API and one level of hardware to worry about, please, Microsoft. If this means abandoning backwards compatibility with older hardware, so be it. Just make sure you only do it once every five years or so, and give the hardware and platform time to mature.
While I am complaining, allow me to bring up your various GUI technologies. Silverlight, WPF, Windows Forms, and now with Windows 8: HTML 5 – yet no support for WebGL. What in the world is going on over there? It’s almost as if you have 20 different people independently making executive-level decisions. And why is it still so hard to get a decent GUI inside a DirectX/XNA application? One ring to rule them all, that is all I ask.
I think that our problems run deeper than the APIs, though. Current graphics hardware is still, in many ways, tied to its fixed-function past. It does not resemble the design of CPUs, which try to minimize instruction sets and maximize flexibility. I do not doubt that GPUs will continue to dominate in graphics performance, but I think there is a viable market for less powerful, more flexible, easier-to-work-with hardware. Intel tried to enter this arena once before with Larrabee, and although they were not initially successful, I still see a bright future here, perhaps one aided by their new 3D transistor technology. I think that as you lower the barrier to becoming a competent graphics programmer, you see an influx of new developers and products into the market. I cannot count how many times I wanted to give up learning on account of the difficulty of working with modern graphics hardware and APIs.
If you want to really amaze me, Microsoft, start working with your hardware partners to get a CPU-based graphics solution in place with competitive performance. I think the intricacies of GPU hardware are annoying. I understand from a performance standpoint how they are justified, but at what point are performance gains worth a programmer’s sanity?

I hate having to deal with a million different texture formats. What would be ideal is to simply output to a variable number of channels, say 0-64, each a 32-bit float, rather than juggling a number of textures, each with different bits per channel, number of channels, and so on. Yes, it might very well consume a lot more bandwidth, but I think it is a better solution overall – if you look at most CPUs, they do not have special registers to store and calculate bytes; rather, they convert bytes to 32-bit integers and then perform calculations on them. Even if performance is worse, it is easier for a programmer than trying to carefully decide how to pack information into channels across several textures. Obviously it would pay off to build a memory and caching system that plays nicely with this. It would be nice to have 16 GB of memory at my disposal, instead of the standard 0.5-1.0 gigabyte on a given graphics card, perhaps optimized to be divided among each thread or core. Most importantly, it would be amazing to eliminate the CPU/GPU bottleneck that prevents us from doing a lot of meaningful interaction between the two.

Also, for the love of all that is holy, make the depth buffer easily readable (the sketch below shows the hoops this currently requires). Or better yet, get rid of the fixed-function depth buffer and replace it with something programmable. I think it might even be possible to do away with features like MSAA, especially with the increased interest in ray-tracing and post-process rendering, for the sake of speed and simplicity in designing new chips. Keep the APIs as low level as possible – people writing the middleware will worry about the rest.
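To make the depth-buffer complaint concrete, here is a minimal sketch (error handling omitted) of the typeless-format juggling Direct3D 11 currently requires just so a shader can sample the depth buffer it rendered: the texture is created typeless, then viewed once as a depth-stencil target and once more as a shader resource, each with a different “compatible” format:

```cpp
// Minimal sketch: a depth buffer that can also be read as a shader resource in D3D11.
#include <d3d11.h>

void CreateReadableDepthBuffer(ID3D11Device* device, UINT width, UINT height,
                               ID3D11Texture2D** tex,
                               ID3D11DepthStencilView** dsv,
                               ID3D11ShaderResourceView** srv)
{
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width            = width;
    desc.Height           = height;
    desc.MipLevels        = 1;
    desc.ArraySize        = 1;
    desc.Format           = DXGI_FORMAT_R24G8_TYPELESS; // typeless, so it can wear two hats
    desc.SampleDesc.Count = 1;
    desc.Usage            = D3D11_USAGE_DEFAULT;
    desc.BindFlags        = D3D11_BIND_DEPTH_STENCIL | D3D11_BIND_SHADER_RESOURCE;
    device->CreateTexture2D(&desc, nullptr, tex);

    D3D11_DEPTH_STENCIL_VIEW_DESC dsvDesc = {};
    dsvDesc.Format        = DXGI_FORMAT_D24_UNORM_S8_UINT;           // viewed as depth + stencil
    dsvDesc.ViewDimension = D3D11_DSV_DIMENSION_TEXTURE2D;
    device->CreateDepthStencilView(*tex, &dsvDesc, dsv);

    D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
    srvDesc.Format              = DXGI_FORMAT_R24_UNORM_X8_TYPELESS; // viewed as just the depth bits
    srvDesc.ViewDimension       = D3D11_SRV_DIMENSION_TEXTURE2D;
    srvDesc.Texture2D.MipLevels = 1;
    device->CreateShaderResourceView(*tex, &srvDesc, srv);
}
```

All of that ceremony exists because depth is still a special, semi-fixed-function resource rather than just another render target.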
I would love to see a return to the “good old days” when people like Carmack and Silverman could write their own engines from the ground up – things of real technical beauty – instead of dancing around the awkward limitations of graphics hardware and APIs.