At November we aim to deliver a high-quality 3D experience in your browser with minimal wait times. A big part of a good-looking experience is good-looking textures. To prepare to serve more online users, we recently implemented the following texture compression method. I imagine other developers have done similar things, and I would love to hear about your experiences.
A case for compression
If you want to make a game that looks good, you are likely to use a lot of high-resolution textures. The problem with beautiful textures is that they traditionally take up a lot of bandwidth. Assuming you don’t have pre-rendered movies, a wild estimate would be that around 70% of your asset bandwidth is textures.
Smaller texture files let us serve more users with the same bandwidth, and each user can start playing sooner.
There are some standard ways of compressing textures (ETC, S3TC, PVRTC) where the graphics hardware can use the compressed texture directly, thus saving memory not only on the DVD/network but also in the graphics hardware. These formats are designed for hardware-friendly decompression and all have fixed compression ratios. To enable us to squeeze more pixels from each bit of bandwidth, we need to look for other compression schemes.
The best-known way to compress images (outside of games) these days is JPEG. This seems like a great way to go. One reason is that optimized decompression code is readily available. One downside is that JPEG supports only color channels (no alpha), so a vanilla JPEG solution is not good enough for us. If you can live without alpha support, you’re probably fine with JPEG.
Another option is JPEG2000. On the upside, it does support alpha. On the downside it is way slower (10x) than JPEG and it’s crippled by patents. As such there are no freely available libraries for compression and decompression of JPEG2000 images.
The visualization of the basis functions is particularly helpful. If you’re familiar with spherical harmonics, you’ll appreciate the similarities.
The compression we implemented works on blocks of 16×16 pixels. Instead of supporting all texture sizes and special-casing the edges, we make it easy for ourselves and support only textures with dimensions divisible by 16. In the example below only RGB channels are present. An alpha channel would be treated like the luminance channel. For each block we do the following:
- Convert pixels from red-green-blue to YCoCg
- Down-sample the chroma channels
- Divide the luminance channel into four 8×8 blocks
Now we only have blocks of size 8×8.
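The block split described above can be sketched roughly as follows. This is a minimal sketch, not our production code: the function name `split_block` is made up for illustration, and it assumes the chroma channels are down-sampled by averaging each 2×2 neighbourhood (the text above does not say which filter is used).

```python
def split_block(y, co, cg):
    """Turn one 16x16 block (three 16x16 channel grids) into six 8x8 blocks.

    Returns four luminance quadrants followed by the down-sampled Co and Cg.
    """
    def quadrant(channel, r0, c0):
        return [row[c0:c0 + 8] for row in channel[r0:r0 + 8]]

    def downsample(channel):
        # Assumption: plain 2x2 average using integer floor division.
        return [[(channel[2 * r][2 * c] + channel[2 * r][2 * c + 1] +
                  channel[2 * r + 1][2 * c] + channel[2 * r + 1][2 * c + 1]) // 4
                 for c in range(8)]
                for r in range(8)]

    luma = [quadrant(y, r0, c0) for r0 in (0, 8) for c0 in (0, 8)]
    return luma + [downsample(co), downsample(cg)]
```

With this layout a 16×16 RGB block goes from 768 values to 6 × 64 = 384 values before any transform is applied.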
For each 8×8 block:
- Discrete Cosine Transform to get frequency-coefficients
- Quantize coefficients (will zero out small coefficients)
- Zigzag reorder
- Store only non-zeroes
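To make the transform step concrete, here is a minimal sketch of a separable 8×8 DCT and its inverse. It uses the textbook orthonormal DCT-II evaluated directly, which is much slower than an optimized implementation; the function names are illustrative only.

```python
import math

N = 8

def _alpha(k):
    # Orthonormal scaling: the DC row gets sqrt(1/N), all others sqrt(2/N).
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

# BASIS[k][n] = alpha(k) * cos((2n + 1) * k * pi / (2N))
BASIS = [[_alpha(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))
          for n in range(N)] for k in range(N)]

def dct_1d(v):
    return [sum(BASIS[k][n] * v[n] for n in range(N)) for k in range(N)]

def idct_1d(c):
    return [sum(BASIS[k][n] * c[k] for k in range(N)) for n in range(N)]

def _apply_2d(block, f):
    rows = [f(row) for row in block]            # transform every row
    cols = [f(list(col)) for col in zip(*rows)]  # then every column
    return [list(row) for row in zip(*cols)]     # transpose back

def dct_2d(block):
    return _apply_2d(block, dct_1d)

def idct_2d(coeffs):
    return _apply_2d(coeffs, idct_1d)
```

With this scaling, a flat block where every pixel is 10 produces a single non-zero coefficient of 80 in the top-left corner.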
The color space we convert to (YCoCg: luminance, chroma-orange, chroma-green) is also used by other modern image processing formats, like H.264. It became popular because of its simple integer implementation and the fact that you can convert from RGB and back with no rounding errors.
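As an illustration of the integer round trip, here is a sketch of the lifting-based variant usually called YCoCg-R. Which exact variant to use is an assumption here; the text above only says YCoCg. Note that in this variant the chroma values are signed and need one more bit than the input channels.

```python
def rgb_to_ycocg(r, g, b):
    # Lifting steps: only adds, subtracts and shifts, all on integers.
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Undo the lifting steps in reverse order.
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Because every step of the inverse exactly undoes a step of the forward transform, `ycocg_to_rgb(*rgb_to_ycocg(r, g, b))` returns the original triple for any integer input.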
Most of the compression happens in the Quantize step, and this is where the “quality” parameter is used. The compression-ratio is not easily guessed, since it’s determined by both the quality-parameter and the complexity of the image.
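The effect of the quality parameter can be sketched like this. The mapping from quality to step size (`quant_step`) is entirely hypothetical and chosen only for illustration; it is not the table we use. The idea is simply that lower quality and higher frequencies get a larger step, so more coefficients round to zero.

```python
def quant_step(quality, u, v):
    # Hypothetical mapping: the step grows as quality drops and as the
    # horizontal (u) and vertical (v) frequency indices increase.
    base = 1.0 + (100 - quality) * 0.5
    return base * (1 + u + v)

def quantize(coeffs, quality):
    return [[int(round(coeffs[v][u] / quant_step(quality, u, v)))
             for u in range(8)] for v in range(8)]

def dequantize(quantized, quality):
    return [[quantized[v][u] * quant_step(quality, u, v)
             for u in range(8)] for v in range(8)]
```

With a block that has a strong DC coefficient and weak high-frequency content, a low quality setting leaves only the DC value non-zero, while a high setting keeps several of the smaller coefficients.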
The zigzag reorder might look like a silly thing to do, but since the non-zero coefficients tend to bunch up in the top-left corner of the block, the zigzag order will create the longest string of trailing zeroes. After the reordering, we simply store all coefficients up to the last non-zero.
This whole method is very close to JPEG compression, the big difference being that we’re not doing any lossless compression at the end. In our case (November) we have encryption and compression built into our general streaming system already, so we do not worry about this step.
An interesting quality of this type of compression is the symmetry in processing between compression and decompression. In our experience the DCT compression is orders of magnitude faster than the DXT compression tool we use. At November, we often compress large batches of textures, and this speedup is much appreciated.
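A small sketch of the reorder-and-truncate step, using the same zigzag order as JPEG. The names `pack` and `unpack` are illustrative, and a real file format would also need to record how many values were stored for each block, which is left out here.

```python
def zigzag_order(n=8):
    # Walk the anti-diagonals, alternating direction on each one.
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

ZIGZAG = zigzag_order()

def pack(block):
    """Flatten in zigzag order and drop the trailing zeroes."""
    flat = [block[r][c] for r, c in ZIGZAG]
    last = max((i for i, v in enumerate(flat) if v != 0), default=-1)
    return flat[:last + 1]

def unpack(values):
    """Rebuild the 8x8 block; anything not stored is zero."""
    block = [[0] * 8 for _ in range(8)]
    for (r, c), v in zip(ZIGZAG, values):
        block[r][c] = v
    return block
```

A block with only a DC value of 15 and a value of -2 directly below it packs down to three numbers, `[15, 0, -2]`, instead of 64.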
One size might not fit all
In the initial testing we noticed that certain textures are not suitable for DCT compression. Simply put, natural images are well suited and synthetic images like test-screens are not: you get ringing, color-bleeding and other ugly artifacts.
Another quality of those images is that they don’t compress very well. We are able to use that fact to our advantage, though. We look at the resulting compressed file size, and if the compression ratio is not good enough, we fall back to DXT.
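The fallback decision amounts to a simple threshold test, sketched below. The function name and the default threshold `min_ratio` are placeholders for illustration, not the values we actually use.

```python
def choose_codec(raw_size, dct_size, min_ratio=8.0):
    """Pick "dct" when the DCT result is small enough, otherwise "dxt".

    raw_size and dct_size are byte counts; min_ratio is a hypothetical
    tuning value.
    """
    if dct_size <= 0:
        raise ValueError("dct_size must be positive")
    return "dct" if raw_size / dct_size >= min_ratio else "dxt"
```

For example, `choose_codec(1000, 100)` returns `"dct"`, while `choose_codec(1000, 500)` returns `"dxt"`.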
The decompression follows the compression steps exactly, except in the opposite order.
To avoid redundant memory copying, the decompression is made to write straight into a locked texture (RGBA or RGBX). A likely future improvement is re-compressing to a hardware-friendly format (like DXT or similar).
We optimized the decompression in a few ways:
- Using the Loeffler algorithm (fewer multiplies for DCT)
- Special case for trivial DCT cases (single coefficient)
- Fixed-point to eliminate float-to-int conversions.
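The single-coefficient special case is easy to illustrate. The sketch below assumes an orthonormal 8×8 DCT, where a block containing only a DC coefficient decodes to a flat block with every pixel equal to DC / 8. It also stays in integers, in the spirit of the fixed-point point above; the function name is illustrative.

```python
def idct_dc_only(dc):
    """Inverse 8x8 DCT for a block whose only non-zero coefficient is DC.

    Assumes orthonormal scaling, so every output pixel equals dc / 8.
    The division is done as a rounded shift to avoid floating point.
    """
    value = (dc + 4) >> 3  # divide by 8, rounding to nearest
    return [[value] * 8 for _ in range(8)]
```

This replaces 16 one-dimensional transforms with a single shift and a fill.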
For a while we considered compressing the mip-maps together with the texture, but realized that generating the mip-maps is orders of magnitude faster. Not storing the mip-maps also saves about 25% of the space. We simply generate the mip-maps using a 2×2 box filter.
At first we experimented with simple sharpening for the mip-maps, but too often the compression artifacts would be emphasized, so we left it out.
Below is a detail from a sky-dome texture. The image has been scaled up to clearly show the compression artifacts. To judge the quality more fairly, try squinting a little bit or scooting your chair away from the screen :)
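A 2×2 box filter for one channel can be sketched as below. This is a simplified illustration that assumes power-of-two dimensions and stops as soon as either dimension reaches 1; the function names are made up.

```python
def next_mip(level):
    """Halve a single-channel image by averaging each 2x2 block (rounded)."""
    h, w = len(level) // 2, len(level[0]) // 2
    return [[(level[2 * y][2 * x] + level[2 * y][2 * x + 1] +
              level[2 * y + 1][2 * x] + level[2 * y + 1][2 * x + 1] + 2) >> 2
             for x in range(w)]
            for y in range(h)]

def mip_chain(base):
    """Return [base, half size, quarter size, ...]."""
    chain = [base]
    while len(chain[-1]) > 1 and len(chain[-1][0]) > 1:
        chain.append(next_mip(chain[-1]))
    return chain
```

The 25% figure follows from the sizes: each level has a quarter of the pixels of the previous one, so the extra levels add 1/4 + 1/16 + ... ≈ 1/3 of the base image, which is about 1/4 of the full chain.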
|Position|Format|Size|
|Top Left|Original TGA|1816 KB|
|Top Middle|DXT|328 KB|
|Top Right|DCT-Q99|93 KB|
|Bottom Left|DCT-Q70|79 KB|
|Bottom Middle|DCT-Q60|68 KB|
|Bottom Right|DCT-Q50|52 KB|
As expected, this reduction in file-size comes at a price. Most finer details are smeared out, while the strong features are preserved.
To put the above sizes in context, the original image size is 2048×512 (RGB). To better approximate our use case, the file sizes are after lossless compression (7z).
On an Intel E8400 3GHz CPU, decompressing this texture takes 12 ms (83 megapixels / second) on a single thread.
To avoid hitches in the frame rate, we do the DCT decompression and mip-map generation in a separate thread together with the streaming system’s decryption and zlib decompression.
- The decoding step could be optimized for trivial cases considering all channels. For instance, when decoding a 16×16 block where every pixel is the same color, we should convert the Y-Co-Cg to the RGB value once and then just fill the destination block with that color.
- The first 1D DCT could ignore rows of all-zeroes.
- Instead of decompressing to raw pixel data, we can re-compress the image into more hardware-friendly formats, such as DXT. Both are block-based, so this is a good fit.
- The DCT code could be improved to take advantage of hardware specific SIMD features (SSE2, NEON), by operating on multiple DCT blocks at a time.
- The mip-map generation should be gamma corrected.
- In our current implementation we only support low to medium quality compression. Adding support for coefficients larger than 255 would make the format more complex, but would also allow for quality levels like JPEG.
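As an illustration of the gamma-correction item in the list above, here is a sketch of averaging pixels in linear light rather than on the stored values. It approximates the display transfer curve with a plain power of 2.2, which is an assumption; the exact sRGB curve is piecewise. The names are illustrative.

```python
GAMMA = 2.2  # assumption: simple power-law approximation of sRGB

def to_linear(value):
    return (value / 255.0) ** GAMMA

def to_encoded(linear):
    return int(round(255.0 * linear ** (1.0 / GAMMA)))

def average_gamma_correct(pixels):
    """Average 8-bit channel values in linear light, then re-encode."""
    return to_encoded(sum(to_linear(p) for p in pixels) / len(pixels))
```

Averaging two black and two white pixels this way gives a noticeably brighter result than the naive average of 127 or 128, which is why uncorrected mip-maps of high-contrast textures tend to look too dark.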
We feel lossy texture-compression provides a very useful slider that we can seamlessly adjust to tweak the balance of quality and bandwidth.
The above method was fairly straightforward to implement, performs as expected, and still has room for improvement.
If you made it through all the way here, I’m sure you’re interested in compression. I’d love to hear how you have tackled compression in the past or how you’re planning on doing it in the future.