Playing (with) Video

So you want to play some video? Shouldn’t be too hard, right? Just download some video playing library and call the play_video() function. Easy-peasy-lemon-squeezy.

Well, you have to make sure that the video is encoded correctly, that the library works on all platforms and plays nicely with your memory, file, sound and streaming abstractions, and that the audio and video don’t desynchronize, which for some inexplicable reason seems to be a huge problem.

But this is just technical stuff. We can deal with that. What is worse is that video playback is also a legal morass.

There are literally thousands of broad patents covering different aspects of video decompression. If you want to do some video coding experiments of your own you will have to read, understand and memorize all these patents so that you can carefully tip-toe your code and algorithms around them.

Of course, if you had a big enough pool of patents of your own you might not have to care as much, since if someone sued you, you could sue them right back with something from your own stockpile. Mutually assured destruction through lawyers. Ah, the wonderful world of software patents.

So, creating your own solution is pretty much out of the question. You have to pick one of the existing alternatives and do the best you can with it. In this article I’m going to look at some different options and discuss the advantages and drawbacks of each one:

Just say no
Bink
Platform specific
H.264
WebM

There are other alternatives that didn’t make it to this list, such as Dirac, Theora, and DivX. I’ve decided to focus on these five, since in my view H.264 is the best of the commercial formats and WebM the most promising of the “free” ones.

An initial idea might be: Why not just do whatever it is VLC does? Everybody’s favorite video player plays pretty much whatever you throw at it and is open source software.

Unfortunately that doesn’t work, for two reasons. First, VLC’s code is a mix of GPL and LGPL stuff. Even if you just use the LGPL parts you will run into trouble on platforms that don’t support dynamic linking. Second, the VLC team doesn’t really care about patents and just infringes away. You probably cannot afford to do the same. (As a result, there is a very real threat that VLC might be sued out of existence.)

A quick introduction

Before we start looking at the alternatives I want to say something short about what a video file is, since there is some confusion in the matter, even among educated people.

A video file has three main parts:

Video data (H.264, DivX, Theora, VP8, …)
Audio data (MP3, AAC, Vorbis, …)
A container format (AVI, MKV, MP4, Ogg, …)

The container format is just a way of packing together the audio and video data in a single file, together with some additional information.

The simplest possible container format would be to just concatenate the audio data to the video data and be done with it. But typically we want more functionality. We want to be able to stream the content, i.e. start playing it before we have downloaded the whole file, which means that audio and video data must be multiplexed. We also want to be able to quickly seek to specific time codes, so we may need an index for that. We might also want things like audio tracks in different languages, subtitling, commentary, DVD menus, etc. Container formats can become quite intricate once you start to add all this stuff.
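To make the multiplexing and seek-index ideas concrete, here is a minimal sketch in Python. It is a toy model, not how any real container lays out its data: the Packet class and the multiplex, build_seek_index and seek functions are names invented for this illustration, and it assumes every packet carries a timestamp and that decoding can only start at a video keyframe.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Packet:
    stream: str       # "video" or "audio"
    timestamp: float  # presentation time in seconds
    keyframe: bool    # True if decoding can start at this packet
    payload: bytes


def multiplex(video, audio):
    """Interleave two packet lists by timestamp so that a player reading
    the file front to back always has both streams for the current time."""
    return sorted(video + audio, key=lambda p: p.timestamp)


def build_seek_index(packets):
    """Record (timestamp, position) for every video keyframe."""
    return [(p.timestamp, i) for i, p in enumerate(packets)
            if p.stream == "video" and p.keyframe]


def seek(index, time):
    """Return the position of the last keyframe at or before `time`.
    Assumes a non-empty index; times before the first keyframe map to it."""
    times = [t for t, _ in index]
    slot = bisect_right(times, time) - 1
    return index[max(slot, 0)][1]
```

A player that wants to jump to 1.3 seconds would call seek(index, 1.3), start reading packets from the returned position and discard decoded frames until it reaches the requested time. Real containers add per-track metadata and byte offsets on top of this, which is where the intricacy comes from.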

A common source of confusion is that the extension of a video file (.avi, .mkv, .mp4, .ogg) only tells you the container format, not the codecs used for the audio and video data in the container. So a video player may fail to play a file even though it understands the container format (because it doesn’t understand what’s inside it).
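Since the extension is only a hint, a more reliable way to identify the container is to look at the first few bytes of the file. The sketch below checks the well-known signatures of the four containers mentioned above; the function name sniff_container is made up for this example, and a real player would check many more formats.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container format from the first 12 bytes of a file."""
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "avi"
    if header[:4] == b"\x1a\x45\xdf\xa3":  # EBML header (Matroska and WebM)
        return "matroska"
    if header[:4] == b"OggS":
        return "ogg"
    if header[4:8] == b"ftyp":  # ISO base media file format (MP4 and relatives)
        return "mp4"
    return "unknown"
```

Note that this still only identifies the container. To find out which codecs are inside, the player has to parse the container’s own stream headers, and that is the step where playback can fail even though the container was recognized.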

Option 1: Just say no

Who says there has to be video in a game? The alternative is to do all cut scenes, splash screens, logos, etc. in-game and use the regular renderer for everything. As technology advances and real-time visuals come closer and closer in quality to offline renders, this becomes an increasingly attractive option. It also has a number of advantages:

You can re-use the in-game content.
Production is simpler. If you change something you don’t have to re-render the entire movie.
You don’t have to decide on resolution and framerate; everything is rendered at the user’s settings.
You can dynamically adapt the content, for example dress the players in their customized gear.
Having everything be “in-game visuals” is good marketing.

If I were making a game I would do everything in-game. But I’m not, I’m making an engine. And I can’t really tell my customers what they can and cannot do. The fact is that there are a number of legitimate reasons for using video:

Some scenes are too complex to be rendered in-game.
Producing videos can be simpler than making in-game content, since it is easier to outsource. Anybody can make a video, but only the core team can make in-game content and they may not have much time left on their hands.
Playing a video while streaming in content can be used to hide loading times. An in-game scene could be used in the same way, but a high-fidelity in-game scene might require too much memory, not leaving enough for the content that is streaming in.

As engine developers it seems we should at least provide some way of playing video, even if we recommend that our customers do their cutscenes in-game.

Option 2: Bink

Bink from RAD Game Tools is as close as you can get to a de facto standard in the games industry, being used in more than 5800 games on 14 different platforms.

The main drawback of Bink is the pricing. At $8500 per platform per game it is not exactly expensive, but for a smaller game targeting multiple platforms that is still a noticeable sum.

Many games have quite modest video needs. Perhaps they will just use the video player for a 30 second splash screen at the start of the game and nothing more. Paying $34,000 to get that on four platforms seems excessive.
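The arithmetic behind that figure is simple enough to put in a few lines. This sketch assumes the flat per-platform, per-game fee quoted above with no discounts of any kind; the constant and function names are invented for the example.

```python
BINK_FEE_PER_PLATFORM = 8500  # US dollars per platform per game, as quoted above


def bink_license_cost(platforms: int, games: int = 1) -> int:
    """Total fee for shipping `games` titles on `platforms` platforms each."""
    return BINK_FEE_PER_PLATFORM * platforms * games


print(bink_license_cost(4))  # prints 34000
```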

At Bitsquid our goal has always been to develop an engine that works for both big budget and small budget titles. This means that all the essential functionality of an engine (animation, sound, GUI, video, etc.) should be available to the licensees without any additional licensing costs (above what they are already paying for an engine). Licensees who have a special interest in one particular area may very well choose to integrate a special middleware package to fulfill their needs, but we don’t want to force everybody to do that.

So, in terms of video, this means that we want to include a basic video player without the $8500 price tag of Bink. That video player may not be as performant as Bink in terms of memory and processor use, but it should work well enough for anyone who just wants to play a full screen cutscene or splash screen when the CPU isn’t doing much else. People who want to play a lot of video in CPU taxing situations can still choose to integrate Bink. For them, the price and effort will be worth it.

Option 3: Platform specific

One approach to video playing is to not develop a platform-independent library but instead use the video playing capabilities inherent in each platform. For example, Windows has Media Foundation, MacOS has QuickTime, etc.

Using the platform’s own library has several advantages. It is free to use, even for proprietary formats, because the platform manufacturers have already paid the license fees for the codecs. (Note though, that for some formats you need a license not just for the player, but for the distribution of content as well.) The implementation is already there, even if the APIs are not the easiest to use.

The biggest advantage is that on low-end platforms, using the built-in platform libraries can give you access to special video decoding hardware. For example, many phones have built-in H.264 decoding hardware. This means you can play video nearly for free, something that otherwise would be very costly on a low-end CPU.

But going platform specific also has a lot of drawbacks. If you target many platforms you have your work cut out for you in integrating all their different video playing backends. It adds an additional chunk of work that you need to do whenever you want to add a new platform. Furthermore, it may be tricky to support the same capabilities on all different platforms. Do they all support the same codecs, or do you have to encode the videos specifically for each platform? Do all platforms support “play to texture” or can you only play the videos full screen? What about the sound? Can you extract that from the video and position it as a regular source that reverbs through your 3D sound world? Some platforms (e.g. Vista) have almost no codecs installed by default, forcing you to distribute codecs together with your content.

Since we are developing a generic engine we want to cover as many platforms as possible and minimize the effort required to move a project from one platform to another. For that reason, we need a platform-independent library as the primary implementation. But we might want to complement it with platform-specific libraries for low-end platforms that have built-in decoding hardware.
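One way such a setup could be organized is a common backend interface with a per-platform priority list. The sketch below is only an illustration of that idea: the class names, the can_play and play methods and the container/codec pairs each backend accepts are all assumptions made for the example, not a description of any actual engine code.

```python
from abc import ABC, abstractmethod


class VideoBackend(ABC):
    """Hypothetical interface every video backend would implement."""

    @abstractmethod
    def can_play(self, container: str, codec: str) -> bool: ...

    @abstractmethod
    def play(self, path: str) -> str: ...


class SoftwareBackend(VideoBackend):
    """Platform-independent decoder: the primary implementation."""

    def can_play(self, container, codec):
        return (container, codec) == ("webm", "vp8")  # placeholder format

    def play(self, path):
        return f"software decode: {path}"


class HardwareBackend(VideoBackend):
    """Stand-in for a platform library with built-in decoding hardware."""

    def can_play(self, container, codec):
        return (container, codec) == ("mp4", "h264")  # placeholder format

    def play(self, path):
        return f"hardware decode: {path}"


def choose_backend(backends, container, codec):
    """Return the first backend, in priority order, that supports the format."""
    for backend in backends:
        if backend.can_play(container, codec):
            return backend
    raise LookupError(f"no backend for {container}/{codec}")
```

On a low-end platform the list would be [HardwareBackend(), SoftwareBackend()], so the hardware path wins whenever it can handle the file. Everywhere else the list contains only the software backend, and game code never has to know which one it got.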

Option 4: H.264 (MPEG-4 AVC)

Over the last few years H.264 has emerged as the most popular commercial codec. It is used in Blu-ray players, video cameras, on iTunes, YouTube, etc. If you want a codec with good tool support and high quality, H.264 is the best choice.

However, H.264 is covered by patents. Patents that need to be licensed if you want to use H.264 without risking a lawsuit.

The H.264 patents are managed by an entity known as MPEG LA. They have gathered all the patents that they believe pertain to H.264 in a “patent pool” that you can license all at once, with a single agreement. That patent pool contains 1700 patents. Yes, you read that right. The act of encoding/decoding an H.264 file is covered by 1700 patents. You can find the list in all its 97-page glory at The Bitsquid blog.

#AltDevBlogADay

Niklas Frykholm