Paradoxically, most audio engines are deaf. They make decisions that will impact the audio output of a game without listening to what they are playing. The same can be said of sound tools: they will randomize, pitch up and down, loop and combine sounds, but they don’t really have any knowledge of the assets they manipulate. To follow up on my previous post “Putting the audio back in ‘audio programmer’”, this series of articles will examine how audio analysis algorithms can be used to create “smarter” (hopefully ;-)) audio engines and tools.
Analyze this!
In game audio, when we talk – rarely – about audio analysis, we often refer to the RMS (Root Mean Square), which gives an indication of the average loudness of a signal, or the FFT (Fast Fourier Transform), which allows us to examine that signal in the spectral domain. Both may be used to debug the audio during the mixing stage, and the FFT will sometimes also be used for music visualizers or as the basis of pitch detection algorithms in singing games (more on audio analysis for game design in an upcoming post).
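For reference, here is what computing the RMS can look like in C# (a minimal sketch; the class and function names are mine). In practice you would run it over short consecutive windows to get a loudness curve over time:

```csharp
using System;

static class Loudness
{
    // RMS of one buffer of samples. Run it over consecutive short windows
    // (e.g. 10 to 50 ms) to track loudness over time.
    public static double Rms(float[] samples)
    {
        double sumOfSquares = 0.0;
        foreach (float s in samples)
            sumOfSquares += (double)s * s;
        return Math.Sqrt(sumOfSquares / samples.Length);
    }

    // The same value expressed in dB relative to full scale (0 dBFS = 1.0).
    public static double RmsDb(float[] samples)
        => 20.0 * Math.Log10(Rms(samples) + 1e-12); // epsilon avoids log(0) on silence
}
```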
But besides the RMS, there are many other useful features you can extract from audio signals. For example, you can calculate the spectral flux, rolloff, spread, flatness and centroid. You can detect transients, evaluate pitch and amplitude envelopes, extract resonant modes, separate a signal into source and filter with LPC (Linear Predictive Coding) and so on… You can also go beyond the FFT and use more appropriate analysis functions. For example, the Constant-Q Transform is especially well suited for music analysis (e.g. chord detection), the MFCCs (Mel-Frequency Cepstral Coefficients) will give you a condensed representation of a sound – very useful to compare it with others – and the Goertzel algorithm is more efficient if you want to detect the presence of a single frequency with precision.
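To make that last one concrete, here is a small sketch of the Goertzel algorithm in C#. It evaluates a single DFT bin with a cheap recurrence, which is why it beats a full FFT when only one frequency matters:

```csharp
using System;

static class Goertzel
{
    // Returns the power of a single target frequency in a block of samples.
    // Much cheaper than a full FFT when only one bin is needed
    // (e.g. DTMF-style tone detection).
    public static double Power(float[] samples, double targetHz, double sampleRate)
    {
        int n = samples.Length;
        int k = (int)Math.Round(n * targetHz / sampleRate); // nearest DFT bin
        double coeff = 2.0 * Math.Cos(2.0 * Math.PI * k / n);

        double sPrev = 0.0, sPrev2 = 0.0;
        for (int i = 0; i < n; i++)
        {
            double s = samples[i] + coeff * sPrev - sPrev2;
            sPrev2 = sPrev;
            sPrev = s;
        }
        // Squared magnitude of the DFT bin closest to targetHz.
        return sPrev * sPrev + sPrev2 * sPrev2 - coeff * sPrev * sPrev2;
    }
}
```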
This is why I developed an audio feature extraction library at work. Without going into too much detail (I probably can’t anyway), there is a core library of DSP functions, on top of which feature extraction plug-ins can be built. Developers can either use the low-level math functions, use the provided plug-ins, or develop their own. There are also .NET controls allowing the visualization of waveforms, spectrograms, collections of resonant modes and so on. The main goal is to help with the research and development of audio algorithms for the runtime (e.g. analysis of the signal coming from the microphone or from one of the buses of the audio engine, intelligent mixing) and to create “smart” audio tools.
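I obviously can’t reproduce the actual code here, but to give you an idea, a plug-in architecture along these lines would do the trick (this interface is hypothetical, purely for illustration):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical plug-in contract: each feature extractor consumes one window
// of mono samples at a time and returns its value(s) for that frame
// (a pitch estimate, a spectral centroid, a set of MFCCs...).
public interface IFeatureExtractor
{
    string Name { get; }      // e.g. "Pitch", "SpectralFlux"
    int WindowSize { get; }   // analysis window length, in samples
    int HopSize { get; }      // step between consecutive windows, in samples

    float[] ProcessFrame(float[] window, float sampleRate);
}

public static class FeatureRunner
{
    // The host tool slides the extractor over the whole file and collects
    // one result per frame; this frame list is what gets cached and visualized.
    public static List<float[]> Analyse(IFeatureExtractor fx, float[] samples, float sampleRate)
    {
        var frames = new List<float[]>();
        for (int start = 0; start + fx.WindowSize <= samples.Length; start += fx.HopSize)
        {
            var window = new float[fx.WindowSize];
            Array.Copy(samples, start, window, 0, fx.WindowSize);
            frames.Add(fx.ProcessFrame(window, sampleRate));
        }
        return frames;
    }
}
```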
On that topic, let’s start with an easy way you can improve the workflow of your sound designers…
A picture is worth a thousand clicks
Imagine you have all these sound effects to design for a new game. What do you think will be one of the most time-consuming tasks you will be performing? Scripting? Adding random variations? Creating an amplitude envelope? Setting an EQ? Think again… A lot less glamorous and creative than audio processing, it is simply browsing and selecting samples. Unless some material has been recorded or synthesized specifically for a given asset, sound designers will indeed spend an awful lot of time browsing their sound effects library, selecting samples and listening to them, in order to find the best candidate for a particular sound, or the one that will be perfect to layer with another sample.
This can be quite a cumbersome process. Let’s say you are using a tool that only offers the regular “open file” dialog box. This is what you will get:
A list of names, and that’s it. You could be selecting anything, from your tax return to pictures of your vacation on Oahu, as long as it has a “.wav” extension. You will need to open a file, play it, close it, and then move on to the next one… Hopefully your in-house audio tool or your middleware will offer a window similar to the ones you can find in a DAW (Digital Audio Workstation), or you will actually use your DAW. For example, here you can see the windows used to open a file in both Sound Forge and Pro Tools. More information about the audio format of the file is displayed, and an auto-play feature limits the number of clicks needed to browse your entire collection…
Now, this is all very well when you have 5 samples to choose from, but what if you are looking for the perfect animal growl among two hundred of them? You will have to click on all these files and listen to all of them before being able to make a decision or to find what you were looking for. Hardly a creative endeavour… If you are using a sample database such as Netmix, you might be able to search by keyword, but you will still be dependent on other sound designers having tagged the sounds correctly. Also, you will not be able to find samples which have the same perceptual characteristics but totally different sources, and therefore different keywords (more on that in the next post).
Leveraging our audio feature extraction system, I developed a few things to help with sample selection. One of them is a small file browser which we can use as a regular “open file” dialog box from any of our C# tools. It can display a list of files like any conventional file selector, although you will notice that it already indicates the duration, sample rate, resolution and number of channels of the files, and has an autoplay feature.
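Displaying that extra information does not even require decoding the audio: it can be read directly from the file header. Something along these lines would do (a bare-bones RIFF/WAVE reader, assuming PCM data and skipping most error handling):

```csharp
using System;
using System.IO;

static class WaveInfo
{
    // Bare-bones RIFF/WAVE header reader: enough to list duration, sample
    // rate, bit depth and channel count without decoding any audio.
    public static (int SampleRate, int Bits, int Channels, double Seconds) Read(string path)
    {
        using var r = new BinaryReader(File.OpenRead(path));
        if (new string(r.ReadChars(4)) != "RIFF") throw new InvalidDataException("not a RIFF file");
        r.ReadUInt32(); // overall file size, unused here
        if (new string(r.ReadChars(4)) != "WAVE") throw new InvalidDataException("not a WAVE file");

        int sampleRate = 0, bits = 0, channels = 0, byteRate = 0;
        uint dataSize = 0;
        while (r.BaseStream.Position + 8 <= r.BaseStream.Length)
        {
            string chunkId = new string(r.ReadChars(4));
            uint chunkSize = r.ReadUInt32();
            long next = r.BaseStream.Position + chunkSize + (chunkSize & 1); // chunks are word-aligned
            if (chunkId == "fmt ")
            {
                r.ReadUInt16();                   // format tag (1 = PCM)
                channels = r.ReadUInt16();
                sampleRate = (int)r.ReadUInt32();
                byteRate = (int)r.ReadUInt32();
                r.ReadUInt16();                   // block align
                bits = r.ReadUInt16();
            }
            else if (chunkId == "data")
            {
                dataSize = chunkSize;
            }
            r.BaseStream.Position = next;
        }
        double seconds = byteRate > 0 ? dataSize / (double)byteRate : 0.0;
        return (sampleRate, bits, channels, seconds);
    }
}
```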
But you can also switch it to a thumbnail mode. In this mode, the waveforms of the various samples are displayed, giving you immediate insight into the type of sound: is it a single impact or are there a lot of audio events? How is the amplitude evolving? You can change the size of the waveforms to view more details or, on the contrary, to get an overview of a whole folder.
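Rendering those thumbnails is cheap. For each horizontal pixel, you only need the minimum and maximum samples of the corresponding slice of the file, something like this sketch:

```csharp
using System;

static class WaveformThumbnail
{
    // One (min, max) pair per pixel column is enough to draw a recognisable
    // waveform overview; these peak arrays are cheap to cache per file.
    public static (float Min, float Max)[] Peaks(float[] samples, int widthInPixels)
    {
        var peaks = new (float, float)[widthInPixels];
        int samplesPerPixel = Math.Max(1, samples.Length / widthInPixels);
        for (int x = 0; x < widthInPixels; x++)
        {
            int start = x * samplesPerPixel;
            int end = Math.Min(start + samplesPerPixel, samples.Length);
            float min = 0f, max = 0f;
            for (int i = start; i < end; i++)
            {
                if (samples[i] < min) min = samples[i];
                if (samples[i] > max) max = samples[i];
            }
            peaks[x] = (min, max);
        }
        return peaks;
    }
}
```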
More importantly – and that’s where audio analysis comes into play – you can also select any feature extraction plug-in to colour the waveforms (values are normalized at the folder level, and the results of the analyses and the waveform peaks are cached). Below you can see an example using the pitch detection plug-in, but it could be the spectral flux or any other feature of interest. For example, you could be browsing your music loops with the beat detection plug-in selected, in which case a loop with a low tempo would appear darker and a faster one lighter.
In the case of the picture above, the palette goes from black / dark blue for low pitch to yellow / white for high pitch. Therefore it is very easy – just by looking at your whole folder – to find a sound that corresponds to what you are looking for (e.g. a sample that starts with a high pitch which slowly decreases, a vocalization with a lot of vibrato), without having to listen to dozens or even hundreds of sample files…
Here you can see that “Camel_groan.wav” is lower in pitch than “Cat_angry_meow.wav”, which is itself lower than “Coyote_howl.wav”. As expected after watching countless spaghetti westerns, the pitch of the latter goes up quickly, stays relatively stable for a while, and then comes down again slowly. No animals were harmed during the analysis, by the way ;-).
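If you are curious about how such a colouring pass fits together, here is a rough sketch: normalize each frame’s value against the minimum and maximum found across the whole folder, then map it to a gradient. The palette maths below is my own approximation of a dark-blue-to-white ramp, not the tool’s actual gradient:

```csharp
using System;
using System.Linq;

static class FeatureColouring
{
    // folderMin/folderMax come from scanning every file in the folder first,
    // so that colours are comparable from one waveform to the next.
    public static (byte R, byte G, byte B)[] ColourFrames(float[] frameValues,
                                                          float folderMin, float folderMax)
    {
        float range = Math.Max(folderMax - folderMin, 1e-6f);
        return frameValues.Select(v =>
        {
            float t = Math.Clamp((v - folderMin) / range, 0f, 1f);
            // Crude black/dark blue (low) -> yellow -> white (high) ramp.
            byte r = (byte)(255 * t);
            byte g = (byte)(255 * t * t);                        // green lags red: orange/yellow mid-tones
            byte b = (byte)(64 * (1 - t) + 255 * t * t * t * t); // blue tint low, white at the top
            return (r, g, b);
        }).ToArray();
    }
}
```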
To be continued…
This opens the door to a lot of interesting things. For example, we could add a sorting button to rearrange the files based on some meta-feature such as “average pitch” or “pitch variation” (see the sketch below). We could also add a way to query sounds with a slightly higher tempo, or with a longer attack, than the one currently selected. Knowing more about your assets offers a lot of new opportunities to improve workflow, even when it is just about selecting samples. More about that, and about how to make sound effects databases more helpful (using neural networks), in the next post…
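To close, a flavour of that sorting idea: assuming the per-frame pitch values are already cached per file as described above, it pretty much boils down to a one-liner (names hypothetical):

```csharp
using System.Collections.Generic;
using System.Linq;

static class SmartBrowser
{
    // cachedPitch maps each file path to its cached per-frame pitch values.
    // "Average pitch" is just the mean of those frames; a "pitch variation"
    // sort could use their standard deviation instead.
    public static List<string> SortByAveragePitch(Dictionary<string, float[]> cachedPitch)
        => cachedPitch
            .OrderBy(kv => kv.Value.DefaultIfEmpty(0f).Average())
            .Select(kv => kv.Key)
            .ToList();
}
```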