Localization Pipeline

In my previous post on localization I talked about some of my experiences localizing games for different languages / regions. This time I wanted to expand upon those notes a little and talk more about the technical aspects of localization and walk through a pipeline.

The Language and Locale Encoding

In the early days I used to simply have an enumeration in a header file that was very similar to this:

     enum ELanguage
     {
          eLanguage_English,
          eLanguage_French,
          eLanguage_German,
          eLanguage_Spanish,
          eLanguage_Amount
     };

Twenty years ago this was fine: I was developing for a cartridge that had all the languages essentially loaded at once, and there was no need to support regions beyond the specific languages. These days we need something a little more robust and, as you should have picked up from my last post, the locale is very important. So let's start by looking at how we identify each translation. Thankfully two very useful standards have been defined by people who know a lot more about languages and regions than I do. These standards allow us to specify each language and each region as a two-letter code.

Language Code www.loc.gov/standards/iso639-2/php/code_list.php

Region Code www.iso.org/iso/country_codes/iso_3166_code_lists/country_names_and_code_elements.htm

Using these we can create a short code for every possible supported language and region we are likely to encounter, for example:

     en-US     English America
     en-GB     English Great Britain
     es-MX     Spanish Mexico
     nl-NL     Dutch Netherlands
     en-CA     English Canada
     fr-CA     French Canada
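As a minimal sketch of how such a code could be split back into its two parts at run-time, assuming the simple "ll-RR" shape shown above (the `LocaleCode` struct and `ParseLocaleCode` function are hypothetical names, not part of any engine described here):

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper type for illustration; not from the original post.
struct LocaleCode {
    std::string language; // two-letter language code, lower case ("en")
    std::string region;   // two-letter region code, upper case ("US")
};

// Parses codes shaped like "en-US". Returns std::nullopt for anything else.
std::optional<LocaleCode> ParseLocaleCode(const std::string& code) {
    if (code.size() != 5 || code[2] != '-') {
        return std::nullopt;
    }
    LocaleCode result;
    for (std::size_t i = 0; i < code.size(); ++i) {
        if (i == 2) continue; // skip the '-' separator
        const unsigned char c = static_cast<unsigned char>(code[i]);
        if (!std::isalpha(c)) {
            return std::nullopt;
        }
        if (i < 2) {
            result.language += static_cast<char>(std::tolower(c));
        } else {
            result.region += static_cast<char>(std::toupper(c));
        }
    }
    return result;
}
```

Keeping the two halves separate makes it easy to fall back from a region-specific translation to a language-wide one if you ever need to.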

A Pipeline

This is by no means the only pipeline that can be used for localization. They all have different benefits and issues; this one just happens to be my preference, probably because I like offline tools.

I have seen the storage and manipulation of localized strings done in every way possible, from databases to proprietary editing tools. My personal choice is to use Excel for editing and manipulation, but this comes with two issues that you should be aware of:

Although version control software is generally fairly good at merging XML files, the XML generated from Excel always seems to make merging difficult (especially for designers), to the point that it is safest to simply lock the file while it is being edited.
Not all translators like to work in Excel, so you will probably need someone or a tool (probably both) to convert whatever format the translators are working in to the Excel format.

An example of the strings in Excel:

Column A contains the identifier string, then each column along contains one translation. Notice the encoding id at the top of the sheet: this not only tells us the language / region but is also used by the exporter tool to know which files to generate. The export tool exports the data into whatever binary compressed format you prefer to use in-game. Since I have not worked on a game with massive amounts of text, I have generally stuck to a text format with each language being written out as a separate text file, like so:

en-US.lang

     PRESS_START=Press [Start]
     OPTIONS=Options
     MUSIC=Music
     …

fr-FR.lang

     PRESS_START=Appuie sur [START]
     OPTIONS=Options
     MUSIC=Musique
     …

Depending on which language / locale is required at run-time just that single translation file is loaded into memory.
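A loader for this kind of file can be very small. The following is only a sketch, assuming one `KEY=Value` pair per line exactly as in the listings above; `StringTable`, `Trim` and `LoadLanguage` are illustrative names, and it reads from any `std::istream` so it can be fed a file or a memory buffer:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical table type: string identifier -> translated text.
using StringTable = std::unordered_map<std::string, std::string>;

// Removes leading and trailing spaces, tabs and carriage returns.
static std::string Trim(const std::string& s) {
    const char* whitespace = " \t\r";
    const std::size_t first = s.find_first_not_of(whitespace);
    if (first == std::string::npos) {
        return "";
    }
    const std::size_t last = s.find_last_not_of(whitespace);
    return s.substr(first, last - first + 1);
}

// Reads KEY=Value lines. Lines without '=' or with an empty key are skipped.
StringTable LoadLanguage(std::istream& in) {
    StringTable table;
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t equals = line.find('=');
        if (equals == std::string::npos) {
            continue;
        }
        const std::string key = Trim(line.substr(0, equals));
        if (key.empty()) {
            continue;
        }
        // Everything after the first '=' is the text, so the text itself may contain '='.
        table[key] = Trim(line.substr(equals + 1));
    }
    return table;
}
```

At run-time you would open the one file that matches the active locale (for example with `std::ifstream`) and pass it to `LoadLanguage`.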

The exporter tool can also be useful in other ways:

Automatically detect and report missing strings.
Build fonts based upon the characters that are actually used. This is very important if you are doing Chinese, which has thousands of characters; this method alone has been known to save megabytes of texture space.
Detect formatting mistakes and illegal / reserved characters.
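Two of these checks can be sketched in a few lines. This is not the actual exporter, just an illustration under some assumptions: the tables are already in memory, the text is well-formed UTF-8, and the function names are made up for the example.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical table type: string identifier -> text for one language.
using StringTable = std::map<std::string, std::string>;

// Returns identifiers present in `reference` but absent or empty in `translation`.
std::vector<std::string> FindMissingStrings(const StringTable& reference,
                                            const StringTable& translation) {
    std::vector<std::string> missing;
    for (const auto& entry : reference) {
        const auto found = translation.find(entry.first);
        if (found == translation.end() || found->second.empty()) {
            missing.push_back(entry.first);
        }
    }
    return missing; // in key order, because std::map iterates sorted
}

// Collects every Unicode code point used by a table, e.g. to decide which
// glyphs a font texture needs. Assumes well-formed UTF-8 input.
std::set<char32_t> CollectCodePoints(const StringTable& table) {
    std::set<char32_t> used;
    for (const auto& entry : table) {
        const std::string& s = entry.second;
        for (std::size_t i = 0; i < s.size();) {
            const unsigned char lead = static_cast<unsigned char>(s[i]);
            std::size_t length = 1;
            char32_t codePoint = lead;
            if (lead >= 0xF0)      { length = 4; codePoint = lead & 0x07; }
            else if (lead >= 0xE0) { length = 3; codePoint = lead & 0x0F; }
            else if (lead >= 0xC0) { length = 2; codePoint = lead & 0x1F; }
            // Fold in the low six bits of each continuation byte.
            for (std::size_t k = 1; k < length && i + k < s.size(); ++k) {
                codePoint = (codePoint << 6) |
                            (static_cast<unsigned char>(s[i + k]) & 0x3F);
            }
            used.insert(codePoint);
            i += length;
        }
    }
    return used;
}
```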

Strings in Code / Scripts

In order to provide a framework for localization the first thing that needs to be cracked down upon is the use of strings themselves.
Previously you might have written the code or script:

     DrawString("Hello World, My name is Mr Flibble.");

Instead it should now be written passing a string identifier, like so:

     DrawString( eStringId_HelloMessage );

The string enumerations can be auto-generated by the export tool; however, I did this for a couple of projects and decided that it was more hassle than it was worth. My recommendation is to avoid this if possible. A better way is to pass the string identifier as a string itself:

     DrawString( "Hello_Message" );

Either way both methods would end up looking into a table to find the specific string to be displayed.
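The lookup itself might be sketched like this. The `Localize` name and the behaviour on a missing identifier (returning the identifier wrapped in `#` so the gap is obvious on screen) are assumptions for the example, not something the pipeline above prescribes:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical table type: string identifier -> translated text.
using StringTable = std::unordered_map<std::string, std::string>;

// Returns the translated text for `id`. If the identifier is unknown the
// identifier itself is returned between '#' marks, so that a missing string
// stands out during testing instead of silently drawing nothing.
std::string Localize(const StringTable& table, const std::string& id) {
    const auto found = table.find(id);
    if (found != table.end()) {
        return found->second;
    }
    return "#" + id + "#";
}
```

A `DrawString` that takes an identifier would call something like this first and then render the result.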

Encoding

There are quite a few encoding systems for text out there. Since this ground has been covered many times in other posts I'll skip it here, with only a note that for game development my take on the subject is: if you are working with limited memory use UTF-8, otherwise use UTF-16.

Icons

More often than not it is far simpler to insert an icon into a string than it is to use a long drawn out explanation to describe something. In the text string I indicate where an icon is to be displayed and which one by using the [] markers, for example:

     Press [START] to continue.
     Activate [GEM] by pulling string.

Part of my text rendering manager loads a setup file (text again) at startup that contains a list of all the codes and the textures to use when that icon is encountered, very similar to this:

     START, 0, X360_StartButton.tga
     MOVESTICK, 0, X360_LS.tga

I can add additional textures on the line if I want to animate the icon, for example:

     DODGE, 4, Wii_RemoteWave_1.tga, Wii_RemoteWave_2.tga, Wii_RemoteWave_3.tga

The number after the code is the animation speed (FPS).
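Parsing one line of that setup file could look something like the sketch below. It assumes the comma-separated layout shown above (code, FPS, then one or more texture names); `IconEntry` and `ParseIconLine` are hypothetical names for illustration.

```cpp
#include <exception>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one line of the icon setup file.
struct IconEntry {
    std::string code;                  // e.g. "DODGE"
    int framesPerSecond = 0;           // animation speed, 0 for a static icon
    std::vector<std::string> textures; // one texture per animation frame
};

// Removes leading and trailing spaces, tabs and carriage returns.
static std::string Trim(const std::string& s) {
    const char* whitespace = " \t\r";
    const std::size_t first = s.find_first_not_of(whitespace);
    if (first == std::string::npos) {
        return "";
    }
    const std::size_t last = s.find_last_not_of(whitespace);
    return s.substr(first, last - first + 1);
}

// Parses "CODE, FPS, texture[, texture...]". Returns std::nullopt when the
// line has fewer than three fields or the FPS field is not a number.
std::optional<IconEntry> ParseIconLine(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ',')) {
        fields.push_back(Trim(field));
    }
    if (fields.size() < 3 || fields[0].empty()) {
        return std::nullopt;
    }
    IconEntry entry;
    entry.code = fields[0];
    try {
        entry.framesPerSecond = std::stoi(fields[1]);
    } catch (const std::exception&) {
        return std::nullopt; // FPS field was empty or not numeric
    }
    entry.textures.assign(fields.begin() + 2, fields.end());
    return entry;
}
```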

On the subject of icons, consider this:

     "Use [RS] to aim and [RT] to mark enemy before pressing [A] to fire."

Imagine that your project is multi-platform: [A] should really be [X] on the PS3 and [B] on the Wii. An additional issue is that the Wii doesn't generally have an [RS]! You could create a string unique to each platform, but that would just double or triple the amount of data that needs to be maintained as and when things change.

My solution in the past to this little nightmare has been to ban platform-specific icon names, which includes identifiers like [D-PadLeft], [A], [LeftStick], [X], [Y], [Z], [RT], [L1], etc. Instead I encourage game-descriptive text:

     "Use [TARGETTING] to aim and [TARGETREGISTER] to mark enemy before pressing [FIRE] to fire."

Then I have a different icon setup file for each platform and everything works between platforms without any major headaches.
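On the rendering side, the string first has to be broken into runs of plain text and icon codes. A minimal sketch, assuming `[` and `]` never appear as literal characters in the text (the `TextSegment` and `SplitIcons` names are made up for this example):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One run of a display string: either literal text or an icon code.
struct TextSegment {
    bool isIcon;      // true when `text` holds an icon code such as "FIRE"
    std::string text; // literal text, or the code between '[' and ']'
};

// Splits "Press [FIRE] to fire." into text / icon / text segments.
// An unmatched '[' is treated as ordinary text.
std::vector<TextSegment> SplitIcons(const std::string& s) {
    std::vector<TextSegment> segments;
    std::size_t pos = 0;
    while (pos < s.size()) {
        const std::size_t open = s.find('[', pos);
        const std::size_t close = (open == std::string::npos)
                                      ? std::string::npos
                                      : s.find(']', open + 1);
        if (close == std::string::npos) {
            segments.push_back({false, s.substr(pos)}); // no more icons
            break;
        }
        if (open > pos) {
            segments.push_back({false, s.substr(pos, open - pos)});
        }
        segments.push_back({true, s.substr(open + 1, close - open - 1)});
        pos = close + 1;
    }
    return segments;
}
```

Each icon segment is then looked up in whichever icon setup file was loaded for the current platform, so the string data never needs to know which controller is in use.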

Parameters

It’s quite common to construct a string for displaying on screen; however, it can cause issues for the translators if they don’t know the context. Consider this:

     DrawString("%s! Get rid of them!", m_PlayersName );

Now you can see straight away that the %s will be replaced with the player's name; however, what the translators see is:

     “%s! Get rid of them!”

Their best guess might be that it is going to be the name of a character, but it might also be something else, e.g.:

     “Chairs! Get rid of them!”
     “Michael! Get rid of them!”

In order to help the translators I use {} to mark parameters:

     “{s-Name}! Get rid of them!”
     “{s-Object}! Get rid of them!”

The context after the '-' is ignored when rendering the text; it is purely descriptive text to help the translators.
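A rough sketch of how such markers might be expanded at run-time follows. It handles string parameters only and fills them in order of appearance; the `ExpandParameters` name and the choice to leave a marker untouched when no argument is supplied are assumptions for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replaces each {type-Context} marker with the next argument, in order.
// The text between '{' and '}' is not interpreted here; it only serves the
// translators. Markers without a matching argument are left as they are.
std::string ExpandParameters(const std::string& text,
                             const std::vector<std::string>& arguments) {
    std::string out;
    std::size_t nextArgument = 0;
    std::size_t pos = 0;
    while (pos < text.size()) {
        const std::size_t open = text.find('{', pos);
        const std::size_t close = (open == std::string::npos)
                                      ? std::string::npos
                                      : text.find('}', open + 1);
        if (close == std::string::npos) {
            out += text.substr(pos); // no further markers
            break;
        }
        out.append(text, pos, open - pos); // literal text before the marker
        if (nextArgument < arguments.size()) {
            out += arguments[nextArgument];
        } else {
            out.append(text, open, close - open + 1); // keep the marker visible
        }
        ++nextArgument;
        pos = close + 1;
    }
    return out;
}
```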

Formatting

As I’m sure we are all aware by now, not everyone writes the date in the same way.

Consider the date 3/4/2012: to me personally this is 3rd April 2012, however to some it is 4th March 2012. Obviously once the day of the month is past the 12th it becomes a lot easier to spot, but it does mean that your region needs to know which date format to use.
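In code this can be as simple as choosing a field order from the region half of the locale code. The sketch below is illustrative only: the region table is a tiny subset, the default of day/month/year is an assumption, and real projects would normally lean on the platform's own locale services where they exist.

```cpp
#include <string>

enum class DateOrder { DayMonthYear, MonthDayYear, YearMonthDay };

// Illustrative subset only: the United States writes month first and Japan
// writes year first; every other region falls back to day first here.
DateOrder DateOrderForRegion(const std::string& region) {
    if (region == "US") return DateOrder::MonthDayYear;
    if (region == "JP") return DateOrder::YearMonthDay;
    return DateOrder::DayMonthYear;
}

// Formats a numeric date with '/' separators in the requested field order.
std::string FormatDate(int day, int month, int year, DateOrder order) {
    const std::string d = std::to_string(day);
    const std::string m = std::to_string(month);
    const std::string y = std::to_string(year);
    switch (order) {
        case DateOrder::MonthDayYear: return m + "/" + d + "/" + y;
        case DateOrder::YearMonthDay: return y + "/" + m + "/" + d;
        case DateOrder::DayMonthYear: break;
    }
    return d + "/" + m + "/" + y;
}
```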

Translators

Good translators should produce strings in the new language that are roughly the same length as the original string. I usually estimate a rough 20% difference between the English and other languages. This is another useful feature I have built into my tool: it can detect excessive differences between the lengths of the various translations.
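A length check of this kind might be sketched as follows, assuming UTF-8 text and measuring length in characters rather than bytes (the function names and the default of 20% are illustrative; a real tool might measure rendered pixel width instead):

```cpp
#include <cstddef>
#include <string>

// Counts characters in a UTF-8 string by skipping continuation bytes
// (those of the form 10xxxxxx).
std::size_t CountCharacters(const std::string& utf8) {
    std::size_t count = 0;
    for (const char c : utf8) {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// True when the translation's length differs from the source length by more
// than `tolerancePercent` of the source length.
bool IsLengthSuspicious(const std::string& source,
                        const std::string& translation,
                        std::size_t tolerancePercent = 20) {
    const std::size_t a = CountCharacters(source);
    const std::size_t b = CountCharacters(translation);
    const std::size_t difference = (a > b) ? (a - b) : (b - a);
    // Integer arithmetic avoids floating point rounding at the boundary.
    return difference * 100 > a * tolerancePercent;
}
```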

Translators MUST NOT change the order of parameters in a translation. This is obvious from a programming standpoint, but I have in the past had translations that not only re-ordered the parameters but added additional ones as well!
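This is another check that can be automated. As a sketch, assuming the `{type-Context}` markers described earlier, a tool can compare the sequence of parameter types in the source and the translation. The function names are hypothetical, and the `d` type used in the usage note is an assumption, since only `s` appears in the examples above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the type prefix of each {type-Context} marker, in order of
// appearance. The context after '-' is dropped because translators may
// legitimately reword it.
std::vector<std::string> ExtractParameterTypes(const std::string& text) {
    std::vector<std::string> types;
    std::size_t pos = 0;
    while (true) {
        const std::size_t open = text.find('{', pos);
        if (open == std::string::npos) break;
        const std::size_t close = text.find('}', open + 1);
        if (close == std::string::npos) break;
        const std::string body = text.substr(open + 1, close - open - 1);
        types.push_back(body.substr(0, body.find('-')));
        pos = close + 1;
    }
    return types;
}

// True when both strings use the same parameter types in the same order.
bool ParametersMatch(const std::string& source, const std::string& translation) {
    return ExtractParameterTypes(source) == ExtractParameterTypes(translation);
}
```

For example, `ParametersMatch("{s-Name} has {d-Count} gems", "{d-Nombre} gemmes pour {s-Nom}")` returns false because the order has changed.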

Keep the number of communication layers between you and the translators to a minimum. There have been times when a 5-10 minute email or phone call could have solved a problem, but because it had to be filtered through channels it ended up taking days or even weeks to sort out.

Assets

Asset management for localization has the potential to touch so many different moving parts of an engine that it very quickly stops being funny. The solution I describe is tailored to the way my engine works and may not be applicable to how your tech works; still, you might find it useful.

When an asset is requested, my manager scans a list of directories for it, and the first instance it finds is the file that gets loaded. By controlling which directories are in the list and their order I can in effect override assets according to the language and region.

This is my directory structure for localization:

If the requested audio file exists in the specific language directory, that file will be loaded; if it doesn’t exist, the manager will carry on searching the other directories until it finds the asset. Obviously I don’t allow the player to change languages halfway through a game.
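The search itself can be sketched as below. This is not the engine's asset manager: `ResolveAsset` is a made-up name, the directory names in the usage note are hypothetical, and the file-existence check is passed in as a function so the example does not depend on any particular file system API.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Returns the path of the first directory in `searchDirectories` that
// contains `assetName`, or std::nullopt if none does. Directories earlier in
// the list override later ones.
std::optional<std::string> ResolveAsset(
    const std::vector<std::string>& searchDirectories,
    const std::string& assetName,
    const std::function<bool(const std::string&)>& fileExists) {
    for (const std::string& directory : searchDirectories) {
        const std::string candidate = directory + "/" + assetName;
        if (fileExists(candidate)) {
            return candidate;
        }
    }
    return std::nullopt;
}
```

With a list such as `{"Audio/fr-CA", "Audio/fr", "Audio/Common"}` (hypothetical names), a French Canadian recording is used when it exists and the common asset is used otherwise.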

It’s simple but it works.

Finally

That’s everything I wanted to talk about in relation to localization; I hope you find it useful.

#AltDevBlogADay

Michael A. Carr-Robb-John