Context: where we use this
“Cultural Text” handles rendering text into images for display on-screen, use in textures, or export as a resource file. This set of services correctly formats different languages, cultural styles, and font features such as color, strikethrough, indent, or shadowing. It also encompasses managing sentence layout, glyph and line spacing, word-wrapping to borders, and the legibility tradeoffs of anti-aliasing at a given pixel granularity.
Generally, this is the classic “render text” functionality found in most video-game engines or provided by libraries like FreeType, with support for Text-FX to display richly formatted layouts.
Goals: what we need
- An API to render text into arbitrary destinations with pixel-perfect consistency across different platforms & screen sizes
- Can combine multiple glyphs on top of each other to form a single character (needed for some languages)
- Can output with colored gradients, shadows, and other visual effects based on a markup encompassing 1 or more characters
- Supports word wrapping, skipping ahead to a fixed column, and directional ‘justify’ modes
- Can handle a full screen of legible text with rapidly changing values without degrading performance
- Can support fixed-width as well as variable-width layout
- Can reorient to render text in directions other than the usual flow (such as English letters arranged vertically)
- Supports Logging (to the standard console as well as HTML)
- Provides user-controlled (not hard-coded) ‘variable precision’ for decimal numbers and for time and date formats
- Can display two different languages side by side (helpful for in-app language translation) and, more importantly, allows using rectangles from other images to embed icons, emoticons, avatar pics, and camera viewport images
- Text can be transformed: moved, bent, squished, and so on
- Has a method to import data from existing font files
Solutions: how we tried
Technique: Letter-Only-Blitter aka ‘Lob’
Originally we looked at some of the available font engines, but none met all of our platform needs, so we decided to build our own. We built a gridded font texture for ASCII characters and generated the used subset for Japanese (hiragana, katakana, & ~500 kanji). It was used on games in the mid-1990s and on the PSOne, so there was little text on-screen compared to these 2560×1600 modern times. The process converted UTF-16 characters into a texture page and rectangle index for the characters themselves. We stored a flag in the top bit of these ‘characters’ to act as an escape that triggered offsetting the top vertices (italics) or upscaling (low-quality ‘bold’).
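The escape-bit packing described above can be sketched as follows. This is a minimal illustration: the bit layout, table contents, and function names are assumptions, not the engine’s actual format.

```python
# Hypothetical sketch of Lob's character-to-rectangle mapping: a 16-bit
# packed glyph reference whose top bit is an escape for a style change.

ITALIC_FLAG = 0x8000  # top bit: offset the top vertices (italics)

# Hypothetical atlas table: code point -> (texture_page, rect_index).
GLYPH_ATLAS = {ord('A'): (0, 33), ord('a'): (0, 65), ord('あ'): (1, 0)}

def encode_glyph(ch, italic=False):
    page, rect = GLYPH_ATLAS[ord(ch)]
    packed = (page << 10) | rect  # assumed split: upper bits page, 10-bit rect
    return packed | (ITALIC_FLAG if italic else 0)

def decode_glyph(packed):
    italic = bool(packed & ITALIC_FLAG)
    packed &= ~ITALIC_FLAG
    return packed >> 10, packed & 0x3FF, italic  # (page, rect, italic)
```

Keeping the style flag inside the character stream is what let Lob restyle runs of text without a separate markup pass.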
Pros: Ran reasonably fast with hardware blitters or CPU software copying. Allowed real-time typing and editing of the Text-FX, as in many WYSIWYG editors.
Cons: Could only handle left-to-right layout. Only supported English, Spanish, French, German, and Japanese (these were a fixed enumeration), and further languages would have required a lot of table adjustments and possibly other coding. Required artists to fill in ‘rectangle-text-files’ and build the fonts manually (no font-file extraction), which was painful. Runs terribly slowly on systems that have a high penalty per draw call (the modern era). Japanese used a lot of memory, which required the font to be scaled down and resulted in unsatisfyingly blurry text at the time. Even now, compositing the characters from radicals could save meaningful space and deliver a broader range.
Technique: Word-Particles aka ‘Wopa’
This technique was spawned primarily by the batching issues (cost per draw call) found in the Lob system. The idea was to have two systems: one that rendered words into power-of-two-sized rectangles inside large textures, and one that composited sentences to their destination. We initially saw a big speed boost on systems where batching mattered, but the code became complex in order to handle optimal fitting of the words into the ‘recently used words’ textures. We had to handle a downsizing approach when too many unique words were needed at once. It did, however, allow us to render Japanese at a much higher resolution than Lob did. We used a fast-hashing scheme to identify which words were stored in which rectangle/texture. We were forced to constrain the Text-FX approach to be per-word, which affected color tints, italic tilts, and shadowing effects but mostly didn’t limit our artists’ goals.
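The ‘recently used words’ bookkeeping can be sketched as an LRU map from words to texture rectangles. The class name, capacity, and slot-allocation scheme below are hypothetical simplifications; the real system hashed words into large textures and also handled downsizing under pressure.

```python
# Hypothetical sketch of Wopa's word cache: each unique word owns a
# rectangle in a shared texture until it is evicted as least recently used.
from collections import OrderedDict

class WordCache:
    def __init__(self, capacity=4):
        self.capacity = capacity          # max word-rectangles per texture
        self.slots = OrderedDict()        # word -> rect index, in LRU order
        self.free = list(range(capacity)) # unassigned rectangle indices
        self.rasterized = 0               # counts expensive re-rasterizations

    def rect_for(self, word):
        if word in self.slots:            # cache hit: reuse the rectangle
            self.slots.move_to_end(word)
            return self.slots[word]
        self.rasterized += 1              # cache miss: rasterize the word
        if not self.free:                 # texture full: evict LRU word
            _, rect = self.slots.popitem(last=False)
            self.free.append(rect)
        self.slots[word] = self.free.pop()
        return self.slots[word]
```

The stutter noted in the Cons below corresponds to a burst of misses here: every evicted-and-refetched word costs a rasterization and a texture update.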
Pros: Automated tools generated the texture information from images. Avoided batching-woes. Regular text browsing, such as in ‘help’ menus or information updates, went very quickly.
Cons: On systems that didn’t support render-to-texture, speed suffered due to poor Copy_Pixels_from_Screen_to_Texture or Texture_Upload times (when we rasterized the words on the CPU). Rapid number updates could cause stutters when the MRU-word textures became overloaded.
Technique: Just use the OS aka ‘Juto’
After struggling to properly handle Arabic, Thai, Hindi, Hebrew, and the various languages using Chinese characters, we decided to use the native OS capabilities to composite text into a buffer, then upload that to the GPU for rendering or format it for export. This approach allowed us to skip many of the complexities that had cost so much time in QA. As this happened before Microsoft’s DirectX “DirectWrite” API, we used GDI+ on Windows, FreeType on Linux & BREW cell phones, and Cocoa on OS X.
Pros: Most of the foreign language single-word issues were handled correctly. We could reach a broader audience and support translators easily.
Cons: It was costly per draw call & update. Adding features like underlines or colored letters became very complicated due to tracking various sizes and issues with the OS allocating buffers (not FreeType, however). Foreign-language paragraphs still had a lot of complexity and required different per-platform coding responses. Most of the Text-FX features were inconsistent from platform to platform.
Survivor: who proved best & why
Technique: Cached lines of Variable-Interval-Composites aka ‘Clovic’
Clovic is an outgrowth of Wopa that relies on caching entire lines instead of words. It uses a simple string sorting/matching approach to determine what text is on what line. Each ‘text-texture’ is broken into a series of lines. Each line is packed using half powers of two, such as 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 64, etc., which gives better coverage than the previous power-of-two W×H rectangles. We convert UTF-8 directly into texture/rectangle references as before, but allow mixed-language compositing to support icon-based values, such as the signal or battery-life indicators on your phone. The Text-FX have been unified with the regular “render things into a viewport” visualizer language, so each ‘cache-line’ of text can support all of the visuals (blur out, HDR, movement) of any regular 3D scene. For speed purposes, we update each of these lines at a slower rate than the main display rate. Values like health or location coordinates, which may change rapidly, seem acceptable to update at 10fps instead of 30 or 60.
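The half-power-of-two bucketing can be sketched as a rounding function that interleaves the powers of two with their 1.5× midpoints; the function name is an assumption, and real packing would apply this per line dimension.

```python
# A minimal sketch of Clovic's "half powers of two" bucket sizes:
# round a requested line length up to the nearest value in the series
# 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 64, ...
# Twice as many bucket sizes as plain powers of two means the worst-case
# wasted space per cached line drops from ~50% to ~33%.

def half_pow2_bucket(n):
    size = 2
    while size < n:
        if size & (size - 1) == 0:   # size is a power of two: step by 1.5x
            size = size * 3 // 2     # 2 -> 3, 4 -> 6, 8 -> 12, ...
        else:                        # size is 3 * 2^k: step up by 4/3
            size = size * 4 // 3     # 3 -> 4, 6 -> 8, 12 -> 16, ...
    return size
```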
Pros: Simplified the per-word fitting schemes of Wopa. Easier to handle multi-cultural language layouts with a per-line (aka continuous-run) approach to kerning/spacing issues. Makes true bold or outlining (using a bloom filter, not rescaling) much cheaper and more accurate.
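The ‘true bold’ idea, thickening strokes by spreading coverage rather than rescaling the bitmap, can be approximated with a simple dilation pass. This toy 1-bit version and its tiny ‘T’ glyph are illustrative assumptions, not the engine’s actual filter.

```python
# Hypothetical sketch of bold-by-dilation: every set pixel in a 1-bit glyph
# bitmap also sets its 4-neighbors, thickening all strokes uniformly
# (unlike upscaling, which distorts proportions).

def dilate(bitmap):
    h, w = len(bitmap), len(bitmap[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if bitmap[y][x]:
                for dy, dx in ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        out[ny][nx] = 1
    return out
```

An outline then falls out cheaply: dilate, and keep only the pixels that were not set in the original glyph.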
Cons: Hard to tune memory use; requires overestimating the amount of text needed. Current anti-aliasing approaches (MLAA, FXAA) are not well suited, and aliasing artifacts are apparent.
Future: It would be good to automate a method to show the progressive ordering of handwriting ‘strokes’, which are important to know well in many languages, especially any employing Chinese characters. This stroke approach could make for interesting visual effects as well as the obvious educational aspects. There can also be value in providing a mechanism to displace the rendered text into 3D shapes (likely a height field where height 0 is an edge) or back into vectors for SVG support. Mostly, the future should hold more robust versions of different languages and the interesting nuances of rendering messages correctly.
(Lamely, the image below, made with Lob, has been JPG’d, so some text is blurred…)
(Here text is composited from other views)