The accidental template language

I’ve often found myself in situations where I have textual input in some form or another, usually human readable. It could be simply configuration or metadata, or even levels for your game (a couple of years ago I made a game where the levels were drawn in text files, much like ASCII art). At some point it becomes critical to make these documents tweakable, and you end up with two options: you either create a script which outputs the document you want step by step, or you turn to templates. While technically equivalent, there’s usually a substantial difference in mindset (a template is a document with variables and code built into it, rather than the other way around). In the beginning perhaps you only need some variable substitutions, so you turn to your trusty sed, or perl, and some regular expressions.

“I know, I’ll use regular expressions.” Now you have two problems.

I’ve seen this happen many times (yours truly has been responsible for a fair share too), and it often is quite sufficient. I actually kind of like regular expressions, for those quick and dirty jobs. However, as soon as it starts to be a little bit more than just a quick and dirty job, the problems start piling up.

Template languages

So we turn to a template language. There are a bunch of template languages and engines out there, most of them intended for the web. I suppose the most well known examples are

Wikipedia naturally has a whole list on the subject.

There’s also a large set of template languages written for various CMSs, and these are actually often built using one of the above. What’s up with that? Why write a template language using another template language, seriously? But I digress.

Enter Tcl

Tcl (Tool Command Language) resembles shell scripting languages in many ways. Each command is one line made up of multiple strings (you can separate multiple ‘lines’ using semicolons, and escape newlines using a backslash, as in any other shell scripting language). Then you have the substitutions (variable and command substitutions), escaping, and argument splitting. Where Tcl differs from most other shell scripting languages, though, is that the argument splitting is done before any substitutions, which means you don’t have to worry about quoting those pesky whitespace-containing variables; they’ll still be part of the same argument.

What’s even more interesting is that there are no keywords and no control flow constructs in the language itself; it’s all made up of commands. As an example, ‘while’ is a command taking two arguments: the expression to test and the loop body to execute. This is where quoting becomes important. There are two kinds of quoting, double quotes and braces. The former is subject to substitutions and the latter isn’t, which is why braces are used to pass code blocks into commands like ‘while’ or ‘if’. Braces are used for the conditional expressions as well. Why? Because to the Tcl interpreter these are ordinary command arguments, and it would otherwise perform the substitutions before the call, handing the command a static snapshot of the expression as it was at the time the call was made. The parser is ‘smart’ and tracks the number of open braces, allowing you to nest them.
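To make the splitting and quoting rules a bit more concrete, here is a small sketch. The `count_args` helper is made up for this illustration; everything else is plain Tcl.

```tcl
# Hypothetical helper: reports how many arguments it received.
proc count_args {args} { return [llength $args] }

set name "two words"

# Splitting happens before substitution, so $name stays a single argument
# whether or not it is quoted.
set unquoted [count_args $name]
set quoted   [count_args "$name"]

# Braces defer substitution: 'while' receives the text {$i < 3} and
# re-evaluates it on every iteration.
set i 0
while {$i < 3} { incr i }

# Double quotes substitute, braces do not.
set greeting "Hello $name"
set literal  {Hello $name}

puts "unquoted=$unquoted quoted=$quoted i=$i"
puts $greeting
puts $literal
```

Had the loop condition been written as "$i < 3" in double quotes, ‘while’ would have received the constant string ‘0 < 3’ and never terminated.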

Then there’s command substitution, which is demarcated by square brackets and may be present within double quotes. The parser is smart here as well, and double quotes within the square brackets will not end the surrounding string.
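A minimal sketch of command substitution inside a double-quoted string, with a nested pair of double quotes inside the brackets (the variable names are just for illustration):

```tcl
set items {apple banana cherry}

# The ", " inside the brackets belongs to the 'join' command; it does not
# terminate the outer double-quoted string.
set msg "Count: [llength $items], joined: [join $items ", "]"
puts $msg
```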

If you want to make your own control flow constructs, have a look at the ‘upvar’ and ‘uplevel’ commands, which allow you to make use of higher stack frames, like executing a code block in the frame where the call was placed. I suppose this is what makes Tcl so popular for making domain specific languages.
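As a rough sketch of what that can look like, the two procedures below are made-up examples: ‘repeat’ runs a code block in the caller’s frame using ‘uplevel’, and ‘swap’ reaches into the caller’s variables using ‘upvar’. This simple version of ‘repeat’ makes no attempt to handle ‘break’ or ‘continue’ inside the body.

```tcl
# Hypothetical control construct: run 'body' 'count' times in the caller's frame.
proc repeat {count body} {
    for {set n 0} {$n < $count} {incr n} {
        uplevel 1 $body
    }
}

# Hypothetical helper: swap two variables that live in the caller's frame.
proc swap {aName bName} {
    upvar 1 $aName a $bName b
    set tmp $a
    set a $b
    set b $tmp
    return
}

set total 0
repeat 4 { incr total 10 }

set x left
set y right
swap x y

puts "total=$total x=$x y=$y"
```

Because the body runs via ‘uplevel’, ‘incr total 10’ modifies the caller’s ‘total’ rather than a variable local to ‘repeat’.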

Tcl also offers a very lightweight C API for easy integration, as we shall see at the end of the post, for those who are interested.

Tcl templates

I wanted to use Tcl’s substitution phase, and let it pass over an entire document. What I wanted to achieve was to have a template like this

 
    [doctype xhtml 4.0 transitional]
    <html>
      <head>
        <title>$title</title>
      </head>
      <body>
        [allowed XYZ {<a href="?XYZ">XYZ</a>}]
        [render content]
      </body>
    </html>

I have no idea why I used an HTML example, it was just the easiest thing I could come up with.

During some research I did on a different subject a while ago, I came across a piece of software called “Fossil”, which is a DVCS with integrated wiki, ticket system and some other things, say Trac and git in a single program. Fossil has its own template language for presenting web pages, which is called TH1, based on Tcl (the language that is, not the implementation). So when I considered Tcl for writing templates, I naturally came back to Fossil and TH1. However, TH1 is based on the same concepts as PHP, and ASP etc, where you explicitly switch from pass-through mode to script mode, which I didn’t really like.

So I went on to implement my own prototype of a Tcl language for templating. Of course, about the time I was getting close to having a complete working prototype, I realized that what I wanted to achieve was possible using Tcl alone, and the command for doing it had been sitting right in front of me the entire time: ‘subst’. I had considered it already in the beginning, but I came to the incorrect conclusion that there would be corner cases making it unsuitable. These turned out to be non-issues. Bummer. To really rub it in, the simplest case can be done with a single line of Tcl code.

    puts -nonewline [subst [read stdin]]

This works as a filter: it reads the template document from standard input, performs the substitutions, and spits the result back out on standard output. The ‘-nonewline’ is there to prevent ‘puts’ from appending a newline character at the end. You would probably want to load some custom commands as well, but if all you need is already offered within Tcl (or you load it from the template itself), you’re done.
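To give an idea of what loading custom commands might look like, here is a sketch that defines stand-ins for the commands used in the HTML template earlier. Their behavior is entirely assumed for the sake of the example (the ‘permitted’ list and the ‘sections’ dictionary are made up), and the template is embedded as a string rather than read from stdin so the snippet is self-contained.

```tcl
# Assumed data the hypothetical commands consult.
set permitted {XYZ}
set sections  [dict create content "<p>Body text</p>"]
set title     "Example"

# Hypothetical: ignores its arguments and returns a fixed placeholder.
proc doctype {args} { return "<!DOCTYPE html>" }

# Hypothetical: returns the markup only if 'name' is on the permitted list.
proc allowed {name markup} {
    if {$name in $::permitted} { return $markup }
    return ""
}

# Hypothetical: looks up a named section, or returns nothing if it is missing.
proc render {what} {
    if {[dict exists $::sections $what]} { return [dict get $::sections $what] }
    return ""
}

set template {[doctype xhtml 4.0 transitional]
<title>$title</title>
[allowed XYZ {<a href="?XYZ">XYZ</a>}]
[allowed ABC {<a href="?ABC">ABC</a>}]
[render content]}

set page [subst $template]
puts $page
```

The ‘ABC’ link is not on the permitted list, so that line of the template comes out empty.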

Add one more line to make it a little bit more flexible, allowing it to take a file name on the command line, or to use stdin if none is given.

    set input [expr {$argc > 0 ? [open [lindex $argv 0]] : "stdin"}]
    puts -nonewline [subst [read $input]]

Caveats

There are a few quirks to be aware of though (yes, some of these are related to the aforementioned corner cases). When using the above script with the following template

 
    [ set var World ]
    Hello $var!
    We're using Tcl [info tclversion]!

The output is

 
    World
    Hello World!
    We're using Tcl 8.5!

What happened here? The Tcl interpreter carries a current result around, which is the output of the last executed statement, and this is what will be substituted in place of a square bracket. This has two consequences. First, you’ll need to be a bit ‘functional’ when writing your templates, as only the output from the last statement will be substituted (i.e. there’s no ‘echo’ command as in PHP). Second, you might have stuff inserted where you were only interested in the side effects. In the example above, ‘set’ sets the variable ‘var’ to ‘World’; however, it also sets the current result to the value, which could be useful for chaining sets as in

  set x [set y [set z World]]

But in this case it causes more harm than good. To suppress this, we need to overwrite the current result, and in order to do that we need to call another command.

  proc . {} {}

To borrow a syntactic theme from Erlang, we define a procedure called ‘.’. Yes, this is a valid name; pretty much anything is a valid name in Tcl, even the empty string (which can actually be quite elegant when you have a one-off associative array: ‘$(key)’).

 
    [
      set var World
      .
    ]
    Hello $var!
    We're using Tcl [info tclversion]!

This now produces:

 
  
 
    Hello World!
    We're using Tcl 8.5!

Note the empty line at the beginning. This is the next caveat. Unlike PHP, which suppresses any newline directly following the ‘?>’ (which can be irritating as well), Tcl leaves them where they are. This can be a nuisance if you have a block of code which only has side effects, and there’s no really elegant way to work around it, except to always have it result in something useful. There’s also the option of not having the newline after the closing square bracket at all, of course, which works, but might not be very elegant.
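Both caveats can be seen in a small sketch. The ‘items’ helper is made up for the example; it builds its output and returns it as one value (the ‘functional’ style), and the side-effect-only block has its closing bracket placed directly before the following text so that no empty line is left behind.

```tcl
# The result-suppressing procedure from above.
proc . {} {}

# Hypothetical helper: builds the markup and returns it as a single result,
# since there is no 'echo' to emit output piece by piece.
proc items {list} {
    set out {}
    foreach item $list {
        lappend out "<li>$item</li>"
    }
    return [join $out "\n"]
}

# The first bracket only has a side effect; '.' empties its result, and the
# text continues on the same line, so there is no stray newline.
set template {[set fruits {apple banana}; .]<ul>
[items $fruits]
</ul>}

set result [subst $template]
puts $result
```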

C interface

As promised, this is an example C program doing the equivalent of the two Tcl lines I used above (I could of course use ‘Tcl_Eval’ but that kind of defeats the purpose of demonstrating the API). There’s nothing new here, except for the C integration, so if that isn’t your cup of tea feel free to jump straight to the comments and drop me a note. :)

    #include <tcl.h>

    #include <stdlib.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
      Tcl_Interp *interp = Tcl_CreateInterp();
      FILE *input = stdin;
      const char *inputName = "<stdin>";
      Tcl_Obj *template, *output;
      int status = 0;

      /* The template text is accumulated in a Tcl object that we own. */
      Tcl_IncrRefCount(template = Tcl_NewObj());

      switch (argc)
        {
        case 1:
          break;
        case 2:
          inputName = argv[1];
          input = fopen(argv[1], "r");
          if (!input)
            {
              /* Bail out early instead of reading from a NULL stream. */
              perror(inputName);
              exit(1);
            }
          break;
        default:
          fprintf(stderr, "Usage:\n\t%s [filename]\n", argv[0]);
          exit(1);
        }

      /* Read the whole input into the template object. */
      while (! feof(input))
        {
          char buffer[4096];
          size_t len = fread(buffer, 1, sizeof(buffer), input);

          if (ferror(input))
            {
              perror(inputName);
              exit(1);
            }
          Tcl_AppendToObj(template, buffer, (int) len);
        }

      fclose(input);

      /* The equivalent of 'subst': perform all substitutions on the template. */
      if (!(output = Tcl_SubstObj(interp, template, TCL_SUBST_ALL)))
        {
          fprintf(stderr, "Error in template\n");
          status = 1;
        }
      else
        {
          char *string;
          int len;
          Tcl_IncrRefCount(output);

          string = Tcl_GetStringFromObj(output, &len);
          fwrite(string, 1, len, stdout);

          Tcl_DecrRefCount(output);
        }

      Tcl_DecrRefCount(template);
      Tcl_DeleteInterp(interp);
      return status;
    }

As you can see, the actual Tcl integration is fairly simple, and most of the code is worrying about the rest (reading the file into memory and so on). The error message when something goes sideways is not very helpful, though, and you need to do a little more work to get it at least halfway sensible:

      if (!(output = Tcl_SubstObj(interp, template, TCL_SUBST_ALL)))
        {
          int len;
          const char *string;
          Tcl_Obj *options = Tcl_GetReturnOptions(interp, TCL_ERROR);
          Tcl_Obj *key, *errorInfo = NULL;

          /* Hold a reference to the options dictionary while we use it. */
          Tcl_IncrRefCount(options);
          Tcl_IncrRefCount(key = Tcl_NewStringObj("-errorinfo", -1));
          Tcl_DictObjGet(NULL, options, key, &errorInfo);
          Tcl_DecrRefCount(key);

          fprintf(stderr, "Error in template:\n");
          if (errorInfo)
            {
              string = Tcl_GetStringFromObj(errorInfo, &len);
              fwrite(string, 1, len, stderr);
              fprintf(stderr, "\n");
            }
          Tcl_DecrRefCount(options);
        }

This will give you a little more, such as a stack trace showing where the error occurred.
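For comparison, roughly the same error reporting can be sketched at the script level with ‘catch’, which can hand back the same return options dictionary. The template here is deliberately broken, and ‘no_such_command’ is just a placeholder name.

```tcl
# A template that calls a command which does not exist.
set template {Hello [no_such_command]!}

# 'catch' returns non-zero on error and stores the return options,
# including -errorinfo, in the 'options' variable.
if {[catch {subst $template} result options]} {
    set report "Error in template:\n[dict get $options -errorinfo]"
    puts stderr $report
} else {
    puts -nonewline $result
}
```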

This concludes the post, which turned out longer than I originally intended. I hope you liked it.

Note: Cross posted on my own blog, excu.se

#AltDevBlogADay

Fredrik Alströmer