Internationalization (i18n) with Mojolicious and Template Toolkit

In a previous post I talked about the new Mojolicious-based application I've been working on, which, btw, was rolled out to production today (yay!)

Classic I18N with TT

One of the required features of this app was "i18n", internationalization. To be less vague, the requirement was to present the UI in different languages. We're using Template Toolkit, so our templates need to have their strings marked in a special way for translation to kick in at run time. Usually in TT you do this with:

<html>
 
<head>
<title>[% l('This is the title of the page') %]</title>
</head>
 
<body>
<h1>[% l('Hello, world!') %]</h1>
<p>
[% l('Some text here') %]
</p>
</body>
</html>

so all the strings that have to be translated according to the user language have to be marked up with:

[% l('<your string here>') %]

Enter Mojolicious

Mojolicious includes a built-in I18N plugin that simplifies your life by allowing the <%= l('somestring') %> syntax to work. That is, it gives you an l() helper.
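
Loading it is a one-liner in your application class. Here's a minimal sketch (MyApp and the MyApp::I18N lexicon namespace are made-up names for illustration):

package MyApp;
use Mojo::Base 'Mojolicious';

sub startup {
    my $self = shift;

    # Registers the l() helper; the lexicon classes live under
    # the given namespace (e.g. MyApp::I18N::it, MyApp::I18N::de, ...)
    $self->plugin(I18N => { namespace => 'MyApp::I18N' });

    $self->routes->get('/')->to('example#welcome');
}

1;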

Helper?

A helper is a method that is available both on your controller object and inside your templates.
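
For example, with Mojolicious::Lite you can register your own helper and call it from both places (shout is a made-up helper, purely for illustration):

use Mojolicious::Lite;

# Register an application-wide helper; the first argument
# passed in is always the current controller
helper shout => sub {
    my ($c, $str) = @_;
    return uc $str;
};

get '/' => sub {
    my $c = shift;

    # Available as a method on the controller...
    $c->app->log->debug($c->shout('hello from the controller'));

    $c->render(template => 'index');
};

app->start;

__DATA__

@@ index.html.ep
%# ...and as a plain function inside templates
<p><%= shout('hello from the template') %></p>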

Back to Mojolicious…

In the example helper syntax I wrote <%= l('somestring') %> because that's the syntax of Mojolicious' default templating system. However, under Template Toolkit, you can't use that syntax! You have to go through an extra level of indirection, as in:

<!-- This is my TT template -->
[% c.l('<your string here>') %]

I'm not exactly sure why that c. is required, but that's how it is: presumably TT knows nothing about Mojolicious helpers, so the template has to call l() as a method on the controller object (c) instead.

I18N workflow: extracting the strings

Everything would be fantastic, except for one tricky problem. After you've worked so hard on your TT templates, it's time to collect all the marked-up strings, presumably to build a .PO file to ship to translation agencies or whatever system you're using for that. More on that later.

In the Perl world, there is an equivalent of GNU xgettext: xgettext.pl. This tool is part of the Locale::Maketext::Lexicon CPAN distribution, which is kind of "the standard" way to do i18n in Perl. Or at least it is for us here, since we started building i18n for my.opera.com in 2008.
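
A typical invocation looks something like this (the file layout is just an example, not something the tool mandates):

# Scan modules and templates, write the master PO file
xgettext.pl -o po/messages.pot lib/MyApp.pm templates/*.tt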

The tricky problem was that even though xgettext.pl understands quite a few syntax variants, it didn't understand [% c.l('string') %]. After a few Perl debugger sessions, I managed to teach Locale::Maketext::Extract::Plugin::TT2 how to parse the Mojolicious-style syntax. I knew that Clinton Gormley, the maintainer of L::M::Lexicon, had a source repository for it on GitHub, so I forked his repository and pushed my changes to a dedicated branch.

CPAN, GitHub and the Community

This is where the GitHub + CPAN model really shines. You're using a CPAN module. You stumble on a problem. Fix the problem. Find its repository on GitHub. Fork it, push your fix, and if you're lucky, you have your fix merged and out on CPAN the same day.

This is what actually happened. Clinton got in touch the very same day I sent him the pull request and later pushed the changes out to CPAN. If you ask me, that's just awesome. I wish everything worked that way :)

Closing the i18n workflow

With the c.l() problem fixed, everything else was easier. xgettext.pl allows you to collect strings from your code and templates and build a master .PO file with all the strings. Then msgmerge, a standard GNU gettext tool, allows you to take the generated master PO file and merge it with any existing language-specific PO files. If you don't have any yet, just copy the master PO file (usually called the POT, or reference PO file) to <language>.po and start translating.
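
In practice that step is a one-liner per language (file names are just an example):

# Merge new and changed strings into the existing Italian PO...
msgmerge --update po/it.po po/messages.pot

# ...or, if there's no it.po yet, start from the reference file
cp po/messages.pot po/it.po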

The last step is either:

  • compiling the .po files to .mo, a lookup-optimized form of the .po file (see the msgfmt example after this list)
  • creating the "lexicon" files. In the Perl world, these are nothing more than Perl modules with a %Lexicon hash that contains all string IDs and their translations
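
For the first option, the standard gettext compiler does the job (the output path is just an example):

# Compile the Italian PO into its binary .mo form
msgfmt -o locale/it/LC_MESSAGES/myapp.mo po/it.po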

We're long time fans of the latter approach, so our lexicon files look like this:

package AuthOpera::Locale::it;
 
use strict;
use utf8;
use base qw(AuthOpera::Locale);
 
### LEXICON STARTS HERE (don't remove this line)
our %Lexicon = (
 
    # Automatic fallback to string ID when no translation available
    _AUTO => 1,
 
    # String IDs                  # Translations
    "Application name:"        => "Nome dell'applicazione:",
    "Application registration" => "Registrazione dell' applicazione",
    "Data provider:"           => "Provider dei dati:",
 
    # ...
);
### LEXICON ENDS HERE
 
1;

and we use a simple subclass of Locale::PO to read the PO file in memory and write back a lexicon based on a fixed template, hence the ### LEXICON lines above.
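
As a rough sketch of what that subclass does (illustrative code, not our actual module; the script name and file layout are made up), the PO-to-lexicon step could look like this:

#!/usr/bin/env perl
# regen_lexicon.pl - hypothetical sketch: rebuild the %Lexicon section
# of a lexicon module from a .po file, using the ### LEXICON markers
use strict;
use warnings;
use Locale::PO;

my ($po_file, $module) = @ARGV;   # e.g. po/it.po lib/AuthOpera/Locale/it.pm

my $entries = Locale::PO->load_file_asarray($po_file);

my $lexicon = "our %Lexicon = (\n\n    _AUTO => 1,\n\n";
for my $po (@$entries) {
    my $id  = $po->dequote($po->msgid);
    my $str = $po->dequote($po->msgstr);
    next unless defined $id  && length $id;     # skip the PO header entry
    next unless defined $str && length $str;    # skip untranslated strings
    s/"/\\"/g for $id, $str;                    # keep the generated quotes valid
    $lexicon .= qq{    "$id" => "$str",\n};
}
$lexicon .= ");\n";

# Splice the fresh hash between the marker lines
open my $in, '<:encoding(UTF-8)', $module or die "$module: $!";
my $src = do { local $/; <$in> };
close $in;

$src =~ s{(### LEXICON STARTS HERE[^\n]*\n).*(### LEXICON ENDS HERE)}
         {$1$lexicon$2}s;

open my $out, '>:encoding(UTF-8)', $module or die "$module: $!";
print {$out} $src;
close $out;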

Transifex

Currently we also use Transifex, which allows external translators to contribute to PO files directly from a web page and, if you configure it to do so, commit straight to your source code repository. You can then trigger automated builds of the lexicon files, completing the full i18n workflow.
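
One way to wire this up is Transifex's command-line client, which is driven by a small config file; here's a sketch (project and resource names are made up):

[main]
host = https://www.transifex.com

[myapp.messages]
file_filter = po/<lang>.po
source_file = po/messages.pot
source_lang = en
type = PO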

I find this system pretty simple but at the same time fully automated and very powerful. I'd love to hear comments or feedback about this stuff, especially from people adopting a different process.
