
Internationalization (i18n) with Mojolicious and Template Toolkit

In a previous post I talked about the new Mojolicious-based application I've been working on, which, by the way, was rolled out to production today (yay!).

Classic I18N with TT

One of the required features of this app was "i18n", internationalization. To be less vague, the requirement was to present the UI in different languages. We're using Template Toolkit, so our templates need to have strings marked in a special way to allow translation to kick in at run time. Usually in TT you do this with:

<html>
 
<head>
<title>[% l('This is the title of the page') %]</title>
</head>
 
<body>
<h1>[% l('Hello, world!') %]</h1>
<p>
[% l('Some text here') %]
</p>
</body>
</html>

so all the strings that have to be translated according to the user language have to be marked up with:

[% l('<your string here>') %]
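Note that l() is not a TT built-in, so it has to be handed to the templates somehow. A minimal sketch of one common way to wire it up (not necessarily our exact setup), assuming a Locale::Maketext-based class called here MyApp::Locale:

use strict;
use warnings;
use Template;
use MyApp::Locale;    # hypothetical Locale::Maketext subclass

# Pick a language handle for the current user
my $lh = MyApp::Locale->get_handle('it');

my $tt = Template->new(
    VARIABLES => {
        # Every template sees an l() function that translates
        # the string at render time
        l => sub { $lh->maketext(@_) },
    },
);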

Enter Mojolicious

Mojolicious includes a built-in I18N plugin that simplifies your life by allowing the <%= l('somestring') %> syntax to work. That is, it gives you an l() helper.
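Loading the plugin is a one-liner in your application class. A minimal sketch, going by the plugin's documented options (MyApp and the namespace are made-up names):

package MyApp;
use Mojo::Base 'Mojolicious';

sub startup {
    my $self = shift;

    # Gives every controller and template the l() helper;
    # MyApp::I18N::it, MyApp::I18N::en, ... hold the lexicons
    $self->plugin(I18N => {namespace => 'MyApp::I18N'});
}

1;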

Helper?

A helper is a method that is available both on your controller object and within templates.
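For example, here's a made-up helper, just to show the mechanics; l() from the I18N plugin works the same way:

use Mojolicious::Lite;

# Register a trivial helper
helper shout => sub {
    my ($c, $text) = @_;
    return uc $text;
};

get '/' => sub {
    my $c = shift;

    # Available as a method on the controller object...
    $c->render(text => $c->shout('hello'));
};

# ...and inside templates as <%= shout('hello') %>
app->start;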

Back to Mojolicious…

In the example helper syntax I wrote <%= l('somestring') %> because that's the syntax of Mojolicious' default templating system. Under Template Toolkit, however, you can't use that syntax! You have to go through an extra level, as in:

<!-- This is my TT template -->
[% c.l('<your string here>') %]

I'm not exactly sure why that c. is required, but that's how it is. My guess is that, since TT keeps its own variable namespace, the TT renderer exposes the controller object to templates as c, and helpers then have to be called as methods on it.

I18N workflow: extracting the strings

Everything would be fantastic, except there's one tricky problem. After you've worked so hard on your TT templates, it's time to collect all the marked-up strings, presumably to build a .po file to be shipped to translation agencies, or to whatever system you're using for that. More on that later.

In the Perl world there is an equivalent of GNU xgettext: xgettext.pl. This tool is part of the Locale::Maketext::Lexicon CPAN distribution, which is more or less "the standard" way to do i18n in Perl. Or at least it is for us, since we started building i18n for my.opera.com in 2008.
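Collecting the strings then boils down to something like this (the paths here are made up):

$ xgettext.pl -o po/messages.pot lib/*.pm templates/*.tt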

The tricky problem is that even though xgettext.pl understands quite a few syntax variants, it didn't understand [% c.l('string') %]. After a few Perl debugger sessions, I managed to teach Locale::Maketext::Extract::Plugin::TT2 how to parse the Mojolicious-style syntax. I knew that Clinton Gormley, the maintainer of L::M::Lexicon, had a source repository for it on Github, so I forked his repository and pushed my changes to a dedicated branch.

CPAN, Github and the Community

This is where the Github + CPAN model really shines. You're using a CPAN module. You stumble on a problem. Fix the problem. Find its repository on Github. Fork it, push your fix, and if you're lucky, you have your fix merged and out on CPAN the same day.

This is what actually happened. Clinton got in touch the very same day I sent him the pull request and later pushed out the changes on CPAN. If you ask me, that's just awesome. I wish everything worked that way :)

Closing the i18n workflow

With the c.l() problem fixed, everything else was easier. xgettext.pl collects the strings from your code and templates and builds a master .po file with all of them. Then msgmerge, a standard GNU gettext tool, takes the generated master .po file and merges it with any existing language-specific .po files. If you don't have any yet, just copy the master .po file (usually called the POT, or reference, .po file) to <language>.po and start translating.
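In practice, the merge step looks something like this (file names are hypothetical):

$ msgmerge --update po/it.po po/messages.pot

And for a brand new language, it's simply:

$ cp po/messages.pot po/it.po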

The last step is either:

  • compiling the .po files to .mo, a lookup-optimized form of the .po file (a msgfmt one-liner follows this list)
  • creating the "lexicon" files. In the Perl world, these are nothing more than Perl modules with a %Lexicon hash that contains all the string IDs and their translations
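For the first option, the standard gettext tool is msgfmt, along these lines:

$ msgfmt -o it.mo it.po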

We're long-time fans of the latter approach, so our lexicon files look like this:

package AuthOpera::Locale::it;
 
use strict;
use utf8;
use base qw(AuthOpera::Locale);
 
### LEXICON STARTS HERE (don't remove this line)
our %Lexicon = (
 
    # Automatic fallback to string ID when no translation available
    _AUTO => 1,
 
    # String IDs                  # Translations
    "Application name:"        => "Nome dell'applicazione:",
    "Application registration" => "Registrazione dell' applicazione",
    "Data provider:"           => "Provider dei dati:",
 
    # ...
);
### LEXICON ENDS HERE
 
1;

and we use a simple subclass of Locale::PO to read the PO file into memory and write back a lexicon based on a fixed template, hence the ### LEXICON marker lines above.
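The conversion itself is simple; here's a rough sketch of the idea using Locale::PO directly (our actual subclass also handles the fixed template; the file name and quoting here are naive, for illustration only):

use strict;
use warnings;
use Locale::PO;

# Load every entry of the Italian PO file
my $entries = Locale::PO->load_file_asarray('po/it.po');

for my $po (@$entries) {
    my $id  = Locale::PO->dequote($po->msgid);
    my $str = Locale::PO->dequote($po->msgstr);

    # Skip the PO header and any untranslated entries
    next if $id eq '' or $str eq '';

    # In the real code these lines end up between the
    # ### LEXICON markers of the fixed template
    print qq{    "$id" => "$str",\n};
}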

Transifex

Currently we also use Transifex, which lets external translators contribute to the PO files directly from a web page and, if you configure it to do so, commit straight to your source code repository. You can then trigger automated builds of the lexicon files, completing the full i18n workflow.

I find this system pretty simple but at the same time fully automated and very powerful. I'd love to hear comments or feedback about this stuff, especially from people adopting a different process.

Dependencies suck

We love dependencies, in the CPAN universe for example. They make our job so much easier. Thousands of production-quality, unit-tested modules at your fingertips.

But dependencies also suck really badly, for example when you're using a Linux distribution whose packages are just too old to be useful. Hey, but they are stable! Stable as in dead, or stable as in production quality? You decide.

It's been many months since I installed a local instance of Transifex, a Django application that allows translators to easily contribute to projects. We're using it for My Opera, but also trying to get other internal projects to use it.

So far it has worked nicely. I think Transifex is a really good application; its feature set is just right for what we need. Last week I decided to upgrade our Transifex instance from v0.8.0-devel to 0.9.0-devel. The improvements were really nice and needed, so I just decided to go for it. Within the 0.8.0 series, I had been upgrading from their repository (aka HEAD, aka master, aka trunk).

This time, though, the list of dependencies was a bit more specific than usual. Also, please note that 0.9.0 is a **BLEEDING EDGE** development version as of June 2010.

Anyway, the first dependency listed was "Django = 1.1.2". I think I started down the wrong path when I upgraded Django with:


$ sudo easy_install 'Django>=1.1.2'

Here you can see that my mind is somewhat hardwired to the Perl culture, where backward compatibility is of paramount importance. I wrote code 10 years ago, using perl 5.005, that is still in production, unmodified, under perl 5.10, and I'm talking about commercial stuff, not silly home projects. The terrible mistake here is to think that this applies everywhere else too. Forget it. It's not true.

In fact, easy_install picked up Django 1.2.1, which is an entirely different beast that breaks at least a couple of assumptions Transifex was making. I don't remember exactly now, but one had to do with the automatic export of email.MIMEBase into django.core.mail, and another I only remember broke horribly.

So, a couple of hours later, thanks to the guys on the #transifex channel, I figured out that what I really needed to write was:


$ sudo easy_install 'Django==1.1.2'

This forces easy_install to install exactly the given version instead of any later one. So far so good. Then I hit another, completely unrelated problem that required me to strace the ./manage.py Django script. It turned out the script was using a totally screwed-up sqlite database left over from a year-old test version of Transifex I had installed through easy_install, and was completely ignoring my local settings, which pointed to a MySQL db. How nice.
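For the record, watching which files a script actually opens is the quickest way to catch that kind of surprise; the strace incantation goes roughly like this (exact flags and arguments may vary):

$ strace -f -e trace=file ./manage.py runserver 2>&1 | grep sqlite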

So, yes, we always complain about CPAN, dependencies, Module::Install, ExtUtils::MakeMaker and whatnot, but a look at other worlds (easy_install, ruby-gems, anyone?) can remind Perl people of the fantastic toolchain, and especially the culture, "we" have built, and that it's still kicking everyone else's ass, on any platform.

So, regarding the debate in the Perl community, my vote goes to keeping Sane(tm) backward compatibility standards, as we always have. It matters, especially for commercial software companies!