Using Perl and Google Chromium’s CLD to identify the language of a text

For a new project I'm working on, given a body of text, I need to identify which language it's written in (English, Russian, Chinese, etc…).

I'm not exactly the first person on Earth to do this, so it turns out there's Google's CLD library. Surprisingly, several people around here didn't know it. The library is open source and very good too, so I immediately looked for Perl bindings for it.

There is a great Perl module on CPAN called Lingua::Identify::CLD. This module bundles a copy of the CLD library, and fully automates build and link steps too. So I gave it a shot.

How to use Lingua::Identify::CLD

It's amazingly easy to use. Here's a sample of the code:


#!/usr/bin/perl

use strict;
use Lingua::Identify::CLD ();

my $text;
while (readline) { $text .= $_ }
chomp $text;

# In my case, the content is HTML
my $cld = Lingua::Identify::CLD->new(isPlainText => 0);

# Example: (ENGLISH, en, 64)
my @lang = $cld->identify($text);
say "Language: $lang[0]";

Failing tests

I decided to start using this module into my project. The build phase went fine (perl ./Build), while the tests were failing (./Build test). Here's the log of a failed test run:


$ ./Build test
cc -I/usr/lib/perl/5.14/CORE -fPIC -c -D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g -o /tmp/gAc_glZta2/library.o /tmp/gAc_glZta2/library.c
cc -I/usr/lib/perl/5.14/CORE -fPIC -c -D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g -o /tmp/gAc_glZta2/test.o /tmp/gAc_glZta2/test.c
cc -shared -L/usr/local/lib -fstack-protector -o /tmp/gAc_glZta2/libfoo.so /tmp/gAc_glZta2/library.o
cc -fstack-protector -L/usr/local/lib -o /tmp/gAc_glZta2/foo /tmp/gAc_glZta2/test.o -L/tmp/gAc_glZta2 -lfoo

** Preparing XS code
t/00-load.t ....... 1/1 Bailout called.  Further testing stopped:  

#   Failed test 'use Lingua::Identify::CLD;'
#   at t/00-load.t line 6.
#     Tried to use 'Lingua::Identify::CLD'.
#     Error:  Not a CODE reference at /usr/lib/perl/5.14/DynaLoader.pm line 207.
# END failed--call queue aborted at .../Lingua-Identify-CLD-0.05/blib/lib/Lingua/Identify/CLD.pm line 207.
# BEGIN failed--compilation aborted at .../Lingua-Identify-CLD-0.05/blib/lib/Lingua/Identify/CLD.pm line 24.
# Compilation failed in require at (eval 4) line 2.
# BEGIN failed--compilation aborted at (eval 4) line 2.
Use of uninitialized value $Lingua::Identify::CLD::VERSION in concatenation (.) or string at t/00-load.t line 9.
# Testing Lingua::Identify::CLD , Perl 5.014002, /usr/bin/perl
# Looks like you failed 1 test of 1.
FAILED--Further testing stopped.

Just the day before I had successfully compiled and run the tests for the same version of the module, but on Ubuntu 11.10, which I was using. Then I decided to upgrade to 12.10, and that's where I got this failed test run.

Contacting the author

Then I decided to contact the author of the module. Being Alberto quite a known author, with lots of CPAN contributions, I hoped he would answer my query within 2-3 days. That would give me some time to do other stuff, and hopefully would give him time to analyze the failure.

As usual with the best CPAN authors ;-) he answered in a couple of hours, which was fantastic for me. He had already identified a few failures like mine thanks to another awesome resource we have in the Perl community, the CPAN Testers service.

CPAN Testers

CPAN testers is a group of users that regularly (or not) report back the build/test status of everything that's released to CPAN in a multitude of platforms and versions of Perl. I think this is one of the most underestimated awesome features we have in the Perl community. The CPAN testers status of Lingua::Identify::CLD shows one report that looks exactly the same as the failure I experienced. This is on Ubuntu 12.10 with the stock perl 5.14.2.

The ugly patch

I tried to analyze the problem, apparently located in DynaLoader, and came up with a shotgun-debugging-driven patch that I copy/paste here for reference:

@@ -18,10 +18,23 @@ Version 0.05

 our $VERSION = '0.05';
 
-use XSLoader;
-BEGIN {
+eval {
+
+    require XSLoader;
     XSLoader::load('Lingua::Identify::CLD', $VERSION);
-}
+
+} or do {
+
+    # This warning triggers on Ubuntu 12.10 with the
+    # stock perl 5.14.2. Strangely enough, this doesn't
+    # seem to affect the tests at all.
+    #
+    # Not a CODE reference at /usr/lib/perl/5.14/DynaLoader.pm line 207.
+    # END failed--call queue aborted at .../blib/lib/Lingua/Identify/CLD.pm line 207.
+    # ) at .../blib/lib/Lingua/Identify/CLD.pm line 28."
+    #
+    #warn "Something's wrong with XSLoader? ($@)";
+};
 
 =head1 SYNOPSIS

It's shotgun debugging because I don't really know what's going on, I just came up with this patch because of the assumptions and information I gathered during the years on how DynaLoader/XSLoader and BEGIN {} blocks work or interact with the rest of the code :-)

Anyway, it makes the tests pass again, even with a weird warning. I agree with Alberto that it's not wise to incorporate this patch into Lingua::Identify::CLD, until we have understood why the original code fails, and why just for 2 people in the world.

All this blah-blah, to say: please do help! If you have seen the same problem, help us figure out what it is. My repository with the forked/patched code is on Github:

https://github.com/cosimo/Lingua-Identify-CLD

Have fun!

My first attempt at a responsive layout: OSQA

The so called responsive layouts are all the rage these days, and frankly, I feel ashamed that some of my personal projects I do at home are not responsive yet. Partly that's because I don't have that much spare time to dedicate to them, but partly it's also because I have no idea how to transform a "fixed" or traditional layout into a responsive one.

I tried to remedy this by studying a few responsive layouts I stumbled upon. However, I didn't find it very easy to just stare at the code and understand what's going on. Modern CSS has quite some dose of magic for me. I searched a bit around the web to find some responsive layout guides and tried to read them. I remembered we must have had quite some information on responsive layouts on Dev Opera.

A search for "responsive" turns up lots of good results, including Love your devices: adaptive web design with media queries and more… by colleague Chris Mills.

Recently I heard that Chris published the "Practical CSS 3" book, so I just dived into the article, eager to learn everything about responsive layouts, adapting to devices, etc… The occasion for it is another tiny personal project I'm working on during nights and weekends.

It's yet another stack-overflow clone, but for parenting, newborns, pregnancy, etc… powered by the open source Q&A software OSQA. I'm in the process of splitting the existing web site with articles, comments, questions and answers into two different sites, a pure blog with articles and comments, and another site with just questions and answers. The latter is what I'm talking about in this article.

When you install OSQA, by default it looks like this:

It's not bad, but I like the default stack-overflow layout much better. So I spent a few days learning about OSQA and importing the existing content. I found it robust and well designed. It has everything I need, including themes or skins that you can build, and a custom-css functionality:
you can stack your custom CSS content on top of the selected skin, much like what we have in My Opera too.

After a bit of CSS fiddling, I came up with the following layout:

Unfortunately, the default OSQA layout is not responsive at all, and it looks terrible on mobile devices (initial-scale, anyone?). So I started this journey into unknown territory, guided by Chris Mills' article, to discover how to make a layout responsive from scratch. Now, I'm sure there's a crapload of useless/harmful stuff in my custom CSS, but the final result left me really satisfied:

… apart from the "Cerca"/Search button and a few minor things. In the end, I had to duplicate the default skin to make a few very small changes, but apart from that, all the rest is accomplished by the custom CSS snippet. Here it is. Of course the heart of it is the media query for mobile devices:

Please tell me where I screwed up, KTHXBYE :-)

Displaying realtime memcached traffic on a backend

Sometimes I like to write down posts like this, to remind myself how to do something, sort of a mental note.
Suppose you have a few application servers that use 1+ memcached servers, and you want some way to display the outbound traffic, providing some insights on what are the most used keys, counters, etc…

Here's a quick way to do that, assuming you're using the memcached text protocol:

tcpflow -ce dst port 11211 
    | cut -b53- 
    | grep ^get 
    | pipestat --clear --runtime 60 --field 2 --time 1 --limit 40

What this does is:

  • Use tcpflow to capture all outbound traffic to destination port 11211, default memcached port.
  • Remove the first 53 bytes from each line, to filter out source and destination ip/ports
  • Only display get requests (alternatively, use set, incr, …)
  • Feed the resulting data to pipestat, a simple but great Perl tool that aggregates the data, displaying the most frequent ones. The specific options I used are good if you want to display quick statistics like other tools as top, mytop, or varnishstat.

It goes without saying that these tools are automatically installed on all servers that our Devops team here at Opera manages. I couldn't work without them :)

How to find unused CSS selectors, a quick solution

Was talking to a colleague today, and he mentioned the problem he was working on: trying to find site-wide unused CSS selectors. That is, having a static CSS file on disk, try to go through all the selectors in there and see if there's some matching elements in an entire site, crawling it page by page.

I thought it was a really interesting problem, so I gave it a quick shot by glueing together CSS::Tiny, Mojo::UserAgent and Mojo::DOM::CSS.

This is what came out of it. I'd say a decent first quick solution:

So I also learned about this deadweight project, that apparently also can crawl a site by logging in, kind of WWW::Mechanize style. Would be interesting to improve this initial solution :-)

Dist::Zilla, Y U suddenly no work anymore? [FIXED!]

I'm trying to understand why Dist::Zilla doesn't work anymore on my laptop. Here's the epic wall of warnings I get when running dzil test:


$ dzil test
Could not create the 'reader' method for zilla because : The method '_inline_store' was not found in the inheritance hierarchy for Moose::Meta::Class::__ANON__::SERIAL::9 at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 1053
	Class::MOP::Class::__ANON_Moose::Meta::Class=HASH(0x3556088) called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 1098
	Class::MOP::Class::add_around_method_modifier('Moose::Meta::Class=HASH(0x3556088)', '_inline_store', 'CODE(0x351cea8)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Role/Application/ToClass.pm line 231
	Moose::Meta::Role::Application::ToClass::apply_method_modifiers('Moose::Meta::Role::Application::ToClass=HASH(0x3556b40)', 'around', 'Moose::Meta::Role=HASH(0x351dc28)', 'Moose::Meta::Class=HASH(0x3556088)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Role/Application.pm line 78
	Moose::Meta::Role::Application::apply_around_method_modifiers('Moose::Meta::Role::Application::ToClass=HASH(0x3556b40)', 'Moose::Meta::Role=HASH(0x351dc28)', 'Moose::Meta::Class=HASH(0x3556088)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Role/Application.pm line 64
	Moose::Meta::Role::Application::apply('Moose::Meta::Role::Application::ToClass=HASH(0x3556b40)', 'Moose::Meta::Role=HASH(0x351dc28)', 'Moose::Meta::Class=HASH(0x3556088)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Role/Application/ToClass.pm line 36
	Moose::Meta::Role::Application::ToClass::apply('Moose::Meta::Role::Application::ToClass=HASH(0x3556b40)', 'Moose::Meta::Role=HASH(0x351dc28)', 'Moose::Meta::Class=HASH(0x3556088)', 'HASH(0x354ce50)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Role.pm line 470
	Moose::Meta::Role::apply('Moose::Meta::Role=HASH(0x351dc28)', 'Moose::Meta::Class=HASH(0x3556088)') called at /usr/local/lib/perl/5.10.1/Moose/Util.pm line 160
	Moose::Util::_apply_all_roles('Moose::Meta::Class=HASH(0x3556088)', undef, 'MooseX::SetOnce::Accessor') called at /usr/local/lib/perl/5.10.1/Moose/Util.pm line 99
	Moose::Util::apply_all_roles('Moose::Meta::Class=HASH(0x3556088)', 'MooseX::SetOnce::Accessor') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Class.pm line 104
	Moose::Meta::Class::create('Moose::Meta::Class', 'Moose::Meta::Class::__ANON__::SERIAL::9', 'roles', 'ARRAY(0x33e50d8)', 'weaken', '', 'superclasses', 'ARRAY(0x353a7e8)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Package.pm line 120
	Class::MOP::Package::create_anon('Moose::Meta::Class', 'superclasses', 'ARRAY(0x353a7e8)', 'roles', 'ARRAY(0x33e50d8)', 'cache', 1) called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 474
	Class::MOP::Class::create_anon_class('Moose::Meta::Class', 'superclasses', 'ARRAY(0x353a7e8)', 'roles', 'ARRAY(0x33e50d8)', 'cache', 1) called at /usr/share/perl5/MooseX/SetOnce.pm line 27
	Class::MOP::Class:::around('CODE(0x1c87bf0)', 'Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Method/Wrapped.pm line 162
	Class::MOP::Method::Wrapped::__ANON_Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50) called at /usr/local/lib/perl/5.10.1/Class/MOP/Method/Wrapped.pm line 91
	Moose::Meta::Class::__ANON__::SERIAL::8::accessor_metaclass('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 389
	Class::MOP::Attribute::__ANON__() called at /usr/share/perl5/Try/Tiny.pm line 76
	eval {...} called at /usr/share/perl5/Try/Tiny.pm line 67
	Try::Tiny::try('CODE(0x3543bb8)', 'Try::Tiny::Catch=REF(0x354c718)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 401
	Class::MOP::Attribute::_process_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)', 'reader', 'zilla', undef) called at /usr/local/lib/perl/5.10.1/Moose/Meta/Attribute.pm line 1074
	Moose::Meta::Attribute::_process_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)', 'reader', 'zilla', undef) called at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 428
	Class::MOP::Attribute::install_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Attribute.pm line 1013
	Moose::Meta::Attribute::install_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 891
	Class::MOP::Class::__ANON__() called at /usr/share/perl5/Try/Tiny.pm line 76
	eval {...} called at /usr/share/perl5/Try/Tiny.pm line 67
	Try::Tiny::try('CODE(0x354c5b0)', 'Try::Tiny::Catch=REF(0x3435780)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 896
	Class::MOP::Class::_post_add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Mixin/HasAttributes.pm line 44
	Class::MOP::Mixin::HasAttributes::add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Class.pm line 570
	Moose::Meta::Class::add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'zilla', 'is', 'ro', 'writer', 'set_zilla', 'lazy_required', 1, 'isa', ...) called at /usr/local/lib/perl/5.10.1/Moose.pm line 79
	Moose::has('Moose::Meta::Class=HASH(0x35122a0)', 'zilla', 'is', 'ro', 'isa', 'Moose::Meta::TypeConstraint::Class=HASH(0x3092830)', 'traits', 'ARRAY(0x350d590)', 'writer', ...) called at /usr/local/lib/perl/5.10.1/Moose/Exporter.pm line 382
	Moose::has('zilla', 'is', 'ro', 'isa', 'Moose::Meta::TypeConstraint::Class=HASH(0x3092830)', 'traits', 'ARRAY(0x350d590)', 'writer', 'set_zilla', ...) called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 22
	require Dist/Zilla/MVP/RootSection.pm called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/Assembler/Zilla.pm line 13
	Dist::Zilla::MVP::Assembler::Zilla::BEGIN() called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 0
	eval {...} called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 0
	require Dist/Zilla/MVP/Assembler/Zilla.pm called at /usr/local/share/perl/5.10.1/Dist/Zilla/Dist/Builder.pm line 204
	Dist::Zilla::Dist::Builder::_load_config('Dist::Zilla::Dist::Builder', 'HASH(0x342fe00)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/Dist/Builder.pm line 27
	Dist::Zilla::Dist::Builder::from_config('Dist::Zilla::Dist::Builder', 'HASH(0x33e2608)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App.pm line 112
	Dist::Zilla::App::__ANON__() called at /usr/share/perl5/Try/Tiny.pm line 76
	eval {...} called at /usr/share/perl5/Try/Tiny.pm line 67
	Try::Tiny::try('CODE(0x3084e60)', 'Try::Tiny::Catch=REF(0x33a8848)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App.pm line 120
	Dist::Zilla::App::zilla('Dist::Zilla::App=HASH(0x204eb48)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App/Command.pm line 13
	Dist::Zilla::App::Command::zilla('Dist::Zilla::App::Command::test=HASH(0x280b910)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App/Command/test.pm line 28
	Dist::Zilla::App::Command::test::execute('Dist::Zilla::App::Command::test=HASH(0x280b910)', 'Getopt::Long::Descriptive::Opts::__OPT__::2=HASH(0x291d7c0)', 'ARRAY(0x13bef10)') called at /usr/share/perl5/App/Cmd.pm line 220
	App::Cmd::execute_command('Dist::Zilla::App=HASH(0x204eb48)', 'Dist::Zilla::App::Command::test=HASH(0x280b910)', 'Getopt::Long::Descriptive::Opts::__OPT__::2=HASH(0x291d7c0)') called at /usr/share/perl5/App/Cmd.pm line 159
	App::Cmd::run('Dist::Zilla::App') called at /usr/bin/dzil line 11
 at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 400
	Class::MOP::Attribute::__ANON_The method '_inline_store' was not found in the inheritance... called at /usr/share/perl5/Try/Tiny.pm line 100
	Try::Tiny::try('CODE(0x3543bb8)', 'Try::Tiny::Catch=REF(0x354c718)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 401
	Class::MOP::Attribute::_process_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)', 'reader', 'zilla', undef) called at /usr/local/lib/perl/5.10.1/Moose/Meta/Attribute.pm line 1074
	Moose::Meta::Attribute::_process_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)', 'reader', 'zilla', undef) called at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 428
	Class::MOP::Attribute::install_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Attribute.pm line 1013
	Moose::Meta::Attribute::install_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 891
	Class::MOP::Class::__ANON__() called at /usr/share/perl5/Try/Tiny.pm line 76
	eval {...} called at /usr/share/perl5/Try/Tiny.pm line 67
	Try::Tiny::try('CODE(0x354c5b0)', 'Try::Tiny::Catch=REF(0x3435780)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 896
	Class::MOP::Class::_post_add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Mixin/HasAttributes.pm line 44
	Class::MOP::Mixin::HasAttributes::add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Class.pm line 570
	Moose::Meta::Class::add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'zilla', 'is', 'ro', 'writer', 'set_zilla', 'lazy_required', 1, 'isa', ...) called at /usr/local/lib/perl/5.10.1/Moose.pm line 79
	Moose::has('Moose::Meta::Class=HASH(0x35122a0)', 'zilla', 'is', 'ro', 'isa', 'Moose::Meta::TypeConstraint::Class=HASH(0x3092830)', 'traits', 'ARRAY(0x350d590)', 'writer', ...) called at /usr/local/lib/perl/5.10.1/Moose/Exporter.pm line 382
	Moose::has('zilla', 'is', 'ro', 'isa', 'Moose::Meta::TypeConstraint::Class=HASH(0x3092830)', 'traits', 'ARRAY(0x350d590)', 'writer', 'set_zilla', ...) called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 22
	require Dist/Zilla/MVP/RootSection.pm called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/Assembler/Zilla.pm line 13
	Dist::Zilla::MVP::Assembler::Zilla::BEGIN() called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 0
	eval {...} called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 0
	require Dist/Zilla/MVP/Assembler/Zilla.pm called at /usr/local/share/perl/5.10.1/Dist/Zilla/Dist/Builder.pm line 204
	Dist::Zilla::Dist::Builder::_load_config('Dist::Zilla::Dist::Builder', 'HASH(0x342fe00)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/Dist/Builder.pm line 27
	Dist::Zilla::Dist::Builder::from_config('Dist::Zilla::Dist::Builder', 'HASH(0x33e2608)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App.pm line 112
	Dist::Zilla::App::__ANON__() called at /usr/share/perl5/Try/Tiny.pm line 76
	eval {...} called at /usr/share/perl5/Try/Tiny.pm line 67
	Try::Tiny::try('CODE(0x3084e60)', 'Try::Tiny::Catch=REF(0x33a8848)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App.pm line 120
	Dist::Zilla::App::zilla('Dist::Zilla::App=HASH(0x204eb48)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App/Command.pm line 13
	Dist::Zilla::App::Command::zilla('Dist::Zilla::App::Command::test=HASH(0x280b910)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App/Command/test.pm line 28
	Dist::Zilla::App::Command::test::execute('Dist::Zilla::App::Command::test=HASH(0x280b910)', 'Getopt::Long::Descriptive::Opts::__OPT__::2=HASH(0x291d7c0)', 'ARRAY(0x13bef10)') called at /usr/share/perl5/App/Cmd.pm line 220
	App::Cmd::execute_command('Dist::Zilla::App=HASH(0x204eb48)', 'Dist::Zilla::App::Command::test=HASH(0x280b910)', 'Getopt::Long::Descriptive::Opts::__OPT__::2=HASH(0x291d7c0)') called at /usr/share/perl5/App/Cmd.pm line 159
	App::Cmd::run('Dist::Zilla::App') called at /usr/bin/dzil line 11
Compilation failed in require at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/Assembler/Zilla.pm line 13.
BEGIN failed--compilation aborted at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/Assembler/Zilla.pm line 13.
Compilation failed in require at /usr/local/share/perl/5.10.1/Dist/Zilla/Dist/Builder.pm line 204.

Due to chronic lack of time, I blindly tried to upgrade Moose, MooseX::Types, Dist::Zilla, Config::MVP, but no luck.

Before I start dealing with this madness… any idea?

EDIT: thanks to the comments, I found out about moose-outdated, a script that reports the Moose(X) modules that have newer versions up on CPAN. Running moose-outdated I got back the following list:

$ moose-outdated
MooseX::LazyRequire
MooseX::Role::Parameterized
MooseX::SetOnce

Then I just run:

$ cpanm MooseX::LazyRequire MooseX::Role::Parameterized MooseX::SetOnce

After doing this, dzil started working again. Thanks everyone for your comments and help!

Problems with bnx2 kernel module and high traffic

We're seeing an "elevated" level of traffic these days on the My Opera servers. As usual with operations matters, it's difficult to find one exact clear root cause. The rest of the post explains what we found and the fix for it.

TL;DR

You want to try options bnx2 disable_msi=1 in your /etc/modprobe.d/bnx2.conf if:

  • using squeeze and bnx2 version is 2.0.2
  • you see high traffic (10K+ connections)
  • you see errors on public network interface
  • server is dropping packets/connections randomly or it's really slow

The gory details

During last Tuesday the DDoS attack (that is still continuing now) on the My Opera servers ramped up from ~4k req/s/frontend to ~16k+ req/s/frontend. Both frontends were dist-upgraded (including a kernel upgrade) on May 23rd, but not rebooted, so the kernel update was armed but not actually live.

We started seeing these bad problems of dropped connections and general slowness after the frontend servers were rebooted. The reason why there were rebooted is because we have been hitting another really weird problem, the 210 days uptime timer bug. See this and this bug reports for more details.

Anyway, I'm not sure how to verify this, because I didn't restart the boxes myself, but my theory is after they were rebooted, the new bnx2 kernel module version 2.0.2 was loaded.

Then later on we found out about this very specific bnx2 v2.0.2 bug that only triggers in high traffic situations, at least on Debian Squeeze and Ubuntu, that causes network interfaces to stop working correctly, dropping traffic.

Long story short, there's a magic option that prevents this from happening. rmmod'ing and modprobing back the bnx2 module with this option fixed the problem so far.

# /etc/modprobe.d/bnx2.conf
options bnx2 disable_msi=1

Regarding what the option is about, I'm not even going to lie about it. I have no idea… We found it with this search:

https://encrypted.google.com/search?client=opera&rls=en&q=bnx2+debian+2.0.2+traffic&sourceid=opera&ie=utf-8&oe=utf-8&channel=suggest

First hit is our own Sven from sysadmin team:

http://lists.us.dell.com/pipermail/linux-poweredge/2011-October/045485.html

Second hit is the solution we used:

http://ubuntuforums.org/archive/index.php/t-1726045.html

We also did some tweaking for the large amount of TIME_WAIT connections that were resulting from this bnx2 bug, namely bumped up net.sys.ipv4.tcp_max_tw_buckets quite a bit.

Take aways

  1. Before rebooting a machine, check what's going to happen, when was last upgrade etc…, f.ex. /var/log/dpkg.log.
  2. In case you have firewall rules, iptables-save > /root/iptables-rules.YYYYMMDD and later restore if needed with iptables-restore < iptables-rules.YYYYMMDD
  3. Always check if the conntrack module is enabled. Most times you don't need it, and it will cause performance to drop under very high traffic (of course).

In this case what happened is that the conntrack module was accidentally also re-enabled by the reboot. We had previously disabled it, but didn't make the change permanent. This is because on My Opera we're still not using our config management infrastructure… Looking forward to make that happen. Soon. Hopefully :)

Verifying MySQL behaviour with automated test suites and mytap

You know everything about how MySQL treats UTF8 and LATIN1 charsets and how the collation table impacts on selection and insertion of data, right?

Great, then stop reading :)

I don't and since I'm in the process of setting up a new version of the Opera accounts database, I really don't want to screw up things. I tried to fully understand how MySQL works in this respect (charsets, collations, etc…) but reading documentation and memorizing it wasn't very easy. Plus, there's a thousands blog posts on the matter, not always 100% accurate.

So I thought I'd better get hands on and I wrote a kind of database test suite.

Now this test suite is hooked up to the main project builds on Jenkins. Here's a sample output:


[...]
[workspace] $ /bin/sh -xe /tmp/hudson3255767718598715423.sh
+ ./bin/run-dbtest-suite
basedir=/var/lib/jenkins/jobs/auth-db/workspace
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/__initdb__.my ........................................... 
1..2
ok 1 - Using utf8tests database
ok 2 - Server charset is latin1
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_bin.my ................................... 
1..6
ok 1 - All our records are there. No duplicate key error.
ok 2 - utf8_bin collation does not collate a/â/à/A/...
ok 3 - utf8_bin collation does not collate a/â/à/A/...
ok 4 - utf8_bin collation does not collate a/â/à/A/...
ok 5 - Query for mixed-case username does not return lowercase username
ok 6 - Query for upper-case username does not return lowercase username
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_general_ci.my ............................ 
1..7
ok 1 - Collation for t007 is utf8_general_ci
ok 2 - utf8_general_ci collation normalizes accents, diacritics and the like
ok 3 - A and Ã… are collated to the same character in the utf8_general_ci table
ok 4 - å and Å are collated to the same character in the utf8_general_ci table
ok 5 - lower/upper case chars are collated in the utf8_general_ci table
ok 6 - lower/upper case chars are collated in the utf8_general_ci table
ok 7 - We are allowed to insert all records just because there is no unique constraint
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_unicode_ci.my ............................ 
1..7
ok 1 - Collation for t005 is utf8_unicode_ci
ok 2 - utf8_unicode_ci collation normalizes accents, diacritics and the like
ok 3 - A and Ã… are collated to the same character in the utf8_unicode_ci table
ok 4 - å and Å are collated to the same character in the utf8_unicode_ci table
ok 5 - lower/upper case chars are collated in the utf8_unicode_ci table
ok 6 - lower/upper case chars are collated in the utf8_unicode_ci table
ok 7 - We are allowed to insert all records just because there is no unique constraint
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/default-table-charset.my ................................ 
1..3
ok 1 - Default character set is utf8 when no charset is specified (from server)
ok 2 - Default character set is utf8 when "CHARSET utf8" specified in the CREATE TABLE
ok 3 - Default character set is utf8 when "CHARSET utf8" and "COLLATE" specified in the CREATE TABLE
ok
...
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/username-with-utf8-chars.my ............................. 
1..5
ok 1 - We have some UTF-8 encoded string in our hands (hex)
ok 2 - We have some UTF-8 encoded string in our hands (charset)
ok 3 - Can select back UTF-8 content from a CHARSET utf8 table
ok 4 - Given string is exactly 24 bytes long (length)
ok 5 - Given string is exactly 8 (wide) characters long (char_length)
ok
All tests successful.
Files=11, Tests=80, 0.739731 wallclock secs ( 0.05 usr  0.02 sys +  0.07 cusr  0.01 csys =  0.15 CPU)
Result: PASS
Recording test results
Finished: SUCCESS

And here's an example of "sanity check" test case, which doesn't do much:


   1 -- Check that we can insert and retrieve UTF-8 content correctly
   2 
   3 BEGIN;
   4 
   5 SET NAMES utf8;
   6 
   7 SELECT tap.plan(5);
   8 
   9 USE auth_utf8tests;
  10 
  11 SET @username = '今日话题今日话题';
  12 SET @encoded  = 'C1BB8AE697A5E8AF9DE9A298E4BB8AE697A5E8AF9DE9A298';
  13 
  14 SELECT tap.eq(
  15     HEX(@username),
  16     @encoded,
  17     'We have some UTF-8 encoded string in our hands (hex)'
  18 );
  19 
  20 SELECT tap.eq(
  21     CHARSET(@username),
  22     'utf8',
  23     'We have some UTF-8 encoded string in our hands (charset)'
  24 );
  25 
  26 INSERT INTO t001 (f1) VALUES (@username);
  27 
  28 SELECT tap.eq(
  29     (SELECT HEX(f1) FROM t001 WHERE f1 = @username),
  30     @encoded,
  31     'Can select back UTF-8 content from a CHARSET utf8 table'
  32 );
  33 
  34 SELECT tap.eq(
  35     (SELECT LENGTH(f1) FROM t001 WHERE f1 = @username),
  36     24,
  37     'Given string is exactly 24 bytes long (length)'
  38 );
  39 
  40 SELECT tap.eq(
  41     (SELECT CHAR_LENGTH(f1) FROM t001 WHERE f1 = @username),
  42     8,
  43     'Given string is exactly 8 (wide) characters long (char_length)'
  44 );
  45 
  46 -- Finish the tests and clean up.
  47 CALL tap.finish();
  48 ROLLBACK;

This SQL test code uses mytap. You can see how the SELECT tap.* calls are just the equivalents of the TAP testing framework of Perl. SELECT tap.eq() is the equivalent of Test::More::is(), and so on.

Another, more interesting test case, is the following:


   1 --
   2 -- Verify how the utf8_unicode_ci collation works
   3 --
   4 
   5 BEGIN;
   6 
   7 SET NAMES utf8;
   8 
   9 SELECT tap.plan(12);
  10 
  11 USE auth_utf8tests;
  12 

  [...]

  40 SELECT tap.eq(
  41     (SELECT TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA=SCHEMA() AND TABLE_NAME='t015'),
  42     'utf8_unicode_ci',
  43     'Collation for t015 is utf8_unicode_ci'
  44 );

  [...]

  48 
  49 SELECT tap.eq(
  50     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1a' ORDER BY id),
  51     '10',
  52     'utf8_unicode_ci collation normalizes accents, diacritics and the like'
  53 );
  54 
  55 SELECT tap.eq(
  56     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1Ã…' ORDER BY id),
  57     '10',
  58     'A and Ã… are collated to the same character in the utf8_unicode_ci table'
  59 );
  60 
  61 SELECT tap.eq(
  62     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1Ã¥' ORDER BY id),
  63     '10',
  64     'Ã¥ and Ã… are collated to the same character in the utf8_unicode_ci table'
  65 );
  66 
  67 SELECT tap.eq(
  68     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'TestUser1A' ORDER BY id),
  69     '10',
  70     'lower/upper case chars are collated in the utf8_unicode_ci table'
  71 );
  72 
  73 SELECT tap.eq(
  74     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'TESTUSER1A' ORDER BY id),
  75     '10',
  76     'lower/upper case chars are collated in the utf8_unicode_ci table'
  77 );
  78 
  79 SELECT tap.eq(
  80     (SELECT COUNT(*) FROM t015),
  81     1,
  82     'We are allowed to insert only 1 record, because the others collate to the same string'
  83 );
  84 
  85 -- Finish the tests and clean up.
  86 CALL tap.finish();
  87 ROLLBACK;

An interesting thing that I didn't know how to do in the beginning is how to trap errors. I left out that part from the test code to simplify, but here it is:


  13 DELIMITER //
  14 
  15 DROP PROCEDURE IF EXISTS populate_table //
  16 
  17 CREATE PROCEDURE populate_table ()
  18 BEGIN
  19 
  20     DECLARE CONTINUE HANDLER FOR SQLSTATE '23000' BEGIN
  21         SELECT tap.ok(
  22             1,
  23             'We should get dupkey errors when inserting data with collation utf8_unicode_ci'
  24         );
  25     END;
  26 
  27     INSERT INTO t015 (id,username,note) VALUES (10, 'testuser1a', 'plain');
  28     INSERT INTO t015 (id,username,note) VALUES (20, 'testuser1â', 'circumflex a');
  29     INSERT INTO t015 (id,username,note) VALUES (30, 'testuser1à', 'a grave');
  30     INSERT INTO t015 (id,username,note) VALUES (40, 'testuser1Ã…', 'A circ');
  31     INSERT INTO t015 (id,username,note) VALUES (50, 'TestUser1A', 'mixed case');
  32     INSERT INTO t015 (id,username,note) VALUES (60, 'TESTUSER1A', 'upper case');
  33 
  34 END;
  35 
  36 //
  37 
  38 DELIMITER ;
  39 
  46 /* Should generate 5 dupkey errors (taken as successful tests) */
  47 CALL populate_table;

It's a bit convoluted. To trap errors you have use the DECLARE HANDLER statement. DECLARE CONTINUE HANDLER FOR SQLSTATE '23000' means that whenever SQLSTATE is '23000', and that corresponds to a duplicate key error, then execute this block of code. All of that must necessarily be wrapped into a stored procedure. Handlers outside of stored procedures are not allowed.

In this particular tests, the table uses the utf8_unicode_ci collation table, so we are expecting a duplicate key error on username whenever we insert the string 'testuser1à' or 'TESTUSER1A', because 'testuser1a' was already inserted at the beginning. Of all the INSERT statements, only the first one is bound to succeed, so I put a SELECT tap.ok(1) for the duplicate key HANDLER and I expect 5 tests when I make the CALL populate_table;.

This of course may seem trivial. And I guess it is, but for me it's a much better way of learning than scouring through the manuals or the many blog posts out there that may or may not reflect the environment I'm working with.

Routinely running this kind of test suite makes it possible and easy to verify the database behaviour:

  • instantly
  • after upgrades (5.1 -> 5.5? -> 6?) or storage engine changes
  • after mysql configuration changes. For example, I discovered in this way that adding default-charset=utf8 in my MySQL config breaks everything.

I consider this my live documentation on how MySQL works. I would really appreciate if you have any feedback on this. Have fun!

Report from the Varnish Users Group (VUG5) meeting in Paris – Day 1

Last week I attended the VUG5 meeting (https://www.varnish-cache.org/vug5). The following is my report of the conference Day 1, the "Users" day.

TL;DR

I learned a lot on (for me) gray areas of Varnish like 3.0, VMODs, ESI and various corner cases. My presentation on how we use Varnish at Opera sparked a lot of interest especially in our thumbnail service.

Day 1, VUG5 users day

Day 1 was held at La Défense, a mega business district just outside of Paris. All day was filled with presentations by Varnish Software people and a few other companies. On with the list, and my notes on the side.

Keynote: Varnish in 2020 by Poul Henning Kamp, Varnish Software

Poul runs thttpd, he's not a varnish user, so welcomes feeback from all users. That's why of the VUGs.

Varnish today is "The HTTP delivery engine". And in 2020? Hard to predict. PHK usually predicts things really badly. What we _can_ see is:

  • HTTP/2.0 Last call status just a few weeks ago
  • Google's SPDY support in Varnish? Most likely. Depends on future development and what/how many clients pick it up
  • HTTP over UDP? Lots of interest in this lately

Most likely future work on varnish:

  • Clearer split of transport and semantics
    (could speak HTTP no matter whether over UDP, TCP or SPDY)
  • Generic pluggable protocols (SPDY, f.ex.)
  • Decouple client protocol and backend protocol. Talk SPDY to client, talk HTTP to backend.

SSL in Varnish? Unlikely, just use Pound or nginx or whatever. Pound is simple and robust.

Varnish Book by Kristian Lyngstøl, Varnish Software

Expanding and improving on the existing training course material, Kristian and some contributors created a "Varnish Book", to help people starting up with Varnish. It will be is freely available at https://www.varnish-software.com/static/book/. Now there's only a cute bunny though.

Varnish + Escenic by Richard Zuidhof, Escenic?

Richard explained how he used Varnish to migrate away from the Apache/Squid/Apache sandwich and made it better/faster and his company saved a lot of money in the process.

Interesting points:

50x errors received from the backends are served doing a restart in vcl_fetch() but hitting a "dummy" backend, a sort of static version of a real backend. Something like:


  sub vcl_recv {
     ...
     if (req.restarts > 0) {
         set req.backend = dummy;
     }
  }
 
  sub vcl_fetch {
     if (beresp.status == 500) {
        return (restart); # Or whatever this is
     }
  }

Also talked about various timeouts, like:

 
  {
    .first_byte_timeout = 1s;
    .between_bytes_timeout = 1s;
  }

and how he needed to reset them back to 120s/180s for some of their pages to work.

He said: a timeout event from backend should cause Varnish to fall back to stale content. Not the case currently.
Varnish will abort the fetch operation. So pay attention.

Mobile device detection by Lasse Karsten, Varnish Software

Talked about various libraries and ways to detect mobile devices, including:

  • libvarnish-deviceatlas
  • WURFL
  • … others I didn't write down in time

Basically it was a way to survey how many people
use this technology and say that Varnish Software has a
commercial solution but they are going to open source
it Soon(tm), or something along these lines.

I was a bit distracted because I was having problems with the laptop and my presentation
was coming up, so… I plan to go back to this presentation once the slides are up.

ESI and Varnish by Federico Schwindt, RBS

Summary of how RBS is using ESI for an internal website used by RBS employees.

Basically the service is composed of various "boxes", small windows in the page with some information that depends on location, department or other things, and they use Varnish to cache those small boxes and ESI to compose the final page.

Problems:

  • They can't find a way to also keep the fully composed page as a cache object.
  • Invalidation logic is complex because of inter-dependent content between different boxes.

Interesting: they use a HTTP header sent by the backend to instruct Varnish on when to do ESI processing, so ESI is not a on/off as a whole, but it can be triggered on specific pages. This is very cool because it could also solve the development/production setup problem I had always feared when using ESI. With that I mean the complication of using development environments with ESI, where every dev installation needs a ESI-aware varnish.

Varnish at Opera by me

I talked about how we use Varnish in our projects. I mentioned a few Varnish extensions I worked on, including varnish-accept-language and varnish-geoip, plus other tools like http-cuke.

There were plenty of real world examples of VCL configuration we use in the various projects. I also talked about the varnish puppet module we wrote, that comes with a bunch of interesting customizations and fixes, included in the puppet-modules repository on Github.

If you're interested, slides are published here:

http://www.slideshare.net/cstrep/vug5-varnish-at-opera-software

I got lots of feedback and questions about our picture thumbnail service, so I'll probably write more about it soon.

Security with VCL by Kacper Wysocki, Redpill Linpro

Easily one of the best talks of the day. Kacper explained his security.vcl project. Here's a few highlights, but it's really interesting, I hope slides will be up soon.

  • Wrote modsec rules parser and converter to VCL
  • Eduardo Scarpellini, Master thesis, OWASP, worked on a varnish-firewall project, similar in scope, and did a in-depth research, finding that out of the OWASP top broken apps, he could automatically block 73% of XSS and SQL injections.
  • security.vcl is now used in ~10 sites with lots of traffic
  • Drawback compared to mod-security is that no POST data can be analyzed (yet)
  • In the future, we will see a merge of security.vcl and varnish-firewall projects.

Varnish modules by Kristian Lyngstøl, Varnish Software

I don't remember much, but I think Kristian basically tried to get more people to use VMODs, and said there's now a nice page where a list of known VMODs is kept:

http://www.varnish-cache.org/vmods

and you can register your own VMODs and have them listed.

Stay tuned for the "Day 2, Developers day" part.

Using hypnotoad in production, anyone?

So, you're using hypnotoad in production. And it works perfectly for you. Maybe you have an Nginx or Apache in front of it configured as reverse proxy. Everything's great. Right? Right. Then I have a zillion questions for you.

Maybe I don't understand how it works, but I'm having the following problems:

  • "sometimes" hypnotoad won't stop. I usually try to stop it with:
    hypnotoad --stop /path/to/my/script
  • I use symlinks to deploy applications, so for example I deploy in /opt/myapp and each new deployment gets a timestamped folder, /opt/myapp/releases/20120224-180801.

    Then there's a symlink that always points to the last deployed version, /opt/myapp/current/opt/myapp/releases/{whatever-datetime}. Now, using hypnotoad --stop /opt/myapp/current doesn't work, because hypnotoad probably uses the actual filename, not the symlink, to identify the running application.

    That's fine, but then how can I stop it reliably? I wish it had a hypnotoad --force-stop mode or something.

  • Last problem, when I push a new deployment, and stop and restart hypnotoad, often the application doesn't work properly, it only generates exceptions for unknown reasons. Stopping and restarting again manually usually fixes the problems…
  • I was a bit frustrated today, so I decided to switch back to starman. I have never ever had a problem with it, so I will stick to it for now. But I would still be interested to know whether you use hypnotoad in production and how well it works. Write in the small box below, you don't need to register. Thanks :)

Find uses of perl 5.10 features in your code: a bit of PPI magic

This morning on IRC we were talking about old perl installations, and how forgetting the use 5.010 but using 5.10 features, for example the // operator, can be a problem.

I suggested maybe using a git hook would be an idea, so I assembled this proof of concept script to test for 5.10-isms but without a use 5.010; statement. It's too long to include here, so I put in on Github (https://gist.github.com/1875528).

The script uses the "impossible" PPI module to parse the Perl code and extract information about used operators using PPI::Token::Operator, to catch // or ~~ or similar, and use statements, with PPI::Statement::Include.

It's meant to be run as part of a post-commit hook or similar. I had searched for similar modules in the Perl::Critic space but found nothing of sorts. Maybe I just didn't look too well?

Anyway, https://gist.github.com/1875528. Enjoy.