Tag Archives: perl

Net::Statsd::Server, a Perl port of Flickr/Etsy’s statsd

If you’re looking for a Perl client to connect to a statsd daemon, check out Net::Statsd on CPAN, now at version 0.08.

This post is about the server component of statsd.

Tracking metrics: up to now

The idea of statsd started at Flickr with Cal Henderson, and some of that code is still available, but it’s not very functional or complete.

Ever since I read about statsd, I’ve found the concept brilliant. I had been using a similar technique long before hearing about statsd, though. I learned it from colleagues here at Opera in 2008, who were using it to track application metrics for the Opera Link server. I thought it was great, so I implemented it too, extending it to make it very easy to add metrics and to see the output automatically in Munin. Here’s basically how it worked:

# ...
use Opera::Stats;
# ...
Opera::Stats::count("site.logins");
# ...

The project code would typically contain tens or hundreds of these calls. Each call would store or increment a counter in a local or remote memcached. A complementary Opera::Stats::Munin module would then automatically generate the output needed to implement a full Munin plugin, given the metrics to be exposed.

So far, so good. Except there were a few things that didn’t work quite right:

  • Using TCP connections, possibly even to remote machines, could become a problem if the memcached machines went down, even though in practice it never was
  • Volume was a concern. I had to worry about tracking too many metrics. How would that affect the functioning of memcached for regularly stored keys and values? Would those metrics-related keys cause evictions of regular memcached content?
  • Even though the Munin integration made it very easy to have charts, there were still some limitations: creating a new chart required a small wrapper plugin with one or two lines of Perl code. Flexibility was also an issue.

Enter statsd

I have been thinking of replacing this system with statsd for a while. However, I wanted to have a more in-depth look at it before deploying it.

Turns out that statsd is a simple project, which I like, but requires nodejs. Knowing next to nothing about nodejs, I took some time to learn a few things.

I also realized I had been wanting to learn AnyEvent for a long time.

Net::Statsd::Server

Two weeks ago, I spent a busy weekend reimplementing 95% of statsd in Perl. On Sunday night, I had a functional version of statsd written in Perl with AnyEvent.

AnyEvent stuff is surprising at times. I found it especially interesting to debug the cases where your timer (AE::timer) doesn’t fire unless you actually save it to a scalar, as in:

# This won't fire!
AE::timer 10, 10, \&do_something;

# This will though.
# This behaviour is triggered by "defined wantarray"
my $t = AE::timer 10, 10, \&do_something;
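
The same gotcha applies to I/O watchers. As a rough illustration of the pattern a statsd-like server builds on (a hedged sketch, not the actual Net::Statsd::Server code), a bare-bones AnyEvent UDP listener might look like this:

use strict;
use warnings;
use AnyEvent;
use IO::Socket::INET;

# Bind a non-blocking UDP socket on the statsd port
my $sock = IO::Socket::INET->new(
    Proto     => 'udp',
    LocalAddr => '0.0.0.0',
    LocalPort => 8125,
    Blocking  => 0,
) or die "Can't bind UDP socket: $!";

# As with AE::timer, the watcher must be kept alive in a variable
my $io = AE::io $sock, 0, sub {
    my $packet;
    $sock->recv($packet, 8192);
    # ... parse "metric:value|type" lines here ...
};

AE::cv->recv;    # run the event loop forever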

Since that weekend, I have spent a few more nights tweaking Net::Statsd::Server. Yesterday I wrote a new piece of functionality (a new “File” backend) that is actually not in the original statsd.

It looks like I might need new backends as well, so I think it’s “an investment with a good ROI”, even though I did it mainly for fun and in my free time.

Performance

I wanted to make sure my statsd server implementation would be fast. I started by bringing up the nodejs statsd, firing my benchmark script (benchmark.pl) at it with 1 million iterations, and then comparing the results with my own statsd server.

That didn’t work out very well. Or rather, it worked out brilliantly, showing around 40K requests/s being handled by nodejs-statsd and 50K requests/s by Net::Statsd::Server. Problem is: how do you measure the performance of a UDP server? Or, for that matter, of a UDP client?

I figured out that, UDP being connection-less and fire-and-forget, it doesn’t really matter how many packets/s the client fires, as long as you can generate more than your server can handle. Just as a data point, I reached around 73-75k statsd API calls per second for the gauge API, and around 55-58k for counters and timers. What really matters is how many packets reach the server.
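
For the curious, a flooding client is trivial to write. Here’s a hedged sketch along the lines of what benchmark.pl does (not the actual script): it fires pre-formatted statsd packets over a connected UDP socket as fast as it can, then reports the send rate:

use strict;
use warnings;
use IO::Socket::INET;
use Time::HiRes qw(time);

my $iterations = shift || 1_000_000;

my $sock = IO::Socket::INET->new(
    Proto    => 'udp',
    PeerAddr => '127.0.0.1',       # assumed statsd host
    PeerPort => 8125,
) or die "Can't create UDP socket: $!";

my $start = time;
for (1 .. $iterations) {
    # Fire-and-forget: no reply is ever expected
    $sock->send('bench.counter:1|c');
    $sock->send('bench.timer:' . int(rand 100) . '|ms');
}
my $elapsed = time - $start;
printf "Sent %d packets in %.2fs (%.0f packets/s)\n",
    2 * $iterations, $elapsed, 2 * $iterations / $elapsed;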

BTW, I used another amazing piece of software called Devel::NYTProf to optimize the performance of the incoming packets code path as much as I could.

The test setup

To measure how many packets are received on the server-side, I prepared a test configuration:

{ graphitePort: 2003
, graphiteHost: "graphite.localdomain"
, host: "0.0.0.0"
, port: 8125
, backends: [ "./backends/graphite", "./backends/console" ]
, mgmt_address: "0.0.0.0"
, mgmt_port: 8126
}

The same configuration file for the Perl server becomes:

{ "graphitePort": 2003,
  "graphiteHost": "graphite.localdomain",
  "host" : "0.0.0.0",
  "port": 8125,
  "mgmt_address" : "0.0.0.0",
  "mgmt_port": 8126,
  "backends": [ "Graphite", "Console" ],
  "log" : {
    "backend" : "stdout",
    "level" : "LOG_WARN",
  }
}

Using the benchmark.pl code mentioned above, run with:

$ perl benchmark.pl 1000000

I started up the nodejs statsd first, then the Net::Statsd::Server daemon, and captured their output. Both servers were configured to use their Graphite backend and flush to a valid and active Graphite host. The Console backend was also active for both servers, so I could capture the output, look at the statsd.packets_received counter and directly measure how many packets were received by the server.

The benchmark utility with first argument = 1000000 generates 5 million statsd API calls, that is, 5 million UDP packets.

Of these 5 million packets, nodejs statsd was able to capture 2106768, 1596275, 1479145 and 1490640 packets over several runs.

Net::Statsd::Server, again over four different runs, was able to capture 2106242, 1884810, 1822042 and 1866500 packets.

I have performed more tests, and they showed very low deviation from these runs (around 1.5M packets for Etsy’s statsd and 1.8M for Net::Statsd::Server). Removing the two peak results of ~2.1M packets, it would seem that the Perl statsd is capable of receiving about 22% more packets than the original statsd daemon written in JavaScript.

Of course, this is just my test. I have tried to run the test on different hardware, but I haven’t got significantly different results. If you try yourself, please let me know what numbers you get. I’d be curious to know :-)

SO_RCVBUF

Given the massive amount of UDP packets that were lost in the tests (50%+ in the best runs), I tried to figure out a way to improve this and I stumbled on SO_RCVBUF.

My understanding was that bumping up SO_RCVBUF on the listening UDP socket would dramatically decrease packet loss. However, I hadn’t been able to prove it, because I hadn’t seen any improvement in the total number of packets received. At least not until I read this article on UDP packet loss on stackoverflow.com, which pointed me to the net.core.rmem_max sysctl.

After bumping net.core.rmem_max up to 100M, just to make sure it wouldn’t be the limiting factor, and using the following code in Net::Statsd::Server:

use Socket qw(SOL_SOCKET SO_RCVBUF);

# Bump up SO_RCVBUF on the UDP socket, to buffer up incoming
# UDP packets and avoid massive packet loss when load is very high.
setsockopt($self->{server}->fh, SOL_SOCKET, SO_RCVBUF, 1*1024*1024)
    or die "Couldn't set SO_RCVBUF: $!";

I saw a very interesting effect.

Re-running the node.js statsd, I could see an increased number of captured packets (1691700 and 1675902, a ~10% increase).
Running the Net::Statsd::Server daemon again, I recorded 2678507 and 2477246 packets, an impressive ~40% increase!

As a last effort, I tried varying the SO_RCVBUF size from 1 to 64 MB to see what effect it had on the number of captured packets (or on UDP packet loss, if you prefer).

I haven’t run any scientific set of tests, but I can’t see any statistically significant increase for values greater than 4-8Mb, so I haven’t decided where to set the default in Net::Statsd::Server yet. Any chosen value is likely to need specific sysctl tuning anyway, so YMMV.

Why?

Did I really do it for fun? Yes, mainly, but also because:

  • I don’t like adding node.js to our production stack just to run statsd. I have never operated a node.js server, so I didn’t want to take that “risk”. The product we’re building is going live soon! :-) Note that this would apply to any technology new to us, it’s not about node.js per se :-)
  • to learn how statsd was put together
  • to learn AnyEvent
  • to learn how to build a high performance UDP server
  • Basically, to learn :-)

Code is up on CPAN, as usual: https://metacpan.org/module/Net::Statsd::Server.

If you happen to use it, please give me some feedback!

Using Perl and Google Chromium’s CLD to identify the language of a text

For a new project I'm working on, given a body of text, I need to identify which language it's written in (English, Russian, Chinese, etc…).

I'm not exactly the first person on Earth to do this, so it turns out there's Google's CLD library. Surprisingly, several people around here didn't know it. The library is open source and very good too, so I immediately looked for Perl bindings for it.

There is a great Perl module on CPAN called Lingua::Identify::CLD. This module bundles a copy of the CLD library, and fully automates build and link steps too. So I gave it a shot.

How to use Lingua::Identify::CLD

It's amazingly easy to use. Here's a sample of the code:


#!/usr/bin/perl

use strict;
use warnings;
use feature 'say';
use Lingua::Identify::CLD ();

# Slurp the text to analyze from the files given as arguments (or stdin)
my $text;
while (readline) { $text .= $_ }
chomp $text;

# In my case, the content is HTML
my $cld = Lingua::Identify::CLD->new(isPlainText => 0);

# identify() returns something like: (ENGLISH, en, 64)
my @lang = $cld->identify($text);
say "Language: $lang[0]";

Failing tests

I decided to start using this module in my project. The build phase went fine (perl ./Build), but the tests were failing (./Build test). Here's the log of a failed test run:


$ ./Build test
cc -I/usr/lib/perl/5.14/CORE -fPIC -c -D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g -o /tmp/gAc_glZta2/library.o /tmp/gAc_glZta2/library.c
cc -I/usr/lib/perl/5.14/CORE -fPIC -c -D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g -o /tmp/gAc_glZta2/test.o /tmp/gAc_glZta2/test.c
cc -shared -L/usr/local/lib -fstack-protector -o /tmp/gAc_glZta2/libfoo.so /tmp/gAc_glZta2/library.o
cc -fstack-protector -L/usr/local/lib -o /tmp/gAc_glZta2/foo /tmp/gAc_glZta2/test.o -L/tmp/gAc_glZta2 -lfoo

** Preparing XS code
t/00-load.t ....... 1/1 Bailout called.  Further testing stopped:  

#   Failed test 'use Lingua::Identify::CLD;'
#   at t/00-load.t line 6.
#     Tried to use 'Lingua::Identify::CLD'.
#     Error:  Not a CODE reference at /usr/lib/perl/5.14/DynaLoader.pm line 207.
# END failed--call queue aborted at .../Lingua-Identify-CLD-0.05/blib/lib/Lingua/Identify/CLD.pm line 207.
# BEGIN failed--compilation aborted at .../Lingua-Identify-CLD-0.05/blib/lib/Lingua/Identify/CLD.pm line 24.
# Compilation failed in require at (eval 4) line 2.
# BEGIN failed--compilation aborted at (eval 4) line 2.
Use of uninitialized value $Lingua::Identify::CLD::VERSION in concatenation (.) or string at t/00-load.t line 9.
# Testing Lingua::Identify::CLD , Perl 5.014002, /usr/bin/perl
# Looks like you failed 1 test of 1.
FAILED--Further testing stopped.

Just the day before, I had successfully compiled and run the tests for the same version of the module, but that was on Ubuntu 11.10, which I was using at the time. Then I decided to upgrade to 12.10, and that's when I got this failed test run.

Contacting the author

Then I decided to contact the author of the module. Alberto being quite a well-known author, with lots of CPAN contributions, I hoped he would answer my query within 2-3 days. That would give me some time to do other stuff, and hopefully give him time to analyze the failure.

As usual with the best CPAN authors ;-) he answered in a couple of hours, which was fantastic for me. He had already identified a few failures like mine thanks to another awesome resource we have in the Perl community, the CPAN Testers service.

CPAN Testers

CPAN Testers is a group of users that regularly (or not) report back the build/test status of everything that's released to CPAN, on a multitude of platforms and versions of Perl. I think this is one of the most underestimated awesome resources we have in the Perl community. The CPAN Testers status of Lingua::Identify::CLD shows one report that looks exactly the same as the failure I experienced, on Ubuntu 12.10 with the stock perl 5.14.2.

The ugly patch

I tried to analyze the problem, apparently located in DynaLoader, and came up with a shotgun-debugging-driven patch that I copy/paste here for reference:

@@ -18,10 +18,23 @@ Version 0.05

 our $VERSION = '0.05';
 
-use XSLoader;
-BEGIN {
+eval {
+
+    require XSLoader;
     XSLoader::load('Lingua::Identify::CLD', $VERSION);
-}
+
+} or do {
+
+    # This warning triggers on Ubuntu 12.10 with the
+    # stock perl 5.14.2. Strangely enough, this doesn't
+    # seem to affect the tests at all.
+    #
+    # Not a CODE reference at /usr/lib/perl/5.14/DynaLoader.pm line 207.
+    # END failed--call queue aborted at .../blib/lib/Lingua/Identify/CLD.pm line 207.
+    # ) at .../blib/lib/Lingua/Identify/CLD.pm line 28."
+    #
+    #warn "Something's wrong with XSLoader? ($@)";
+};
 
 =head1 SYNOPSIS

It's shotgun debugging because I don't really know what's going on; I just came up with this patch based on the assumptions and information I've gathered over the years about how DynaLoader/XSLoader and BEGIN {} blocks work and interact with the rest of the code :-)

Anyway, it makes the tests pass again, even if with a weird warning. I agree with Alberto that it's not wise to incorporate this patch into Lingua::Identify::CLD until we have understood why the original code fails, and why for just 2 people in the world.

All this blah-blah, to say: please do help! If you have seen the same problem, help us figure out what it is. My repository with the forked/patched code is on Github:

https://github.com/cosimo/Lingua-Identify-CLD

Have fun!

Displaying realtime memcached traffic on a backend

Sometimes I like to write down posts like this, to remind myself how to do something; a sort of mental note.
Suppose you have a few application servers that use one or more memcached servers, and you want some way to display the outbound traffic, with some insight into the most frequently used keys, counters, etc…

Here's a quick way to do that, assuming you're using the memcached text protocol:

tcpflow -ce dst port 11211 \
    | cut -b53- \
    | grep ^get \
    | pipestat --clear --runtime 60 --field 2 --time 1 --limit 40

What this does is:

  • Use tcpflow to capture all outbound traffic to destination port 11211, default memcached port.
  • Strip the leading bytes of each line (cut -b53- keeps everything from byte 53 onward), to filter out the source and destination ip/port prefix added by tcpflow
  • Only display get requests (alternatively, use set, incr, …)
  • Feed the resulting data to pipestat, a simple but great Perl tool that aggregates the data and displays the most frequent entries. The options I used here give you quick, continuously refreshed statistics, similar to tools like top, mytop or varnishstat (a rough Perl stand-in for this aggregation step is sketched below).
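
In case pipestat isn't available, a rough Perl stand-in for that aggregation step might look like this (a sketch only, the real tool does much more):

#!/usr/bin/perl
# Read "get <key>" lines from stdin and print the top 40 keys every second
use strict;
use warnings;

$| = 1;
my %count;
my $last_report = time;

while (my $line = <STDIN>) {
    my (undef, $key) = split ' ', $line;    # field 2, as in "--field 2"
    $count{$key}++ if defined $key;

    if (time - $last_report >= 1) {
        print "\033[2J\033[H";              # clear the screen
        my @top = (sort { $count{$b} <=> $count{$a} } keys %count)[0 .. 39];
        printf "%8d  %s\n", $count{$_}, $_ for grep { defined } @top;
        %count = ();                        # --clear: reset stats every interval
        $last_report = time;
    }
}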

It goes without saying that these tools are automatically installed on all servers that our Devops team here at Opera manages. I couldn't work without them :)

How to find unused CSS selectors, a quick solution

I was talking to a colleague today, and he mentioned the problem he was working on: finding site-wide unused CSS selectors. That is, given a static CSS file on disk, go through all the selectors in it and check whether any elements match them across an entire site, crawling it page by page.

I thought it was a really interesting problem, so I gave it a quick shot by gluing together CSS::Tiny, Mojo::UserAgent and Mojo::DOM::CSS.

This is what came out of it. I'd say a decent first quick solution:
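
A minimal sketch along those lines (not the original script; the argument handling and output are just illustrative) might look like this:

#!/usr/bin/perl
use strict;
use warnings;

use CSS::Tiny ();
use Mojo::UserAgent ();

# Usage: unused-css.pl style.css http://example.com/page1 http://example.com/page2 ...
my ($css_file, @urls) = @ARGV;

my $css = CSS::Tiny->read($css_file)
    or die "Can't parse $css_file: " . CSS::Tiny->errstr;

# CSS::Tiny keys are the selector strings (possibly comma-separated groups)
my %unused = map { $_ => 1 } keys %$css;

my $ua = Mojo::UserAgent->new;

for my $url (@urls) {
    my $dom = $ua->get($url)->res->dom;
    for my $selector (keys %unused) {
        # Pseudo-classes like :hover can't be matched against the DOM, hence the eval
        my $found = eval { $dom->find($selector)->size };
        delete $unused{$selector} if $found;
    }
}

print "$_\n" for sort keys %unused;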

I also learned about the deadweight project, which apparently can also crawl a site after logging in, WWW::Mechanize style. It would be interesting to improve this initial solution along those lines :-)

Dist::Zilla, Y U suddenly no work anymore? [FIXED!]

I'm trying to understand why Dist::Zilla doesn't work anymore on my laptop. Here's the epic wall of warnings I get when running dzil test:


$ dzil test
Could not create the 'reader' method for zilla because : The method '_inline_store' was not found in the inheritance hierarchy for Moose::Meta::Class::__ANON__::SERIAL::9 at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 1053
	Class::MOP::Class::__ANON_Moose::Meta::Class=HASH(0x3556088) called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 1098
	Class::MOP::Class::add_around_method_modifier('Moose::Meta::Class=HASH(0x3556088)', '_inline_store', 'CODE(0x351cea8)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Role/Application/ToClass.pm line 231
	Moose::Meta::Role::Application::ToClass::apply_method_modifiers('Moose::Meta::Role::Application::ToClass=HASH(0x3556b40)', 'around', 'Moose::Meta::Role=HASH(0x351dc28)', 'Moose::Meta::Class=HASH(0x3556088)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Role/Application.pm line 78
	Moose::Meta::Role::Application::apply_around_method_modifiers('Moose::Meta::Role::Application::ToClass=HASH(0x3556b40)', 'Moose::Meta::Role=HASH(0x351dc28)', 'Moose::Meta::Class=HASH(0x3556088)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Role/Application.pm line 64
	Moose::Meta::Role::Application::apply('Moose::Meta::Role::Application::ToClass=HASH(0x3556b40)', 'Moose::Meta::Role=HASH(0x351dc28)', 'Moose::Meta::Class=HASH(0x3556088)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Role/Application/ToClass.pm line 36
	Moose::Meta::Role::Application::ToClass::apply('Moose::Meta::Role::Application::ToClass=HASH(0x3556b40)', 'Moose::Meta::Role=HASH(0x351dc28)', 'Moose::Meta::Class=HASH(0x3556088)', 'HASH(0x354ce50)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Role.pm line 470
	Moose::Meta::Role::apply('Moose::Meta::Role=HASH(0x351dc28)', 'Moose::Meta::Class=HASH(0x3556088)') called at /usr/local/lib/perl/5.10.1/Moose/Util.pm line 160
	Moose::Util::_apply_all_roles('Moose::Meta::Class=HASH(0x3556088)', undef, 'MooseX::SetOnce::Accessor') called at /usr/local/lib/perl/5.10.1/Moose/Util.pm line 99
	Moose::Util::apply_all_roles('Moose::Meta::Class=HASH(0x3556088)', 'MooseX::SetOnce::Accessor') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Class.pm line 104
	Moose::Meta::Class::create('Moose::Meta::Class', 'Moose::Meta::Class::__ANON__::SERIAL::9', 'roles', 'ARRAY(0x33e50d8)', 'weaken', '', 'superclasses', 'ARRAY(0x353a7e8)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Package.pm line 120
	Class::MOP::Package::create_anon('Moose::Meta::Class', 'superclasses', 'ARRAY(0x353a7e8)', 'roles', 'ARRAY(0x33e50d8)', 'cache', 1) called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 474
	Class::MOP::Class::create_anon_class('Moose::Meta::Class', 'superclasses', 'ARRAY(0x353a7e8)', 'roles', 'ARRAY(0x33e50d8)', 'cache', 1) called at /usr/share/perl5/MooseX/SetOnce.pm line 27
	Class::MOP::Class:::around('CODE(0x1c87bf0)', 'Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Method/Wrapped.pm line 162
	Class::MOP::Method::Wrapped::__ANON_Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50) called at /usr/local/lib/perl/5.10.1/Class/MOP/Method/Wrapped.pm line 91
	Moose::Meta::Class::__ANON__::SERIAL::8::accessor_metaclass('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 389
	Class::MOP::Attribute::__ANON__() called at /usr/share/perl5/Try/Tiny.pm line 76
	eval {...} called at /usr/share/perl5/Try/Tiny.pm line 67
	Try::Tiny::try('CODE(0x3543bb8)', 'Try::Tiny::Catch=REF(0x354c718)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 401
	Class::MOP::Attribute::_process_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)', 'reader', 'zilla', undef) called at /usr/local/lib/perl/5.10.1/Moose/Meta/Attribute.pm line 1074
	Moose::Meta::Attribute::_process_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)', 'reader', 'zilla', undef) called at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 428
	Class::MOP::Attribute::install_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Attribute.pm line 1013
	Moose::Meta::Attribute::install_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 891
	Class::MOP::Class::__ANON__() called at /usr/share/perl5/Try/Tiny.pm line 76
	eval {...} called at /usr/share/perl5/Try/Tiny.pm line 67
	Try::Tiny::try('CODE(0x354c5b0)', 'Try::Tiny::Catch=REF(0x3435780)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 896
	Class::MOP::Class::_post_add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Mixin/HasAttributes.pm line 44
	Class::MOP::Mixin::HasAttributes::add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Class.pm line 570
	Moose::Meta::Class::add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'zilla', 'is', 'ro', 'writer', 'set_zilla', 'lazy_required', 1, 'isa', ...) called at /usr/local/lib/perl/5.10.1/Moose.pm line 79
	Moose::has('Moose::Meta::Class=HASH(0x35122a0)', 'zilla', 'is', 'ro', 'isa', 'Moose::Meta::TypeConstraint::Class=HASH(0x3092830)', 'traits', 'ARRAY(0x350d590)', 'writer', ...) called at /usr/local/lib/perl/5.10.1/Moose/Exporter.pm line 382
	Moose::has('zilla', 'is', 'ro', 'isa', 'Moose::Meta::TypeConstraint::Class=HASH(0x3092830)', 'traits', 'ARRAY(0x350d590)', 'writer', 'set_zilla', ...) called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 22
	require Dist/Zilla/MVP/RootSection.pm called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/Assembler/Zilla.pm line 13
	Dist::Zilla::MVP::Assembler::Zilla::BEGIN() called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 0
	eval {...} called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 0
	require Dist/Zilla/MVP/Assembler/Zilla.pm called at /usr/local/share/perl/5.10.1/Dist/Zilla/Dist/Builder.pm line 204
	Dist::Zilla::Dist::Builder::_load_config('Dist::Zilla::Dist::Builder', 'HASH(0x342fe00)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/Dist/Builder.pm line 27
	Dist::Zilla::Dist::Builder::from_config('Dist::Zilla::Dist::Builder', 'HASH(0x33e2608)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App.pm line 112
	Dist::Zilla::App::__ANON__() called at /usr/share/perl5/Try/Tiny.pm line 76
	eval {...} called at /usr/share/perl5/Try/Tiny.pm line 67
	Try::Tiny::try('CODE(0x3084e60)', 'Try::Tiny::Catch=REF(0x33a8848)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App.pm line 120
	Dist::Zilla::App::zilla('Dist::Zilla::App=HASH(0x204eb48)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App/Command.pm line 13
	Dist::Zilla::App::Command::zilla('Dist::Zilla::App::Command::test=HASH(0x280b910)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App/Command/test.pm line 28
	Dist::Zilla::App::Command::test::execute('Dist::Zilla::App::Command::test=HASH(0x280b910)', 'Getopt::Long::Descriptive::Opts::__OPT__::2=HASH(0x291d7c0)', 'ARRAY(0x13bef10)') called at /usr/share/perl5/App/Cmd.pm line 220
	App::Cmd::execute_command('Dist::Zilla::App=HASH(0x204eb48)', 'Dist::Zilla::App::Command::test=HASH(0x280b910)', 'Getopt::Long::Descriptive::Opts::__OPT__::2=HASH(0x291d7c0)') called at /usr/share/perl5/App/Cmd.pm line 159
	App::Cmd::run('Dist::Zilla::App') called at /usr/bin/dzil line 11
 at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 400
	Class::MOP::Attribute::__ANON_The method '_inline_store' was not found in the inheritance... called at /usr/share/perl5/Try/Tiny.pm line 100
	Try::Tiny::try('CODE(0x3543bb8)', 'Try::Tiny::Catch=REF(0x354c718)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 401
	Class::MOP::Attribute::_process_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)', 'reader', 'zilla', undef) called at /usr/local/lib/perl/5.10.1/Moose/Meta/Attribute.pm line 1074
	Moose::Meta::Attribute::_process_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)', 'reader', 'zilla', undef) called at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 428
	Class::MOP::Attribute::install_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Attribute.pm line 1013
	Moose::Meta::Attribute::install_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 891
	Class::MOP::Class::__ANON__() called at /usr/share/perl5/Try/Tiny.pm line 76
	eval {...} called at /usr/share/perl5/Try/Tiny.pm line 67
	Try::Tiny::try('CODE(0x354c5b0)', 'Try::Tiny::Catch=REF(0x3435780)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 896
	Class::MOP::Class::_post_add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Mixin/HasAttributes.pm line 44
	Class::MOP::Mixin::HasAttributes::add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Class.pm line 570
	Moose::Meta::Class::add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'zilla', 'is', 'ro', 'writer', 'set_zilla', 'lazy_required', 1, 'isa', ...) called at /usr/local/lib/perl/5.10.1/Moose.pm line 79
	Moose::has('Moose::Meta::Class=HASH(0x35122a0)', 'zilla', 'is', 'ro', 'isa', 'Moose::Meta::TypeConstraint::Class=HASH(0x3092830)', 'traits', 'ARRAY(0x350d590)', 'writer', ...) called at /usr/local/lib/perl/5.10.1/Moose/Exporter.pm line 382
	Moose::has('zilla', 'is', 'ro', 'isa', 'Moose::Meta::TypeConstraint::Class=HASH(0x3092830)', 'traits', 'ARRAY(0x350d590)', 'writer', 'set_zilla', ...) called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 22
	require Dist/Zilla/MVP/RootSection.pm called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/Assembler/Zilla.pm line 13
	Dist::Zilla::MVP::Assembler::Zilla::BEGIN() called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 0
	eval {...} called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 0
	require Dist/Zilla/MVP/Assembler/Zilla.pm called at /usr/local/share/perl/5.10.1/Dist/Zilla/Dist/Builder.pm line 204
	Dist::Zilla::Dist::Builder::_load_config('Dist::Zilla::Dist::Builder', 'HASH(0x342fe00)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/Dist/Builder.pm line 27
	Dist::Zilla::Dist::Builder::from_config('Dist::Zilla::Dist::Builder', 'HASH(0x33e2608)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App.pm line 112
	Dist::Zilla::App::__ANON__() called at /usr/share/perl5/Try/Tiny.pm line 76
	eval {...} called at /usr/share/perl5/Try/Tiny.pm line 67
	Try::Tiny::try('CODE(0x3084e60)', 'Try::Tiny::Catch=REF(0x33a8848)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App.pm line 120
	Dist::Zilla::App::zilla('Dist::Zilla::App=HASH(0x204eb48)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App/Command.pm line 13
	Dist::Zilla::App::Command::zilla('Dist::Zilla::App::Command::test=HASH(0x280b910)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App/Command/test.pm line 28
	Dist::Zilla::App::Command::test::execute('Dist::Zilla::App::Command::test=HASH(0x280b910)', 'Getopt::Long::Descriptive::Opts::__OPT__::2=HASH(0x291d7c0)', 'ARRAY(0x13bef10)') called at /usr/share/perl5/App/Cmd.pm line 220
	App::Cmd::execute_command('Dist::Zilla::App=HASH(0x204eb48)', 'Dist::Zilla::App::Command::test=HASH(0x280b910)', 'Getopt::Long::Descriptive::Opts::__OPT__::2=HASH(0x291d7c0)') called at /usr/share/perl5/App/Cmd.pm line 159
	App::Cmd::run('Dist::Zilla::App') called at /usr/bin/dzil line 11
Compilation failed in require at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/Assembler/Zilla.pm line 13.
BEGIN failed--compilation aborted at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/Assembler/Zilla.pm line 13.
Compilation failed in require at /usr/local/share/perl/5.10.1/Dist/Zilla/Dist/Builder.pm line 204.

Due to a chronic lack of time, I blindly tried to upgrade Moose, MooseX::Types, Dist::Zilla and Config::MVP, but no luck.

Before I start dealing with this madness… any idea?

EDIT: thanks to the comments, I found out about moose-outdated, a script that reports which Moose(X) modules have newer versions on CPAN. Running moose-outdated, I got back the following list:

$ moose-outdated
MooseX::LazyRequire
MooseX::Role::Parameterized
MooseX::SetOnce

Then I just ran:

$ cpanm MooseX::LazyRequire MooseX::Role::Parameterized MooseX::SetOnce

After doing this, dzil started working again. Thanks everyone for your comments and help!

Verifying MySQL behaviour with automated test suites and mytap

You know everything about how MySQL treats UTF8 and LATIN1 charsets, and how the collation table impacts selection and insertion of data, right?

Great, then stop reading :)

I don't, and since I'm in the process of setting up a new version of the Opera accounts database, I really don't want to screw things up. I tried to fully understand how MySQL works in this respect (charsets, collations, etc…), but reading the documentation and memorizing it wasn't very easy. Plus, there are thousands of blog posts on the matter, not always 100% accurate.

So I thought I'd better get hands-on, and I wrote a kind of database test suite.

Now this test suite is hooked up to the main project builds on Jenkins. Here's a sample output:


[...]
[workspace] $ /bin/sh -xe /tmp/hudson3255767718598715423.sh
+ ./bin/run-dbtest-suite
basedir=/var/lib/jenkins/jobs/auth-db/workspace
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/__initdb__.my ........................................... 
1..2
ok 1 - Using utf8tests database
ok 2 - Server charset is latin1
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_bin.my ................................... 
1..6
ok 1 - All our records are there. No duplicate key error.
ok 2 - utf8_bin collation does not collate a/â/à/A/...
ok 3 - utf8_bin collation does not collate a/â/à/A/...
ok 4 - utf8_bin collation does not collate a/â/à/A/...
ok 5 - Query for mixed-case username does not return lowercase username
ok 6 - Query for upper-case username does not return lowercase username
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_general_ci.my ............................ 
1..7
ok 1 - Collation for t007 is utf8_general_ci
ok 2 - utf8_general_ci collation normalizes accents, diacritics and the like
ok 3 - A and Å are collated to the same character in the utf8_general_ci table
ok 4 - å and Å are collated to the same character in the utf8_general_ci table
ok 5 - lower/upper case chars are collated in the utf8_general_ci table
ok 6 - lower/upper case chars are collated in the utf8_general_ci table
ok 7 - We are allowed to insert all records just because there is no unique constraint
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_unicode_ci.my ............................ 
1..7
ok 1 - Collation for t005 is utf8_unicode_ci
ok 2 - utf8_unicode_ci collation normalizes accents, diacritics and the like
ok 3 - A and Å are collated to the same character in the utf8_unicode_ci table
ok 4 - å and Å are collated to the same character in the utf8_unicode_ci table
ok 5 - lower/upper case chars are collated in the utf8_unicode_ci table
ok 6 - lower/upper case chars are collated in the utf8_unicode_ci table
ok 7 - We are allowed to insert all records just because there is no unique constraint
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/default-table-charset.my ................................ 
1..3
ok 1 - Default character set is utf8 when no charset is specified (from server)
ok 2 - Default character set is utf8 when "CHARSET utf8" specified in the CREATE TABLE
ok 3 - Default character set is utf8 when "CHARSET utf8" and "COLLATE" specified in the CREATE TABLE
ok
...
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/username-with-utf8-chars.my ............................. 
1..5
ok 1 - We have some UTF-8 encoded string in our hands (hex)
ok 2 - We have some UTF-8 encoded string in our hands (charset)
ok 3 - Can select back UTF-8 content from a CHARSET utf8 table
ok 4 - Given string is exactly 24 bytes long (length)
ok 5 - Given string is exactly 8 (wide) characters long (char_length)
ok
All tests successful.
Files=11, Tests=80, 0.739731 wallclock secs ( 0.05 usr  0.02 sys +  0.07 cusr  0.01 csys =  0.15 CPU)
Result: PASS
Recording test results
Finished: SUCCESS

And here's an example of "sanity check" test case, which doesn't do much:


   1 -- Check that we can insert and retrieve UTF-8 content correctly
   2 
   3 BEGIN;
   4 
   5 SET NAMES utf8;
   6 
   7 SELECT tap.plan(5);
   8 
   9 USE auth_utf8tests;
  10 
  11 SET @username = '今日话题今日话题';
  12 SET @encoded  = 'C1BB8AE697A5E8AF9DE9A298E4BB8AE697A5E8AF9DE9A298';
  13 
  14 SELECT tap.eq(
  15     HEX(@username),
  16     @encoded,
  17     'We have some UTF-8 encoded string in our hands (hex)'
  18 );
  19 
  20 SELECT tap.eq(
  21     CHARSET(@username),
  22     'utf8',
  23     'We have some UTF-8 encoded string in our hands (charset)'
  24 );
  25 
  26 INSERT INTO t001 (f1) VALUES (@username);
  27 
  28 SELECT tap.eq(
  29     (SELECT HEX(f1) FROM t001 WHERE f1 = @username),
  30     @encoded,
  31     'Can select back UTF-8 content from a CHARSET utf8 table'
  32 );
  33 
  34 SELECT tap.eq(
  35     (SELECT LENGTH(f1) FROM t001 WHERE f1 = @username),
  36     24,
  37     'Given string is exactly 24 bytes long (length)'
  38 );
  39 
  40 SELECT tap.eq(
  41     (SELECT CHAR_LENGTH(f1) FROM t001 WHERE f1 = @username),
  42     8,
  43     'Given string is exactly 8 (wide) characters long (char_length)'
  44 );
  45 
  46 -- Finish the tests and clean up.
  47 CALL tap.finish();
  48 ROLLBACK;

This SQL test code uses mytap. You can see how the SELECT tap.* calls are just the equivalents of Perl's TAP testing functions: SELECT tap.eq() is the equivalent of Test::More::is(), and so on.
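
For comparison, the same kind of assertion written with plain Perl Test::More looks like this:

use Test::More tests => 1;

my $got      = 'utf8';
my $expected = 'utf8';

# Same role as: SELECT tap.eq(CHARSET(@username), 'utf8', '...');
is($got, $expected, 'We have some UTF-8 encoded string in our hands (charset)');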

Another, more interesting test case, is the following:


   1 --
   2 -- Verify how the utf8_unicode_ci collation works
   3 --
   4 
   5 BEGIN;
   6 
   7 SET NAMES utf8;
   8 
   9 SELECT tap.plan(12);
  10 
  11 USE auth_utf8tests;
  12 

  [...]

  40 SELECT tap.eq(
  41     (SELECT TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA=SCHEMA() AND TABLE_NAME='t015'),
  42     'utf8_unicode_ci',
  43     'Collation for t015 is utf8_unicode_ci'
  44 );

  [...]

  48 
  49 SELECT tap.eq(
  50     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1a' ORDER BY id),
  51     '10',
  52     'utf8_unicode_ci collation normalizes accents, diacritics and the like'
  53 );
  54 
  55 SELECT tap.eq(
  56     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1Å' ORDER BY id),
  57     '10',
  58     'A and Å are collated to the same character in the utf8_unicode_ci table'
  59 );
  60 
  61 SELECT tap.eq(
  62     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1å' ORDER BY id),
  63     '10',
  64     'å and Å are collated to the same character in the utf8_unicode_ci table'
  65 );
  66 
  67 SELECT tap.eq(
  68     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'TestUser1A' ORDER BY id),
  69     '10',
  70     'lower/upper case chars are collated in the utf8_unicode_ci table'
  71 );
  72 
  73 SELECT tap.eq(
  74     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'TESTUSER1A' ORDER BY id),
  75     '10',
  76     'lower/upper case chars are collated in the utf8_unicode_ci table'
  77 );
  78 
  79 SELECT tap.eq(
  80     (SELECT COUNT(*) FROM t015),
  81     1,
  82     'We are allowed to insert only 1 record, because the others collate to the same string'
  83 );
  84 
  85 -- Finish the tests and clean up.
  86 CALL tap.finish();
  87 ROLLBACK;

An interesting thing that I didn't know how to do in the beginning was how to trap errors. I left that part out of the test code above to simplify it, but here it is:


  13 DELIMITER //
  14 
  15 DROP PROCEDURE IF EXISTS populate_table //
  16 
  17 CREATE PROCEDURE populate_table ()
  18 BEGIN
  19 
  20     DECLARE CONTINUE HANDLER FOR SQLSTATE '23000' BEGIN
  21         SELECT tap.ok(
  22             1,
  23             'We should get dupkey errors when inserting data with collation utf8_unicode_ci'
  24         );
  25     END;
  26 
  27     INSERT INTO t015 (id,username,note) VALUES (10, 'testuser1a', 'plain');
  28     INSERT INTO t015 (id,username,note) VALUES (20, 'testuser1â', 'circumflex a');
  29     INSERT INTO t015 (id,username,note) VALUES (30, 'testuser1à', 'a grave');
  30     INSERT INTO t015 (id,username,note) VALUES (40, 'testuser1Å', 'A circ');
  31     INSERT INTO t015 (id,username,note) VALUES (50, 'TestUser1A', 'mixed case');
  32     INSERT INTO t015 (id,username,note) VALUES (60, 'TESTUSER1A', 'upper case');
  33 
  34 END;
  35 
  36 //
  37 
  38 DELIMITER ;
  39 
  46 /* Should generate 5 dupkey errors (taken as successful tests) */
  47 CALL populate_table;

It's a bit convoluted. To trap errors you have to use the DECLARE ... HANDLER statement. DECLARE CONTINUE HANDLER FOR SQLSTATE '23000' means that whenever SQLSTATE is '23000', which corresponds to a duplicate key error, the given block of code is executed. All of that must necessarily be wrapped in a stored procedure: handlers outside of stored procedures are not allowed.

In this particular test, the table uses the utf8_unicode_ci collation, so we expect a duplicate key error on username whenever we insert a string like 'testuser1à' or 'TESTUSER1A', because 'testuser1a' was already inserted at the beginning. Of all the INSERT statements, only the first one is bound to succeed, so I put a SELECT tap.ok(1) in the duplicate key HANDLER and expect 5 such tests when I run CALL populate_table.

This may of course seem trivial. And I guess it is, but for me it's a much better way of learning than scouring the manuals or the many blog posts out there that may or may not reflect the environment I'm working with.

Routinely running this kind of test suite makes it possible and easy to verify the database behaviour:

  • instantly
  • after upgrades (5.1 -> 5.5? -> 6?) or storage engine changes
  • after mysql configuration changes. For example, I discovered in this way that adding default-charset=utf8 in my MySQL config breaks everything.

I consider this my live documentation on how MySQL works. I would really appreciate if you have any feedback on this. Have fun!

Find uses of perl 5.10 features in your code: a bit of PPI magic

This morning on IRC we were talking about old perl installations, and how forgetting the use 5.010 statement while using 5.10 features, for example the // operator, can be a problem.

I suggested that maybe using a git hook would be an idea, so I assembled this proof-of-concept script to test for 5.10-isms used without a use 5.010; statement. It's too long to include here, so I put it on Github (https://gist.github.com/1875528).

The script uses the "impossible" PPI module to parse the Perl code: it extracts the operators in use through PPI::Token::Operator, to catch // or ~~ and the like, and the use statements through PPI::Statement::Include.
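
Since the gist is too long to include, here's a much smaller, hedged sketch of the same idea (the real script does more, and a proper check would compare the declared version against 5.010):

#!/usr/bin/perl
# Warn when a file uses 5.10-only operators without a "use <version>" statement
use strict;
use warnings;

use PPI ();

my $file = shift or die "Usage: $0 <file.pl>\n";
my $doc  = PPI::Document->new($file) or die "Can't parse $file\n";

# Any "use 5.010;" / "use v5.10;" style statement has a version()
my $declares_min_perl = grep { $_->version }
    @{ $doc->find('PPI::Statement::Include') || [] };

# The 5.10-only operators we care about
my @ops = grep { $_->content eq '//' or $_->content eq '~~' }
    @{ $doc->find('PPI::Token::Operator') || [] };

if (@ops and not $declares_min_perl) {
    printf "%s: found %s but no 'use 5.010' (or similar) statement\n",
        $file, join(', ', map { $_->content } @ops);
    exit 1;
}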

It's meant to be run as part of a post-commit hook or similar. I had searched for similar modules in the Perl::Critic space, but found nothing of the sort. Maybe I just didn't look hard enough?

Anyway, https://gist.github.com/1875528. Enjoy.

Perl client for Etsy statsd, improved and released v0.02 on CPAN

Sometimes bugs reported on the CPAN issue tracker are the perfect excuse to improve your code. In this case, my client module for Etsy's statsd service, Net::Statsd, got an update because of this ticket, RT#74172.

As with all my recent CPAN modules, when a new bug is filed against one, I try to create a specific test case for it. Sometimes that's quite hard to do, but this time it wasn't, even though I had to refactor the existing code. This allowed me to improve the testability of the code in the process, so thanks to the reporter of that ticket :)

I still haven't managed to test my own code with the statsd service, and hook it up to Graphite. Soon :-)
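
For reference, basic client usage looks roughly like this (paraphrased from the module's synopsis; treat the host/port defaults as assumptions):

use Net::Statsd;

# Where your statsd daemon is listening
$Net::Statsd::HOST = 'localhost';
$Net::Statsd::PORT = 8125;

# Fire-and-forget UDP: these calls are cheap and won't block your application
Net::Statsd::increment('site.logins');
Net::Statsd::timing('site.login_time', 320);    # milliseconds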

As usual, code is up on Github:

https://github.com/cosimo/perl5-net-statsd

And on CPAN too:

https://metacpan.org/module/Net::Statsd

Have fun!

Calling all Mojolicious users: patches welcome?

So you're using Mojolicious. Good. We started using it too, and it's great. We have also started accumulating some patches that wouldn't get integrated into the mainline.

We're starting to reach a critical mass, and I have been considering the idea of starting our own Mojolicious "branch". I'd like to know how many of you are in the same situation, so here's a call to action:

if you need or have needed patches to Mojolicious that for whatever reason were not integrated into the official repository, please contact me, leave a comment here or send me an email. I'd like to hear from you!

Fixed temporary files handling in HTTP::DAV

It's already been more than 3 years since I took over maintainership of HTTP::DAV. I've been fixing several bugs, the last one today (0.45 is just out on CPAN), and I have to say it has been a fantastic exercise, one that I really recommend to anyone even moderately interested in open source development and in improving their own programming skills.

Here's how it works:

  • Pick a CPAN distribution that has been put up for adoption, or one that your $work depends on (my case for HTTP::DAV)
  • Contact its author or current maintainer
  • Take a look at its RT queue (usually it's something like https://rt.cpan.org/Public/Dist/Display.html?Name=Some-Dist-Name)
  • Pick whatever bug you fancy from the list
  • Write a test case for it, naming it t/RT_[ticket_number].t (a minimal skeleton is sketched after this list)
  • Fix the bug in the code, and see your test case pass
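
As an illustration (not taken from any actual ticket), such a regression test is usually little more than a Test::More skeleton like this:

# t/RT_12345.t -- "12345" is a made-up ticket number, for illustration only
use strict;
use warnings;
use Test::More tests => 1;

use HTTP::DAV;

# Reproduce the smallest case described in the ticket, then assert
# the behaviour you expect once the bug is fixed
my $dav = HTTP::DAV->new;
ok(defined $dav, 'replace this with the actual regression check from the ticket');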

That's what I've been trying to do with HTTP::DAV, which back then was completely unknown code to me. I hope the results are decent. At least there haven't been any regressions reported so far… :-)

Enjoy, https://github.com/cosimo/perl5-http-dav and https://metacpan.org/module/HTTP::DAV.