Tag Archives: perl

The Perl echo chamber, marketing and … is Perl really dying?

Recently I came across this tweet from Curtis/Ovid, which references longer post about a proposal to integrate a better, more modern object-oriented “system” (Corinna) in Perl 5.

The proposal itself is not what I’d like to address here. I haven’t followed Corinna’s evolution. I believe it goes in a positive direction for the language, FWIW.

From that original tweet, a comment from Rafael followed:

[…] but I’m still wondering what are the real factors that make companies seek an exit strategy from Perl 5. Who makes this kind of expensive decision, and why? I suspect obscure OO syntax is not a major one.

This is what I replied with:

This is indicative of the fundamental problem in the Perl echo chamber. Some people still have no idea why companies are moving away from Perl. If you want to hear the perspective from someone who has seen this happen in multiple companies, let me know :-)

Sorry for this premise, but I was afraid what follows would make no sense otherwise.

Why is Perl dying today?

First of all, I don’t think “<language> is dying” is a useful question to ask, nor it is indicative of anything particularly interesting. I’m sure everyone reading this will have encountered plenty of “C is dying”, “Java is dying” or similar, and yet, C and Java are still being used everywhere. In one sense, no language really dies ever. In Perl’s situation, things are slightly different though, as (I believe) Python slowly conquered Perl’s space over time.

What does it mean for a language to die, or to be dead?

From an end user point of view, let’s say a random programmer employed in a company or freelance, a language could be dying if a task they want to accomplish using that language is hard because there are no supporting libraries for it (think CPAN or PyPi), or the libraries are so old they don’t work anymore. That situation surely conveys the idea that the language is not in use anymore, or very few people must be using that language. One would expect that a common task in 2021 must be easy to accomplish with a language worth using in 2021.

What about a company‘s point of view? The reality is that companies don’t have an opinion on languages, only people do. Teams do have an opinion on languages. The group dynamics inside a team influence what languages are acceptable for current and new projects.

Is Perl dying then?

My experience

Some years ago I was a fairly active member of the Perl community, I attended and presented at various Perl conferences around Europe, talking about my experience using Perl at a few small and large companies.

I remember picking up Perl for the first time based on a suggestion from my manager back then. He gave me a hard copy print-out of the whole of Perl 5.004 man pages, and said: “We are going to use this language. It’s amazing, take some time to study it and we’ll start!”. This was 1998, and I had such a fantastic time :-). I was such a noob, but Perl was amazing. It could do everything you needed and then some, and it was easy and simple. The language was fast already back then, and it got faster over time. At that point in time, I was working in a very small company, we were three people initially, and we ended up writing a complete web framework from scratch that is still in use today, after more than 20 years. If that’s not phenomenal, I don’t know what is. It’d be cool to talk about this framework: it was more advanced than anything that’s ever been done even considering it’s 2021… a story for another time.

And by the way, we were running our Perl code on *anything*, and I mean anything, Windows PCs, Linux, Netware and even AS/400, a limited subset of it at least, at a time when Java’s “write once, run everywhere” was just an empty marketing promise. Remember this was the time of Netscape Navigator and Java applets. Ramblings, I know, but perhaps useful to understand where things have gone wrong.

In 2007, I left my job in Italy and moved to Norway to work for Opera Software. Back then, Opera’s browser was still running the Presto engine, and a little department inside Opera was in charge of web services. That’s where I was headed. Most services there were written in Perl. Glorious times for me, I would learn an awful lot there, meet a lot of skilled developers. Soon after I started working there, 2007, some colleagues were already making fun of Perl. It’s a “write-only language”, “not meant for serious stuff”, “lack of web frameworks”, etc… Those were the times when Python frameworks started to emerge, some of which would eventually disappear. I remember a few colleagues strongly arguing to move to this Python framework called Pylons, and then eventually to Django.

I believe this general attitude towards Perl originated from different factors:

  • personal preference towards other languages and/or dislike towards Perl
  • the desire to be working with the latest “hip” framework or language
  • the discomfort of maintaining an aging codebase with problems

These factors exist and are legitimate reasons to want to move away from any language or framework. I’m not saying they are justified, but I do understand why people wanted that. In our field, I have seen it’s quite common to try and avoid the objective difficulties of maintaining a legacy project, going the greener way of an overly optimistic rewrite, which normally ends in tears.

Throughout the years, I noticed other contributing factors to the progressive abandonment of Perl, even in companies like Opera.
I’ll mention two that I experienced directly:

  1. Outdated or non existent supporting libraries
  2. Teams composition

There was a time a few years ago, when CPAN was awesome, the best language support system in existence and every other language community was envying it. CPAN pretty much was selling Perl by itself. In my case, the libraries on CPAN educated me and made me adopt a testing culture that no other language (in my knowledge) had before Perl. Today, seeing npm modules being installed without running tests makes me uncomfortable :-)

Then over time (years) a shift happened. You would search on CPAN for a library that would help you with a common task and you wouldn’t find anything, or you would only find quick hacks that didn’t really work properly. In my case, I remember the first example of that being OAuth2. If I had to speculate, I would say this is a product of many elements, one of which is the average age of Perl programmers getting higher.

Another related shift I remember from those years is companies publishing their APIs/SDKs started dismissing Perl, at first relying on some CPAN module to eventually appear, then completely omitting Perl support. In the beginning, we politely complained to those companies, trying to make a point, but unfortunately there was no turning back. These days almost no SDK comes with a Perl component.

The second major aspect I have experienced is related to teams. In 2012 I was tasked with writing my first ever greenfield project, entirely from scratch, a project that would turn out to be one of the things I’m most proud of, Opera Discover, an online news recommendation system for the Opera browser, still working today! A team of three veteran engineers (myself included) was assembled, and there and then, we were faced with a decision: what language should we use for this?

While I was most experienced in Perl and knew Python a little, the other two colleagues didn’t know Perl. They had experience in C++ mostly, as this was Opera after all. We were chosen not based on our programming language expertise, rather (I suppose) based on our capability to tackle such a big and complex project. While I could have proposed that the project be written in Perl, in good conscience I knew that choice was not viable. Django was readily available and could provide a wide range of functionality we actually needed. No alternative in the Perl world could come close to such a good value proposition. The fact that Python was (like Perl had been for me!) a very accessible choice, simple to pick up, easily installed on any Linux system, and with plenty of solid up-to-date libraries, made the choice obvious.

With the Discover project, I started learning Python properly as a day-to-day programming language. I remember being horrified (and making fun of) the httplib2/httplib3 situation initially. Then I learned about the requests module and forgot all about it. This is to say, Python also has its quirks of course. The disastrous Python 2 vs Python 3 decision in the Python community caused a lot of grief and uncertainty for people (Perl could have learned something from that…). Nowadays, that’s a non-argument, everything runs on Python 3 and if you still haven’t moved, you will soon.

In general, having learned Python quite well, my mindset with regards to programming and my job changed completely. I’m not a Perl programmer. I’m not a Python programmer either. I can use different tools whenever they are more suited to what I need to do. In fact, in my last four years I have written software in NodeJS and Java of all things… I used to despise and make fun of Java, but I had never worked on any professional project before. While I do maintain that Java has some horrible aspects, contrary to my expectations, I have enjoyed working with it, it has an efficient runtime, awesome threading, solid libraries and debugging/inspection tools.

While I do understand Ovid’s point about wanting to keep the business going, and enjoying Perl as a language, I have personally moved on many years ago. I still use Perl for the occasional script when it’s convenient, but for other use cases, like web APIs, I prefer Python and FastAPI, PyTorch for machine learning, etc.. so my conclusion is that it’s the libraries and the ecosystem that drive language use, and not the language itself.

A better OO system will unfortunately do nothing for Perl (in my opinion at least). Better marketing will without a doubt do nothing for Perl. As if a prettier website could change the situation and the aspects I talked about… it can’t! The situation we have in front of us in 2021 is the result of technological and social changes started at least a decade ago.

I realize this may be an incoherent post. Sorry about that, I tried to write it right away or it would have probably never come out.
If you have questions or comments, let me know and I’ll try to address them if I can.

Most importantly, I do not wish to convince anyone that what I wrote is true. It is simply my experience. If there’s one thing I wish people would take from it, it’s to move away from the thought of yourself being a “X Programmer” and broaden your horizons and set of tools available to you. It was a tremendously positive move for myself, one I wished I had done before.

Peace.

pgtop – a top clone for PostgreSQL

According to meta::cpan records, the first release of pgtop is dated April 26, 2005, which makes this little software more than 15 years old!

Back then I had just found out about the brilliant mytop by Jeremy Zawodny, and my day-to-day experience being on Postgres, IIRC version 6.5.3, I decided to try and “convert” mytop to Postgres.

Being quite naive, I thought the endeavour would be much easier than it really was. I’m glad I started though, which is why pgtop exists in the first place. It’s not the only one either. I seem to remember a few similar pgtop projects by other programmers.

After using MySQL and Percona Server for many years, due to a new job, I have gone back to Postgres, version 9.5 and 10 at this time. In recent months, I have done some work to improve performance of our database queries, and remembered writing and using pgtop years before.

Since I lost(*) the original sources, I tried the pgtop version I last uploaded to CPAN, 0.05, dated 2008. It did work, in the sense that I could run the same perl code unmodified, a great testament to Perl as language and as runtime. It didn’t work because the underlying Postgres meta tables that were used in version 6 changed their schema in the 10-12 years since :-)

I spent some time to adapt the metadata queries to work with recent Postgres versions, and was slightly amused by the quality of my 15 year old code… The best feeling about this little tool was to rediscover how useful a few dozen lines of code can be. The service provider monitoring helps, but doesn’t even come close to the level of detail pgtop can provide.

After getting pgtop to work again, I quickly added a few more useful features. I was pleased by the efficiency with which I could work on this tool, considering its age.

So far I added just what was strictly necessary to me:

  • Updated pgtop to the current decade. Now requires perl >= 5.014
  • Fixed to work with Postgres >= 9.0
  • Added a sample Dockerfile to build and run pgtop as Docker container
  • Added a --config option, to load arbitrary config files. This is useful if you want to monitor several databases at once, for example in a tmux session. The config file supports all the options that are available on the command line.
  • Implemented a query killer command, activated pressing K to kill at once all queries slower than a given threshold, in seconds. This is useful if the database is overwhelmed by a lot of slow queries. I don’t recommend using it, particularly if it involves killing UPDATE or INSERT queries, but it can be quite useful.
  • Added a --slow_threshold option, to consider queries slow if they have been running for longer than the given value (in seconds). Now the tool highlights slow queries in bold yellow, and logs all the slow queries to a pgtop.log file.
  • Added a --slack_webhook option, to automatically notify a slack channel if a query crosses the slow threshold runtime value. All the information about the slow query including the SQL will be included in the slack message.

Please let me know if you give it a try! :-)

Net::Statsd::Server, a Perl port of Flickr/Etsy’s statsd

If you’re looking for a Perl client to connect to a statsd daemon, checkout Net::Statsd on CPAN, now at version 0.08.

This post is about the server component of statsd.

Tracking metrics: up to now

The idea of statsd started in Flickr by Cal Henderson, and some code is still available, but it’s not very functional or complete.

Since reading about statsd, I found the concept brilliant. I have been using a similar technique long before hearing about statsd though. I learned it from colleagues here at Opera in 2008. They were using it to track application metrics for the Opera Link server. I thought it was great, so I also implemented it, extending it by making it very easy to add metrics and to see the output automatically in Munin. Here’s how it worked basically:

# ...
use Opera::Stats;
# ...
Opera::Stats::count("site.logins");
# ...

The project code would have typically tens or hundreds of these calls. Each call would store/increment a counter in a local or remote memcached. Then a complementary Opera::Stats::Munin module would automatically generate the output needed to implement a full Munin plugin given the metrics to be exposed.

So far, so good. Except there were a few things that didn’t work quite right:

  • Using TCP connections, maybe even to remote machines, even though it was never a problem, could be in case the memcached machines went down
  • Volume was a concern. I had to worry about tracking too many metrics. How would that affect functioning of memcached for regularly stored keys and values? Would those metrics-related keys cause evictions in the regular memcached content?
  • Even though the munin integration made it very easy to have charts, there were still some limitations: creating new charts requires some wrapper plugin with 1 or 2 lines of Perl code. Flexibility was also an issue.

Enter statsd

I have been thinking of replacing this system with statsd for a while. However, I wanted to have a more in-depth look at it before deploying it.

Turns out that statsd is a simple project, which I like, but requires nodejs. Knowing next to nothing about nodejs, I took some time to learn a few things.

I also realized I have been wanting to learn about AnyEvent for a long time.

Net::Statsd::Server

Two weeks ago, I spent a busy weekend reimplementing 95% of statsd in Perl. On Sunday night, I had a functional version of statsd written in Perl with AnyEvent.

AnyEvent stuff is surprising at times. I found especially interesting to debug the cases where your timer (AE::timer) doesn’t fire unless you actually save it to a scalar, as in:

# This won't fire!
AE::timer 10, 10, \&do_something;

# This will though.
# This behaviour is triggered by "defined wantarray"
my $t = AE::timer 10, 10, \&do_something;

Since that weekend, I have spent a few more nights tweaking Net::Statsd::Server. Yesterday I wrote a new piece of functionality (a new “File” backend) that is actually not in the original statsd.

It looks like I might need new backends as well, so I think it’s “an investment with a good ROI”, even though I did it mainly for fun and in my free time.

Performance

I wanted to make sure my statsd server implementation would be fast. I started by bringing up the nodejs statsd and firing my official benchmark script with 1 million iterations, and then comparing the results with my own statsd server.

That didn’t work out very well. Or rather, it worked out brilliantly, showing around 40K requests/s being handled by nodejs-statsd and 50K requests/s by Net::Statsd::Server. Problem is: how do you measure the performance of a UDP server? Or, for that matter, of a UDP client?

I figured out that, being UDP connection-less fire-and-forget, it doesn’t really matter how many packets/s the client fires, as long as you can generate more than your server can handle. Just as a data point, I reached around 73-75k statsd API calls per second (for the gauge API, around 55-58k for counters and timers). What really matters is how many packets reach the server.

BTW, I used another amazing piece of software called Devel::NYTProf to optimize the performance of the incoming packets code path as much as I could.

The test setup

To measure how many packets are received on the server-side, I prepared a test configuration:

{ graphitePort: 2003
, graphiteHost: "graphite.localdomain"
, host: "0.0.0.0"
, port: 8125
, backends: [ "./backends/graphite", "./backends/console" ]
, mgmt_address: "0.0.0.0"
, mgmt_port: 8126
}

The same configuration file for the Perl server becomes:

{ "graphitePort": 2003,
  "graphiteHost": "graphite.localdomain",
  "host" : "0.0.0.0",
  "port": 8125,
  "mgmt_address" : "0.0.0.0",
  "mgmt_port": 8126,
  "backends": [ "Graphite", "Console" ],
  "log" : {
    "backend" : "stdout",
    "level" : "LOG_WARN",
  }
}

Using the benchmark.pl code mentioned above, run with:

$ perl benchmark.pl 1000000

I started up first the nodejs statsd, then the Net::Statsd::Server daemon and captured their output. Both servers are configured to use their Graphite backend and flush to a valid and active graphite host. The Console backend is also active for both servers, so I could capture the output and look at the statsd.packets_received counter and directly measure how many packets are received in the server.

The benchmark utility with first argument = 1000000 generates 5 million statsd API calls, that is, 5 million UDP packets.

Of these 5 million packets, nodejs statsd was able to capture 2106768, 1596275, 1479145 and 1490640 packets over several runs.

Net::Statsd::Server, again in 3 different runs, was able to capture 2106242, 1884810, 1822042 and 1866500 packets.

I have performed more tests, and they had a very low deviation from the last runs (1.5M for etsy’s statsd and 1.8M for Net::Statsd::Server). Removing the 2 peak results of ~2.1Mb, it would seem that the Perl statsd is capable of receiving 22% more packets than the original statsd daemon written in javascript.

Of course, this is just my test. I have tried to run the test on different hardware, but I haven’t got significantly different results. If you try yourself, please let me know what numbers you get. I’d be curious to know :-)

SO_RCVBUF

Given the massive amount of UDP packets that were lost in the tests (50%+ in the best runs), I tried to figure out a way to improve this and I stumbled on SO_RCVBUF.

My understanding was that bumping up SO_RCVBUF on the listening UDP socket would dramatically decrease packet loss. However, I hadn’t been able to prove the theory because I hadn’t seen an improvement in the total number of packets received. At least until I read this article on UDP packet loss on stackoverflow.com, that pointed me to the net.core.rmem_max sysctl.

After modifying net.core.rmem_max, setting it to 100M, just to avoid its effect, and using the following code in Net::Statsd::Server:

# Bump up SO_RCVBUF on UDP socket, to buffer up incoming
# UDP packets, to avoid massive packet loss when load is very high.
setsockopt($self->{server}->fh, SOL_SOCKET, SO_RCVBUF, 1*1024*1024)
or die "Couldn't set SO_RCVBUF: $!";

I can see some very interesting effect.

Re-running the node.js statsd, I could see an increased amount of captured packets (1691700, 1675902, ~10% increase).
Running again the Net::Statsd::Server daemon, I recorded 2678507 and 2477246 packets, for an impressive ~40% increase!

As a last effort, I tried varying the SO_RCVBUF size from 1 to 64Mb to see what effect it had on the amount of captured packets (or UDP packet loss if you prefer).

I haven’t run any scientific set of tests, but I can’t see any statistically significant increase for values greater than 4-8Mb, so I haven’t decided where to set the default in Net::Statsd::Server yet. Any chosen value is likely to need specific sysctl tuning anyway, so YMMV.

Why?

Did I really do it for fun? Yes, mainly, but also because:

  • I don’t like adding node.js to our production stack just to run statsd. I have never operated a node.js server, so I don’t want to take this “risk”. The product we’re building is going live soon! :-) And note that this does apply to anything, it’s not about node.js per se :-)
  • to learn how statsd was put together
  • to learn AnyEvent
  • to learn how to build a high performance UDP server
  • Basically, to learn :-)

Code is up on CPAN, as usual: https://metacpan.org/module/Net::Statsd::Server.

If you happen to use it, please give me some feedback!

Using Perl and Google Chromium’s CLD to identify the language of a text

For a new project I'm working on, given a body of text, I need to identify which language it's written in (English, Russian, Chinese, etc…).

I'm not exactly the first person on Earth to do this, so it turns out there's Google's CLD library. Surprisingly, several people around here didn't know it. The library is open source and very good too, so I immediately looked for Perl bindings for it.

There is a great Perl module on CPAN called Lingua::Identify::CLD. This module bundles a copy of the CLD library, and fully automates build and link steps too. So I gave it a shot.

How to use Lingua::Identify::CLD

It's amazingly easy to use. Here's a sample of the code:


#!/usr/bin/perl

use strict;
use Lingua::Identify::CLD ();

my $text;
while (readline) { $text .= $_ }
chomp $text;

# In my case, the content is HTML
my $cld = Lingua::Identify::CLD->new(isPlainText => 0);

# Example: (ENGLISH, en, 64)
my @lang = $cld->identify($text);
say "Language: $lang[0]";

Failing tests

I decided to start using this module into my project. The build phase went fine (perl ./Build), while the tests were failing (./Build test). Here's the log of a failed test run:


$ ./Build test
cc -I/usr/lib/perl/5.14/CORE -fPIC -c -D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g -o /tmp/gAc_glZta2/library.o /tmp/gAc_glZta2/library.c
cc -I/usr/lib/perl/5.14/CORE -fPIC -c -D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g -o /tmp/gAc_glZta2/test.o /tmp/gAc_glZta2/test.c
cc -shared -L/usr/local/lib -fstack-protector -o /tmp/gAc_glZta2/libfoo.so /tmp/gAc_glZta2/library.o
cc -fstack-protector -L/usr/local/lib -o /tmp/gAc_glZta2/foo /tmp/gAc_glZta2/test.o -L/tmp/gAc_glZta2 -lfoo

** Preparing XS code
t/00-load.t ....... 1/1 Bailout called.  Further testing stopped:  

#   Failed test 'use Lingua::Identify::CLD;'
#   at t/00-load.t line 6.
#     Tried to use 'Lingua::Identify::CLD'.
#     Error:  Not a CODE reference at /usr/lib/perl/5.14/DynaLoader.pm line 207.
# END failed--call queue aborted at .../Lingua-Identify-CLD-0.05/blib/lib/Lingua/Identify/CLD.pm line 207.
# BEGIN failed--compilation aborted at .../Lingua-Identify-CLD-0.05/blib/lib/Lingua/Identify/CLD.pm line 24.
# Compilation failed in require at (eval 4) line 2.
# BEGIN failed--compilation aborted at (eval 4) line 2.
Use of uninitialized value $Lingua::Identify::CLD::VERSION in concatenation (.) or string at t/00-load.t line 9.
# Testing Lingua::Identify::CLD , Perl 5.014002, /usr/bin/perl
# Looks like you failed 1 test of 1.
FAILED--Further testing stopped.

Just the day before I had successfully compiled and run the tests for the same version of the module, but on Ubuntu 11.10, which I was using. Then I decided to upgrade to 12.10, and that's where I got this failed test run.

Contacting the author

Then I decided to contact the author of the module. Being Alberto quite a known author, with lots of CPAN contributions, I hoped he would answer my query within 2-3 days. That would give me some time to do other stuff, and hopefully would give him time to analyze the failure.

As usual with the best CPAN authors ;-) he answered in a couple of hours, which was fantastic for me. He had already identified a few failures like mine thanks to another awesome resource we have in the Perl community, the CPAN Testers service.

CPAN Testers

CPAN testers is a group of users that regularly (or not) report back the build/test status of everything that's released to CPAN in a multitude of platforms and versions of Perl. I think this is one of the most underestimated awesome features we have in the Perl community. The CPAN testers status of Lingua::Identify::CLD shows one report that looks exactly the same as the failure I experienced. This is on Ubuntu 12.10 with the stock perl 5.14.2.

The ugly patch

I tried to analyze the problem, apparently located in DynaLoader, and came up with a shotgun-debugging-driven patch that I copy/paste here for reference:

@@ -18,10 +18,23 @@ Version 0.05

 our $VERSION = '0.05';
 
-use XSLoader;
-BEGIN {
+eval {
+
+    require XSLoader;
     XSLoader::load('Lingua::Identify::CLD', $VERSION);
-}
+
+} or do {
+
+    # This warning triggers on Ubuntu 12.10 with the
+    # stock perl 5.14.2. Strangely enough, this doesn't
+    # seem to affect the tests at all.
+    #
+    # Not a CODE reference at /usr/lib/perl/5.14/DynaLoader.pm line 207.
+    # END failed--call queue aborted at .../blib/lib/Lingua/Identify/CLD.pm line 207.
+    # ) at .../blib/lib/Lingua/Identify/CLD.pm line 28."
+    #
+    #warn "Something's wrong with XSLoader? ($@)";
+};
 
 =head1 SYNOPSIS

It's shotgun debugging because I don't really know what's going on, I just came up with this patch because of the assumptions and information I gathered during the years on how DynaLoader/XSLoader and BEGIN {} blocks work or interact with the rest of the code :-)

Anyway, it makes the tests pass again, even with a weird warning. I agree with Alberto that it's not wise to incorporate this patch into Lingua::Identify::CLD, until we have understood why the original code fails, and why just for 2 people in the world.

All this blah-blah, to say: please do help! If you have seen the same problem, help us figure out what it is. My repository with the forked/patched code is on Github:

https://github.com/cosimo/Lingua-Identify-CLD

Have fun!

Displaying realtime memcached traffic on a backend

Sometimes I like to write down posts like this, to remind myself how to do something, sort of a mental note.
Suppose you have a few application servers that use 1+ memcached servers, and you want some way to display the outbound traffic, providing some insights on what are the most used keys, counters, etc…

Here's a quick way to do that, assuming you're using the memcached text protocol:

tcpflow -ce dst port 11211 
    | cut -b53- 
    | grep ^get 
    | pipestat --clear --runtime 60 --field 2 --time 1 --limit 40

What this does is:

  • Use tcpflow to capture all outbound traffic to destination port 11211, default memcached port.
  • Remove the first 53 bytes from each line, to filter out source and destination ip/ports
  • Only display get requests (alternatively, use set, incr, …)
  • Feed the resulting data to pipestat, a simple but great Perl tool that aggregates the data, displaying the most frequent ones. The specific options I used are good if you want to display quick statistics like other tools as top, mytop, or varnishstat.

It goes without saying that these tools are automatically installed on all servers that our Devops team here at Opera manages. I couldn't work without them :)

How to find unused CSS selectors, a quick solution

Was talking to a colleague today, and he mentioned the problem he was working on: trying to find site-wide unused CSS selectors. That is, having a static CSS file on disk, try to go through all the selectors in there and see if there's some matching elements in an entire site, crawling it page by page.

I thought it was a really interesting problem, so I gave it a quick shot by glueing together CSS::Tiny, Mojo::UserAgent and Mojo::DOM::CSS.

This is what came out of it. I'd say a decent first quick solution:

So I also learned about this deadweight project, that apparently also can crawl a site by logging in, kind of WWW::Mechanize style. Would be interesting to improve this initial solution :-)

Dist::Zilla, Y U suddenly no work anymore? [FIXED!]

I'm trying to understand why Dist::Zilla doesn't work anymore on my laptop. Here's the epic wall of warnings I get when running dzil test:


$ dzil test
Could not create the 'reader' method for zilla because : The method '_inline_store' was not found in the inheritance hierarchy for Moose::Meta::Class::__ANON__::SERIAL::9 at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 1053
	Class::MOP::Class::__ANON_Moose::Meta::Class=HASH(0x3556088) called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 1098
	Class::MOP::Class::add_around_method_modifier('Moose::Meta::Class=HASH(0x3556088)', '_inline_store', 'CODE(0x351cea8)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Role/Application/ToClass.pm line 231
	Moose::Meta::Role::Application::ToClass::apply_method_modifiers('Moose::Meta::Role::Application::ToClass=HASH(0x3556b40)', 'around', 'Moose::Meta::Role=HASH(0x351dc28)', 'Moose::Meta::Class=HASH(0x3556088)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Role/Application.pm line 78
	Moose::Meta::Role::Application::apply_around_method_modifiers('Moose::Meta::Role::Application::ToClass=HASH(0x3556b40)', 'Moose::Meta::Role=HASH(0x351dc28)', 'Moose::Meta::Class=HASH(0x3556088)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Role/Application.pm line 64
	Moose::Meta::Role::Application::apply('Moose::Meta::Role::Application::ToClass=HASH(0x3556b40)', 'Moose::Meta::Role=HASH(0x351dc28)', 'Moose::Meta::Class=HASH(0x3556088)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Role/Application/ToClass.pm line 36
	Moose::Meta::Role::Application::ToClass::apply('Moose::Meta::Role::Application::ToClass=HASH(0x3556b40)', 'Moose::Meta::Role=HASH(0x351dc28)', 'Moose::Meta::Class=HASH(0x3556088)', 'HASH(0x354ce50)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Role.pm line 470
	Moose::Meta::Role::apply('Moose::Meta::Role=HASH(0x351dc28)', 'Moose::Meta::Class=HASH(0x3556088)') called at /usr/local/lib/perl/5.10.1/Moose/Util.pm line 160
	Moose::Util::_apply_all_roles('Moose::Meta::Class=HASH(0x3556088)', undef, 'MooseX::SetOnce::Accessor') called at /usr/local/lib/perl/5.10.1/Moose/Util.pm line 99
	Moose::Util::apply_all_roles('Moose::Meta::Class=HASH(0x3556088)', 'MooseX::SetOnce::Accessor') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Class.pm line 104
	Moose::Meta::Class::create('Moose::Meta::Class', 'Moose::Meta::Class::__ANON__::SERIAL::9', 'roles', 'ARRAY(0x33e50d8)', 'weaken', '', 'superclasses', 'ARRAY(0x353a7e8)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Package.pm line 120
	Class::MOP::Package::create_anon('Moose::Meta::Class', 'superclasses', 'ARRAY(0x353a7e8)', 'roles', 'ARRAY(0x33e50d8)', 'cache', 1) called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 474
	Class::MOP::Class::create_anon_class('Moose::Meta::Class', 'superclasses', 'ARRAY(0x353a7e8)', 'roles', 'ARRAY(0x33e50d8)', 'cache', 1) called at /usr/share/perl5/MooseX/SetOnce.pm line 27
	Class::MOP::Class:::around('CODE(0x1c87bf0)', 'Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Method/Wrapped.pm line 162
	Class::MOP::Method::Wrapped::__ANON_Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50) called at /usr/local/lib/perl/5.10.1/Class/MOP/Method/Wrapped.pm line 91
	Moose::Meta::Class::__ANON__::SERIAL::8::accessor_metaclass('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 389
	Class::MOP::Attribute::__ANON__() called at /usr/share/perl5/Try/Tiny.pm line 76
	eval {...} called at /usr/share/perl5/Try/Tiny.pm line 67
	Try::Tiny::try('CODE(0x3543bb8)', 'Try::Tiny::Catch=REF(0x354c718)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 401
	Class::MOP::Attribute::_process_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)', 'reader', 'zilla', undef) called at /usr/local/lib/perl/5.10.1/Moose/Meta/Attribute.pm line 1074
	Moose::Meta::Attribute::_process_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)', 'reader', 'zilla', undef) called at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 428
	Class::MOP::Attribute::install_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Attribute.pm line 1013
	Moose::Meta::Attribute::install_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 891
	Class::MOP::Class::__ANON__() called at /usr/share/perl5/Try/Tiny.pm line 76
	eval {...} called at /usr/share/perl5/Try/Tiny.pm line 67
	Try::Tiny::try('CODE(0x354c5b0)', 'Try::Tiny::Catch=REF(0x3435780)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 896
	Class::MOP::Class::_post_add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Mixin/HasAttributes.pm line 44
	Class::MOP::Mixin::HasAttributes::add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Class.pm line 570
	Moose::Meta::Class::add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'zilla', 'is', 'ro', 'writer', 'set_zilla', 'lazy_required', 1, 'isa', ...) called at /usr/local/lib/perl/5.10.1/Moose.pm line 79
	Moose::has('Moose::Meta::Class=HASH(0x35122a0)', 'zilla', 'is', 'ro', 'isa', 'Moose::Meta::TypeConstraint::Class=HASH(0x3092830)', 'traits', 'ARRAY(0x350d590)', 'writer', ...) called at /usr/local/lib/perl/5.10.1/Moose/Exporter.pm line 382
	Moose::has('zilla', 'is', 'ro', 'isa', 'Moose::Meta::TypeConstraint::Class=HASH(0x3092830)', 'traits', 'ARRAY(0x350d590)', 'writer', 'set_zilla', ...) called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 22
	require Dist/Zilla/MVP/RootSection.pm called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/Assembler/Zilla.pm line 13
	Dist::Zilla::MVP::Assembler::Zilla::BEGIN() called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 0
	eval {...} called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 0
	require Dist/Zilla/MVP/Assembler/Zilla.pm called at /usr/local/share/perl/5.10.1/Dist/Zilla/Dist/Builder.pm line 204
	Dist::Zilla::Dist::Builder::_load_config('Dist::Zilla::Dist::Builder', 'HASH(0x342fe00)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/Dist/Builder.pm line 27
	Dist::Zilla::Dist::Builder::from_config('Dist::Zilla::Dist::Builder', 'HASH(0x33e2608)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App.pm line 112
	Dist::Zilla::App::__ANON__() called at /usr/share/perl5/Try/Tiny.pm line 76
	eval {...} called at /usr/share/perl5/Try/Tiny.pm line 67
	Try::Tiny::try('CODE(0x3084e60)', 'Try::Tiny::Catch=REF(0x33a8848)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App.pm line 120
	Dist::Zilla::App::zilla('Dist::Zilla::App=HASH(0x204eb48)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App/Command.pm line 13
	Dist::Zilla::App::Command::zilla('Dist::Zilla::App::Command::test=HASH(0x280b910)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App/Command/test.pm line 28
	Dist::Zilla::App::Command::test::execute('Dist::Zilla::App::Command::test=HASH(0x280b910)', 'Getopt::Long::Descriptive::Opts::__OPT__::2=HASH(0x291d7c0)', 'ARRAY(0x13bef10)') called at /usr/share/perl5/App/Cmd.pm line 220
	App::Cmd::execute_command('Dist::Zilla::App=HASH(0x204eb48)', 'Dist::Zilla::App::Command::test=HASH(0x280b910)', 'Getopt::Long::Descriptive::Opts::__OPT__::2=HASH(0x291d7c0)') called at /usr/share/perl5/App/Cmd.pm line 159
	App::Cmd::run('Dist::Zilla::App') called at /usr/bin/dzil line 11
 at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 400
	Class::MOP::Attribute::__ANON_The method '_inline_store' was not found in the inheritance... called at /usr/share/perl5/Try/Tiny.pm line 100
	Try::Tiny::try('CODE(0x3543bb8)', 'Try::Tiny::Catch=REF(0x354c718)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 401
	Class::MOP::Attribute::_process_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)', 'reader', 'zilla', undef) called at /usr/local/lib/perl/5.10.1/Moose/Meta/Attribute.pm line 1074
	Moose::Meta::Attribute::_process_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)', 'reader', 'zilla', undef) called at /usr/local/lib/perl/5.10.1/Class/MOP/Attribute.pm line 428
	Class::MOP::Attribute::install_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Attribute.pm line 1013
	Moose::Meta::Attribute::install_accessors('Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 891
	Class::MOP::Class::__ANON__() called at /usr/share/perl5/Try/Tiny.pm line 76
	eval {...} called at /usr/share/perl5/Try/Tiny.pm line 67
	Try::Tiny::try('CODE(0x354c5b0)', 'Try::Tiny::Catch=REF(0x3435780)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Class.pm line 896
	Class::MOP::Class::_post_add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Class/MOP/Mixin/HasAttributes.pm line 44
	Class::MOP::Mixin::HasAttributes::add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'Moose::Meta::Class::__ANON__::SERIAL::8=HASH(0x3556a50)') called at /usr/local/lib/perl/5.10.1/Moose/Meta/Class.pm line 570
	Moose::Meta::Class::add_attribute('Moose::Meta::Class=HASH(0x35122a0)', 'zilla', 'is', 'ro', 'writer', 'set_zilla', 'lazy_required', 1, 'isa', ...) called at /usr/local/lib/perl/5.10.1/Moose.pm line 79
	Moose::has('Moose::Meta::Class=HASH(0x35122a0)', 'zilla', 'is', 'ro', 'isa', 'Moose::Meta::TypeConstraint::Class=HASH(0x3092830)', 'traits', 'ARRAY(0x350d590)', 'writer', ...) called at /usr/local/lib/perl/5.10.1/Moose/Exporter.pm line 382
	Moose::has('zilla', 'is', 'ro', 'isa', 'Moose::Meta::TypeConstraint::Class=HASH(0x3092830)', 'traits', 'ARRAY(0x350d590)', 'writer', 'set_zilla', ...) called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 22
	require Dist/Zilla/MVP/RootSection.pm called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/Assembler/Zilla.pm line 13
	Dist::Zilla::MVP::Assembler::Zilla::BEGIN() called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 0
	eval {...} called at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/RootSection.pm line 0
	require Dist/Zilla/MVP/Assembler/Zilla.pm called at /usr/local/share/perl/5.10.1/Dist/Zilla/Dist/Builder.pm line 204
	Dist::Zilla::Dist::Builder::_load_config('Dist::Zilla::Dist::Builder', 'HASH(0x342fe00)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/Dist/Builder.pm line 27
	Dist::Zilla::Dist::Builder::from_config('Dist::Zilla::Dist::Builder', 'HASH(0x33e2608)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App.pm line 112
	Dist::Zilla::App::__ANON__() called at /usr/share/perl5/Try/Tiny.pm line 76
	eval {...} called at /usr/share/perl5/Try/Tiny.pm line 67
	Try::Tiny::try('CODE(0x3084e60)', 'Try::Tiny::Catch=REF(0x33a8848)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App.pm line 120
	Dist::Zilla::App::zilla('Dist::Zilla::App=HASH(0x204eb48)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App/Command.pm line 13
	Dist::Zilla::App::Command::zilla('Dist::Zilla::App::Command::test=HASH(0x280b910)') called at /usr/local/share/perl/5.10.1/Dist/Zilla/App/Command/test.pm line 28
	Dist::Zilla::App::Command::test::execute('Dist::Zilla::App::Command::test=HASH(0x280b910)', 'Getopt::Long::Descriptive::Opts::__OPT__::2=HASH(0x291d7c0)', 'ARRAY(0x13bef10)') called at /usr/share/perl5/App/Cmd.pm line 220
	App::Cmd::execute_command('Dist::Zilla::App=HASH(0x204eb48)', 'Dist::Zilla::App::Command::test=HASH(0x280b910)', 'Getopt::Long::Descriptive::Opts::__OPT__::2=HASH(0x291d7c0)') called at /usr/share/perl5/App/Cmd.pm line 159
	App::Cmd::run('Dist::Zilla::App') called at /usr/bin/dzil line 11
Compilation failed in require at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/Assembler/Zilla.pm line 13.
BEGIN failed--compilation aborted at /usr/local/share/perl/5.10.1/Dist/Zilla/MVP/Assembler/Zilla.pm line 13.
Compilation failed in require at /usr/local/share/perl/5.10.1/Dist/Zilla/Dist/Builder.pm line 204.

Due to chronic lack of time, I blindly tried to upgrade Moose, MooseX::Types, Dist::Zilla, Config::MVP, but no luck.

Before I start dealing with this madness… any idea?

EDIT: thanks to the comments, I found out about moose-outdated, a script that reports the Moose(X) modules that have newer versions up on CPAN. Running moose-outdated I got back the following list:

$ moose-outdated
MooseX::LazyRequire
MooseX::Role::Parameterized
MooseX::SetOnce

Then I just run:

$ cpanm MooseX::LazyRequire MooseX::Role::Parameterized MooseX::SetOnce

After doing this, dzil started working again. Thanks everyone for your comments and help!

Verifying MySQL behaviour with automated test suites and mytap

You know everything about how MySQL treats UTF8 and LATIN1 charsets and how the collation table impacts on selection and insertion of data, right?

Great, then stop reading :)

I don't and since I'm in the process of setting up a new version of the Opera accounts database, I really don't want to screw up things. I tried to fully understand how MySQL works in this respect (charsets, collations, etc…) but reading documentation and memorizing it wasn't very easy. Plus, there's a thousands blog posts on the matter, not always 100% accurate.

So I thought I'd better get hands on and I wrote a kind of database test suite.

Now this test suite is hooked up to the main project builds on Jenkins. Here's a sample output:


[...]
[workspace] $ /bin/sh -xe /tmp/hudson3255767718598715423.sh
+ ./bin/run-dbtest-suite
basedir=/var/lib/jenkins/jobs/auth-db/workspace
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/__initdb__.my ........................................... 
1..2
ok 1 - Using utf8tests database
ok 2 - Server charset is latin1
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_bin.my ................................... 
1..6
ok 1 - All our records are there. No duplicate key error.
ok 2 - utf8_bin collation does not collate a/â/à/A/...
ok 3 - utf8_bin collation does not collate a/â/à/A/...
ok 4 - utf8_bin collation does not collate a/â/à/A/...
ok 5 - Query for mixed-case username does not return lowercase username
ok 6 - Query for upper-case username does not return lowercase username
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_general_ci.my ............................ 
1..7
ok 1 - Collation for t007 is utf8_general_ci
ok 2 - utf8_general_ci collation normalizes accents, diacritics and the like
ok 3 - A and Å are collated to the same character in the utf8_general_ci table
ok 4 - å and Å are collated to the same character in the utf8_general_ci table
ok 5 - lower/upper case chars are collated in the utf8_general_ci table
ok 6 - lower/upper case chars are collated in the utf8_general_ci table
ok 7 - We are allowed to insert all records just because there is no unique constraint
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_unicode_ci.my ............................ 
1..7
ok 1 - Collation for t005 is utf8_unicode_ci
ok 2 - utf8_unicode_ci collation normalizes accents, diacritics and the like
ok 3 - A and Å are collated to the same character in the utf8_unicode_ci table
ok 4 - å and Å are collated to the same character in the utf8_unicode_ci table
ok 5 - lower/upper case chars are collated in the utf8_unicode_ci table
ok 6 - lower/upper case chars are collated in the utf8_unicode_ci table
ok 7 - We are allowed to insert all records just because there is no unique constraint
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/default-table-charset.my ................................ 
1..3
ok 1 - Default character set is utf8 when no charset is specified (from server)
ok 2 - Default character set is utf8 when "CHARSET utf8" specified in the CREATE TABLE
ok 3 - Default character set is utf8 when "CHARSET utf8" and "COLLATE" specified in the CREATE TABLE
ok
...
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/username-with-utf8-chars.my ............................. 
1..5
ok 1 - We have some UTF-8 encoded string in our hands (hex)
ok 2 - We have some UTF-8 encoded string in our hands (charset)
ok 3 - Can select back UTF-8 content from a CHARSET utf8 table
ok 4 - Given string is exactly 24 bytes long (length)
ok 5 - Given string is exactly 8 (wide) characters long (char_length)
ok
All tests successful.
Files=11, Tests=80, 0.739731 wallclock secs ( 0.05 usr  0.02 sys +  0.07 cusr  0.01 csys =  0.15 CPU)
Result: PASS
Recording test results
Finished: SUCCESS

And here's an example of "sanity check" test case, which doesn't do much:


   1 -- Check that we can insert and retrieve UTF-8 content correctly
   2 
   3 BEGIN;
   4 
   5 SET NAMES utf8;
   6 
   7 SELECT tap.plan(5);
   8 
   9 USE auth_utf8tests;
  10 
  11 SET @username = '今日话题今日话题';
  12 SET @encoded  = 'C1BB8AE697A5E8AF9DE9A298E4BB8AE697A5E8AF9DE9A298';
  13 
  14 SELECT tap.eq(
  15     HEX(@username),
  16     @encoded,
  17     'We have some UTF-8 encoded string in our hands (hex)'
  18 );
  19 
  20 SELECT tap.eq(
  21     CHARSET(@username),
  22     'utf8',
  23     'We have some UTF-8 encoded string in our hands (charset)'
  24 );
  25 
  26 INSERT INTO t001 (f1) VALUES (@username);
  27 
  28 SELECT tap.eq(
  29     (SELECT HEX(f1) FROM t001 WHERE f1 = @username),
  30     @encoded,
  31     'Can select back UTF-8 content from a CHARSET utf8 table'
  32 );
  33 
  34 SELECT tap.eq(
  35     (SELECT LENGTH(f1) FROM t001 WHERE f1 = @username),
  36     24,
  37     'Given string is exactly 24 bytes long (length)'
  38 );
  39 
  40 SELECT tap.eq(
  41     (SELECT CHAR_LENGTH(f1) FROM t001 WHERE f1 = @username),
  42     8,
  43     'Given string is exactly 8 (wide) characters long (char_length)'
  44 );
  45 
  46 -- Finish the tests and clean up.
  47 CALL tap.finish();
  48 ROLLBACK;

This SQL test code uses mytap. You can see how the SELECT tap.* calls are just the equivalents of the TAP testing framework of Perl. SELECT tap.eq() is the equivalent of Test::More::is(), and so on.

Another, more interesting test case, is the following:


   1 --
   2 -- Verify how the utf8_unicode_ci collation works
   3 --
   4 
   5 BEGIN;
   6 
   7 SET NAMES utf8;
   8 
   9 SELECT tap.plan(12);
  10 
  11 USE auth_utf8tests;
  12 

  [...]

  40 SELECT tap.eq(
  41     (SELECT TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA=SCHEMA() AND TABLE_NAME='t015'),
  42     'utf8_unicode_ci',
  43     'Collation for t015 is utf8_unicode_ci'
  44 );

  [...]

  48 
  49 SELECT tap.eq(
  50     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1a' ORDER BY id),
  51     '10',
  52     'utf8_unicode_ci collation normalizes accents, diacritics and the like'
  53 );
  54 
  55 SELECT tap.eq(
  56     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1Å' ORDER BY id),
  57     '10',
  58     'A and Å are collated to the same character in the utf8_unicode_ci table'
  59 );
  60 
  61 SELECT tap.eq(
  62     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1å' ORDER BY id),
  63     '10',
  64     'å and Å are collated to the same character in the utf8_unicode_ci table'
  65 );
  66 
  67 SELECT tap.eq(
  68     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'TestUser1A' ORDER BY id),
  69     '10',
  70     'lower/upper case chars are collated in the utf8_unicode_ci table'
  71 );
  72 
  73 SELECT tap.eq(
  74     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'TESTUSER1A' ORDER BY id),
  75     '10',
  76     'lower/upper case chars are collated in the utf8_unicode_ci table'
  77 );
  78 
  79 SELECT tap.eq(
  80     (SELECT COUNT(*) FROM t015),
  81     1,
  82     'We are allowed to insert only 1 record, because the others collate to the same string'
  83 );
  84 
  85 -- Finish the tests and clean up.
  86 CALL tap.finish();
  87 ROLLBACK;

An interesting thing that I didn't know how to do in the beginning is how to trap errors. I left out that part from the test code to simplify, but here it is:


  13 DELIMITER //
  14 
  15 DROP PROCEDURE IF EXISTS populate_table //
  16 
  17 CREATE PROCEDURE populate_table ()
  18 BEGIN
  19 
  20     DECLARE CONTINUE HANDLER FOR SQLSTATE '23000' BEGIN
  21         SELECT tap.ok(
  22             1,
  23             'We should get dupkey errors when inserting data with collation utf8_unicode_ci'
  24         );
  25     END;
  26 
  27     INSERT INTO t015 (id,username,note) VALUES (10, 'testuser1a', 'plain');
  28     INSERT INTO t015 (id,username,note) VALUES (20, 'testuser1â', 'circumflex a');
  29     INSERT INTO t015 (id,username,note) VALUES (30, 'testuser1à', 'a grave');
  30     INSERT INTO t015 (id,username,note) VALUES (40, 'testuser1Å', 'A circ');
  31     INSERT INTO t015 (id,username,note) VALUES (50, 'TestUser1A', 'mixed case');
  32     INSERT INTO t015 (id,username,note) VALUES (60, 'TESTUSER1A', 'upper case');
  33 
  34 END;
  35 
  36 //
  37 
  38 DELIMITER ;
  39 
  46 /* Should generate 5 dupkey errors (taken as successful tests) */
  47 CALL populate_table;

It's a bit convoluted. To trap errors you have use the DECLARE HANDLER statement. DECLARE CONTINUE HANDLER FOR SQLSTATE '23000' means that whenever SQLSTATE is '23000', and that corresponds to a duplicate key error, then execute this block of code. All of that must necessarily be wrapped into a stored procedure. Handlers outside of stored procedures are not allowed.

In this particular tests, the table uses the utf8_unicode_ci collation table, so we are expecting a duplicate key error on username whenever we insert the string 'testuser1à' or 'TESTUSER1A', because 'testuser1a' was already inserted at the beginning. Of all the INSERT statements, only the first one is bound to succeed, so I put a SELECT tap.ok(1) for the duplicate key HANDLER and I expect 5 tests when I make the CALL populate_table;.

This of course may seem trivial. And I guess it is, but for me it's a much better way of learning than scouring through the manuals or the many blog posts out there that may or may not reflect the environment I'm working with.

Routinely running this kind of test suite makes it possible and easy to verify the database behaviour:

  • instantly
  • after upgrades (5.1 -> 5.5? -> 6?) or storage engine changes
  • after mysql configuration changes. For example, I discovered in this way that adding default-charset=utf8 in my MySQL config breaks everything.

I consider this my live documentation on how MySQL works. I would really appreciate if you have any feedback on this. Have fun!

Find uses of perl 5.10 features in your code: a bit of PPI magic

This morning on IRC we were talking about old perl installations, and how forgetting the use 5.010 but using 5.10 features, for example the // operator, can be a problem.

I suggested maybe using a git hook would be an idea, so I assembled this proof of concept script to test for 5.10-isms but without a use 5.010; statement. It's too long to include here, so I put in on Github (https://gist.github.com/1875528).

The script uses the "impossible" PPI module to parse the Perl code and extract information about used operators using PPI::Token::Operator, to catch // or ~~ or similar, and use statements, with PPI::Statement::Include.

It's meant to be run as part of a post-commit hook or similar. I had searched for similar modules in the Perl::Critic space but found nothing of sorts. Maybe I just didn't look too well?

Anyway, https://gist.github.com/1875528. Enjoy.

Perl client for Etsy statsd, improved and released v0.02 on CPAN

Sometimes bugs reported on the CPAN issue tracker are the perfect excuse to improve your code. In this case, my client module for Etsy statsd service, Net::Statsd, got an update because of this ticket, RT#74172.

As with all my recent CPAN module, when a new bug is filed against it, I try to create a specific test case. Sometimes it's quite hard to do, but this time wasn't, even though I had to refactor the existing code. This allowed me to improve the testability of the code in the process, so thanks to the reporter of that ticket :)

I still haven't managed to test my own code with the statsd service, and hook it up to Graphite. Soon :-)

As usual, code is up on Github:

https://github.com/cosimo/perl5-net-statsd

And on CPAN too:

https://metacpan.org/module/Net::Statsd.
Have fun!