Tag Archives: perl

How to convert Opera contacts file to Mutt aliases format

Recently I've been looking more and more into mutt, the email client. I've been a very happy M2 (Opera's built-in email client) user for almost 3 years now, but I still felt I was missing something if I didn't try out mutt. I've been a pine user as well, many many years ago :) So I decided to give it a go, starting about a month ago.

I struggled a bit while getting a reasonable .muttrc file together. Fortunately, there's plenty of examples out there. After getting a working config, the problem was to get back my contacts list.

Mutt has simple address book integration (through abook) and stores contacts in an alias file, typically ~/.mutt/aliases. Opera, of course, can export all your mail contacts to an .adr file, a simple "addressbook" text file. I did that, and then needed to convert it to mutt's aliases format.
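For reference, here is roughly what the two formats look like. An Opera .adr file is a plain-text series of #CONTACT sections with indented KEY=value lines, while a mutt alias is one line per contact (the contact below is made up for illustration):

#CONTACT
	ID=042
	NAME=John Doe
	MAIL=john.doe@example.com

alias john.doe John Doe <john.doe@example.com>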

Ten minutes later, a Perl script to do just that was ready. Here it is:


#!/usr/bin/env perl
#
# Convert Opera contacts file (.adr) into
# mutt aliases file format.
#
# Usage:
#   perl opera-adr-to-mutt-aliases.pl < ~/.opera/contacts.adr >> ~/.mutt/aliases
#
# Cosimo, 31/Jan/2011
#

use strict;
use warnings;
use utf8;

sub harvest ($) {
    my ($contact_info) = @_;

    my ($id)    = $contact_info =~ m{^ \s+ ID   = (.*) $}mx;
    my ($name)  = $contact_info =~ m{^ \s+ NAME = (.*) $}mx;
    my ($email) = $contact_info =~ m{^ \s+ MAIL = (.*) $}mx;

    return if ! $id and ! $email;

    return {
        ID    => $id,
        NAME  => $name,
        MAIL  => $email,
    };

}

my $adr_file_contents = q{};
$adr_file_contents .= $_ while <STDIN>;

my @contacts = split m{#CONTACT}, $adr_file_contents;

for (@contacts) {
    my $contact = harvest($_) or next;
    my ($first_word) = $contact->{MAIL} =~ m{ (\S+) @ }x;
    printf "alias %s %s <%s>\n",
        lc($first_word), $contact->{NAME}, $contact->{MAIL};
}

Download link: https://gist.github.com/803454

Ubuntu 10.10, modperl and Apache segfaulting fixed

Last month, before moving to Melbourne, where I am now, to work in the Opera Australia office for a few months, I had to setup a laptop for all the development work I normally do. So I chose Ubuntu 10.10 amd64. I have to say I'm quite happy with it. Everything works out of the box for me, including a Quickcam 9000 USB camera I used to shoot this poor time-lapse video from my new office window. Woot!

Anyway, the development environment for one particular project consists of Apache and mod_perl. So I setup the usual list of dependencies, but when I tried to start Apache to run the test suite, it would always stop right away with a segmentation fault.

Didn't really dig into the problem. Just straced the apache process, and that's what I got:

[apache starts up, reads a bunch of Perl modules, and opens the access log...]
...
brk(0x7f342adbd000)                     = 0x7f342adbd000
...
...
brk(0x7f342adde000)                     = 0x7f342adde000
brk(0x7f342adff000)                     = 0x7f342adff000
brk(0x7f342ae20000)                     = 0x7f342ae20000
brk(0x7f342ae41000)                     = 0x7f342ae41000
stat("/usr/lib/perl5/auto/DBI/DESTROY.al", 0x7f341e8459b0) = -1 ENOENT (No  
such file or directory)
stat("/home/cosimo/src/auth-svn/lib/auto/DBI/DESTROY.al", 0x7fffc4e2d520)  
= -1 ENOENT (No such file or directory)
stat("/home/cosimo/src/myopera-trunk/lib/auto/DBI/DESTROY.al",  
0x7fffc4e2d520) = -1 ENOENT (No such file or directory)
stat("/etc/perl/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No such  
file or directory)
stat("/usr/local/lib/perl/5.10.1/auto/DBI/DESTROY.al", 0x7fffc4e2d520) =  
-1 ENOENT (No such file or directory)
stat("/usr/local/share/perl/5.10.1/auto/DBI/DESTROY.al", 0x7fffc4e2d520) =  
-1 ENOENT (No such file or directory)
stat("/usr/lib/perl5/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No  
such file or directory)
stat("/usr/share/perl5/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT  
(No such file or directory)
stat("/usr/lib/perl/5.10/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT  
(No such file or directory)
stat("/usr/share/perl/5.10/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1  
ENOENT (No such file or directory)
stat("/usr/local/lib/site_perl/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1  
ENOENT (No such file or directory)
stat("./auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No such file or  
directory)
stat("/var/tmp/test_cosimo_22931/auto/DBI/DESTROY.al", 0x7fffc4e2d520) =  
-1 ENOENT (No such file or directory)
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

I thought it would be wiser to ask for advice on the DBI and mod_perl mailing lists. Tim Bunce suggested trying to get a stack trace of Apache. Why didn't I think of that in the first place? A few days later, I had my stack trace.
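To get a core file to inspect in the first place, something along these lines is usually needed before starting Apache (a sketch; the exact steps depend on the system):

# Allow core dumps in the current shell before starting Apache
ulimit -c unlimited

# Optionally, in the Apache configuration, choose where cores are written:
# CoreDumpDirectory /var/tmp

Loading the resulting core file into gdb then produces the backtrace: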

# gdb -c ./core /usr/sbin/apache2 
... 
Reading symbols from ... 
... 
Core was generated by `/usr/sbin/apache2 -d /var/tmp/test_cosimo_9727 -k 
start -C User cosimo -C Group ...'. 

Program terminated with signal 11, Segmentation fault. 
#0 0x00007fdaedfed858 in XS_Class__XSAccessor_END () 
from /usr/lib/perl5/auto/Class/XSAccessor/XSAccessor.so 
(gdb) backtrace 
#0 0x00007fdaedfed858 in XS_Class__XSAccessor_END () 
from /usr/lib/perl5/auto/Class/XSAccessor/XSAccessor.so 
#1 0x00007fdaf83cf845 in Perl_pp_entersub () from /usr/lib/libperl.so.5.10 
#2 0x00007fdaf83752c6 in Perl_call_sv () from /usr/lib/libperl.so.5.10 
#3 0x00007fdaf86ad40b in modperl_perl_call_list () 
from /usr/lib/apache2/modules/mod_perl.so 
#4 0x00007fdaf86b5786 in modperl_perl_destruct () 
from /usr/lib/apache2/modules/mod_perl.so 
#5 0x00007fdaf86a6256 in modperl_interp_destroy () 
from /usr/lib/apache2/modules/mod_perl.so 
#6 0x00007fdaf86a6715 in modperl_tipool_destroy () 
from /usr/lib/apache2/modules/mod_perl.so 
#7 0x00007fdaf86a62b2 in modperl_interp_pool_destroy () 
from /usr/lib/apache2/modules/mod_perl.so 
#8 0x00007fdaf98fd4e3 in ?? () from /usr/lib/libapr-1.so.0 
#9 0x00007fdaf98fc3b1 in apr_pool_destroy () from /usr/lib/libapr-1.so.0 
#10 0x00007fdaf98fc27f in apr_pool_clear () from /usr/lib/libapr-1.so.0 
#11 0x00007fdafa1b960d in main (argc=11, argv=0x7fff93b50ef8) 
at /build/buildd/apache2-2.2.16/server/main.c:692

Even if you don't know anything about stack traces, this output gently points at Class::XSAccessor. Perrin Harkins on the mod_perl list suggested updating Class::XSAccessor to the latest CPAN version, since its changelog mentioned some segmentation faults fixed in 0.10.

And that did it. No more segfaults on Ubuntu 10.10. Solution: upgrade Class::XSAccessor to 0.10+. Thanks to Class::XSAccessor maintainer(s)!
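By the way, a quick way to check which version is currently installed, and to upgrade it from CPAN:

perl -MClass::XSAccessor -e 'print $Class::XSAccessor::VERSION, "\n"'
cpan Class::XSAccessor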

Puppet external nodes classifier script in Perl

The upgrade to puppet 2.6.2 worked out fine. Coming from 0.24.5, I noticed a really welcome speed improvement. However, I had a tricky problem.

While upgrading to 2.6, I also decided to switch to an external nodes classifier script. If you don't know about it, it's a nice puppet feature that I had planned to use from the start. It allows you to write a small script, in any language you want, that basically tells the puppetmaster, given a hostname, what you want that machine to be.

Puppet calls your script with one argument that is the hostname of the machine that is asking for its catalog of resources. In your script, you have to output something like the following:

---
classes:
  - perl
  - apache
  - ntp
  - munin
environment: production
parameters:
  puppet_server: my.puppetmaster.com

You can specify all the "classes" of that machine, basically what you want puppet to install (or repair) on it. So far so good. My classifier script looks into some preset directories for project-specific JSON files, and then checks whether any of these JSON files contains the name of the machine that puppet is asking for. Code follows:

#!/usr/bin/env perl
#
# http://docs.puppetlabs.com/guides/external_nodes.html
#

use strict;
use warnings;
use File::Basename ();
use File::Slurp ();
use JSON::XS ();
use YAML ();

our $nodes_dir = '/etc/puppet/manifests/nodes';

our %default_params = (
    puppet_server => 'my.puppetmaster.com',
);

# ...
# A few very simple subs
# omitted for brevity
# ...

# The hostname puppet asks for
my $wanted = $ARGV[0];

# The data structure found in the JSON file
my $node_info = search_into_json_files_and_find($wanted);

my $puppet_classifier_info = {
    classes => $node_info->{puppet_classes},
    environment => 'production',
    parameters => \%default_params,
};

print YAML->Dump($puppet_classifier_info);

Now, I don't know if you can immediately spot the problem here, but I didn't, so I wasted a good part of an afternoon chasing a bug I didn't even know existed. The resulting YAML (puppet wants YAML) was this one:

--- YAML 
--- 
classes: 
    - geodns::production::backend 
environment: production 
name: z01-06-02 
parameters: 
    puppet_server: z01-06-02

The problem is that this looks innocent and valid, and in fact it is valid, but it's two YAML documents, not one. So puppet will parse the --- YAML line, which by itself is a complete YAML document, and ignore the rest.

And why is that happening in the first place? Because of the YAML->Dump() call I wrote, instead of the correct YAML::Dump()… Eh :) Being a class method call, YAML->Dump($data) is equivalent to YAML::Dump('YAML', $data), and Dump() happily serializes every argument it gets as a separate YAML document, the string "YAML" first. So the correct code is:

print YAML::Dump($puppet_classifier_info);

Never use YAML->Something()
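A quick demonstration of the difference, straight from the command line:

perl -MYAML -e 'print YAML->Dump({ a => 1 })'
--- YAML
---
a: 1

perl -MYAML -e 'print YAML::Dump({ a => 1 })'
---
a: 1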

Geo-IP lookup without installing Geo-IP libraries (everywhere)

We've been using GeoDNS to distribute client requests to different data centers around the world, first as a highly experimental project, then more and more as months passed.

Currently we're using it as simple global load balancer for files.myopera.com, help.opera.com, and some get.opera.com stuff.

However, there is another minor feature that we built into it, like a hacker's backdoor :) Since it's using a full DNS server, and it relies on having GeoIP libraries installed and always up-to-date, we thought it was a nice and cool idea to have a quick way to perform geo-ip lookups from the command line.

It works in a similar way to DNS blacklists. Suppose you want to look up the IP address 80.90.100.110. You reverse its octets, and look up a special geo.opera.com zone:

cosimo@cd01:~$ host -t TXT 110.100.90.80.lookup.geo.opera.com
110.100.90.80.lookup.geo.opera.com descriptive text "ip:80.90.100.110, country:de, continent:europe"

This uses the GeoDNS backends to resolve country and continent of the given IP address, and gets back the information in a very basic string format. A simple shell or Perl script can then process that for you if you need. In fact, I made a ~/bin/geolookup Perl script that I can use like this:

cosimo@cd01:~$ geolookup 80.90.100.110 90.100.110.120 100.110.120.130
80.90.100.110 => de, europe
90.100.110.120 => fr, europe
100.110.120.130 => unknown, unknown
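The original script isn't shown here, but a minimal sketch of how such a ~/bin/geolookup could work, using Net::DNS and the TXT record format shown above (the zone name is taken from the example; the parsing details are an assumption):

#!/usr/bin/env perl
# geolookup: resolve country/continent of IPv4 addresses through
# the lookup.geo.opera.com TXT zone. A sketch, not the original script.
use strict;
use warnings;
use Net::DNS;

my $resolver = Net::DNS::Resolver->new;

for my $ip (@ARGV) {
    # Reverse the octets, as DNS blacklists do: 80.90.100.110 -> 110.100.90.80
    my $reversed = join '.', reverse split /\./, $ip;
    my ($country, $continent) = ('unknown', 'unknown');

    if (my $reply = $resolver->query("$reversed.lookup.geo.opera.com", 'TXT')) {
        for my $rr (grep { $_->type eq 'TXT' } $reply->answer) {
            # Expected format: "ip:80.90.100.110, country:de, continent:europe"
            if ($rr->txtdata =~ m{country:(\w+).*continent:(\w+)}) {
                ($country, $continent) = ($1, $2);
            }
        }
    }

    print "$ip => $country, $continent\n";
}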

Nothing special, but this way, no matter what machine I'm on, I can always quickly look up IPs if I need to, without having to download the Country or City GeoIP databases and keep them up-to-date. On the geodns backends, this is of course done routinely with a set of simple cronjobs.

Facter ported to Perl 6

A few days ago I wrote about my fun experiment trying to port facter to Rakudo Perl 6. I said it was "almost completely functional".

Well, I was a bit optimistic, it seems :) It took me a few more nights of hacking, but in the end, now it's almost completely functional :) It basically runs, and I have ported a few facts from Facter's original Ruby code too, like the kernel fact, the physicalprocessorcount fact, and other simple ones.

What's missing to declare the experiment successful is the implementation of confines. A confine, in facter speak, is a restriction that applies to a fact. For example, the physicalprocessorcount fact reads some files from /proc, which is only available on Linux. So, in this case, the confine rule for physicalprocessorcount is that the kernel fact must have "Linux" as its value. In code that becomes:

Facter.add("physicalprocessorcount", sub ($f) {
    $f.confine("kernel" => "Linux");
    $f.setcode(block => sub {
        Facter::Util::Resolution.exec('grep "physical id" /proc/cpuinfo|cut -d: -f 2|sort -u|wc -l');
    });
});

which is pretty similar to the Ruby counterpart:

Facter.add("physicalprocessorcount") do
    confine :kernel => :linux

    setcode do
        ppcount = Facter::Util::Resolution.exec('grep "physical id" /proc/cpuinfo|cut -d: -f 2|sort -u|wc -l')
    end
end

The Ruby version is still more elegant, but all in all I'm very happy with the outcome so far. It could probably be improved a lot too. Perl 6 is awesome. Get the code from http://github.com/cosimo/perl6-facter/ and feel free to ping me or comment if you want to know more.

IPv6 and Perl. What’s the status?

This is more like a request for help than the usual babbling about something. IPv4 address space appears to be almost entirely allocated. IPv6 is there, ready for us to use (more or less).

I'm trying to reserve some time to prepare and test our systems for client and server IPv6 addresses. What is your experience, in particular with Perl software? I know almost nothing about it, except for a few CPAN modules.

I tried to download and test Net::IPv6Address, but that didn't work: tests fail, and the module was last updated in 2008… Can you enlighten me on the state of the art regarding Perl (but not only, ofc) and IPv6? KTHXBYE.

In the meanwhile, I'll read a bit more on the whole IPv6 matter. First task: write an anonymizer for an IPv6 address.
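Here is a first stab at that task, a minimal sketch assuming that "anonymizing" means keeping the routing prefix and zeroing the rest (the /48 cutoff is an arbitrary choice of mine):

use strict;
use warnings;
use Socket qw(AF_INET6 inet_pton inet_ntop);

sub anonymize_ipv6 {
    my ($address) = @_;
    my $packed = inet_pton(AF_INET6, $address)
        or die "Invalid IPv6 address: $address\n";
    # Keep the first 48 bits of the 128, zero out the remaining 80
    substr($packed, 6) = "\0" x 10;
    return inet_ntop(AF_INET6, $packed);
}

print anonymize_ipv6('2001:db8:abcd:1234::42'), "\n";   # prints 2001:db8:abcd::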

Porting Facter from Ruby to Perl6

I've been playing with Puppet for a while now. One of the most interesting (and simple!) components of Puppet is Facter.

Facter is a small tool that reports "facts" about your computer. When you run facter, its output looks like the following:

architecture => x86_64
facterversion => 1.5.6
fqdn => cd01.localdomain.lan
hardwaremodel => x86_64
hostname => cd01
id => cosimo
interfaces => eth0,pan0
ipaddress => 10.0.0.1
...
uptime_hours => 422
uptime_seconds => 1519256
virtual => physical

It is also interesting because it's extensible with your own custom plugins. A custom plugin is just a Ruby file, usually with a call to Facter.add:

Facter.add(:kernel) do
    setcode do
        require 'rbconfig'
        case Config::CONFIG['host_os']
        when /mswin|win32|dos|cygwin|mingw/i
            'windows'
        else
            Facter::Util::Resolution.exec("uname -s")
        end
    end
end

As a little experiment to become more familiar with Ruby and at the same time to enjoy writing some Perl 6, I decided to study the facter project and then port it to Perl6.

That took me a few hours over a couple of weekends, and it's almost completely functional. It is a straight port from the Ruby code, so it doesn't really use the magic powers of Perl6 yet. The idea is to use a different branch, now that I know it well under the hood, and to rewrite it from the ground up in Perl6.

I found out that Ruby code maps very closely to Perl6, apart from the yield instruction and a different model for static/instance variables. For yield, something that would be close in Perl6 is gather/take, but I'm not really sure that is the appropriate statement to use. yield is used in the fact value resolution algorithm, and that is currently the only thing that doesn't work properly in my port. Everything else is in place. Regarding static/instance variables, Ruby uses @attribute for instance variables, and @@attribute for static variables. In Perl6, instance variables are denoted with $.attribute if public, while static variables are package globals, so you can use our $attribute.

Of course, there's a lot more! If you're interested, take a look at the code on Github, the URL is http://github.com/cosimo/perl6-facter.

Adding the irc NOTICE capability to Bot::BasicBot

Bot::BasicBot is a Perl module that provides a really easy, fast and convenient way to build plugin-based IRC bots. I'm playing around with an IRC bot that should assist in continuously deploying projects.

This bot has two main functionalities:

  • keep track of continuous integration builds
  • initiate and keep track of deployments

Right now the bot reads a main configuration file with data about projects, repositories, continuous integration, etc… and answers commands. This is an example:


21:58 <@cosimo> projects-list
21:58 < deployer> auth, geodns, libopera, link, myopera, sso
21:58 <@cosimo> build-status geodns
21:58 < deployer> 97ad24e success cosimo https://git.server/functests/builds/geodns/97ad24e
21:58 <@cosimo> latest-revision sso
21:58 < deployer> 5207cfe, https://git.server/?p=sso.git;a=commit;h=fe977d32e9580551dffe8139396106ba25207cfe
21:59 <@cosimo> build-status auth
21:59 < deployer> 24135 success cosimo https://test.server/functests/builds/auth-unit/24135
21:59 < deployer> 24135.2 success (manual) https://test.server/functests/builds/auth-functional/24135.2

Another functionality of the bot is to detect new builds, and automatically send updates to a given channel, stating the project, the new VCS revision, the committer and a link to the continuous integration test run. Example:


17:22 -deployer:#chan- sso, fe977d3 success cosimo https://test.server/functests/builds/opera-sso/fe977d3

In the future, I also want to command the bot to initiate deployments. Anyway, the problem was that Bot::BasicBot apparently lacked support for sending IRC notices. This caused all the bot messages to interrupt the flow of IRC conversations. Bot::BasicBot's source code is also on GitHub, so I just forked it and added support for IRC notices. I just noticed that the author has already pulled in the new changes. o/
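For the record, using the new capability from a bot looks more or less like this (a sketch against the patched Bot::BasicBot; the package and the command are made up):

package DeployBot;
use strict;
use warnings;
use base 'Bot::BasicBot';

sub said {
    my ($self, $message) = @_;
    if ($message->{body} eq 'projects-list') {
        # Send an IRC NOTICE instead of a regular PRIVMSG,
        # so we don't interrupt ongoing conversations
        $self->notice(
            channel => $message->{channel},
            body    => 'auth, geodns, libopera, link, myopera, sso',
        );
    }
    return;    # nothing to reply through the normal PRIVMSG path
}

package main;
DeployBot->new(
    server   => 'irc.example.com',
    channels => ['#chan'],
    nick     => 'deployer',
)->run();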

It will still take years for this to land in Debian, but still… :-)

Survival guide to UTF-8

Please, future me, and please, you cool programmer who, one way or another, one day or the other, has struggled to understand UTF-8 (in Perl or not), do yourself a really big favor and read a few good articles on the subject.

After reading these few articles, you will be a much better human being. I promise. In the meantime, Perl programmer, remember that:

  • use utf8; is only for the source code, not for the encoding of your data. Let's say you define a scalar variable like:
    
    my $username = 'ネオ';
    

    Ok. Now, whether or not you have use utf8 inside your script, there will be no difference whatsoever in the actual content of that scalar variable. Exactly, no difference. Except one: the variable itself (the $username box) will be flagged as containing UTF-8 characters (if you used utf8, of course). Clear, right? A short demo follows this list.

  • For the rest, open your filehandles declaring the encoding (open my $fh, '<:utf8', $file;), or explicitly use Encode::(en|de)code_utf8($data).
  • You can make sure the strings you define in your source code are UTF-8 encoded by opening and then writing to your source code file with an editor that supports UTF-8 encoding, for example vim has a :set encoding=utf8 command.
  • Also, make sure your terminal, if you're using one, is set to UTF-8 encoding, otherwise you will see gibberish instead of your beloved Unicode characters. You can do that with any terminal on this planet, bar the Windows cmd.exe shell… If anyone knows how to, please tell me.
  • And finally, use a font with Unicode characters in it, like Bitstream Vera Sans Mono (the default Linux font), Envy R, plain Courier, etc… or you will just see the broken-UTF8-character-of-doom. Yes, this one → � :-)
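Here is the short demo promised above, showing what use utf8 does and does not change (assuming the source file is saved as UTF-8):

use strict;
use warnings;

{
    use utf8;                       # string literals are decoded into characters
    my $username = 'ネオ';
    print length($username), "\n";  # 2: two Unicode characters, UTF8 flag on
}

{
    no utf8;                        # string literals are left as raw octets
    my $username = 'ネオ';
    print length($username), "\n";  # 6: six UTF-8 octets, UTF8 flag off
}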

There's an additional problem when you need to feed strings to a Digest module like Digest::SHA1 to obtain a hash. I presume the SHA1 algorithm, like MD5 and others, doesn't really work on Unicode characters or UTF-8 encoded characters; it just works on bytes, or octets.

So, if you try something like:


use utf8;
use Digest::SHA1;

my $string = "ログインメールアドレス";
my $sha1 = Digest::SHA1->new();
$sha1->add($string);

print $sha1->hexdigest();

it will fail miserably (Wide character in subroutine entry at line 6) because $string is marked as containing "wide" characters, so it must first be turned into octets:


use utf8;
use Encode;
use Digest::SHA1;

my $string = "ログインメールアドレス";
my $sha1 = Digest::SHA1->new();
$sha1->add( Encode::encode_utf8($string) );

print $sha1->hexdigest();

I need to remind myself all the time (see the short sanity check after this list) that:

  • Encode::encode_utf8($string) wants a string with Unicode characters and will give you a string converted to UTF-8 octets, with the UTF8 flag *turned off*. Basically bytes. You can then do anything with them, print, put in a file, calculate a hash, etc…
  • Encode::decode_utf8($octets) wants a string of (possibly UTF-8) octets, and will give you a string of Unicode characters, with the UTF8 flag *turned on*, so for example trying to lowercase (lc) a "Å" will result in a "å" character.
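And the sanity check itself, a tiny script exercising both directions:

use strict;
use warnings;
use utf8;
use Encode qw(encode_utf8 decode_utf8);

binmode STDOUT, ':utf8';

my $chars  = 'Å';                      # one Unicode character, UTF8 flag on
my $octets = encode_utf8($chars);      # two bytes (0xC3 0x85), UTF8 flag off

print length($chars),  "\n";           # 1 (one character)
print length($octets), "\n";           # 2 (two octets)
print lc($chars), "\n";                # å (lc works on the character...)
print lc(decode_utf8($octets)), "\n";  # å (...and on the decoded octets too)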

So, there you go! Now you are a 1st level UTF-8 wizard. Go and do your UTF-8 magic!

Epilogue: now I'm sure: in a couple of weeks I will come back to this post, and think that I still don't understand how UTF-8 works in Perl… :-)

If Text::Hunspell never worked for you, now it’s time to try it again!

If you don't know, Hunspell is the spell checker engine of OpenOffice.org, and it's also included in the Opera and Mozilla browsers.

We were trying to use it from Perl, using the old Text::Hunspell module, version 1.3, but we had problems with it. Big problems. Like segfaults and tests that wouldn't run.

A bright hacker from Italy :) was then called in to fix the problem, with the promise of a fantastic prize he hasn't seen yet… [ping?] :-)

During the process, I found out I know absolutely nothing about dictionary files and stuff, and my fixes were – I would say – definitely horrible.

But! There's a bright side, of course, and that is that the module works just fine now, at least on Debian/Ubuntu systems. Before using Text::Hunspell, you want to install the following packages:

  • hunspell
  • libhunspell-dev

The example in the POD documentation (and in the examples dir) uses the standard US English dictionary. If you don't have it, you will need to change the script slightly. But the code is tested and should work without a problem. If you try it out and have feedback, by all means let me know. Thanks!
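For the impatient, basic usage looks something like this (the dictionary paths below are the usual Debian/Ubuntu locations from the hunspell-en-us package; adjust them to your system):

use strict;
use warnings;
use Text::Hunspell;

# Affix and dictionary files for the US English dictionary
my $speller = Text::Hunspell->new(
    '/usr/share/hunspell/en_US.aff',
    '/usr/share/hunspell/en_US.dic',
);

print $speller->check('hello')
    ? "'hello' is spelled correctly\n"
    : "'hello' is misspelled\n";

print "Suggestions for 'helo': ",
    join(', ', $speller->suggest('helo')), "\n";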

Source code available on GitHub at:
http://github.com/cosimo/perl5-text-hunspell/

The module, tagged as 2.00 because it's cool :), will be up on CPAN shortly at this address:
http://search.cpan.org/dist/Text-Hunspell/