
A command line tool for Debian to purge Varnish objects

I've been using varnish mostly on Debian systems. I found the reload-vcl script included in Debian to be useful.

The reload-vcl script

It's part of the standard varnish Debian package. It reads the system defaults from /etc/default/varnish, so it knows how to correctly invoke the varnishadm utility to perform administrative commands. As the name implies, it reloads the default VCL file using the vcl.load and vcl.use commands, checking that every step succeeds before continuing, so it's safe to use. It loads the new VCL file and automatically labels it with a unique id.
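Under the hood, the reload boils down to a varnishadm sequence along these lines (a hand-rolled sketch, not the actual script; on a real system the management address and VCL path come from /etc/default/varnish):

#!/bin/sh
# Load the new VCL under a unique name, then activate it
id="reload_$(date +%s)"
varnishadm -T localhost:6082 vcl.load "$id" /etc/varnish/default.vcl \
    && varnishadm -T localhost:6082 vcl.use "$id"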

An analogous script for the purge functionality would be useful too, so I looked at the source code of reload-vcl. Most of it deals with loading /etc/default/varnish and with various sanity checks. I reused that part to make another script to control cache purging.

The purge-cache script

Here's the full source code. Below there's a link to download the latest version from github.


#!/bin/sh

# purge-cache: Script to purge varnish cache. Defaults are defined in
# /etc/default/varnish.
#
# Cosimo <cosimo@cpan.org>
# Based on reload-vcl, by Stig Sandbeck Mathisen <ssm at debian dot org>

# Settings
defaults=/etc/default/varnish

# Paths
varnishadm=/usr/bin/varnishadm
date=/bin/date 
tempfile=/bin/tempfile

# Messages
# msg_no_varnishadm: varnishadm
msg_no_varnishadm="Error: Cannot execute %s\n"
msg_no_management="Error: \$DAEMON_OPTS must contain '-T hostname:port'\n"
# msg_defaults_not_readable: defaults
msg_defaults_not_readable="Error: %s is not readable\n"
# msg_defaults_not_there: defaults
msg_defaults_not_there="Error: %s does not exist\n"
msg_usage="Usage: $0 [-h][-q][-u <url>|-r <regex>|-a]\n\t-h\tdisplay help\n\t-q\tbe quiet\n\t-u\tpurge by exact (relative) url (ex.: /en/products/)\n\t-r\tpurge objects with URL matching a regex (ex.: ^/blogs/)\n\t-a\tpurge all objects from cache\n"
msg_purge_failed="Error: purge command failed\n"
# msg_purge_url: url
msg_purge_url="Purging objects by exact url: %s\n"
# msg_purge_regex: regex
msg_purge_regex="Purging objects with URL matching regex: %s\n"
msg_purge_all="Purging all cache\n"
msg_purge_ok="Purge command successful\n"

# Load defaults file
if [ -f "$defaults" ]
then
    if [ -r "$defaults" ]
    then
        . "$defaults"
    else
        printf >&2 "$msg_defaults_not_readable" $defaults
        exit 1 
    fi
else
    printf >&2 "$msg_defaults_not_there" $defaults
    exit 1
fi

# parse command line arguments
while getopts hqu:r:a flag
do
    case $flag in
        h)
            printf >&2 "$msg_usage"
            exit 0
            ;; 
        u)
            purge_method=url
            url="$OPTARG"
            ;;
        r)
            purge_method=regex
            regex="$OPTARG"
            ;;
        a)
            purge_method=all
            ;; 
        q)
            quiet=1
            ;; 
        *)
            printf >&2 "$msg_usagen"
            exit 1
            ;; 
    esac
done

# Parse $DAEMON_OPTS (options must be kept in sync with varnishd).
# Extract the -f and the -T option, and (try to) ensure that the
# management interface is of the form hostname:port
OPTIND=1
while getopts a:b:dFf:g:h:l:n:P:p:s:T:t:u:Vw: flag $DAEMON_OPTS
do
    case $flag in
        f)
            if [ -f "$OPTARG" ]; then
                vcl_file="$OPTARG"
            fi 
            ;; 
        T)
            if [ -n "$OPTARG" -a "$OPTARG" != "${OPTARG%%:*}" ]
            then
                mgmt_interface="$OPTARG"
            fi
            ;;
    esac
done

# Sanity checks 
if [ ! -x "$varnishadm" ]
then
    printf >&2 "$msg_no_varnishadm" $varnishadm
    exit 1
fi

if [ -z "$mgmt_interface" ]
then
    printf >&2 "$msg_no_management"
    exit 1
fi

logfile=$($tempfile)
purge_command="vcl.list"

# Now run the purge command against the admin interface
if [ "$purge_method" = "url" ]
then
    purge_command="purge req.url == $url"
    printf >&2 "$msg_purge_url" $url | grep -v "^$" > $logfile
else
    if [ "$purge_method" = "regex" ]
    then
        purge_command="purge.url $regex"
        printf >&2 "$msg_purge_regex" $regex | grep -v "^$" > $logfile
    else
        if [ "$purge_method" = "all" ]
        then
            purge_command="purge.url ."
            printf >&2 "$msg_purge_all" | grep -v "^$" > $logfile
        fi
    fi
fi

# For some reason, using:
#
#   fi | grep -v "^$" > $logfile
#
# results in purge_command assignment being wiped out
# at the end of the block??

if [ -z "$purge_command" ]
then
    printf >&2 "$msg_usagen"
    exit 1
fi

# echo "cmd: $varnishadm -T $mgmt_interface $purge_command"

if $varnishadm -T $mgmt_interface $purge_command
then
    printf >&2 "$msg_purge_ok"
else
    printf >&2 "$msg_purge_failed"
    exitstatus=1
fi | grep -v "^$" > $logfile

# Blather
if [ -z "${quiet}" -o -n "$exitstatus" ]
then
    cat >&2 $logfile
fi

# Cleanup
rm -f $logfile  
exit $exitstatus

You can control how objects are purged from the cache with 3 options:

  • -a: purges all objects
  • -u <url>: purges an exact url
  • -r <regexp>: purges objects matching a regular expression

Examples

# Purges all objects
purge-cache -a

# Purges all objects starting with "/products"
purge-cache -r '^/products'

# Purges objects with exact URL
purge-cache -u '/en/homepage'
    

Goal: no downtime

Both reload-vcl and purge-cache can be combined in a single script to be triggered when deploying new VCL code or new backend applications. Instead of restarting varnish, which I really don't like, and which isn't very reliable either (on Debian, sometimes it won't come back up), I use purge-cache -a to purge all objects and then reload-vcl to load and activate the newly deployed VCL code.
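The glue can be as simple as this (a sketch; the script name is made up):

#!/bin/sh
# deploy-varnish: run at deploy time. Purge everything,
# then load and activate the newly deployed VCL.
set -e
purge-cache -a
reload-vcl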

This procedure has no downtime at all. Purging all objects can potentially be hard on the backends, but we're not at that point yet. In the busiest applications we have, it usually takes around 10-20 seconds to get back to a 70%-75% hit rate, so I would say that's not really a problem right now.

Download!

You can download the purge-cache script from github. I contacted the maintainer of the reload-vcl script. Maybe he will include purge-cache in the next release of the varnish Debian package… or maybe I could package it as a Perl CPAN module.

Matching IPv6 addresses with Regexp::Common

I wish Regexp::Common had a $RE{net}{IPv6} regular expression, but it doesn't (yet).

So I tried to implement it myself, ripping off the IPv6 matching bit from the existing Regexp::IPv6, which happens to have a working IPv6 regular expression with a reasonable test suite. Now, why isn't Regexp::IPv6 part of Regexp::Common?
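Until that happens, Regexp::IPv6 can be used on its own; it exports the pattern as $IPv6_re. A minimal example:

use strict;
use warnings;
use Regexp::IPv6 qw($IPv6_re);

my $addr = '2001:db8::1428:57ab';
print "Valid IPv6 address\n" if $addr =~ /^$IPv6_re$/;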

By the way, I'll copy/paste the full regular expression to match IPv6 addresses, just for fun:

:(?::[0-9a-fA-F]{1,4}){0,5}(?:(?::[0-9a-fA-F]{1,4}){1,2}|:(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.]
(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?
[0-9]{1,2})))|[0-9a-fA-F]{1,4}:(?:[0-9a-fA-F]{1,4}:(?:[0-9a-fA-F]{1,4}:(?:[0-9a-fA-F]{1,4}:(?:[0-9a-fA-F]{1,4}:(?:
[0-9a-fA-F]{1,4}:(?:[0-9a-fA-F]{1,4}:(?:[0-9a-fA-F]{1,4}|:)|(?::(?:[0-9a-fA-F]{1,4})?|(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?
[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4]
[0-9]|[0-1]?[0-9]{1,2}))))|:(?:(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})
[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))|[0-9a-fA-F]{1,4}(?::[0-9a-fA-F]
{1,4})?|))|(?::(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|
2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))|:[0-9a-fA-F]{1,4}(?::(?:(?:25[0-5]|2[0-4]
[0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.]
(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))|(?::[0-9a-fA-F]{1,4}){0,2})|:))|(?:(?::[0-9a-fA-F]{1,4}){0,2}(?::(?:
(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?
[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))|(?::[0-9a-fA-F]{1,4}){1,2})|:))|(?:(?::[0-9a-fA-F]{1,4}){0,3}
(?::(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|
[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))|(?::[0-9a-fA-F]{1,4}){1,2})|:))|(?:(?::[0-9a-fA-F]{1,4})
{0,4}(?::(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4]
[0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))|(?::[0-9a-fA-F]{1,4}){1,2})|:))

Of course, I will never be able to tell if it's right or wrong, but the fact is that it passes the test suite :)
However, the actual code is not like that: it generates the full regular expression from a few components. Anyway, I've pushed an ipv6 branch to my fork of Regexp::Common. I hope it will soon be included in Regexp::Common, or improved enough to be included, so we can finally match IPv6 addresses with:


use Regexp::Common;

my $addr = '2001:0db8:0000:0000:0000:0000:1428:57ab';
if ($addr =~ $RE{net}{IPv6}) {
    print "Yes, it is an IPv6 address";
}
else {
    print "No, it isn't";
}

“Loadsnake” AKA the Novell Netware snake screensaver clone

For those who didn't have the pleasure of seeing the old Novell Netware snakes screensaver, I'll say here that it was the default Netware screensaver, in console/text mode. It showed one snake for each CPU you had (99.9% of people had just 1, really). The cool thing is that the snakes became longer and longer as your server load increased. They also started going faster.

Anyway, this is one of those little time-wasting projects that usually go nowhere. I started working on a clone of this Netware screensaver in 2007. I remember I wanted to figure out how to write an xscreensaver "hack", so I spent a weekend looking at the source code for all the existing hacks, and I picked popsquares.c as a base and started tearing it apart, injecting dubious amounts of crappy C code until it did what I wanted.

Fast forward 4 years. Yesterday, for some reason, I got back to it, cleaned up the code a bit, and implemented a "fantastic" new feature I'd always wanted: a different snake color for every CPU, instead of all snakes being red. The result is, well, see it for yourself:

The source code (don't take inspiration from it, please… :) is up on github at http://github.com/cosimo/xscreensaver-loadsnake. You can also download the xscreensaver binary module if you want (Linux x86_64 only), as compiling it requires a bit of fiddling with the xscreensaver source code.

I have to admit that it's cool to run your own screensaver :)

Continuous integration of Perl-based projects in Hudson/Jenkins

I didn't find massive amounts of information about how to hook a Perl-based project into Jenkins for continuous integration, but there are a few presentations on Slideshare that carry some nice ideas.

While some older pages say that "there's no out-of-the-box integration, etc…", I think there is: a very simple, very straightforward way to integrate any (Perl) project that uses TAP into Jenkins.

TAP::Harness::JUnit

Here we go then:

TAP::Harness::JUnit will capture all the standard TAP output and turn it into the JUnit XML output that Jenkins expects, and you don't need to do anything else to make this happen. How cool is that? Read below.

Build instructions

You need to instruct Jenkins on how to build your project. So, in the "Build" panel, I usually put:

prove -I ./lib -v
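For context, nothing special is needed on the test side: any ordinary Test::More-based test file already emits TAP. A trivial illustrative t/basic.t:

# t/basic.t -- a trivial TAP-emitting test (for illustration only)
use strict;
use warnings;
use Test::More tests => 1;

ok(1 + 1 == 2, 'basic arithmetic still works');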

If you don't use prove, be ashamed and start using it :) You'll never look back. So, getting Jenkins to understand TAP is just a matter of modifying that command to read:

prove -I ./lib -v --harness=TAP::Harness::JUnit

Here's the actual Build panel screenshot:

That's it. prove will produce a junit_output.xml file with the JUnit-compatible XML output that corresponds to the standard TAP output.

Post-build actions

Now you need to tell Jenkins that the file is actually there. I'm not sure why, but this is not automatic: you need to tell it to "Publish JUnit test results". If you ask me, that's totally surprising, but it works. So:

That should be it. Run your build and you should see your tests output picked up.

Geo::IP support for IPv6 geolocation

We're currently looking into IPv6-enabling our services. One of the missing bits is being able to geolocate IPv6 client addresses. We're using the MaxMind GeoIP database. The main Perl library for this is Geo::IP.

The current version of Geo::IP out on CPAN, 1.38, does not support IPv6 lookups. I contacted the maintainer of Geo::IP asking for more information. In the meantime, I hacked together just enough IPv6 support to successfully geolocate a test address. Later on, I discovered that IPv6 support is already available in the hopefully soon-to-be-released version of Geo::IP archived at SourceForge.

Let's hope it lands on CPAN soon. In the meantime, if you really really want, you can try out my changes against CPAN v1.38. They were enough for me to start testing the integration with our other code and projects.
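For the curious, an IPv6 lookup looks more or less like the plain IPv4 API, with _v6 method variants and a separate IPv6 database file. A sketch, assuming the GeoIPv6.dat path below and that your Geo::IP build exposes country_code_by_addr_v6:

use strict;
use warnings;
use Geo::IP;

# Assumptions: GeoIPv6.dat is installed at this path, and this
# Geo::IP build provides the *_v6 method variants
my $gi = Geo::IP->open('/usr/share/GeoIP/GeoIPv6.dat', GEOIP_STANDARD);

my $country = $gi->country_code_by_addr_v6('2001:db8::1428:57ab');
print defined $country ? $country : 'unknown', "\n";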

My Geo::IP with IPv6 support

How to convert Opera contacts file to Mutt aliases format

Recently I've been looking more and more into mutt, the email client. I've been a very happy M2 (the Opera built-in email client) user for almost 3 years now, but I still felt I was missing something if I didn't try out mutt. I was a pine user as well, many many years ago :) So, having decided to give it a go, I started about a month ago.

I struggled a bit to get a reasonable .muttrc file together. Fortunately, there are plenty of examples out there. After getting a working config, the problem was getting my contacts list back.

Mutt has simple address book integration (through abook) and stores contacts in an alias file, typically ~/.mutt/aliases. Opera, of course, can export all your mail contacts to an .adr file, a simple "address book" text file. I did that, and then needed to convert it to mutt's aliases format.
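For reference, here's what the two formats look like side by side, with a made-up contact. An Opera .adr entry:

#CONTACT
	ID=42
	NAME=John Doe
	MAIL=john.doe@example.com

and the corresponding mutt alias line:

alias john.doe John Doe <john.doe@example.com>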

Ten minutes later, a Perl script to do just that was ready. Here it is:


#!/usr/bin/env perl
#
# Convert Opera contacts file (.adr) into
# mutt aliases file format.
#
# Usage:
#   perl opera-adr-to-mutt-aliases.pl < ~/.opera/contacts.adr >> ~/.mutt/aliases
#
# Cosimo, 31/Jan/2011
#

use strict;
use warnings;
use utf8;

sub harvest ($) {
    my ($contact_info) = @_;

    my ($id)    = $contact_info =~ m{^ \s+ ID   = (.*) $}mx;
    my ($name)  = $contact_info =~ m{^ \s+ NAME = (.*) $}mx;
    my ($email) = $contact_info =~ m{^ \s+ MAIL = (.*) $}mx;

    return if ! $id and ! $email;

    return {
        ID   => $id,
        NAME => $name,
        MAIL => $email,
    };

}

my $adr_file_contents = q{};
$adr_file_contents .= $_ while <STDIN>;

my @contacts = split m{#CONTACT}, $adr_file_contents;

for (@contacts) {
    my $contact = harvest($_) or next;
    my ($first_word) = $contact->{MAIL} =~ m{ (\S+) @ }x;
    printf "alias %s %s <%s>\n",
        lc($first_word), $contact->{NAME}, $contact->{MAIL};
}

Download link: https://gist.github.com/803454

Ubuntu 10.10, modperl and Apache segfaulting fixed

Last month, before moving to Melbourne, where I am now, to work in the Opera Australia office for a few months, I had to set up a laptop for all the development work I normally do. I chose Ubuntu 10.10 amd64, and I have to say I'm quite happy with it. Everything works out of the box for me, including a Quickcam 9000 USB camera I used to shoot this poor time-lapse video from my new office window. Woot!

Anyway, the development environment for one particular project consists of Apache and mod_perl. So I set up the usual list of dependencies, but when I tried to start Apache to run the test suite, it would always stop right away with a segmentation fault.

I didn't really dig into the problem; I just straced the apache process, and this is what I got:

[apache starts up, reads a bunch of Perl modules, and opens the access log...]
...
brk(0x7f342adbd000)                     = 0x7f342adbd000
...
...
brk(0x7f342adde000)                     = 0x7f342adde000
brk(0x7f342adff000)                     = 0x7f342adff000
brk(0x7f342ae20000)                     = 0x7f342ae20000
brk(0x7f342ae41000)                     = 0x7f342ae41000
stat("/usr/lib/perl5/auto/DBI/DESTROY.al", 0x7f341e8459b0) = -1 ENOENT (No such file or directory)
stat("/home/cosimo/src/auth-svn/lib/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No such file or directory)
stat("/home/cosimo/src/myopera-trunk/lib/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No such file or directory)
stat("/etc/perl/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No such file or directory)
stat("/usr/local/lib/perl/5.10.1/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No such file or directory)
stat("/usr/local/share/perl/5.10.1/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No such file or directory)
stat("/usr/lib/perl5/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No such file or directory)
stat("/usr/share/perl5/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No such file or directory)
stat("/usr/lib/perl/5.10/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No such file or directory)
stat("/usr/share/perl/5.10/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No such file or directory)
stat("/usr/local/lib/site_perl/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No such file or directory)
stat("./auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No such file or directory)
stat("/var/tmp/test_cosimo_22931/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No such file or directory)
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

I thought it would be wiser to ask for advice on the DBI and mod_perl mailing lists. Tim Bunce suggested trying to get a stack trace of Apache. Why didn't I think of that in the first place? A few days later, I had my stack trace:

# gdb -c ./core /usr/sbin/apache2
...
Reading symbols from ...
...
Core was generated by `/usr/sbin/apache2 -d /var/tmp/test_cosimo_9727 -k start -C User cosimo -C Group ...'.

Program terminated with signal 11, Segmentation fault.
#0  0x00007fdaedfed858 in XS_Class__XSAccessor_END ()
    from /usr/lib/perl5/auto/Class/XSAccessor/XSAccessor.so
(gdb) backtrace
#0  0x00007fdaedfed858 in XS_Class__XSAccessor_END ()
    from /usr/lib/perl5/auto/Class/XSAccessor/XSAccessor.so
#1  0x00007fdaf83cf845 in Perl_pp_entersub () from /usr/lib/libperl.so.5.10
#2  0x00007fdaf83752c6 in Perl_call_sv () from /usr/lib/libperl.so.5.10
#3  0x00007fdaf86ad40b in modperl_perl_call_list ()
    from /usr/lib/apache2/modules/mod_perl.so
#4  0x00007fdaf86b5786 in modperl_perl_destruct ()
    from /usr/lib/apache2/modules/mod_perl.so
#5  0x00007fdaf86a6256 in modperl_interp_destroy ()
    from /usr/lib/apache2/modules/mod_perl.so
#6  0x00007fdaf86a6715 in modperl_tipool_destroy ()
    from /usr/lib/apache2/modules/mod_perl.so
#7  0x00007fdaf86a62b2 in modperl_interp_pool_destroy ()
    from /usr/lib/apache2/modules/mod_perl.so
#8  0x00007fdaf98fd4e3 in ?? () from /usr/lib/libapr-1.so.0
#9  0x00007fdaf98fc3b1 in apr_pool_destroy () from /usr/lib/libapr-1.so.0
#10 0x00007fdaf98fc27f in apr_pool_clear () from /usr/lib/libapr-1.so.0
#11 0x00007fdafa1b960d in main (argc=11, argv=0x7fff93b50ef8)
    at /build/buildd/apache2-2.2.16/server/main.c:692

Even if you don't know anything about stack traces, this output gently points to Class::XSAccessor. Perrin Harkins on the mod_perl list suggested upgrading Class::XSAccessor to the latest CPAN version, since its changelog mentioned some segmentation faults fixed in 0.10.

And that did it. No more segfaults on Ubuntu 10.10. Solution: upgrade Class::XSAccessor to 0.10+. Thanks to the Class::XSAccessor maintainer(s)!
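If you're hitting the same thing, checking the installed version and upgrading is quick:

# Check the installed version...
perl -MClass::XSAccessor -le 'print Class::XSAccessor->VERSION'

# ...and if it's older than 0.10, upgrade it from CPAN
cpan Class::XSAccessor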

Puppet external nodes classifier script in Perl

The upgrade to puppet 2.6.2 worked out fine. Coming from 0.24.5, I noticed a really welcome speed improvement. However, I ran into one tricky problem.

While upgrading to 2.6, I also decided to switch to an external nodes classifier script. If you don't know about it, it's a nice puppet feature that I had planned to use from the start. It allows you to write a small script, in any language you want, that basically tells the puppetmaster, given a hostname, what you want that machine to be.

Puppet calls your script with one argument, the hostname of the machine that is asking for its catalog of resources. Your script has to output something like the following:

---
classes:
  - perl
  - apache
  - ntp
  - munin
environment: production
parameters:
  puppet_server: my.puppetmaster.com

You specify all the "classes" for that machine, so basically what you want puppet to install (or repair) on it. So far so good. My classifier script looks in some preset directories for project-specific JSON files, and then checks whether any of these JSON files contains the name of the machine puppet is asking about. Code follows:

#!/usr/bin/env perl
#
# http://docs.puppetlabs.com/guides/external_nodes.html
#

use strict;
use warnings;
use File::Basename ();
use File::Slurp ();
use JSON::XS ();
use YAML ();

our $nodes_dir = '/etc/puppet/manifests/nodes';

our %default_params = (
    puppet_server => 'my.puppetmaster.com',
);

# ...
# A few very simple subs
# omitted for brevity
# ...

# The hostname puppet asks for
my $wanted = $ARGV[0];

# The data structure found in the JSON file
my $node_info = search_into_json_files_and_find($wanted);

my $puppet_classifier_info = {
    classes     => $node_info->{puppet_classes},
    environment => 'production',
    parameters  => \%default_params,
};

print YAML->Dump($puppet_classifier_info);
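For completeness, the puppetmaster is pointed at a classifier script via the node_terminus and external_nodes settings; something like this (the script path here is made up):

# /etc/puppet/puppet.conf on the puppetmaster
[master]
    node_terminus  = exec
    external_nodes = /etc/puppet/node_classifier.pl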

Now, I don't know if you can immediately spot the problem in the script above, but I didn't, so I wasted a good part of an afternoon chasing a bug I didn't even know existed. The resulting YAML (puppet wants YAML) was this:

--- YAML 
--- 
classes: 
    - geodns::production::backend 
environment: production 
name: z01-06-02 
parameters: 
    puppet_server: z01-06-02

The problem with this is that it looks innocent and valid, and in fact it is valid, but it's two YAML documents, not one. So puppet parses the first --- YAML line, since that is a single complete YAML document, and ignores the rest.

And why is that happening in the first place? Because of the YAML->Dump() call I wrote, instead of the correct YAML::Dump()… Eh :) So the correct code is:

print YAML::Dump($puppet_classifier_info);

Never use YAML->Something()

Upgrade of puppet from 0.24.5 to 2.6.2: “Got nil value for content”

Something I should probably have done a while ago: upgrading our puppet clients and master from 0.24.5 (current Debian stable) to 2.6.2 (current backports). I've been advised to go all the way to 2.6.4, which contains an important security fix. I'll probably do that soon.

The first mistake I made was to upgrade one client before the puppetmaster. In my mental model it's always OK to upgrade the server (the puppetmaster) first, but I remember being surprised when I learned that puppet needs the clients upgraded first, so that was the bit I thought I remembered well: first upgrade the clients, then the puppetmaster.

So I did, and that didn't work at all: the client couldn't retrieve the catalog. I discovered I had to upgrade the puppetmaster first after all. Another very common thing with puppet: if something doesn't work, wipe out the SSL certificates. Almost every puppet user, maybe inexperienced like me, will tell you to remove the entire /var/lib/puppet/ssl directory and restart your client. Good. So I did.

After fiddling for a while with client- and server-side SSL certificates, removals, --waitforcert and puppetca invocations, I finally got the client and the puppetmaster talking to each other correctly, downloading the manifests and applying the changes. However, one problem remained…

Digression: custom facter plugins

I wrote a custom facter plugin that provides a datacenter fact. Just as an example, it works like this:

#
# Provide an additional 'datacenter' fact
# to use in generic modules to provide datacenter
# specific settings, such as resolv.conf
#
# Cosimo, 03/Aug/2010
#
# $Id: datacenter.rb 15240 2011-01-03 14:27:44Z cosimo $

Facter.add("datacenter") do
    setcode do

        datacenter = "unknown"

        # Get current ip address from Facter's own database
        ipaddr = Facter.value(:ipaddress)

        if ipaddr.match("^12.34.56.")
            datacenter = "dc1"
        elsif ipaddr.match("^34.56.78.")
            datacenter = "dc2"
        elsif ipaddr.match("^56.78.12.")
            datacenter = "dc3"
        # etc ...
        # etc ...
        end

        datacenter
    end
end
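Once the plugin is synced to the clients, you can check the fact straight from the command line; recent facter versions load custom facts with the -p/--puppet switch (or you can point FACTERLIB at the plugin directory). On a dc1 machine:

cosimo@client:~$ facter -p datacenter
dc1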

This allows for very cool things, like datacenter-specific DNS resolver settings:

# Data-center based settings
case $datacenter {
    "dc1" :  { include opera::datacenters::dc1 }
    "dc2" :  { include opera::datacenters::dc2 }
    "dc3" :  { include opera::datacenters::dc3 }
    # ...
    default: { include opera::datacenters::dc1 }
}

where each opera::datacenters::dcN class contains all the datacenter-dependent settings. In my case, just the resolver for now.

class opera::datacenters::dc1 {
    resolver::dns { "dc1-ns":
        domain => "opera.com",
        nameservers => [ "1.2.3.4", "5.6.7.8" ],
    }
}

resolver::dns is in turn a small define in a resolver module I wrote, which generates the resolv.conf contents from a small template. It's so small I can copy/paste it here:

# "resolver/manifests/init.pp"
define resolver::dns ($nameservers, $search="", $domain="") {
    file { "/etc/resolv.conf":
        ensure => "present",
        owner  => "root",
        group  => "root",
        mode   => 644,
        content => template("resolver/resolv.conf.erb"),
    }
}

# "resolver/templates/resolv.conf.erb"
# File managed by puppet <%= puppetversion %> on <%= fqdn %>
# Data center: <%= name %>
<% if domain != "" %>domain <%= domain %>
<% end %>
<% if search != "" %>search <%= search %>
<% end %>
<% nameservers.each do |ns| %>nameserver <%= ns %>
<% end %>
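With the dc1-ns parameters shown above, the rendered resolv.conf would come out roughly like this (hostname and puppet version invented for the example; the search line is skipped because search is empty):

# File managed by puppet 2.6.2 on web01.opera.com
# Data center: dc1-ns
domain opera.com
nameserver 1.2.3.4
nameserver 5.6.7.8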

One problem remained…

Let's get back to the remaining problem. Every client run was spitting out this error:

info: Retrieving plugin
err: /File[/var/lib/puppet/lib]: Could not evaluate: Got nil value for content

After a lot of searching and reading through similar (but not quite identical) reports, I found this thread, where one guy was having a very similar problem, but with a different file, /root/.gitconfig, which he had explicitly specified in his manifests.

My problem happened with /var/lib/puppet/lib, which is never specified in any of my manifests. But the symlink bit got me thinking. At some point, to get the custom facter plugin working, and having read about the differences between the older and the 2.6.x versions of puppet, I had put it into my module's lib/facter folder, but I had also created a symlink to it called plugins (required by the new puppet). Doing so, I thought, would prevent problems: the new puppet could read the file from the new folder, while the older version could read it from "lib".

Well, it turns out that puppet will not read custom facter plugins from a plugins folder that is a symlink. Removing the symlink and making {module}/plugins/facter a real directory solved that problem too.

I really hope to save someone else the time I spent on this. Cheers :)

Geo-IP lookup without installing Geo-IP libraries (everywhere)

We've been using GeoDNS to distribute client requests to different data centers around the world, first as a highly experimental project, then more and more as months passed.

Currently we're using it as simple global load balancer for files.myopera.com, help.opera.com, and some get.opera.com stuff.

However, there's another minor feature we built into it, like a hacker's backdoor :) Since GeoDNS is a full DNS server, and it relies on having the GeoIP libraries installed and always up-to-date, we thought it would be nice to have a quick way to perform geo-ip lookups from the command line.

It works in a similar way to DNS block lists. Suppose you want to look up the IP address 80.90.100.110: you reverse the IP and look it up in a special geo.opera.com zone:

cosimo@cd01:~$ host -t TXT 110.100.90.80.lookup.geo.opera.com
110.100.90.80.lookup.geo.opera.com descriptive text "ip:80.90.100.110, country:de, continent:europe"

This uses the GeoDNS backends to resolve the country and continent of the given IP address, and returns the information in a very basic string format. A simple shell or Perl script can then process that for you if needed. In fact, I made a ~/bin/geolookup Perl script that I can use like this:

cosimo@cd01:~$ geolookup 80.90.100.110 90.100.110.120 100.110.120.130
80.90.100.110 => de, europe
90.100.110.120 => fr, europe
100.110.120.130 => unknown, unknown
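The script itself is little more than a loop around a Net::DNS TXT query; something along these lines (a minimal sketch, not the exact script):

#!/usr/bin/env perl
# geolookup (sketch): reverse each IP, query the TXT record in the
# lookup.geo.opera.com zone, extract country and continent.
use strict;
use warnings;
use Net::DNS;

my $resolver = Net::DNS::Resolver->new;

for my $ip (@ARGV) {
    my $name  = join('.', reverse split /\./, $ip) . '.lookup.geo.opera.com';
    my $reply = $resolver->query($name, 'TXT');
    my ($country, $continent);
    if ($reply) {
        for my $rr (grep { $_->type eq 'TXT' } $reply->answer) {
            ($country)   = $rr->txtdata =~ m{country:(\w+)};
            ($continent) = $rr->txtdata =~ m{continent:(\w+)};
        }
    }
    printf "%s => %s, %s\n", $ip, $country || 'unknown', $continent || 'unknown';
}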

Nothing special, but this way, no matter what machine I'm on, I can always quickly look up IPs if I need to, without having to download the Country or City GeoIP databases and keep them up-to-date. On the GeoDNS backends, this is of course done routinely with a set of simple cronjobs.