Monthly Archives: January 2011

Puppet external nodes classifier script in Perl

The upgrade to puppet 2.6.2 worked out fine. Coming from 0.24.5, I noticed a really welcome speed improvement. However, I had a tricky problem.

While upgrading to 2.6, I also decided to switch to external nodes classifier script. If you don't know about it, it's a nice puppet feature that I planned to use since the start. It allows you to write a small script in any language you want, to basically tell the puppetmaster, given the hostname, what you want that machine to be.

Puppet calls your script with one argument that is the hostname of the machine that is asking for its catalog of resources. In your script, you have to output something like the following:

---
classes:
  - perl
  - apache
  - ntp
  - munin
environment: production
parameters:
  puppet_server: my.puppetmaster.com

You can specify all the "classes" of that machine, so basically what you want puppet to install (or repair) on that machine. So far so good. So my classifier script looks into some preset directories for some project-specific JSON files, and then checks if any of these JSON files contain the name of the machine that puppet is asking for. Code follows:

#!/usr/bin/env perl
#
# http://docs.puppetlabs.com/guides/external_nodes.html
#

use strict;
use warnings;
use File::Basename ();
use File::Slurp ();
use JSON::XS ();
use YAML ();

our $nodes_dir = '/etc/puppet/manifests/nodes';

our %default_params = (
    puppet_server => 'my.puppetmaster.com',
);

# ...
# A few very simple subs
# omitted for brevity
# ...

# The hostname puppet asks for
my $wanted = $ARGV[0];

# The data structure found in the JSON file
my $node_info = search_into_json_files_and_find($wanted);

my $puppet_classifier_info = {
    classes => $node_info->{puppet_classes},
    environment => 'production',
    parameters => %default_params,
};

print YAML->Dump($puppet_classifier_info);

Now, I don't know if you can immediately spot the problem here, but I didn't, so I wasted a good part of an afternoon chasing a bug I didn't even know existed. The resulting YAML (puppet wants YAML) was this one:

--- YAML 
--- 
classes: 
    - geodns::production::backend 
environment: production 
name: z01-06-02 
parameters: 
    puppet_server: z01-06-02

The problem with this, is that it looks innocent and valid, and in fact is valid, but it's two YAML documents, not one. So puppet will parse the --- YAML line since that is one single complete YAML document, and ignore the rest.

And why is that happening in the first place? Because of the YAML->Dump() call I wrote, instead of the correct YAML::Dump()… Eh :) So the correct code is:

print YAML::Dump($puppet_classifier_info);

Never use YAML->Something()

Upgrade of puppet from 0.24.5 to 2.6.2: “Got nil value for content”

Something I should have probably done a while ago. Upgrading our puppet clients and master from 0.24.5 (current Debian stable) to 2.6.2 (current backports). I've been advised to go all the way to 2.6.4, that contains an important security fix. I'll probably do that soon.

So, first error I did was to upgrade one client before the puppetmaster. In my mental model, it's always ok to upgrade the server first (puppetmaster), but I remember I was kind of surprised when I learned that puppet needs upgrading the clients first, so that is something I seemed to remember well: first upgrade clients, then the puppetmaster.

So I did, and that didn't work at all. The client couldn't retrieve the catalog. I discovered I had to update the puppetmaster first. Another very common thing with puppet: if something doesn't work, wipe out the SSL certificates. Almost every puppet user, maybe unexperienced like me, will tell you to remove the entire /var/lib/puppet/ssl directory, and restart your client. Good. So I did.

After fiddling for a while with client and server-side SSL certificates, removing, --waitforcert and puppetca invocations, I was able to finally connect the client with the puppetmaster and have them talk to each other correctly, downloading the manifests and applying the changes. However, one problem remained…

Digression: custom facter plugins

I wrote a specific facter plugin that provides a datacenter fact. Just as an example, it works like this:

#
# Provide an additional 'datacenter' fact
# to use in generic modules to provide datacenter
# specific settings, such as resolv.conf
#
# Cosimo, 03/Aug/2010
#
# $Id: datacenter.rb 15240 2011-01-03 14:27:44Z cosimo $

Facter.add("datacenter") do
    setcode do

        datacenter = "unknown"

        # Get current ip address from Facter's own database
        ipaddr = Facter.value(:ipaddress)

        if ipaddr.match("^12.34.56.")
            datacenter = "dc1"
        elsif ipaddr.match("^34.56.78.")
            datacenter = "dc2"
        elsif ipaddr.match("^56.78.12.")
            datacenter = "dc3"
        # etc ...
        # etc ...
        end

        datacenter
    end
end

This allows for very cool things, like specific DNS resolver settings by datacenter:

# Data-center based settings
case $datacenter {
    "dc1" :  { include opera::datacenters::dc1 }
    "dc2" :  { include opera::datacenters::dc2 }
    "dc3" :  { include opera::datacenters::dc3 }
    # ...
    default: { include opera::datacenters::dc1 }
}

where each opera::datacenter::dc class contains all the datacenter-dependent settings. In my case, just the resolver for now.

class opera::datacenters::dc1 {
    resolver::dns { "dc1-ns":
        domain => "opera.com",
        nameservers => [ "1.2.3.4", "5.6.7.8" ],
    }
}

resolver::dns is in turn a small define in a resolver module I wrote to generate resolv.conf contents from a small template. It's so small I can copy/paste it here:

# "resolver/manifests/init.pp"
define resolver::dns ($nameservers, $search="", $domain="") {
    file { "/etc/resolv.conf":
        ensure => "present",
        owner  => "root",
        group  => "root",
        mode   => 644,
        content => template("resolver/resolv.conf.erb"),
    }
}

# "resolver/templates/resolv.conf.erb"
# File managed by puppet <%= puppetversion %> on <%= fqdn %>
# Data center: <%= name %>
<% if domain != "" %>domain <%= domain %>
<% end %>
<% if search != "" %>search <%= search %>
<% end %>
<% nameservers.each do |ns| %>nameserver <%= ns %>
<% end %>

One problem remained…

Let's get back to the remaining problem. Every client run was spitting this error string:

info: Retrieving plugin
err: /File[/var/lib/puppet/lib]: Could not evaluate: Got nil value for content

After a lot of searching and reading through similar (but not quite the same) messages, I found this thread where one guy was having a very similar problem, but with a different filename, /root/.gitconfig that he obviously specified in his manifests.

My problem happened with /var/lib/puppet/lib, which is never specified in any of my manifests. But the symlink bit got me thinking. At some point, to have the specific facter plugin work, and having read about differences between older and 2.6.x versions of puppet, I had put it into my module's lib/facter folder, but creating also a symlink called plugins (required by the new puppet). Doing so, I thought, would probably prevent problems, as the new puppet could read the file from the new folder, while the older version could read it from "lib".

Well, turns out that puppet will not be able to read custom facter plugins from a plugins folder that is a symlink. Removing the symlink and making {module}/plugins/facter a real directory solved that problem too.

I really hope to save someone else the time I spent on this. Cheers :)