
Upgrade of puppet from 0.24.5 to 2.6.2: “Got nil value for content”

Something I probably should have done a while ago: upgrading our puppet clients and master from 0.24.5 (current Debian stable) to 2.6.2 (current backports). I've been advised to go all the way to 2.6.4, which contains an important security fix. I'll probably do that soon.

So, the first mistake I made was upgrading one client before the puppetmaster. In my mental model it's always OK to upgrade the server (the puppetmaster) first, but I remembered being surprised, back when I learned it, that puppet wants the clients upgraded first. That's the bit I thought I remembered well: first upgrade the clients, then the puppetmaster.

So I did, and that didn't work at all: the client couldn't retrieve the catalog. It turned out I had to upgrade the puppetmaster first after all. Another very common thing with puppet: if something doesn't work, wipe out the SSL certificates. Almost every puppet user, perhaps inexperienced like me, will tell you to remove the entire /var/lib/puppet/ssl directory and restart your client. Good. So I did.

After fiddling for a while with client- and server-side SSL certificates, removing them, and various --waitforcert and puppetca invocations, I finally got the client and the puppetmaster talking to each other correctly, downloading the manifests and applying the changes. However, one problem remained…

Digression: custom facter plugins

I wrote a custom facter plugin that provides a datacenter fact. Just as an example, it works like this:

#
# Provide an additional 'datacenter' fact
# to use in generic modules to provide datacenter
# specific settings, such as resolv.conf
#
# Cosimo, 03/Aug/2010
#
# $Id: datacenter.rb 15240 2011-01-03 14:27:44Z cosimo $

Facter.add("datacenter") do
    setcode do

        datacenter = "unknown"

        # Get current ip address from Facter's own database
        ipaddr = Facter.value(:ipaddress)

        # Note: dots are escaped so they only match literal dots
        if ipaddr.match(/^12\.34\.56\./)
            datacenter = "dc1"
        elsif ipaddr.match(/^34\.56\.78\./)
            datacenter = "dc2"
        elsif ipaddr.match(/^56\.78\.12\./)
            datacenter = "dc3"
        # etc ...
        end

        datacenter
    end
end
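
On a machine in the first datacenter, facter then reports the new fact. Hypothetical output, matching the example IP ranges above:

cosimo@cd01:~$ facter datacenter
dc1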

This allows for very cool things, like datacenter-specific DNS resolver settings:

# Data-center based settings
case $datacenter {
    "dc1" :  { include opera::datacenters::dc1 }
    "dc2" :  { include opera::datacenters::dc2 }
    "dc3" :  { include opera::datacenters::dc3 }
    # ...
    default: { include opera::datacenters::dc1 }
}

where each opera::datacenters::dcN class contains all the datacenter-dependent settings. In my case, just the resolver for now.

class opera::datacenters::dc1 {
    resolver::dns { "dc1-ns":
        domain => "opera.com",
        nameservers => [ "1.2.3.4", "5.6.7.8" ],
    }
}

resolver::dns is in turn a small define in a resolver module I wrote to generate resolv.conf contents from a small template. It's so small I can copy/paste it here:

# "resolver/manifests/init.pp"
define resolver::dns ($nameservers, $search="", $domain="") {
    file { "/etc/resolv.conf":
        ensure => "present",
        owner  => "root",
        group  => "root",
        mode   => 644,
        content => template("resolver/resolv.conf.erb"),
    }
}

# "resolver/templates/resolv.conf.erb"
# File managed by puppet <%= puppetversion %> on <%= fqdn %>
# Data center: <%= name %>
<% if domain != "" %>domain <%= domain %>
<% end %>
<% if search != "" %>search <%= search %>
<% end %>
<% nameservers.each do |ns| %>nameserver <%= ns %>
<% end %>
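
For a host in dc1, given the class shown above, the rendered /etc/resolv.conf would look something like this (hostname and puppet version are just examples):

# File managed by puppet 2.6.2 on cd01.localdomain.lan
# Data center: dc1-ns
domain opera.com
nameserver 1.2.3.4
nameserver 5.6.7.8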

One problem remained…

Let's get back to the remaining problem. Every client run was spitting out this error:

info: Retrieving plugin
err: /File[/var/lib/puppet/lib]: Could not evaluate: Got nil value for content

After a lot of searching and reading through similar (but not quite identical) messages, I found this thread, where a guy was having a very similar problem, but with a different filename, /root/.gitconfig, which he had obviously specified in his manifests.

My problem happened with /var/lib/puppet/lib, which is never specified in any of my manifests. But the symlink detail in that thread got me thinking. At some point, to get the custom facter plugin working, and having read about the differences between the older and 2.6.x versions of puppet, I had put the plugin into my module's lib/facter folder, but I had also created a symlink to it called plugins (the layout required by the new puppet). That way, I thought, I would avoid problems: the new puppet could read the file from the new folder, while the older version could read it from "lib".

Well, it turns out that puppet cannot read custom facter plugins from a plugins folder that is a symlink. Removing the symlink and making {module}/plugins/facter a real directory solved that problem too.
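
In practice the fix amounted to something like this (the module path here is hypothetical):

cd /etc/puppet/modules/mymodule
rm plugins                     # was a symlink to lib
mkdir -p plugins/facter
cp lib/facter/datacenter.rb plugins/facter/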

I really hope to save someone else the time I spent on this. Cheers :)

Geo-IP lookup without installing Geo-IP libraries (everywhere)

We've been using GeoDNS to distribute client requests to different data centers around the world, first as a highly experimental project, then more and more as months passed.

Currently we're using it as a simple global load balancer for files.myopera.com, help.opera.com, and some get.opera.com stuff.

However, there is another minor feature we built into it, like a hacker's backdoor :) Since it is a full DNS server, and it relies on having GeoIP libraries installed and always up to date, we thought it would be nice and cool to have a quick way to perform geo-ip lookups from the command line.

It works in a similar way to DNS blacklists. Suppose you want to look up the IP address 80.90.100.110: you reverse the IP and look up a special lookup.geo.opera.com zone:

cosimo@cd01:~$ host -t TXT 110.100.90.80.lookup.geo.opera.com
110.100.90.80.lookup.geo.opera.com descriptive text "ip:80.90.100.110, country:de, continent:europe"

This uses the GeoDNS backends to resolve the country and continent of the given IP address, and returns the information in a very basic string format that a simple shell or Perl script can then process for you. In fact, I made a ~/bin/geolookup Perl script that I can use like this:

cosimo@cd01:~$ geolookup 80.90.100.110 90.100.110.120 100.110.120.130
80.90.100.110 => de, europe
90.100.110.120 => fr, europe
100.110.120.130 => unknown, unknown
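
For reference, here's a minimal sketch of what such a wrapper could look like. This is not my actual script: the zone name comes from the example above, and the TXT parsing is an assumption based on the record format shown there.

#!/usr/bin/env perl
# geolookup (sketch): resolve country/continent of IPs through
# the lookup.geo.opera.com zone, as described above.
use strict;
use warnings;
use Net::DNS;

my $resolver = Net::DNS::Resolver->new;

for my $ip (@ARGV) {
    # Reverse the octets, as for DNS-based blacklists
    my $reversed = join '.', reverse split /\./, $ip;
    my ($country, $continent) = ('unknown', 'unknown');
    if (my $reply = $resolver->query("$reversed.lookup.geo.opera.com", 'TXT')) {
        for my $rr ($reply->answer) {
            next unless $rr->type eq 'TXT';
            my $txt = $rr->txtdata;
            $country   = $1 if $txt =~ /country:(\w+)/;
            $continent = $1 if $txt =~ /continent:(\w+)/;
        }
    }
    print "$ip => $country, $continent\n";
}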

Nothing special, but this way, no matter what machine I'm on, I can always quickly look up IPs if I need to, without having to download the Country or City GeoIP databases and keep them up to date. On the geodns backends, this is of course done routinely with a set of simple cronjobs.

Facter ported to Perl 6

A few days ago I wrote about my fun experiment of porting facter to Rakudo Perl 6. I said it was "almost completely functional".

Well, it seems I was a bit optimistic :) It took me a few more nights of hacking, but now it really is almost completely functional :) It basically runs, and I have also ported a few facts from Facter's original Ruby code, like the kernel fact, the physicalprocessorcount fact and other simple ones.

What's missing before I can declare the experiment successful is the implementation of confines. A confine, in facter speak, is a restriction that applies to a fact. For example, the physicalprocessorcount fact reads some files from /proc, which is only available on Linux. So, in this case, the confine rule for physicalprocessorcount is that the kernel fact must have "Linux" as its value. In code that becomes:

Facter.add("physicalprocessorcount", sub ($f) {
    $f.confine("kernel" => "Linux");
    $f.setcode(block => sub {
        Facter::Util::Resolution.exec('grep "physical id" /proc/cpuinfo|cut -d: -f 2|sort -u|wc -l');
    });
});

which is pretty similar to the Ruby counterpart:

Facter.add("physicalprocessorcount") do
    confine :kernel => :linux

    setcode do
        ppcount = Facter::Util::Resolution.exec('grep "physical id" /proc/cpuinfo|cut -d: -f 2|sort -u|wc -l')
    end
end

The Ruby version is still more elegant, but all in all I'm very happy with the outcome so far. It could probably be improved a lot too. Perl 6 is awesome. Get the code from http://github.com/cosimo/perl6-facter/ and feel free to ping me or comment if you want to know more.

IPv6 and Perl. What’s the status?

This is more a request for help than the usual babbling about something. The IPv4 address space appears to be almost entirely allocated. IPv6 is there, ready for us to use (more or less).

I'm trying to set aside some time to prepare and test our systems for client and server IPv6 addresses. What is your experience, in particular with Perl software? I know almost nothing about it.

I tried to download and test Net::IPv6Address, but that didn't work: tests fail, and the module was last updated in 2008… Can you enlighten me on the state of the art regarding Perl (but not only, ofc) and IPv6? KTHXBYE.

In the meanwhile, I'll read a bit more on the whole IPv6 matter. First task: write an anonymizer for an IPv6 address.
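
As a first sketch, here's one possible approach in Perl. How many bits to keep is a policy decision, and the /48 prefix below is purely my assumption:

#!/usr/bin/env perl
# Sketch: anonymize an IPv6 address by keeping the first 48 bits
# (a typical site prefix) and zeroing out the rest.
use strict;
use warnings;
use Socket qw(inet_pton inet_ntop AF_INET6);

sub anonymize_ipv6 {
    my ($addr) = @_;
    my $packed = inet_pton(AF_INET6, $addr)
        or die "Not a valid IPv6 address: $addr\n";
    # Zero out everything after the first 6 bytes (48 bits)
    substr($packed, 6) = "\0" x 10;
    return inet_ntop(AF_INET6, $packed);
}

print anonymize_ipv6('2001:db8:1234:5678:abcd::1'), "\n";   # 2001:db8:1234::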

Porting Facter from Ruby to Perl6

I've been playing with Puppet for a while now. One of the most interesting (and simplest!) components of Puppet is Facter.

Facter is a small tool that reports "facts" about your computer. When you run facter, its output looks like the following:

architecture => x86_64
facterversion => 1.5.6
fqdn => cd01.localdomain.lan
hardwaremodel => x86_64
hostname => cd01
id => cosimo
interfaces => eth0,pan0
ipaddress => 10.0.0.1
...
uptime_hours => 422
uptime_seconds => 1519256
virtual => physical

It is interesting also because it's extensible with your own custom plugins. A custom plugin is just a Ruby file, usually containing a call to Facter.add:

Facter.add(:kernel) do
    setcode do
        require 'rbconfig'
        case Config::CONFIG['host_os']
        when /mswin|win32|dos|cygwin|mingw/i
            'windows'
        else
            Facter::Util::Resolution.exec("uname -s")
        end
    end
end

As a little experiment, to become more familiar with Ruby and at the same time enjoy writing some Perl 6, I decided to study the facter project and then port it to Perl 6.

That took me a few hours over a couple of weekends, and it's almost completely functional. It is a straight port of the Ruby code, so it doesn't really use the magic powers of Perl 6 yet. The idea is to start a different branch, now that I know facter well under the hood, and rewrite it from the ground up in Perl 6.

I found that Ruby code maps very closely to Perl 6, apart from the yield instruction and a different model for static/instance variables. For yield, the closest Perl 6 construct is probably gather/take, but I'm not really sure it's the appropriate one to use here. yield is used in the fact value resolution algorithm, and that is currently the only thing that doesn't work properly in my port; everything else is in place. Regarding static/instance variables, Ruby uses @attribute for instance variables and @@attribute for static variables. In Perl 6, public instance variables are written $.attribute, while static variables are package globals, so you can use our $attribute.

Of course, there's a lot more! If you're interested, take a look at the code on Github, the URL is http://github.com/cosimo/perl6-facter.

Another round of fixes for the varnish Accept-Language VCL extension

This very specific VCL extension to parse and normalize Accept-Language headers is becoming more robust as time goes by. Here's the latest round of fixes:

  • the original req.http.Accept-Language header could be overwritten when calling the vcl_rewrite_accept_language() function. This is now fixed by copying the original header string into a static buffer and running the processing on the copy.
  • improved the style of the C code in a few places. Nothing miraculous, really. It feels improved to me at least :)
  • fixed the use of a wrong #define (string max length instead of language-list max length)
  • used sizeof() instead of repeating the same constants in strncpy() calls, etc…
  • added a small intro to the generated file too. This way, if you find the code on some server, you can immediately understand what it's supposed to do and where it came from :)

Enough blah blah, here's the new code. If you weren't already running this piece of VCL, I won't tell you it's experimental stuff, because at this point it's no longer experimental. And if you were already running it, then by all means upgrade. It is definitely better :)

http://github.com/cosimo/varnish-accept-language

Enjoy!

Adding the irc NOTICE capability to Bot::BasicBot

Bot::BasicBot is a Perl module that provides a really easy, fast and convenient way to build plugin-based IRC bots. I'm playing around with an IRC bot that should assist in continuously deploying projects.

This bot has two main functionalities:

  • keep track of continuous integration builds
  • initiate and keep track of deployments

Right now the bot reads a main configuration file with data about projects, repositories, continuous integration, etc… and answers commands. This is an example:


21:58 <@cosimo> projects-list
21:58 < deployer> auth, geodns, libopera, link, myopera, sso
21:58 <@cosimo> build-status geodns
21:58 < deployer> 97ad24e success cosimo https://git.server/functests/builds/geodns/97ad24e
21:58 <@cosimo> latest-revision sso
21:58 < deployer> 5207cfe, https://git.server/?p=sso.git;a=commit;h=fe977d32e9580551dffe8139396106ba25207cfe
21:59 <@cosimo> build-status auth
21:59 < deployer> 24135 success cosimo https://test.server/functests/builds/auth-unit/24135
21:59 < deployer> 24135.2 success (manual) https://test.server/functests/builds/auth-functional/24135.2

Another functionality of the bot is to detect new builds and automatically send updates to a given channel, stating the project, the new VCS revision, the committer, and a link to the continuous integration test run. Example:


17:22 -deployer:#chan- sso, fe977d3 success cosimo https://test.server/functests/builds/opera-sso/fe977d3

In the future, I also want to be able to tell the bot to initiate deployments. Anyway, the problem was that Bot::BasicBot apparently lacked support for sending IRC notices, which caused all the bot messages to interrupt the flow of IRC conversations. Bot::BasicBot's source code is also on github, so I just forked it and added support for IRC notices. I just noticed that the author has already pulled in the new changes. o/
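
To give an idea of how little code such a bot needs, here's a minimal sketch along those lines. The command and project data are made up, and notice() is the method this post is about:

package DeployBot;
use strict;
use warnings;
use base 'Bot::BasicBot';

# Made-up project data, normally read from the main configuration file
my %projects = (
    geodns => { revision => '97ad24e', status => 'success' },
);

# Called by Bot::BasicBot for every channel message the bot sees
sub said {
    my ($self, $msg) = @_;
    return join(', ', sort keys %projects)
        if $msg->{body} eq 'projects-list';
    return;    # stay silent otherwise
}

# Send a build update as a NOTICE, so it doesn't interrupt conversations
sub announce_build {
    my ($self, $channel, $text) = @_;
    $self->notice(channel => $channel, body => $text);
}

package main;
DeployBot->new(
    server   => 'irc.example.com',
    channels => ['#chan'],
    nick     => 'deployer',
)->run;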

It will still take years for this to land in Debian, but still… :-)

My first article on Dev Opera!

This week the DevRel team published my first article on dev.opera.com!

For me this is really great. I remember, before joining Opera back in 2006, when I first considered sending in a CV, I was looking at Dev Opera. It was full of really high-level technical articles (back then!), and I remember thinking that I'd never be able to have my name on an article there.

Time flies: four years later, it's there! I still think Dev articles are high-level on average nowadays. Not sure about mine though :-), but if you're interested in OAuth, go read it and be sure to post feedback either here or on the article discussion forum. I have tried to avoid going into the gory details. You can find them in the RFC anyway. Have fun!

From disaster to stability: the scaling challenges of My Opera (Surge 2010)

I just read John Allspaw's blog post about his talk at Surge 2010. John's talks were among the best of the conference IMO. So I was amazed to read his post.

I was really honored to take part in this conference as a speaker. Just before my proposal was accepted, I had bought the book Web Operations. When the Surge speakers list was announced, I was thrilled to discover that many contributors to the book, including Allspaw, were also speaking at Surge, and even more honored to have the privilege of being in the same group and meeting them in person.

If you're interested in the talk I presented, while the OmniTI folks put videos and slides up on the Surge website, you can download my slides from Slideshare.

I hope the video is delayed, so I can avoid the embarrassment for a little while :-)

Surge 2010 scalability conference in Baltimore, USA – DAY 2

This is a summary of day 2 of the Surge conference that took place in Baltimore, USA, on the 30th of September and 1st of October 2010. For a quite comprehensive blog post about day 1, you can read my previous post.

Here comes the list of talks I attended during Day 2.

Bryan Cantrill – failures in commodity hardware

What happens when commodity hardware is used in an "enterprise" hardware project? Bryan guided the audience through one such industrial hardware project. There was no recorded video of this talk, as the content was potentially "sensitive". A very interesting talk, and Bryan is IMO a very good speaker.

Benjamin Black – FastIP

Benjamin presented a – for me – new way of analyzing network metrics, named "Flow". Flow-based network metrics can represent network activity in a way that is completely different from, and much more accurate than, what's usually done by operations and sysadmin departments. The downside is that it generates a lot of data. The advantage is that you can analyze, and even replay(?), any traffic that took place between any two nodes of the network. I'm sure I didn't understand that correctly, because it would be amazing.

There are products out there that offer flow-based network analysis: Cisco NetFlow, Ntop NProbe, etc… There's also an IETF working group about flow. We couldn't see any example/demo because there was a problem with the slides, IIRC.

FastIP also offers a related service. I contacted Benjamin about this after his talk. Maybe we'll be able to try something out or at least have a demonstration.

Gavin Roy – Scaling MyYearBook.com

One of the most interesting talks of this conference, IMO. MyYearbook is a Postgres shop, and among the top 25 most trafficked sites in the USA.

Gavin talked about the many things they did to scale their site as traffic grew. Here are some of the things I remember:

  • DB connection pooling was very important for them; it made a world of difference. They use PgBouncer and pgpool-II
  • DB horizontal scaling with PL/Proxy. TODO: look it up
  • DB replication with Londiste, Slony, Bucardo
  • PostgreSQL 9.0-based standbys to increase read-only capacity, and for hot standby
  • Partitioned the database by table, a feature available since Pg 8.1

They have a primary-to-secondary master failover procedure. They looked into automating it, but a human judgement call is really necessary in case something goes wrong, so they will keep it manual. This was a question I asked Gavin, since we've thought about automating our failover procedure for MySQL, but it's not so easy to decide when to trigger a failover…

For user storage, they use the Isilon IQ series, apparently a FreeBSD-based appliance with on-board NFS. For DB servers, they looked at different solutions, but they keep coming back to direct attached storage. Their main db server is a massively powerful machine with, IIRC, 512 GB of RAM and 128 cores. I have to double-check this because it seems really impressive.

John Allspaw – Go or No Go

Another great talk by John, well presented and with great content; not easy to summarize. The main topic was the "Go or No-go meeting", a 10-minute get-together of all involved parties just before releasing changes or launching a new feature live.

This meeting basically consists of Yes/No questions:

  • Have you tested enough to deploy? QA still needed?
  • Has the feature been communicated (blog/forum/…)?
  • Does everyone know: when it will go live? who will push the feature?
  • Has the feature been in production for staff (or beta users)? That can be tricky to implement if the new feature implies social interactions (beta user tagging non-beta user)
  • Is it possible to dark launch this feature? Will we?
  • Is it possible to turn on this feature on a % of users? Will we?
  • Does it involve new infrastructure? If so: is there monitoring in place? (BLOCKER)
  • Is the on/off switch in the code/config in place? Is it documented?
  • Are all the relevant people available for communication and launch?
  • Is there a place for users to provide feedback about the feature?
  • Post-launch "it's all done" time agreed?
  • Contingency checklist done and everyone has reviewed it? (BLOCKER)

The "Contingency list" should answer the question: "What could possibly go wrong? What will we do about it?", with a list of potential issues and how to solve them in case shit hits the fan.

Apart from the Go/No-go meeting, which, judging also from my past experience, would be a great way to avoid problems, there are at least a couple more really nice practices to keep in mind when developing or launching a new feature:

  • "Dark launches": a dark launch is essentially a full launch of the new feature, but done in such a way that it is invisible to users. So if you're making db queries and processing stuff, you keep doing all of that; you just throw the data away. You will be able to measure the (almost) full impact of the new feature on your application and compensate accordingly.
  • Feature "sampling" (% of users): you enable the full feature for a small, and then growing, percentage of your user base. You can gradually grow to 100% and test the effect of the changes (see the sketch right after this list).
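
Feature sampling boils down to mapping each user to a stable bucket and comparing that to the current rollout percentage. Here's a minimal Perl sketch; the names and the CRC32 hashing choice are mine, not from the talk:

use strict;
use warnings;
use String::CRC32 qw(crc32);

# Map a user to a stable bucket in 0..99: the same user always gets
# the same bucket, so the enabled set only grows as the percentage grows.
sub feature_enabled_for {
    my ($user_id, $rollout_percent) = @_;
    my $bucket = crc32($user_id) % 100;
    return $bucket < $rollout_percent;
}

# Start with 5% of users, then raise the percentage over time
for my $user (qw(alice bob carol)) {
    printf "%s: %s\n", $user, feature_enabled_for($user, 5) ? 'on' : 'off';
}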

Great stuff.

Neil Gunther – Quantifying scalability

Here I was a bit too excited, with my own talk coming up next, so unfortunately I didn't pay much attention. The talk was a full analysis of scalability treated as a mathematical function: the capacity of your system as the load increases.
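
For the record, the model Gunther is known for, and which I believe the talk was based on, is his Universal Scalability Law. It expresses the relative capacity C(N) at load N in terms of a contention coefficient σ and a coherency-delay coefficient κ:

C(N) = N / (1 + σ(N − 1) + κN(N − 1))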

Cosimo Streppone – Scaling challenges of my.opera.com

I used 5 minutes to show a live demo of the My Opera realtime monitor application we built, and afterwards I got very interesting questions, and also some nice twitter messages about it.

I also talked about how we've experimented with distributing requests across different datacenters with our little geodns tool.

All in all, for me it was a fantastic experience. Practice will make me better, so I look forward to a next time :-)

Baron Schwartz – Scaling without sharding

Baron works for Percona. I had read some of his past talks, and I think he's a really good speaker. He explained in detail the scenarios that arise when dealing with database scaling: the typical characteristics of reads and writes, and single-server vs multiple-server deployments.

Basically, what the talk suggests is that very few situations actually require sharding your database. Single-server setups can go very far by optimizing the way the db works. Quote: "Sharding should be your last resort". Sharding should be adopted only when write demand exceeds write capacity, so avoid sharding if you can: try to buffer/collate writes, defer update work, etc.

Closing day 2

Theo Schlossnagle closed the conference with a plenary keynote about a semi-serious "brief history of computing". Much fun, and goodbye until next year's Surge.

For a glimpse of what happened live at the conference, you can also check out the Twitter stream for #surgecon.

Definitely a great conference. Stay tuned for videos and slides on the official site, http://omniti.com/surge/2010.