Tag Archives: apache

Kicking Jenkins with monit

We've been using Jenkins to build and test all our projects for a good part of this year now. I think Jenkins is one of the very few Java projects I've seen and worked with that actually works and it's a real pleasure to use. Except every now and then it seems to crash without reason..

I haven't had time to dig into this problem yet. I've only seen the frontend Apache process logging errors because it cannot connect to the Tomcat backend on port 8080. My theory so far is that Jenkins tries to auto-update and crashes, or maybe there's a runaway test run that brings everything down…

Time is really limited these days, and I have heard good things about monit, I decided to try it to see if we could have Jenkins kicked when it dies for some reason. In this way we can avoid cases where the test suites haven't been running for a day or two and nobody noticed… :-|

So, long story short, here's the quick and dirty monit recipe to kick Apache + Jenkins (this is on Debian Squeeze):

check process jenkins with pidfile /var/run/jenkins/jenkins.pid
  start program = "/etc/init.d/jenkins start" with timeout 90 seconds
  stop program  = "/etc/init.d/jenkins stop"
  if failed host my.host.name port 8080 protocol http
     and request "/"
     then restart

check process apache with pidfile /var/run/apache2.pid
  start program = "/etc/init.d/apache2 start" with timeout 60 seconds
  stop program  = "/etc/init.d/apache2 stop"
  if failed host my.host.name port 80 protocol http
     and request "/"
     then restart

And, just for kicks, a complete Monit module for puppet up on Github. Have fun!

Ubuntu 10.10, modperl and Apache segfaulting fixed

Last month, before moving to Melbourne, where I am now, to work in the Opera Australia office for a few months, I had to setup a laptop for all the development work I normally do. So I chose Ubuntu 10.10 amd64. I have to say I'm quite happy with it. Everything works out of the box for me, including a Quickcam 9000 USB camera I used to shoot this poor time-lapse video from my new office window. Woot!

Anyway, the development environment for one particular project consists of Apache and mod_perl. So I setup the usual list of dependencies, but when I tried to start Apache to run the test suite, it would always stop right away with a segmentation fault.

Didn't really dig into the problem. Just straced the apache process, and that's what I got:

[apache starts up, reads a bunch of Perl modules, and opens the access  
log...]
...
brk(0x7f342adbd000)                     = 0x7f342adbd000
...
...
brk(0x7f342adde000)                     = 0x7f342adde000
brk(0x7f342adff000)                     = 0x7f342adff000
brk(0x7f342ae20000)                     = 0x7f342ae20000
brk(0x7f342ae41000)                     = 0x7f342ae41000
stat("/usr/lib/perl5/auto/DBI/DESTROY.al", 0x7f341e8459b0) = -1 ENOENT (No  
such file or directory)
stat("/home/cosimo/src/auth-svn/lib/auto/DBI/DESTROY.al", 0x7fffc4e2d520)  
= -1 ENOENT (No such file or directory)
stat("/home/cosimo/src/myopera-trunk/lib/auto/DBI/DESTROY.al",  
0x7fffc4e2d520) = -1 ENOENT (No such file or directory)
stat("/etc/perl/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No such  
file or directory)
stat("/usr/local/lib/perl/5.10.1/auto/DBI/DESTROY.al", 0x7fffc4e2d520) =  
-1 ENOENT (No such file or directory)
stat("/usr/local/share/perl/5.10.1/auto/DBI/DESTROY.al", 0x7fffc4e2d520) =  
-1 ENOENT (No such file or directory)
stat("/usr/lib/perl5/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No  
such file or directory)
stat("/usr/share/perl5/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT  
(No such file or directory)
stat("/usr/lib/perl/5.10/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT  
(No such file or directory)
stat("/usr/share/perl/5.10/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1  
ENOENT (No such file or directory)
stat("/usr/local/lib/site_perl/auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1  
ENOENT (No such file or directory)
stat("./auto/DBI/DESTROY.al", 0x7fffc4e2d520) = -1 ENOENT (No such file or  
directory)
stat("/var/tmp/test_cosimo_22931/auto/DBI/DESTROY.al", 0x7fffc4e2d520) =  
-1 ENOENT (No such file or directory)
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

I thought it would be wiser to ask for advice on the DBI and mod_perl mailing lists. Tim Bunce suggested to try and get a stack trace of Apache. Why didn't I think of that in the first place? A few days later, I got my stack trace:

# gdb -c ./core /usr/sbin/apache2 
... 
Reading symbols from ... 
... 
Core was generated by `/usr/sbin/apache2 -d /var/tmp/test_cosimo_9727 -k 
start -C User cosimo -C Group ...'. 

Program terminated with signal 11, Segmentation fault. 
#0 0x00007fdaedfed858 in XS_Class__XSAccessor_END () 
from /usr/lib/perl5/auto/Class/XSAccessor/XSAccessor.so 
(gdb) backtrace 
#0 0x00007fdaedfed858 in XS_Class__XSAccessor_END () 
from /usr/lib/perl5/auto/Class/XSAccessor/XSAccessor.so 
#1 0x00007fdaf83cf845 in Perl_pp_entersub () from /usr/lib/libperl.so.5.10 
#2 0x00007fdaf83752c6 in Perl_call_sv () from /usr/lib/libperl.so.5.10 
#3 0x00007fdaf86ad40b in modperl_perl_call_list () 
from /usr/lib/apache2/modules/mod_perl.so 
#4 0x00007fdaf86b5786 in modperl_perl_destruct () 
from /usr/lib/apache2/modules/mod_perl.so 
#5 0x00007fdaf86a6256 in modperl_interp_destroy () 
from /usr/lib/apache2/modules/mod_perl.so 
#6 0x00007fdaf86a6715 in modperl_tipool_destroy () 
from /usr/lib/apache2/modules/mod_perl.so 
#7 0x00007fdaf86a62b2 in modperl_interp_pool_destroy () 
from /usr/lib/apache2/modules/mod_perl.so 
#8 0x00007fdaf98fd4e3 in ?? () from /usr/lib/libapr-1.so.0 
#9 0x00007fdaf98fc3b1 in apr_pool_destroy () from /usr/lib/libapr-1.so.0 
#10 0x00007fdaf98fc27f in apr_pool_clear () from /usr/lib/libapr-1.so.0 
#11 0x00007fdafa1b960d in main (argc=11, argv=0x7fff93b50ef8) 
at /build/buildd/apache2-2.2.16/server/main.c:692

Even if you don't know anything about stack traces, this output gently points to Class::XSAccessor. Perrin Harkins on the mod_perl list suggested to update Class::XSAccessor to the latest CPAN version, since its changelog mentioned some segmentation faults fixed in 0.10.

And that did it. No more segfaults on Ubuntu 10.10. Solution: upgrade Class::XSAccessor to 0.10+. Thanks to Class::XSAccessor maintainer(s)!

Self-contained instant functional test suites

I first learned about functional test suites when I started working on the My Opera code. Two years and some time later, I found myself slightly hating the My Opera functional test suite.

The main reasons:

  • It's too slow. Currently, it takes anywhere between 20 and 30 minutes to complete. Sure, it's thousands of tests, divided into hundreds of test scripts, grouped by functional area, like login, blogs, albums, etc…

    Even considered all of this, I think we should aim to have a single functional test run complete in 5-10 minutes.

    Most of the time is being wasted in the communication with the test server and creation and destruction of the database.

  • It's unreliable, sometimes cumbersome to manage. In our setup, CruiseControl fires a functional test suite run after every commit into the source code repository.

    That initiates an ssh connection to the test server, where a shell script takes care of restarting the running apache, dropping and creating the test database, updating required packages, etc…

    This implies that you need an apache instance running on a specific port, a mysql instance running on another port, etc… In the long run, this proved to be an approach that doesn't scale very well. It's too error prone, and not very reliable. Maybe your test run will fail because mysql has mysteriously crashed, or some other random bad thing.

A functional test suite that sucks less

Given the motivation, I came up with this idea of a functional test suite that is:

  1. instant: check out the source code from the repository, and you're ready to go.
  2. self-contained: it shouldn't have any external servers that need to be managed or even running.
  3. reusable: it doesn't have to be a functional test suite. The same concept can be reused for unit test suites, or anything really.

A few months later the first idea, and a couple of weeks of work on it, and we were ready with the first prototype. Here's how we use it:

  • Check out the source code from the repository in ~/src/myproject
  • cd ~/src/myproject
  • ./bin/run-functional-test-suite
  • Private instances of Apache, MySQL, and the main application are created and started
  • A custom WWW::Mechanizer-based client runs the functional test cases against the Apache instance
  • A TAP stream from the test run is produced and collected
  • All custom instances are destroyed
  • ???
  • Profit!

The temporary test run directory is left there untouched, so you can inspect it in case of problems. This is priceless, because that folder contains all the configuration and log files that the test run generated.

The main ideas

  • The entire functional test suite should run within a unique temporary directory created on the fly. You can run many instances as you want, with your username or a different one. They won't conflict with each other.
  • Use as much as possible the same configuration files as the other environments. If you already have development, staging and production, then functional-test is just another one of your environments. If possible, avoid creating special config files just for the functional test suite.
  • The Apache, MySQL, and Application configuration files are simple templates where you need to fill in the apache hostname, apache port, mysql port, username, password, etc…
  • Apache is started up from the temporary directory using the full command invocation, as:

    /usr/sbin/apache2 -k start -f /tmp/{project}_{user}_testsuite_{pid}/conf/apache.conf
    

    and all paths in config files are absolute, to avoid any relative references that would lead to files not found.
    The configuration template files have to take this into account, and put a $prefix style variable
    everywhere, like (Apache config, for example):

    Timeout 10
    Keepalive On
    HostnameLookups Off
    ...
    Listen [% suite.apache.port %]
    ...
    

  • The MySQL instances are created and destroyed from scratch for every test suite run, using the amazing MySQL::Sandbox. My colleague Terje had to patch the sandbox creation script to fix a problem with 64-bit environments. We filed a bug on the MySQL::Sandbox RT queue about this. This is the only small problem found in a really excellent Perl tool. If you haven't looked at it, do it now.

All in all, I'm really satisfied of the result. We applied it to the Auth project for now, which is smaller than My Opera, and we fine-tuned the wrapping shell script to compensate for the underlyiung distribution automatically. We've been able to run this functional test suite on Ubuntu from version 8.04 to 10.04, and on Debian Lenny, with no modifications. The required MySQL version is detected according to your system, and MySQL::Sandbox is instructed accordingly.

It also results in a faster execution of the entire functional test suite.

What can we improve?

There's lots of things to improve:

  • Speed. Instead of creating and destroying the database for every test group, it would be nice to use transactions, and rollback everything at the end of the test group (t/{group-name}/*.t). I used to do this many years ago with Postgres, and it was perfectly safe. I'm not sure we will ever make it with MySQL, since for example, ALTER TABLE (and other SQL statements) completely ignores transactions.
  • Reliability. The private instances of Apache, MySQL (and recently we added memcached too), are started up from ports that are calculated from the originating bash script PID. So, if you run the test suite from the bash script running as PID 8029, then MySQL is started up on port 8030, Apache on 8031, memcached on 8032. This has worked very well for now, and we made sure we don't use ports < 1024, but could lead to mysterious failures if some local services are using ports like 5432, or others. The idea is that we could test if a port is available before using it. However, this error is already detected at startup time, so it shouldn't be a huge problem.

Feedback welcome!

Disassembling a real world Plack PSGI application

After I started playing with Plack, I tried to evaluate whether to continue using it for our mission-critical production stuff or give up, going back to the same techniques we already use (successfully).

I think it's time to develop and deploy a Plack based application. In my grand plan, :-), I'd like to deploy nginx with PSGI support, or even more ambitiously, nginx or apache with Starman as "backend" http server. We'll see…

In the meantime, I'd like to write here a couple of niceties about Plack and Starman, showing some real code I wrote when I started.

A real world PSGI application

Here's a sample PSGI application currently under development:

#!/usr/bin/env perl
#
# Sample PSGI application
#

use strict;
#se warnings;
use constant ENVIRONMENT         => 'development';
use constant APACHE_DEPLOYMENT   => (ENVIRONMENT eq 'production');
use constant ENABLE_ACCESS_LOG   => (ENVIRONMENT eq 'development');
use constant ENABLE_DEBUG_PANELS => (ENVIRONMENT eq 'development');

use Plack::Builder;
use AuthOpera;
use AuthOpera::Account;

my $app = AuthOpera::Account->new(); 

builder {

    enable_if { not APACHE_DEPLOYMENT }
        'Plack::Middleware::Static', 
        path => qr{^/(bitmaps/|images/|js/|css/|downtime/|favicon.ico$|ping.html$)},
        root => '..',
        ;

    mount "/account" => builder {

        enable_if { ENABLE_DEBUG_PANELS } 'StackTrace';
        enable_if { ENABLE_DEBUG_PANELS } 'Debug';   # panels => [ qw(DBITrace Memory) ];
        enable_if { ENABLE_DEBUG_PANELS } 'Lint';
        enable_if { ENABLE_DEBUG_PANELS } 'Runtime';
        enable_if { ENABLE_ACCESS_LOG   } 'AccessLog';

        $app;

    }

}

Of course, the main application code is not here, but in the AuthOpera::Account class. That's not really relevant to what we're discussing here. Let's just say that any class, to be a valid and complete PSGI application, has to:

  • subclass from Plack::Component
  • have a call() method
  • the call() method must return a valid PSGI response. Example:
    package MyPSGIApp;
    
    use strict;
    use Data:: Dumper ();
    use parent 'Plack::Component';
    
    sub call {
        # $env is the full PSGI environment
        my ($self, $env) = @_;
    
        return [
    
            # HTTP Status code
            200,
    
            # HTTP headers as arrayref
            [ 'Content-type' => 'text/html' ],
    
            # Response body as array ref
            [ '<!DOCTYPE html>',
              '<body><h1>Hello world</h1><pre>',
              Data:: Dumper:: Dumper($env),
              '</pre></body></html>',
            ],
        ];
    }
    
    1;
    

That's it, this is a full PSGI application that does dump all its PSGI environment.

Of course in a real example, you probably want a template engine to return the page content, etc… That's what we are building for our applications. Actually just assembling the components we already have developed during these years, so we have template classes, config classes, localization, database access, etc…

So we're basically just gluing these ready made components inside the PSGI application, and then using them. I don't think this is particularly original, but it allows us to quickly "port" our code to PSGI and thus run anywhere we want to.

app.psgi in detail

Now, let's see the PSGI app in more detail.

use constant ENVIRONMENT         => 'development';
use constant APACHE_DEPLOYMENT   => (ENVIRONMENT eq 'production');
use constant ENABLE_ACCESS_LOG   => (ENVIRONMENT eq 'development');
use constant ENABLE_DEBUG_PANELS => (ENVIRONMENT eq 'development');

These constants are used to turn on and off certain features mentioned later in the builder {} block. I just found out the other day that these constants are near to useless. That is because plackup and starman already provide a -E environment switch. If you start your application with:

starman -E development myapp.psgi     # same with plackup, the default server

then Plack will by default enable the debugging panels and the Apache-style access log. I found out about this after having written that file. This means that the following enable_ifs are unnecessary:

mount "/myroot" => builder {
    enable_if { ENABLE_DEBUG_PANELS } 'StackTrace';
    enable_if { ENABLE_DEBUG_PANELS } 'Debug';   # panels => [ qw(DBITrace Memory) ];
    enable_if { ENABLE_DEBUG_PANELS } 'Lint';
    enable_if { ENABLE_DEBUG_PANELS } 'Runtime';
    enable_if { ENABLE_ACCESS_LOG   } 'AccessLog';
    $app;
}

I think Plack enables by default at least StackTrace, Debug, and AccessLog. In my case, however, I'm also enabling RunTime and Lint. But more importantly, I need to differentiate between Apache deployment and Starman deployment. That affects the way static files are served.

When deploying under Apache, I don't need the following:

enable_if { not APACHE_DEPLOYMENT }
    'Plack::Middleware::Static',
    path => qr{^/(bitmaps/|images/|js/|css/|downtime/|favicon.ico$|ping.html$)},
    root => '..';

because my PSGI application is enabled in an Apache <Location> block, as in:

<Location /myroot/>
    SetHandler perl-script
    PerlResponseHandler Plack::Handler::Apache2
    PerlSetVar psgi_app /my/path/to/app.psgi
</Location>

So Apache already takes care of serving the static files for me. However, when running completely under Starman, I need to tell it which folders or paths need to be served as static files, and where they are located. This is the purpose of the Static middleware:

enable_if { not APACHE_DEPLOYMENT } 'Plack::Middleware::Static',
    path => qr{^/(images/|js/|css/|favicon.ico$)},
    root => '/var/www/something';

If you're always deploying through plackup or starman, then, again, you don't need any enable_if, just enable. Maybe it's also a good idea to put everything under /static. For me that wasn't possible, since I already had existing content:

enable 'Plack::Middleware::Static',
    path => qr{^/static/},
    root => '/var/www/something';

Plack::Builder

About the Plack::Builder bit, and the related builder function. That is a function that helps you specify what you want Plack to run and how. Example:

builder {
    enable 'StackTrace';
    enable 'Debug';
    enable 'AccessLog';
    $app
}

where StackTrace, Debug, and AccessLog are all middleware classes, so causes Plack to wrap your final $app application first with the AccessLog middleware, then Debug and then StackTrace. I didn't check the code, but I believe this creates 3 different PSGI applications that are meant to fiddle with the response that your own application generates.

PSGI makes this possible, and it's just great. More middleware means easier and faster development. And ultimately, very good middleware makes for great reuse too.

The mount wrapper

I used mount in my example very basicly, but you can use mount to assemble compounds of applications in a very simple way. The same thing you do, for example, with Django and urls.py, except that, if you have seen a non-trivial urls.py, it looks like spaghetti after a while. Compare with this:

my $app1 = MyApp->new();
my $app2 = MyApp2->new();
#...

builder {

    enable 'Plack::Middleware::Static', 
        path => qr{^/static/},
        root => '/var/www/something';

    mount "/path1" => builder {
        enable 'StackTrace';
        $app;
    }

    mount "/path2" => $app2;

    mount "/path3" => builder {
        enable 'SomeMiddleware';
        $app3;
    }

}

Of course, then you have to add some dispatcher logic to your applications, but in the Plack world, we don't lack good dispatchers.

Plack rocks.