Tag Archives: cache

Migration of VCL configuration from Varnish 2.0 to 2.1

Recently we migrated most of our services from Varnish 2.0 to 2.1.
I'd like to explain what we changed, with side-by-side code (VCL) examples,
in case anyone else still needs to migrate to 2.1 and could use some help as well :-)

req., bereq., beresp., and obj.

Usually this naming difference in VCL is not really explained: the docs just say
"x has been renamed to y" and that you should change the name. That's kind of annoying
at first, but if you try to understand why the names changed, they will stick
in your mind very easily.

In vcl_fetch(), obj. is now beresp.. Why?
Because vcl_fetch() is the stage where Varnish has
already performed a request against a backend and got a response from it. That means that
if you refer to obj. in vcl_fetch(), you're really
touching the backend response, hence beresp..

Similarly with vcl_pipe(), which is executed when the result of vcl_recv()
is to switch to pipe mode. In that case, however, Varnish hasn't made the request
to the backend yet, so if you used obj. in vcl_pipe() you really meant
to change the request that was going to be made to the backend, hence bereq..

Let's see the changes we had to make:

 sub vcl_fetch {
 
-    set obj.ttl = 88s;
-    set obj.grace = 10m;
-    set obj.http.X-My-Opera = "http://youtube.com/watch?v=br79xGSpgF4";
+    set beresp.ttl = 88s;
+    set beresp.grace = 60m;
+    set beresp.http.X-dramatic = "http://www.youtube.com/watch?v=a1Y73sPHKxw";
 
 }

And:

 sub vcl_pipe {
     # Streaming files too (see related vcl_recv() rule).
     # We need to close the request, or varnish remains in pipe
     # mode for the entire session with that client.
-    set req.http.connection = "close";
+    set bereq.http.connection = "close";
 }

Backend probes and .initial

Backend probing allows Varnish to detect whether backends are "healthy"
or "sick". The probe VCL config block lets you tweak how this works. In
particular, .threshold is the number of successful probes that are necessary
for Varnish to consider a backend healthy, and .interval is the number of seconds
between one probe and the next.

As an example, you can define that a backend should be considered working
(healthy) when it answers successfully to at least 3 probes, with an interval of
10 seconds between each probe. In Varnish 2.0.4, this means that after a restart
Varnish will wait 3 × 10 = 30 seconds before serving any requests
from that backend, because all backends are considered dead (sick) at startup.

In 2.1 this limitation is removed by introducing an .initial attribute
in the probe block. .initial is the number of probes considered successful
when the service is started, or the backend is added, and there's no information about it.
The default value is assumed to be equal to .threshold, so backends are considered
healthy as soon as they are introduced.

I think you can understand from these tiny details how well Varnish is engineered.
This just makes sense, doesn't it? :-) Here's the diff from 2.0 to 2.1:

 backend nginx {
     .host = "localhost";
     .port = "8080";
-
-    # Disabled to avoid the 15s startup
-    # 2.0.4-5 doesn't have .initial
-    #
-    #.probe = {
-    #  .url = "/ping.html";
-    #    .interval = 5s;
-    #    .timeout = 1s;
-    #    .window = 5;
-    #    .threshold = 3;
-    #}
+    .probe = {
+        .url = "/ping.html";
+        .interval = 10s;
+        .timeout = 2s;
+        .window = 10;
+        .threshold = 3;
+        .initial = 3;
+    }
 }

And in vcl_recv():

 sub vcl_recv {

 [...]

-    #----------
-    # DISABLED: Only enable when .probe block above is enabled
-    #----------
     # Detect broken backends and keep serving requests if possible
-    #if (! req.backend.healthy) {
-    #    set req.grace = 10m;
-    #} else {
-    #    set req.grace = 5s;
-    #}
+    if (! req.backend.healthy) {
+        set req.grace = 60m;
+    } else {
+        set req.grace = 5s;
+    }

Regular expression matching

Another "big" difference is the use in 2.1 of a Perl-compatible regular expression engine,
(PCRE) instead of the POSIX-style regex matching that used to be in 2.0.
This is a good change for me, as I'm pretty much used to Perl regex and I know next to nothing
about POSIX.

This change actually created a subtle problem that I caught only with thorough testing
of our configurations. We use regex matching in a few places in our VCL configuration,
usually to analyze cookies
and set special "flags" that are then used to force
an HTTP Vary header, to make Varnish store different cached versions of the same
URL.
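
To give an idea of how those flags end up in the cache key, here's a simplified sketch of the vcl_fetch side, using the 2.1 beresp. syntax. It is not our actual config; only the X-Language flag is shown:

sub vcl_fetch {
    # Append our custom flag to whatever Vary header the backend sent,
    # so Varnish keeps a separate cached copy per language.
    # The same approach applies to the other flags (e.g. X-Mobile).
    if (beresp.http.Vary) {
        set beresp.http.Vary = beresp.http.Vary ", X-Language";
    } else {
        set beresp.http.Vary = "X-Language";
    }
}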

One of these cases is the language cookie, where we store a sticky
user preference about site language. Here's how the code changed:

  # STD: Sticky language cookie
  if (req.http.Cookie ~ "language=") {
      set req.http.X-Language =
-         regsub(req.http.Cookie, "^.*?language=([^;]*?);*.*$", "\1");
+         regsub(req.http.Cookie, "^.*?language=([^;]*);*.*$", "\1");
  }

  ...

  # Mobile view cookie
  if (req.http.Cookie ~ "mobile=") {
-     set req.http.X-Mobile = 
-         regsub(req.http.Cookie, "^.*?mobile=([^;]*?);*.*$", "\1");
+     set req.http.X-Mobile =
+         regsub(req.http.Cookie, "^.*?mobile=([^;]*);*.*$", "\1");
  }

In case you find it difficult to spot the change, it's the removal of the *?
(non-greedy star) operator. Non-greedy matching was used in the 2.0 (POSIX) config to make
sure that the * didn't match too many characters and eat part of the other cookies: with a
Cookie header like "language=en; mobile=1", the capture should be just "en" and not spill over
into the mobile cookie. Except POSIX regex matching does NOT have a non-greedy star operator.
I just didn't know that, and it's of course a bug, but it had worked perfectly so far… WTF???

For even more weirdness, why did I take the non-greedy star (*?) away now that it should
be supported with PCRE matching? I removed it because otherwise the result of those
regsub() expressions is always empty!

Believe it or not, it looks exactly like 2.0 had PCRE and 2.1 has POSIX, which is
obviously not what's happening. If you know more about this and you can shed some light,
please contact me or leave a comment below.

Hope you liked this 2.0 -> 2.1 migration journey. I'm looking forward to 2.1 -> 3.0!
That will be a bit more work, because I will need to migrate my
accept-language C extension
to the new vmod system, which I have already started working on :-)

Have fun!

A command line tool for Debian to purge Varnish objects

I've been using Varnish mostly on Debian systems, and I found the reload-vcl script included in Debian to be useful.

The reload-vcl script

It's part of the standard Varnish Debian package. It reads the system defaults in /etc/default/varnish, so it knows how to correctly invoke the varnishadm utility to perform administrative commands. As the name implies, it reloads the default VCL file using the vcl.load and vcl.use commands, labeling the new VCL automatically with a unique id and checking that every step succeeds before continuing, so it's safe to use.

Something analogous for the purge functionality seemed useful, so I looked at the source code for reload-vcl. Most of it deals with loading /etc/default/varnish and various sanity checks. I reused that part to make another script to control cache purging.

The purge-cache script

Here's the full source code. Below there's a link to download the latest version from github.


#!/bin/sh

# purge-cache: Script to purge varnish cache. Defaults are defined in
# /etc/default/varnish.
#
# Cosimo <cosimo@cpan.org>
# Based on reload-vcl, by Stig Sandbeck Mathisen <ssm at debian dot org>

# Settings
defaults=/etc/default/varnish

# Paths
varnishadm=/usr/bin/varnishadm
date=/bin/date 
tempfile=/bin/tempfile

# Messages
# msg_no_varnishadm: varnishadm
msg_no_varnishadm="Error: Cannot execute %s\n"
msg_no_management="Error: \$DAEMON_OPTS must contain '-T hostname:port'\n"
# msg_defaults_not_readable: defaults
msg_defaults_not_readable="Error: %s is not readable\n"
# msg_defaults_not_there: defaults
msg_defaults_not_there="Error: %s does not exist\n"
msg_usage="Usage: $0 [-h][-q][-u <url>|-r <regex>|-a]\n\t-h\tdisplay help\n\t-q\tbe quiet\n\t-u\tpurge by exact (relative) url (ex.: /en/products/)\n\t-r\tpurge objects with URL matching a regex (ex.: ^/blogs/)\n\t-a\tpurge all objects from cache\n"
msg_purge_failed="Error: purge command failed\n"
# msg_purge_url: url
msg_purge_url="Purging objects by exact url: %s\n"
# msg_purge_regex: regex
msg_purge_regex="Purging objects with URL matching regex: %s\n"
msg_purge_all="Purging all cache\n"
msg_purge_ok="Purge command successful\n"

# Load defaults file
if [ -f "$defaults" ]
then
    if [ -r "$defaults" ]
    then
        . "$defaults"
    else
        printf >&2 "$msg_defaults_not_readable" $defaults
        exit 1 
    fi
else
    printf >&2 "$msg_defaults_not_there" $defaults
    exit 1
fi

# parse command line arguments
while getopts hqu:r:a flag
do
    case $flag in
        h)
            printf >&2 "$msg_usage"
            exit 0
            ;; 
        u)
            purge_method=url
            url="$OPTARG"
            ;; 
        r)
            purge_method=regex
            regex="$OPTARG"
            ;; 
        a)
            purge_method=all
            ;; 
        q)
            quiet=1
            ;; 
        *)
            printf >&2 "$msg_usagen"
            exit 1
            ;; 
    esac
done

# Parse $DAEMON_OPTS (options must be kept in sync with varnishd).
# Extract the -f and the -T option, and (try to) ensure that the
# management interface is of the form hostname:port
OPTIND=1
while getopts a:b:dFf:g:h:l:n:P:p:s:T:t:u:Vw: flag $DAEMON_OPTS
do
    case $flag in
        f)
            if [ -f "$OPTARG" ]; then
                vcl_file="$OPTARG"
            fi 
            ;; 
        T)
            if [ -n "$OPTARG" -a "$OPTARG" != "${OPTARG%%:*}" ]
            then
                mgmt_interface="$OPTARG"
            fi
            ;;
    esac
done

# Sanity checks 
if [ ! -x "$varnishadm" ]
then
    printf >&2 "$msg_no_varnishadm" $varnishadm
    exit 1
fi

if [ -z "$mgmt_interface" ]
then
    printf >&2 "$msg_no_management"
    exit 1
fi

logfile=$($tempfile)
purge_command="vcl.list"

# Now run the purge command against the admin interface
# (plain [ ... ] tests, since /bin/sh on Debian is not necessarily bash)
if [ "$purge_method" = "url" ]
then
    purge_command="purge req.url == $url"
    printf >&2 "$msg_purge_url" $url | grep -v "^$" > $logfile
elif [ "$purge_method" = "regex" ]
then
    purge_command="purge.url $regex"
    printf >&2 "$msg_purge_regex" $regex | grep -v "^$" > $logfile
elif [ "$purge_method" = "all" ]
then
    purge_command="purge.url ."
    printf >&2 "$msg_purge_all" | grep -v "^$" > $logfile
fi

# Note: piping the whole if/fi block, as in
#
#   fi | grep -v "^$" > $logfile
#
# would make the block run in a subshell, so the purge_command
# assignment would be wiped out at the end of it. That's why each
# branch pipes its own printf instead.

if [ -z "$purge_command" ]
then
    printf >&2 "$msg_usagen"
    exit 1
fi

# echo "cmd: $varnishadm -T $mgmt_interface $purge_command"

if $varnishadm -T $mgmt_interface $purge_command
then
    printf >&2 "$msg_purge_ok"
else
    printf >&2 "$msg_purge_failed"
    exitstatus=1
fi | grep -v "^$" > $logfile

# Blather
if [ -z "${quiet}" -o -n "$exitstatus" ]
then
    cat >&2 $logfile
fi

# Cleanup
rm -f $logfile  
exit $exitstatus

You can control how objects are purged from the cache with 3 options:

  • -a: purges all objects
  • -u <url>: purges an exact url
  • -r <regexp>: purges objects matching a regular expression

Examples

    # Purges all objects
    purge-cache -a
     
    # Purges all objects starting with "/products"
    purge-cache -r '^/products'
    
    # Purges objects with exact URL
    purge-cache -u '/en/homepage'
    

Goal: no downtime

Both reload-vcl and purge-cache can be combined in a single script to be triggered when deploying new VCL code or new backend applications. Instead of restarting Varnish, which I really don't like and which isn't very reliable either (on Debian it sometimes won't come back up), I use purge-cache -a to purge all objects and then reload-vcl to load and use the newly deployed VCL code.

This procedure has no downtime at all. The effect of purging all objects can potentially be hard on the backends, but we're not at that point yet. In the busiest applications we have, it usually takes around 10-20 seconds to reach a 70%-75% hit rate, so I would say that's not really a problem right now.

Download!

You can download the purge-cache script from github. I contacted the maintainer of the reload-vcl script. Maybe he will include purge-cache in the next release of the Varnish Debian package… or maybe I could package it as a Perl CPAN module.

Cache::Memcached::Mock, instant in-process memcached mock

A week ago, I wrote a Cache::Memcached mock module for some complicated unit tests in this project I'm working on.

A few people asked me to upload it to CPAN, so here it is:

Cache::Memcached::Mock v0.01 is on CPAN.

I didn't spend that much time polishing it and making documentation, so it's a bit rough around the edges, but you get the idea.

You can use it as a drop-in replacement for Cache::Memcached when you don't want, or can't afford, to run your own memcached daemon.

I've already got a feature request from a colleague: making sure set() fails if you try to store a value bigger than 1MB.

More varnish, now also on www.opera.com

I have been working on setting up and troubleshooting Varnish installations quite a bit lately. After deploying Varnish on My Opera for many different uses, namely APIs, avatars, user pictures and the frontpage, we also decided to try using Varnish on www.opera.com.

While "www" might seem much simpler than My Opera, it has its own challenges.
It doesn't have logged in users, or user-generated content, but, as with My Opera, a single URL is used to generate many (slightly) different versions of the content. Think about older versions of Opera, or maybe newest (betas, 10.50), mobile browsers, Opera Mini, site in "Mobile view", different languages, etc…

That makes caching with Varnish tricky, because you have to consider all of these variables, and instruct Varnish to cache each of these variations separately. No doubt opera.com in this respect is even more difficult than My Opera.

So, we decided to:

  • cache only the most trafficked pages (for now only the Opera startup page)
  • cache them only for Opera 10.x browsers
  • differentiate caching by specific version (the "x" in 10.x)

We basically used the same Varnish config as My Opera, with the accept-language hack, changing only the URL-specific logic. With this setup, we managed to cut backend requests on opera.com by around 15%.
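
To give an idea of what that URL-specific logic might look like, here's a simplified sketch, not our actual config: the URL pattern and the X-Opera-Version header name are made up for this example.

sub vcl_recv {
    # Only the startup page, and only for Opera 10.x browsers.
    # Keep the specific version (the "x" in 10.x) in a request header,
    # so each version can be cached as a separate variant.
    if (req.url ~ "^/portal/startup" && req.http.User-Agent ~ "Opera.*Version/10\.") {
        set req.http.X-Opera-Version =
            regsub(req.http.User-Agent, "^.*Version/(10\.[0-9]+).*$", "\1");
    }
}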

My Opera front page caching and Varnish hacking

The My Opera front page

According to our internal statistics, the front page of My Opera accounts for a considerable share of the entire traffic we get on our servers, so it's normal that we have been working to optimize it for a very long time.

When we learned that Opera Mini 5.0 would be released with our front page as one of the preloaded speed dials, we started to study the situation in more depth and plan what to do (and quickly!).

Mini 5 is already out and used by lots of people, and during the last months we have been getting more front page views than ever. What I'm going to tell you about is the last (final?) step of the front page performance optimizations we worked on. If it works well, we might be able to apply it to other heavy parts of the site.

Enter Varnish…

Varnish is reverse proxy caching software.
If you know Varnish already, I suggest you take a look at this great presentation from OSCON 2009.

During October 2009, we deployed our first Varnish server for My Opera, for some very specific and mostly static content. At that time it was very experimental for me: I hardly knew anything about Varnish :) and in fact we had some problems here and there. Then we gradually gained some experience, and so we thought of using Varnish, for the first time, for a dynamic request too.

Front page caching

Caching a full HTML page presents more challenges than caching a picture. For pictures, at least in our case, you can ignore the User-Agent, the cookies, the user's language preferences and the Accept-Language HTTP header. For the full page you can't, and on My Opera we also have the Mobile view feature to deal with.

All of that means that if you're going to cache, say, the front page of My Opera, you can have:

  • 4 main types of browsers: Opera Mini, Opera Mobile, IE, and the standards-compliant ones;
  • 18 different languages, the ones in the language selector at the bottom, from Bulgarian to LOLCAT and Simplified Chinese, selected by either the sticky "language" cookie or by the Accept-Language header;
  • 2 views, mobile and full/desktop.

That makes a grand total of up to 144 (4 × 18 × 2) different versions of one single page.
Of course, all of this is just for logged-out users. We don't want to (and couldn't either) cache every single logged-in user's version of the frontpage (with the activity feed and all the rest).

Reducing the variations

For the caching to work properly and be effective, we needed to find a way to reduce the number of possible versions of the front page. So in the Varnish VCL file we match the User-Agent string and reduce it to one of 4 predefined values: "operamini", "operamobile", "msie", or "nomatch". Instead of a virtually unlimited number of user agent strings, we get only 4.
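
Here's a simplified sketch of that normalization (the real patterns are more involved, and the X-UA header name is just for illustration):

sub vcl_recv {
    # Collapse the endless variety of User-Agent strings into one of
    # 4 known values, and vary the cache on that instead of the raw header.
    if (req.http.User-Agent ~ "Opera Mini") {
        set req.http.X-UA = "operamini";
    } elsif (req.http.User-Agent ~ "Opera Mobi") {
        set req.http.X-UA = "operamobile";
    } elsif (req.http.User-Agent ~ "MSIE") {
        set req.http.X-UA = "msie";
    } else {
        set req.http.X-UA = "nomatch";
    }
}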

Then there's another, similar problem: the Accept-Language header. This header can be quite complex,
depending on your browser settings, and there's no easy way to figure out which language the user wants. From a string such as:
de-DE,tr;q=0.999,en;q=0.75,fr;q=0.9,it;q=0.8,ja;q=0.2,ru;q=0.1

you have to build a list of prioritized language preferences and match them against the languages your site can offer. For the header above, that means de-DE first (implicit q=1), then tr (0.999), fr (0.9), it (0.8), en (0.75), ja (0.2) and finally ru (0.1).

Failing to do that means, by default, having a different version of the frontpage for each distinct Accept-Language header, which varies a lot across clients even though some values are very common. A brief statistics-gathering session showed 500 distinct values in about 10,000 browser requests.

accept-language.vcl

Varnish allows you to embed C code inside a VCL file. This is a pretty advanced feature that is not talked about very much. Given that using regexes to massage Accept-Language looked messy, we discussed another crazy idea: writing a C function to parse Accept-Language, and embedding it into the Varnish VCL config.

Let's say that your site has English and Japanese. Your users' browsers will send every possible Accept-Language header on Earth. If you enable Vary: Accept-Language on Varnish or on your backends (and you should),
the cache hit ratio will rapidly drop because of the huge variation in Accept-Language contents: Varnish will store one version of the page for every distinct Accept-Language string. That's bad.

With this hack, the Accept-Language header will be "rewritten" to just "en" or "ja", depending on your client settings. If no match occurs, a default language will be set ("en"). This brings the language variants down to exactly 2, the number of languages your site supports. In our case it's 18 versions, so down from ~500 to 18.
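
If you can't or don't want to go the embedded-C route, a crude pure-VCL approximation of the same idea could look like this. It's just a sketch: unlike the real accept-language.vcl code, it ignores q-values completely.

sub vcl_recv {
    # Pick the first supported language we can find, ignoring q-values,
    # and fall back to the default ("en").
    if (req.http.Accept-Language ~ "(^|,| )ja") {
        set req.http.Accept-Language = "ja";
    } else {
        set req.http.Accept-Language = "en";
    }
}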

It seems a bit weird that we're the only ones having this problem :)

Most probably we're trying to solve this problem directly in Varnish, while usually it is dealt with at the backend level. Solving it inside Varnish is very nice, because it lets us scale more easily to other pages as well, with no modifications to the backends' config or code.

If you think this might be useful for you too, you're welcome to get the code and try it out. It's on Github:

http://github.com/cosimo/varnish-accept-language/

Watch out! It's experimental stuff, so don't try it in production without extensive testing. And let me know how it goes :)