Tag Archives: varnish

Fast VCL checks for personalized backend responses

I’d like to talk about a problem I encountered a few years ago and one possible solution to it. This particular problem stuck with me for a long time, for several reasons.
The first is that at the time I considered it basically unsolvable: it would be like having your cake and eating it too, as the proverb goes. Another is that it had me spinning my wheels for a good while before I saw a way forward.

Without any pretense of this being a particularly clever solution or anything like that, I’d like to illustrate what the general problem is and a possible solution I came up with. Hopefully this will be useful to you.

The general problem

Suppose you have a backend request of some sort, an API or a particular web page. In my case it was a JSON-based recommendations API, which returned a list of recommended news articles to read. The specific purpose of the request is not terribly important. What’s more important is the fact that this request can be personalized depending on the user that makes it. I believe this is quite a common scenario.

In a recommendations context, it’s also common for a user not to be signed in to the service, or to be invoking the API for the first time. In this case, the recommendations engine has no previous information about the user: the so-called cold start case.

In this specific project, we had operated in a “permanent cold start mode”, meaning the recommendations we offered were never differentiated per user. There were a few knobs and settings to influence which type of recommendations one would get from the system (f.ex. fewer Sports articles and more Arts or Travel), but the system would not learn over time or change its recommendations based on user signals like articles read.

Among other things, this mode of operation allowed us to serve our entire userbase (around 90M monthly active users, around 10M weekly) with only two servers per data-center, also thanks to a very aggressive caching strategy.

When we started experimenting with personalized recommendations, it was immediately clear that we would not be able to handle the additional backend load caused by all the per-user requests. We estimated that, given the cache hit ratio drop, we would need something ridiculous like 50x the number of servers (for illustration: with a ~98% cache hit ratio, only 2% of requests reach the backend, so serving every request from the backend means a roughly 50x load increase). For each API request, we would have to:

  1. fetch the distinct user profile
  2. check if the profile contained any information about previously read articles or otherwise useful information to personalize the offered recommendations
  3. compute and return the personalized recommendations

These steps can only be performed by the recommendations engine backend. This implies we would get no help from our Varnish caching layer, which made personalized recommendations much harder to implement for us, at least without employing an inordinate number of servers and significantly rebuilding our system infrastructure.

You could very well say that that is a problem in itself, and it probably is :-)

A possible solution

I remember spending quite some time thinking about this, not seeing any possible solution. One day I attended a meetup where one of the engineers talked about the Varnish API Engine, a commercial Varnish add-on that can implement authentication and paywalls directly in the caching layer. The speaker mentioned how the API Engine embeds the SQLite3 database, and how this was crucial to its performance, since the caching layer is effectively the first bottleneck of a system.

I connected the dots almost immediately and realized I had a possible way forward. This is how I imagined I could approach the problem:

  • organize user signals collection (what articles each user is reading, etc…) and user profile building as a completely separate batch activity
  • every x number of hours, build a sqlite database with a single table, user_profiles, consisting of two columns: a user_id string and a has_profile boolean. With such a table in place, looking up whether we can build a significantly personalized recommendations set for a user is only an SQL primary-key lookup away (see the sketch after this list).
  • Using the excellent SQLite3 vmod, implement this SQL lookup in our existing Varnish VCL layer. Make sure that for every possible case this code never fails. For example, if the database file does not exist, or the file is for some reason corrupt, etc… we want to behave as if the particular user for the running request had no personalized profile.
  • Ensure that we would be able to update the SQLite database file at any time, without stopping Varnish, and the new file would be visible to the SQL queries immediately or at least after a short delay.
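
As a sketch of what that batch step could look like (paths and loading steps are assumptions; the schema is the profile_id/data one from the deployed VCL shown later in this post, which differs slightly from the two columns described above), the database can be rebuilt from scratch and then swapped into place with a plain rename, which is atomic within one filesystem, so Varnish never sees a half-written file:

# Build a fresh database next to the live one...
sqlite3 /etc/varnish/user_profiles.db.new <<'EOF'
CREATE TABLE user_profiles (
    profile_id char(100) PRIMARY KEY NOT NULL,
    data text
);
-- bulk-load the profiles here, e.g. via .import or INSERT statements
EOF

# ...then atomically replace the old file
mv /etc/varnish/user_profiles.db.new /etc/varnish/user_profiles.db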

We tested the whole assembly and it seemed to work correctly. The final step consisted of actually computing the personalized profiles, building the real SQLite database, syncing it to the backend systems, and performing the dispatch logic in the VCL layer.

This is more or less the final logic I used:

  • If the request was for an anonymous user, don’t even perform the user profiles SQL lookup, and return the generic recommendations cached payload.
  • If the request comes from a user that has no personalized profile, that is, no record is present in the SQLite table, also return the generic recommendations payload.
  • If the user profiles lookup is positive, that is, a record exists in the user profiles table in SQLite and its has_profile flag is true, then pass the request on to the backend. We know it is a request that must be personalized and only the backend can do that.

Using such logic allows you to serve the majority of your user base, which presumably has not logged in or does not have any significant user profile yet, caching as much as possible. But it also allows personalized recommendations for all users that do have a profile.

We are shifting the critical decision as early in the chain as possible: into the caching layer, Varnish or similar, before the backend service is even consulted. Leaving the decision to the backend service would not be feasible, for the reasons already discussed.

The actual code

We used Puppet as our configuration management tool back then, with a custom varnish module. I extended the existing manifest to add a new user_profiles.vcl file and to install the sqlite3 vmod for Varnish by default.

The existing VCL code was also modified to:

  • perform the personalized profile SQL query
  • decide whether to pass the request based on the result of the SQL query

The following code illustrates those two steps:

diff --git a/config.vcl b/config.vcl
index 8e25a8a..50c70ce 100644
--- a/config.vcl
+++ b/config.vcl
@@ -1,22 +1,23 @@
 # Recommender system VCL config

 include "/etc/varnish/accept-encoding.vcl";
 include "/etc/varnish/purge.vcl";
 include "/etc/varnish/x-forwarded-for.vcl";
 include "/etc/varnish/auth.vcl";
 include "/etc/varnish/stats.vcl";
+include "/etc/varnish/user_profiles.vcl";
 include "/etc/varnish/strip-tracking-cookies.vcl";

 backend apache {
     .host  = "127.0.0.1";
     .port  = "8000";
     .probe = {
         .url       = "/ping.html";
         .interval  = 10s;
         .timeout   = 5s;
         .window    = 20;
         .threshold = 3;
         .initial   = 3;
     }
 }
@@ -147,45 +148,49 @@ sub vcl_recv {
     if (req.backend.healthy && req.http.User-Agent ~ "McHammer") {
         return (pass);
     }

     # Client clicks must go through the backend (*with* client-id cookie)
     if (req.url ~ "^/api/1\.0/feedback/") {
         return (pass);
     }

     call check_authorization;
+    call check_user_profile;
     call accept_encoding_normalize;

+    # Users with tracking cookies can be served personalized results
+    if (req.http.X-Profile == "1") {
+        std.log("User has customized profile. Rolling the dice.");
+        # Initially keep the percentage of PASS very low, to test the waters.
+        if (std.random(0, 100) < 1.0) {
+            std.log("User has customized profile and within 1.0%. Passing.");
+            return (pass);
+        }
+    }

 }

The new user_profiles.vcl file consisted of the following code:

#-----------------------------------------------------------------------------
# Fast check for personalized user profiles
#-----------------------------------------------------------------------------
#
# The general idea is to use this fast check to send users who we know
# have a personalized user profile to the backend without caching, while
# retaining the ability to send cached objects for everyone else.
#
# Uses a SQLite3 database and libvmod-sqlite3 by Federico Schwindt:
# https://github.com/fgsch/libvmod-sqlite3
#
# Extracts the `userId' from the HTTP Cookie header.
# Looks up the profile_id key having value equal to the `userId' cookie.
# The underlying schema is very simple:
#
#   CREATE TABLE user_profiles (
#       profile_id char(100) PRIMARY KEY NOT NULL,
#       data text
#   );
#
# At least initially we will not use the data column.

import sqlite3;

sub vcl_init {
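    # The second argument is the delimiter spec the vmod uses when
    # formatting result sets returned by sqlite3.exec().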
    sqlite3.open("/etc/varnish/user_profiles.db", "|;");
}

sub check_user_profile {

    # Quick yes/no test for the userId cookie
    if (req.http.Cookie ~ "userId=") {

        # Extract the userId value from the Cookie header, which itself
        # remains untouched. Make sure we can still extract the value
        # even if there are other cookies before/after ours.
        #
        # XXX Not sure what happens when client sends multiple Cookie lines.
        set req.http.X-Profile-Id = regsub(req.http.Cookie,
            "(?:^|.*;\s*)(?:userId=(.*?))\s*(?:;.*|$)", "\1");

        # No need to do anything if userId hasn't been found
        if (req.http.X-Profile-Id != "") {
            #std.log("Checking profile_id: " + req.http.X-Profile-Id);

            # First case of VCL-injection vulnerability :-)
            set req.http.X-Profile = sqlite3.exec(
                "SELECT 1 FROM user_profiles WHERE profile_id='"
                + req.http.X-Profile-Id
                + "'");

            # req.http.X-Profile !~ "^SQL" to catch errors like missing DB,
            # but seems a bit fragile. Depends on libsqlite3 and/or the vmod.
            if (req.http.X-Profile == "1") {
                std.log("User profile " + req.http.X-Profile-Id
                    + " found (" + req.http.X-Profile + ")");
            }
            else {
                std.log("User profile " + req.http.X-Profile-Id
                    + " not found");
            }
        }
    }
}
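
Incidentally, one way to close the injection hole flagged in the comment above is to whitelist the characters allowed in the extracted id before it reaches the query. This is just a sketch, and the character class is an assumption about what valid ids look like:

# Strip anything that is not a plausible userId character, so the
# value cannot break out of the SQL string literal.
set req.http.X-Profile-Id = regsuball(req.http.X-Profile-Id, "[^A-Za-z0-9_-]", "");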

The commit message

I believe that good solutions deserve awesome commit messages. Here’s what I wrote:

Date:   Thu Jan 28 19:36:46 2016 +0100

    Fast VCL check for personalized profile existence

    How to have the cake and eat it too. Serve cached objects to the majority of
    users while personalizing recommendations to the ones that actually have a
    significant user profile available.

    Got the idea from the Varnish API engine[1].

    It's possible to perform tens of thousands of sqlite database lookups a second
    while processing requests in Varnish through VCL, thanks to SQLite3 being very
    lightweight and in this case embedded right inside Varnish through the sqlite3
    vmod[2].

    This commit hopefully adds all there is to it. The last bit is obviously the
    database file, which I placed in `/etc/varnish/user_profiles.db'. We will need
    to generate the .db file from the clicker and sync it to all frontends.

    Updates seem to be received immediately.

    When no database file is present, as will be in the initial deployment, the
    `check_user_profile()' function will work normally, signaling that no custom
    user profile has been found.

    [1] https://www.varnish-software.com/products/varnish-api-engine
    [2] https://github.com/fgsch/libvmod-sqlite3

How to rollout gradually?

Another interesting aspect is the way we could “control the flow” to this personalized recommendations API, that is, deciding what percentage of the users with personalized profiles would actually get personalized recommendations.

A gradual rollout would certainly be the best approach, and it was implemented in two different ways:

  • once the SQL lookup was performed and the result was positive, we would still “roll the dice” and only allow 1% (or 5%, 10%) to actually pass through to the backend as personalized recommendations. This was an additional safety measure.
  • when batch building the SQLite database, we could decide to curtail the number of users with personalized profiles, for example by excluding all users that had not read at least 5 or 10 articles (a sketch follows this list). This barrier served two purposes: it effectively limited the number of users included in the SQLite database, and at the same time it made sure we had accumulated significant user profile information before attempting to serve personalized recommendations. A sort of win-win I didn’t expect at first :-)
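
For illustration, assuming the click events live in some article_reads table (a made-up name), the curtailing could be a simple HAVING clause in the batch job that builds the database:

-- Hypothetical batch query: only users with at least 5 article reads
-- end up in the published user_profiles database.
INSERT INTO user_profiles (profile_id)
    SELECT user_id
      FROM article_reads
     GROUP BY user_id
    HAVING COUNT(*) >= 5;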

As usual, if you have any feedback, email me or write below (but comments are subject to approval due to lots of spam).

Display and filter traffic at the varnish level: vlogdump

Haven’t written much in the last few months. The reason is that I’ve been at work building the Opera Discover service backend, which we launched on Opera mobile for Android just a few days ago.

A few weeks before that, during the first stress test sessions of Discover, I wrote a little tool called vlogdump, which Opera allowed me to put up on github. The main purpose, besides learning awk :-), is to display and filter traffic coming into your varnish daemon.

vlogdump is not meant to replace varnishlog but I know that sometimes varnishlog gives me too much output to deal with, especially if I want to pinpoint a single client or a single request. I know that the varnishlog that ships with varnish 3.0.x is way better in this regard, but we’re using 2.1.x, and that version of varnishlog is not as capable.

vlogdump is easier to look at than varnishlog, but at the same time it conveys much more information than varnishncsa or the typical access.log format.

Here’s an example of output:

$ varnishlog | vlogdump -v only_misses=1
172.22.0.15 => GET /assets/e85ed0a7b1b87120a0a2bfa025531c6733a48802 HTTP/1.0 MISS
            <= 200 OK 28.432 ms
172.22.0.18 => GET /assets/5a9e9440c5c85e8dc5d65e03e15c95e390901fa7 HTTP/1.0 MISS
            <= 200 OK 36.905 ms
172.22.0.18 => GET /icons/categories/te/icon32x32-technology.png HTTP/1.0 MISS
            <= 304 Not Modified 0.589 ms
172.22.0.15 => GET /api/fetch/article-preview/?client=2&language=en-GB HTTP/1.1 MISS
            <= 301 MOVED PERMANENTLY 8.381 ms
172.22.0.18 => GET /assets/c3830e95b717761005e26ce49ebab253e0ccb40b HTTP/1.0 MISS
            <= 200 OK 291.354 ms
172.22.0.18 => GET /api/category?client=2&language=en-GB HTTP/1.1 MISS
            <= 200 OK 58.025 ms
...

Another interesting example.

Show request and response headers of transactions that resulted in cache hits and had request headers (any of them) matching "Android":

$ varnishlog | vlogdump -v show_req_headers=1 -v show_resp_headers=1 -v req_headers_match=Android -v only_hits=1
83.149.37.122 => GET /api/category/?... HTTP/1.1 HIT
            <= 200 OK         0.088 ms
   req.http.Accept = application/json;version=1
   req.http.Accept-Encoding = gzip
   req.http.Host = ...opera.com
   req.http.Connection = Keep-Alive
   req.http.User-Agent = Mozilla/5.0 (Linux; Android 4.1.2; GT-N7100 Build/JZO54K) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.58 Mobile Safari/537.31 OPR/14.0.1074.57768
 
   beresp.http.Server = Apache
   beresp.http.Content-Encoding = gzip
   beresp.http.Content-Type = application/json
   beresp.http.Vary = Accept-Encoding, Origin
   beresp.http.Content-Length = 4217
   beresp.http.Date = Sat, 25 May 2013 07:59:53 GMT
   beresp.http.X-Varnish = 1611090407 1611007435
   beresp.http.Age = 267
   beresp.http.Via = 1.1 varnish
   beresp.http.Connection = keep-alive

Now that you're eager to try it :-), you can do so in a few commands, assuming you have the right awk installed:

wget -q -O vlogdump https://raw.github.com/cosimo/vlogdump/master/vlogdump
chmod +x vlogdump
varnishlog | ./vlogdump [options]

The documentation lists all the available options.

You can do more interesting things:

  • display request or response headers for each transaction
    (-v show_req_headers=1, -v show_resp_headers=1)
  • show only requests slower than 200ms (vlogdump -v only_slow=200)
  • show only cache misses or hits (-v only_hits=1 or only_misses=1)
  • show only transactions where the URL matches regexp X
    (-v url_match='X' or -v url_match='!X' for negative match)
  • show only transactions where the HTTP status code was X
    (-v only_status=X)
  • show only transactions where the request or response headers match a given regular expression (-v req_headers_match=Blah, -v resp_headers_match=Error)

You can also combine most of these options together. That is very useful when you are interested in a small fraction of the traffic, but want to see whole in-flight transactions.
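
For example, to watch only slow API misses coming from Android clients, the options above can be chained like this:

varnishlog | ./vlogdump -v only_misses=1 -v only_slow=200 \
    -v url_match='^/api/' -v req_headers_match=Android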

One recommendation though. It is my first (last?) significant awk script :-) I know it works well, and I'm using it, but due to the way it works, I wouldn't leave it running for long periods of time, as it will slowly eat your memory keeping track of all transactions and clients.

If you have feedback or questions, feel free to comment on github or send me an email.

Report from the Varnish Users Group (VUG5) meeting in Paris – Day 1

Last week I attended the VUG5 meeting (https://www.varnish-cache.org/vug5). The following is my report of the conference Day 1, the "Users" day.

TL;DR

I learned a lot about (for me) gray areas of Varnish like 3.0, VMODs, ESI and various corner cases. My presentation on how we use Varnish at Opera sparked a lot of interest, especially in our thumbnail service.

Day 1, VUG5 users day

Day 1 was held at La Défense, a mega business district just outside of Paris. All day was filled with presentations by Varnish Software people and a few other companies. On with the list, and my notes on the side.

Keynote: Varnish in 2020 by Poul Henning Kamp, Varnish Software

Poul runs thttpd; he's not a varnish user himself, so he welcomes feedback from all users. That's the reason for the VUGs.

Varnish today is "The HTTP delivery engine". And in 2020? Hard to predict. PHK usually predicts things really badly. What we _can_ see is:

  • HTTP/2.0 Last call status just a few weeks ago
  • Google's SPDY support in Varnish? Most likely. Depends on future development and what/how many clients pick it up
  • HTTP over UDP? Lots of interest in this lately

Most likely future work on varnish:

  • Clearer split of transport and semantics
    (could speak HTTP no matter whether over UDP, TCP or SPDY)
  • Generic pluggable protocols (SPDY, f.ex.)
  • Decouple client protocol and backend protocol. Talk SPDY to client, talk HTTP to backend.

SSL in Varnish? Unlikely, just use Pound or nginx or whatever. Pound is simple and robust.

Varnish Book by Kristian Lyngstøl, Varnish Software

Expanding and improving on the existing training course material, Kristian and some contributors created a "Varnish Book" to help people getting started with Varnish. It will be freely available at https://www.varnish-software.com/static/book/. For now there's only a cute bunny there, though.

Varnish + Escenic by Richard Zuidhof, Escenic?

Richard explained how he used Varnish to migrate away from the Apache/Squid/Apache sandwich, making the stack better and faster, and saving his company a lot of money in the process.

Interesting points:

50x errors received from the backends are handled by doing a restart in vcl_fetch(), but hitting a "dummy" backend: a sort of static version of the real backend. Something like:


  sub vcl_recv {
     ...
     if (req.restarts > 0) {
         set req.backend = dummy;
     }
  }
 
  sub vcl_fetch {
     if (beresp.status == 500) {
        return (restart); # Or whatever this is
     }
  }

Also talked about various timeouts, like:

 
  {
    .first_byte_timeout = 1s;
    .between_bytes_timeout = 1s;
  }

and how he needed to reset them back to 120s/180s for some of their pages to work.

He said a timeout event from the backend should cause Varnish to fall back to stale content, but that's not the case currently: Varnish will abort the fetch operation. So pay attention.
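
For the backend-down case (as opposed to mid-fetch timeouts) the grace mechanism does help. A minimal sketch in 2.x syntax, along the lines of what also appears further down this page:

sub vcl_recv {
    # Serve stale objects for up to an hour while the backend is sick
    if (! req.backend.healthy) {
        set req.grace = 60m;
    } else {
        set req.grace = 5s;
    }
}

sub vcl_fetch {
    # Keep objects past their TTL so grace has something to serve
    set beresp.grace = 60m;
}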

Mobile device detection by Lasse Karsten, Varnish Software

Talked about various libraries and ways to detect mobile devices, including:

  • libvarnish-deviceatlas
  • WURFL
  • … others I didn't write down in time

Basically it was a way to survey how many people use this technology, and to say that Varnish Software has a commercial solution that they are going to open source Soon(tm), or something along these lines.

I was a bit distracted because I was having problems with my laptop and my presentation was coming up, so… I plan to go back to this presentation once the slides are up.

ESI and Varnish by Federico Schwindt, RBS

Summary of how RBS is using ESI for an internal website used by RBS employees.

Basically the service is composed of various "boxes", small windows in the page with some information that depends on location, department or other things, and they use Varnish to cache those small boxes and ESI to compose the final page.

Problems:

  • They can't find a way to also keep the fully composed page as a cache object.
  • Invalidation logic is complex because of inter-dependent content between different boxes.

Interesting: they use an HTTP header sent by the backend to instruct Varnish on when to do ESI processing, so ESI is not an on/off switch for the whole service but can be triggered on specific pages. This is very cool because it could also solve the development/production setup problem I had always feared when using ESI: the complication of using development environments with ESI, where every dev installation needs an ESI-aware varnish.
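
A sketch of how such a header-triggered setup could look; the header name is made up, and `esi;` is the 2.x statement (Varnish 3.0 uses set beresp.do_esi = true; instead):

sub vcl_fetch {
    # Only run ESI processing when the backend explicitly asks for it
    if (beresp.http.X-Do-ESI == "1") {
        esi;
    }
}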

Varnish at Opera by me

I talked about how we use Varnish in our projects. I mentioned a few Varnish extensions I worked on, including varnish-accept-language and varnish-geoip, plus other tools like http-cuke.

There were plenty of real world examples of VCL configuration we use in the various projects. I also talked about the varnish puppet module we wrote, that comes with a bunch of interesting customizations and fixes, included in the puppet-modules repository on Github.

If you're interested, slides are published here:

http://www.slideshare.net/cstrep/vug5-varnish-at-opera-software

I got lots of feedback and questions about our picture thumbnail service, so I'll probably write more about it soon.

Security with VCL by Kacper Wysocki, Redpill Linpro

Easily one of the best talks of the day. Kacper explained his security.vcl project. Here are a few highlights, but it's really interesting; I hope the slides will be up soon.

  • Wrote modsec rules parser and converter to VCL
  • Eduardo Scarpellini (Master thesis, OWASP) worked on a varnish-firewall project, similar in scope, and did in-depth research, finding that out of the OWASP top broken apps, he could automatically block 73% of XSS and SQL injections.
  • security.vcl is now used in ~10 sites with lots of traffic
  • Drawback compared to mod-security is that no POST data can be analyzed (yet)
  • In the future, we will see a merge of security.vcl and varnish-firewall projects.

Varnish modules by Kristian Lyngstøl, Varnish Software

I don't remember much, but I think Kristian basically tried to get more people to use VMODs, and said there's now a nice page where a list of known VMODs is kept:

http://www.varnish-cache.org/vmods

and you can register your own VMODs and have them listed.

Stay tuned for the "Day 2, Developers day" part.

How to start up varnish with a custom cc_command on Debian

If you need to compile your varnish VCL file with custom options, maybe because of libraries like GeoIP, and you're running Debian, you cannot use the init script that's shipped by default.

It will not work because of how shell expansion is handled in the start-stop-daemon command contained in the init script. I explained this and proposed a fix in much more detail in this Stack Overflow question:

http://stackoverflow.com/questions/5906603/varnish-daemon-opts-options-errors/8333333#8333333

TL;DR: (+ quick & dirty fix) patch your init script like this:

 start_varnishd() {
     log_daemon_msg "Starting $DESC" "$NAME"
     output=$(/bin/tempfile -s.varnish)
-    if start-stop-daemon \
-       --start --quiet --pidfile ${PIDFILE} --exec ${DAEMON} -- \
-       -P ${PIDFILE} ${DAEMON_OPTS} > ${output} 2>&1; then
+    if bash -c "start-stop-daemon \
+        --start --quiet --pidfile ${PIDFILE} --exec ${DAEMON} -- \
+        -P ${PIDFILE} ${DAEMON_OPTS} > ${output} 2>&1"; then
         log_end_msg 0
     else
         log_end_msg 1
         cat $output
         exit 1
     fi
     rm $output
 }

Let me know if it works for you!

EDIT (7/Mar/2012): bug was filed in Debian as #659005. Nothing happened so far. We'll see.

Migration of VCL configuration from Varnish 2.0 to 2.1

Recently we migrated most of our services from Varnish 2.0 to 2.1.
I'd like to explain what we changed with code (VCL) examples side by side,
in case anyone still needs to migrate to 2.1 and needs some help as well :-)

req., bereq., beresp., and obj.

Usually this naming difference in VCL is not really explained: the docs say "x has been renamed to y" and you should change the name. That's kind of annoying. In reality, yes, the names changed, and at first it is annoying, but trying to understand why they changed makes them stick in your mind very easily.

In vcl_fetch(), obj. is now beresp.. Why? Because vcl_fetch() is the stage where Varnish has already performed a request against a backend and got a response from it. That means that if you refer to obj. in vcl_fetch(), you're really touching the backend response, hence beresp..

Similarly for vcl_pipe(), which is executed when the result of vcl_recv() is to switch to pipe mode. In that case, however, Varnish hasn't made the request to the backend yet, so if you used obj. in vcl_pipe() you really meant to change the request that was about to be made to the backend, hence bereq..

Let's see the changes we had to make:

 sub vcl_fetch {
 
-    set obj.ttl = 88s;
-    set obj.grace = 10m;
-    set obj.http.X-My-Opera = "http://youtube.com/watch?v=br79xGSpgF4";
+    set beresp.ttl = 88s;
+    set beresp.grace = 60m;
+    set beresp.http.X-dramatic = "http://www.youtube.com/watch?v=a1Y73sPHKxw";
 
 }

And:

 sub vcl_pipe {
     # Streaming files too (see related vcl_recv() rule).
     # We need to close the request, or varnish remains in pipe
     # mode for the entire session with that client.
-    set req.http.connection = "close";
+    set bereq.http.connection = "close";
 }

Backend probes and .initial

Backend probing allows Varnish to detect whether backends are "healthy"
or "sick". The probe VCL config block allows you to tweak how this works. In
particular, .threshold is the number of successful probes necessary
for Varnish to consider a backend healthy, and .interval is the number of
seconds between one probe and the next.

As an example, you can define that a backend should be considered working
(healthy) when it answers successfully to at least 3 probes, with an interval of
10 seconds between each probe. In Varnish 2.0.4, this means that if restarted,
Varnish will wait 3 × 10 = 30 seconds before serving any requests from that
backend, because all backends are considered dead (sick) at startup.

In 2.1 this limitation is removed by introducing an .initial attribute
in the probe block. .initial is the number of probes considered successful
when the service is started, or the backend is added, and there's no information about it.
The default value is assumed to be equal to .threshold, so backends are considered
healthy as soon as they are introduced.

I think you can understand from these tiny details how well Varnish is engineered.
This just makes sense, doesn't it? :-) Here's the diff from 2.0 to 2.1:

 backend nginx {
     .host = "localhost";
     .port = "8080";
-
-    # Disabled to avoid the 15s startup
-    # 2.0.4-5 doesn't have .initial
-    #
-    #.probe = {
-    #  .url = "/ping.html";
-    #    .interval = 5s;
-    #    .timeout = 1s;
-    #    .window = 5;
-    #    .threshold = 3;
-    #}
+    .probe = {
+        .url = "/ping.html";
+        .interval = 10s;
+        .timeout = 2s;
+        .window = 10;
+        .threshold = 3;
+        .initial = 3;
+    }
 }

And in vcl_recv():

 sub vcl_recv {

 [...]

-    #----------
-    # DISABLED: Only enable when .probe block above is enabled
-    #----------
     # Detect broken backends and keep serving requests if possible
-    #if (! req.backend.healthy) {
-    #    set req.grace = 10m;
-    #} else {
-    #    set req.grace = 5s;
-    #}
+    if (! req.backend.healthy) {
+        set req.grace = 60m;
+    } else {
+        set req.grace = 5s;
+    }

Regular expression matching

Another "big" difference is the use in 2.1 of a Perl-compatible regular expression engine,
(PCRE) instead of the POSIX-style regex matching that used to be in 2.0.
This is a good change for me, as I'm pretty much used to Perl regex and I know next to nothing
about POSIX.

This change actually created a subtle problem that I caught only through thorough
testing of our configurations. We use regex matching in a few places in our VCL
configuration, usually to analyze cookies and set special "flags" that are then
used to force an HTTP Vary header, to make Varnish store different cached
versions of the same URL.

One of these cases is the language cookie, where we store a sticky
user preference about site language. Here's how the code changed:

  # STD: Sticky language cookie
  if (req.http.Cookie ~ "language=") {
      set req.http.X-Language =
-         regsub(req.http.Cookie, "^.*?language=([^;]*?);*.*$", "\1");
+         regsub(req.http.Cookie, "^.*?language=([^;]*);*.*$", "\1");
  }

  ...

  # Mobile view cookie
  if (req.http.Cookie ~ "mobile=") {
-     set req.http.X-Mobile =
-         regsub(req.http.Cookie, "^.*?mobile=([^;]*?);*.*$", "\1");
+     set req.http.X-Mobile =
+         regsub(req.http.Cookie, "^.*?mobile=([^;]*);*.*$", "\1");
  }

In case you find it difficult to spot the change, it's the removal of the *?
(non-greedy star) operator. Non-greedy matching was used in 2.0, with POSIX matching,
to make sure the * didn't match too many characters and eat parts of other cookies
(even though [^;]* can never cross a ";" boundary anyway, so the non-greedy modifier
was redundant to begin with). Except POSIX regex matching does NOT have a non-greedy
star operator. I just didn't know that, and it's of course a bug, but it had worked
perfectly so far… WTF???

For even more weirdness, why did I take the non-greedy star (*?) away now that it
should be supported by PCRE matching? I removed it because otherwise the results of
those regsub() expressions were always empty!

Believe it or not, it looks exactly like 2.0 had PCRE and 2.1 has POSIX, which is
obviously not what's happening. If you know more about this and you can shed some light,
please contact me or leave a comment below.

Hope you liked this 2.0 -> 2.1 migration journey. I'm looking forward to 2.1 -> 3.0!
There's a bit more work there, because I will need to migrate my accept-language
C extension to the new vmod system, which I have already started working on :-)

Have fun!

A command line tool for Debian to purge Varnish objects

I've been using varnish mostly on Debian systems. I found the reload-vcl script included in Debian to be useful.

The reload-vcl script

It's part of the standard varnish debian package. It uses the system defaults in /etc/default/varnish, so it knows how to correctly invoke the varnishadm utility to perform administrative commands. As the name implies, it reloads the default VCL file using the vcl.load and vcl.use commands, checking that every step succeeds before continuing, so it's safe to use. It loads the new VCL file and automatically labels it with a unique id.

Something analogous for the purge functionality would have been useful, so I looked at the source code of reload-vcl. Most of it deals with loading /etc/default/varnish and various sanity checks. I reused that part to make another script to control cache purging.

The purge-cache script

Here's the full source code. Below there's a link to download the latest version from github.


#!/bin/sh

# purge-cache: Script to purge varnish cache. Defaults are defined in
# /etc/default/varnish.
#
# Cosimo <cosimo@cpan.org>
# Based on reload-vcl, by Stig Sandbeck Mathisen <ssm at debian dot org>

# Settings
defaults=/etc/default/varnish

# Paths
varnishadm=/usr/bin/varnishadm
date=/bin/date 
tempfile=/bin/tempfile

# Messages
# msg_no_varnishadm: varnishadm
msg_no_varnishadm="Error: Cannot execute %s\n"
msg_no_management="Error: $DAEMON_OPTS must contain '-T hostname:port'\n"
# msg_defaults_not_readable: defaults
msg_defaults_not_readable="Error: %s is not readable\n"
# msg_defaults_not_there: defaults
msg_defaults_not_there="Error: %s does not exist\n"
msg_usage="Usage: $0 [-h][-q][-u <url>|-r <regex>|-a]\n\t-h\tdisplay help\n\t-q\tbe quiet\n\t-u\tpurge by exact (relative) url (ex.: /en/products/)\n\t-r\tpurge objects with URL matching a regex (ex.: ^/blogs/)\n\t-a\tpurge all objects from cache\n"
msg_purge_failed="Error: purge command failed\n"
# msg_purge_url: url
msg_purge_url="Purging objects by exact url: %s\n"
# msg_purge_regex: regex
msg_purge_regex="Purging objects with URL matching regex: %s\n"
msg_purge_all="Purging all cache\n"
msg_purge_ok="Purge command successful\n"

# Load defaults file
if [ -f "$defaults" ]
then
    if [ -r "$defaults" ]
    then
        . "$defaults"
    else
        printf >&2 "$msg_defaults_not_readable" $defaults
        exit 1 
    fi
else
    printf >&2 "$msg_defaults_not_there" $defaults
    exit 1
fi

# parse command line arguments
while getopts hqu:r:a flag
do
    case $flag in
        h)
            printf >&2 "$msg_usage"
            exit 0
            ;; 
        u)
            purge_method=url
        url="$OPTARG"
            ;; 
        r)
            purge_method=regex
        regex="$OPTARG"
            ;; 
        a)
            purge_method=all
            ;; 
        q)
            quiet=1
            ;; 
        *)
            printf >&2 "$msg_usagen"
            exit 1
            ;; 
    esac
done

# Parse $DAEMON_OPTS (options must be kept in sync with varnishd).
# Extract the -f and the -T option, and (try to) ensure that the
# management interface is on the form hostname:address
OPTIND=1
while getopts a:b:dFf:g:h:l:n:P:p:s:T:t:u:Vw: flag $DAEMON_OPTS
do
    case $flag in
        f)
            if [ -f "$OPTARG" ]; then
                vcl_file="$OPTARG"
            fi 
            ;; 
        T)
            if [ -n "$OPTARG" -a "$OPTARG" != "${OPTARG%%:*}" ]
                then
                mgmt_interface="$OPTARG"
            fi  
            ;;  
    esac
done

# Sanity checks 
if [ ! -x "$varnishadm" ]
then
    printf >&2 "$msg_no_varnishadm" $varnishadm
    exit 1
fi

if [ -z "$mgmt_interface" ]
then
    printf >&2 "$msg_no_management"
    exit 1
fi

logfile=$($tempfile)
purge_command="vcl.list"

# Now run the purge command against the admin interface
if [ "$purge_method" = "url" ]
then
        purge_command="purge req.url == $url"
        printf >&2 "$msg_purge_url" $url | grep -v "^$" > $logfile
else
    if [ "$purge_method" = "regex" ]
    then
        purge_command="purge.url $regex"
        printf >&2 "$msg_purge_regex" $regex | grep -v "^$" > $logfile
    else
        if [ "$purge_method" = "all" ]
        then
            purge_command="purge.url ."
            printf >&2 "$msg_purge_all" | grep -v "^$" > $logfile
        fi
    fi
fi

# For some reason, using:
#
#   fi | grep -v "^$" > $logfile
#
# results in purge_command assignment being wiped out
# at the end of the block??

if [ -z "$purge_command" ]
then
    printf >&2 "$msg_usagen"
    exit 1
fi

# echo "cmd: $varnishadm -T $mgmt_interface $purge_command"

if $varnishadm -T $mgmt_interface $purge_command
then
    printf >&2 "$msg_purge_ok"
else
    printf >&2 "$msg_purge_failed"
    exitstatus=1
fi | grep -v "^$" > $logfile

# Blather
if [ -z "${quiet}" -o -n "$exitstatus" ]
then
    cat >&2 $logfile
fi

# Cleanup
rm -f $logfile  
exit $exitstatus

You can control how objects are purged from the cache with three options:

  • -a: purges all objects
  • -u <url>: purges an exact url
  • -r <regexp>: purges objects matching a regular expression

Examples

# Purges all objects
purge-cache -a

# Purges all objects starting with "/products"
purge-cache -r '^/products'

# Purges objects with exact URL
purge-cache -u '/en/homepage'

Goal: no downtime

Both reload-vcl and purge-cache can be combined together in a single script, triggered when deploying new VCL code or new backend applications. Instead of restarting varnish, which I really don't like, and which isn't very reliable either (on Debian it sometimes won't come back up), I use purge-cache -a to purge all objects and then reload-vcl to load and use the newly deployed VCL code.

This procedure has no downtime at all. The effect of purging all objects can potentially be hard on the backends, but we're not at that point yet. Usually in the busiest applications we have, it takes around 10-20 seconds to reach 70%-75% hit rate, so I would say that's not really a problem right now.
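
The deploy step then boils down to something like this (a sketch, assuming both scripts are in the PATH):

#!/bin/sh
# Zero-downtime VCL deploy: empty the cache, then load and
# activate the newly deployed VCL without restarting varnishd.
purge-cache -a && reload-vcl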

Download!

You can download the purge-cache script from github. I contacted the maintainer of the reload-vcl script; maybe he will include purge-cache in the next release of the varnish debian package… or maybe I could package it as a Perl CPAN module.

Another round of fixes for the varnish Accept-Language VCL extension

This very specific VCL extension to parse and normalize Accept-Language headers is becoming more robust as time goes by. Here's the latest round of fixes:

  • the original req.http.Accept-Language header could be overwritten when calling the vcl_rewrite_accept_language() function. Now this is fixed by copying the original header string into a static buffer and executing the processing on the copy.
  • improved a bit the style of the C code in a few places. Nothing miraculous really. Feels improved for me at least :)
  • fixed the use of a wrong define (string max length instead of languages list max length)
  • used sizeof instead of repeating the same constants in strncpy calls, etc…
  • added a small intro to the generated file too. In this way, if you find the code on some server, you can immediately understand what it's supposed to do and where it came from :)

Enough blah blah, here's the new code. If you weren't already running this piece of VCL, I won't tell you it's experimental stuff, because at this point it's not experimental anymore. And if you were running it already, then by all means upgrade: it is definitely better :)

http://github.com/cosimo/varnish-accept-language

Enjoy!

Puppet custom facts, and master-less puppet deployment

As I mentioned a few weeks ago, I'm using Puppet for some smaller projects here at work. They're pilot projects to see how puppet behaves and scales for us before taking it into bigger challenges.

One of the problems so far is that we're using fabric as the "last-mile" deployment tool, and that doesn't yet have any way to run jobs in parallel. That's the reason why I'm starting to look elsewhere, like mcollective for example.

However, today I had to prepare a new varnish box for files.myopera.com. This new machine is in a different data center than the current one, where we don't have any puppetmaster deployed yet; this had already stopped me from using puppet on another project. Lately, though, I've been reading on the puppet-users mailing list that several people have tried a master-less puppet deployment, with no puppetmasterd running: you just deploy the puppet files, via rsync, source control or pigeons, and then run the standalone puppet executable.

Puppet master-less setup

To do this, you need at least a good set of reusable puppet modules, which I tried to build a small piece at a time during the last few months. So I decided to give it a shot, and got everything up and running quickly. I deployed my set of modules in /etc/puppet/modules, and built a single manifest file that looks like the following:


#
# Puppet standalone no-master deployment
# for files.myopera.com varnish nodes
#
node varnish_box {

    # Basic debian stuff
    $disableservices = [ "exim4", "nfs-common", "portmap" ]
    service { $disableservices:
        enable => "false",
        ensure => "stopped",
    }

    # Can cause overload on the filesystem through cronjobs
    package { "locate": ensure => "absent", }
    package { "man-db": ensure => "absent", }

    # Basic configuration, depends on data center too
    include opera
    include opera::sudoers
    include opera::admins::core_services
    include opera::datacenters::dc2

    # Basic packages now. These are all in-house modules
    include base_packages
    include locales
    include bash
    include munin
    include cron
    include puppet
    include varnish

    varnish::config { "files-varnish-config":
        vcl_conf => "files.vcl",
        storage_type => "malloc",
        storage_size => "20G",
        listen_port => 80,
        ttl => 864000,
        thread_pools => 8,
        thread_min => 800,
        thread_max => 10000,
    }

    #
    # Nginx (SSL certs required)
    #
    include nginx

    nginx::config { "/etc/nginx/nginx.conf":
        worker_processes => 16,
        worker_connections => 16384,
        keepalive_timeout => 5,
    }

    nginx::vhost { "https.files.myopera.com":
        ensure => "present",
        source => "/usr/local/src/myopera/config/nginx/sites-available/https.files.myopera.com",
    }

    bash::prompt { "/root/.bashrc":
        description => "Files::Varnish",
        color => "red",
    }

    munin::plugin::custom { "cpuopera": }

    munin::plugin { "if_eth0":
        plugin_name => "if_"
    }

    munin::plugin {
        [ "mem_", "load", "df", "df_inode", "netstat", "vmstat",
          "iostat", "uptime", "threads", "open_files", "memory", "diskstats" ]:
    }
}

node default inherits varnish_box {
}

node 'my.hostname.opera.com' inherits varnish_box {
}

This manifest installs varnish, nginx, a bunch of basic packages I always want on every machine (vim, tcpdump, etc…), munin with the appropriate plugins already configured, and also a nice red bash prompt to warn me that this is production stuff.

This file is everything the puppet client needs to run and produce the desired effect, without needing a puppet master. Save it as varnish-node.pp and then you run it with:


puppet varnish-node.pp

One problem that usually arises is how to serve the static files. In this case, I assumed I'm going to check out the source code and config files from my own repository into /usr/local/src/... so I don't need to point puppet to a server with the classic:


source => "puppet:///module/filename"

but you can just use:


source => "/usr/local/whatever/in/my/local/filesystem"

That's great and it works just fine.

Custom facts

Puppet uses a utility called facter to extract "facts" from the underlying system, sysinfo-style. A typical facter run produces the following output:


$ facter
architecture => x86_64
domain => internal.opera.com
facterversion => 1.5.6
fqdn => cd01.internal.opera.com
...
hardwaremodel => x86_64
hostname => cd01
id => cosimo
ipaddress => 10.20.30.40
ipaddress_eth0 => 10.20.30.40
is_virtual => false
...
kernel => Linux
kernelmajversion => 2.6
...
operatingsystem => Ubuntu
operatingsystemrelease => 10.04
physicalprocessorcount => 1
processor0 => Intel(R) Core(TM)2 Duo CPU     E6550  @ 2.33GHz
processor1 => Intel(R) Core(TM)2 Duo CPU     E6550  @ 2.33GHz
processorcount => 2
...

and so on. Within puppet manifests, you can use any of these facts to influence the configuration of your system. For example, if memorysize > 4.0 GB, run varnish with 2000 threads instead of 1000. This is all very cool, but sometimes you need something that facter doesn't give you by default.
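
As a sketch of that kind of fact-driven tuning (mind that memorysize is a string like "7.80 GB", so the regex below is crude and purely illustrative):

# Pick a varnish thread limit based on facter's memorysize fact
$thread_max = $memorysize ? {
    /^([4-9]|[1-9][0-9]+)\./ => 2000,  # roughly: 4 GB or more
    default                  => 1000,
}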

That's why facter can be extended.

I tried creating a datacenter.rb facter plugin that would look at the box's IP address and figure out which data center we're located in. That in turn can be used to set up the nameservers and other things.

Here's the code. My Ruby-fu is less than awesome:


#
# Provide an additional 'datacenter' fact
# to use in generic modules to provide datacenter
# specific settings, such as resolv.conf
#
# Cosimo, 03/Aug/2010
#

Facter.add("datacenter") do
    setcode do

        datacenter = "unknown"

        # Get current ip address from Facter's own database
        ipaddr = Facter.value(:ipaddress)

        # Data center on Mars
        if ipaddr.match(/^88\.88\.88\./)
            datacenter = "mars"

        # This one on Mercury
        elsif ipaddr.match(/^99\.99\.99\./)
            datacenter = "mercury"

        # And on Jupiter
        elsif ipaddr.match(/^77\.77\.77\./)
            datacenter = "jupiter"
        end

        datacenter
    end
end

However, there's one problem. When puppet runs, it doesn't get the new fact, even though facter from the command line can see it and execute it just fine (when the plugin is in the current directory).

Now I need to know how to inform puppet (and/or facter) that it has to look into one of my puppet modules' plugin (or lib from 0.25.x) directory to load my additional datacenter.rb fact.

Any idea ??

Bandwidth limiting a varnish server

I know, this might sound like blasphemy. After all, why on earth would one want to limit bandwidth and serve clients slower, when using Varnish to accelerate HTTP serving in the first place?

Sounds silly, and it is; but if your Varnish servers are saturating your internet uplinks and you have other services running, you might want to investigate bandwidth limiting possibilities. On an installation of Varnish here at Opera, I constantly see nodes hitting 950+ Mbit/s peak and 600-700 Mbit/s average. That's pushing bits damn fast, close to the theoretical capacity of a 1 Gbit/s ethernet card.

Now, back to the problem. How to bw limit Varnish?

I think I know the Varnish project well enough to be sure that something like bandwidth limiting will never be implemented inside Varnish. Asking on the mailing list, I got the tip to start looking into tc and its companion howto.

I'd never heard of tc before, so I started searching around. Turns out that half of the internet is heavily outdated :) I found lots of old stale material, and very few working examples. After a bit of research, I came up with a semi-working solution.

tc is a linux command installed by default almost everywhere, I think. Its purpose is to control network traffic. It's not very simple to use, and I won't pretend I know it well. I just wish there were more examples of it around, easier to understand and recently updated.

The best example I found is archived at http://blog.edseek.com/~jasonb/articles/traffic_shaping/scenarios.html ("8.2 Guaranteeing rate"):

#!/bin/bash

RATE=8000

if [ x$1 = 'xstop' ]
then
        tc qdisc del dev eth0 root >/dev/null 2>&1
fi

tc qdisc add dev eth0 root handle 1: htb default 90
tc class add dev eth0 parent 1: classid 1:1 htb rate ${RATE}kbit ceil ${RATE}kbit

tc class add dev eth0 parent 1:1 classid 1:10 htb rate 6000kbit ceil ${RATE}kbit
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 1000kbit ceil ${RATE}kbit
tc class add dev eth0 parent 1:1 classid 1:50 htb rate 500kbit ceil ${RATE}kbit
tc class add dev eth0 parent 1:1 classid 1:90 htb rate 500kbit ceil 500kbit

tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10
tc qdisc add dev eth0 parent 1:20 handle 20: sfq perturb 10
tc qdisc add dev eth0 parent 1:50 handle 50: sfq perturb 10
tc qdisc add dev eth0 parent 1:90 handle 90: sfq perturb 10

tc filter add dev eth0 parent 1:0 protocol ip u32 match ip sport 80 0xffff classid 1:10
tc filter add dev eth0 parent 1:0 protocol ip u32 match ip sport 22 0xffff classid 1:20
tc filter add dev eth0 parent 1:0 protocol ip u32 match ip sport 25 0xffff classid 1:50
tc filter add dev eth0 parent 1:0 protocol ip u32 match ip sport 110 0xffff classid 1:50

These commands create a queueing discipline tree ("qdisc"), the control structure tc uses to know how to shape or rate-limit network traffic. You can do pretty much anything, but the code above creates one "htb" bucket for all the traffic. "htb" means "Hierarchy Token Bucket": it is meant to contain other buckets that can define different arbitrary bandwidth limits.

Simplifying: the main "htb" bucket (named "1:1") corresponds to the full pipe bandwidth, say 8Mbit/s. Under it, we create four more buckets, named "1:10", "1:20", "1:50" and "1:90", of 6Mbit/s, 1Mbit/s, 500kbit/s and 500kbit/s respectively. These are managed through "sfq", "Stochastic Fairness Queueing". Read: everyone gets their fair piece of the pie :)

So we have these four different pipes: 6Mbit/s, 1Mbit/s, 500kbit/s and 500kbit/s. In the last block, we decide which pipe the traffic should go through.

sport 80 means that traffic with source port 80 (outgoing, so from your clients' point of view the destination port), where your HTTP service is presumably listening, will get the big 6Mbit/s slice; sport 22 (ssh) will get 1Mbit/s, and so on…

Now this did work on my test machine, and I could set the bandwidth limit, download a file with wget and see that the speed was exactly matching the desired one, while other connections were unlimited. However, when I tried to put this in production on the actual Varnish machines, the same script and settings didn't work.

I figured I had to bandwidth-limit the whole "htb" bucket instead of limiting just the HTTP traffic. Which sucks, I guess, but it nevertheless works. So I'll copy/paste the entire magic here for whoever might be interested; maybe someone can explain why it doesn't work exactly like in my tests. Traffic measured with iptraf and iftop shows consistent results.

#!/bin/sh
#
# Set up bandwidth limiting for an interface / service. Based on 'tc'.
# Defaults can be overridden by /etc/default/traffic-shaper
# Cosimo, 2010/07/13
#
TC=/sbin/tc

test -f /etc/default/traffic-shaper && . /etc/default/traffic-shaper

IF=${IF:-eth0}
RATE=${RATE:-100Mbit}
HTTP_RATE=${HTTP_RATE:-50Mbit}
HTTP_PORT=${HTTP_PORT:-80}
SSH_RATE=${SSH_RATE:-500kbit}

echo "[$IF] HTTP (:$HTTP_PORT) rate=$HTTP_RATE/$RATE"
echo "[$IF] SSH  (:22) rate=$SSH_RATE"

#exit

if [ "x$1" = "xstop" ]; then
        echo 'Stopping traffic shaper...'
        $TC qdisc del dev $IF root >/dev/null 2>&1 && echo 'Done'
        exit
elif [ "x$1" = "xshow" ]; then
        $TC qdisc show dev $IF
        exit
elif [ "x$1" = "xstats" ]; then
        $TC -d -s qdisc show dev $IF
        exit
fi

echo "Traffic shaping setup ($HTTP_RATE/$RATE) on port $HTTP_PORT."
echo "Reserving $SSH_RATE for interactive sessions."

$TC qdisc add dev $IF root handle 1: htb default 10

# I should be using this line, but I had to replace it with the following
### $TC class add dev $IF parent 1: classid 1:1 htb rate ${RATE} ceil ${RATE}
$TC class add dev $IF parent 1: classid 1:1 htb rate ${HTTP_RATE} ceil ${RATE}

# Doesn't seem to have any effect (?)
$TC class add dev $IF parent 1:1 classid 1:10 htb rate ${HTTP_RATE} ceil ${RATE}
$TC class add dev $IF parent 1:1 classid 1:90 htb rate ${SSH_RATE} ceil ${RATE}

$TC qdisc add dev $IF parent 1:10 handle 10: sfq perturb 10
$TC qdisc add dev $IF parent 1:90 handle 90: sfq perturb 10

$TC filter add dev $IF parent 1:0 protocol ip u32 match ip sport $HTTP_PORT 0xffff classid 1:10
$TC filter add dev $IF parent 1:0 protocol ip u32 match ip sport 22 0xffff match ip dport 22 0xffff classid 1:90
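
For completeness, this is how I'd drive it, assuming the script is saved as traffic-shaper (stop, show and stats are the arguments handled above):

./traffic-shaper          # install the shaping rules
./traffic-shaper show     # display the configured qdiscs
./traffic-shaper stats    # per-class counters, useful to verify the filters match
./traffic-shaper stop     # remove all shaping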

Have fun, but don't try this at home :)

A couple of bugs fixed on the varnish accept-language VCL extension

Today I received a report of a bug in how the Opera.com startup page was presented to some browsers. In particular, to people using Brazilian Portuguese as their language.

This was traced down to being a bug in the accept-language.vcl extension that we are using on both My Opera and Opera.com.

In particular, the bug had to do with static buffers being reused and overwritten during the Accept-Language string parsing, essentially due to my non-optimal C skills :) Now this bug is fixed, and new test cases were added, hopefully avoiding regressions in the future.

So now clients with Accept-Language containing strings like pt-BR, es-ca, zh-CN, etc… will be correctly recognized by the VCL code, and they will get the correct page served by varnish.

As always, thanks to Opera, the code is freely available, and I have to say it's not really experimental, dangerous stuff anymore. It's been quite proven over the last few months (these few bugs excluded). At the very least, your Varnish server shouldn't explode :-) Have fun!