Fast VCL checks for personalized backend responses

I’d like to talk about a problem I encountered a few years ago, and one possible solution to it. This particular problem stuck with me for a long time, for a couple of reasons.
The first is that at the time I considered the problem basically unsolvable: solving it would have been like having your cake and eating it too, as the proverb goes. The other reason is that this problem had me spinning my wheels, thinking about a solution, for a good while.

Without any pretense of this being a particularly clever solution or anything like that, I’d like to illustrate what the general problem is and a possible solution I came up with. Hopefully this will be useful to you.

The general problem

Suppose you have a backend request of some sort, an API or a particular web page. In my case it was a JSON-based recommendations API, which returned a list of recommended news articles to read. The specific purpose of the request is not terribly important. What matters is that the response can be personalized depending on the user making the request, which I believe is quite a common scenario.

In a recommendations context, it’s also common for a user not to be signed in to the service, or to be invoking the API for the first time. In that case the recommendations engine has no previous information about the user; this is the so-called cold start case.

In this specific project we had been operating in a “permanent cold start mode”, meaning the recommendations we offered were never differentiated per user. There were a few knobs and settings to influence which type of recommendations one would get from the system (e.g. fewer Sports articles and more Arts or Travel), but the system would not learn over time or change its recommendations based on user signals such as which articles had been read.

Among other things, this mode of operation allowed us to serve our entire user base (around 90M monthly active users, around 10M weekly) with only two servers per data center, thanks in part to a very aggressive caching strategy.

When we started experimenting with personalized recommendations, it was immediately clear that we would not be able to handle the additional backend load caused by all the per-user requests. We estimated that, given the drop in cache hit ratio, we would need something ridiculous like 50x the number of servers. For each API request, we would have to:

  1. fetch the distinct user profile
  2. check if the profile contained any information about previously read articles or otherwise useful information to personalize the offered recommendations
  3. compute and return the personalized recommendations

These steps can only be performed by the recommendations engine backend, which means we would get no help from our Varnish caching layer. That made personalized recommendations much harder to implement for us, at least without employing an inordinate number of servers and significantly rebuilding our system infrastructure.

You could very well say that that is a problem in itself, and it probably is :-)

A possible solution

I remember spending quite some time thinking about this without seeing any possible solution. Then one day I attended a meetup where one of the engineers talked about the Varnish API Engine, a commercial Varnish add-on that can implement authentication and paywalls directly in the caching layer. The speaker mentioned that the API Engine embeds the SQLite3 database, and that this was crucial to its performance, since the caching layer is effectively the first bottleneck of a system.

I connected the dots almost immediately and I realized I had a possible way forward to solve my problem. This is how I imagined I could approach the problem:

  • organize user signal collection (which articles each user is reading, etc…) and user profile building as a completely separate batch activity
  • every x hours, build an SQLite database with a single table, user_profiles, consisting of two columns, a user_id string and a has_profile boolean. With such a table in place, finding out whether we can build a significantly personalized set of recommendations for a user is only an SQL primary-key lookup away (see the sketch after this list).
  • using the excellent SQLite3 vmod, implement this SQL lookup in our existing Varnish VCL layer, making sure the code can never fail: if the database file does not exist, or is for some reason corrupt, etc…, we want to behave as if the user making the current request had no personalized profile.
  • ensure that we could update the SQLite database file at any time, without stopping Varnish, and that the new file would be visible to the SQL queries immediately, or at least after a short delay.
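
As a rough sketch, and using the table and column names from the plan above (the schema we eventually deployed, shown later in user_profiles.vcl, uses profile_id and data instead), the batch job could produce a database along these lines:

-- Sketch of the batch-built lookup database. Names follow the plan above;
-- the deployed schema shown later uses profile_id/data instead.
CREATE TABLE user_profiles (
    user_id     TEXT    PRIMARY KEY NOT NULL,
    has_profile BOOLEAN NOT NULL DEFAULT 1
);

-- The batch job inserts one row per user with a meaningful profile
-- ('example-user-id' is just a placeholder).
INSERT INTO user_profiles (user_id, has_profile) VALUES ('example-user-id', 1);

-- At request time, the whole check boils down to a single primary-key lookup.
SELECT has_profile FROM user_profiles WHERE user_id = 'example-user-id';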

We tested the whole setup and it seemed to work correctly. The final step consisted in actually computing the personalized profiles, building the real SQLite database, syncing it to the frontend (Varnish) servers, and implementing the dispatch logic in the VCL layer.

This is more or less the final logic I used:

  • If the request is for an anonymous user, don’t even perform the user profiles SQL lookup, and return the generic cached recommendations payload.
  • If the request comes from a user that has no personalized profile, that is, no record is present in the SQLite table, also return the generic recommendations payload.
  • If the user profile lookup is positive, that is, a record exists in the user_profiles table in SQLite and its has_profile flag is true, then pass the request on to the backend: we know this request must be personalized, and only the backend can do that.

Such logic allows us to serve the majority of the user base, which presumably has not logged in or does not have any significant user profile yet, while caching as much as possible, and at the same time to serve personalized recommendations to all the users that do have a profile.

We are shifting the critical decision as early in the chain as possible, that is, into the caching layer (Varnish or similar), before the backend service is even consulted. Leaving the decision to the backend service would not have been feasible, for the reasons already discussed.

The actual code

We used Puppet as our configuration management tool back then, with a custom varnish module. I extended the existing manifest to add a new user_profiles.vcl file and to install the sqlite3 vmod for Varnish by default.

The existing VCL code was also modified to:

  • perform the personalized profile SQL query
  • decide whether to pass the request based on the result of the SQL query

The following code illustrates those two steps:

diff --git a/config.vcl b/config.vcl
index 8e25a8a..50c70ce 100644
--- a/config.vcl
+++ b/config.vcl
@@ -1,22 +1,23 @@
 # Recommender system VCL config

 include "/etc/varnish/accept-encoding.vcl";
 include "/etc/varnish/purge.vcl";
 include "/etc/varnish/x-forwarded-for.vcl";
 include "/etc/varnish/auth.vcl";
 include "/etc/varnish/stats.vcl";
+include "/etc/varnish/user_profiles.vcl";
 include "/etc/varnish/strip-tracking-cookies.vcl";

 backend apache {
     .host  = "127.0.0.1";
     .port  = "8000";
     .probe = {
         .url       = "/ping.html";
         .interval  = 10s;
         .timeout   = 5s;
         .window    = 20;
         .threshold = 3;
         .initial   = 3;
     }
 }
@@ -147,45 +148,49 @@ sub vcl_recv {
     if (req.backend.healthy && req.http.User-Agent ~ "McHammer") {
         return (pass);
     }

     # Client clicks must go through the backend (*with* client-id cookie)
     if (req.url ~ "^/api/1\.0/feedback/") {
         return (pass);
     }

     call check_authorization;
+    call check_user_profile;
     call accept_encoding_normalize;

+    # Users with tracking cookies can be served personalized results
+    if (req.http.X-Profile == "1") {
+        std.log("User has customized profile. Rolling the dice.");
+        # Initially keep the percentage of PASS very low, to test the waters.
+        if (std.random(0, 100) < 1.0) {
+            std.log("User has customized profile and within 1.0%. Passing.");
+            return (pass);
+        }
+    }

 }

The new user_profiles.vcl file consisted of the following code:

#-----------------------------------------------------------------------------
# Fast check for personalized user profiles
#-----------------------------------------------------------------------------
#
# The general idea is to use this fast check to send users who we know
# have a personalized user profile to the backend without caching, while
# retaining the ability to send cached objects for everyone else.
#
# Uses a SQLite3 database and libvmod-sqlite3 by Federico Schwindt:
# https://github.com/fgsch/libvmod-sqlite3
#
# Extracts the `userId' from the HTTP Cookie header.
# Looks up the profile_id key having value equal to the `userId' cookie value.
# The underlying schema is very simple:
#
#   CREATE TABLE user_profiles (
#       profile_id char(100) PRIMARY KEY NOT NULL,
#       data text
#   );
#
# At least initially we will not use the data column.

import sqlite3;

sub vcl_init {
    # Open the profiles database once, at VCL initialization time.
    # The second argument is the delimiter string used by the vmod
    # when formatting query results.
    sqlite3.open("/etc/varnish/user_profiles.db", "|;");
}

sub check_user_profile {

    # Quick yes/no test for the userId cookie
    if (req.http.Cookie ~ "userId=") {

        # Extract the userId value from the Cookie header, which remains
        # untouched. Make sure we can still extract the userId value even
        # if there are other cookies before/after ours.
        #
        # XXX Not sure what happens when client sends multiple Cookie lines.
        set req.http.X-Profile-Id = regsub(req.http.Cookie,
            "(?:^|.*;\s*)(?:userId=(.*?))\s*(?:;.*|$)", "\1");

        # No need to do anything if userId hasn't been found
        if (req.http.X-Profile-Id != "") {
            #std.log("Checking profile_id: " + req.http.X-Profile-Id);

            # First case of VCL-injection vulnerability :-)
            set req.http.X-Profile = sqlite3.exec(
                "SELECT 1 FROM user_profiles WHERE profile_id='"
                + req.http.X-Profile-Id
                + "'");

            # req.http.X-Profile !~ "^SQL" to catch errors like missing DB,
            # but seems a bit fragile. Depends on libsqlite3 and/or the vmod.
            if (req.http.X-Profile == "1") {
                std.log("User profile " + req.http.X-Profile-Id
                    + " found (" + req.http.X-Profile + ")");
            }
            else {
                std.log("User profile " + req.http.X-Profile-Id
                    + " not found");
            }
        }
    }
}
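
As a quick sanity check outside of Varnish, the same lookup can be run directly against the database file with the sqlite3 command-line shell (the profile id below is just a placeholder):

-- Run with: sqlite3 /etc/varnish/user_profiles.db
-- Prints 1 when the profile exists, nothing otherwise.
SELECT 1 FROM user_profiles WHERE profile_id = 'some-client-id';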

The commit message

I believe that good solutions deserve awesome commit messages. Here’s what I wrote:

Date:   Thu Jan 28 19:36:46 2016 +0100

    Fast VCL check for personalized profile existence

    How to have the cake and eat it too. Serve cached objects to the majority of
    users while personalizing recommendations to the ones that actually have a
    significant user profile available.

    Got the idea from the Varnish API engine[1].

    It's possible to perform tens of thousands of sqlite database lookups a second
    while processing requests in Varnish through VCL, thanks to SQLite3 being very
    lightweight and in this case embedded right inside Varnish through the sqlite3
    vmod[2].

    This commit hopefully adds all there is to it. The last bit is obviously the
    database file, which I placed in `/etc/varnish/user_profiles.db'. We will need
    to generate the .db file from the clicker and sync it to all frontends.

    Updates seem to be received immediately.

    When no database file is present, as will be in the initial deployment, the
    `check_user_profile()' function will work normally, signaling that no custom
    user profile has been found.

    [1] https://www.varnish-software.com/products/varnish-api-engine
    [2] https://github.com/fgsch/libvmod-sqlite3

How to roll out gradually?

Another interesting aspect is the way we could “control the flow” to this personalized recommendations API, that is, decide what percentage of the users that had personalized profiles would actually get personalized recommendations.

A gradual rollout would certainly be the best approach, and it was implemented in two different ways:

  • once the SQL lookup was performed and the result was positive, we would still “roll the dice” and only allow 1% (or 5%, 10%) of those requests to actually pass through to the backend for personalized recommendations. This was an additional safety measure.
  • when batch-building the SQLite database, we could decide to curtail the number of users with personalized profiles, for example by excluding all users that had not read at least 5 or 10 articles. This barrier served two purposes: it limited the number of users included in the SQLite database, and at the same time it ensured we had accumulated significant user profile information before attempting to serve personalized recommendations. A sort of win-win I didn’t expect at first :-) (a sketch of this filtering step follows the list).
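
Purely as an illustration of that second filter, the batch job could populate the lookup table with a query along these lines. The article_reads table and its columns are hypothetical; the real source was our click/signal collection, whatever shape it had:

-- Hypothetical batch step, using the deployed profile_id/data schema:
-- only users with at least 5 read articles end up in the lookup database.
-- article_reads(user_id, article_id) is a made-up source table.
INSERT INTO user_profiles (profile_id, data)
SELECT user_id, NULL
  FROM article_reads
 GROUP BY user_id
HAVING COUNT(*) >= 5;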

As usual, if you have any feedback, email me or write below (but comments are subject to approval due to lots of spam).
