Monthly Archives: March 2010

Looking at Cassandra

When I start to get interested in a project, I usually join the project's users mailing list and read it for a couple of months, to get a general feeling for what's going on, what problems people have, etc…

I became very interested in Cassandra, "A highly scalable, eventually consistent, distributed, structured key-value store".
So, almost 2 months ago, I joined the cassandra-users mailing list.

Turns out that Cassandra is in production at huge sites like Twitter and Digg. That alone doesn't tell you much if you don't know how they use it, and what they use it for. However, guess what? There's a lot of information out there.

Here are a few links I'd like to share about Cassandra:

The MySQL Sandbox

I learned about MySQL Sandbox last year in Lisbon, at the European Perl conference. I remember talking about it with Giuseppe Maxia, the original author, and promising him I would try it.

I wasn't really convinced about it until the first time I tried it.

That was when we switched My Opera from master-slave to master-master replication, during a weekend last November. I was a bit scared of doing the switch. Then I remembered MySQL Sandbox, and I tried it on an old laptop.

I was absolutely amazed to discover that I could simulate pretty much any setup, from simple to complex: master-slave, master-master, circular multi-master, etc…
It was very quick to set up, and installing new sandboxed servers is also very fast.
You can set up a five-server replication topology in less than 30 seconds.

MySQL Sandbox is a Perl tool that you can easily install from CPAN with:

$ sudo cpan MySQL-Sandbox

You get the make_sandbox command, which allows you to create new sandboxes. Right now I'm trying it again for a maintenance operation we have to do on My Opera soon. I'm using the master-master setup like this:

make_replication_sandbox --master_master --server-version=/home/cosimo/mysql/mysql-5.0.51a-....tar.gz

so I can simulate the entire operation, minimize the risk of messing up the production boxes, and get a bit more confident about these procedures.
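For reference, this is roughly what a MySQL Sandbox session looks like for a single server. Paths and the exact sandbox directory name here are illustrative and depend on your tarball and MySQL version:

```shell
# build a single sandbox from a MySQL binary tarball
make_sandbox /home/cosimo/mysql/mysql-5.0.51a-linux-x86_64.tar.gz

# the sandbox lands under $HOME/sandboxes, with its own wrapper scripts
cd $HOME/sandboxes/msb_5_0_51a
./start                        # start this sandboxed mysqld
./use -e 'SELECT VERSION()'    # run a mysql client against it
./stop                         # shut it down
./clear                        # wipe the data, keep the binaries
```

Since every sandbox is self-contained under its own directory, you can start and stop as many as you like without touching any system-wide MySQL installation.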

MySQL Sandbox, try it. You won't go back :-)

A geolocating, distributed DNS service, geodns

It's been 2 months now that I formally changed my team from My Opera to Core Services. For most of my day, I still work on My Opera, but I get to work on other projects as well.

One of these projects was related to the browser-ballot screen, even though it's now being used for other purposes as well. It's a very interesting project, named geodns, and it is a DNS server.

Its purpose is not unique or new: creating geographically-aware DNS zones. An example (just an example): my.geo.opera.com, a geographically-aware my.opera.com that sends you to our US-based My Opera servers if your browser connects from an American IP address, to Norwegian servers if you are in Norway, etc… So, nothing new or particularly clever. Some even argue that DNS shouldn't be used this way. But it's really convenient anyway…

So this DNS server is written in Perl. It uses the omnipresent GeoIP library to look up the geographic position of each client IP address, and then uses this information to send the client to the appropriate server, based on some simple matching rules:

  • by specific country
  • by specific continent
  • if no other rule matches, pick a random server from the pool of those available, so every request still gets served
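The three rules above can be sketched in a few lines. This is a minimal illustration in Python (the real geodns is written in Perl); the server names and the rule-table layout are made up for the example:

```python
import random

# Hypothetical rule tables: country and continent codes mapping to
# lists of candidate servers. In geodns these would be derived from
# a GeoIP lookup on the client's IP address.
RULES = {
    "country":   {"US": ["us1.example.com"], "NO": ["no1.example.com"]},
    "continent": {"NA": ["us1.example.com"], "EU": ["no1.example.com"]},
}
ALL_SERVERS = ["us1.example.com", "no1.example.com"]

def pick_server(country, continent):
    # 1. match by specific country
    if country in RULES["country"]:
        return random.choice(RULES["country"][country])
    # 2. match by specific continent
    if continent in RULES["continent"]:
        return random.choice(RULES["continent"][continent])
    # 3. fall back to a random server from the whole pool
    return random.choice(ALL_SERVERS)
```

A client resolved to country "US" gets a US server, a Norwegian one gets a Norwegian server, and anything unmatched is spread randomly across the pool.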

I also made geodns log to a special file, which lets us use our own OpenGL engine to display realtime DNS hits on a photo-realistic 3D Earth.

In this picture you can see blue and red dots. The higher the spikes, the more requests there are. Blue is where our datacenters are, red is where clients are sending requests from.

I'm trying to get this published as open source, even though, as I said, it's not really complex or anything: just a good out-of-the-box solution. It's been running in production for about 3 weeks now, serving around 300 requests per second on average. Stable and fast, but now we're looking at increasing the traffic. My goal is to reach at least 2000 req/s on a single machine. We'll see… :)

Varnish “sess_workspace” and why it is important

When using Varnish on a high traffic site like opera.com or my.opera.com, it is important to reach a stable and sane configuration (both VCL and general service tuning).

If you're just starting to use Varnish, it's easy to overlook things (like I did, for example :) and later experience crashes or other unexpected problems.

Of course, you should read the Varnish wiki, but I'd suggest you also read at least the following links, which I found very useful:

A couple of weeks ago, we experienced some random Varnish crashes, one per day on average. It happened during a weekend, and as usual we didn't really notice that Varnish was crashing until we looked at our Munin graphs. Once you know that Varnish is crashing, everything is easier :)

Just look at your syslog file. We did, and we found the following error messages:

Feb 26 06:58:26 p26-01 varnishd[19110]: Child (27707) died signal=6
Feb 26 06:58:26 p26-01 varnishd[19110]: Child (27707) Panic message: Missing errorhandling code in HSH_Prepare(), cache_hash.c line 188:#012  Condition((p) != 0) not true.  thread = (cache-worker)sp = 0x7f8007c7f008 {#012  fd = 239, id = 239, xid = 1109462166,#012  client = 213.236.208.102:39798,#012  step = STP_LOOKUP,#012  handling = hash,#012  ws = 0x7f8007c7f078 { overflow#012    id = "sess",#012    {s,f,r,e} = {0x7f8007c7f808,,+16369,(nil),+16384},#012  },#012    worker = 0x7f82c94e9be0 {#012    },#012    vcl = {#012      srcname = {#012        "input",#012        "Default",#012        "/etc/varnish/accept-language.vcl",#012      },#012    },#012},#012
Feb 26 06:58:26 p26-01 varnishd[19110]: Child cleanup complete
Feb 26 06:58:26 p26-01 varnishd[19110]: child (3710) Started
Feb 26 06:58:26 p26-01 varnishd[19110]: Child (3710) said Closed fds: 3 4 5 10 11 13 14
Feb 26 06:58:26 p26-01 varnishd[19110]: Child (3710) said Child starts
Feb 26 06:58:26 p26-01 varnishd[19110]: Child (3710) said Ready
Feb 26 18:13:37 p26-01 varnishd[19110]: Child (7327) died signal=6
Feb 26 18:13:37 p26-01 varnishd[19110]: Child (7327) Panic message: Missing errorhandling code in HSH_Prepare(), cache_hash.c line 188:#012  Condition((p) != 0) not true.  thread = (cache-worker)sp = 0x7f8008e84008 {#012  fd = 248, id = 248, xid = 447481155,#012  client = 213.236.208.101:39963,#012  step = STP_LOOKUP,#012  handling = hash,#012  ws = 0x7f8008e84078 { overflow#012    id = "sess",#012    {s,f,r,e} = {0x7f8008e84808,,+16378,(nil),+16384},#012  },#012    worker = 0x7f81a4f5fbe0 {#012    },#012    vcl = {#012      srcname = {#012        "input",#012        "Default",#012        "/etc/varnish/accept-language.vcl",#012      },#012    },#012},#012
Feb 26 18:13:37 p26-01 varnishd[19110]: Child cleanup complete
Feb 26 18:13:37 p26-01 varnishd[19110]: child (30662) Started
Feb 26 18:13:37 p26-01 varnishd[19110]: Child (30662) said Closed fds: 3 4 5 10 11 13 14
Feb 26 18:13:37 p26-01 varnishd[19110]: Child (30662) said Child starts
Feb 26 18:13:37 p26-01 varnishd[19110]: Child (30662) said Ready

A quick bit of research led me to sess_workspace.

We found out we had to increase the default (16kb), especially since we do quite a bit of HTTP header copying and rewriting. In fact, if you do that, each Varnish session gets a workspace of at most sess_workspace bytes to work with.

If you happen to need more space, maybe because clients are sending long HTTP header values, or because you (like us) are writing lots of additional Varnish-specific headers, then Varnish won't be able to allocate enough workspace, and will just write the assert condition to syslog and drop the request.
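To give an idea of what eats into that space, here is a hypothetical VCL fragment along the lines of what I mean (header names are made up; this is not our actual config): every header you copy or rewrite per request is allocated from the session workspace.

```vcl
sub vcl_recv {
    # Each header copy/rewrite below consumes session workspace;
    # many of these, plus long client-sent headers, can exhaust
    # the default 16kb sess_workspace.
    set req.http.X-Forwarded-For = client.ip;
    set req.http.X-Language = regsub(req.http.Accept-Language,
                                     "^([a-z]{2}).*", "\1");
}
```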

So, we bumped sess_workspace to 256kb by setting the following in the startup file:


-p sess_workspace=262144

And we haven't had a single crash since.

More varnish, now also on www.opera.com

I have been working on setting up and troubleshooting Varnish installations quite a bit lately. After deploying Varnish on My Opera for many different uses, namely APIs, avatars, user pictures and the frontpage, we also decided to try using Varnish on www.opera.com.

While "www" might seem much simpler than My Opera, it has its own challenges.
It doesn't have logged in users, or user-generated content, but, as with My Opera, a single URL is used to generate many (slightly) different versions of the content. Think about older versions of Opera, or maybe newest (betas, 10.50), mobile browsers, Opera Mini, site in "Mobile view", different languages, etc…

That makes caching with Varnish tricky, because you have to consider all of these variables and instruct Varnish to cache each of these variations separately. In this respect, opera.com is no doubt even more difficult than My Opera.

So, we decided to:

  • cache only the most trafficked pages (for now only the Opera startup page)
  • cache them only for Opera 10.x browsers
  • differentiate caching by specific version (the "x" in 10.x)
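A sketch of how that per-version differentiation can be expressed in VCL (a hypothetical fragment in Varnish 2.x syntax, not our production config): normalize the browser version into one request header, then feed that header into the cache hash.

```vcl
sub vcl_recv {
    # Opera 10.x identifies itself with "Opera/9.80 ... Version/10.xx";
    # extract the specific version into a header of our own.
    if (req.http.User-Agent ~ "Version/10\.") {
        set req.http.X-Opera-Version = regsub(
            req.http.User-Agent, ".*Version/(10\.[0-9]+).*", "\1");
    } else {
        # everything else bypasses the cache
        return (pass);
    }
}

sub vcl_hash {
    # cache each specific 10.x version as a separate object
    set req.hash += req.http.X-Opera-Version;
}
```

The point of normalizing into a single header first is to keep the number of cached variations bounded: hashing on the raw User-Agent string would create one cache object per distinct UA.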

We basically used the same Varnish config as on My Opera, with the accept-language hack, changing only the URL-specific logic. With this setup, we managed to cut around 15% of backend requests on opera.com.