Varnish "sess_workspace" and why it is important

When using Varnish on a high traffic site like opera.com or my.opera.com, it is important to reach a stable and sane configuration (both VCL and general service tuning).

If you're just getting started with Varnish, it's easy to overlook things (like I did, for example :) and later run into crashes or unexpected problems.

Of course, you should read the Varnish wiki, but I'd suggest you also read at least the following links. I found them very useful:

Kristian Lyngstøl's blog: the Varnish-related posts, but other stuff as well. I had the opportunity to attend a 2-day Varnish training at Linpro, and he taught the course. I can't say enough good things about the advice Kristian gives in his blog. Really, go read it now!
Other users' mails on the varnish-misc mailing list. In particular, two messages carry so much helpful information that one could probably study them for a month: this one by Twitter's John Adams, and this one by Audun Ytterdal, now working at VG.no, one of the biggest Norwegian newspapers.
Artur Bergman's OSCON 2009 talk on Varnish (PDF, or on slideshare). Dense with useful tips; Bergman runs a high-traffic site chaining multiple distributed Varnish servers.

A couple of weeks ago, we experienced some random Varnish crashes, 1 per day on average. That happened during a weekend. As usual, we didn't really notice that Varnish was crashing until we looked at our Munin graphs. Once you know that Varnish is crashing, everything is easier :)

Just look at your syslog file. We did, and we found the following error message:


Feb 26 06:58:26 p26-01 varnishd[19110]: Child (27707) died signal=6
Feb 26 06:58:26 p26-01 varnishd[19110]: Child (27707) Panic message: Missing errorhandling code in HSH_Prepare(), cache_hash.c line 188:#012  Condition((p) != 0) not true.  thread = (cache-worker)sp = 0x7f8007c7f008 {#012  fd = 239, id = 239, xid = 1109462166,#012  client = 213.236.208.102:39798,#012  step = STP_LOOKUP,#012  handling = hash,#012  ws = 0x7f8007c7f078 { overflow#012    id = "sess",#012    {s,f,r,e} = {0x7f8007c7f808,,+16369,(nil),+16384},#012  },#012    worker = 0x7f82c94e9be0 {#012    },#012    vcl = {#012      srcname = {#012        "input",#012        "Default",#012        "/etc/varnish/accept-language.vcl",#012      },#012    },#012},#012
Feb 26 06:58:26 p26-01 varnishd[19110]: Child cleanup complete
Feb 26 06:58:26 p26-01 varnishd[19110]: child (3710) Started
Feb 26 06:58:26 p26-01 varnishd[19110]: Child (3710) said Closed fds: 3 4 5 10 11 13 14
Feb 26 06:58:26 p26-01 varnishd[19110]: Child (3710) said Child starts
Feb 26 06:58:26 p26-01 varnishd[19110]: Child (3710) said Ready
Feb 26 18:13:37 p26-01 varnishd[19110]: Child (7327) died signal=6
Feb 26 18:13:37 p26-01 varnishd[19110]: Child (7327) Panic message: Missing errorhandling code in HSH_Prepare(), cache_hash.c line 188:#012  Condition((p) != 0) not true.  thread = (cache-worker)sp = 0x7f8008e84008 {#012  fd = 248, id = 248, xid = 447481155,#012  client = 213.236.208.101:39963,#012  step = STP_LOOKUP,#012  handling = hash,#012  ws = 0x7f8008e84078 { overflow#012    id = "sess",#012    {s,f,r,e} = {0x7f8008e84808,,+16378,(nil),+16384},#012  },#012    worker = 0x7f81a4f5fbe0 {#012    },#012    vcl = {#012      srcname = {#012        "input",#012        "Default",#012        "/etc/varnish/accept-language.vcl",#012      },#012    },#012},#012
Feb 26 18:13:37 p26-01 varnishd[19110]: Child cleanup complete
Feb 26 18:13:37 p26-01 varnishd[19110]: child (30662) Started
Feb 26 18:13:37 p26-01 varnishd[19110]: Child (30662) said Closed fds: 3 4 5 10 11 13 14
Feb 26 18:13:37 p26-01 varnishd[19110]: Child (30662) said Child starts
Feb 26 18:13:37 p26-01 varnishd[19110]: Child (30662) said Ready

A bit of quick research led me to sess_workspace.

We found out we had to increase the default (16 KB), especially since we do quite a bit of HTTP header copying and rewriting. In fact, if you do that, each Varnish thread uses a memory space of at most sess_workspace bytes.

If you happen to need more space, maybe because clients are sending long HTTP header values, or because you are (like we are) writing lots of additional Varnish-specific headers, then Varnish won't be able to fit everything into that workspace. The assertion fails, the panic message is written to syslog, and the child process dies and gets restarted, as in the log above.
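To spot this kind of failure without reading the raw panic dumps by eye, a small script can scan syslog lines for the workspace section of the panic message. What follows is only a minimal sketch: the function name `find_workspace_overflows` is made up for illustration, and it assumes that the two `+N` values in the `{s,f,r,e}` dump are the used offset and the total size of the workspace, which is how the numbers in our log appear to read (16369 of 16384 bytes).

```python
import re

# Matches the workspace dump inside a panic line, for example:
#   ws = 0x7f8007c7f078 { overflow#012    id = "sess",#012    {s,f,r,e} = {...},
WS_RE = re.compile(
    r'ws = 0x[0-9a-f]+ \{ (?P<overflow>overflow)?'
    r'.*?id = "(?P<id>\w+)"'
    r'.*?\{s,f,r,e\} = \{(?P<fields>[^}]*)\}'
)


def find_workspace_overflows(lines):
    """Yield (workspace_id, used_bytes, size_bytes) for each panic line
    whose workspace dump is flagged with 'overflow'."""
    for line in lines:
        if "Panic message" not in line:
            continue
        match = WS_RE.search(line)
        if match is None or not match.group("overflow"):
            continue
        # Assumption: the two "+N" fields are the 'f' (used) and 'e' (end)
        # offsets, both relative to the start of the workspace.
        offsets = [int(n) for n in re.findall(r"\+(\d+)", match.group("fields"))]
        if len(offsets) != 2:
            continue
        used, size = offsets
        yield match.group("id"), used, size


# Shortened copy of the first panic line from the syslog excerpt above.
SAMPLE = (
    'Feb 26 06:58:26 p26-01 varnishd[19110]: Child (27707) Panic message: '
    'Missing errorhandling code in HSH_Prepare(), cache_hash.c line 188:#012'
    '  Condition((p) != 0) not true.  thread = (cache-worker)sp = 0x7f8007c7f008 {#012'
    '  fd = 239, id = 239, xid = 1109462166,#012'
    '  step = STP_LOOKUP,#012  handling = hash,#012'
    '  ws = 0x7f8007c7f078 { overflow#012    id = "sess",#012'
    '    {s,f,r,e} = {0x7f8007c7f808,,+16369,(nil),+16384},#012  },'
)

for ws_id, used, size in find_workspace_overflows([SAMPLE]):
    print(ws_id, used, size)
```

To run it against a real log, pass an open file instead of `[SAMPLE]`, for example `find_workspace_overflows(open("/var/log/syslog"))`; the path is an assumption and depends on your syslog setup.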

So, we bumped sess_workspace to 256 KB by setting the following in the startup file:


-p sess_workspace=262144

Since then, we haven't had any more crashes.
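The parameter value is given in bytes, so it is easy to get the arithmetic wrong when thinking in kilobytes. As a quick sanity check, here is a tiny sketch (the helper name `sess_workspace_flag` is hypothetical) that builds the startup option from a size in KB, assuming 1 KB = 1024 bytes:

```python
def sess_workspace_flag(kilobytes):
    """Return the varnishd startup option for a session workspace
    of the given size in KB (1 KB = 1024 bytes)."""
    if kilobytes <= 0:
        raise ValueError("workspace size must be positive")
    return "-p sess_workspace={}".format(kilobytes * 1024)


print(sess_workspace_flag(256))  # the value we ended up using
print(sess_workspace_flag(16))   # the old default
```

The 16 KB default works out to 16384 bytes, which matches the `+16384` figure in the panic message, and 256 KB gives the 262144 used in our startup file.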
