Monthly Archives: May 2013

Display and filter traffic at the varnish level: vlogdump

Haven’t written much in the last few months. The reason is that I’ve been at work building the Opera Discover service backend, that we launched on Opera mobile for Android just a few days ago.

A few weeks before, during the first stress test sessions of Discover, I wrote this little tool called vlogdump that Opera allowed me to put up on github. The main purpose, besides learning awk :-) is to display and filter traffic coming into your varnish daemon..

vlogdump is not meant to replace varnishlog but I know that sometimes varnishlog gives me too much output to deal with, especially if I want to pinpoint a single client or a single request. I know that the varnishlog that ships with varnish 3.0.x is way better in this regard, but we’re using 2.1.x, and that version of varnishlog is not as capable.

vlogdump is easier to look at than varnishlog, but at the same time it conveys much more information than varnishncsa or the typical access.log format.

Here’s an example of output:

$ varnishlog | vlogdump -v only_misses=1
172.22.0.15 => GET /assets/e85ed0a7b1b87120a0a2bfa025531c6733a48802 HTTP/1.0 MISS
            <= 200 OK 28.432 ms 172.22.0.18 => GET /assets/5a9e9440c5c85e8dc5d65e03e15c95e390901fa7 HTTP/1.0 MISS
            <= 200 OK 36.905 ms 172.22.0.18 => GET /icons/categories/te/icon32x32-technology.png HTTP/1.0 MISS
            <= 304 Not Modified 0.589 ms 172.22.0.15 => GET /api/fetch/article-preview/?client=2&language=en-GB HTTP/1.1 MISS
            <= 301 MOVED PERMANENTLY 8.381 ms 172.22.0.18 => GET /assets/c3830e95b717761005e26ce49ebab253e0ccb40b HTTP/1.0 MISS
            <= 200 OK 291.354 ms 172.22.0.18 => GET /api/category?client=2&language=en-GB HTTP/1.1 MISS
            <= 200 OK 58.025 ms   ...

Another interesting example.

Show request and response headers of transactions that resulted in cache hits and had request headers (any of them) matching "Android":

$ varnishlog | vlogdump -v show_req_headers=1 -v show_resp_headers=1 -v req_headers_match=Android -v only_hits=1
83.149.37.122 => GET /api/category/?... HTTP/1.1 HIT
            <= 200 OK         0.088 ms
   req.http.Accept = application/json;version=1
   req.http.Accept-Encoding = gzip
   req.http.Host = ...opera.com
   req.http.Connection = Keep-Alive
   req.http.User-Agent = Mozilla/5.0 (Linux; Android 4.1.2; GT-N7100 Build/JZO54K) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.58 Mobile Safari/537.31 OPR/14.0.1074.57768
 
   beresp.http.Server = Apache
   beresp.http.Content-Encoding = gzip
   beresp.http.Content-Type = application/json
   beresp.http.Vary = Accept-Encoding, Origin
   beresp.http.Content-Length = 4217
   beresp.http.Date = Sat, 25 May 2013 07:59:53 GMT
   beresp.http.X-Varnish = 1611090407 1611007435
   beresp.http.Age = 267
   beresp.http.Via = 1.1 varnish
   beresp.http.Connection = keep-alive

Now that you're eager to try it :-), you can do so in a few commands, and assuming you have the right awk installed:

wget -q -Ovlogdump https://raw.github.com/cosimo/vlogdump/master/vlogdump
varnishlog | ./vlogdump [options]

The documentation lists all the available options.

You can do more interesting things:

  • display request or response headers for each transaction
    (-v show_req_headers=1, -v show_resp_headers=1)
  • show only requests slower than 200ms (vlogdump -v only_slow=200)
  • show only cache misses or hits (-v only_hits=1 or only_misses=1)
  • show only transactions where the URL matches regexp X
    (-v url_match='X' or -v url_match='!X' for negative match)
  • show only transactions where the HTTP status code was X
    (-v only_status=X)
  • show only transactions where the request or response headers match a given regular expression (-v req_headers_match=Blah, -v resp_headers_match=Error)

You can also combine most of these options together. That is very useful when you are interested in a small fraction of the traffic, but you want to see the whole in-flight transactions.

One recommendation though. It is my first (last?) significant awk script :-) I know it works well, and I'm using it, but due to the way it works, I wouldn't leave it running for long periods of time, as it will slowly eat your memory keeping track of all transactions and clients.

If you have feedback or questions, feel free to comment on github or send me an email.