The My Opera front page
According to our internal statistics, the front page of My Opera makes up for a consistent part of the entire traffic we get on our servers. So it's normal we have been working to optimize it for a very long time.
When we knew that Opera Mini 5.0 would be released with our front page as one of the preloaded speed dials, then we started to study the situation in more depth and plan what to do (and quickly!).
Mini 5 is already out, and used by lots of people, and during the last months, we have been getting more and more front page views than ever. What I'm going to tell you is the last (final?) step of the front page performance optimizations we worked on. If it works well, we could be able to apply it to other heavy parts of the site.
Enter Varnish…
Varnish is a reverse proxy cache software.
If you know Varnish already, I suggest you take a look at this great presentation from OSCON 2009.
During October 2009, we deployed our first Varnish server for My Opera, for some very specific and mostly static content. At that time, for me it was very experimental. I hardly knew anything about Varnish :) and in fact, we had some problems here and there. Then we gradually acquired some experience, and so we thought of using Varnish also, and for the first time, for a dynamic request.
Front page caching
Caching a full HTML page presents more challenges than caching a picture. For pictures, you can ignore the User-Agent and the cookies. At least in our case. You can ignore user language preferences. You can also ignore the Accept-Language
HTTP header. For My Opera, we also have the Mobile view feature.
All of that means that if you're going to cache, say, the front page of My Opera, you can have:
- 4 main types of browsers: Opera Mini, Opera Mobile, IE and the standards compliant;
- 18 different languages, the ones in the language selector at the bottom, from Bulgarian to LOLCAT and Simplified Chinese, selected by either the sticky "language" cookie or by the
Accept-Language
header. - 2 views, mobile and full/desktop view
That makes a grand total of nearly 100 different versions of one single page.
Of course all of this is just for the logged out users. We don't want (and couldn't either) cache each single logged in user version of the frontpage (with the activity feed and all the rest).
Reducing the variations
For the caching to work properly, and be effective, we needed to find a way to reduce the possible number of versions of the front page. So in the Varnish VCL file, we match the User-Agent
string, to reduce it to any of 4 predefined strings like "operamini", "operamobile", "msie", or "nomatch". So instead of having &inf; user agent strings, we get only 4.
Then another similar problem is the Accept-Language
header. This header can be quite complex, depending on your browser settings, and there's no easy method to "figure-out" what language you want. From a string such as:
de-DE,tr;q=0.999,en;q=0.75,fr;q=0.9,it;q=0.8,ja;q=0.2,ru;q=0.1
you have to build a list of prioritized language preferences and match them against the languages your site can offer.
Failing to do that means, by default, having a different version of the frontpage for each different Accept-Language header, which is very variable across clients, even if there are very common values. A brief statistics gathering session showed 500 distinct values in about 10,000 browser requests.
accept-language.vcl
Varnish allows you to embed C code inside a VCL file. This is a pretty advanced feature that is not very much talked about. Given that using regexp to massage Accept-Language appeared to be messy, we discussed another crazy idea. Writing a C function to parse Accept-Language
, and then embed that function into the Varnish VCL config.
Let's say that your site has English and Japanese. Your user browsers will send every possible Accept-Language header on Earth. If you enable Vary: Accept-Language
on Varnish or on your backends (and you should)
the cache hit ratio will rapidly drop, because of the huge variations in Accept-Language contents. Varnish will store one version of the page for every different accept language string. That's bad.
With this hack, the Accept-Language
header will be "rewritten" to just "en" or "ja", depending on your client settings. If no match occurs, a default language will be set ("en"). This brings the language variants down to exactly 2, the number of languages your site supports. In our case it's 18 versions, so down from ~500 to 18.
It seems a bit weird that we're the only ones having this problem :)
Most probably we're trying to solve this problem directly in Varnish, while usually this is dealt with at the backend level. Solving this inside Varnish is very nice, because it allows to scale more easily to other pages as well, with no modifications to the backends config or code.
If you think this might be useful for you too, you're welcome to get the code and try it out. It's on Github:
http://github.com/cosimo/varnish-accept-language/
Pay attention! It's experimental stuff, don't try it in production without extensive testing. And let me know how it goes :)