Monthly Archives: January 2010

URL shortening in Ubiquity for Opera too…

Given the cool recent activity around URL shortening in Opera, I thought I could make my own small contribution too.

In fact, a URL shortening command is missing in Ubiquity for Opera. Or rather, it was missing.

There's a new command, shorten-url, based on bit.ly's API. As usual with Ubiquity commands, it lets you shorten the URL of the currently open tab, or any URL you type in the Ubiquity window. Here you can see a screenshot as an example:

This new command also uses the amazing ajax-enabling UserJS library by xErath. Another piece of news is that from now on I'll use YUI Compressor to also ship a minified version of the Ubiquity JavaScript code. Minification almost halves the size, which is good, since we're already at ~80 KB uncompressed.

As usual, you can download Ubiquity for Opera (or the minified version), or go to the Ubiquity for Opera GitHub repository.

Enjoy :-)

My Opera front page caching and Varnish hacking

The My Opera front page

According to our internal statistics, the front page of My Opera accounts for a considerable part of the entire traffic we get on our servers. So it's no surprise that we have been working on optimizing it for a very long time.

When we learned that Opera Mini 5.0 would be released with our front page as one of the preloaded speed dials, we started to study the situation in more depth and plan what to do (and quickly!).

Mini 5 is already out and used by lots of people, and over the last few months we have been getting more front page views than ever. What I'm going to describe is the latest (final?) step of the front page performance optimizations we worked on. If it works well, we may be able to apply it to other heavy parts of the site.

Enter Varnish…

Varnish is a reverse proxy cache.
If you don't know Varnish already, I suggest you take a look at this great presentation from OSCON 2009.

During October 2009, we deployed our first Varnish server for My Opera, for some very specific and mostly static content. At that time, it was very experimental for me. I hardly knew anything about Varnish :) and in fact, we had some problems here and there. We gradually gained experience, and then thought of using Varnish, for the first time, for a dynamic request too.

Front page caching

Caching a full HTML page presents more challenges than caching a picture. For pictures, you can ignore the User-Agent, the cookies, the user's language preferences and the Accept-Language HTTP header. At least in our case. For an HTML page, you can't ignore any of them, and for My Opera we also have the Mobile view feature.

All of that means that if you're going to cache, say, the front page of My Opera, you can have:

  • 4 main types of browsers: Opera Mini, Opera Mobile, IE and standards-compliant browsers;
  • 18 different languages, the ones in the language selector at the bottom, from Bulgarian to LOLCAT and Simplified Chinese, selected by either the sticky "language" cookie or the Accept-Language header;
  • 2 views: mobile and full/desktop.

That makes a grand total on the order of 100 different versions of one single page (4 × 18 × 2 = 144 if every combination occurs).
Of course, all of this is just for logged-out users. We don't want to (and couldn't anyway) cache every single logged-in user's version of the front page (with the activity feed and all the rest).
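
As a quick sanity check on the numbers above, here is a minimal Python sketch that enumerates every combination of the three dimensions. The language labels are placeholders (only the count of 18 comes from the text), and the result is an upper bound: the number of versions actually served may be lower if some combinations never occur in practice.

```python
from itertools import product

# Only the sizes of the three dimensions come from the text;
# the language labels are placeholders, not the real language codes.
browsers = ["operamini", "operamobile", "msie", "nomatch"]
languages = [f"lang{i:02d}" for i in range(18)]
views = ["mobile", "desktop"]

# Every (browser, language, view) triple is a distinct cached page.
variants = list(product(browsers, languages, views))
print(len(variants))  # 144
```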

Reducing the variations

For the caching to work properly and be effective, we needed to find a way to reduce the possible number of versions of the front page. So in the Varnish VCL file, we match the User-Agent string and reduce it to one of 4 predefined strings: "operamini", "operamobile", "msie", or "nomatch". So instead of having ∞ user agent strings, we get only 4.
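
To illustrate the idea (this is not our actual VCL), here is a small Python sketch of the same kind of bucketing. The regular expressions are assumptions made for this example; the real VCL rules may match on different substrings.

```python
import re

# Hypothetical patterns, checked in order: the first match wins.
UA_BUCKETS = [
    ("operamini", re.compile(r"Opera Mini", re.I)),
    ("operamobile", re.compile(r"Opera Mobi", re.I)),
    ("msie", re.compile(r"MSIE")),
]

def normalize_user_agent(ua):
    """Map any User-Agent string to one of 4 cache-friendly buckets."""
    for bucket, pattern in UA_BUCKETS:
        if pattern.search(ua or ""):
            return bucket
    return "nomatch"  # everything else shares a single cached version

print(normalize_user_agent("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)"))  # msie
```

The cache then varies on the bucket name instead of the raw header, so two different IE builds share one cached page.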

Another similar problem is the Accept-Language header. This header can be quite complex, depending on your browser settings, and there's no easy way to figure out what language you want. From a string such as:

de-DE,tr;q=0.999,en;q=0.75,fr;q=0.9,it;q=0.8,ja;q=0.2,ru;q=0.1

you have to build a list of prioritized language preferences and match them against the languages your site can offer.
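
The first half of that job, building the prioritized list, can be sketched in a few lines of Python. This is only an illustration of the parsing rules (a missing q means 1.0, higher q wins), not the code we run in Varnish, and the function name is made up for this example.

```python
from typing import List, Tuple

def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (language-tag, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        q = 1.0  # in HTTP, a missing q parameter means q=1
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # unparsable weight: treat as not acceptable
        prefs.append((tag, q))
    # sorted() is stable, so entries with equal weights keep header order
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)

header = "de-DE,tr;q=0.999,en;q=0.75,fr;q=0.9,it;q=0.8,ja;q=0.2,ru;q=0.1"
print([tag for tag, q in parse_accept_language(header)])
# ['de-de', 'tr', 'fr', 'it', 'en', 'ja', 'ru']
```

Note how "en" appears third in the header but only fifth in the prioritized list, which is why a simple "first match" regexp is not enough.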

Failing to do that means, by default, having a different version of the front page for each different Accept-Language header. The header varies a lot across clients, even though some values are very common. A brief statistics-gathering session showed 500 distinct values in about 10,000 browser requests.

accept-language.vcl

Varnish allows you to embed C code inside a VCL file. This is a pretty advanced feature that isn't talked about very much. Given that using regular expressions to massage Accept-Language appeared to be messy, we discussed another crazy idea: writing a C function to parse Accept-Language, and then embedding that function into the Varnish VCL config.

Let's say that your site has English and Japanese. Your users' browsers will send every possible Accept-Language header on Earth. If you enable Vary: Accept-Language on Varnish or on your backends (and you should), the cache hit ratio will rapidly drop, because of the huge variations in Accept-Language contents. Varnish will store one version of the page for every different Accept-Language string. That's bad.

With this hack, the Accept-Language header will be "rewritten" to just "en" or "ja", depending on your client settings. If no match occurs, a default language will be set ("en"). This brings the language variants down to exactly 2, the number of languages your site supports. In our case it's 18 versions, so down from ~500 to 18.
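
The effect of the rewrite can be sketched in Python. This is not the C code from the repository, just an illustration of the behaviour described above, with some simplifications that are assumptions of this example: only the primary subtag is compared ("en-US" counts as "en"), and wildcards ("*") are ignored.

```python
def normalize_accept_language(header, supported=("en", "ja"), default="en"):
    """Collapse a raw Accept-Language header to one supported language code."""
    candidates = []
    for position, item in enumerate((header or "").split(",")):
        tag, _, params = item.strip().partition(";")
        primary = tag.strip().lower().split("-")[0]  # "en-US" -> "en"
        q = 1.0  # a missing q parameter means q=1
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if primary in supported and q > 0:
            # Store -q so that min() picks the highest weight;
            # position breaks ties in favour of the earlier entry.
            candidates.append((-q, position, primary))
    return min(candidates)[2] if candidates else default

raw_headers = [
    "en-US,en;q=0.8",
    "ja,en;q=0.5",
    "de-DE,tr;q=0.999,en;q=0.75,fr;q=0.9,it;q=0.8,ja;q=0.2,ru;q=0.1",
    "fr-FR,fr;q=0.9",  # nothing supported: falls back to the default
    "",
]
variants = {normalize_accept_language(h) for h in raw_headers}
print(sorted(variants))  # ['en', 'ja']
```

Five different raw headers end up as two cache variants, which is the whole point: the cache varies on the rewritten value, not on what the browser originally sent.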

It seems a bit weird that we're the only ones having this problem :)

Most probably that's because we're trying to solve this problem directly in Varnish, while it's usually dealt with at the backend level. Solving it inside Varnish is very nice, because it lets us scale more easily to other pages as well, with no modifications to the backends' config or code.

If you think this might be useful for you too, you're welcome to get the code and try it out. It's on GitHub:

http://github.com/cosimo/varnish-accept-language/

Be careful! It's experimental stuff, so don't try it in production without extensive testing. And let me know how it goes :)