My Opera front page caching and Varnish hacking

The My Opera front page

According to our internal statistics, the front page of My Opera accounts for a considerable share of the total traffic on our servers. So it's no surprise that we have been working on optimizing it for a very long time.

When we learned that Opera Mini 5.0 would be released with our front page as one of the preloaded speed dials, we started to study the situation in more depth and plan what to do (and quickly!).

Mini 5 is already out and used by lots of people, and over the last few months we have been getting more front page views than ever. What I'm going to tell you about is the last (final?) step of the front page performance optimizations we worked on. If it works well, we might be able to apply it to other heavy parts of the site.

Enter Varnish…

Varnish is reverse proxy caching software.
If you don't know Varnish yet, I suggest you take a look at this great presentation from OSCON 2009.

In October 2009, we deployed our first Varnish server for My Opera, for some very specific and mostly static content. At that time, it was very experimental for me. I hardly knew anything about Varnish :) and in fact, we had some problems here and there. Then we gradually gained some experience, so we thought of using Varnish, for the first time, for a dynamic request as well.

Front page caching

Caching a full HTML page presents more challenges than caching a picture. For pictures, at least in our case, you can ignore the User-Agent, the cookies, the user's language preferences and the Accept-Language HTTP header. For an HTML page, you can't ignore any of them. And on My Opera, we also have the Mobile view feature.

All of that means that if you're going to cache, say, the front page of My Opera, you can have:

  • 4 main types of browsers: Opera Mini, Opera Mobile, IE and the standards-compliant ones;
  • 18 different languages, the ones in the language selector at the bottom, from Bulgarian to LOLCAT and Simplified Chinese, selected by either the sticky "language" cookie or the Accept-Language header;
  • 2 views, mobile and full/desktop view.

That makes a grand total of nearly 100 different versions of one single page.
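
To see where a number like that comes from, here's a small Python sketch that simply enumerates the combinations from the list above. The identifiers are made-up placeholders (only the counts 4, 18 and 2 come from the list), and the product is a theoretical upper bound: some combinations probably never show up in practice.

```python
from itertools import product

# Placeholder labels for the four browser classes in the list above.
BROWSERS = ["opera-mini", "opera-mobile", "ie", "standards"]
# Placeholder codes: only the count (18) comes from the post.
LANGUAGES = [f"lang{i:02d}" for i in range(1, 19)]
VIEWS = ["mobile", "desktop"]


def cache_variants():
    """Return every (browser, language, view) combination."""
    return list(product(BROWSERS, LANGUAGES, VIEWS))


if __name__ == "__main__":
    print(len(cache_variants()))
```

The upper bound is 4 × 18 × 2 = 144 cached copies of the same page.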
Of course, all of this is just for logged-out users. We don't want to cache (and couldn't anyway) every single logged-in user's version of the front page (with the activity feed and all the rest).

Reducing the variations

For the caching to work properly and be effective, we needed to find a way to reduce the number of possible versions of the front page. So in the Varnish VCL file, we match the User-Agent string and reduce it to one of 4 predefined strings: "operamini", "operamobile", "msie", or "nomatch". So instead of having ∞ user agent strings, we get only 4.
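
The real matching lives in the VCL file, but the idea is easy to sketch in Python. The patterns below are illustrative guesses, not the expressions we actually use; only the four output strings come from the paragraph above.

```python
import re

# Ordered rules: the first match wins. These patterns are assumptions
# made for illustration, not the ones from the actual VCL file.
UA_RULES = [
    ("operamini", re.compile(r"Opera Mini", re.IGNORECASE)),
    ("operamobile", re.compile(r"Opera Mobi", re.IGNORECASE)),
    ("msie", re.compile(r"MSIE", re.IGNORECASE)),
]


def normalize_user_agent(user_agent):
    """Collapse a raw User-Agent string into one of four cache-key values."""
    for label, pattern in UA_RULES:
        if pattern.search(user_agent or ""):
            return label
    return "nomatch"
```

The normalized value, not the raw header, is what the cache would vary on.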

Another similar problem is the Accept-Language header. This header can be quite complex, depending on your browser settings, and there's no easy way to figure out what language you want. From a string such as:

de-DE,tr;q=0.999,en;q=0.75,fr;q=0.9,it;q=0.8,ja;q=0.2,ru;q=0.1

you have to build a list of prioritized language preferences and match them against the languages your site can offer.
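
To make the "prioritized list" part concrete, here's a minimal Python sketch of that step. It is not the code we run (the function name is made up for this example); it only assumes the usual rules that a missing q value means q=1 and that entries with equal weight keep their original order.

```python
def parse_accept_language(header):
    """Return language tags ordered from most to least preferred."""
    entries = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        quality = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # unparsable weight: treat as not acceptable
        if quality > 0:
            # Sort by descending weight, then by position in the header.
            entries.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(entries)]
```

For the header above, this yields de-de, tr, fr, it, en, ja, ru, in that order. Note that "en" ends up behind "fr" and "it" even though it comes before them in the string.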

Failing to do that means, by default, having a different version of the front page for each distinct Accept-Language header, which varies a lot across clients, even if some values are very common. A brief statistics-gathering session showed 500 distinct values in about 10,000 browser requests.

accept-language.vcl

Varnish allows you to embed C code inside a VCL file. This is a pretty advanced feature that is not talked about much. Given that using regexps to massage Accept-Language looked messy, we discussed another crazy idea: writing a C function to parse Accept-Language, and then embedding that function into the Varnish VCL config.

Let's say that your site has English and Japanese. Your users' browsers will send every possible Accept-Language header on Earth. If you enable Vary: Accept-Language on Varnish or on your backends (and you should), the cache hit ratio will rapidly drop, because of the huge variation in Accept-Language contents. Varnish will store one version of the page for every different Accept-Language string. That's bad.

With this hack, the Accept-Language header will be "rewritten" to just "en" or "ja", depending on your client settings. If no match occurs, a default language will be set ("en"). This brings the language variants down to exactly 2, the number of languages your site supports. In our case it's 18 versions, so down from ~500 to 18.
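
The actual hack is a C function embedded in VCL, so take the following as a rough Python sketch of the same idea, not a port of it. The function name is made up, and matching on the primary subtag only ("de-DE" counts as "de") is a simplification I'm assuming here. It also handles just a single q parameter per entry.

```python
def normalize_accept_language(header, supported=("en", "ja"), default="en"):
    """Rewrite an Accept-Language header to a single supported language."""
    best_tag, best_q = default, 0.0
    for item in (header or "").split(","):
        tag, _, params = item.strip().partition(";")
        primary = tag.strip().lower().split("-")[0]  # "de-DE" -> "de"
        if primary not in supported:
            continue
        quality = 1.0  # a missing q parameter means q=1
        name, _, value = params.partition("=")
        if name.strip().lower() == "q":
            try:
                quality = float(value)
            except ValueError:
                quality = 0.0  # unparsable weight: treat as not acceptable
        if quality > best_q:  # strict comparison: earlier entries win ties
            best_tag, best_q = primary, quality
    return best_tag
```

Whatever the browser sends, the result is always one of the supported languages, so the cache only ever sees as many variants as the site has languages.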

It seems a bit weird that we're the only ones having this problem :)

Most probably it's because we're trying to solve this problem directly in Varnish, while it's usually dealt with at the backend level. Solving this inside Varnish is very nice, because it lets us scale more easily to other pages as well, with no modifications to the backends' config or code.

If you think this might be useful for you too, you're welcome to get the code and try it out. It's on Github:

http://github.com/cosimo/varnish-accept-language/

Pay attention! It's experimental stuff, don't try it in production without extensive testing. And let me know how it goes :)

Why is the Sun Java updater so bloody annoying?

I can't really find a reason for this.
The Java updater (this is on Windows) is trying hard to update my JRE installation without my consent, overriding my choice of never auto-updating my system.

What I mean is: I explicitly disabled all automated checks and/or updates performed by this monster, and still it bugs me trying to run the update every day.

Why does this have to be so annoying? I hate it. I think last time this happened I removed some registry key or probably used the MS admin console to disable some obscurely named service.

And the worst thing is that I have to waste 10 precious minutes of my life to find out how to disable this annoying crap.

Java updater, I seriously hate you.

Just disabled Technorati ping from My Opera

I just found out that pinging Technorati atm is really slow, so I disabled this service by overriding the default production configuration.

It seems every Technorati ping took around 180 seconds, which is a suspiciously round number, so I guess their service is timing out, or we started sending wrong data, or sending it to the wrong URL, which would again be weird, because we didn't change anything… :-)

Let's see tomorrow…

The “Gran Torino” of keyboards…

My keyboard

It's not the best keyboard for everyone of course, but it's the best for me. Totally awesome personalized keyboard. It's fully supported by my window manager. I don't understand why everyone who comes to my desk refuses to type on it…

It's more than 10 years old. It has served me very well, and has been cleaned extensively 3 or 4 times with full disassembly. Here's my personal ritual. Every day when I'm finished working, I cover it with a special cloth to keep the dust off. This cloth has been covering my keyboards since I had my first C64, then C128, Amiga and now this one. :-)

Full picture here.

Ubiquity for Opera, “currency converter” and more…

Today I went back to a project that I really like, Ubiquity for Opera. Usually I do that when I'm annoyed by something (in this case I needed to quickly convert currency amounts), or when I find something funny.

This time, Ubiquity gets some more commands and some updates to existing ones.

  • the isdown command, which checks if a host is up, has been changed to be interactive. This is the first one that I managed to make interactive, as it requires a bit more magic than just opening a browser window;
  • the currency-converter command;
  • the instant-rimshot command.

Download Ubiquity for Opera,
or go to the Ubiquity for Opera github repository.

Enjoy :-)

Improved slideshow in Dragonfruit

In the new Dragonfruit release, we also worked on an improved, or rather completely new, photo album slideshow. It replaces our LightBox-based slideshow, which worked but had some quirks here and there.

I think the new slideshow is really awesome, and if you haven't tried it yet, you should try it now!

Take a look at these albums:

by derspecht, http://my.opera.com/365/albums/slideshow/?album=704336

by AgnetaM, http://my.opera.com/365/albums/slideshow/?album=722249

And these are my own :-)

From the 365 group, http://my.opera.com/365/albums/slideshow/?album=769801

And one of my first photo albums on My Opera:

http://my.opera.com/cstrep/albums/slideshow/?album=504322

Opera 10 and the Microsoft Silverlight plugin

Just in case anyone is wondering…

If you don't know, Silverlight is Microsoft's answer to Flash.
If some website has videos or other content that you want to see, but they chose to use Silverlight, not all hope is lost.

Just go to the download page for the Silverlight plugin. If you are using Opera, it will tell you that "This browser is not supported blah blah blah…".

Ignore that bullshit and just download it. Then close Opera and install it.
Be sure to remove any pre-existing version first, or it won't work.

After the installation is done, reopen Opera and go to the plugins page. You should see the Silverlight plugin already enabled. Congratulations, and welcome to the fantastic world of Silverlight content. :-|

YouTube is implementing OEmbed

That's good news. For once, we were faster than YouTube to implement something :-)
Anyway, if you look at any video page source code, you will find something like:

<link rel="alternate" type="application/json+oembed" href="http://www.youtube.com/oembed?url=http%3A//www.youtube.com/watch?v%3Da1Y73sPHKxw&format=json" title="Dramatic Chipmunk" />
<link rel="alternate" type="text/xml+oembed" href="http://www.youtube.com/oembed?url=http%3A//www.youtube.com/watch?v%3Da1Y73sPHKxw&format=xml" title="Dramatic Chipmunk" />

And, by looking at one of these URLs, for example the JSON one, you can see:

{
  "provider_url": "http://www.youtube.com/",
  "title": "Dramatic Chipmunk",
  "html": "<object width=\"384\" height=\"313\"><param name=\"movie\" value=\"http://www.youtube.com/v/a1Y73sPHKxw&fs=1\"></param><param name=\"allowFullScreen\" value=\"true\"></param><param name=\"allowscriptaccess\" value=\"always\"></param><embed src=\"http://www.youtube.com/v/a1Y73sPHKxw&fs=1\" type=\"application/x-shockwave-flash\" width=\"384\" height=\"313\" allowscriptaccess=\"always\" allowfullscreen=\"true\"></embed></object>",
  "author_name": "cregets",
  "height": 313,
  "width": 384,
  "version": "1.0",
  "author_url": "http://www.youtube.com/user/cregets",
  "provider_name": "YouTube",
  "type": "video"
}

So, from now on, you don't have to guess the HTML code needed to correctly embed a YouTube video (like we did on My Opera for the Embed Video button on the new blog post form). Instead, you get the full, always up-to-date HTML code in the OEmbed JSON content.
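
As a rough illustration of how a client could use this, here's a small Python sketch. It is not what My Opera runs: the function names are made up, and it only builds the request URL and reads the "html" field out of a response like the one above, without doing the HTTP request itself.

```python
import json
from urllib.parse import urlencode

# Endpoint as it appears in the <link> tags above.
OEMBED_ENDPOINT = "http://www.youtube.com/oembed"


def oembed_url(video_url, fmt="json"):
    """Build the OEmbed request URL for a video page."""
    return OEMBED_ENDPOINT + "?" + urlencode({"url": video_url, "format": fmt})


def embed_html(oembed_json):
    """Extract the ready-made embed markup from an OEmbed JSON response."""
    data = json.loads(oembed_json)
    if data.get("type") != "video":
        raise ValueError("not a video response")
    return data["html"]
```

You would fetch oembed_url("http://www.youtube.com/watch?v=a1Y73sPHKxw") with any HTTP client and pass the response body to embed_html().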

Nice, I think.

European Perl Conference, Day 1

Every YAPC::EU (Yet Another Perl Conference Europe) is a really big event in the Perl world, with lots of people from every part of the planet. I got to know some of them already, so we just meet like good friends :-) This year's theme was Corporate Perl, how Perl is used in the corporate world.

This time, though, I was presenting a talk during the first day of the conference: How Opera uses Perl, which is up on Slideshare right now. If you take a look at it, you will find out that we actually use Perl for a lot of systems, from the very tiny to very complex, mission-critical ones. It was quite fun preparing the talk, and I think it also went decently.

There were lots of other interesting talks, even lightning talks, like Giuseppe Maxia's MySQL Sandbox, or Sue Spencer's talk about "Perl at Cisco Systems". There was also a talk on roles and inheritance in OO systems by Curtis Poe of the BBC, and a really funny lightning talk by Alex Kapranoff, a Russian guy, but I don't remember the title. Merijn Brand presented lots of ways to improve your Perl modules. This guy's amazing. Also an avid Opera user.

During lunch we met up with Martin Berends and Carl Mäsak and talked about Perl 6 syntax, CPAN 6, etc… really cool people.