
Verifying MySQL behaviour with automated test suites and mytap

You know everything about how MySQL treats UTF8 and LATIN1 charsets and how the collation table impacts on selection and insertion of data, right?

Great, then stop reading :)

I don't, and since I'm in the process of setting up a new version of the Opera accounts database, I really don't want to screw things up. I tried to fully understand how MySQL works in this respect (charsets, collations, etc…) but reading the documentation and memorizing it wasn't very easy. Plus, there are thousands of blog posts on the matter, not always 100% accurate.

So I thought I'd better get hands on and I wrote a kind of database test suite.

Now this test suite is hooked up to the main project builds on Jenkins. Here's a sample output:


[...]
[workspace] $ /bin/sh -xe /tmp/hudson3255767718598715423.sh
+ ./bin/run-dbtest-suite
basedir=/var/lib/jenkins/jobs/auth-db/workspace
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/__initdb__.my ........................................... 
1..2
ok 1 - Using utf8tests database
ok 2 - Server charset is latin1
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_bin.my ................................... 
1..6
ok 1 - All our records are there. No duplicate key error.
ok 2 - utf8_bin collation does not collate a/â/à/A/...
ok 3 - utf8_bin collation does not collate a/â/à/A/...
ok 4 - utf8_bin collation does not collate a/â/à/A/...
ok 5 - Query for mixed-case username does not return lowercase username
ok 6 - Query for upper-case username does not return lowercase username
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_general_ci.my ............................ 
1..7
ok 1 - Collation for t007 is utf8_general_ci
ok 2 - utf8_general_ci collation normalizes accents, diacritics and the like
ok 3 - A and Å are collated to the same character in the utf8_general_ci table
ok 4 - å and Å are collated to the same character in the utf8_general_ci table
ok 5 - lower/upper case chars are collated in the utf8_general_ci table
ok 6 - lower/upper case chars are collated in the utf8_general_ci table
ok 7 - We are allowed to insert all records just because there is no unique constraint
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_unicode_ci.my ............................ 
1..7
ok 1 - Collation for t005 is utf8_unicode_ci
ok 2 - utf8_unicode_ci collation normalizes accents, diacritics and the like
ok 3 - A and Å are collated to the same character in the utf8_unicode_ci table
ok 4 - å and Å are collated to the same character in the utf8_unicode_ci table
ok 5 - lower/upper case chars are collated in the utf8_unicode_ci table
ok 6 - lower/upper case chars are collated in the utf8_unicode_ci table
ok 7 - We are allowed to insert all records just because there is no unique constraint
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/default-table-charset.my ................................ 
1..3
ok 1 - Default character set is utf8 when no charset is specified (from server)
ok 2 - Default character set is utf8 when "CHARSET utf8" specified in the CREATE TABLE
ok 3 - Default character set is utf8 when "CHARSET utf8" and "COLLATE" specified in the CREATE TABLE
ok
...
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/username-with-utf8-chars.my ............................. 
1..5
ok 1 - We have some UTF-8 encoded string in our hands (hex)
ok 2 - We have some UTF-8 encoded string in our hands (charset)
ok 3 - Can select back UTF-8 content from a CHARSET utf8 table
ok 4 - Given string is exactly 24 bytes long (length)
ok 5 - Given string is exactly 8 (wide) characters long (char_length)
ok
All tests successful.
Files=11, Tests=80, 0.739731 wallclock secs ( 0.05 usr  0.02 sys +  0.07 cusr  0.01 csys =  0.15 CPU)
Result: PASS
Recording test results
Finished: SUCCESS

And here's an example of "sanity check" test case, which doesn't do much:


-- Check that we can insert and retrieve UTF-8 content correctly

BEGIN;

SET NAMES utf8;

SELECT tap.plan(5);

USE auth_utf8tests;

SET @username = '今日话题今日话题';
SET @encoded  = 'E4BB8AE697A5E8AF9DE9A298E4BB8AE697A5E8AF9DE9A298';

SELECT tap.eq(
    HEX(@username),
    @encoded,
    'We have some UTF-8 encoded string in our hands (hex)'
);

SELECT tap.eq(
    CHARSET(@username),
    'utf8',
    'We have some UTF-8 encoded string in our hands (charset)'
);

INSERT INTO t001 (f1) VALUES (@username);

SELECT tap.eq(
    (SELECT HEX(f1) FROM t001 WHERE f1 = @username),
    @encoded,
    'Can select back UTF-8 content from a CHARSET utf8 table'
);

SELECT tap.eq(
    (SELECT LENGTH(f1) FROM t001 WHERE f1 = @username),
    24,
    'Given string is exactly 24 bytes long (length)'
);

SELECT tap.eq(
    (SELECT CHAR_LENGTH(f1) FROM t001 WHERE f1 = @username),
    8,
    'Given string is exactly 8 (wide) characters long (char_length)'
);

-- Finish the tests and clean up.
CALL tap.finish();
ROLLBACK;

This SQL test code uses mytap. The SELECT tap.* calls mirror the functions of Perl's TAP testing modules: SELECT tap.eq() is the equivalent of Test::More::is(), and so on.

Another, more interesting test case, is the following:


--
-- Verify how the utf8_unicode_ci collation works
--

BEGIN;

SET NAMES utf8;

SELECT tap.plan(12);

USE auth_utf8tests;

[...]

SELECT tap.eq(
    (SELECT TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA=SCHEMA() AND TABLE_NAME='t015'),
    'utf8_unicode_ci',
    'Collation for t015 is utf8_unicode_ci'
);

[...]

SELECT tap.eq(
    (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1a' ORDER BY id),
    '10',
    'utf8_unicode_ci collation normalizes accents, diacritics and the like'
);

SELECT tap.eq(
    (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1Å' ORDER BY id),
    '10',
    'A and Å are collated to the same character in the utf8_unicode_ci table'
);

SELECT tap.eq(
    (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1å' ORDER BY id),
    '10',
    'å and Å are collated to the same character in the utf8_unicode_ci table'
);

SELECT tap.eq(
    (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'TestUser1A' ORDER BY id),
    '10',
    'lower/upper case chars are collated in the utf8_unicode_ci table'
);

SELECT tap.eq(
    (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'TESTUSER1A' ORDER BY id),
    '10',
    'lower/upper case chars are collated in the utf8_unicode_ci table'
);

SELECT tap.eq(
    (SELECT COUNT(*) FROM t015),
    1,
    'We are allowed to insert only 1 record, because the others collate to the same string'
);

-- Finish the tests and clean up.
CALL tap.finish();
ROLLBACK;

An interesting thing that I didn't know how to do in the beginning is how to trap errors. I left out that part from the test code to simplify, but here it is:


DELIMITER //

DROP PROCEDURE IF EXISTS populate_table //

CREATE PROCEDURE populate_table ()
BEGIN

    DECLARE CONTINUE HANDLER FOR SQLSTATE '23000' BEGIN
        SELECT tap.ok(
            1,
            'We should get dupkey errors when inserting data with collation utf8_unicode_ci'
        );
    END;

    INSERT INTO t015 (id,username,note) VALUES (10, 'testuser1a', 'plain');
    INSERT INTO t015 (id,username,note) VALUES (20, 'testuser1â', 'circumflex a');
    INSERT INTO t015 (id,username,note) VALUES (30, 'testuser1à', 'a grave');
    INSERT INTO t015 (id,username,note) VALUES (40, 'testuser1Å', 'A circ');
    INSERT INTO t015 (id,username,note) VALUES (50, 'TestUser1A', 'mixed case');
    INSERT INTO t015 (id,username,note) VALUES (60, 'TESTUSER1A', 'upper case');

END;

//

DELIMITER ;

/* Should generate 5 dupkey errors (taken as successful tests) */
CALL populate_table;

It's a bit convoluted. To trap errors you have to use the DECLARE HANDLER statement. DECLARE CONTINUE HANDLER FOR SQLSTATE '23000' means: whenever a statement raises SQLSTATE '23000', which is what a duplicate key error produces, execute this block of code and then carry on with the next statement. All of that must be wrapped in a stored procedure. Handlers outside of stored procedures are not allowed.

In this particular test, the table uses the utf8_unicode_ci collation, so we are expecting a duplicate key error on username whenever we insert a string like 'testuser1à' or 'TESTUSER1A', because 'testuser1a' was already inserted at the beginning. Of all the INSERT statements, only the first one is bound to succeed, so I put a SELECT tap.ok(1) in the duplicate key HANDLER and I expect 5 passing tests when I make the CALL populate_table;.

This of course may seem trivial. And I guess it is, but for me it's a much better way of learning than scouring through the manuals or the many blog posts out there that may or may not reflect the environment I'm working with.

Routinely running this kind of test suite makes it possible and easy to verify the database behaviour:

instantly
after upgrades (5.1 -> 5.5? -> 6?) or storage engine changes
after mysql configuration changes. For example, I discovered in this way that adding default-charset=utf8 in my MySQL config breaks everything.

I consider this my live documentation on how MySQL works. I would really appreciate any feedback on this. Have fun!

Report from the Varnish Users Group (VUG5) meeting in Paris – Day 1


Last week I attended the VUG5 meeting (https://www.varnish-cache.org/vug5). The following is my report of the conference Day 1, the "Users" day.

TL;DR

I learned a lot on (for me) gray areas of Varnish like 3.0, VMODs, ESI and various corner cases. My presentation on how we use Varnish at Opera sparked a lot of interest especially in our thumbnail service.

Day 1, VUG5 users day

Day 1 was held at La Défense, a mega business district just outside of Paris. All day was filled with presentations by Varnish Software people and a few other companies. On with the list, and my notes on the side.

Keynote: Varnish in 2020 by Poul Henning Kamp, Varnish Software

Poul runs thttpd; he's not a Varnish user himself, so he welcomes feedback from all users. That's the reason for the VUGs.

Varnish today is "The HTTP delivery engine". And in 2020? Hard to predict. PHK usually predicts things really badly. What we _can_ see is:

HTTP/2.0 Last call status just a few weeks ago
Google's SPDY support in Varnish? Most likely. Depends on future development and what/how many clients pick it up
HTTP over UDP? Lots of interest in this lately

Most likely future work on varnish:

Clearer split of transport and semantics
(could speak HTTP no matter whether over UDP, TCP or SPDY)
Generic pluggable protocols (SPDY, f.ex.)
Decouple client protocol and backend protocol. Talk SPDY to client, talk HTTP to backend.

SSL in Varnish? Unlikely, just use Pound or nginx or whatever. Pound is simple and robust.

Varnish Book by Kristian Lyngstøl, Varnish Software

Expanding and improving on the existing training course material, Kristian and some contributors created a "Varnish Book", to help people starting up with Varnish. It is freely available at https://www.varnish-software.com/static/book/.

Varnish + Escenic by Richard Zuidhof, Escenic?

Richard explained how he used Varnish to migrate away from the Apache/Squid/Apache sandwich and made it better/faster and his company saved a lot of money in the process.

Interesting points:

50x errors received from the backends are handled by doing a restart in vcl_fetch(), with the restarted request hitting a "dummy" backend, a sort of static version of a real backend. Something like:


  sub vcl_recv {
     ...
     if (req.restarts > 0) {
         set req.backend = dummy;
     }
  }
 
  sub vcl_fetch {
     if (beresp.status == 500) {
        return (restart); # Or whatever this is
     }
  }

Also talked about various timeouts, like:

 
  {
    .first_byte_timeout = 1s;
    .between_bytes_timeout = 1s;
  }

and how he needed to reset them back to 120s/180s for some of their pages to work.

He said a timeout event from the backend should cause Varnish to fall back to stale content, but that is not the case currently: Varnish will abort the fetch operation. So pay attention.

Mobile device detection by Lasse Karstensen, Varnish Software

Talked about various libraries and ways to detect mobile devices, including:

libvarnish-deviceatlas
WURFL
… others I didn't write down in time

Basically it was a way to survey how many people use this technology and say that Varnish Software has a commercial solution but they are going to open source it Soon(tm), or something along these lines.

I was a bit distracted because I was having problems with the laptop and my presentation was coming up, so… I plan to go back to this presentation once the slides are up.

ESI and Varnish by Federico Schwindt, RBS

Summary of how RBS is using ESI for an internal website used by RBS employees.

Basically the service is composed of various "boxes", small windows in the page with some information that depends on location, department or other things, and they use Varnish to cache those small boxes and ESI to compose the final page.

Problems:

They can't find a way to also keep the fully composed page as a cache object.
Invalidation logic is complex because of inter-dependent content between different boxes.

Interesting: they use an HTTP header sent by the backend to instruct Varnish on when to do ESI processing, so ESI is not switched on or off as a whole, but can be triggered on specific pages. This is very cool because it could also solve the development/production setup problem I had always feared when using ESI. By that I mean the complication of using development environments with ESI, where every dev installation needs an ESI-aware Varnish.

Varnish at Opera by me

I talked about how we use Varnish in our projects. I mentioned a few Varnish extensions I worked on, including varnish-accept-language and varnish-geoip, plus other tools like http-cuke.

There were plenty of real world examples of VCL configuration we use in the various projects. I also talked about the varnish puppet module we wrote, that comes with a bunch of interesting customizations and fixes, included in the puppet-modules repository on Github.

If you're interested, slides are published here:

http://www.slideshare.net/cstrep/vug5-varnish-at-opera-software

I got lots of feedback and questions about our picture thumbnail service, so I'll probably write more about it soon.

Security with VCL by Kacper Wysocki, Redpill Linpro

Easily one of the best talks of the day. Kacper explained his security.vcl project. Here's a few highlights, but it's really interesting, I hope slides will be up soon.

Wrote modsec rules parser and converter to VCL
Eduardo Scarpellini, Master thesis, OWASP, worked on a varnish-firewall project, similar in scope, and did in-depth research, finding that out of the OWASP top broken apps, he could automatically block 73% of XSS and SQL injections.
security.vcl is now used in ~10 sites with lots of traffic
Drawback compared to mod-security is that no POST data can be analyzed (yet)
In the future, we will see a merge of security.vcl and varnish-firewall projects.

Varnish modules by Kristian Lyngstøl, Varnish Software

I don't remember much, but I think Kristian basically tried to get more people to use VMODs, and said there's now a nice page where a list of known VMODs is kept:

http://www.varnish-cache.org/vmods

and you can register your own VMODs and have them listed.

Stay tuned for the "Day 2, Developers day" part.

Using hypnotoad in production, anyone?


So, you're using hypnotoad in production. And it works perfectly for you. Maybe you have an Nginx or Apache in front of it configured as reverse proxy. Everything's great. Right? Right. Then I have a zillion questions for you.

Maybe I don't understand how it works, but I'm having the following problems:

"sometimes" hypnotoad won't stop. I usually try to stop it with:
```
hypnotoad --stop /path/to/my/script
```
I use symlinks to deploy applications, so for example I deploy in /opt/myapp and each new deployment gets a timestamped folder, /opt/myapp/releases/20120224-180801.
Then there's a symlink that always points to the last deployed version, /opt/myapp/current → /opt/myapp/releases/{whatever-datetime}. Now, using hypnotoad --stop /opt/myapp/current doesn't work, because hypnotoad probably uses the actual filename, not the symlink, to identify the running application.

That's fine, but then how can I stop it reliably? I wish it had a hypnotoad --force-stop mode or something.
Last problem, when I push a new deployment, and stop and restart hypnotoad, often the application doesn't work properly, it only generates exceptions for unknown reasons. Stopping and restarting again manually usually fixes the problems…
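
For the symlink problem, one workaround I can think of is to record the real release directory when starting the server, and use that recorded path to stop it later, after the symlink has moved. This is only a sketch under assumptions: the state file location and the script/myapp path are made up, it relies on GNU readlink -f, and it assumes my guess above (that hypnotoad identifies the application by its real path) is right.

```shell
#!/bin/sh
# Sketch only: remember which release directory was started, so the
# same path can be used for `hypnotoad --stop` after "current" moves.

# Print the real directory a deployment symlink points to (GNU readlink).
resolve_release() {
    readlink -f "$1"
}

# Save the resolved release directory of symlink $1 into state file $2.
remember_release() {
    resolve_release "$1" > "$2"
}

# Print the release directory saved in state file $1.
recall_release() {
    cat "$1"
}

# Hypothetical usage (paths and script name are assumptions):
#   remember_release /opt/myapp/current /var/run/myapp.release
#   hypnotoad "$(recall_release /var/run/myapp.release)/script/myapp"
#   ... deploy a new release, "current" now points elsewhere ...
#   hypnotoad --stop "$(recall_release /var/run/myapp.release)/script/myapp"
```

I haven't verified this against hypnotoad itself, so take it as an idea to try, not as a fix.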

I was a bit frustrated today, so I decided to switch back to starman. I have never ever had a problem with it, so I will stick to it for now. But I would still be interested to know whether you use hypnotoad in production and how well it works. Write in the small box below, you don't need to register. Thanks :)

Find uses of perl 5.10 features in your code: a bit of PPI magic


This morning on IRC we were talking about old perl installations, and how forgetting the use 5.010 statement while using 5.10 features, for example the // operator, can be a problem.

I suggested that a git hook might be an idea, so I assembled this proof of concept script to test for 5.10-isms in code that lacks a use 5.010; statement. It's too long to include here, so I put it on Github (https://gist.github.com/1875528).

The script uses the "impossible" PPI module to parse the Perl code and extract information about used operators using PPI::Token::Operator, to catch // or ~~ or similar, and use statements, with PPI::Statement::Include.

It's meant to be run as part of a post-commit hook or similar. I had searched for similar modules in the Perl::Critic space but found nothing of the sort. Maybe I just didn't look too well?

Anyway, https://gist.github.com/1875528. Enjoy.

Perl client for Etsy statsd, improved and released v0.02 on CPAN


Sometimes bugs reported on the CPAN issue tracker are the perfect excuse to improve your code. In this case, my client module for Etsy statsd service, Net::Statsd, got an update because of this ticket, RT#74172.

As with all my recent CPAN modules, when a new bug is filed against one of them, I try to create a specific test case. Sometimes it's quite hard to do, but this time it wasn't, even though I had to refactor the existing code. This allowed me to improve the testability of the code in the process, so thanks to the reporter of that ticket :)

I still haven't managed to test my own code with the statsd service, and hook it up to Graphite. Soon :-)

As usual, code is up on Github:

https://github.com/cosimo/perl5-net-statsd

And on CPAN too:

https://metacpan.org/module/Net::Statsd.
Have fun!

My experience at Velocity Europe 2011 in Berlin


TL;DR

This year there was the 1st edition of Velocity Europe. I got to present a talk on a DDoS attack we faced at Opera, and it was really awesome to be there.

The long version…

Around July this year I knew there was going to be a Velocity Conference in Europe, and I decided I would try to propose a few ideas for talks. I didn't have my hopes too high, but I wanted to give it a shot anyways, pushing myself way out of my comfort zone :)

The worst that could happen was that the talks didn't get accepted. After a month or so the crazy thing happened, and I got an invite to speak at Velocity, due in November.

Preparation

The first few months passed while I was slowly gathering material for the talk. The idea was talking about the DDoS attack that struck us in October 2010. Almost a year had passed, so if we hadn't taken notes and collected all sorts of logs and information, we wouldn't have had any chance to reconstruct all the "story" with enough detail to be interesting.

Anyway, weeks went by, and in September I started writing down an outline of the talk. It consisted in describing what happened during the DDoS and how we faced it, what we did, how we figured out what to do, etc… but I didn't have a clear idea of what to convey with the actual presentation. What would be the core message, if any?

If I learned anything out of all this, it is that writing an outline is absolutely the best favor you can do yourself to avoid so many problems later on. Just write it down as a text, a blog post, a story. Mind maps are also useful for me.

The last 3-4 weeks flew by while I was trying to put together a decent deck of slides.

In Berlin: pre-conference

"Birds of feathers" was the pre-conference event that took place on Monday 7th (November 2011, if you're reading this in the future), put together by a local team led by Schlomo Schapiro, whom I also got to talk to during the conference. It was a good event. Steve Souders, John Allspaw and many other conference attendees were already there, and sponsor companies were presenting their products.

The most interesting sessions of the pre-conference IMO were:

100ms: Steve Souders pushed everyone to think about the next level of web performance. How to bring down the "loading time" of web pages to 100ms. There was an interesting discussion about that. My point was that loading time really needs to be divided into at least dns resolution, server processing, network transfers, client rendering. So there's at least 4 totally different chunks that make up the load time and all of them can be optimized, but with varying levels of gain and complexity.
Dyn inc presentation about their product dashboard, that led to a better productivity and communication between teams. Cory van Wollerstein explained their mash-up of Jira and Confluence, used to automatically pull information from the tickets db and provide high-level overviews to executive teams. Very cool. He also argued whether having product managers is a good thing for a company.

The rest of the day I was busy polishing my presentation, and trying to rehearse at the hotel. A month before the conference, I had bought The Naked Presenter (ebook edition), hoping that it would help me do a decent presentation. The book of course recommended rehearsing. It felt very weird and embarrassing, but I'm *so* glad I did it. I managed to streamline the presentation, and memorize the sequence of slides.

The Conference – Day 1

Schedule:

http://velocityconf.com/velocityeu/public/schedule/grid/2011-11-08

Keynotes

Opening remarks, plus Theo Schlossnagle, one of the minds behind SurgeCon, on how good operations dudes are usually generalists and need to have a wide spectrum knowledge instead of being "(Perl|Python|Ruby|Java) developers". I really recognize myself in this more generalist role than, for example, the Ruby-on-Rails guy.

Lightning demos

These were lightning demos during the first morning:

Page speed online: https://developers.google.com/pagespeed/
Weinre: IBM's browser remote debugger (AKA Opera's scope for WebKit), http://phonegap.github.com/weinre/

Rest of Day 1 went to hell

I had to convert all my slides to 4:3 and test again with the on-site equipment. I was also freaking out at the same time, so I missed everything else until my talk. Sorry :)

Most talks have been recorded and are already up on the Velocity site. Particularly interesting IMO, but video not available yet, are:

Massively Sharded MySQL at Tumblr, Evan Elias
http://velocityconf.com/velocityeu/public/schedule/detail/21678
Introducing the Amazon Silk Browser, Jon Jenkins, Amazon
http://velocityconf.com/velocityeu/public/schedule/detail/21891

My talk

As I said, it was about the DDoS attack to my.opera.com of October 2010. I basically talked about how we found out we were under DDoS, and how we struggled to find our way to keep the site up and running despite the traffic. This was a mid-scale DDoS with around 18k distinct IPs attacking us. We had a hard time, but it was also very much fun in retrospect :) We learned quite a lot in the process, about HTTP and TCP/IP, nkiller2 and the TCP zero-window exploit. Most importantly, we learned to make better use of old and new tools to do troubleshooting. You will find all of this in the slides.

I did my best, and I think it was well received by the audience. While on stage, I really had the feeling that people enjoyed it, plus several folks came to say hello afterwards. One of the most frequent comments I heard was that people found my talk honest. That is the single thing I appreciate the most, because that had been one of my goals since the start. To tell an honest and detailed story of how things went, without pretending to be the super awesome heroes that know everything and can fix anything in no time.

Unfortunately, after the conference I was informed that there had been no recording of the talk. That is really sad. However, since there's no recording, I can pretend I was a nice speaker, given the ratings :). Seriously, if you have a picture or video recording, contact me :)

Here's the slides if you're interested:

http://velocityconf.com/velocityeu/public/schedule/detail/21653

The Conference – Day 2

Schedule:

http://velocityconf.com/velocityeu/public/schedule/grid/2011-11-09

Keynotes

Very inspirational talk by Jeff Veen, Typekit.com

Very well presented, great visuals. Great overall. How to create conditions for teams to work and work well.

http://velocityconf.com/velocityeu/public/schedule/detail/21788

Anticipation: What could possibly go wrong? by John Allspaw, Etsy

A great talk about how to prevent, analyze, respond to Operations problems. I very much like John's style, I think he's a pioneer, at least he introduced me to many great ideas, one above all, continuous deployment. I also like his many references to aviation, aerospace and military engineering fields.

http://velocityconf.com/velocityeu/public/schedule/detail/22258

Full stack awareness, Artur Bergman, Fastly

He's Artur Bergman. Listen to him :-) If anything, because he's really authentic.

http://velocityconf.com/velocityeu/public/schedule/detail/22914

Lightning demos

Another session of lightning demos, for our pleasure:

Dynatrace AJAX edition
FITB, network switches monitoring, https://github.com/lozzd/FITB. During the talk, he mentioned, and it seems pretty interesting, https://github.com/etsy/dashboard.

Browser performance track

This was a track in itself. I missed all of it, since I mainly followed the Operations track, but I heard it was really interesting. Recent speed enhancements in Opera, Chrome, Firefox and Javascript in general were explained in detail.

Afternoon talks

Deploying large payloads at scale, Ramon van Alteren (hyves.nl)

Biggest social network in the Netherlands (4M daily active users, ~10M total users). Ramon is a very cool guy. They have 3.5K servers, and their main application consists of 750Mb compiled php binaries to deploy. And they are experimenting with bittorrent tools to do that :)

I had a few hours of engaging talk with Ramon at one of the social events that followed the conference. We found lots of similarities in how we're dealing with infrastructural growth, scaling, etc… We both use config management tools like puppet extensively in our organizations. We promised each other to remain in touch about deployment matters.

http://velocityconf.com/velocityeu/public/schedule/detail/21571

HTTP connection management from 10 users to 100 millions, Bradley Heilbrun, YouTube

Really interesting dive into YouTube early (2005-2007) architecture with Apache, load balancers, GSLB.

I met Bradley later on that day and we had a quick chat. Turns out they use(d) PowerDNS with its pipe backend for geographic load balancing, much like we do at Opera with GeoDNS. That made my day :-) It's a pity that companies like YouTube don't talk much about their current technology. They usually tell you about 2-3 years old architectures. That's still very valuable, of course.

http://velocityconf.com/velocityeu/public/schedule/detail/21708

Conclusions

If you're even remotely interested in operations, devops, running a service, scaling, performance, infrastructure, then Velocity is the conference. Surge is another one, probably even better, more hardcore-engineering focused. From my perspective, there's a couple of things that could be improved:

while I understand that sponsors are what makes conferences like Velocity possible, some sponsors took too much time out of the actual talk tracks. One or two talks were very promotional in nature, and it was clear to everyone that these companies were pushing their products or themselves. Maybe it wasn't their intention, but to me and to others I talked to, it came out that way.
I think Velocity needs to screen this type of talk better and separate it from the authentic content that people want, the "stories from the trenches". As a counter-example, Google, among other companies, was doing sponsoring (and recruiting!) activities in a separate hall. That worked very well for everyone. Please let's keep it that way.
the on-site technical team wasn't fully prepared to handle presentations made with Open Office. That is not acceptable if you ask me, even if the majority of speakers have a Mac. It's 2011 (2012 now even), so you really need to be prepared to read OpenOffice files. I realize that wasn't Velocity organizers' fault, but I think it's something to consider for next time.

That said, I'm really really happy about my experience at Velocity Europe, both as a speaker and as attendee. It was really awesome, and worth every moment I spent working to prepare for it. Thank you O'Reilly, and I hope to be able to participate again some day :)

How to start up varnish with a custom cc_command on Debian


If you need to compile your varnish VCL file with custom options, maybe because of libraries like GeoIP, and you're running Debian, you cannot use the init script that's shipped by default.

It will not work because of how shell expansion works in the start-stop-daemon command contained in the init script. I wrote my explanation and a proposed fix in much more detail in this stack overflow question:

http://stackoverflow.com/questions/5906603/varnish-daemon-opts-options-errors/8333333#8333333

TL;DR: (+ quick & dirty fix) patch your init script like this:

 start_varnishd() {
     log_daemon_msg "Starting $DESC" "$NAME"
     output=$(/bin/tempfile -s.varnish)
-    if start-stop-daemon \
-       --start --quiet --pidfile ${PIDFILE} --exec ${DAEMON} -- \
-       -P ${PIDFILE} ${DAEMON_OPTS} > ${output} 2>&1; then
+    if bash -c "start-stop-daemon \
+        --start --quiet --pidfile ${PIDFILE} --exec ${DAEMON} -- \
+        -P ${PIDFILE} ${DAEMON_OPTS} > ${output} 2>&1"; then
         log_end_msg 0
     else
         log_end_msg 1
         cat $output
         exit 1
     fi
     rm $output
 }
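
To see why the extra bash -c helps, here is a small demonstration of the expansion problem that doesn't need varnish at all. The cc_command value is made up for illustration. An unquoted ${DAEMON_OPTS} is split on whitespace, and quotes stored inside the variable are not interpreted again, so the quoted cc_command falls apart into several words. Passing the whole string through a second round of shell parsing (bash -c in the patch, eval in this sketch) makes the quotes group the words again.

```shell
#!/bin/sh
# Print how many arguments this function received.
count_args() {
    echo "$#"
}

# Made-up options, similar in shape to a DAEMON_OPTS with a custom cc_command.
DAEMON_OPTS="-p 'cc_command=exec cc -fpic -shared -o %o %s'"

# Plain unquoted expansion: split on whitespace, embedded quotes stay literal.
plain=$(count_args ${DAEMON_OPTS})

# Second round of parsing: the embedded single quotes now group the words.
reparsed=$(eval "count_args ${DAEMON_OPTS}")

echo "plain expansion:     $plain arguments"
echo "re-parsed expansion: $reparsed arguments"
```

With the value above, the plain expansion yields 8 arguments and the re-parsed one yields 2, which is what varnishd needs to see: -p followed by a single cc_command=... string.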

Let me know if it works for you!

EDIT (7/Mar/2012): bug was filed in Debian as #659005. Nothing happened so far. We'll see.

How to detect the Debian version of a server without logging in


As Ops team, we're slowly taking over operations for several other teams here at Opera. One of our first tasks is to:

The first idea for checking whether a server is Debian Lenny or Squeeze was to log in and cat /etc/debian_version. However, if you haven't accessed that machine before, and your ssh keys are not there, you can't do that. In our case, we have to file a request for it, and it can take time. Wondering if there was a quicker way, I came up with this trick:

#!/bin/sh
#
# Tells the Debian version by reading the OpenSSH banner
# Requires OpenSSH to be running and the ssh port to be open.
#
# Usage: $0 <hostname>
#
# Cosimo, 23/11/2011

HOST=$1

if [ "x$HOST" = "x" ]; then
    echo "Usage: $0 <hostname>"
    exit 1
fi

OPENSSH_BANNER=$(printf '\n' | nc "${HOST}" 22 | head -1)

#echo "OPENSSH_BANNER=$OPENSSH_BANNER"

IS_SQUEEZE=$(echo "$OPENSSH_BANNER" | grep -E '^SSH-.*OpenSSH_5.*Debian-6')
IS_LENNY=$(echo "$OPENSSH_BANNER"   | grep -E '^SSH-.*OpenSSH_5.*Debian-5')
IS_ETCH=$(echo "$OPENSSH_BANNER"    | grep -E '^SSH-.*OpenSSH_4.*Debian-9')

# SSH-2.0-OpenSSH_5.1p1 Debian-5
# SSH-2.0-OpenSSH_4.3p2 Debian-9etch3
# SSH-2.0-OpenSSH_5.5p1 Debian-6+squeeze1

#echo "Squeeze: $IS_SQUEEZE"
#echo "Lenny: $IS_LENNY"
#echo "Etch: $IS_ETCH"

if [ "x$IS_SQUEEZE" != "x" ]; then
    echo "$HOST is Debian 6.x (squeeze)"
    exit 0
fi

if [ "x$IS_LENNY" != "x" ]; then
    echo "$HOST is Debian 5.x (lenny)"
    exit 0
fi

if [ "x$IS_ETCH" != "x" ]; then
    echo "$HOST is Debian 4.x (etch)"
    exit 0
fi

echo "I don't know what $HOST is."
echo "Here's the openssh banner: '$OPENSSH_BANNER'"

exit 1

It reads the OpenSSH server banner to determine the major Debian version (Etch, Lenny, Squeeze). It's really fast, it's very simple and hopefully reliable too. Enjoy. Download from https://gist.github.com/1389206/.
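If you want to check the matching logic without touching the network, a rough sketch is to pull it out into a function and feed it banner strings directly. The helper name debian_release_from_banner is made up for this example and is not part of the gist; the sample banners are the ones from the comments in the script.

```shell
#!/bin/sh
# Hypothetical helper: maps an OpenSSH banner string to a Debian release name.
# Uses shell glob patterns in a case statement instead of three grep calls.
debian_release_from_banner() {
    case "$1" in
        SSH-*OpenSSH_5*Debian-6*) echo "squeeze" ;;
        SSH-*OpenSSH_5*Debian-5*) echo "lenny" ;;
        SSH-*OpenSSH_4*Debian-9*) echo "etch" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Sample banners taken from the comments in the script.
debian_release_from_banner "SSH-2.0-OpenSSH_5.1p1 Debian-5"
debian_release_from_banner "SSH-2.0-OpenSSH_4.3p2 Debian-9etch3"
debian_release_from_banner "SSH-2.0-OpenSSH_5.5p1 Debian-6+squeeze1"
```

The patterns only know about the three releases covered here, so anything else falls through to "unknown" with a non-zero exit status.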

Calling all Mojolicious users: patches welcome?

So you're using Mojolicious. Good. We started using it too, and it's great. Along the way we've accumulated some patches that weren't integrated into the mainline.

We're starting to reach a critical mass and I have been considering the idea of starting our own Mojolicious "branch". I'd like to know how many of you are in the same situation, and issue a call to action:

If you need or have needed patches to Mojolicious that for whatever reason were not integrated into the official repository, please contact me, leave a comment here or send me an email. I'd like to hear from you!

Kicking Jenkins with monit

We've been using Jenkins to build and test all our projects for a good part of this year now. I think Jenkins is one of the very few Java projects I've seen and worked with that actually works, and it's a real pleasure to use. Except every now and then it seems to crash for no apparent reason.

I haven't had time to dig into this problem yet. I've only seen the frontend Apache process logging errors because it cannot connect to the Tomcat backend on port 8080. My theory so far is that Jenkins tries to auto-update and crashes, or maybe there's a runaway test run that brings everything down…

Time is really limited these days, and since I had heard good things about monit, I decided to try it and see if we could have Jenkins kicked when it dies for some reason. This way we can avoid cases where the test suites haven't been running for a day or two and nobody noticed… :-|

So, long story short, here's the quick and dirty monit recipe to kick Apache + Jenkins (this is on Debian Squeeze):

check process jenkins with pidfile /var/run/jenkins/jenkins.pid
  start program = "/etc/init.d/jenkins start" with timeout 90 seconds
  stop program  = "/etc/init.d/jenkins stop"
  if failed host my.host.name port 8080 protocol http
     and request "/"
     then restart

check process apache with pidfile /var/run/apache2.pid
  start program = "/etc/init.d/apache2 start" with timeout 60 seconds
  stop program  = "/etc/init.d/apache2 stop"
  if failed host my.host.name port 80 protocol http
     and request "/"
     then restart

And, just for kicks, there's a complete Monit module for Puppet up on GitHub. Have fun!