
MySQL (Percona XtraDB) slave replication crash resilience settings

It’s been a geological age since my last blog post!

Oh, so many things happened in the meantime. For the past four years, I worked on the development and operations side of the news recommendation system that powered Opera Discover. Given enough energy, I plan to write a recommender systems “primer” series. We’ll see.

Meanwhile, I’d like to keep these notes here. They’ve been useful for making MySQL replication recover gracefully from network instability, abrupt disconnections and, more generally, datacenter failures. Here they are.

With the arrival of MySQL 5.6 on Debian Wheezy, we began to experience MySQL replication breakages after abrupt shutdowns or sudden machine crashes. When systems came back up, more often than not, MySQL replication would stop due to corrupted slave relay logs.

I started investigating this problem and soon found documentation and blog posts describing the log corruption issues and how MySQL development addressed them. Here are the pages I used as references:

Additionally, we had (I believe unrelated) problems with some mysql meta tables that couldn’t be queried, even though they were listed as existing in the mysql shell and in the filesystem.
We solved this problem with the following steps:

-- In the mysql shell:
DROP TABLE innodb_table_stats;
ALTER TABLE innodb_table_stats DISCARD TABLESPACE;

# Then from a system shell:
stop mysql
rm -rf /var/lib/mysql/mysql/innodb_table_stats.*
restart mysql

These steps have to be executed in this order, even if altering a table after having dropped it may seem nonsensical. It is nonsensical, as MySQL things sometimes are.
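
After the restart, it's worth double-checking that the table is queryable again. A minimal sanity check, assuming the stock mysql.innodb_table_stats table of MySQL 5.6:

-- Should return rows (or at least no error) once the table is healthy again
SELECT database_name, table_name, n_rows
FROM mysql.innodb_table_stats
LIMIT 5;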

Crash safe replication settings

We’ve distilled a set of standalone replication settings that will provide years and years of unlimited crash-safe replication fun (maybe). Here they are:

# More resilient slave crash recovery
master-info-repository = TABLE
relay-log-info-repository = TABLE
relay-log-recovery = ON
sync-master-info = 1
sync-relay-log-info = 1

Let’s see what each of these settings does.

master-info-repository=TABLE and relay-log-info-repository=TABLE instruct MySQL to store master and relay log information in the mysql database itself rather than in separate *.info files in the /var/lib/mysql folder.
This is important because, in case of crashes, we would like to ensure that master/relay log information is subject to the same ACID properties that the database itself provides. Corollary: make sure the relevant meta tables use InnoDB as their storage engine.
For example, SHOW CREATE TABLE slave_master_info should say Engine=InnoDB.
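
A quick way to check both tables at once is a query along these lines (this assumes the standard mysql.slave_master_info and mysql.slave_relay_log_info tables of MySQL 5.6):

-- Both rows should report ENGINE = InnoDB
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mysql'
AND TABLE_NAME IN ('slave_master_info', 'slave_relay_log_info');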

relay-log-recovery=ON is critical in case of corruption of relay log files on a slave system. When MySQL encounters corrupted relay log files during startup, by default it will drop the ball and halt. With this option set to ON, mysql will instead discard the corrupted relay log files and refetch them from the master. The master should then be configured to keep its binlogs for a suitable amount of time (I often use 2 weeks, but it really depends on the volume of database changes). As a test, it's possible to replace the current relay log file with a corrupted copy (from /dev/urandom, for example). MySQL will discard the corrupted log file, fetch a fresh copy from the master, and then carry out a regular startup. Fully automatic recovery!
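
On the master side, binlog retention is controlled by expire_logs_days in MySQL 5.6 (later versions replaced it with a seconds-based variable). As a sketch, the two-week retention mentioned above looks like this:

-- On the master: keep two weeks of binlogs so slaves can refetch after a crash
-- (also set expire_logs_days = 14 in my.cnf so it survives a restart)
SET GLOBAL expire_logs_days = 14;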

sync-master-info=1 and sync-relay-log-info=1 enable the synchronized commit of both master and relay log information to the database with every transaction commit. This is again something that must be evaluated for each application. If you have a high volume of writes, you most probably don't want to enable it. However, if the write rate is low enough, this option won't cost any additional performance, and it ensures that the slave_master_info and slave_relay_log_info tables are always consistent with the state of the replication and of the rest of the database.
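
A nice side effect of the TABLE repositories is that the replication coordinates become visible to plain SQL, so it's easy to eyeball them against SHOW SLAVE STATUS. The column names below are from MySQL 5.6; check your version's documentation:

-- Where the I/O thread stands in the master's binlog
SELECT Master_log_name, Master_log_pos FROM mysql.slave_master_info;

-- Where the SQL thread stands in the relay log
SELECT Relay_log_name, Relay_log_pos FROM mysql.slave_relay_log_info;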

That is all. I’d love to hear any feedback or corrections to this information.

Verifying MySQL behaviour with automated test suites and mytap

You know everything about how MySQL treats UTF8 and LATIN1 charsets, and how collations impact the selection and insertion of data, right?

Great, then stop reading :)

I don't, and since I'm in the process of setting up a new version of the Opera accounts database, I really don't want to screw things up. I tried to fully understand how MySQL works in this respect (charsets, collations, etc…), but reading documentation and memorizing it wasn't very easy. Plus, there are thousands of blog posts on the matter, not always 100% accurate.
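
To be fair, inspecting the charset and collation defaults currently in effect is easy enough; these are standard MySQL statements (the table name here is just a placeholder):

SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
SHOW CREATE TABLE some_table;

But knowing the settings is not the same as knowing how they behave on real data.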

So I thought I'd better get hands-on, and I wrote a kind of database test suite.

Now this test suite is hooked up to the main project builds on Jenkins. Here's a sample output:


[...]
[workspace] $ /bin/sh -xe /tmp/hudson3255767718598715423.sh
+ ./bin/run-dbtest-suite
basedir=/var/lib/jenkins/jobs/auth-db/workspace
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/__initdb__.my ........................................... 
1..2
ok 1 - Using utf8tests database
ok 2 - Server charset is latin1
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_bin.my ................................... 
1..6
ok 1 - All our records are there. No duplicate key error.
ok 2 - utf8_bin collation does not collate a/â/à/A/...
ok 3 - utf8_bin collation does not collate a/â/à/A/...
ok 4 - utf8_bin collation does not collate a/â/à/A/...
ok 5 - Query for mixed-case username does not return lowercase username
ok 6 - Query for upper-case username does not return lowercase username
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_general_ci.my ............................ 
1..7
ok 1 - Collation for t007 is utf8_general_ci
ok 2 - utf8_general_ci collation normalizes accents, diacritics and the like
ok 3 - A and Å are collated to the same character in the utf8_general_ci table
ok 4 - å and Å are collated to the same character in the utf8_general_ci table
ok 5 - lower/upper case chars are collated in the utf8_general_ci table
ok 6 - lower/upper case chars are collated in the utf8_general_ci table
ok 7 - We are allowed to insert all records just because there is no unique constraint
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_unicode_ci.my ............................ 
1..7
ok 1 - Collation for t005 is utf8_unicode_ci
ok 2 - utf8_unicode_ci collation normalizes accents, diacritics and the like
ok 3 - A and Å are collated to the same character in the utf8_unicode_ci table
ok 4 - å and Å are collated to the same character in the utf8_unicode_ci table
ok 5 - lower/upper case chars are collated in the utf8_unicode_ci table
ok 6 - lower/upper case chars are collated in the utf8_unicode_ci table
ok 7 - We are allowed to insert all records just because there is no unique constraint
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/default-table-charset.my ................................ 
1..3
ok 1 - Default character set is utf8 when no charset is specified (from server)
ok 2 - Default character set is utf8 when "CHARSET utf8" specified in the CREATE TABLE
ok 3 - Default character set is utf8 when "CHARSET utf8" and "COLLATE" specified in the CREATE TABLE
ok
...
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/username-with-utf8-chars.my ............................. 
1..5
ok 1 - We have some UTF-8 encoded string in our hands (hex)
ok 2 - We have some UTF-8 encoded string in our hands (charset)
ok 3 - Can select back UTF-8 content from a CHARSET utf8 table
ok 4 - Given string is exactly 24 bytes long (length)
ok 5 - Given string is exactly 8 (wide) characters long (char_length)
ok
All tests successful.
Files=11, Tests=80, 0.739731 wallclock secs ( 0.05 usr  0.02 sys +  0.07 cusr  0.01 csys =  0.15 CPU)
Result: PASS
Recording test results
Finished: SUCCESS

And here's an example of a "sanity check" test case, which doesn't do much:


-- Check that we can insert and retrieve UTF-8 content correctly

BEGIN;

SET NAMES utf8;

SELECT tap.plan(5);

USE auth_utf8tests;

SET @username = '今日话题今日话题';
SET @encoded  = 'E4BB8AE697A5E8AF9DE9A298E4BB8AE697A5E8AF9DE9A298';

SELECT tap.eq(
    HEX(@username),
    @encoded,
    'We have some UTF-8 encoded string in our hands (hex)'
);

SELECT tap.eq(
    CHARSET(@username),
    'utf8',
    'We have some UTF-8 encoded string in our hands (charset)'
);

INSERT INTO t001 (f1) VALUES (@username);

SELECT tap.eq(
    (SELECT HEX(f1) FROM t001 WHERE f1 = @username),
    @encoded,
    'Can select back UTF-8 content from a CHARSET utf8 table'
);

SELECT tap.eq(
    (SELECT LENGTH(f1) FROM t001 WHERE f1 = @username),
    24,
    'Given string is exactly 24 bytes long (length)'
);

SELECT tap.eq(
    (SELECT CHAR_LENGTH(f1) FROM t001 WHERE f1 = @username),
    8,
    'Given string is exactly 8 (wide) characters long (char_length)'
);

-- Finish the tests and clean up.
CALL tap.finish();
ROLLBACK;

This SQL test code uses mytap. You can see how the SELECT tap.* calls are just the equivalents of Perl's TAP testing functions: SELECT tap.eq() is the equivalent of Test::More::is(), and so on.

Another, more interesting test case, is the following:


--
-- Verify how the utf8_unicode_ci collation works
--

BEGIN;

SET NAMES utf8;

SELECT tap.plan(12);

USE auth_utf8tests;

[...]

SELECT tap.eq(
    (SELECT TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA=SCHEMA() AND TABLE_NAME='t015'),
    'utf8_unicode_ci',
    'Collation for t015 is utf8_unicode_ci'
);

[...]

SELECT tap.eq(
    (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1a' ORDER BY id),
    '10',
    'utf8_unicode_ci collation normalizes accents, diacritics and the like'
);

SELECT tap.eq(
    (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1Å' ORDER BY id),
    '10',
    'A and Å are collated to the same character in the utf8_unicode_ci table'
);

SELECT tap.eq(
    (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1å' ORDER BY id),
    '10',
    'å and Å are collated to the same character in the utf8_unicode_ci table'
);

SELECT tap.eq(
    (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'TestUser1A' ORDER BY id),
    '10',
    'lower/upper case chars are collated in the utf8_unicode_ci table'
);

SELECT tap.eq(
    (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'TESTUSER1A' ORDER BY id),
    '10',
    'lower/upper case chars are collated in the utf8_unicode_ci table'
);

SELECT tap.eq(
    (SELECT COUNT(*) FROM t015),
    1,
    'We are allowed to insert only 1 record, because the others collate to the same string'
);

-- Finish the tests and clean up.
CALL tap.finish();
ROLLBACK;

An interesting thing that I didn't know how to do at first is trapping errors. I left that part out of the test code above to simplify, but here it is:


DELIMITER //

DROP PROCEDURE IF EXISTS populate_table //

CREATE PROCEDURE populate_table ()
BEGIN

    DECLARE CONTINUE HANDLER FOR SQLSTATE '23000' BEGIN
        SELECT tap.ok(
            1,
            'We should get dupkey errors when inserting data with collation utf8_unicode_ci'
        );
    END;

    INSERT INTO t015 (id,username,note) VALUES (10, 'testuser1a', 'plain');
    INSERT INTO t015 (id,username,note) VALUES (20, 'testuser1â', 'circumflex a');
    INSERT INTO t015 (id,username,note) VALUES (30, 'testuser1à', 'a grave');
    INSERT INTO t015 (id,username,note) VALUES (40, 'testuser1Å', 'A circ');
    INSERT INTO t015 (id,username,note) VALUES (50, 'TestUser1A', 'mixed case');
    INSERT INTO t015 (id,username,note) VALUES (60, 'TESTUSER1A', 'upper case');

END;

//

DELIMITER ;

[...]

/* Should generate 5 dupkey errors (taken as successful tests) */
CALL populate_table;

It's a bit convoluted. To trap errors you have to use the DECLARE HANDLER statement. DECLARE CONTINUE HANDLER FOR SQLSTATE '23000' means: whenever SQLSTATE '23000' is raised (which corresponds to a duplicate key error), execute this block of code. All of that must necessarily be wrapped in a stored procedure: handlers outside of stored procedures are not allowed.

In this particular test, the table uses the utf8_unicode_ci collation, so we expect a duplicate key error on username whenever we insert a string like 'testuser1à' or 'TESTUSER1A', because 'testuser1a' was already inserted at the beginning. Of all the INSERT statements, only the first one is bound to succeed, so I put a SELECT tap.ok(1) in the duplicate key HANDLER, and I expect 5 of those tests to fire when I run CALL populate_table;.

This of course may seem trivial. And I guess it is, but for me it's a much better way of learning than scouring through the manuals or the many blog posts out there that may or may not reflect the environment I'm working with.

Routinely running this kind of test suite makes it possible and easy to verify the database behaviour:

  • instantly
  • after upgrades (5.1 -> 5.5? -> 6?) or storage engine changes
  • after mysql configuration changes. For example, I discovered in this way that adding default-charset=utf8 in my MySQL config breaks everything.

I consider this my live documentation on how MySQL works. I would really appreciate if you have any feedback on this. Have fun!

Self-contained instant functional test suites

I first learned about functional test suites when I started working on the My Opera code. Two years or so later, I found myself slightly hating the My Opera functional test suite.

The main reasons:

  • It's too slow. Currently, it takes anywhere between 20 and 30 minutes to complete. Sure, it's thousands of tests, divided into hundreds of test scripts, grouped by functional area, like login, blogs, albums, etc…

    Even considering all of this, I think we should aim to have a single functional test run complete in 5-10 minutes.

    Most of the time is wasted in communication with the test server, and in the creation and destruction of the database.

  • It's unreliable, sometimes cumbersome to manage. In our setup, CruiseControl fires a functional test suite run after every commit into the source code repository.

    That initiates an ssh connection to the test server, where a shell script takes care of restarting the running apache, dropping and creating the test database, updating required packages, etc…

    This implies that you need an apache instance running on a specific port, a mysql instance running on another port, etc… In the long run, this approach proved not to scale very well. It's too error prone and not very reliable. Maybe your test run fails because mysql has mysteriously crashed, or because of some other random bad thing.

A functional test suite that sucks less

Given the motivation, I came up with this idea of a functional test suite that is:

  1. instant: check out the source code from the repository, and you're ready to go.
  2. self-contained: it shouldn't depend on any external servers that need to be managed or even be running.
  3. reusable: it doesn't have to be a functional test suite. The same concept can be reused for unit test suites, or anything really.

A few months after the first idea, and a couple of weeks of work on it, we were ready with the first prototype. Here's how we use it:

  • Check out the source code from the repository in ~/src/myproject
  • cd ~/src/myproject
  • ./bin/run-functional-test-suite
  • Private instances of Apache, MySQL, and the main application are created and started
  • A custom WWW::Mechanize-based client runs the functional test cases against the Apache instance
  • A TAP stream from the test run is produced and collected
  • All custom instances are destroyed
  • ???
  • Profit!

The temporary test run directory is left there untouched, so you can inspect it in case of problems. This is priceless, because that folder contains all the configuration and log files that the test run generated.

The main ideas

  • The entire functional test suite should run within a unique temporary directory, created on the fly. You can run as many instances as you want, with your username or a different one. They won't conflict with each other.
  • Use as much as possible the same configuration files as the other environments. If you already have development, staging and production, then functional-test is just another one of your environments. If possible, avoid creating special config files just for the functional test suite.
  • The Apache, MySQL, and Application configuration files are simple templates where you need to fill in the apache hostname, apache port, mysql port, username, password, etc…
  • Apache is started up from the temporary directory using the full command invocation, as:

    /usr/sbin/apache2 -k start -f /tmp/{project}_{user}_testsuite_{pid}/conf/apache.conf
    

    and all paths in the config files are absolute, to avoid relative references that would lead to files not being found.
    The configuration template files have to take this into account, and use a $prefix style variable
    everywhere, like this (Apache config, for example):

    Timeout 10
    Keepalive On
    HostnameLookups Off
    ...
    Listen [% suite.apache.port %]
    ...
    

  • The MySQL instances are created and destroyed from scratch for every test suite run, using the amazing MySQL::Sandbox. My colleague Terje had to patch the sandbox creation script to fix a problem with 64-bit environments. We filed a bug on the MySQL::Sandbox RT queue about this. This is the only small problem found in a really excellent Perl tool. If you haven't looked at it, do it now.

All in all, I'm really satisfied with the result. We applied it to the Auth project for now, which is smaller than My Opera, and we fine-tuned the wrapping shell script to compensate for the underlying distribution automatically. We've been able to run this functional test suite on Ubuntu from version 8.04 to 10.04, and on Debian Lenny, with no modifications. The required MySQL version is detected according to your system, and MySQL::Sandbox is instructed accordingly.

It also results in a faster execution of the entire functional test suite.

What can we improve?

There's lots of things to improve:

  • Speed. Instead of creating and destroying the database for every test group, it would be nice to use transactions, and roll back everything at the end of each test group (t/{group-name}/*.t). I used to do this many years ago with Postgres, and it was perfectly safe. I'm not sure we will ever make it with MySQL, since, for example, ALTER TABLE (like other DDL statements) causes an implicit commit, so it effectively ignores transactions.
  • Reliability. The private instances of Apache, MySQL (and recently memcached too) are started on ports calculated from the PID of the originating bash script. So, if you run the test suite from a bash script with PID 8029, then MySQL is started on port 8030, Apache on port 8031, and memcached on port 8032. This has worked very well so far, and we made sure we never use ports below 1024, but it could lead to mysterious failures if some local service is already bound to a port like 5432, or others. The idea is that we could test whether a port is available before using it. However, such an error is already detected at startup time, so it shouldn't be a huge problem.

Feedback welcome!

MySQL alter table in a transaction

I had a quite painful experience yesterday, I would say…

We were about to run live database migrations, when I decided to try them first on a staging database server, to get a more precise idea of how long they would take. So I connected to the MySQL test system:

mysql> use myopera
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> BEGIN WORK;
Query OK, 0 rows affected (0.00 sec)

mysql> ALTER TABLE blah ADD COLUMN xxx ...
...

The ALTER operation completed in a minute or so. Then I wanted to revert it, so I typed:

mysql> ROLLBACK;

So the transaction was rolled back. Just to double check, I verified that the table was indeed left in its original state, and to my surprise, it wasn't. Yes, the table is an InnoDB table, and yes, I had put the ALTER statement inside a transaction.

Even after a ROLLBACK, the table is left altered… Isn't that surprising? It was for me. Of course, it is a documented MySQL behavior, and it's filed as a bug/feature request.
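
The underlying reason is that DDL statements like ALTER TABLE cause an implicit commit, so by the time you type ROLLBACK there is no open transaction left. A minimal sketch that makes this visible (table and column names are made up for the example):

-- Assumes an empty InnoDB table: CREATE TABLE t (id INT) ENGINE=InnoDB;
BEGIN;
INSERT INTO t (id) VALUES (1);       -- inside the transaction, so far
ALTER TABLE t ADD COLUMN note TEXT;  -- implicit COMMIT happens here
ROLLBACK;                            -- nothing left to roll back
SELECT COUNT(*) FROM t;              -- returns 1: the INSERT was committed too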

The MySQL Sandbox

I learned about the MySQL Sandbox last year in Lisbon, at the European Perl conference. I remember I talked about it with Giuseppe Maxia, the original author, and I promised him I would try it.

I wasn't really convinced about it until the first time I tried it.

That was when we switched My Opera from master-slave to master-master replication, during a weekend last November. I was a bit scared of doing the switch. Then I remembered MySQL Sandbox, and I tried it on an old laptop.

I was absolutely amazed to discover that I could simulate pretty much any setup, from simple to complex: master-slave, master-master, circular multi-master, etc…
It was also very quick to set up, and it's very fast to install new sandboxed servers.
You can set up a 5-server replication topology in less than 30 seconds.

MySQL Sandbox is a Perl tool that you can easily install from CPAN with:

$ sudo cpan MySQL-Sandbox

You get the make_sandbox command that allows you to create new sandboxes. Right now I'm trying it again for a maintenance operation we have to do on My Opera soon. I'm using a master-master setup, like this:

make_replication_sandbox --master_master --server-version=/home/cosimo/mysql/mysql-5.0.51a-....tar.gz

so I can simulate the entire operation and try to minimize the risk of messing up the production boxes while also getting a bit more confident about these procedures.

MySQL Sandbox, try it. You won't go back :-)

MySQL is just crazy or what?

Ok, I was working on the staging/test database, checking out some performance problems I was experiencing.
Then I thought of optimizing one table. Optimizing should be something like rebuilding the physical data file to compact the record structure, remove invalidated data, …

MySQL has this OPTIMIZE TABLE command.
Ok, fire!

Two or three minutes passed. Then another two or three. Then again.

I started to worry, and tried to start new sessions to see if the table was blocked. It was.
Completely locked up.

Ok, I left the session running. Went to lunch. Came back.
Still running. WTF? I decided to interrupt the command. CTRL + C.

MySQL responded with some error code, 22 or something.
Ok, let me just check whether I have to rebuild the indexes, just in case.

SELECT * FROM sometable

Boom!

Table corrupted. No kidding. I tried REPAIR TABLE and REPAIR TABLE QUICK.
No way. The table was completely DESTROYED.

Amazing…