replication Archives - Random hacking

It’s been a geological age since my last blog post!

Oh, so many things happened in the meantime. For the past four years, I worked on the development and operations side of the news recommendation system that powered Opera Discover. With enough energy, I have planned to write a recommender systems “primer” series. We’ll see.

Meanwhile, I’d like to keep these notes here. They’ve been useful to make MySQL replication recover gracefully from network instability, abrupt disconnections and generally datacenter failures. Here they are.

Coming MySQL 5.6 on Debian Wheezy, we began to experience mysql replication breakages after abrupt shutdowns or sudden machine crashes. When systems came back up, more frequently then not, mysql replication would stop due to corrupted slave relay logs.

I started investigating this problem and soon found documentation and blog posts describing the log corruption issues and how mysql development addressed that. Here’s the pages I used as references:

Additionally, we had (I believe unrelated) problems with some mysql meta tables that couldn’t be queried, even though they were listed as existing in the mysql shell and in the filesystem.
We solved this problem with the following steps:

DROP TABLE innodb_table_stats; ALTER TABLE innodb_table_stats DISCARD TABLESPACE; stop mysql rm -rf /var/lib/mysql/mysql/innodb_table_stats.* restart mysql

These steps have to be executed in this order, even if altering a table after having dropped it may seem nonsensical. It is nonsensical, as sometimes mysql things are.

Crash safe replication settings

We’ve distilled a set of standalone replication settings that will provide years and years of unlimited crash-safe replication fun (maybe). Here they are:

# More resilient slave crash recovery master-info-repository = TABLE relay-log-info-repository = TABLE relay-log-recovery = ON sync-master-info = 1 sync-relay-log-info = 1

Let’s see what each of these settings does.

master-info-repository=TABLE and relay-log-info-repository=TABLE instruct mysql to store master and relay log information into the mysql database rather than in separated *.info files in the /var/lib/mysql folder.
This is important because in case of crashes, we would like to ensure that master/relay log information is subject to the same ACID properties that the database itself provides. Corollary: make sure the relevant meta tables have InnoDB as storage engine.
For example, a SHOW CREATE TABLE slave_master_info should say Engine=InnoDB.

relay-log-recovery=ON is critical in case of corruption of relay log files on a slave system. When MySQL encounters corrupted relay log files during startup, by default it will drop the ball and halt. This option set to ON, will cause mysql to attempt refetching the relay log files from the master database. The master should then be configured to keep its binlogs for a suitable amount of time (often I use 2 weeks, but really depends on the volume of database changes). As a test, it’s possible to replace the current relay log file with a corrupted copy (from /dev/urandom for example). MySQL will discard the corrupted log file and attempt download from the master, after which a regular startup will be carried out. Fully automatic recovery!

sync-master-info=1 and sync-relay-log-info=1 enable the synchronized commit of both master and relay log information to the database with every transaction commit. This is again something that must be evaluated in each single application. Most probably if you have a high volume of writes, you don’t want to enable it. However, if the writes rate is low enough, this option won’t cost any additional performance and should instead make sure that the slave_master_info and slave_relay_log_info tables are always consistent with the state of the replication and of the rest of the database.

That is all. I’d love to hear any feedback or corrections to this information.

Here's how to failover from primary to secondary master.
This was written following the My Opera case, and we use MySQL, but should be fairly generic.

Disable monitoring checks

Pause any Pingdom checks that are running for the affected systems
Set downtime or disable notifications in Nagios for the affected systems

Record log file and position

Assuming your secondary master (DB2) is idle, then now you have to record the log file and position by issuing a SHOW MASTER STATUS command:


mysql> SHOW MASTER STATUS G
*************************** 1. row ***************************
            File: mysql-bin.000024    <- MASTER LOG FILE
        Position: 91074774            <- MASTER LOG POSITION
    Binlog_Do_DB:
Binlog_Ignore_DB:

1 row in set (0.00 sec)

Write them down somewhere.
If you need to perform any kind of write/alter query on this host, then you have to issue the show master status command again, because position will change.

Also try repeating this command. You should see that the log file and position do not change between different runs.

Enable maintenance mode

Now is the time to enable your maintenance or downtime mode for the site or service. That will of course depend on your deployment tools.

Stop backend servers

Your backend/application servers might need to stay up and running. For example, in case of the Auth service, we want this, because we're going to serve static responses (html, xml, etc…) to the clients instead of just letting the connections hang.

In other cases, it's fine to just shut down the backends. You may want to do this for 2 reasons:

to make sure that nobody is accidentally hitting your master database, from your internal network or otherwise
because doing so should close all the connections to your master database. This is actually depending on the wait_timeout variable in the mysql server. The connections won't go away until wait_timeout seconds have passed. This is the normal behaviour, so don't panic if you still see connections after you shut down the backends.

Switch to the new master now

This depends on how you actually perform the switch. I can imagine at least 2 ways to do this:

by instructing LVS to direct all connections to the secondary master
take over the IP address either manually or using keepalived

On My Opera, we use keepalived with a private group between the two master database servers, so it's just a matter of:

stopping keepalived on the primary master database
starting keepalived on the secondary master database

There is a quick and dirty bash script that allows to verify who's the master and makes the switch.

#!/bin/sh

DB1=pri-master-hostname
DB2=sec-master-hostname

function toggle_keepalive() {
        host=$1
        if [[ `ssh $host pidof keepalived` == "" ]]; then
                ssh $host /etc/init.d/keepalived start
                if [[ `ssh $host pidof keepalived` == "" ]]; then
                        echo '*** KEEPALIVE START FAILED ***'
                        echo 'Aborting the master failover procedure'
                        exit
                fi
        else
                ssh $host /etc/init.d/keepalived stop
                if [[ `ssh $host pidof keepalived` != "" ]]; then
                        echo '*** KEEPALIVE STOP FAILED ***'
                        echo 'Aborting the master failover procedure'
                        exit
                fi
        fi
}

echo "Master Database failover"
echo

# Find out who's the primary master now, and swap them
if [[ `ssh $DB1 pidof keepalived` == "" ]]; then
        PRIMARY=$DB2
        SECONDARY=$DB1
else
        PRIMARY=$DB1
        SECONDARY=$DB2
fi

echo Primary is $PRIMARY
echo Secondary is $SECONDARY

# Shutdown primary first, then enable secondary
toggle_keepalive $PRIMARY
toggle_keepalive $SECONDARY

As soon as you do that, the secondary master will be promoted to primary master.
Since they are assumed to be already replicating from each other, nothing will change for them. It will however for all the slaves that were replicating from the primary master. We'll see what to do about that later.

Restart backend servers

Now it's the right time to restart the backend servers, and check that they correctly connect to the new primary master.

On My Opera, we're using a virtual address, w-mlb (write-mysql-load-balancer), to refer to the active primary master database. We use this name in the configuration files everywhere.

This means that we don't have to change anything in the backend servers configuration. We just restart them, and they will connect to the new primary master, due to the IP takeover step described above.

Turn off maintenance mode

If the backends are working correctly, they're connecting to the new master db, it's time to remove the maintenance page, so do that.

We're enabling and disabling maintenance mode by enabling and disabling a virtual host configuration in our frontends and reloading or restarting the frontend httpd servers.

From now on, your application is hopefully up and running and receiving client requests, so your downtime window is over.

Check replication lag

The database slaves at this point are still replicating from the former primary master database (DB1).

But DB1 now is not receiving any traffic (queries) anymore, so it's basically idle, and it should be. Any queries happening on DB1 now mean that something is seriously wrong. There might be lingering connections, but no activity.

Then it's important that all the slaves show no replication lag, so issuing a SHOW SLAVE STATUS command should show zero seconds behind.

mysql> SHOW SLAVE STATUS G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: <DB1-ip>
                Master_Port: <port>
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000025
        Read_Master_Log_Pos: 13691126
...
      Seconds_Behind_Master: 0

1 row in set (0.00 sec)

It's important that Seconds Behind Master is zero.
If it's not, it means that the slave needs more time to fully replicate all the past traffic that had been going on on the former primary master, DB1.

Remember that the primary master is now DB2, while DB1 is the secondary master.

Change master on the slaves

Now you can perform the CHANGE MASTER TO command on all the slaves.

Now you have to bring back the notes about MASTER LOG FILE and MASTER LOG POSITION.

First, stop the slave replication.

mysql> STOP SLAVE;

Then the exact command to issue, if nothing else about your replication changed, is:

mysql> CHANGE MASTER TO MASTER_HOST='<DB2-ip>', MASTER_LOG_FILE='<master_log_file>', MASTER_LOG_POSITION='<master_log_position>';

Then restart the slave replication:

mysql> START SLAVE;
mysql> SHOW SLAVE STATUS G

The following SHOW SLAVE STATUS G command should show the replication running, and, depending on how long it took you to change master since the new master took over the ip, the number of seconds of replication lag.

This number should rapidly go down towards zero.

If it's not, then you might have a problem. Go hide now or take the first flight to Australia or something.

We wrote a switch-master Perl script that proved to be very effective and useful. Example:

./switch-master --host <your_slave> --new-master <new_master_ip> --log-file <master_log_file> --log-pos <master_log_position>

This script performs a lot of sanity checks. Before switching master, it checks that replication lag is zero. If it's not, waits a bit and checks again, etc…

It's made to try to prevent disaster from striking. Very useful and quick to use.

Enable monitoring checks

Now verify that everything looks fine, replication lag is zero, your backends are working correctly, try to use your site a bit.

If everything's fine, enable or unpause the monitoring checks.
You have made it!

Random hacking

Assume nothing. Code defensively. Keep it simple, stupid!

Tag Archives: replication

MySQL (Percona XtraDB) slave replication crash resilience settings