Titles in this page

Monday, January 26, 2015

Performance issues and fixes -- MySQL 5.6 Semi-Synchrnous Replication

 Long time ago I wrote a blog post that we started using Semi-Synchronous replication in production at Facebook. We are running our own MySQL facebook-5.6 branch, which added many replication (including Semisync) patches. Some of them were also merged with WebScaleSQL.

 Recently I heard from people from community that they were suffering from Semisync performance issues in official MySQL 5.6. I took some time to review our previous MySQL bug reports, then realized that some important bugs were either still "verified" or inefficiently fixed. Two most affecting bug reports were https://bugs.mysql.com/bug.php?id=70342 and http://bugs.mysql.com/bug.php?id=70669. We fixed both at our branch so I haven't paid much attention after that, but people outside Facebook are certainly affected.

 In this post, I'm going to describe some effective configurations to get better Semisync throughput on master and slaves, by showing simple benchmark numbers. I used three machines -- client and master and semisync slave --, all running on pure flash. They are located within very close distance. I created 100 databases and enabled Multi-Threaded-Slave, and ran 100 mysqlslap processes for 100 databases, with 30 concurrent connections each (3000 concurrent connections in total). All queries were auto-committed inserts and I used InnoDB storage engine on both master and slaves.

1. Set master_info_repository=TABLE
 MySQL 5.6 and 5.7 have a performance bug that writing FILE based master info (and relay log info) files are very expensive. This is especially serious for Semisync replication, since this bug slows down IO thread. On Semisync, slow IO threads takes longer time to send ACK back to the master, so it slows down master throughput as well. Default master_info_repository is FILE, so without changing this parameter to TABLE, you are affected by this bug.
 Here are benchmark results between FILE and TABLE.

5.6.22 master_info_repository=FILE :  5870/s
5.6.22 master_info_repository=TABLE:  7365/s

 These numbers were the number of commits per second on both master and slave instances. Slave didn't lag for most experiments, thanks to multi-threaded slave feature.
 Please don't confuse between master_info_repository and relay_log_info_repository parameters. relay_log_info_repository has to be TABLE, otherwise crash safe slave doesn't work. master_info_repository works on both FILE and TABLE, but I suggest to use TABLE for performance reasons.

2. Reduce durability on master
 Older 5.6 had a bug that slaves couldn't continue replication after crashed master's recovery, even if setting fully durable configurations. Here is a closed bug report.
 This bug report was closed, but it caused some performance regression. Master extended LOCK_log mutex holding duration -- releasing LOCK_log mutex after calling fsync(). This certainly fixed the bug, but caused performance regression because LOCK_log was very hot mutex in 5.6 -- both Binlog Dump thread and application threads need to hold the lock. Hopefully reducing durability (I mean setting sync_binlog=0 and innodb_flush_log_at_trx_commit=0|2) mitigates the regression a lot. When using Semisync replication, you are most certainly to promote a slave on master failure, so durability on master does not matter much == You can reduce durability on master.

5.6.22 master_info_repository=TABLE, full durability:  7365/s
5.6.22 master_info_repository=TABLE, less durability:  9800/s

3. Loss Less Semisync
 MySQL 5.7 improved replication performance a lot. Probably the most effective improvement was that Binlog Dump thread no longer held LOCK_log mutex. In addition to that, 5.7 introduced "Loss-Less Semisync" feature.
 If you read my previous Semisync blog post carefully, you may have noticed that 1. we backported "Loss-Less Semisync" from 5.7, and 2. we got better throughput with Loss-Less Semisync than Normal Semisync. This was because Loss-Less Semisync actually reduced mutex contentions -- LOCK_commit and LOCK_binlog_. My 5.7 benchmark result was as follows.

5.7 Normal Semisync:    12302/s
5.7 Loss Less Semisync: 14500/s
(master_info_repository=TABLE, less durable)

 Reducing LOCK_log contention on Binlog Dump thread and introducing Loss Less Semisync were major contributors in MySQL 5.7 performance improvements. At Facebook, we ported both in our 5.6 branch. It would be interesting for community if these can be available on other distributions, since using 5.7 in production will not happen anytime soon.

In addition to the above three configurations, there are still some considerations to make Semisync throughput not bad / better.

4. Semisync mysqlbinlog
 At Facebook, we implemented mysqlbinlog to speak Semisync protocol, and used it as a Semisync replication reader. On replication slaves, IO thread and SQL thread conflict with internal mutexes. As I mentioned above, slow IO thread slows down Semisync master throughput. Semisync mysqlbinlog doesn't have such slowdown, because it doesn't have SQL thread. So using Semisync mysqlbinlog instead of Semisync slave can improve master throughput.

5.7 Loss Less Semisync + Semisync slave: 14500/s (on both master and slave)
5.7 Loss Less Semisync + Semisync mysqlbinlog + async slave: 48814/s on master, 15363/s on slave

 This shows Semisync mysqlbinlog improved master throughput. But this is actually not so good news -- because slave lags a lot. Fundamentally we need to fix mutex contentions between IO thread and SQL threads.

 There are some open performance bugs in GTID. Especially 5.7 one is serious. If you really need high throughput Semisync, you need to carefully benchmark with GTID (and ask Oracle to fix!).

Here are whole table definitions and mysqlslap commands I used for benchmark.
for i in `seq 1 100`
mysql -e "drop database test$i"
mysql -e "create database test$i"
mysql test$i -e "create table t1 (id int auto_increment primary key, \
value int, value2 varchar(100)) engine=innodb"

for i in `seq 1 100` do mysqlslap --host=master --query="insert into test$i.t1 \ values (null, 1, 'abcdefghijklmn')" --number-of-queries=100000 --concurrency=30 & done