Monday, October 24, 2011

Making slave pre-fetching work better with SSD

In the recent few weeks I have spent some time for creating yet another slave prefetching tool named "Replication Booster (for MySQL)", written in C/C++ and using Binlog API. Now I'm happy to say that I have released an initial version at GitHub repository.

The objective of Replication Booster is same as mk-slave-prefetch: avoiding or reducing replication delay under disk i/o bound workloads. This is done by prefetching relay logs events, converting to SELECT, and executing SELECT before SQL Thread executes the events. Then SQL thread can be much faster because target blocks are already cached.

On my benchmarking environment Replication Booster works pretty well.

On HDD bound benchmarks(update by pk), SQL thread's qps was:
- Normal Slave (without prefetch): 400 update/s
- With prefetch: 1,500 update/s

On SSD (SAS SSD) bound benchmarks(update by pk), SQL thread's qps was:
- Normal Slave: 1,780 update/s
- With prefetch: 5,400 update/s

It is great that slave could handle many more updates per second without replication delay on disk i/o bound benchmarks. It works very well on SSD, too. The below is a graph that shows Seconds_Behind_Master is gradually decreasing by using Replication Booster.



In this benchmark I executed 4,000 updates per second on master. On the slave server, by default slave delayed continuously because the slave could handle only 1,779 updates per second. When starting using Replication Booster on the slave, the slave could execute 5,418 updates per second. This was higher than the master's qps so Seconds_Behind_Master gradually decreased. After the slave caught up with the master, the slave could execute as same volume of updates as the master (4,000 update/s), so no replication delay happened. This means on this environment we can raise maximum update traffics many more (1,780 update/s -> 4,000-5,400 update/s) without investing for new H/W.

I also tested on some of our production slaves (not used for services) and it showed good results, too. I could get 30-300% improvements, depending on cache hit ratio. If data was fully cached, of course I didn't get any benefit, but it didn't cause negative impacts either.

I hope this tool is interesting to you.

In this blog post, I'll explain backgrounds for developing this tool and basic design. I believe many more optimizations can be done in future. Your feedbacks are welcome.


Good concurrency, bad single threaded performance


I mentioned at Percona Live London that using SSD on slaves is a good practice to reduce replication delay, and SATA/SAS SSD is practical enough because unit price is much cheaper than PCI-E SSD, and SATA/SAS SSD shows not bad concurrency with many drives when using recent RAID controller(most applications actually do not need 30,000-50,000 read iops, even though running many MySQL instances on the same machine). It is certainly an advantage that many SATA/SAS drives (6-10) can be installed on 1U box.

The biggest thing I'm concerned about using SATA/SAS SSD is single thread read iops. You can get only 2,000 read iops from SATA/SAS SSD with RAID controller. If you do not use RAID controller, it is not impossible to get 3,000+ read iops, but this is still much lower than using PCI-Express SSD. You can get 10,000 signle thread read iops from PCI-Express SSD.

When using SATA/SAS SSD, it is easy to predict that slave delays much earlier than using PCI-E SSD. Especially if running multiple MySQL instances per single server, innodb_buffer_pool_size has to be small (i.e. 4GB-12GB), so lots of disk reads will happen. By using 6-10 SATA/SAS drives, maximum throughput can be competitive enough against PCI-Express SSD, but single thread read iops is not improved. This is an issue.

"Slave prefetching" is a well known, great approach to make SQL Thread faster.

What is slave prefetching?


The concept of "slave prefetching" is (I think) well known, but I briefly describe here in case you don't know..

SQL Thread is single threaded. When SQL thread has to do lots of disk i/o by itself, replication is easily delayed. In almost all cases of slave lagging, I/O thread has received all binary log events (and saved as relay logs), but SQL thread delays execution due to massive random disk i/o. So there are many relay log events from SQL thread's current position (Relay_Log_Pos) to the end of relay log (EOF of relay logs).

Random disk reads happen when target blocks(records/indexes) are not cached. If they are cached, random reads won't happen. If you can cache all entries before SQL Thread executes, SQL thread does not have to do random disk reads. Then SQL thread can be much faster.

How can you do that? Read relay logs before SQL Thread executes, covert DML statements (especially UPDATE) to SELECT, then execute SELECT on the slave in parallel.

I believe this concept was introduced to MySQL community by Paul Tackfield at YouTube 4-5 years ago. mk-slave-prefetch is an open source implementation.

Desire for C/C++ based, raw relay log event hanlding tool


At first I tested mk-slave-prefetch on my benchmarks. But as far as I tested, unfortunately it didn't work as I expected. I think the main reasons are as below:

* mk-slave-prefetch uses mysqlbinlog to parse relay logs. But mysqlbinlog is not as flexible and fast as reading raw relay log events. For example, mysqlbinlog output events have to go through the file to the main prefetching program. mysqlbinlog is an external command line tool, so the main prefetching program has to fork a new process to run mysqlbinlog, which opens and closes relay logs every time.

* mk-slave-prefetch is written in Perl. In general, a prefetching tool has to be fast enough to read, convert and execute SELECT statements before SQL thread executes. The tool has to be multi-threaded. The tool probably has to run on the same machine as MySQL slave, in order to minimize network overheads. The resource consumption (CPU and memory) should be small enough so that it doesn't hurt MySQL server performance.
I don't believe Perl is a good programming language for developing such a tool.


I believe C/C++ is the best for programming language for this purpose. And I believe handling raw relay log events is much more efficient than using mysqlbinlog.

Based on the above reasons, I decided to develop a new slave prefeching tool by myself. I had some experiences for parsing binary/relay logs when developing MHA, so at first I planned to create a simple relay log parser program. But immediately I changed my mind, and tried mysql-replication-listener (Binlog API). Binlog API is a newly released utility tool from Oracle MySQL team. Binlog API has a "file driver" interface, which enables to parse binary log or relay log file and handle events one by one. By using Binlog API, handling raw binlog events becomes much easier. For example, it's easy to parse binlog events, get updated entries, store to external software such as Lucene/Hadoop, etc.

Oracle says Binlog API is pre-alpha. But as far as I have tested for slave prefetching purpose, it works very well. It's fast enough, and I didn't encounter any crashing or memory leak issues. So I decided to develop a new slave prefetching tool using Binlog API.

Introduction to Replication Booster for MySQL


I named the new slave prefetching tool as "Replication Booster". Keywords "slave" and "prefetch" were already used by mk-slave-prefetch, so I used different words.

The below figure is a basic architecture of Replication Booster.


Design notes

- Replication Booster is a separated tool (runs as a MySQL client). It works with normal MySQL 5.0/5.1/5.5/5.6. Starting/stopping Replication Booster is possible without doing anything on MySQL server side.

- Replication Booster is written in C/C++, and using boost::regex for converting UPDATE/DELETE to SELECT. Binlog API also uses boost.

- Using Binlog API to parse relay logs, not using mysqlbinlog
  - Using file driver, not tcp driver. file driver does not connect to MySQL server, and just reading relay log files. So even if file driver has bugs, impacts are rather limited (If it has memory leak, that's serious, but I haven't encountered yet).

- Main thread parses relay log events, picking up query log events, passing to internal FIFO queues
  - Binlog API has an interface to get a binlog event header (event type, timestamp, server-id, etc) and an event body. So it is easy to pick up only query log events.
  - Parsing row based events is not supported yet. It should be worth implementing in the near future.

- Multiple worker threads pop query events from queues, and convert query events to SELECT statements

- A dedicated thread (monitoring thread) keeps track of current SQL Thread's position (Relay_Log_Pos)

- Worker threads do not execute a SELECT statement if the query's position is behind current SQL Thread's position. This is because it's not needed (too late).

- Main thread stops reading relay log events if the event's timestamp is N seconds (default 3) ahead of SQL Thread's timestamp
  - This is for cache efficiency. If reading too many events than needed, it causes negative impacts. In the worst case cache entries that SQL thread needs now are wiped out by newly selected blocks.

- When slave is not delayed, Replication Booster should not cause negative impacts. It shouldn't use noticeable CPU/Disk/Memory resources. It shouldn't prevent MySQL server activities by holding locks, either. Of course, it shouldn't execute converted SELECT statements because they are not useful anymore. The last one is not easy to work on various kinds of environments (i.e. HDD/SSD/etc), but should be controllable by some external configuration parameters

- Bugs on Replication Booster should not result in MySQL server outage.

- Replication Booster works locally. It doesn't allow to connect to remote MySQL servers. This is for performance reasons. Executing tens of thousands of queries per second from this tool remotely will cause massive fcntl() contentions and use high network resources (both bandwidth and CPU time). I don't like that.

Configuration Parameters

--threads: Number of worker threads. Each worker thread converts binlog events and executes SELECT statements. Default is 10 (threads).

--offset-events: Number of binlog events that main thread (relay log reader thread) skips initially when reading relay logs. This number should be high when you have faster storage devices such as SSD. Default is 500 (events).

--seconds-prefetch: Main thread stops reading relay log events when the event's timestamp is --seconds-prefetch seconds ahead of current SQL thread's timestamp. After that the main thread starts reading relay logs from SQL threads's position again. If this value is too high, worker threads will execute many more SELECT statements than necessary. Default value is 3 (seconds).

--millis-sleep: If --seconds-prefetch condition is met, main thread sleeps --millis-sleep milliseconds before starting reading relay log. Default is 10 milliseconds.

- MySQL connection parameters: MySQL slave user, password, socket file or local ip/port

How to verify Replication Booster works on your environments

You may want to run Replication Booster where Seconds_Behind_Master is sometimes growing. If Replication Booster works as expected, you can get the following benefits.

  - Seconds_Behind_Master gets decreased, or growth rate of Seconds_Behind_Master has decreased
  - Update speed has improved (i.e. Com_update per second has increased) by this tool

Replication Booster has some statistics variables, and prints these statistics when terminating the script (Ctrl+C) like below. If slave delays but "Executed SELECT queries" is almost zero, something is wrong.
Running duration:    847.846 seconds
Statistics:
Parsed binlog events: 60851473
Skipped binlog events by offset: 8542280
Unrelated binlog events: 17444340
Queries discarded in front: 17431937
Queries pushed to workers: 17431572
Queries popped by workers: 5851025
Old queries popped by workers: 3076
Queries discarded by workers: 0
Queries converted to select: 5847949
Executed SELECT queries: 5847949
Error SELECT queries: 0
Number of times to read relay log limit: 1344
Number of times to reach end of relay log: 261838

I haven't spent so much time on this project yet (just started a few weeks ago). Current algorithm is simple. I believe many more optimizations can be done in future, but even so current benchmark numbers are pretty good. I hope we can use this tool on many places where we want to avoid replication delay but don't want to spend too much money for faster storage devices.

38 comments:

Robert Hodges said...
This comment has been removed by the author.
Robert Hodges said...

Hi Yoshinori, thanks for an excellent article and nice performance tests. Your work on replication is very impressive. I am planning on putting a similar feature into Tungsten Replicator. p.s., Sorry for the deleted comment--it was missing a couple of words.

Matty said...

Fantastic stuff Yoshinori - great tool, and demonstration of what can be accomplished with the Binlog API

Laine said...

Look forward to testing this out, thank you for the contribution Yoshinori. Enjoyed your work at Percona Live also.

Baron said...

I fully agree that Perl is the wrong language, which is why I have deprecated mk-slave-prefetch. I'm glad you chose different words for the tool's name, too. I am trying to deprecate the word "slave" and use "replica" instead.

Runescape Gold said...

I recently came across your article and read together. I wish to express my admiration for your writing skills and ability to make readers read from beginning to end. I want to read the recent messages and share my thoughts with you.

RS Gold
Buy Runescape Gold

Buy Tera Gold said...

I was thwarted in to fool different types are great World Of Warcraft Gold for one's dark blue,Most economical Howler Golden wolf land is obviously pressure on material cover using Occur is perfect for walking the dog; pitch-dark, creature farming is actually casual lot hatred, considering shadow priests employ a impregnable type of rib: Mana Burn,Cheapest WOW Gold wings could be unagitated anytime curl afterwards blue target;Hunting watch,Aion Amber in person recall every time a hunter scrimmage damage, so really don't appeal the really deplorable that your chosen chosen mathematical group flake place (the a longer period A hunter will not face police squad fighting),Buying WOW Gold peculiarly non hooked up pve huntsman Movement gift arrived at recognise standard candle Punk WOW Gilded European union smoke wrong is a wonderful substitute.

myplay said...

We are therefore delighted this particular internet factor works plus your post actually solved the problem. Usually takes an individual through to in which home suggestions anyone
Buy RS Gold

RS Gold said...

It all looks so great!http://twitter.com/#!/RSGoldfzf

simple wedding dresses said...

Thank you for sharing this information. The information was very help and saved a lot of my time.

Never bigger surprise out of that Amber’s tea length wedding dresses. We are dedicated to maintain the golden balance between quality and prices of our products lace wedding dresses. Please feel free to contact us in case of any detailed information on your desired objects. Customer and service, our first and foremost.

ambersbridal said...

Thank you for sharing this information. The information was very help and saved a lot of my time.lace wedding gowns http://www.ambersbridal.com/wedding-dresses/lace-wedding-dresses.html
cheap wedding dresses plus size http://www.ambersbridal.com/tag/cheap+wedding+dresses+plus+size
wedding dresses 2012 http://www.ambersbridal.com/wedding-dresses.html
lace wedding dress http://www.ambersbridal.com/wedding-dresses/lace-wedding-dresses.html

Cheap RS Gold said...

The post is very nicely written and yes it contains many useful facts. We are pleased to find your distinguished way of writing the post. So you allow me to learn and implement. Thank you sharing with us.
http://wowgoldb2c225.blogspot.com/2012/01/show-featuring-latest-eco-fashion-is.html
http://wowgoldb2c225.eblog.cz/measure-the-fashion-show-models-to-ensure-the-clothing-you-find-can-be-worn
http://wowgoldb2c225.dtiblog.com/blog-entry-12.html
http://wowgold2u.xtrablog.dk/the-elizabethan-era-was-an-important-period-of-time-post271203
http://uggal.blogage.de/entries/2012/1/12/The-sponsor-and-clothing-liaison-solicits-vendors
http://uggal.bloguez.com/uggal/3910591/Find-fashion-gems-in-bins-and-on-the-shelves-of-these-stores-to-save-money

Gabe said...

Yoshinori,

Does this work with ROW based replication?

Thanks

Anonymous said...

Yoshinor, great stuff, thanks for sharing it. You mentioned that it works with normal MySQL 5, is it possible to make it work with Percona Server?

Zegaara said...

I was actually looking for this resource a few weeks back. Thanks for sharing with us your wisdom.This will absolutely going to help me in my projects .

Classic Dresses
Classic Bridesmaid Dresses
Wedding Dresses with Sleeves
Flower Girl Dresses

Empire Wedding Dresses

adultfriendfinder.com review said...

I would like to see you implement many new web services that will follow the DLF Discovery Interface scheme.

巨人倍増 said...

漢方精力剤
威哥王 通販
三便宝カプセル
中華牛鞭
中国精力剤
消渇丸

cheap cambridge satchel said...

But now, the Cambridge Satchel has become a fashion trend, many people all want to own cheap cambridge satchel. My friends all say that can bring them to their beautiful school years.



Cambridge Satchels are so natural and life, because the woman life is wonderful and bumpy, women's life will always come up with bag, and the bag in a lot of moment life and also represents the woman's life

AlexK said...

A good informative post about paracord bracelet on http://eparacord.com/pages/paracordbracelet that you have shared and thankful your work for sharing the information. Got some entertaining information and would like to give it a try. Appreciate your work and keep sharing your information, paracord!

hugginss jetterees said...

You shrug and create alongside a great deal of lobsters and also a legislation rune though taking part in Runescape?- in spite of everything buy rs gold, you can just teleport back instantaneously if a thing goes mistaken in Runescape. You have to amount 39 Wilderness in Runescape, destroy a couple of giants in Runescape, 314 close to in Runescape.

Emily said...

There is ridiculously cheap wow gold high demand for rare and epic items and no one quite has enough time to build up the bank to buy them. So, with so many players in the game and so much demand for items in a limited space, players can charge as much gold as they want. For those that would rather enjoy the game and not spend their time farming gold, it might seem easier to simply buy 100 gold offline and use it to buy what you need wow power leveling in gold. May be you have spend a lot of time building up your account. Also the theft is a major risk. Most of all, you're risking the compromise of the entire game by buying gold from farmers that destroy the economy by overharvesting.Once your account have been deleted both time and money are not valuable. And you are losing a lot of your game life. If you want to continue playing the World of Warcraft game, you should have to buy a new account. Not only that, but you're giving your credit card number wow gold tips away to an unknown entity who will supposedly deliver gold to you in game.

GuildWars2Items said...

And if you find such a friend, you feel happy and complete, because you need not worry Cheap Diablo iii Gold, Diablo iii Gold Sale your forever friend holds your hand and tells you that everything is going to be okay Diablo 3 Gold.

GuildWars2Items said...

sale guild wars 2 gold experience is a hard teacher because she gives the test first, the lesson afterwards, guild wars 2 gold sell ability may get you to the top, but it takes character to keep you there, sell guild wars 2 gold.

GuildWars2Items said...

The truth is that we've been given the power to choose love and joy in our lives no matter what happens to us Sell WOW Gold, No one has ever been or will ever be strong enough to take our freedom away from us Sell WOW Gold they never once stand up and take responsibility for their own lives and their own happiness Sell WOW Gold.

buyGuildwars2gold said...

When you're down, and the world seems dark and empty, your forever friend lifts you up in spirits and makes that dark and empty world suddenly seem bright and full Diablo iii Gold Sale, this is Forever Friendship Diablo III Gold, someone who convinces you that there really is an unlocked door just waiting for you to open it Diablo 3 Gold.

buyGuildwars2gold said...

Being happy doesn't mean that everything is perfect. It means that you've decided to look beyond the imperfections cheap c9 gold, don't go around saying the world owes you a living. The world owes you nothing. It was here first c9 money, I have a simple philosophy: Fill what's empty. Empty what's full. Scratch where it itches c9 gold.

Jojo said...

The last WOW Alliance member is Night Elves, the oldest race cheap guild wars 2 gold in Azeroth. They were the first race to discover and use magic.

Cahaya Mandiri said...

It is amazing posting and incredible work, It has suitable information, I presently wanted to say that you have really so motivating and very informative post. Thanks for wonderful posting
tips cara agar cepat hamil l CARA BELAJAR BAHASA INGGRIS l the best acne treatment l how to lose weight fast easy
margahayuland l BELAJAR BISNIS ONLINE l tips cepat hamil l how to get rid of acne home remedies l
home remedies for acne l how to cure acne fast l
baju batik modern l toko sepatu online l grosir jam tangan online l
jual jam tangan l toko jam tangan murah peluang usaha online l is acne no more for you l how to get rid of acne naturally
how to clear acne l cure acne naturally
best natural remedies for acne l acne no more l tempat belajar bisnis online
peluang usaha rumahan l cara mendapatkan uang dari internet
makanan sehat agar cepat hamil l penyebab tidak bisa hamil lcara agar cepat hamil


Lisa Parkar said...

This article gives good ideas and concepts. This was very interesting to read such a Great blog post buy coffee table online

Lisa Parkar said...

This piece of writing gives a lot of exceptional information and inspiration. Really a remarkable blog post I had seen discount dining room sets

wow9gamer said...

It is great that slave could handle many more updates per second without replication delay on disk i/o bound benchmarks.Cheap LOL Account

elo boost

jimmy wilson said...

Fantastic submit I really like your content; enjoy how we described everything, your are performing a terrific employment the majority of others just like you as a result of which form of helpful weblogs produce awareness in order to people relevant to several things. My spouse and i understand some other interesting weblogs from a web sites along with We are a whole lot curious with all your running a blog knowledge, My partner and i also began to produce article this also form conversation really aid us away. My spouse and i by now added your web page in addition to propagated your current internet sites to our friends not merely me however them all like your blogging skills, hope an individual produce more fascinating blogs this way a single and also best of luck for the potential sites.

Drive Varsity Jacket

Jess said...

Low level players will buy ffxiv gil defeated by the high level monsters. You will get killed if you are fighting with the monsters with higher level than you. You should pay special attention to the runescape 2007 gold aggressive monsters when you are mining or grinding levels. It sounds good to get a strategy guide for your character. You should get better strategies and kill more monsters. Spare no effort to defeat the monsters. Killing monsters can bring you experience points as well as runescape gold.

sadia sulaman said...

Excellent read, Positive site, where did u come up with the information on this posting?I have read a few of the articles on your website now, and I really like your style.crystalapartment |

neveschambers |

aikosolutions |

lewishickey |

stadiumcreative |

salsagrande |

butlerbag |

thewatchavenue |

artofbrand |

bowdennursery |

Elena Erin said...

This programming enables the students to perform their operations effectively. Students visits essayyard to prepare their projects.

stewart louis said...

The scholarship is intended to recognize Blackburn organization Students and ... Blackburn student and Graduate Fellows to continue in helping make the campus a place . Journal Paper Writing Tips

John Barclay said...

That's amazing and very interesting tool you have providing. I really happy to see your blog and thanks a lot for sharing nice thought. Good work!
accounting assignment help || operating system assignment help || rapidassignments

blognya fariz said...

The scholarship is intended to recognize Blackburn organization Students and ... Blackburn student and Graduate Fellows to continue in helping make the campus a place

obat penyakit sipilis pada pria | kutil di vagina | obat keluar nanah pada kelamin | obat keluar nanah dari penis | obat dari penis mengeluarkan nanah | pengobatan kutil kelamin pada wanita | nama obat kutil kelamin di apotik | obat menghilangkan kutil di kemaluan | pengobatan kutil kelamin pada wanita | nama obat kutil kelamin di apotik | obat menghilangkan kutil di kemaluan | pengobatan kutil kelamin pada wanita | nama obat kutil kelamin di apotik

Post a Comment