Thursday, October 10, 2013

Making full table scan 10x faster in InnoDB

At MySQL Connect 2013, I talked about how we used MySQL 5.6 at Facebook, and explained some of new features we added to our Facebook MySQL 5.6 source tree. In this post, I'm going to talk about how we made full table scan faster in InnoDB.

Faster full table scan in InnoDB

 In general, almost all queries from applications are using indexes, and reading very few rows (0..1 on primary key lookups and 0..hundreds on range scans). But sometimes we run full table scans. Typical full table scan examples are logical backups (mysqldump) and online schema changes (SELECT ... INTO OUTFILE).

 We take logical backups by mysqldump at Facebook. As you know MySQL offers both physical and logical backup commands/utilities. Logical backup has some advantages against physical backup. For example:
  • Logical backup size is much smaller. 3x-10x size difference is not uncommon.
  • Relatively easier to parse backups. On physical backup, if we can't restore a database by some serious reasons such as checksum failure, it is very difficult to dig into InnoDB internal data structures and fix corruptions. We trust logical backups rather than physical backups
 Major drawbacks of the logical backup are that full backup and full restore are much slower than physical backup/restore.

 The slowness of full logical backup often causes problems. It may take very long time if database size grows a lot and tables are fragmented. At Facebook, we suffered from mysqldump performance problem that we couldn't finish full logical backup within reasonable amount of time on some HDD and Flashcache based servers. We knew that InnoDB wasn't efficient at full table scan because InnoDB did not actually do sequential reads, but did random reads mostly. This was a known issue for long years. As database storage capacity has been growing, the slow full table scan issue has been getting serious to us. So we decided to enhance InnoDB to do faster sequential reads. Finally our Database Engineering team implemented "Logical Readahead" feature in InnoDB. With logical readahead, our full table scan speed improved 9~10 times than before under usual production workloads. Under heavy production workloads, full table scan speed became 15~20 times faster.

Issues of full table scan on large, fragmented tables

 When doing full table scan, InnoDB scans pages and rows by primary key order. This applies to all InnoDB tables, including fragmented tables. If primary key pages (pages where primary keys and rows are stored) are not fragmented, full table scan is pretty fast because reading order is close to physical storage order. This is similar to reading files by OS command (dd/cat/etc) like below.
# dd if=/data/mysql/dbname/large_table.ibd of=/dev/null bs=16k iflag=direct
 You may find that even on commodity HDD servers, you can read more than 100MB/s multiplied by "number of drives" from disks. Over 1GB/s is not uncommon.

 Unfortunately, in many cases primary key pages are fragmented. For example, if you need to manage user_id and object_id mappings, primary key will be (user_id, object_id). Insert ordering does not match with user_id ordering, so new inserts/updates very often cause page splits. New split pages will be allocated at far from current pages. This means pages get fragmented.

 If primary key pages are fragmented, full table scan becomes very slow. Fig 1 illustrates the problem. After InnoDB reads leaf page #3, it has to read page #5230, and after that it has to read page #4. Page #5230 is far from page #3 and #4, so disk read ordering becomes almost random, not sequential. It is very well known that random reads on HDD is much slower than sequential reads. One very effective approach to improve random reads is using SSD, but per-GB cost on SSD is still more expensive than HDD so using SSD is not always possible.

Fig 1. Full table scan is not actually doing sequential reads

Does Linear Read Ahead really help?

 InnoDB supports prefetching feature called "Linear Read Ahead". With linear read ahead, InnoDB reads an extent (64 consecutive pages: 1MB if not compressed) at one time if N pages are accessed sequentially (N can be configured by innodb_read_ahead_threshold parameter, default is 56). But actually this does not help so much. One extent (64 pages) is very small range. For most large fragmented tables, next page does not exist in the same extent. See above fig 1 for example. After reading page#3, InnoDB needs to read page#5230. Page#3 does not exist in the same extent as #5230, so linear read ahead won't help here. This is pretty common case for large fragmented tables. That's why Linear read ahead does not help much to improve full table scan performance.

Optimization approach 1: Physical Read Ahead

 As described above, the problem of slow full table scan was because InnoDB did mostly random reads. To make it faster, making InnoDB do sequential reads was needed. The first approach I came up with was creating an UDF (Use Defined Function) to read ibd file (InnoDB data file) sequentially. After executing the UDF, pages in the ibd file should be within InnoDB buffer pool, so no random read happens when doing full table scan. Here is an example usage.
mysql> SELECT buf_warmup ("db1", "large_table"); /* loading into buf pool */
mysql> SELECT * FROM large_application_table; /* in-memory select */
 buf_warmup() is a udf that reads entire ibd file of database "db1", table "large_table". It takes time to read the entire ibd file from disk, but reads are sequential so much faster than random reads. When I tested, I could get ~5x overall faster time than normal linear read ahead.
 This proved that sequentially reading ibd files helped to improve throughput, but there were a couple of disadvantages:
  • If table size is bigger than InnoDB buffer pool size, this approach does not work
  • Reading entire ibd file means that not only primary key pages, but also secondary index pages and unused pages have to be read and cached into InnoDB buffer pool, even though only primary key pages are needed for full table scan. This is very inefficient if you have many secondary indexes.
  • Applications have to be changed to call UDF.
 This looked "good enough" solution, but our database engineering team came up with a better solution called "Logical Read Ahead", so we didn't choose UDF approach.

Logical Read Ahead

 Logical Read Ahead (LRA) works as below.
  1. Reading many branch pages of the primary key
  2. Collecting leaf page numbers
  3. Reading many (configurable amount of) leaf pages by page number order (mostly sequential disk reads)
  4. Reading rows by primary key order (same as usual full table scan, but buffer pool hit rate should be very high)
 This is illustrated at fig 2 below.

Fig 2: Logical Read Ahead

 Logical Read Ahead (LRA) solved issues of Physical Read Ahead. LRA enables InnoDB to read only primary key pages (not reading seconday index pages), and to prefetch configurable number of pages (not entire table) at one time. And LRA does not require any SQL syntax changes.
 We added two new session variables to make LRA work. One is "innodb_lra_size" to control how many MBs to prefetch leaf pages. The other is "innodb_lra_sleep" session variable to control how many milliseconds to sleep per prefetch. We tested around 512MB ~ 4096MB prefetch size and 50 milliseconds sleep, and so far we haven't encountered any serious (such as crash/stall/inconsistency) problem. These session variables should be set only when doing full table scan. In our case, mysqldump and some utility scripts (i.e. online schema change tool) turn logical read ahead on.

Submitting multiple async I/O requests at once

 Another performance issue we noticed was that i/o unit size in InnoDB was only one page, even if doing readahead. 16KB i/o unit size is too small for sequential reads, and much less efficient than larger I/O unit size.
 In 5.6, InnoDB uses Linux Native I/O (aio) by default. If submitting multiple consecutive 16KB read requests at once, Linux internally can merge requests and reads can be done efficiently. But unfortunately InnoDB submitted only one page i/o request at once. I filed a bug report #68659. As written in the bug report, on modern HDD RAID 1+0 environment, I could get more than 1000MB/s disk reads by submitting 64 consecutive pages requests at once, while I got only 160MB/s disk reads by submitting one page request.
 To make Logical Read Ahead work better on our environments, we fixed this issue. On our MySQL, InnoDB submits many more page i/o requests before calling io_submit().


In both cases, our production tables (fragmented tables) were used.

1. Full table scan on pure HDD (basic benchmark, no other workload)
Table sizeWithout LRAWith LRAImprovement
10GB10 min 55.00 sec1 min 15.20 sec8.71x
50GB52 min 44.42 sec6 min 36.98 sec7.97x

2. Online schema change under heavy workload
Table sizeWithout LRAWith LRAImprovement
1GB7 min 33 sec24 sec18.8x
* dump time only, not counting data loading time

Source code

 All of our enhancements are available at GitHub.
 - Logical read ahead implementation : diff
 - Submitting multiple i/o requests at once : diff
 - Enabling logical read ahead on mysqldump : diff



 InnoDB was not efficient for full table scan, so we fixed it. We did two enhancements, one was implementing logical read ahead, the other was submitting multiple async read i/o requests at once. We have seen 8 to 18 times performance improvements on our production tables, and this has been very helpful to reduce our backup time, schema change time, etc. I hope this feature will be supported in InnoDB by Oracle officially, or at least by major MySQL fork products.


Clay Ferguson said...

Dude did you just implement a Bayer Tree in database tables ?

sunbains said...

The second diff doesn't look right, see:

+ errno = -total_submitted;

Earlier in the code you have:

+ total_submitted += io_submit(array->aio_ctx[i], count,
+ &(array->pending[iocb_index]));

What if io_submit() fails after the 1st iteration?

Sunny Bains said...

In the third diff you have:

+ os_aio_linux_dispatch_read_array_submit();

What happens if LINUX_NATIV_AIO is not defined? Did I miss the IO call?

Rongrong Zhong said...

Sunny: You are right, it won't set the errno correctly if io_submit fail in later iterations. Thanks for pointing that out!

Tom K said...

Great article! Really impressed with Sunny's comment too.

Tim Callaghan said...

Yoshinori: How does this improvement in full table scan performance for your mysqldumps affect the other running workload on the server? I assume that bringing the data 18x faster into the InnoDB cache would have a measurable impact.

Justin Swanhart said...

Tim that is what innodb_old_blocks_time and innodb_old_blocks_pct are for. They are meant to prevent FTS from "blowing out" good blocks in the BP.

Justin Swanhart said...

That being said, buffer pool pinning is probably something else that should be added. This would keep some objects in the BP no matter what, like important lookup tables.

Tim Callaghan said...

Justin, fair enough. But I was hoping to see some actual evidence in either direction from Yoshinori. I don't trust anything without measurement.

Yoshinori Matsunobu said...

Tim, we set innodb_old_blocks_time and innodb_old_blocks_pct so that pages read by backups are quickly evicted.

Another important point I should note is that we use flashcache, and we do some optimizations here. In general, cache miss in InnoDB is not so much painful if cache hits in flashcache, because flash is fast. It's very painful if cache misses in flashcache, because hdd is slow. On backups, when reading ibd file from hdd (cache miss), our mysqld does not write to flash, but directly caches into buffer pool. This means existing cached pages in flash(flashcache) are not evicted. This is more important optimization for us.

Tim Callaghan said...

Thanks for the explanation. Well done on this clever optimization!

Rock said...

I Agree with this post, we have best option to buy twitter followers. get more twitter followers cheap from here.
purchase twitter followers

Web Designing Bangalore said...
This comment has been removed by the author.
Web Designing Bangalore said...

Never the less ,you efforts are very marvelous , wonderful. Thanks for useful information.
Website Development Company in Bangalore

zinavo shankar said...

Thanks for sharing the information with us.Its really good information.
Web Designing Bangalore

Fred Duffler said...

I'd like to thank you for tables and figures. It was very useful paper for me indeed.

Manoranjan Sahoo said...

Nice tutorial, thanks for share this post. I hope peoples are really help after reading this blog.
Website designing company India

Manoj Kusshwaha said...

Nice article got a good knowledge you have

Matthew Corgan said...

Have you looked into the possibility of actually defragmenting the table, either in background threads or during writes?

rakesh said...

Great Information,thanks a lot for sharing this interesting article,really its very impressive..
web designing company

Sudha S said...

very informative article, thank you for sharing this great article...


Sagheer said...

Online Jobs of Data Entry, Copy Pasting, Add Posting, Clicking, Web Surfing, Website Visiting, Article Sharing, Data Sharing, Google Business Plans, Investment Plans, Genuine earnings from home.

Sagheer said...

All Kind of Current Affairs, Latest Hot News updates, Classified Sites, Pakistani Classified Sites, USA Classifieds, Indian Classifieds, Entertainment Articles, Entertainment News, Entertainment Pictures, Bollywood, Hollywood and Lollywood Pictures and Videos, Entertainment Latest updates, Hot Entertainment News and Pictures Funny Entertainment Pictures, lol Pictures, Funny Pictures and Much More Fun Only on 1 Current Affairs Network

Faheem Zia said...

For All Latest Hot Current Affairs

Web Design Company said...

Informative blog post. Website is the key to present business in internet.

Web Designing Companies in Bangalore | Web Development Company in Bangalore

chaithanya m said...
This comment has been removed by the author.
chaithanya m said...

B school in Bangalore enriches the reserved as entire widening of the learners by portion them put up the speculative acquaintance learnt in class into a waged environment

chaithanya m said...

Sherwani for men in Bangalore is grouping of royal sophistication and perfect style with clean cuts, fine craftsmanship and elaborate thread work.

sherwani for men in bangalore

chaithanya m said...

Mind Bird Solutions is a proven website design and Development company that is customer centric, is focused on long term stability, and has a leadership team of seasoned web services professionals..
web application development bangalore

chaithanya m said...

B school in Bangalore is sponsoring by IZee Education Trust. B school in Bangalore is assisted by IZee Manpower in India's Leading acceptance And Working out association........
b school in bangalore

Ahsan Afsar said...

World Most Popular and Top Amazing Speedy Cars

muqtiyar ahmed said...

associated mostly with weddings ................

Sherwanis in Bangalore

muqtiyar ahmed said...
This comment has been removed by the author.
muqtiyar ahmed said...

guarantee that web application

web design bangalore

muqtiyar ahmed said...

will grow up the leaps and bounds

muqtiyar ahmed said...

very imperative that the clothes are easy enough to be worn and provide highlighting with the most elegant and comfortable form of wedding clothing..............

chaithanya m said...

Upgrade consumers with inexperienced help every step of the way from design and to development………….
Website design Bangalore

chaithanya m said...

to satisfy client with a creative website to meet his business goals
Web design Bangalore

chaithanya m said...

empower the online presence of a particular product or service in more identifiable way
Web designing Bangalore

chaithanya m said...

Creating affordable and professional website designs
Website designing Bangalore

chaithanya m said...

Achieve an online presence that delivers the very best
Web application development

chaithanya m said...

ease of use of Realtycompass.
Real estate Bangalore

Ahsan Afsar said...

Online Jobs of Data Entry and Copy Pasting with Jobz Corner

muqtiyar ahmed said...

Nitesh Chelsea Grand entrance gate........

Nitesh Chelsea

muqtiyar ahmed said...

Top mba in bangalore entrance examination.............

Top Mba in Bangalore

muqtiyar ahmed said...

Bring the immediacy in career prospects to students……
Top mba in Bangalore

muqtiyar ahmed said...

Provide decorate and fluting is kind you look very stylish and graceful….
Online sherwani buy

muqtiyar ahmed said...

From fabrics ranging from brocade to terry wool…
Online shopping for marriage sherwani

muqtiyar ahmed said...

Provide highlighting with the most elegant and comfortable form of wedding clothing…
Sherwani for men

muqtiyar ahmed said...

Organism a home landlord is one of the most looked-for dreams of waged professionals….
Prestige kew garden

muqtiyar ahmed said...

Give a solid foundation, hassle-free implementation, and provide maximal extensibility.
Prestige kew garden

Ahsan Afsar said...

Find best business plan from home without any work, just invest $1 and get 120% Total Profit within a week

peter anderson said...

Thank you so much for giving us such kind of handy content which will be most useful to me as well as others.SEO services San Francisco

peter anderson said...

Visit for best web design, web development , SEO services and internet marketing services. more SEO services San Francisco

Minh Vương said...

Công ty vệ sinh 5 Sạch
Dịch vụ giặt thảm tại Hà Nội
Dịch vụ giặt ghế sofa tại Hà Nội
Vệ sinh nhà sau xây dựng

Madhu said...

Nice blog and thanks for sharing the information.

Prestige boulevard bangalore

Prestige kew gardens

Salarpuria east crest

Goyal orchid greens

Bhartiya city nikoo homes

Prestige boulevard bangalore

Sobha dream acres

Shapoorji pallonji parkwest

prestige kew gardens

prestige song of the south

prestige lakeside habitat

provident kenworth

Machine Tools Emart said...
This comment has been removed by the author.
Machine Tools Emart said...

industrial Product online
palani panchamirtham online

Naveen Chaudhary said...

valentine wish
valentine quotes for girlfriend
valentine day wishes for girlfriend
valentine day poems for girlfriend

Grayax Cyber said...

I Love Your Website, I Would Appreciate It For A Good Read This!!!




Godrej Avenues said...

Nice blog.Very informative.Thanks for sharing with us.
Godrej Avenues
Godrej Avenues Bangalore

Prestige Kew Gardens said...

Nice blog. Very Informative. Thanks for sharing with us. Let me share some pre launch projects

Prestige Kew Gardens
Prestige Kew Gardens Bangalore

Unknown said...

Awesome Blog. Informative Post.
Godrej Avenues

jualobatkuatcialisasli45. said...

Vimax Obat Pembesar Penis Permanen
Ciri-Ciri Vimax Asli Canada
Jual Vimax Asli
Vimax Original
Vigrx Plus Asli
Cobra Oil Super
Minyak Lintah Papua
Vakum Pembesar Penis
Obat Kuat Procomil
Obat Kuat Cialis
Obat Kuat Nangen Zengzhangsu
Obat Perangsang Potenzol Asli
Obat Perangsang Sex Drop
Obat Perangsang Viagra Cair
Dildo Dua Kepala
Vibrator Peluru
Kondom Duri Sungut Lele
Ring Getar Butterfly
Boneka Full Body Blonde Semi Realistis
Boneka Sex Pantat Doggy Style
Vagina Getar Klasik 16 Cm
Vagina Getar Suara
FleshLight Vagina Getar
Vibrator Kupu-Kupu Triple Spot
Dildo Vibrator Sakky Anti Air
Vibrator Fairy Magic
Vibrator Kapsul Love Egg
Dildo Getar Goyang Tempel

Ashoka Liviano said...

Great and Informative Blog.Thanks for sharing with us.
Ashoka Liviano

Sobha Gateway Of Dreams said...

very useful information about DB.
Sobha Gateway Of Dreams

Prestige Boulevard said...

we want more information.
Prestige Boulevard

Brigade Buena Vista said...

we want more information on DB
Brigade Buena Vista

Imgkt Solutions said...

Nice blog. Thanks for share this info.

Website designing and development in Canada | USA | UK

Skylark Royaume said...

Luxury apartments in Bangalore.
Skylark royaume


Apartments in hyderabad.

Pushpa Anjali said...

Apartments in Bangalore
Prestige Kew Gardens

Pushpa Anjali said...

Great Article
Assetz Marq

Haris Siddiqui said...

Ecommerce website design companies is turning into a main business as web shopping is an across the board. Individuals think that its more helpful to do internet shopping as opposed to heading off to the shopping centers.

Pushpa Anjali said...

It’s really a nice and helpful piece of information. I’m glad that you shared this helpful info with us. Please keep us informed like this.
Assetz Marq
Prestige Kew Gardens

Access1solution said...

Thanks for such a knowledgeable post.
We are the Best Web Design Company in Bangalore / Web Development in Bangalore

Khalid Baid said...

Handbag & Clutches For Hot Girls
Front Open Double Shirt
Fashion Gallery Lehenga Choli
Stylo Best Mehndi Designs
Latest Sherwani Designs
Bridal Jewellery Set
Zara Shahjahan Eid Dresses
Mehndi Patterns for EID
Sophia Tolli Collection
Earrings In Gold Collection
Actress Maya Ali – Fashion Collection
Bridal Gowns Collection
classic lawn suits
mix eid dresses
midsummer kurta
anarkali suits

Khalid Baid said...

lehnga choli dresses
bridal makeup
ball garments
babydoll night wear dresses
Jewelry Women Wear
Saheli Couture By Preity Zinta Dresses
Parties Hairstyle
Zainab Chottani Pretty Suits
Fancy Lawn Clothes
Nail Designs For UK Girls
Girls Footwear Selection
Pakistani Lehenga Clothes

Pavitraa Fashion said...

I like your blog, mind power use and developing wed designing on table scan faster. Very beautiful designer clothing Bollywood Sarees !!

Pushpa Anjali said...

It’s really a nice and helpful piece of information. I’m glad that you shared this helpful info with us. Please keep us informed like this.
G Corp Icon Bangalore
Assetz Marq Bangalore

Sare Homes said...

How to create a table in DB.
Sare Homes Chennai

rakesh said...

good information
website design in bangalore
website design in bangalore
web design bangalore
website design bangalore web design in bangalore

Pushpa Anjali said...

It’s really a nice and helpful piece of information. I’m glad that you shared this helpful info with us. Please keep us informed like this.
Assetz Marq

Anusha Gona said...

Nice Blog...

Bhartiya City Nikoo Homes Thanisandra Bangalore
LnT Raintree Boulevard Hebbal bangalore

Vertex Panache

Anusha Gona said...

Nice Blog...

Bhartiya City Nikoo Homes Thanisandra Bangalore
LnT Raintree Boulevard Hebbal bangalore

Vertex Panache

Mai Doan said...

This topic is interesting. This information is very useful to me. I hope there will be many articles or like this article. Thank you for sharing
The head soccer synthetic head soccer unblocked games famous. on head soccer you could be playing head soccer unblocked , Big Head Soccer or Big Head Football . Big Head Soccer Big Head Football are games is our top. Rejoice with Big Head Soccer Big Head Football Big Head Soccer Big Head Football offline

Anusha Gona said...

Its Really great blog...thanks for sharing
Salarpuria Sattva Laurel Heights

Salarpuria Sattva Divinity

Anusha Gona said...

Its Great Blog...
Salarpuria Sattva Laurel Heights
Salarpuria Sattva Divinity

cap vision said...

its Awesome blog Web design

Bangalore Web Guru said...

very nice blogs!!! i have to learning for lot of information for this sites...Sharing for wonderful information.Thanks for sharing this valuable information to our vision. You have posted a trust worthy blog keep sharing.
Web Designing Company Bangalore

Godrej Eternity said...

Oracle DB is Better.
Godrej Eternity Bangalore

Pushpa Anjali said...

Great Blog
Pacifica Hill Crest

Pushpa Anjali said...

Your blog is very informative. Nice post, thanks for sharing with us. Check out our latest pre launch projects.

Prestige Song Of The South
Assetz Marq Bangalore
Prestige Lake Ridge Bangalore
Prestige Lakeside Habitat Bangalore
Goyal Orchid Whitefield

Post a Comment