Wednesday, May 27, 2009

Overwriting is much faster than appending

Writing small volumes of data (bytes to MBs) with sync (fsync()/fdatasync()/O_SYNC/O_DSYNC) is very common for an RDBMS and is needed to guarantee durability. For transactional log files, sync happens per commit. For data files, sync happens at checkpoints, etc. Typically an RDBMS syncs data very frequently. In this case, overwriting is much faster than appending on most filesystems/storage. Overwriting does not change the file size, while appending does. Increasing the file size carries a lot of overhead, such as allocating space within the filesystem and updating and flushing metadata. This really matters when you write data with fsync() very frequently. The following are simple benchmark results on ext3, RHEL5.3.


1. creating an empty file, then writing 8KB 128*1024 times with fdatasync()
fdatasync per second: 3085.94321
(Emulating current binlog (sync-binlog=1) behavior)

2. creating a 1GB data file, then writing 8KB 128*1024 times with fdatasync()
fdatasync per second: 12330.47268
(Emulating current InnoDB log file behavior)

3. zero-filling 1GB, then writing 8KB 128*1024 times with fdatasync() immediately
fdatasync per second: 6569.00072
(Zero-filling causes massive disk writes, which kills application performance)

4. zero-filling 1GB, sleeping 20 seconds, then writing 8KB 128*1024 times with fdatasync()
fdatasync per second: 11669.81157
(Zero-filling finished within 20 seconds. This actually does the same thing as no.2)
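
The comparison between no.1 and no.2 can be approximated with a short script. This is only a minimal sketch, not the benchmark program used for the numbers above: the function names, file names and the small write count are assumptions made for illustration, and absolute results depend heavily on the filesystem and storage.

```python
import os
import tempfile
import time

# fdatasync() is not available on every platform; fall back to fsync().
_sync = getattr(os, "fdatasync", os.fsync)

BLOCK = b"x" * 8192  # 8KB per write, as in the tests above


def synced_writes_per_sec(path, count, preallocate):
    """Write `count` 8KB blocks, syncing after each one; return syncs per second."""
    fd = os.open(path, os.O_CREAT | os.O_TRUNC | os.O_WRONLY, 0o600)
    try:
        if preallocate:
            # Zero-fill the file first so the timed writes overwrite existing blocks.
            for _ in range(count):
                os.write(fd, b"\0" * len(BLOCK))
            os.fsync(fd)
            os.lseek(fd, 0, os.SEEK_SET)
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, BLOCK)
            _sync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed


def run(count=256):
    """Return (appending, overwriting) sync rates measured in a temporary directory."""
    with tempfile.TemporaryDirectory() as d:
        append = synced_writes_per_sec(os.path.join(d, "append.dat"), count, False)
        overwrite = synced_writes_per_sec(os.path.join(d, "overwrite.dat"), count, True)
    return append, overwrite


if __name__ == "__main__":
    a, o = run()
    print(f"appending:   {a:.1f} syncs/sec")
    print(f"overwriting: {o:.1f} syncs/sec")
```

To get closer to the original tests, raise `count` to 128*1024 and run it on the filesystem you actually care about; a temporary directory may live on tmpfs, where sync costs almost nothing.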


Clearly no.2 (and no.4) are much faster than no.1 and no.3. The only difference between no.1 and no.2 is appending versus overwriting. A fourfold difference is amazing, but it is real, and I got similar results on other filesystems except zfs (I tested xfs, reiserfs and zfs). (Updated on May 28: I got about 7,000 fsync/sec for both appending and overwriting on zfs. There is no notable difference because zfs is a copy-on-write filesystem, as comment #3 points out.)

This is one of the reasons why sync-binlog=1 is very slow. The binlog is appended, not overwritten. The default sync-binlog value is 0 (no sync happens at commit), so appending does not cause a serious performance drop. But there are cases where sync-binlog=1 is absolutely needed. I am currently working directly on this and implementing "preallocating binlog" functionality.

The difference between no.3 and no.4 is also interesting. Overwriting requires preallocation: allocating space before writing. If preallocation happens *dynamically during heavy loads* (no.3), application performance is seriously degraded.

No.3 is close to the current InnoDB system tablespace (ibdata) mechanism. InnoDB dynamically extends the tablespace by innodb_autoextend_increment MBs, then overwrites the new space. But as you see above, dynamically preallocating with zeros is not good for performance.
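
The pattern described here (extend by a fixed increment with zeros, then overwrite) can be illustrated roughly as follows. This is not InnoDB code; `extend_with_zeros` and `write_at` are hypothetical helpers, and the increment sizes are assumptions chosen only to keep the example small. The point is that the writer itself pays for the zero-fill and the sync whenever it runs out of preallocated space.

```python
import os
import tempfile


def extend_with_zeros(fd, increment_bytes, chunk=1024 * 1024):
    """Grow the file by `increment_bytes` of zeros, sync, and return the new size."""
    end = os.lseek(fd, 0, os.SEEK_END)
    zeros = b"\0" * chunk
    remaining = increment_bytes
    while remaining > 0:
        remaining -= os.write(fd, zeros[: min(chunk, remaining)])
    os.fsync(fd)
    return end + increment_bytes


def write_at(fd, offset, data, allocated, increment_bytes):
    """Overwrite `data` at `offset`, extending the file first if it is too small.

    Returns the (possibly larger) allocated size. Any extension happens inline,
    so the caller stalls for the whole zero-fill, as in no.3 above.
    """
    while offset + len(data) > allocated:
        allocated = extend_with_zeros(fd, increment_bytes)
    os.pwrite(fd, data, offset)
    return allocated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        fd = os.open(os.path.join(d, "space.dat"), os.O_CREAT | os.O_RDWR, 0o600)
        size = write_at(fd, 0, b"x" * 8192, 0, 8 * 1024 * 1024)
        print("allocated bytes:", size)
        os.close(fd)
```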

Using posix_fallocate() instead of zero-filling will fix this issue. posix_fallocate() is meant to preallocate space without the overhead of writing zeros. Unfortunately, most enterprise Linux distributions/filesystems/glibc currently don't behave as expected and internally do zero-filling instead (including RHEL5.3).
A current workaround is to preallocate large enough space before going into production or during low-load hours (midnight etc).
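
As a rough sketch of the workaround, the following preallocates a file ahead of time, trying posix_fallocate() first and falling back to explicit zero-filling. The `preallocate` helper, the file name and the size are assumptions for illustration. Whether the posix_fallocate() call is actually cheap depends on the filesystem and C library, as noted above.

```python
import os
import tempfile


def _zero_fill(fd, size_bytes, chunk=1024 * 1024):
    """Append zeros until the file is at least `size_bytes` long."""
    pos = os.lseek(fd, 0, os.SEEK_END)
    zeros = b"\0" * chunk
    while pos < size_bytes:
        pos += os.write(fd, zeros[: min(chunk, size_bytes - pos)])


def preallocate(path, size_bytes):
    """Reserve `size_bytes` for `path` before heavy load; return the method used."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    try:
        method = "zero-fill"
        if hasattr(os, "posix_fallocate"):
            try:
                os.posix_fallocate(fd, 0, size_bytes)
                method = "posix_fallocate"
            except OSError:
                # Not supported on this filesystem: fall through to zero-filling.
                pass
        if method == "zero-fill":
            _zero_fill(fd, size_bytes)
        os.fsync(fd)
    finally:
        os.close(fd)
    return method


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "prealloc.dat")
        print(preallocate(path, 16 * 1024 * 1024), os.path.getsize(path))
```

Running this during low-load hours means later synced writes land on already-allocated blocks, which is the situation measured in no.2 and no.4.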

Talking about InnoDB deployment, innodb_file_per_table is popular and easy-to-manage, but it's currently not overwriting architecture. (Updated on May 28: Preallocating GBs of data beforehand is not possible unless you explicitly load data into tables and then delete it. The maximum (auto)extension of an .ibd file is 4MB at a time, regardless of the innodb_autoextend_increment setting. See bug#31592 for details.) Not using innodb_file_per_table, and instead preallocating large InnoDB tablespaces before going into production (i.e. ibdata1,2,3.., over 100GB in total), can often give better performance.

53 comments:

Baron said...

"innodb_file_per_table is popular and easy-to-manage, but it's currently not overwriting architecture" -- I think you are mixing things a little. It does preallocation in blocks of innodb_autoextend_increment size.

There is some javascript in this comment form that is driving me crazy. Arrow keys don't work, can't ctrl-v to paste, can't right click to paste...

mtocker said...

Baron, almost... there's a bug:
http://bugs.mysql.com/bug.php?id=31592

qu1j0t3 said...

The generalisation is probably not valid for COW filesystems such as ZFS.

Yoshinori Matsunobu said...

Baron,

I should have explained more. Thanks. What I meant was that preallocating GBs of data beforehand is not possible unless you explicitly load data into tables and then delete it. The maximum (auto)extension of an .ibd file is currently 4MB at a time, regardless of the innodb_autoextend_increment setting, as Morgan says.

I haven't set anything special in this comment form. I can copy & paste..

Yoshinori Matsunobu said...

qu1j0t3,

You are right. I did the same test on ZFS last August and got very close numbers (about 7,000 fsync/sec for both appending and overwriting). I forgot to mention this. I have updated the post. Thanks.

Bradley C. Kuszmaul said...

I don't get it. For ordinary disk drives how can you do anything like 7000 fsync/sec? A disk can do a couple of hundred. That's assuming that you actually want the fsync to write data to disk. Even SSD can only do a few thousand fsync/sec.

Yoshinori Matsunobu said...

Bradley,

I tested on H/W RAID 1 with two SAS 15,000RPM HDDs and the write cache enabled, protected by a battery. Using a battery-backed write cache (BBWC) makes it possible to get thousands of fsync/sec with durability, and this is one of the best practices for DBAs. You might also be interested in my previous posts:
http://yoshinorimatsunobu.blogspot.com/2009/05/tables-on-ssd-redobinlogsystem.html
http://yoshinorimatsunobu.blogspot.com/2009/05/make-sure-write-cache-is-enabled-on.html

li said...

Can you share the preallocated binlog source code or provide a patch? We want to use it.

Linfeng Yan said...

In a newer post ( http://yoshinorimatsunobu.blogspot.com/2014/03/why-buffered-writes-are-sometimes.html ) you mentioned the following point:

"1. write() does disk read when needed. To avoid this issue you need to append a file, not overwrite. Or use OS page aligned writes."

This would mean that in test 2, the whole 1GB file needs to be paged in before the overwrite can occur. Wouldn't that be slow?

Sorry for commenting on an old post. I would appreciate your response.
