Monday, November 8, 2010

Handling long texts/blobs in InnoDB - 1 to 1 relationship, covering index

  I have seen that 1 to 1 relationship is sometimes used for MySQL(InnoDB) to avoid significant performance slowdown. To understand the performance difference, it is necessary to understand how InnoDB stores column values to data blocks. Suppose you have the following "diary" table (for storing blog entries).
CREATE TABLE diary (
diary_id INT UNSIGNED AUTO_INCREMENT,
user_id INT UNSIGNED NOT NULL,
post_date TIMESTAMP NOT NULL,
status TINYINT UNSIGNED NOT NULL,
rating FLOAT NOT NULL,
title VARCHAR(100) NOT NULL,
body TEXT,
PRIMARY KEY (diary_id),
INDEX user_date(user_id, post_date),
INDEX user_rating(user_id, rating)
) CHARSET utf8 ENGINE=InnoDB;
  "body" column contains diary text, which is pretty large (1KB/row). On the other hand, the rest columns are small enough (less than 50 bytes per row in total). This table is mostly used from applications for fetching titles, not fetching body. 90% SELECT queries will be like this:
1) SELECT user_id, post_date, title FROM diary WHERE diary_id=?

  And the rest 10% SELECT queries will be:
2) SELECT body FROM diary WHERE diary_id=?

  You probably guess that 1) is much faster than 2) because 1) fetches only <50 bytes while 2) fetches +1KB data. But unfortunately, in most cases 1) is as slow as 2). Even though 100% SELECT queries are 1), overall throughput will be poor if diary table is much larger than innodb buffer pool size. The reason is how InnoDB stores column values. The below is a rough image.



  InnoDB stores large texts/blobs separetely from other columns if no more space is available within the same block. The separated page is called "Overflow Page". But this doesn't work for the diary table. This is mainly because diary body(1KB) is much smaller than InnoDB block size(16KB). When inserting diary entries, InnoDB block space is normally available enough to store all columns including body. As a result, body is stored next to the rest columns, and InnoDB blocks are occupied mostly by body.



  InnoDB's I/O unit is per block basis. Even though you do not fetch body, InnoDB internally has to read blocks that contain body. In other words, single InnoDB block can not have many diary entries. So a lot of random disk reads happen here.
  I talked about this topic at the MySQL Conference & Expo this year. I explained two solutions, one is using 1 to 1 relationship and the other is using covering index.

  1 to 1 relationship solution is creating below two tables:

CREATE TABLE diary_head (
diary_id INT UNSIGNED AUTO_INCREMENT,
user_id INT UNSIGNED NOT NULL,
post_date TIMESTAMP NOT NULL,
status TINYINT UNSIGNED NOT NULL,
rating FLOAT NOT NULL,
title VARCHAR(100) NOT NULL,
PRIMARY KEY (diary_id),
INDEX user_date(user_id, post_date),
INDEX user_rating(user_id, rating)
) CHARSET utf8 ENGINE=InnoDB;

CREATE TABLE diary_body (
diary_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
body TEXT
) CHARSET utf8 ENGINE=InnoDB;
  diary_head table does not have the largest text column(body). So diary_head table size can be much smaller than the original diary table. InnoDB blocks that have diary_head's column values are frequently accessed, but the number of the blocks can be smaller. So these InnoDB blocks will be cached within InnoDB buffer pool very well. Most of SELECT queries do not require random disk i/o, so total throughput can be increased.

  Another approach is creating a covering index that covers all columns except body. You do not need to create 1 to 1 tables in this case.
mysql> ALTER TABLE diary ADD INDEX diary_covering (diary_id, user_id, post_date, 
-> status, rating, title);
  By fetching these columns through diary_covering index instead of the PRIMARY key, InnoDB does not read blocks that contain body. diary_covering index size will be as small as the above diary_head table, so it will be cached very well.

  Both 1 to 1 relationship and an additional covering index perform pretty well(The benchmarking results are written in the conference slides). But neither is straightforward for developers. Using 1 to 1 relationship forces developers to use joins to fetch all diary columns, and forces them to maintain consistency (or use foreign key constraints). Using additional covering index sometimes increases data size significantly. And more, you need to add FORCE INDEX in the SELECT statement like below so that MySQL can use the covering index insted of the PRIMARY key.
mysql> SELECT user_id, post_date, title FROM diary 
-> FORCE INDEX (diary_covering) WHERE diary_id=?

  This is needed because MySQL optimizer chooses PRIMARY KEY for PK lookups because the optimizer guesses PK lookup (unique lookup) is the fastest than any other type of index lookup, even though it is not true in this case. Using SQL hints is sometimes not easy, especially when you use O/R mapping tools.

  Personally I hope that InnoDB team or outside experts supports new InnoDB data format which is optimal for handling such large data types. PBXT handles these types of columns pretty well. PBXT stores large texts separately from other smaller columns, and it won't read large text data when SQL statement doesn't touch the large columns. If such kinds of data formats (storing specified columns in DDL into separated blocks) are supported in InnoDB, that will be nice.

53 comments:

lstigile said...

Does the new 5.5 innodb Baracuda file format improve the situation?

From http://dev.mysql.com/doc/innodb-plugin/1.1/en/innodb-row-format-dynamic.html :
"When innodb_file_format is set to “Barracuda” and a table is created with ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED, long column values are stored fully off-page, and the clustered index record contains only a 20-byte pointer to the overflow page."

Would the 1-1 mapping or covering index still be required?

Anonymous said...

The work on http://bugs.mysql.com/bug.php?id=39653 should help to get the covering index selected without the need for hints. The PK is now last choice when there are several possible covering indexes. But not when it's select *, that sill prefers PK after the work in http://bugs.mysql.com/bug.php?id=55656 . This work is in 5.1.51 and later.

James Day, MySQL Principal Support Engineer, Oracle UK

Yoshinori Matsunobu said...

@lstigile

My understanding is that neither DYNAMIC nor COMPRESSED format helps in this case. These new formats store 20B instead of 768B when large texts/blobs are stored in overflow pages. This helps when texts/blobs are stored in overflow pages. But in this case texts/blobs are much smaller than InnoDB Blocks, so even though we use DYNAMIC/COMPRESSED, texts/blobs are stored in the same pages, not in overflow pages.

Yoshinori Matsunobu said...

@James

Does the fix for bug#39653 help for unique lookups (type==const)? I think it helps for range scans, but not for unique lookups because find_shortest_key() is not called in that case.

Romain said...

Great article, thanks!
One question though : In the first solution (1 to 1), can you tell me why you are not linking the two tables with a foreign key on 'diary_id' ?
Thanks!

Yoshinori Matsunobu said...

@Romain

It is possible to create FK on these tables. It's just a trade-off between insert performance slowdown caused by FK lookup and application-side consistency checking. Which is better depends on applications.

Aleksey said...

continue ...

if (!npollev)
return 0;

dbctx->set_statistics(conns.size(), npollev);

size_t j;

/* READ */
/* j = 1, because index 0 is server socket */

for (i = conns.begin(), j = 1; i != conns.end(); ++i, ++j) {
pollfd& pfd = pfds[j];
if ((pfd.revents & mask_in) == 0) {
continue;
}
hstcpsvr_conn& conn = **i;
if (conn.read_more()) {
conn.nb_last_io = now;
DENA_VERBOSE(100, fprintf(stderr, "read_more fd=%d\n", conn.fd.get()));
if (conn.cstate.readbuf.size() > 0) {
DENA_VERBOSE(100, fprintf(stderr, "readbuf > 0 fd=%d\n", conn.fd.get()));
const char ch = conn.cstate.readbuf.begin()[0];
if (ch == 'Q') {
DENA_VERBOSE(100, fprintf(stderr, "readbuf > 0 : shutdown Q fd=%d\n", conn.fd.get()));
vshared.shutdown = 1;
} else if (ch == '/') {
DENA_VERBOSE(100, fprintf(stderr, "readbuf > 0 : / fd=%d\n", conn.fd.get()));
conn.cstate.readbuf.clear();
conn.cstate.writebuf.clear();
conn.read_finished = true;
conn.write_finished = true;
} else {
/* EXECUTE */
DENA_VERBOSE(100, fprintf(stderr, "execute fd=%d\n", (*i)->fd.get()));
execute_lines(conn);
}
}
}
}

/* COMMIT */
dbctx->unlock_tables_if();
const bool commit_error = dbctx->get_commit_error();
dbctx->clear_error();

/* WRITE/CLOSE */
for (i = conns.begin(), j = 1; i != conns.end(); ++j) {
pollfd& pfd = pfds[j];
hstcpsvr_conn& conn = **i;
hstcpsvr_conns_type::iterator icur = i;
++i;
if (commit_error) {
conn.reset();
continue;
}
if ((pfd.revents & (mask_out | mask_in)) != 0) {
if (conn.write_more()) {
conn.nb_last_io = now;
}
}
if (cshared.sockargs.timeout != 0 &&
conn.nb_last_io + cshared.sockargs.timeout < now) {
conn.reset();
}
if (conn.closed() || conn.ok_to_close()) {
DENA_VERBOSE(100, fprintf(stderr, "close fd=%d\n", (*icur)->fd.get()));
conns.erase_ptr(icur);
}
}

DENA_VERBOSE(130, fprintf(stderr, "nb: %p nfds=%zu cns=%zu\n", this, nfds, conns.size()));

if (conns.empty()) {
dbctx->close_tables_if();
}

dbctx->set_statistics(conns.size(), 0);

return 0;
}


I fix some problems in configure.ac, Makefile.am (s) for our platforms.

My email: alkon2000@mail.ru

With best regards, Aleksey.

Maciej Wiercinski said...

Hi,

Is there any reason to have AUTO_INCREMENT in diary_body.diary_id?

Best regards,

Yingkuan Liu said...

It seems the 1:1 relationship tables method is better than the covering index.
It's quite costly to maintain extra secondary index and lost the fast PK lookup at the same time.
It probably also depends on how often the records get updated. Of course the only potential updatable column is rating. Is it a fixed rating of diary content or reader rating?

Yingkuan Liu said...

By the way, Oracle store LOB columns either inline (in row) or out of line. You have option to choose inline or outline for smaller LOB size namely 4000 bytes or less.

Here's snip from Oracle doc
===================

LOB columns store locators that reference the location of the actual LOB value. Depending on the column properties you specify when you create the table, and depending the size of the LOB, actual LOB values are stored either in the table row (inline) or outside of the table row (out-of-line).

LOB values are stored out-of-line when any of the following situations apply:

If you explicitly specify DISABLE STORAGE IN ROW for the LOB storage clause when you create the table.

If the size of the LOB is greater than approximately 4000 bytes (4000 minus system control information), regardless of the LOB storage properties for the column.

If you update a LOB that is stored out-of-line and the resulting LOB is less than approximately 4000 bytes, it is still stored out-of-line.

LOB values are stored inline when any of the following conditions apply:

When the size of the LOB stored in the given row is small, approximately 4000 bytes or less, and you either explicitly specify ENABLE STORAGE IN ROW or the LOB storage clause when you create the table, or when you do not specify this parameter (which is the default).

When the LOB value is NULL (regardless of the LOB storage properties for the column).

Sagging skin said...

Your post really grabbed my attention and interest for the reason that the content is not just informative but also simple yet meaningful.

Natural laxative foods Stomach gas Hard stool treatment Health benefits of almonds White spots on skin
Cure ear infection Under eye wrinkles Baby ear infection Prevent prostate cancer Prevent prostate cancer Lemon tea healh benefits Lemon tea healh benefits Prevent candida yeast infection Prevent candida yeast infection Health benefits of papaya Health benefits of papaya Benefits of peppermint oil Benefits of peppermint oil Stop hiccups Stop hiccups Water and weight loss Water and weight loss Wart Removal Wart Removal Health benefits of oatmeal

android tablet said...

Good post. Very impressive. Thanks for sharing.

WOW Gold said...

A number of people who feel networking is a waste of time and has no effect on personal relationships or to succeed in your business. This concept is completely wrong. And business relationships are built through the process of creating networks of hundreds of years.
buy eden gold
eden gold
cheap eden gold

Marissarachelle said...

Your heart breaks into cheap wow gold million pieces, you regret at ever chance you had to make things right, ever night you lay motionlessly staring at the top of your celling wall thinking
why didn’t I do something before tonight? Then RS Gold wake up the next morning with a brand new chance to buy runescape gold do it again, take it and tell them how you feel eden gold before its too late, like it was for buy eden gold!

aion gold said...

FK You can create these tables. It's just a compromise between the performance slowdown caused by the insertion of CF research and check the consistency of the application. What is the best depends on the applications.

I found some Eden Eternal Review on google,Eden Eternal is a new game and Eden Eternal Gold is a hot topic just like WOW Gold and RS Gold! But Buy RS Gold is so difficult!

hermes birkin said...

Thank you for another essential article. Where else could anyone get that kind of information in such a complete way of writing? I have a presentation incoming week, and I am on the lookout for such information.
Hermes replica
hermes birkin replica

WOW Gold said...

Board games have come a long way from the historical Maplestory Mesos For Sale Sumerian times when they had played by the fires Buy WOW Gold of camel caravans. Astronauts an incredible number of miles above Buy Tera Gold the earth have played games. Now with the Internet, players 50% of Tera Gold a planet apart may occur together Buy Eden Gold in cyberspace and revel in the challenge games.

vrnjacka banja said...

It is very informative. Loved the whole article! Thanks for sharing.
vrnjacka banja smestaj
vrnjacka banja privatni smestaj

Runescape Gold said...
This comment has been removed by a blog administrator.
wow gold said...

good enough

Anonymous said...

These include hermes birkin bag pattern Versace, Gucci, Dolce&Gabbana, Fendi, and Fake Watch other designers. If you want to customize your choice Kelly design, you can choose IWC Watches Sale from leather selections like box calf, lizard skin, ostrich skin, snake skin, crocodile skin, and Togo. You can even have it lined with fur.

pepa prase said...

I'm so excited. I really appreciate sharing this great post. Keep up your excellent work.
bojanke za decu
bojanke
gledanje u solju

fm transmitter said...

This is very good information.Your writing is very good.

banja srbija said...

Excellent post. I want to thank you for this informative read. I will bookmark this site and visit again..
Banje u Srbiji
Banje Srbije

hypnotist said...

Hypnosis for Beginners

kako napraviti web stranicu said...

I'm so excited. I really appreciate sharing this great post. Keep up your excellent work.

film art said...

I want to thank you for this informative read. Loved the whole article! Thanks for sharing.
seo optimizacija
tekstovi pesama
dnevni horoskop
sanovnik

العاب said...

Good posting, im subscribing to your rss. Many thanks once more

افلام said...

Thanks for sharing a very informative article.

Anonymous said...

excellent post

get wife give oral sex said...

It is very informative. I will bookmark this site and visit again. Keep up your excellent work.

Grow taller diet said...

I'm so excited. I really appreciate sharing this great post. Keep up your excellent work
Naturally way to grow taller
How to grow taller naturally
Growing taller at any age
Growing taller secrets

arabic letters said...

Excellent post. I want to thank you for this informative read. I will bookmark this site and visit again...
Does milk make you taller
How to be taller

Znacenje brojeva said...

So complicated for me, anyway nice article.

how to cunnilingus said...

I will bookmark this site and visit again. It is very informative. Thanks for sharing.
ljubavni stihovi

Sport uzivo said...

Very good article, little complicated but good. Tnx

family shopping said...

Amazing article, thank you

Anonymous said...

Thanks Yoshinori, you help me a lot.
Mitch from loperamide times.

Anonymous said...

Very good article, little complicated but good. seo optimizacija

Nguyễn Việt Tiến said...

Angkor wat Tours
http://www.toursangkorwat.com/
Angkor wat Tours

GuildWars2Items said...

You have a forever friend for life, and forever has no end D3 Gold Sale, and if you find such a friend, you feel happy and complete, because you need not worry Diablo iii Gold, your forever friend holds your hand and tells you that everything is going to be okay Buy Diablo iii Gold.

GuildWars2Items said...

The past is gone and static. Nothing we can do will change it. The past is gone and static. Nothing we can do will change it. Everything we do will affect it guild wars 2 gold sale, you laugh at me for being different, but I laugh at you for being the same sale guild wars 2 gold, The consequences of today guild wars 2 gold sale are determined by the actions of the past. To change your future, alter your decisions today .

GuildWars2Items said...

Death, old age, are words without a meaning, that pass by us like the idle air which we regard not Diablo 3 Gold Sale, It takes strength to do what must be done when the work is unpleasant and uncomfortable Diablo 3 Gold, it takes strength to persist in the face of obstacles, when it would be much easier to simply give up Buy Diablo 3 Gold.

GuildWars2Items said...

MapleStory Mesos i have a simple philosophy: Fill what's empty. Empty what's full. Scratch where it itches Maple Story Mesos, life is not measured by the number of breaths we take, but by the moments that take our breath away MapleStory Mesos.

Maria Arredondo said...

Being happy doesn't mean that everything is perfect. It means that you've decided to look beyond the imperfections guild wars 2 gold,
Do not pray for tasks equal to your powers.Pray for powers equal to your tasks.Then the doing of work shall be no miracle, but you shall be the miracle guild wars 2 gold, fear not that the life shall come to an end, but rather fear that it shall never have a beginning guild wars 2 gold.

Maria Arredondo said...

The consequences of today are determined by the actions of the past scarlet blade gold. To change your future, alter your decisions today scarlet blade gold, Experience is a hard teacher because she gives the test first, the lesson afterwards scarlet blade gold, but it takes character to keep you there.

buyGuildwars2gold said...

The world owes you nothing. It was here first. Being happy doesn't mean that everything is perfect Runescape Gold, Empty what's full. Scratch where it itches. Don't go around saying the world owes you a living RS Gold, It means that you've decided to look beyond Runescape Money the imperfections. Do not pray for tasks equal to your powers.

buyGuildwars2gold said...

Suppose someone gave you a pen — a sealed, solid-colored pen.You could not see how much ink it had D3 Gold, think Positive Thoughts Every Day Buy D3 Gold, the brightest future will always be based on a forgotten past, you can't go on well in life until you let go of your past failures and heartaches Cheap D3 Gold.

adeoe said...

diary_head table does not have the largest text column(body). So diary_head table size can be much smaller than the original diary table. InnoDB blocks that have swtor credits diary_head's column values are frequently accessed, but the number of the blocks can be smaller. So these InnoDB blocks will be cached within InnoDB buffer pool very well. Most of SELECT queries wizardry gold do not require random disk i/o, so total throughput can be increased.

jimmy wilson said...

Outstanding write-up I adore your content; adore the way you outlined things, you are doing an excellent work many of others like you by means of which sort of informative information sites supply consciousness in order to people relevant to many things. My partner and i study various other intriguing blogs from a sites as well as My business is much serious along with your writing a blog skills, My spouse and i in addition started to generate write-up and this also type discussion actually assist me personally out. I already book-marked your current page and propagated your current internet sites to help my own acquaintances not only me but these people like your current blogs abilities, desire you compose far more interesting websites such as this a single as well as enjoy for your future weblogs.
Jimmy Wilson-MI Ghost Protocol Jacket

Jess said...

The following step is to start up your Runescape and login. http://www.gpors.com/

Best Article Websites said...

Very Useful information , this is both good reading for, have quite a few good key points, and I learn some new stuff from it too, thanks for sharing your information.
Advertising agencies in Karachi | Advertising agencies in Pakistan

Sqiar BI said...

Thanks for the post, In this complex environment business need to present there company data in meaningful way.So user easily understand it .Sqiar (http://www.sqiar.com/benefits-of-using-tableau-software/) which is in UK,provide services like Tableau and Data Warehousing etc .In these services sqiar experts convert company data into meaningful way.

Post a Comment