Understanding MyISAM record structure
Asked Answered
C

1

6

I am trying to understand how MyISAM physically store its records and how it maintains its structure after record insertion and record deletion. I have read the following link:

I want to make sure if I understand it correctly, please correct me if it is not right.

Fixed-sized record

MyISAM fixed-size record

  • Delete marker determines whether record is deleted or not deleted.
  • Record header holds which column of a row contains NULL value
  • The length of data is fixed.

Variable-sized record

enter image description here

  • Delete marker is replaced with BLOCK_DELETED block type
  • Record header holds length of data and length of unused data

enter image description here

  • A single record can be seperated into multiple block connected by overflow pointer.

Deletion

enter image description here

  • For variable-sized record, change block type to BLOCK_DELETED
  • Maintain double linked-list of all deleted record by having the previous pointer of the newly deleted record points to last deleted record. Then, the last deleted record's next pointer points to the newly deleted record.
  • For fixed-sized record, simply change delete marker as deleted. (unsure if they use double linked-list to connect all the deleted record with fixed-sized record)

Insertion

  • If there is no unused space (deleted records), append the data at the end of the file
  • If there is unused space that fits the newly inserted record, write the new record there.
  • If there is unused space that is far bigger than newly inserted record, split into two records: the new record and the deleted record.
  • If there is unused space that is smaller than newly inserted record, write data there, have overflow pointer to points to the unfitted data at other block.

Updating

  • What if users update existed data with longer data? Will MyISAM marked the record as deleted and find place that fits the new data or simply use overflow pointer to point to unfitted data?

Recap the question again

I want to make sure if I understand it correctly, please correct me if it is not right.

Additional questions

  • Would it be very inefficient if the table has been deleted and inserted for many times since the record structure could potentially full of overflow pointers and unused space?
Catchweight answered 23/5, 2013 at 10:49 Comment(1)
Quite a broad question, but to answer just your "Additional question": yes, this is why it is recommended to issue an OPTIMIZE TABLE every now and then.Nb
A
5

The information you have in the question concerning MyISAM is right on target. However, I would like to address your two additional questions:

LATEST QUESTION

What if users update existed data with longer data? Will MyISAM marked the record as deleted and find place that fits the new data or simply use overflow pointer to point to unfitted data?

According to the Book

sdkm

Chapter 10 : "Storage Engines" Page 196 Paragraph 7 says

For records with variable length, the format is more complicated. The first byte contains a special code describing the subtype of the record. The meaning of the subsequent bytes varies with each subtype, but the common theme is that there is a sequence of bytes that contains the length of the record, the number of unused bytes in the block, NULL value indicator flags, and possibly a pointer to the continuation of the record if the record did not fit into the previously created space and had to be split up. This can happen when one record gets deleted, and a new one to be inserted into its place exceeds the original one is size. You can get the details of the meanings of different codes by studying the switch statement in_mi_get_block_info() in storage/myisam/mi_dynrec.c.

Based on that paragraph, the old record gets overwritten with linkage data only if the new data to insert cannot fit in the previously allocated block. This can result in many bloated rows.

ADDITIONAL QUESTION

Would it be very inefficient if the table has been deleted and inserted for many times since the record structure could potentially full of overflow pointers and unused space?

From my previous answer, there would be lots of blocks that have

  • block of space
  • the length of the record
  • the number of unused bytes in the block
  • NULL value indicator flags
  • possibly a pointer to the continuation of the record if the record did not fit into the previously created space and had to be split up

Such record links would start in the front of every row that have oversized data being inserted. This can bloat a MyISAM tables .MYD file very quickly.

SUGGESTIONS

The default row format of a MyISAM is Dynamic. When a table is Dynamic and experiences lots of INSERTs, UPDATEs, and DELETEs, such a table would need to optimized with

OPTIMIZE TABLE mytable;

There is an alternative: switch the table's row format to Fixed. That way, all rows are the same size. This is how you make the row format Fixed:

ALTER TABLE mytable ROW_FORMAT=Fixed;

Even with a Fixed Row Format, time must be taken to locate an available record but the time would be O(1) search time (In layman's terms, it would take the same amount of time to locate an available record no matter how many rows the table has or how many deleted rows there are). You could bypass that step by enabling concurrent_insert as follows:

Add this to my.cnf

[mysqld]
concurrent_insert = 2

MySQL restart not required. Just run

mysql> SET GLOBAL concurrent_insert = 2;

This would cause all INSERTs to go to the back of the table without looking for free space.

Advantage of Fixed Row tables

  • INSERTs, UPDATEs, and DELETEs would be somewhat faster
  • SELECT are 20-25% faster

Here are some of my posts on SELECT being faster for Row Formats being Fixed

Disadvantage of Fixed Row tables

In most cases, when you run ALTER TABLE mytable ROW_FORMAT=Fixed;, the table may grow 80-100%. The .MYI file (index pages for the MyISAM table) would also grow at the same rate.

EPILOGUE

If you want speed for MyISAM tables and can live with bigger tables, my alternate suggestions would be needed. If you want to conserve space for each MyISAM table, leave the row format as is (Dynamic). You will have to compress the table with OPTIMIZE TABLE mytable; more frequent with Dynamic tables.

Aldon answered 29/5, 2013 at 20:8 Comment(2)
Thank you for accepting my answer. May I, please, get the bounty too ?Aldon
Sure, I just wanted to delay a bit to see if there is more answer, but if you insisted, I don't mind. :DCatchweight

© 2022 - 2024 — McMap. All rights reserved.