Should MEDIUMINT be avoided in MySQL?

Asked 12/8, 2014 at 3:44 Answered 27/4, 2018 at 16:10

I came across a comment on the following blogpost that recommends against using MEDIUMINT:

Don’t use [the 24bit INT], even in MySQL. It’s dumb, and it’s slow, and the code that implements it is a crawling horror.

4294967295 and MySQL INT(20) Syntax Blows

An answer on Stack Overflow also notes that SQL Server, PostgreSQL and DB2 don't support MEDIUMINT:

What is the difference between tinyint, smallint, mediumint, bigint and int in MySQL?

Should MEDIUMINT be avoided or should I continue to use it in the cases where it best represents the data I am storing?

Thermometry answered 12/8, 2014 at 3:44 Comment(2)

It's false economy. The natural width on modern processors is 32 or 64 bits, and the db has to go to extra effort to enforce 24-bit integer width. – Athelstan 12/8, 2014 at 3:51

@JimGarrison I really doubt that is much of a factor overall. It could be stored as 32-bits in-memory (or not but it doesn't really matter), and 24-bits on records trivially. (Saving up to 25% space over an int..) – Dorwin 12/8, 2014 at 3:53

InnoDB stores MEDIUMINT as three bytes value. But when MySQL has to do any computation the three bytes MEDIUMINT is converted into eight bytes unsigned long int(I assume nobody runs MySQL on 32 bits nowadays).

There are pros and cons, but you understand that "It’s dumb, and it’s slow, and the code that implements it is a crawling horror" reasoning is not technical, right?

I would say MEDIUMINT makes sense when data size on disk is critical. I.e. when a table has so many records that even one byte difference (4 bytes INT vs 3 bytes MEDIUMINT) means a lot. It's rather a rare case, but possible.

mach_read_from_3 and mach_read_from_4 - primitives that InnoDB uses to read numbers from InnoDB records are similar. They both return ulint. I bet you won't notice a difference on any workload.

Just take a look at the code:

ulint
mach_read_from_3(
/*=============*/
        const byte*     b)      /*!< in: pointer to 3 bytes */
{
        ut_ad(b);
        return( ((ulint)(b[0]) << 16)
                | ((ulint)(b[1]) << 8)
                | (ulint)(b[2])
                );
}

Do you think it's much slower than this?

ulint
mach_read_from_4(
/*=============*/
        const byte*     b)      /*!< in: pointer to four bytes */
{
        ut_ad(b);
        return( ((ulint)(b[0]) << 24)
                | ((ulint)(b[1]) << 16)
                | ((ulint)(b[2]) << 8)
                | (ulint)(b[3])
                );
}

Ricketts answered 12/8, 2014 at 5:14 Comment(3)

Ok thanks. I guess I was looking to confirm if there was anything particularly bad about the way that MySQL handles the mediumint internally, or if it was really non-standard and should be avoided. It sounds like it is fine to use if it fits the data being stored. – Thermometry 12/8, 2014 at 23:45

Hi, you mean, if i use this: min(MEDIUMINTField) in MySQL, the result will be eight bytes and in java, for example, i need a long type variable ? – Cookshop 17/10, 2020 at 20:28

It's up to a client library to decide which type to use for the result. – Ricketts 20/10, 2020 at 16:8

In the grand scheme of things, fetching a row is the big cost. Simple functions, expressions, and much less, data formats, is insignificant in how long a query takes.

On the other side, if your dataset it too large to stay cached, the overhead of I/O to fetch row(s) is even more significant. A crude rule of thumb says that a non-cached row takes 10 times as long as a cached one. Hence, shrinking the dataset (such as using a smaller *INT) may give you a huge performance benefit.

This argument apples to ...INT, FLOAT vs DOUBLE, DECIMAL(m,n), DATETIME(n), etc. (A different discussion is needed for [VAR]CHAR/BINARY(...) and TEXT/BLOB.)

For those with a background in Assembly language...

A table is likely to have a mixture of numbers and strings, thereby thwarting attempts to "align" values.
MySQL has always handled a variety of hardwares (big/little-endian, 16/32/64-bit) with binary compatibility. Note how the code @akuzminsky provided avoids alignment and endian issues. And it lets the compiler deal with 32-bit issues if the hardware is only 16-bit.
The code to test for special cases would probably outweigh the simply writing generic code.
We are talking typically less than 1% of the total row-handling time.

Hence, the only sane way to write the code is to work at the byte level, and to ignore register size and assume all values are mis-aligned.

For Optimization, in order of importance:

Count the disk hits. Touching disk is overwhelmingly the most costly part of a query.
Count the number of rows touched. Finding a row (via BTree, etc) takes some CPU. But, note, very few installations are CPU-bound; those that are tend to have poor indexes. (Rule of Thumb: There are typically 100 rows in an InnoDB data or index block.)
Only now does parsing the row come into play.

Rule of Thumb: If a tentative optimization does not (via back-of-envelope calc) yield 10% improvement, don't waste your time on it. Instead look for some bigger improvement. For example, indexes and Summary tables are often provide 10x (not just 10%).

Flaunty answered 27/4, 2018 at 16:10 Comment(2)

Thanks, and what about a MySQL MEDIUMINT (3 bytes) integer? will it cause the CPU to do extra work to align the data on the registers ? that is what I have been told here – Ilocano 27/4, 2018 at 16:50

@Accountantم - My background is such that I very much understand "word alignment". But, I claim, it is truly insignificant in this context. I added a bunch to my Answer. – Flaunty 27/4, 2018 at 17:15

Recommended topics

Hot tags