How much size I will save if changed INT column to MEDIUMINT?
Asked Answered
F

3

2

I'm learning how to optimize my database by re choosing the correct data types for the columns and I want to know how much size I will save if I choose MEDIUMINT (3 Bytes) instead of INT(4 Bytes)

AFAIK -and correct me if I'm wrong- I need the database size to be as small as possible to fit in RAM to reduce the hard-desk requests. The size of the database consists of the tables sizes + index sizes

giving that I have an INT column that has 10'000'000 rows and a B-Tree index on it, how much size In MBs I will save if I changed the datatype of the column from INT to MEDIUMINT at

  • table data size ?
  • index size ?

note: I know MySQL will not reduce the actual size on disk unless I OPTIMIZE TABLE

EDIT: My situation is that I will finish my first serious system in my life shortly -it's an ERP system that I plan to sell in the Arab region market - . Plans 1, 2, 3, 4 databases are supposed to be about 2GB, 4GB, 10GB, 40GB respectively, so If I could reduce the size of each database without sacrificing performance/features, why not ? If I could make a 32GB RAM machine serve 4 clients instead of 2, why not ?

Flinn answered 26/4, 2018 at 23:10 Comment(4)
Worrying over a single byte seems a bit overkill if you're not working on an 8-bit machine from the 80s. A 3-byte integer is a weird idea anyways, it sounds inefficient memory-access wise.Perkoff
@MattiVirkkunen hmmmm, I don't know that, so do you recommend using INT datatype for all my numbers because it will make a better performance ?Glottochronology
When you ALTER TABLE and adjust the schema the table has to be rebuilt so if you're using "file per table" in InnoDB the space will be reclaimed.Loera
Related: #25256312Snakebird
L
4

Just use INT unless you have a specific, measurable problem. You're only going to make a mess of things if you fret over every single byte in an era where even the most thrifty of smart phones has a billion of them for memory alone.

I need the database size to be as small as possible to fit in RAM to reduce the hard-desk requests.

No you don't. You need the database to be easy to work with and perform adequately. In an era of SSD-backed databases, I/O will not be a problem until you're operating at large scale, and when and if that day comes then you can take measurements and understand the specific problems you're having.

Shaving a single byte off your INT field is unlikely to make anything better since three byte integer values are not something your CPU can directly deal with. These will be converted to four bytes and aligned properly so they can be understood, a process that's messy compared to reading a plain old 32-bit integer.

Remember, MySQL comes from an era where a high-end server had 64 megabytes of memory and a 9 gigabyte hard disk was considered huge. Back then you did have to shave bytes off because you only had a handful of them.

Now we have other concerns, like will you accidentally exhaust your 24-bit integer space like Slashdot did where their site went down because of exactly the sort of "optimizing" you're intending to do here.

Be careful. Optimize when you have a concrete reason to, not just because you think you need to. Avoiding premature optimization is a constant struggle in development, but if you're disciplined you can avoid it.

Loera answered 26/4, 2018 at 23:24 Comment(6)
thank you very much, and what about TINYINT ? is 1 byte makes the CPU sad too?Glottochronology
"Be careful. Optimize when you have a concrete reason to, not just because you think you need to" I just thought If I squeezed the schema I will get better performance, but now after this answer I will use INT for everythingGlottochronology
Computers love powers of two, so one, two, four and eight byte values are all natively supported. Three, five, seven and so on are not. It used to be that disks were super slow, but now they can transfer >2GB/s, which is for many people their entire database. When you have a problem that can be measured, look more closely, use tools like EXPLAIN to see what the problem is. It's hardly ever a matter of an extra byte per row.Loera
"Use INT for evertyhing" isn't quite the solution. It's "Use INT by default", but it's worth deviating if you have a good reason. Using UUIDs, using BIGINT when you know you're going to have billions of records, that sort of thing.Loera
I'm really sorry for bothering you again after the question. After reading the answers in this question I'm confused. Away from the 3-bytes alignment problem, why not using tinyint or smallint instead of int and reduce my cost if I'm sure the column will not be exhausted ? please see the edits in my question to have a better vision of my situationGlottochronology
A properly tuned MySQL server on a host with 32GB of memory will not even notice the difference between 2-byte and 4-byte values until you're dealing with billions of records, and even then you're talking about 1-2GB of memory impact. Do the math and you'll see it's really not worth fussing about.Loera
C
4

(I disagree with some of the other Answers/Comments. I will try to answer all the questions, plus address all the points that I disagree with.)

MEDIUMINT is 3 bytes, saving 1 byte per row over INT.
TINYINT is 1 bytes, saving 3 bytes per row over INT.
In both cases, there is another 1 or 3 bytes saved per occurrence in any INDEX other than the PRIMARY KEY.

If you are likely to have more data+index than space in RAM, then it is wise to shrink the datatypes but be conservative.

Use MEDIUMINT UNSIGNED (etc) if the value is non-negative, such as for AUTO_INCREMENT. That gives you a limit of 16M instead of 8M. (Yeah, yeah, that's a tiny improvement.)

Beware of "burning" AUTO_INCREMENT ids -- INSERT IGNORE (and several other commands) will allocate the next auto_inc before checking whether it will be used.

Even if data+index exceeds RAM size (actually innodb_buffer_pool_size), it may not slow down to disk speed -- it depends on access patterns of the data. Beware of UUIDs, they are terribly random. Using UUIDs when you can't cache the entire index is deadly. The buffer_pool is a cache. (I have seen a 1TB dataset run fast enough with only 32GB of RAM and a spinning disk.)

Using ALTER TABLE to change a datatype probably (I am not sure) rebuilds the table, thereby performing the equivalent of OPTIMIZE TABLE.

If the table was created with innodb_file_per_table = OFF and you turn it ON before doing the ALTER, you get a separate file for the table, but ibdata1 will not shrink (instead it will have lots more free space).

Alignment of 3-byte numbers -- not an issue. Powers of 2 is not relevant here. MySQL assumes all columns are at poor boundaries, and of poor sizes. All numbers are converted to a generic format (64-bit numbers) for operating on. This conversion is an insignificant part of the total time -- fetching the row (even if cached) is the most costly part.

When I/O-bound, shrinking datatypes leads to more rows per block, which leads to fewer disk hits (except in the UUID case). When I/O-bound, hitting the disk is overwhelming the biggest performance cost.

"NULLS take no space" -- https://dev.mysql.com/doc/internals/en/innodb-field-contents.html . So, again, less I/O. But, beware, if this leads to an extra check for NULL in a SELECT, that could lead to a table scan instead of using an index. Hitting 10M rows is a lot worse than hitting just a few.

As for how many clients you can fit into 32GB -- Maybe 6 or more. Remember, the buffer_pool is a cache; data and indexes are cached on a block-by-block basis. (An InnoDB block is 16KB.)

One more thing... It is a lot easier to shrink the datatypes before going into production. So, do what you can safely do now.

Collaborate answered 7/5, 2018 at 4:34 Comment(0)
W
2

The exact size of your index is going to depend on how many rows you have, but also on how the data in your index looks.

If you shave off 1 byte per record in your data, and you have 10.000.000 records, that'll only save you up to 10MB on disk for the table data. Adding an index is going to add some more, and B-trees have empty space in them, but it depends on the actual data how inefficient it is.

If you want to save space, make sure that the field is not nullable, because even if you fill all rows with data, there is information per record, stating whether the nullable field contains data or not.

Wame answered 26/4, 2018 at 23:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.