I would advise you to do a lot of in-depth testing, but in my tests I am achieving very good results with both INSERT and SELECT using the table definition I posted before. I will describe my tests in detail so that anyone can easily repeat them and check whether they get better results. Back up your data before any test.
I must say that these are only tests and may not reflect or improve your real case, but it's a good way of learning and probably a way of finding useful information and results.
The advice given here is really good, and you will surely notice a great speed improvement by using a sized VARCHAR type instead of TEXT. Although MyISAM might give you more speed, I would advise against it for data integrity reasons; stay with InnoDB.
TESTING:
1. Set up the table and INSERT 200 million rows:
CREATE TABLE `entity_versionable` (
`version` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`fk_entity` INT(10) UNSIGNED NOT NULL,
`str1` VARCHAR(255) DEFAULT NULL,
`str2` VARCHAR(255) DEFAULT NULL,
`bool1` TINYINT(1) DEFAULT NULL,
`double1` DOUBLE DEFAULT NULL,
`date` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`version`,`fk_entity`)
) ENGINE=INNODB AUTO_INCREMENT=230297534 DEFAULT CHARSET=latin1
To insert 200+ million rows into a table in about 35 minutes, please check my other question, where peterm answered with one of the best ways to fill a table. It works perfectly.
Execute the following query twice to insert 200 million rows of non-random data (each run generates 10^8 = 100 million rows; change the values each time if you want varied data):
INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, DATE)
SELECT 1, 'a1', 238, 2, 524627, '2013-06-16 14:42:25'
FROM
(
SELECT a.N + b.N * 10 + c.N * 100 + d.N * 1000 + e.N * 10000 + f.N * 100000 + g.N * 1000000 + h.N * 10000000 + 1 N FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) e
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) f
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) g
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) h
) t;
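Before running the full statement, it may help to check the generator on a small scale. The following is only a sketch of the same cross-join technique reduced to three digit tables, so it produces 10^3 = 1,000 numbers instead of 10^8; the column aliases in the outer SELECT are illustrative names, not part of the original query.

```sql
-- Reduced generator: three digit tables (a, b, c) instead of eight.
-- Each extra digit table multiplies the row count by 10.
SELECT COUNT(*) AS generated_rows, MIN(N) AS first_n, MAX(N) AS last_n
FROM
(
SELECT a.N + b.N * 10 + c.N * 100 + 1 AS N FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
) t;
-- Returns one row: generated_rows = 1000, first_n = 1, last_n = 1000
```

Once this returns what you expect, the eight-table version above follows the same pattern, only with 100 million rows per run.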
*Since you already have the original table with 200 million rows of real data, you probably won't need to fill it. Just export your table data and schema and import them into a new test table with the same schema. That way you run your tests on a new table with your real data, and the improvements you get will also apply to the original one.
2. ALTER the new test table for performance (or use my example from step 1 to get better results).
Once the new test table is set up and filled with data, we should apply the advice above and ALTER the table to speed it up:
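If both tables live on the same server, the copy can also be done directly in SQL instead of an export/import. This is a different route from the one described above, shown only as a sketch; `mydb.entity_original` and `test.entity_versionable_test` are hypothetical names, so replace them with your own.

```sql
-- Create an empty test table with the same columns and indexes
-- as the original (hypothetical table names).
CREATE TABLE test.entity_versionable_test LIKE mydb.entity_original;

-- Copy every row from the original table into the test table.
INSERT INTO test.entity_versionable_test
SELECT * FROM mydb.entity_original;

-- Sanity check: both counts should match.
SELECT COUNT(*) FROM mydb.entity_original;
SELECT COUNT(*) FROM test.entity_versionable_test;
```

Copying 200 million rows this way takes a while and needs enough free disk space for a second copy of the data.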
- Change TEXT to VARCHAR(255).
- Choose a good primary key (a unique index) of two or three
columns. Try the auto-increment version column together with fk_entity
in your first test.
- Partition your table if necessary, and check if it improves speed. I
would advise not to partition it in your first tests, so that you can
measure the real performance gain from changing data types and MySQL
configuration. Check the following link for some partitioning and
improvement tips.
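- Optimize your table. The index will be rebuilt, which can speed up
searches a lot. REPAIR TABLE only applies to engines such as MyISAM;
InnoDB does not support it, so skip it if you stay with InnoDB:
OPTIMIZE TABLE test.entity_versionable;
REPAIR TABLE test.entity_versionable; -- MyISAM only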
*Write a script that runs the optimize step and keeps your indexes up to date, and launch it every night.
3. Improve your MySQL and hardware configuration by carefully reading the following threads. They are worth reading and I'm sure you will get better results.
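The first two changes in the list above could look roughly like this. It is a sketch under a few assumptions: the test table is test.entity_versionable, the TEXT columns are the ones called str1 and str2 in step 1, and the table already has a primary key to drop. Adjust the names to your own schema.

```sql
-- Check the longest stored values first: anything longer than
-- 255 characters would not fit into VARCHAR(255).
SELECT MAX(CHAR_LENGTH(str1)) AS max_str1,
       MAX(CHAR_LENGTH(str2)) AS max_str2
FROM test.entity_versionable;

-- Change the TEXT columns to VARCHAR(255).
ALTER TABLE test.entity_versionable
  MODIFY str1 VARCHAR(255) DEFAULT NULL,
  MODIFY str2 VARCHAR(255) DEFAULT NULL;

-- Replace the existing primary key with a composite one
-- on the auto-increment version column plus fk_entity.
ALTER TABLE test.entity_versionable
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (version, fk_entity);
```

On a table with 200 million rows each ALTER rebuilds the table, so expect these statements to run for a long time.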
- Easily improve your database disk configuration by spending a bit
of money: if possible, use an SSD for your main MySQL database and a
standalone mechanical hard disk for backups. Set MySQL logs
to be written to a third hard disk to improve the speed of your
INSERTs. (Remember to defragment mechanical hard disks every few
weeks.)
- Performance links: general & multiple cores, configuration,
optimizing IO, Debian cores, best configuration,
config 48 GB RAM.
- Profiling a SQL query: How to profile a query, Check for possible bottleneck in a query
- MySQL is very memory intensive; use low-latency CL7 DDR3 memory if
possible. A bit off topic, but if your data is critical, you may want to look at ECC memory, although it is expensive.
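As a starting point for the configuration and profiling items above, the statements below show how the current settings and the cost of a query can be inspected from a MySQL session. This is only a sketch: the date filter is a hypothetical search, and the query number passed to SHOW PROFILE should be taken from the SHOW PROFILES output (it is 1 only in a fresh session).

```sql
-- Current InnoDB memory and log-location settings
-- (the values are whatever your server reports).
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_log_group_home_dir';

-- Execution plan for a hypothetical search by date.
EXPLAIN
SELECT version, fk_entity, `date`
FROM entity_versionable
WHERE `date` >= '2013-06-16 00:00:00';

-- Per-stage timings for the same filter in the current session.
SET profiling = 1;
SELECT COUNT(*) FROM entity_versionable
WHERE `date` >= '2013-06-16 00:00:00';
SHOW PROFILES;              -- lists Query_ID and duration of recent statements
SHOW PROFILE FOR QUERY 1;   -- use the Query_ID reported by SHOW PROFILES
SET profiling = 0;
```

With the table definition from step 1 there is no index on the date column, so EXPLAIN will most likely report a full scan for this filter, which is exactly the kind of bottleneck the linked threads discuss.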
4. Finally, test your INSERTs and SELECTs on the test table. In my tests with 200+ million rows of random data and the table schema above, it takes 0.001 seconds to INSERT a new row and about 2 minutes to search and SELECT 100 million rows. It is only a test, but the results look good :)
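The exact statements behind these timings are not shown here, so the following is only a sketch of the kind of test you could run against the table from step 1. The values and the range are made up; the mysql command-line client prints the elapsed time after each statement.

```sql
-- Single-row INSERT: version is AUTO_INCREMENT and
-- date defaults to CURRENT_TIMESTAMP, so both are omitted.
INSERT INTO entity_versionable (fk_entity, str1, str2, bool1, double1)
VALUES (1, 'a1', 'b1', 1, 524627);

-- Find where the version numbers actually start; the table was
-- created with a high AUTO_INCREMENT value, so they do not start at 1.
SELECT MIN(version) INTO @first_version FROM entity_versionable;

-- Range search over roughly 100 million rows using the
-- leading primary key column.
SELECT SQL_NO_CACHE COUNT(*)
FROM entity_versionable
WHERE version BETWEEN @first_version AND @first_version + 99999999;
```

Run each statement several times and compare warm and cold results, since the first run also pays for loading pages into the buffer pool.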
5. My System Configuration:
- Database: MySQL 5.6.10 InnoDB database (test).
- Processor: AMD Phenom II 1090T X6 core, 3910 MHz each core.
- RAM: 16GB DDR3 1600 MHz CL8.
- HD: Windows 7 64-bit SP1 on SSD, MySQL installed on SSD, logs written to a mechanical hard disk.
We would probably get better results with one of the latest Intel i5 or i7 CPUs, easily overclocked to 4500 MHz+, since MySQL uses only one core per SQL statement. The higher the core speed, the faster the statement executes.
6. Read more about MySQL:
O'Reilly High Performance MySQL
MySQL Optimizing SQL Statements
7. Using another database:
MongoDB or Redis would be perfect for this case and probably a lot faster than MySQL. Both are very easy to learn, and both have their advantages:
- MongoDB: MongoDB log file growth
- Redis
I would definitely go for Redis. If you learn how to store the log in Redis, it will be the best way to manage the log at insanely high speed:
redis for logging
Keep the following advice in mind if you use Redis:
- Redis is written in C and keeps its data in memory. It has several methods to automatically save the information to disk (persistence), so you probably won't have to worry about that. (In a disaster scenario you would end up losing about 1 second of logging.)
- Redis is used by a lot of sites that manage terabytes of data. There are many ways to handle that insane amount of information, and it means Redis is proven in production (used here at Stack Overflow, Blizzard, Twitter, YouPorn...).
- Since your log will be very big, it needs to fit in memory to get speed without touching the hard disk. You may save different logs for different dates and keep only some of them in memory. What happens when the memory limit is reached depends on your configuration (the maxmemory setting and its eviction policy), so check the Redis FAQ for more information.
- I'm totally sure that Redis will be a lot faster than MySQL for this purpose. You will need to learn how to work with lists and sets to update data and to query/search it. If you need really advanced queries, you should go with MongoDB, but for simple date searches like this case Redis will be perfect.
Nice Redis article in Instagram Blog.
SOMEDOUBLE from the same entity, and thus be generating two rows in entitychange with the same entity_id. I updated my question, so the "change process" becomes clearer. – Munro
enum datatype actually does normalisation: it uses 1 or 2 bytes for storage depending on number of values (65K - max). – Innocency
enum does -- enum does not create a second table, it just encodes a fixed list of values and stores the ordinal number of the value in the list. But neither is what you described normalization. Normalization is not about referring to attributes by a surrogate key. Normalization is not about making storage more compact. Normalization is about preventing data anomalies. – Roband
enum does has nothing to do with normalization. – Innocency