MYSQL query performs very slow
Asked Answered
E

8

19

I have developed a user bulk upload module. There are 2 situations, when I do a bulk upload of 20 000 records when database has zero records. Its taking about 5 hours. But when the database already has about 30 000 records the upload is very very slow. It takes about 11 hours to upload 20 000 records. I am just reading a CSV file via fgetcsv method.

if (($handle = fopen($filePath, "r")) !== FALSE) {
            while (($peopleData = fgetcsv($handle, 10240, ",")) !== FALSE) {
                if (count($peopleData) == $fieldsCount) {

//inside i check if user already exist (firstName & lastName & DOB)
//if not, i check if email exist. if exist, update the records.
//other wise insert a new record.
}}}

Below are the queries that run. (I am using Yii framework)

SELECT * 
FROM `AdvanceBulkInsert` `t` 
WHERE renameSource='24851_bulk_people_2016-02-25_LE CARVALHO 1.zip.csv' 
LIMIT 1

SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId, 
       cfv.relatedId, cfv.fieldValue, cfv.createdAt 
FROM `CustomField` `cf` 
    INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId 
    LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId 
                and relatedId = 0 
    LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id 
WHERE ((relatedTable = 'people' and enabled = '1') 
  AND (onCreate = '1')) 
  AND (cfsa.subarea='peoplebulkinsert') 
ORDER BY cf.sortOrder, cf.label

SELECT * 
FROM `User` `t` 
WHERE `t`.`firstName`='Franck' 
  AND `t`.`lastName`='ALLEGAERT ' 
  AND `t`.`dateOfBirth`='1971-07-29' 
  AND (userType NOT IN ("1")) 
LIMIT 1

If exist update the user:

UPDATE `User` SET `id`='51394', `address1`='49 GRANDE RUE', 
                  `mobile`='', `name`=NULL, `firstName`='Franck', 
                  `lastName`='ALLEGAERT ', `username`=NULL, 
                  `password`=NULL, `email`=NULL, `gender`=0, 
                  `zip`='60310', `countryCode`='DZ', 
                  `joinedDate`='2016-02-23 10:44:18', 
                  `signUpDate`='0000-00-00 00:00:00', 
                  `supporterDate`='2016-02-25 13:26:37', `userType`=3, 
                  `signup`=0, `isSysUser`=0, `dateOfBirth`='1971-07-29', 
                  `reqruiteCount`=0, `keywords`='70,71,72,73,74,75', 
                  `delStatus`=0, `city`='AMY', `isUnsubEmail`=0, 
                  `isManual`=1, `isSignupConfirmed`=0, `profImage`=NULL, 
                  `totalDonations`=NULL, `isMcContact`=NULL, 
                  `emailStatus`=NULL, `notes`=NULL, 
                  `addressInvalidatedAt`=NULL, 
                  `createdAt`='2016-02-23 10:44:18', 
                  `updatedAt`='2016-02-25 13:26:37', `longLat`=NULL 
WHERE `User`.`id`='51394'

If user don't exist, insert new record.

Table engine type is MYISAM. Only the email column has a index.

How can I optimize this to reduce the processing time?

Query 2, took 0.4701 seconds which means for 30 000 records it will take 14103 sec, which is about 235 minutes. approx 6 hours.

Update

CREATE TABLE IF NOT EXISTS `User` (
  `id` bigint(20) NOT NULL,
  `address1` text COLLATE utf8_unicode_ci,
  `mobile` varchar(15) COLLATE utf8_unicode_ci DEFAULT NULL,
  `name` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
  `firstName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `lastName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `username` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
  `password` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
  `email` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
  `gender` tinyint(2) NOT NULL DEFAULT '0' COMMENT '1 - female, 2-male, 0 - unknown',
  `zip` varchar(15) COLLATE utf8_unicode_ci DEFAULT NULL,
  `countryCode` varchar(3) COLLATE utf8_unicode_ci DEFAULT NULL,
  `joinedDate` datetime DEFAULT NULL,
  `signUpDate` datetime NOT NULL COMMENT 'User signed up date',
  `supporterDate` datetime NOT NULL COMMENT 'Date which user get supporter',
  `userType` tinyint(2) NOT NULL,
  `signup` tinyint(2) NOT NULL DEFAULT '0' COMMENT 'whether user followed signup process 1 - signup, 0 - not signup',
  `isSysUser` tinyint(1) NOT NULL DEFAULT '0' COMMENT '1 - system user, 0 - not a system user',
  `dateOfBirth` date DEFAULT NULL COMMENT 'User date of birth',
  `reqruiteCount` int(11) DEFAULT '0' COMMENT 'User count that he has reqruited',
  `keywords` text COLLATE utf8_unicode_ci COMMENT 'Kewords',
  `delStatus` tinyint(2) NOT NULL DEFAULT '0' COMMENT '0 - active, 1 - deleted',
  `city` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `isUnsubEmail` tinyint(1) NOT NULL DEFAULT '0' COMMENT '0 - ok, 1 - Unsubscribed form email',
  `isManual` tinyint(1) NOT NULL DEFAULT '0' COMMENT '0 - ok, 1 - Manualy add',
  `longLat` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Longitude and Latitude',
  `isSignupConfirmed` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'Whether user has confirmed signup ',
  `profImage` tinytext COLLATE utf8_unicode_ci COMMENT 'Profile image name or URL',
  `totalDonations` float DEFAULT NULL COMMENT 'Total donations made by the user',
  `isMcContact` tinyint(1) DEFAULT NULL COMMENT '1 - Mailchimp contact',
  `emailStatus` tinyint(2) DEFAULT NULL COMMENT '1-bounced, 2-blocked',
  `notes` text COLLATE utf8_unicode_ci,
  `addressInvalidatedAt` datetime DEFAULT NULL,
  `createdAt` datetime NOT NULL,
  `updatedAt` datetime DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `AdvanceBulkInsert` (
  `id` int(11) NOT NULL,
  `source` varchar(256) NOT NULL,
  `renameSource` varchar(256) DEFAULT NULL,
  `countryCode` varchar(3) NOT NULL,
  `userType` tinyint(2) NOT NULL,
  `size` varchar(128) NOT NULL,
  `errors` varchar(512) NOT NULL,
  `status` char(1) NOT NULL COMMENT '1:Queued, 2:In Progress, 3:Error, 4:Finished, 5:Cancel',
  `createdAt` datetime NOT NULL,
  `createdBy` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE IF NOT EXISTS `CustomField` (
  `id` int(11) NOT NULL,
  `customTypeId` int(11) NOT NULL,
  `fieldName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `relatedTable` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `defaultValue` text COLLATE utf8_unicode_ci,
  `sortOrder` int(11) NOT NULL DEFAULT '0',
  `enabled` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
  `listItemTag` char(1) COLLATE utf8_unicode_ci DEFAULT NULL,
  `required` char(1) COLLATE utf8_unicode_ci DEFAULT '0',
  `onCreate` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
  `onEdit` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
  `onView` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
  `listValues` text COLLATE utf8_unicode_ci,
  `label` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `htmlOptions` text COLLATE utf8_unicode_ci
) ENGINE=MyISAM AUTO_INCREMENT=12 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `CustomFieldSubArea` (
  `id` int(11) NOT NULL,
  `customFieldId` int(11) NOT NULL,
  `subarea` varchar(256) COLLATE utf8_unicode_ci NOT NULL
) ENGINE=MyISAM AUTO_INCREMENT=43 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `CustomValue` (
  `id` int(11) NOT NULL,
  `customFieldId` int(11) NOT NULL,
  `relatedId` int(11) NOT NULL,
  `fieldValue` text COLLATE utf8_unicode_ci,
  `createdAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=MyISAM AUTO_INCREMENT=86866 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Entire PHP Code is here http://pastie.org/10737962

Update 2

Explain output of the Query

enter image description here

Encage answered 25/2, 2016 at 12:41 Comment(13)
Post some full code so it makes more sense. Also make sure you have indexes built on your JOIN points.Multiplier
Do you know which part is taking time? the update or one of the selectsWhittington
It helps if you post your question so we can actually read it.Simony
First thing to optimize is to not use SELECT * as to check if user with that prams is present you can SELECT id only id from that table which will mean user presentUnwished
I think this may well be the most useful comment in the question only email column has a index How large are these tables?Simony
as I said add index's to join points!Multiplier
@Simony i have updated the question with table structure. please have a look.Encage
@Whittington i am trying to understand which part is taking time. is there any way to see it?Encage
I have updated the question with entire php code.Encage
@Whittington this is the part that take about half a second. SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId, cfv.relatedId, cfv.fieldValue, cfv.createdAt FROM CustomField cf INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId and relatedId = 0 LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id WHERE ((relatedTable = 'people' and enabled = '1') AND (onCreate = '1')) AND (cfsa.subarea='peoplebulkinsert') ORDER BY cf.sortOrder, cf.labelEncage
you posted database structure without CustomType table. Can you add the table or rather put "explain" outputs for each query into the question?Appetence
Hi. The structure of your CSV is always the same ? And you know exactly the good number of column ?Chariness
You can replace the LEFT JOIN CustomFieldSubArea in the query with INNER JOIN since your WHERE clause requires an entry in the joined table anyway. An INNER JOIN might be faster because the query optimiser is free to execute the query in either order.Cassicassia
T
15

Indexes are your friend.

UPDATE User ... WHERE id = ... -- Desperately needs an index on ID, probably PRIMARY KEY.

Similarly for renameSource.

SELECT * 
FROM `User` `t` 
WHERE `t`.`firstName`='Franck' 
  AND `t`.`lastName`='ALLEGAERT ' 
  AND `t`.`dateOfBirth`='1971-07-29' 
  AND (userType NOT IN ("1")) 
LIMIT 1;

Needs INDEX(firstName, lastName, dateOfBirth); the fields can be in any order (in this case).

Look at each query to see what it needs, then add that INDEX to the table. Read my Cookbook on building indexes.

Tocology answered 27/2, 2016 at 2:33 Comment(3)
For the firstname, lastname, and dateofbirth separate indices would be much more useful in case like partial name search or birthdate range.Elecampane
firstName LIKE 'F%' won't do well with my index. However, birthdate range will do just fine, since that column in last in the index.Tocology
Tacking userType onto the index may help.Tocology
S
11

Try these things to increase your query performance:

  • define indexing in your database structure, and get only columns that you want.
  • Do not use * in select query.
  • And do not put ids in quotes like User.id='51394', instead do User.id= 51394.
  • If you are giving ids in quotes then your indexing will not work. That approach improve your query performance by 20% faster.
  • If you are using ENGINE=MyISAM then you not able to define indexing in between your database table, change database engine to ENGINE=InnoDB. And create some indexing like foreign keys, full text indexing.
Statuesque answered 26/2, 2016 at 13:32 Comment(4)
I've done some quick testing and putting IDs in quotes or not seems to have no discernible effect on query time. Do you have a reference to back up the 20% speed increase?Cassicassia
I said put ids without quotes. And apply indexingStatuesque
@MattRaines: using integers instead of strings will only make a difference if you define a primary key (which will apply indexing). You can read more about it here: #1071680Mcnamara
There's no reference to type coercion on that question or any of the answers at all. According to EXPLAIN, using quotes makes no difference to whether the primary key is used for the query. mysql> explain SELECT * FROM foo WHERE id = '1'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: foo type: const possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: const rows: 1 Extra: Using indexCassicassia
F
5

If I understand, for all the result of SELECT * FROM AdvanceBulkInsert ... you run a request SELECT cf.*, and for all the SELECT cf.*, you run the SELECT * FROM User

I think the issue is that you send way too much requests to the base.

I think you should merge all your select request in only one big request.

For that:

Then you call the update on all the result of the merged select.

You should too time one by one your request to find which of this requests take the most time, and you should too use ANALYSE to find what part of the request take time.

Edit:

Now I have see your code :

Some lead:

  • have you index for cf.customTypeId , cfv.customFieldId , cfsa.customFieldId, user. dateOfBirth ,user. firstName,user.lastName ?

  • you don't need to do a LEFT JOIN CustomFieldSubArea if you have a WHERE who use CustomFieldSubArea, a simple JOIN CustomFieldSubArea is enougth.

  • You will launch the query 2 a lot of time with relatedId = 0 , maybe you can save the result in a var?

  • if you don't need sorted data, remove the "ORDER BY cf.sortOrder, cf.label" . Else, add index on cf.sortOrder, cf.label

Frenchpolish answered 25/2, 2016 at 13:58 Comment(3)
can you give us the code of CustomValue::model()->getCustomData ? also, can you do a EXPLAIN on the query 2?(dev.mysql.com/doc/refman/5.7/en/using-explain.html))Frenchpolish
here is the code for CustomValue::model()->getCustomData pastie.org/10738382Encage
Query 2 explain is updated in the question. by the way, when i changed the query by removing cf.* and adding cf.fieldName,cf.label,cf.required,cf.defaultValue,cf.listValues,cf.htmlOptions. its a bit better. now to process 30 000 user records from csv file it takes only 106 minutes.Encage
A
3

When you need to find out why a query takes long, you need to inspect individual parts. As you shown in the question Explain statement can help you very much. Usually the most important columns are:

  • select_type - this should always be simple query/subquery. Related subqueries give a lot of troubles. Luckily you don't use any
  • possible keys - What keys is this select going to search by
  • rows - how many candidate rows are determined by the keys/cache and other techniques. Smaller number is better
  • Extra - "using" tells you how exactly are the rows found, this is the most useful information

Query analysis

I would have posted analytics for the 1st and 3rd query but they are both quite simple queries. Here is the breakdown for the query that gives you troubles:

EXPLAIN SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId, 
   cfv.relatedId, cfv.fieldValue, cfv.createdAt 
FROM `CustomField` `cf` 
    INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId 
    LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId 
                and relatedId = 0 
    LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id 
WHERE ((relatedTable = 'people' and enabled = '1') 
  AND (onCreate = '1')) 
  AND (cfsa.subarea='peoplebulkinsert') 
ORDER BY cf.sortOrder, cf.label
  • INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId
  • LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId and relatedId = 0
  • LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id
  • WHERE ((relatedTable = 'people' and enabled = '1') AND (onCreate = '1')) AND (cfsa.subarea='peoplebulkinsert')
  • ORDER BY cf.sortOrder, cf.label

Solution

Let me explain above list. Bold columns totally must have an index. Joining tables is expensive operation that otherwise needs to go through all rows of both tables. If you make index on the joinable columns the DB engine will find much faster and better way to do it. This should be common practice for any database

The italic columns are not mandatory to have index, but if you have large amount of rows (20 000 is large amount) you should also have index on the columns that you use for searching, it might not have such impact on the processing speed but is worth the extra bit of time.

So you need to add indicies to theese columns

  • CustomType - id
  • CustomField - customTypeId, id, relatedTable, enabled, onCreate, sortOrder, label
  • CustomValue - customFieldId
  • CustomFieldSubArea - customFieldId, subarea

To verify the results try running explain statement again after adding indicies (and possibly few other select/insert/update queries). The extra column should say something like "Using Index" and possible_keys column should list used keys (even two or more per join query).

Side note: You have some typos in your code, you should fix them in case someone else needs to work on your code too: "reqruiteCount" as table column and "fileUplaod" as array index in your refered code.

Appetence answered 2/3, 2016 at 20:9 Comment(1)
For better performance use composite indexes where applicable, not lots of individual indexes. MySQL almost never uses two indexes in a single query. (And this because using two indexes will be inefficient.)Tocology
C
1

For my work, I have to add daily one CSV with 524 Columns and 10k records. When I have try to parse it and add the record with php, it was horrible.

So, I propose to you to see the documentation about LOAD DATA LOCAL INFILE

I copy/past my own code for example, but adapt him to your needs

$dataload = 'LOAD DATA LOCAL INFILE "'.$filename.'"
                REPLACE
                INTO TABLE '.$this->csvTable.' CHARACTER SET "utf8"
                FIELDS TERMINATED BY "\t"
                IGNORE 1 LINES
            ';

$result = (bool)$this->db->query($dataload);

Where $filename is a local path of your CSV (you can use dirname(__FILE__) for get it )

This SQL command is very quick (just 1 or 2 second for add/update all the CSV)

EDIT : read the doc, but of course you need to have an uniq index on your user table for "replace" works. So, you don't need to check if the user exist or not. And you don't need to parse the CSV file with php.

Chariness answered 3/3, 2016 at 13:25 Comment(0)
V
0

You appear to have the possibility (probability?) of 3 queries for every single record. Those 3 queries are going to require 3 trips to the database (and if you are using yii storing the records in yii objects then that might slow things down even more).

Can you add a unique key on first name / last name / DOB and one on email address?

If so the you can just do INSERT....ON DUPLICATE KEY UPDATE. This would reduce it to a single query for each record, greatly speeding things up.

But the big advantage of this syntax is that you can insert / update many records at once (I normally stick to about 250), so even less trips to the database.

You can knock up a class that you just pass records to and which does the insert when the number of records hits your choice. Also add in a call to insert the records in the destructor to insert any final records.

Another option is to read everything in to a temp table and then use that as a source to join to your user table to do the updates / insert to. This would require a bit of effort with the indexes, but a bulk load to a temp table is quick, and a updates from that with useful indexes would be fast. Using it as a source for the inserts should also be fast (if you exclude the records already updated).

The other issue appears to be your following query, but not sure where you execute this. It appears to only need to be executed once, in which case it might not matter too much. You haven't given the structure of the CustomType table, but it is joined to Customfield and the field customTypeId has no index. Hence that join will be slow. Similarly on the CustomValue and CustomFieldSubArea joins which join based on customFieldId, and neither have an index on this field (hopefully a unique index, as if those fields are not unique you will get a LOT of records returned - 1 row for every possibly combination)

SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId, 
       cfv.relatedId, cfv.fieldValue, cfv.createdAt 
FROM `CustomField` `cf` 
    INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId 
    LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId 
                and relatedId = 0 
    LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id 
WHERE ((relatedTable = 'people' and enabled = '1') 
  AND (onCreate = '1')) 
  AND (cfsa.subarea='peoplebulkinsert') 
ORDER BY cf.sortOrder, cf.label
Vara answered 29/2, 2016 at 10:56 Comment(0)
O
0

see to it you can try to reduce the query and check with sql online compiler check the time period then include under the project.

Ombre answered 2/3, 2016 at 10:22 Comment(0)
M
0

Always do bulk importing within a transation

        $transaction = Yii::app()->db->beginTransaction();
        $curRow = 0;
        try
        {
            while (($peopleData = fgetcsv($handle, 10240, ",")) !== FALSE) {
            $curRow++;
            //process $peopleData
            //insert row
            //best to use INSERT ... ON DUPLICATE  KEY UPDATE
            // a = 1
            // b = 2;
            if ($curRow % 5000 == 0) {
               $transaction->commit();
               $transaction->beginTransaction();
            }
        }
        catch (Exception $ex)
        {
            $transaction->rollBack();
            $result = $e->getMessage();                    
        }
        //don't forget the remainder.
        $transaction->commit();

I have seen import routines sped up 500% by simply using this technique. I have also seen an import process that did 600 queries (mixture of select, insert, update and show table structure) for each row. This technique sped up the process 30%.

Misdirect answered 5/3, 2016 at 0:32 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.