E_WARNING: Error while sending STMT_PREPARE packet. PID=*
Asked Answered
H

4

12

My Laravel 5.7 website has been experiencing a few problems that I think are related to each other (but happen at different times):

  1. PDO::prepare(): MySQL server has gone away
  2. E_WARNING: Error while sending STMT_PREPARE packet. PID=10
  3. PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry (My database often seems to try to write the same record twice in the same second. I've been unable to figure out why or how to reproduce it; it doesn't seem to be related to user behavior.)
  4. Somehow, those first 2 types of errors only ever appear in my Rollbar logs but not on the text logs on the server or in my Slack notifications, as all errors are supposed to (and all others do).

For months, I've continued to see scary log messages like these, and I've been completely unable to reproduce these errors (and have been unable to diagnose and solve them).

I haven't yet found any actual symptoms or heard of any complaints from users, but the error messages seem non-trivial, so I really want to understand and fix the root causes.


I've tried changing my MySQL config to use max_allowed_packet=300M (instead of the default of 4M) but still get these exceptions frequently on days when I have more than a couple of visitors to my site.

I've also set (changed from 5M and 10M) the following because of this advice:

innodb_buffer_pool_chunk_size=218M
innodb_buffer_pool_size = 218M

As further background:

  • My site has a queue worker that runs jobs (artisan queue:work --sleep=3 --tries=3 --daemon).
  • There are a bunch of queued jobs that can be scheduled to happen at the same moment based on the signup time of visitors. But the most I see that have happened simultaneously is 20.
  • There are no entries in the MySQL Slow Query Log.
  • I have a few cron jobs, but I doubt they're problematic. One runs every minute but is really simple. Another runs every 5 minutes to send certain scheduled emails if any are pending. And another runs every 30 minutes to run a report.
  • I've run various mysqlslap queries (I'm completely novice though) and haven't found anything slow even when simulating hundreds of concurrent clients.
  • I'm using Laradock (Docker).
  • My server is DigitalOcean 1GB RAM, 1 vCPU, 25GB SSD. I've also tried 2GB RAM with no difference.
  • The results from SHOW VARIABLES; and SHOW GLOBAL STATUS; are here.

My my.cnf is:

[mysql]

[mysqld]
sql-mode="STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION"
character-set-server=utf8
innodb_buffer_pool_chunk_size=218M
innodb_buffer_pool_size = 218M
max_allowed_packet=300M
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow_query_log.log
long_query_time = 10
log_queries_not_using_indexes = 0

Any ideas about what I should explore to diagnose and fix these problems? Thanks.


Hefner answered 25/11, 2018 at 17:2 Comment(23)
Can you show us the constructed SQL statement. And SHOW CREATE TABLE.Mileage
@RickJames I updated the question. How can I show the "constructed" MySql statement? (And since it only occasionally fails, it would be helpful to know how to debug / log what is different about the failure cases.) Thanks.Hefner
If you would like to log the MYSQL of the query you can use this: \DB::enableQueryLog(); before the query and this: logger(\DB::getQueryLog() ); afterwards. This will save the query to your laravel log fileForesail
@josh Thanks. In my question above, I added a sample of what the \DB::getQueryLog() says, although it's just an example (and didn't coincide with this error actually happening on my production machine, since my production machine doesn't have this logging enabled, and the error doesn't always happen).Hefner
@RickJames See my new comment and update. Thanks.Hefner
@Hefner - next time it happens see if you can guess what the job_class_name was. Might it have been a really long string? With various escaped characters? (Etc).Mileage
@Hefner - And... The optimal index for that query is a 4-column composite index: INDEX(contact_id, job_class_name, execute_at, deleted_at). (Any rearrangement of those columns is just as good.)Mileage
@RickJames Wow, today in my query log, I saw 'time' => 157.95!! That's shocking. My table has <2000 records and has 3 indexes. I avoided indexing deleted_at because of https://mcmap.net/q/333527/-does-laravel-39-s-quot-soft_delete-quot-need-index-on-mysql. But you're saying the best approach would be the 4-column composite index? I'm so confused about 2 things. A) How my query would ever take this long on so few records even if I had no indexes, and B) why my 3 separate indexes wouldn't be just as good as the proposed 4-column composite index, given what people have written about the cardinality of deleted_at.Hefner
@Hefner - I am not suggesting a single-column INDEX(deleted_at), but rather a 4-column 'composite' index. Meanwhile, I bled all over that other Q&A.Mileage
@Hefner - A) No rational reason for 158 seconds. B) Think about two alphabetized lists -- one with a million last names and phone numbers, the other with first names and phone numbers. How would you find my phone number? Compare that to a list that was alphabetized on last, first.Mileage
@Hefner - See if you can get EXPLAIN SELECT ... (with the values filled in). I don't expect it to help, but I'm getting desperate.Mileage
@RickJames Good news it that 1 of my 4 problems wasn't a problem at all. My queries that I thought took 352 seconds really only took 0.352 seconds! 🤦‍♂️ I've updated my question to remove that part. But I'm still seeing logged errors about PDO::prepare(): MySQL server has gone away really frequently. I'm going to see if I can temporarily disable cron jobs and queued jobs and hopefully eventually isolate a correlation. Maybe the problem has nothing to do with RAM or MySQL and is more like this person experienced with Facebook query slowness: https://mcmap.net/q/833707/-sudden-quot-mysql-server-has-gone-away-quot-error-in-php-siteHefner
what is your Laradoc config? share your docker-compose.yml?Transference
@Transference It's this (raw.githubusercontent.com/laradock/laradock/…) except I changed version to 2.1, added this SSL in nginx (- /apps/laradock_data/certs:/etc/nginx/ssl), and these in mysql: mem_limit: 300m and memswap_limit: 2g. And added a few restart: always in various sections.Hefner
You should remove all applications you don't use in your project: percona, mssql, blackfire, postgressql, neo4j, mongoDB, aerospike... etc... etc.. You are running to much on the same instance :-) . Could you just share docker ps output please? ;-) and docker stats pleaseTransference
@Transference They are on separate tabs here now, thanks: docs.google.com/spreadsheets/d/…Hefner
which php is losing connection? php-fpm or php-worker?Transference
that is pretty difficult to debug remotely, but I would start from moving mysql out of docker and same instance. I am not sure about your load, but one shared vCPU can be the problem.Transference
@Transference if php-worker is what processes queued jobs and cron jobs, then probably that one.Hefner
could you please check what is average output of show processlist; and show status where `variable_name` = 'Threads_connected'; when peak timeTransference
@Transference SHOW FULL PROCESSLIST at peak traffic was this: docs.google.com/spreadsheets/d/… I won't know the answer to the other one til the next peak traffic time (24 hrs from now), but currently it says "10". Thanks.Hefner
@Hefner You could try to work around the issue with a library like this one: github.com/adamturcsan/reconnecting-pdo It wraps/extends PDO and transparently reconnects & rebuilds prepared statements if the connection was lost. I also wrote a longer answer (see lower on this page) with some ideas about the root cause.Duval
Is your code called twice? I'm thinking preflight requests that have become a thing?Booth
M
5

Re Slowlog: Show us your my.cnf. Were the changes in the [mysqld] section? Test it via SELECT SLEEP(12);, then look both in the file and the table.

Alternate way to find the query: Since the query is taking several minutes, do SHOW FULL PROCESSLIST; when you think it might be running.

How much RAM do you have? Do not have max_allowed_packet=300M unless you have at least 30GB of RAM. Else you are risking swapping (or even crashing). Keep that setting under 1% of RAM.

For further analysis of tunables, please provide (1) RAM size, (2) SHOW VARIABLES; and (3) SHOW GLOBAL STATUS;.

Re deleted_at: That link you gave starts with "The column deleted_at is not a good index candidate". You misinterpreted it. It is talking about a single-column INDEX(deleted_at). I am suggesting a composite index such as INDEX(contact_id, job_class_name, execute_at, deleted_at).

158 seconds for a simple query on a small table? It could be that there is a lot of other stuff going on. Get the PROCESSLIST.

Re Separate indexes versus composite: Think of two indexes: INDEX(last_name) and INDEX(first_name). You flip through the last_name index to find "James", then what can you do? Flipping through the other index for "Rick" won't help you find me.

Analysis of VARIABLES and GLOBAL STATUS

Observations:

  • Version: 5.7.22-log
  • 1.00 GB of RAM
  • Uptime = 16d 10:30:19
  • Are you sure this was a SHOW GLOBAL STATUS ?
  • You are not running on Windows.
  • Running 64-bit version
  • You appear to be running entirely (or mostly) InnoDB.

The More Important Issues:

innodb_buffer_pool_size -- I thought you had it at 213M, not 10M. 10M is much too small. On the other hand, you seem to have less than that much data.

Since the RAM is so small, I recommend dropping tmp_table_size and max_heap_table_size and max_allowed_packet to 8M. And lower table_open_cache, table_definition_cache, and innodb_open_files to 500.

What causes so many simultaneous connections?

Details and other observations:

( innodb_buffer_pool_size / _ram ) = 10M / 1024M = 0.98% -- % of RAM used for InnoDB buffer_pool

( innodb_buffer_pool_size ) = 10M -- InnoDB Data + Index cache

( innodb_lru_scan_depth ) = 1,024 -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( Innodb_buffer_pool_pages_free / Innodb_buffer_pool_pages_total ) = 375 / 638 = 58.8% -- Pct of buffer_pool currently not in use -- innodb_buffer_pool_size is bigger than necessary?

( Innodb_buffer_pool_bytes_data / innodb_buffer_pool_size ) = 4M / 10M = 40.0% -- Percent of buffer pool taken up by data -- A small percent may indicate that the buffer_pool is unnecessarily big.

( innodb_log_buffer_size / _ram ) = 16M / 1024M = 1.6% -- Percent of RAM used for buffering InnoDB log writes. -- Too large takes away from other uses for RAM.

( innodb_log_file_size * innodb_log_files_in_group / innodb_buffer_pool_size ) = 48M * 2 / 10M = 960.0% -- Ratio of log size to buffer_pool size. 50% is recommended, but see other calculations for whether it matters. -- The log does not need to be bigger than the buffer pool.

( innodb_flush_method ) = innodb_flush_method = -- How InnoDB should ask the OS to write blocks. Suggest O_DIRECT or O_ALL_DIRECT (Percona) to avoid double buffering. (At least for Unix.) See chrischandler for caveat about O_ALL_DIRECT

( innodb_flush_neighbors ) = 1 -- A minor optimization when writing blocks to disk. -- Use 0 for SSD drives; 1 for HDD.

( innodb_io_capacity ) = 200 -- I/O ops per second capable on disk . 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( min( tmp_table_size, max_heap_table_size ) / _ram ) = min( 16M, 16M ) / 1024M = 1.6% -- Percent of RAM to allocate when needing MEMORY table (per table), or temp table inside a SELECT (per temp table per some SELECTs). Too high may lead to swapping. -- Decrease tmp_table_size and max_heap_table_size to, say, 1% of ram.

( net_buffer_length / max_allowed_packet ) = 16,384 / 16M = 0.10%

( local_infile ) = local_infile = ON -- local_infile = ON is a potential security issue

( Select_scan / Com_select ) = 111,324 / 264144 = 42.1% -- % of selects doing full table scan. (May be fooled by Stored Routines.) -- Add indexes / optimize queries

( long_query_time ) = 10 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

( Max_used_connections / max_connections ) = 152 / 151 = 100.7% -- Peak % of connections -- increase max_connections and/or decrease wait_timeout

You have the Query Cache half-off. You should set both query_cache_type = OFF and query_cache_size = 0 . There is (according to a rumor) a 'bug' in the QC code that leaves some code on unless you turn off both of those settings.

Abnormally small:

( Innodb_pages_read + Innodb_pages_written ) / Uptime = 0.186
Created_tmp_files = 0.015 /HR
Handler_write = 0.21 /sec
Innodb_buffer_pool_bytes_data = 3 /sec
Innodb_buffer_pool_pages_data = 256
Innodb_buffer_pool_pages_total = 638
Key_reads+Key_writes + Innodb_pages_read+Innodb_pages_written+Innodb_dblwr_writes+Innodb_buffer_pool_pages_flushed = 0.25 /sec
Table_locks_immediate = 2.8 /HR
Table_open_cache_hits = 0.44 /sec
innodb_buffer_pool_chunk_size = 5MB

Abnormally large:

Com_create_db = 0.41 /HR
Com_drop_db = 0.41 /HR
Connection_errors_peer_address = 2
Performance_schema_file_instances_lost = 9
Ssl_default_timeout = 500

Abnormal strings:

ft_boolean_syntax = + -><()~*:&
have_ssl = YES
have_symlink = DISABLED
innodb_fast_shutdown = 1
optimizer_trace = enabled=off,one_line=off
optimizer_trace_features = greedy_search=on, range_optimizer=on, dynamic_range=on, repeated_subselect=on
session_track_system_variables = time_zone, autocommit, character_set_client, character_set_results, character_set_connection
slave_rows_search_algorithms = TABLE_SCAN,INDEX_SCAN
Mileage answered 26/1, 2019 at 1:40 Comment(9)
+1 for these hints. My server is DigitalOcean 1GB RAM, 1 vCPU, 25GB SSD. I've also tried 2GB RAM with no difference. It sounds like you strongly disagree with dba.stackexchange.com/questions/27328/…. I've updated the question to include my my.cnf and SHOW VARIABLES; and SHOW GLOBAL STATUS;. Thanks for the composite index info. It seems like mine is good then.Hefner
@Hefner - Well, with only 1GB of ram, you really need to lower max_allowed_packet=300M.Mileage
@Hefner - I added my analysis of the tunables.Mileage
Thank you so much. Although I don't understand most of this (and will try to study more and eventually implement more of it later), I've now updated my my.cnf to change ~20 settings and have rebuilt my Docker container and then verified that the new settings appear when I run SHOW VARIABLES; I just have good traffic later today and see if I notice anything different. Thanks.Hefner
@Hefner - Which 20 settings? Any number that is increased is likely to put you at risk of running out of the tiny 1GB of RAM.Mileage
These are the settings I'm trying now: docs.google.com/spreadsheets/d/…Hefner
PDO::prepare(): MySQL server has gone away has been happening nearly every minute now! I also just got a Slack notification that \DB::getQueryLog() logged a recent query of my automations table as having 'time' => 10.68. SELECT @@global.long_query_time; shows 2.0, but select * from mysql.slow_log; does not show that long query. SHOW FULL PROCESSLIST; results in docs.google.com/spreadsheets/d/…Hefner
I changed back to max_allowed_packet=300M thinking it would help because https://mcmap.net/q/351390/-pdo-mysql-server-has-gone-away and dev.mysql.com/doc/refman/5.7/en/gone-away.html, but I continue to see "PDO::prepare(): MySQL server has gone away" happening. I've stopped running ads because even though I haven't seen any symptoms, I'm worried that visitors are. My landing page conversion rate was far lower than usual.Hefner
Swapping makes mysql slooooooooooow.Mileage
D
2

I encountered the same situation on a long-running PHP CLI script (it listens on a Redis list ; each action is quick but the script basically runs for ever).

I create the PDO object and a prepared statement at the beginning, then reuse them afterwards.

The day after I started the script, I got the exact same errors:

PHP Warning:  Error while sending STMT_EXECUTE packet. PID=9438 in /...redacted.../myscript.php on line 39

SQLSTATE[HY000]: General error: 2006 MySQL server has gone away

In my case, it's a development server, there is no load, MySQL is on the same box... so it's unlikely to come from external factors. It's most likely related to the fact I used the same MySQL connection for too long, and it timed out. And PDO doesn't bother, so any subsequent query will just return "MySQL server has gone away".

Checking the value of "wait_timeout" in MySQL:

mysql> show variables like 'wait_timeout';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 28800 |
+---------------+-------+
1 row in set (0.06 sec)

mysql> show local variables like 'wait_timeout';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 28800 |
+---------------+-------+
1 row in set (0.00 sec)

I see 28800 seconds = 8 hours, which seems coherent with the timing of my errors.

In my case, restarting the MySQL server, or setting wait_timeout very low, while keeping the same PHP worker running, makes it very easy to reproduce the issue.

Overall:

  • PDO doesn't care if the connection times out, and will not automatically reconnect. If you put a try/catch around your PDO queries, the script will never crash and keep using the obsolete PDO instance.
  • the STMT_EXECUTE warning is probably incidental ; just because the script whose connection timed out was using prepared statements, and the first query post-timeout happened to be using a prepared statement

To get back to your case

  • in theory Laravel 5 is immune to this issue: https://blog.armen.im/en/laravel-4-and-stmt_prepare-error/ ; do you use something other than Illuminate, or even bare PDO directly? Also, I'm not sure what Laravel does when it detects a lost connection (does it reconnect and rebuild prepared statements?), it might be worth digging further.
  • check your MySQL wait_timeout value, and increase it if it's too low
  • if it's not happening all the time, see if the errors correlate with server / DB load. High load can make things (especially big SQL queries) several times slower, to the point some other MySQL timeout like max_execution_time gets reached.
  • see if you wrapped PDO queries in a try / catch block and use it to retry the query ; it might be preventing the connection error from bubbling up.
Duval answered 2/4, 2019 at 8:5 Comment(2)
In the case of my long running workers, I worked around the issue thanks to this library: github.com/adamturcsan/reconnecting-pdo It wraps/extends PDO and transparently reconnects & rebuilds prepared statements if the connection was lost.Duval
+1 for the effort, but it's no longer a priority for me to try to solve this since inexplicably the error messages for #1 and #2 have stopped. I never like when bugs happen without me eventually understanding the root cause, but I'm focusing elsewhere now. I'll check out this answer more carefully if I see these errors again. Thanks.Hefner
B
0

If you see this message randomly, possible reasons:

  1. Your MySQL is behind a proxy, and they are using different timeout config.

  2. You are using PHP's persist connection.

You may try to dig into the problem by these steps:

  1. Make sure your connections to MySQL have long enough timeout (eg: proxy setting, MySQL's wait_timeout / interactive_timeout)

  2. Disable the persist connection at PHP side.

  3. Do some tcpdump if you can to see what happend when you got the error message.

Bondy answered 10/12, 2018 at 6:9 Comment(6)
https://mcmap.net/q/135071/-what-are-the-disadvantages-of-using-persistent-connection-in-pdo agrees that persistent connections aren't ideal. I checked my /etc/php/7.2/cli/php.ini, and it says mysqli.allow_persistent = On even though https://mcmap.net/q/910993/-php-ini-mysqli-allow_persistent-won-39-t-turn-off-even-after-php-restart. But Laravel doesn't use persistent connections anyway: fideloper.com/laravel-pdo-connection-options. I'm not sure what "Your MySQL is behind a proxy" means. My production server does use Docker containers. I'm using laradock.io. Thanks.Hefner
I just added a 500 bounty. Any new ideas? Thanks.Hefner
@Hefner If there is no slow query in log, but your client (php) waits too long, there is usually a network problem. MySQL only record slow query in log, but if your network is laggy, MySQL won't treat that lag as slow. I think tcpdump would help, it will show not only the network problem, but also what happened with the MySQL protocol.Bondy
@Hefner some tips about how to tcpdump on a busy site: you can write a script to do tcpdump every minute and save to individual files with timestamp in filename, it can reduce the captured data size. And when you see something wrong, just look into the captured data at that time.Bondy
I almost got this to work: cd /apps/tcpdumps && sudo bash -c "nohup tcpdump -i eth0 -G 300 -W 3 -w trace-%dd-%Hh-%Mm.pcap &" but it seems to stay open in my terminal (doesn't give me the cursor back on the next line), so I must be doing it wrong. Also, I'm not sure what I'd be looking for in the pcap files anyway.Hefner
You can try to sudo bash first to get a root shell and run your scripts, and you may need a while loop in your script.Bondy
E
0

I had an interesting observation on my CentOS machine, running PHP7.2x, Apache 2.4.6. After changing the host in my CodeIgniter config from '127.0.0.1' to 'localhost' the problem disappeared.

So change:

'hostname' => '127.0.0.1' to 'hostname' => 'localhost'

I reverted the configuration a couple of times to double check, but somehow this does the trick all the time...

Elboa answered 3/1, 2021 at 20:56 Comment(1)
127.0.0.1!= localhost. If you use localhost, mysql will use a Unix socket for the connection. With 127.0.0.1 it will be a tcp/ip connection (see dev.mysql.com/doc/refman/5.7/en/can-not-connect-to-server.html). According to the documentation, the performance difference should be minimal. Im not an expert on this topic, but if this solves your issue it might indicate a problem with the tcp/ip stackElect

© 2022 - 2025 — McMap. All rights reserved.