This is not a problem that is necessarily solvable by changing configuration options.
Changing configuration options will sometimes have a positive impact, but it can just as easily make things worse, or do nothing at all.
The nature of the error is this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(void) {
void **mem = malloc(sizeof(char)*3);
void *ptr;
/* read past end */
ptr = (char*) mem[5];
/* write past end */
memcpy(mem[5], "whatever", sizeof("whatever"));
/* free invalid pointer */
free((void*) mem[3]);
return 0;
}
The code above can be compiled with:
gcc -g -o corrupt corrupt.c
Executing the code with valgrind you can see many memory errors, culminating in a segmentation fault:
krakjoe@fiji:/usr/src/php-src$ valgrind ./corrupt
==9749== Memcheck, a memory error detector
==9749== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==9749== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==9749== Command: ./corrupt
==9749==
==9749== Invalid read of size 8
==9749== at 0x4005F7: main (an.c:10)
==9749== Address 0x51fc068 is 24 bytes after a block of size 16 in arena "client"
==9749==
==9749== Invalid read of size 8
==9749== at 0x400607: main (an.c:13)
==9749== Address 0x51fc068 is 24 bytes after a block of size 16 in arena "client"
==9749==
==9749== Invalid write of size 2
==9749== at 0x4C2F7E3: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9749== by 0x40061B: main (an.c:13)
==9749== Address 0x50 is not stack'd, malloc'd or (recently) free'd
==9749==
==9749==
==9749== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==9749== Access not within mapped region at address 0x50
==9749== at 0x4C2F7E3: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9749== by 0x40061B: main (an.c:13)
==9749== If you believe this happened as a result of a stack
==9749== overflow in your program's main thread (unlikely but
==9749== possible), you can try to increase the size of the
==9749== main thread stack using the --main-stacksize= flag.
==9749== The main thread stack size used in this run was 8388608.
==9749==
==9749== HEAP SUMMARY:
==9749== in use at exit: 3 bytes in 1 blocks
==9749== total heap usage: 1 allocs, 0 frees, 3 bytes allocated
==9749==
==9749== LEAK SUMMARY:
==9749== definitely lost: 0 bytes in 0 blocks
==9749== indirectly lost: 0 bytes in 0 blocks
==9749== possibly lost: 0 bytes in 0 blocks
==9749== still reachable: 3 bytes in 1 blocks
==9749== suppressed: 0 bytes in 0 blocks
==9749== Rerun with --leak-check=full to see details of leaked memory
==9749==
==9749== For counts of detected and suppressed errors, rerun with: -v
==9749== ERROR SUMMARY: 4 errors from 3 contexts (suppressed: 0 from 0)
Segmentation fault
If you didn't know, you already figured out that mem
is heap allocated memory; The heap refers to the region of memory available to the program at runtime, because the program explicitly requested it (with malloc in our case).
If you play around with the terrible code, you will find that not all of those obviously incorrect statements results in a segmentation fault (a fatal terminating error).
I explicitly made those errors in the example code, but the same kinds of errors happen very easily in a memory managed environment: If some code doesn't maintain the refcount of a variable (or some other symbol) in the correct way, for example if it free's it too early, another piece of code may read from already free'd memory, if it somehow stores the address wrong, another piece of code may write to invalid memory, it may be free'd twice ...
These are not problems that can be debugged in PHP, they absolutely require the attention of an internals developer.
The course of action should be:
- Open a bug report on http://bugs.php.net
- If you have a segfault, try to provide a backtrace
- Include as much configuration information as seems appropriate, in particular, if you are using opcache include optimization level.
- Keep checking the bug report for updates, more information may be requested.
- If you have opcache loaded, disable optimizations
- I'm not picking on opcache, it's great, but some of it's optimizations have been known to cause faults.
- If that doesn't work, even though your code may be slower, try unloading opcache first.
- If any of this changes or fixes the problem, update the bug report you made.
- Disable all unnecessary extensions at once.
- Begin to enable all your extensions individually, thoroughly testing after each configuration change.
- If you find the problem extension, update your bug report with more info.
- Profit.
There may not be any profit ... I said at the start, you may be able to find a way to change your symptoms by messing with configuration, but this is extremely hit and miss, and doesn't help the next time you have the same zend_mm_heap corrupted
message, there are only so many configuration options.
It's really important that we create bugs reports when we find bugs, we cannot assume that the next person to hit the bug is going to do it ... more likely than not, the actual resolution is in no way mysterious, if you make the right people aware of the problem.
USE_ZEND_ALLOC
If you set USE_ZEND_ALLOC=0
in the environment, this disables Zend's own memory manager; Zend's memory manager ensures that each request has it's own heap, that all memory is free'd at the end of a request, and is optimized for the allocation of chunks of memory just the right size for PHP.
Disabling it will disable those optimizations, more importantly it will likely create memory leaks, since there is a lot of extension code that relies upon the Zend MM to free memory for them at the end of a request (tut, tut).
It may also hide the symptoms, but the system heap can be corrupted in exactly the same way as Zend's heap.
It may seem to be more tolerant or less tolerant, but fix the root cause of the problem, it cannot.
The ability to disable it at all, is for the benefit of internals developers; You should never deploy PHP with Zend MM disabled.
USE_ZEND_ALLOC=0
to get the stacktrace in the error log And found the bug/usr/sbin/httpd: corrupted double-linked list
, I found out that commenting out theopcache.fast_shutdown=1
worked for me. – Kidwell