
41

134

All of a sudden I've been having problems with my application that I've never had before. I checked Apache's error log and found an error message saying "zend_mm_heap corrupted". What does this mean?

OS: Fedora Core 8
Apache: 2.2.9
PHP: 5.2.6

Phosphate answered 11/2, 2010 at 21:49 Comment(4)

I used USE_ZEND_ALLOC=0 to get the stack trace in the error log and found the bug: /usr/sbin/httpd: corrupted double-linked list. Commenting out opcache.fast_shutdown=1 worked for me. – Kidwell 18/6, 2015 at 15:50

Yes, same here. Also see another report further below https://mcmap.net/q/167385/-what-does-quot-zend_mm_heap-corrupted-quot-mean – Muscat 23/3, 2016 at 2:10

I had the same thing using Laravel. I injected a class into the constructor of another class. The class I was injecting was itself injecting the class it was injected into, creating a circular reference that caused the heap issue. – Tired 12/1, 2017 at 9:18

Restarting the Apache server is the quickest, though temporary, fix :) – Quake 5/6, 2017 at 9:53


59

After much trial and error, I found that if I increase the output_buffering value in the php.ini file, this error goes away.

Mcburney answered 15/12, 2010 at 19:16 Comment(10)

Increase to what? Why would this change make this error go away? – Parol 3/4, 2012 at 17:22

@Parol this answer helps explain what output_buffering is and why increasing it can help https://mcmap.net/q/46091/-what-is-output-buffering-in-php – Monteria 29/5, 2012 at 19:46

@Monteria I know what ob is, I was wondering about the specific details that were left out of dsmithers' answer, as I was having the same error message as the op. For closure: it turned out my problem was a misconfigured setting pertaining to memcached. Thanks, though! – Parol 30/5, 2012 at 20:41

@Parol what misconfigured setting? – Duston 10/6, 2012 at 18:17

@KyleCronin our service platform uses Memcache in production. However, some single instances -- non-production/sandbox, customer one-offs -- do not use memcache. In the latter case, I had a configuration copied from production to a customer one-off, and the memcache configuration indicated a memcache server URI that was not available in that environment. I deleted the line and disabled memcache in the app, and the problem went away. So, long story short, a very specific problem encountered in a specific environment, that might not be generally applicable. But, since you asked... – Parol 9/8, 2012 at 15:50

@Parol Thanks for following up, I was getting the same error. Don't remember how I fixed it though... – Duston 9/8, 2012 at 17:55

As for me, the solution was @Justin MacLeod's answer. I already had the output buffer enabled, and increasing its size didn't help. – Oldham 4/9, 2014 at 20:36

I'm getting this problem any time I call an OpenSSL method. For some environments, raising output_buffering fixes it, for some it doesn't. This is a really frustrating error to troubleshoot. – Piled 11/7, 2016 at 19:53

This should not be increased; first check the code to see why large chunks are being sent :P – Lightening 30/11, 2019 at 5:4

I get this error when a conditional breakpoint is used with XDebug no-debug-non-zts-20180731 and VS Code 1.59.1, similar to bugs.xdebug.org/1647. – Jamisonjammal 27/8, 2021 at 13:48


61

This is not a problem that is necessarily solvable by changing configuration options.

Changing configuration options will sometimes have a positive impact, but it can just as easily make things worse, or do nothing at all.

The nature of the error is this:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
    void **mem = malloc(sizeof(char)*3);
    void *ptr;

    /* read past end */
    ptr = (char*) mem[5];   

    /* write past end */
    memcpy(mem[5], "whatever", sizeof("whatever"));

    /* free invalid pointer */
    free((void*) mem[3]);

    return 0;
}

The code above can be compiled with:

gcc -g -o corrupt corrupt.c

Executing the code with valgrind, you can see many memory errors, culminating in a segmentation fault:

krakjoe@fiji:/usr/src/php-src$ valgrind ./corrupt
==9749== Memcheck, a memory error detector
==9749== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==9749== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==9749== Command: ./corrupt
==9749== 
==9749== Invalid read of size 8
==9749==    at 0x4005F7: main (an.c:10)
==9749==  Address 0x51fc068 is 24 bytes after a block of size 16 in arena "client"
==9749== 
==9749== Invalid read of size 8
==9749==    at 0x400607: main (an.c:13)
==9749==  Address 0x51fc068 is 24 bytes after a block of size 16 in arena "client"
==9749== 
==9749== Invalid write of size 2
==9749==    at 0x4C2F7E3: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9749==    by 0x40061B: main (an.c:13)
==9749==  Address 0x50 is not stack'd, malloc'd or (recently) free'd
==9749== 
==9749== 
==9749== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==9749==  Access not within mapped region at address 0x50
==9749==    at 0x4C2F7E3: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9749==    by 0x40061B: main (an.c:13)
==9749==  If you believe this happened as a result of a stack
==9749==  overflow in your program's main thread (unlikely but
==9749==  possible), you can try to increase the size of the
==9749==  main thread stack using the --main-stacksize= flag.
==9749==  The main thread stack size used in this run was 8388608.
==9749== 
==9749== HEAP SUMMARY:
==9749==     in use at exit: 3 bytes in 1 blocks
==9749==   total heap usage: 1 allocs, 0 frees, 3 bytes allocated
==9749== 
==9749== LEAK SUMMARY:
==9749==    definitely lost: 0 bytes in 0 blocks
==9749==    indirectly lost: 0 bytes in 0 blocks
==9749==      possibly lost: 0 bytes in 0 blocks
==9749==    still reachable: 3 bytes in 1 blocks
==9749==         suppressed: 0 bytes in 0 blocks
==9749== Rerun with --leak-check=full to see details of leaked memory
==9749== 
==9749== For counts of detected and suppressed errors, rerun with: -v
==9749== ERROR SUMMARY: 4 errors from 3 contexts (suppressed: 0 from 0)
Segmentation fault

In case you didn't know, mem is heap-allocated memory. The heap is the region of memory made available to the program at runtime because the program explicitly requested it (with malloc in our case).

If you play around with the terrible code, you will find that not all of those obviously incorrect statements result in a segmentation fault (a fatal terminating error).

I made those errors deliberately in the example code, but the same kinds of errors happen very easily in a memory-managed environment. If some code doesn't maintain the refcount of a variable (or some other symbol) correctly, for example by freeing it too early, another piece of code may read from already freed memory. If it somehow stores the address wrongly, another piece of code may write to invalid memory. The same memory may also be freed twice ...

These are not problems that can be debugged in PHP; they absolutely require the attention of an internals developer.
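To make the refcount case concrete, here is a minimal sketch of reference counting in C. It is only an illustration: toy_value, toy_new, toy_addref and toy_release are hypothetical names invented for this example, and the layout has nothing to do with PHP's real internals.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy value; NOT PHP's real variable layout. */
typedef struct toy_value {
    int refcount;   /* number of holders that still use the value */
    char *data;     /* heap-allocated payload */
} toy_value;

/* Create a value holding a copy of text, owned by exactly one holder. */
toy_value *toy_new(const char *text) {
    size_t len = strlen(text) + 1;
    toy_value *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->data = malloc(len);
    if (v->data == NULL) { free(v); return NULL; }
    memcpy(v->data, text, len);
    v->refcount = 1;
    return v;
}

/* Every additional holder must announce itself before keeping the pointer. */
void toy_addref(toy_value *v) {
    assert(v != NULL && v->refcount > 0);
    v->refcount++;
}

/* Drop one reference; returns 1 if this call freed the value, 0 otherwise. */
int toy_release(toy_value *v) {
    /* Sanity check only: it cannot detect a release on memory
       that has already been freed. */
    assert(v != NULL && v->refcount > 0);
    if (--v->refcount > 0) return 0;
    free(v->data);
    free(v);
    return 1;
}
```

When a second holder calls toy_addref and each holder later calls toy_release exactly once, the value is freed only by the last release. If the second holder forgets toy_addref, the first toy_release frees the value while the other holder still has the pointer; its next read, write, or release touches freed memory, which is the kind of mistake described above.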

The course of action should be:

1. Open a bug report on http://bugs.php.net
   - If you have a segfault, try to provide a backtrace.
   - Include as much configuration information as seems appropriate; in particular, if you are using opcache, include the optimization level.
   - Keep checking the bug report for updates; more information may be requested.
2. If you have opcache loaded, disable optimizations.
   - I'm not picking on opcache, it's great, but some of its optimizations have been known to cause faults.
   - If that doesn't work, try unloading opcache, even though your code may be slower.
   - If any of this changes or fixes the problem, update the bug report you made.
3. Disable all unnecessary extensions at once.
   - Re-enable your extensions one at a time, thoroughly testing after each configuration change.
   - If you find the problem extension, update your bug report with more info.
4. Profit.

There may not be any profit ... As I said at the start, you may be able to change your symptoms by messing with configuration, but this is extremely hit and miss, and it doesn't help the next time you get the same zend_mm_heap corrupted message; there are only so many configuration options.

It's really important that we create bug reports when we find bugs; we cannot assume that the next person to hit the bug is going to do it ... More likely than not, the actual resolution is in no way mysterious if you make the right people aware of the problem.

USE_ZEND_ALLOC

If you set USE_ZEND_ALLOC=0 in the environment, this disables Zend's own memory manager. Zend's memory manager ensures that each request has its own heap and that all memory is freed at the end of a request, and it is optimized for allocating chunks of memory of just the right size for PHP.

Disabling it will disable those optimizations; more importantly, it will likely create memory leaks, since there is a lot of extension code that relies upon the Zend MM to free memory for it at the end of a request (tut, tut).

It may also hide the symptoms, but the system heap can be corrupted in exactly the same way as Zend's heap.

It may seem to be more tolerant or less tolerant, but fix the root cause of the problem, it cannot.

The ability to disable it at all is for the benefit of internals developers; you should never deploy PHP with Zend MM disabled.
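For readers who have never seen a per-request heap, the idea can be sketched in a few lines of C. This is a simplified illustration under stated assumptions, not Zend MM's actual design: req_heap, req_alloc and req_heap_shutdown are hypothetical names, and the real memory manager is far more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header placed in front of every allocation. The union with
   max_align_t keeps the payload that follows it suitably aligned. */
typedef union req_block {
    union req_block *next;
    max_align_t align;
} req_block;

/* Hypothetical per-request heap: it remembers every block handed out. */
typedef struct req_heap {
    req_block *blocks;  /* singly linked list of live blocks */
    size_t count;       /* number of blocks in the list */
} req_heap;

void req_heap_init(req_heap *h) {
    assert(h != NULL);
    h->blocks = NULL;
    h->count = 0;
}

/* Allocate size bytes and record the block in the request heap. */
void *req_alloc(req_heap *h, size_t size) {
    assert(h != NULL);
    if (size > SIZE_MAX - sizeof(req_block)) return NULL;
    req_block *b = malloc(sizeof *b + size);
    if (b == NULL) return NULL;
    b->next = h->blocks;
    h->blocks = b;
    h->count++;
    return b + 1;  /* the payload starts right after the header */
}

/* End of request: free everything still recorded, even blocks the caller
   never released. Returns the number of blocks freed. */
size_t req_heap_shutdown(req_heap *h) {
    assert(h != NULL);
    size_t freed = 0;
    while (h->blocks != NULL) {
        req_block *next = h->blocks->next;
        free(h->blocks);
        h->blocks = next;
        freed++;
    }
    h->count = 0;
    return freed;
}
```

Code written against such a heap can get away with never freeing what it allocates, because req_heap_shutdown cleans up after it. Swap the heap for plain malloc and that safety net disappears, which is why disabling the memory manager tends to turn sloppy extension code into leaks.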

Touchmenot answered 1/4, 2016 at 7:37 Comment(6)

So the underlying problem could be which version of PHP you're running? – Incision 20/4, 2016 at 16:21

@Incision Yes, as well as versions of all extensions, as the warning may arise from an extension. – Insurgency 20/12, 2016 at 18:16

This answer seems to be the best one for me. I've personally experienced the problem a few times and it was always related to a faulty extension (in my case, the Enchant spelling library). Other than php itself, it could also be a bad environment (lib version mismatch, wrong dependencies, etc.) – Ecclesiastic 31/7, 2018 at 5:34

By far, the best answer for this question, and for many other similar questions as well – Epidote 4/8, 2018 at 8:41

This answer is indeed instructive, but I believe it's not the job of an application developer to debug the server core. Indeed, it's much easier if you have a full stack trace, but what's next? Ask for a fix in a pull request? Not everyone is devops or able to understand a low-level language like C. The opposite is true too. So in the end I believe it would be much easier if the developers did not make memory management errors in the first place. Which, as you suggest, is kind of common with opcache, but not surprisingly not with all the modules, because you know some dev know how to dev. – Stockjobber 20/8, 2019 at 18:15

I didn't suggest that the developer should debug the problem. I gave an explanation of the problem in easy-to-understand code and words, advised them to create a bug report, and lastly gave them advice about creating and maintaining a useful bug report. The only thing to do here is create a bug report; messing with settings, extensions, versions, and environment variables is just terrible guesswork. Someone can fix the problem in two seconds; you don't need to debug it, or be a C guru, or even know how GDB works, just send a mail (report) to the right person and the problem goes away. – Touchmenot 21/8, 2019 at 4:9