Resource garbage collected too early

I've created a PHP extension with SWIG and everything works fine, but I'm observing some strange garbage collection behavior when chaining method calls. For example, this works:

$results = $response->results();
$row = $results->get(0)->iterator()->next();
printf('%s %s' . "\n", $row->getString(0), $row->getString(1));

But this seg faults:

$row = $response->results()->get(0)->iterator()->next();
printf('%s %s' . "\n", $row->getString(0), $row->getString(1));

The only difference is that the first creates $results, while the second chains the calls together.

SWIG actually only exposes functions to PHP and generates PHP proxy classes to interact with them. These proxy classes basically hold a resource that is passed to each of the exposed functions along with whatever other arguments those functions would normally take. Thinking that maybe these proxy classes were the problem, I reworked the code to bypass them and instead use the exposed functions directly. As before, this works:

$results = InvocationResponse_results($response->_cPtr);
$row = TableIterator_next(Table_iterator(Tables_get($results, 0)));
printf('%s %s' . "\n", Row_getString($row, 0), Row_getString($row, 1));

And again, this seg faults:

$row = TableIterator_next(Table_iterator(Tables_get(InvocationResponse_results($response->_cPtr), 0)));
printf('%s %s' . "\n", Row_getString($row, 0), Row_getString($row, 1));

Again, the only difference is that the first creates $results, while the second chains the calls together.

At this point, I spent a while debugging in gdb/valgrind and determined that the destructor for the object that InvocationResponse_results returns is called too early when the calls are chained. To observe this, I inserted std::cout statements at the tops of the exposed C++ functions and their destructors. This is the output without chaining:

InvocationResponse_results()
Tables_get()
Table_iterator()
TableIterator_next()
__wrap_delete_TableIterator
Row_getString()
Row_getString()
Hola Mundo
---
__wrap_delete_InvocationResponse
__wrap_delete_Row
__wrap_delete_Tables

I printed --- at the end of the script to be able to differentiate what happens during the script's execution and what happens after. Hola Mundo is from printf. The rest is from C++. As you can see, everything gets called in the expected order. Destructors are only called after the script's execution, though the TableIterator destructor is called earlier than I would have expected. However, this has not caused any problems and is likely unrelated. Now compare this to the output with chaining:

InvocationResponse_results()
Tables_get()
__wrap_delete_Tables
Table_iterator()
TableIterator_next()
__wrap_delete_TableIterator
Row_getString()
Segmentation fault (core dumped)

Without the return value of InvocationResponse_results being saved into $results, it is happily garbage collected before execution even gets out of the call chain (between Tables_get and Table_iterator) and this quickly causes problems down the road, ultimately leading to a seg fault.

I also inspected reference counts using xdebug_debug_zval() in various places, but didn't spot anything unusual. Here is its output on $results and $row without chaining:

results: (refcount=1, is_ref=0)=resource(18) of type (_p_std__vectorT_voltdb__Table_t)
row: (refcount=1, is_ref=0)=resource(21) of type (_p_voltdb__Row)

And on $row with chaining:

row: (refcount=1, is_ref=0)=resource(21) of type (_p_voltdb__Row)

I've spent a couple days on this now and I'm just about out of ideas, so really any insight on how to go about solving this would be greatly appreciated.

Guiltless answered 19/8, 2010 at 20:44 Comment(8)
It's highly unlikely that anyone without psychic debugging powers is going to be able to figure this out. I suggest you put a breakpoint in _zend_list_delete and figure out why the calling code is deleting the resource. It may be the resource refcount hitting 0 or a direct delete. - Talley
@Talley I peeked inside _zend_list_delete while __wrap_delete_Tables is being called and in both cases (no seg fault and seg fault), it is garbage collected because its refcount (--le->refcount) is -1. - Guiltless
So find out why __wrap_delete_Tables is called at that specific time in one case but not in the other, and continue going up from there. - Talley
Probably the refcount of the associated zval is being manipulated differently in the two cases. Set up a data breakpoint on its refcount field (refcount__gc in 5.3+). - Talley
Finally, make sure you're using a debug version, and when you're using valgrind it's useful to turn off the Zend memory manager. - Talley
@Talley I've been trying to set a watchpoint on the refcount field of the zval, but the one I'm watching doesn't seem to be the same as the one that gets cleaned up further down the road. I'm setting the watchpoint when the resource is created in zend_list_insert() in zend_list.c. It initializes refcount to 1 there, but that doesn't seem to be the memory I want to watch. Any tips on how to go about setting the watchpoint correctly? - Guiltless
One possible way is to put a write data breakpoint on the refcount of the resource (different from the refcount of the zval). Of course, if the bug is caused precisely because a new zval for that resource is created without incrementing the resource refcount (a missing call to zval_copy_ctor or, directly, _zend_list_addref), then the breakpoint won't catch it. Your best bet is to put a read breakpoint on the value of the original zval in the hope that it's read when the shallow copy of the zval is created. - Talley
Even that may not work. For instance, you can create a resource, store its id, create a resource zval with it and then create another resource zval with the stored id, without copying from the first. - Talley

What Artefacto said turned out to be part of the problem in a similar segfault I was debugging.

Euphroe answered 17/9, 2010 at 16:2 Comment(0)
