Why does pcntl_fork() copy PHP objects?
H

2

6

The manual for pcntl_fork() says:

The pcntl_fork() function creates a child process that differs from the parent process only in its PID and PPID.

However, running this simple test surprised me:

class Foo
{
    public function bar()
    {
        if (pcntl_fork()) {
            echo spl_object_hash($this), PHP_EOL;
        } else {
            echo spl_object_hash($this), PHP_EOL;
        }
    }
}

(new Foo)->bar();

The result looks like:

000000005ec7fd31000000003f0fcfe6
000000006b4cd5fc000000007fee8ab7

From what the documentation says, I would have expected the parent and the child to share the same variables, and in particular, when fork()ed from within an object, I would have expected the reference to the object to be the same in both processes. But the example above shows they're not.

Interestingly, no cloning happens here; it looks like the object is just copied. If I add a __clone() function, I can see that it is not called during the fork.

Is there any reason why the variables/objects are not shared by both processes, or any good reading on the subject, folks?

Haroldson answered 2/4, 2013 at 17:30 Comment(3)
Well, the fork(2) man page, which the PHP docs suggest reading, goes into somewhat more detail, including "The entire virtual address space of the parent is replicated in the child". As you encountered, "replicated in the child" is NOT the same as "shared with the child". PHP isn't built with applications that need that in mind, and although you can build your own elaborate mechanisms, something like Java or possibly asynchronous node.js seems more what you're after.Saith
I guess you're right; this explanation is similar to @hek2mgl's answer. I was misled by the fact that resources (which obviously cannot be copied) are shared, and I assumed that all variables etc. were shared as well.Haroldson
Depends on the resource I believe: open file-descriptors: yes, but not all 'resources' would be either shared or copied, it depends on their implementation in the php source.Saith
O
1

The object hash is not calculated when the object is created (as one might think). It is calculated the first time spl_object_hash() is called for the object, which in your example is after the fork.

Also note that some randomness is used when calculating the hash, hence the different hashes.

Oof answered 2/4, 2013 at 17:37 Comment(11)
But if the objects were the same, it does not matter when spl_object_hash() is called, right? The hashes should be identical, as far as I understand it.Haroldson
Note that fork creates two separate processes; after the fork, all variables and memory are duplicated.Oof
@LtWorf Have a look at the php source code. File: ext/spl/php_spl.c Line: 722Oof
If you called spl_object_hash before the fork, the random number it uses and generates on first call would be fixed and the same for both processes, and the output for the two objects would be the same value. But even then there is no sharing involved, the two processes have entirely separate memory spaces.Systematize
@Benjamin I think you have a wrong understanding of fork(). Fork fully duplicates the current process at kernel level: it duplicates all memory, the environment, and the resources the forking process owns, and, importantly, the instruction pointer on the CPU. After that, two separate processes run from the same origin. Nothing is shared any longer. Once you understand this, you'll see that __clone() is off topic in this question.Oof
I've run some extra tests, which seem to confirm what you say. It's very confusing that when spl_object_hash() is called before fork()ing, the hash is the same for both the parent & the child, even though they're in fact two different objects (same hash, but in two different address spaces).Haroldson
@Oof Yes, I've been confused by several tutorials giving the caveat that resources are shared by both processes, which can be dangerous. So correct me if I'm wrong, only resources are shared, all variables, objects etc. are copied, and parent & child do not share any reference apart from these resources?Haroldson
Let me try to explain further. The most naive form of process execution, the one you would expect, is exec. With exec, a program is loaded into memory and the instruction pointer is set to the first instruction. Then the process starts. Fork is different. No program is loaded into memory; instead, the memory of an already running process is duplicated. The instruction pointer of the process is duplicated as well. This is done at kernel level. Now you have two fully independent processes that continue from the same point.Oof
Ok, so the entire memory space of the php process is duplicated (which explains why variables, objects etc. are not shared), leaving only the resources opened before fork()ing shared by the parent & child?Haroldson
@Benjamin. No, they don't share variables and memory after this moment. But be careful with external resources. If you open a file before forking, for example, both processes will write to the same file (of course). But if one process closes the file, it will still be open for the other. Note that this is not the same for TCP connections, for example a TCP connection to MySQL. If one process closes the connection, the other process won't be able to use it anymore.Oof
@Oof Thank you very much for your explanation, it's much clearer now!Haroldson
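
The file descriptor behaviour described in the comments above can be checked with a short C sketch against fork(2) directly. This is not PHP code and not how pcntl_fork() is implemented; the function name shared_descriptor_demo and the temporary file path are made up for illustration. The child writes to a file opened before the fork and closes its copy of the descriptor. The parent can still write afterwards, and it continues at the offset the child left behind, because both copies refer to the same open file.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: opens a temporary file, forks, lets the child and then
 * the parent write to it, and copies the resulting file contents into buf
 * (NUL-terminated, at most cap - 1 bytes; cap must be at least 1).
 * Returns 0 on success, -1 on error. */
int shared_descriptor_demo(char *buf, size_t cap)
{
    char path[] = "/tmp/fork_fd_demo_XXXXXX";
    int fd = mkstemp(path);          /* opened BEFORE the fork */
    int status = 0;
    ssize_t n;
    pid_t pid;

    if (fd < 0) return -1;
    unlink(path);                    /* the file lives on while a descriptor is open */

    pid = fork();
    if (pid < 0) { close(fd); return -1; }

    if (pid == 0) {
        /* Child: write through the inherited descriptor, then close it. */
        n = write(fd, "child\n", 6);
        close(fd);                   /* closes only the child's copy */
        _exit(n == 6 ? 0 : 1);
    }

    /* Parent: wait for the child so the order of the writes is fixed. */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }

    /* The parent's descriptor is still open, and the shared file offset
     * has already been advanced by the child's write. */
    if (write(fd, "parent\n", 7) != 7) { close(fd); return -1; }
    if (lseek(fd, 0, SEEK_SET) < 0) { close(fd); return -1; }

    n = read(fd, buf, cap - 1);
    close(fd);
    if (n < 0) return -1;
    buf[n] = '\0';
    return 0;
}
```

Called with a 64-byte buffer, the function leaves the child's line followed by the parent's line in the buffer: the parent's write did not overwrite the child's, and the child's close() did not invalidate the parent's descriptor.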
E
3

The reference to the object is the same in the forked process, because the memory location of the object in the child process's memory space is the same.

The hash is calculated from the object handle and the object's handlers pointer, each XORed with a random mask (which is generated only once), as you can read in the PHP source code, ext/spl/php_spl.c:

PHPAPI void php_spl_object_hash(zval *obj, char *result TSRMLS_DC) /* {{{*/
{
    intptr_t hash_handle, hash_handlers;
    char *hex;

    if (!SPL_G(hash_mask_init)) {
        if (!BG(mt_rand_is_seeded)) {
            php_mt_srand(GENERATE_SEED() TSRMLS_CC);
        }    

        SPL_G(hash_mask_handle)   = (intptr_t)(php_mt_rand(TSRMLS_C) >> 1);
        SPL_G(hash_mask_handlers) = (intptr_t)(php_mt_rand(TSRMLS_C) >> 1);
        SPL_G(hash_mask_init) = 1; 
    }    

    hash_handle   = SPL_G(hash_mask_handle)^(intptr_t)Z_OBJ_HANDLE_P(obj);
    hash_handlers = SPL_G(hash_mask_handlers)^(intptr_t)Z_OBJ_HT_P(obj);

    spprintf(&hex, 32, "%016x%016x", hash_handle, hash_handlers);

    strlcpy(result, hex, 33); 
    efree(hex);
}
/* }}} */

If the random number generator had been seeded before the fork, you would get exactly the same output for both the child and the parent process. But in this case it wasn't, and each process calculates its own seed. The code for GENERATE_SEED goes:

#ifdef PHP_WIN32
#define GENERATE_SEED() (((long) (time(0) * GetCurrentProcessId())) ^ ((long) (1000000.0 * php_combined_lcg(TSRMLS_C))))
#else
#define GENERATE_SEED() (((long) (time(0) * getpid())) ^ ((long) (1000000.0 * php_combined_lcg(TSRMLS_C))))
#endif

As you can see, the seed depends on the process ID, which is of course different for the parent and the child.

So, different random number generator seed, different random mask, different hash.
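
The same effect can be reproduced outside PHP with a simplified C sketch. It is an illustration under stated assumptions, not the PHP implementation: the names object_hash, init_mask and hashes_match_across_fork are hypothetical, the object's address stands in for the handle and handlers pointer, and the mask is derived from the PID alone instead of a seeded Mersenne Twister, so that the outcome is deterministic. What it keeps from the source above is the structure: a per-process mask created lazily on first use and XORed with a value identifying the object.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Per-process state, playing the role of SPL_G(hash_mask_init)
 * and SPL_G(hash_mask_handle) in the PHP source. */
static int mask_ready = 0;
static uintptr_t mask = 0;

/* Simplified stand-in for php_mt_rand() seeded via GENERATE_SEED():
 * the mask depends only on the PID. Multiplying by an odd constant
 * maps distinct PIDs to distinct masks. */
static void init_mask(void)
{
    mask = (uintptr_t)getpid() * (uintptr_t)2654435761u;
    mask_ready = 1;
}

/* Create the mask on first use, then XOR it with the object's address. */
static uintptr_t object_hash(const void *obj)
{
    if (!mask_ready) {
        init_mask();
    }
    return mask ^ (uintptr_t)obj;
}

/* Forks and compares the hash computed by the parent and by the child.
 * If hash_before_fork is nonzero, the mask is created before fork() and
 * the child inherits it. Returns 1 if the hashes match, 0 if they differ,
 * -1 on error. */
int hashes_match_across_fork(const void *obj, int hash_before_fork)
{
    int fds[2];
    uintptr_t parent_hash, child_hash = 0;
    ssize_t n;
    pid_t pid;

    mask_ready = 0;                       /* start each run without a mask */
    if (hash_before_fork) {
        (void)object_hash(obj);           /* first call fixes the mask */
    }
    if (pipe(fds) != 0) return -1;

    pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }

    if (pid == 0) {
        /* Child: compute the hash and send it to the parent. */
        uintptr_t h = object_hash(obj);
        close(fds[0]);
        n = write(fds[1], &h, sizeof h);
        _exit(n == (ssize_t)sizeof h ? 0 : 1);
    }

    close(fds[1]);
    parent_hash = object_hash(obj);
    n = read(fds[0], &child_hash, sizeof child_hash);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n != (ssize_t)sizeof child_hash) return -1;
    return parent_hash == child_hash;
}
```

With hash_before_fork set, both processes report the same hash because the child inherits the already initialized mask. Without it, each process builds its mask after the fork from its own PID, and the hashes differ even though the object sits at the same address in both.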

Earplug answered 2/4, 2013 at 18:3 Comment(2)
Thanks for the extra information. I've now understood why the hashes are different. However, it looks like you're wrong regarding the two address spaces, which are different: if I fork in an object, make the parent assign a variable, and read it from the child, I don't get this value. Or maybe did I misunderstand your point?Haroldson
My point is that the address space of the child is an exact copy of the address space of the parent: all objects are at exactly the same addresses. Otherwise fork() would either break pointers or would have to rewrite them.Earplug
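
Both points in this exchange can be observed together in a small C sketch using fork(2) directly. It is only an illustration, with hypothetical names (child_report, fork_and_compare, counter): the parent changes a variable after the fork, and the child then reports the address and the value it sees. The address is identical in both processes, yet the child still reads the old value, because each process has its own copy of the memory behind that address.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* What the child observed, sent back to the parent through a pipe. */
struct child_report {
    uintptr_t address;  /* where the child sees the variable */
    int value;          /* what the child reads after the parent's write */
};

static int counter = 1;

/* Forks, lets the parent overwrite counter, then collects the child's view.
 * Returns 0 on success, -1 on error. */
int fork_and_compare(struct child_report *child,
                     uintptr_t *parent_address, int *parent_value)
{
    int go[2], back[2];
    char token = 'x';
    ssize_t n;
    pid_t pid;

    counter = 1;
    if (pipe(go) != 0) return -1;
    if (pipe(back) != 0) { close(go[0]); close(go[1]); return -1; }

    pid = fork();
    if (pid < 0) {
        close(go[0]); close(go[1]); close(back[0]); close(back[1]);
        return -1;
    }

    if (pid == 0) {
        struct child_report r;
        close(go[1]);
        close(back[0]);
        /* Block until the parent signals that it has written to counter. */
        if (read(go[0], &token, 1) != 1) _exit(1);
        r.address = (uintptr_t)&counter;
        r.value = counter;
        n = write(back[1], &r, sizeof r);
        _exit(n == (ssize_t)sizeof r ? 0 : 1);
    }

    close(go[0]);
    close(back[1]);
    counter = 42;                       /* changes the parent's copy only */
    n = write(go[1], &token, 1);        /* tell the child to look now */
    close(go[1]);
    if (n == 1) {
        n = read(back[0], child, sizeof *child);
    } else {
        n = -1;
    }
    close(back[0]);
    waitpid(pid, NULL, 0);

    *parent_address = (uintptr_t)&counter;
    *parent_value = counter;
    return n == (ssize_t)sizeof *child ? 0 : -1;
}
```

After a successful call, the reported child address equals the parent address, the child value is still 1, and the parent value is 42.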
