How do PHP opcodes relate to the binary code that is actually executed?

test.php as plain text:

<?php
$x = "a";
echo $x;

test.php as opcode:

debian:~ php -d vld.active=1 -d vld.execute=0 -f test.php

Finding entry points
Branch analysis from position: 0
Return found
filename:       /root/test.php
function name:  (null)
number of ops:  5
compiled vars:  !0 = $x
line     # *  op                           fetch          ext  return  operands
---------------------------------------------------------------------------------
   2     0  >   EXT_STMT
         1      ASSIGN                                                   !0, 'a'
   3     2      EXT_STMT
         3      ECHO                                                     !0
   4     4    > RETURN                                                   1

branch: #  0; line:     2-    4; sop:     0; eop:     4
path #1: 0,

test.php as binary representation:

debian:~ php -d apc.stat=0 -r "
  require '/root/test.php'; 
  echo PHP_EOL; 
  echo chunk_split(bin2hex(
    apc_bin_dump(array('/root/test.php'))
  ),64);
"

(skipping test.php's echo-output)

    b110000001000000325dedaa64d801bca2f73027abf0d5ab67f3023901000000
    2c0000000a000000871000000300000000000000000000004c0000005b000000
    8a0200008a020000650000002f726f6f742f746573742e7068700002070f9c00
    00000000000000000000000000000000000000000000000000000000000100fa
    000000fe00000005000000050000007c02000001000000100000000100000000
    00000000000000ffffffff0000000000000000000000000000000000000000ff
    ffffffeb00000000000000000000000000000000000000ffffffff0000000000
    00000001000000000000002f726f6f742f746573742e7068700001000000204a
    3308080000000000000000000000000000000000000008000000000000000000
    0000000000000000000008000000000000000000000000000000000000000000
    00000200000065000000204a3308040000000000000001000000000000000000
    00001000000000000000100000000100000006000000010000007a0200000100
    00000100000006000000000000000200000026000000204a3308080000000000
    0000000000000000000000000000080000000000000000000000000000000000
    0000080000000000000000000000000000000000000000000000030000006500
    0000900f34080800000000000000000000000000000000000000100000000000
    0000100000000100000006000000080000000000000000000000000000000000
    0000000000000300000028000000204a33080800000000000000000000000000
    00000000000001000000010000002c70d7b6010000000100d7b6080000000000
    000000000000000000000000000000000000040000003e000000610088020000
    01000000bd795900780000000000000000000000000000000000000000000000
[ ... a lot of lines just containing 0s ... ]
    0000000000000038000000c30000007f0000007a010000830000007c0200008f
    0000003c000000400000004400000008

Now I want to find out more about how the opcode translates to the binary representation.

The edited and clarified question:

How is the opcode translated into the binary version? Can you see the ASSIGN of 'a' to !0 in there? Are the ECHO statement and what it outputs in there somewhere?

I found a few patterns in the binary version that hint at a line-by-line representation of the opcode.

("2f726f6f742f746573742e706870" is the hexadecimal representation of "/root/test.php")
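That decoding is easy to double-check. A minimal sketch in Python (used here only because it is convenient for poking at hex strings; it is not part of the original setup):

```python
# Hex run copied verbatim from the dump above.
hex_run = "2f726f6f742f746573742e706870"

# Each pair of hex digits is one byte; the bytes are plain ASCII.
path = bytes.fromhex(hex_run).decode("ascii")
print(path)
```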

EDIT:

The hexadecimal representation reveals patterns when the line length is set to 4 bytes and the dumps of different programs are compared.

...
00000002  // 2 seems to be something like the "line number"
00000065  // seems to increase by 1 for every subsequent statement.
00000040  // 
06330808  // seems to mark the START of a statement
00000000
00000000
00000000
00000000
00000001  //
00000012  // In a program with three echo statements,
03000007  // this block was present three times. With mild
00000001  // changes that seem to represent the spot where
00000006  // the output-string is located.
00000008  //
00000000
00000000
00000000
00000000
00000000
00000002  // 2 seems to be something like the "line number"
00000028  //
00000020  //
4a330808  // seems to mark the END of a statement
00000000
00000000
00000000
00000000
00000008  // repeating between (echo-)statements
00000000
00000000
00000000
00000000
00000008  // repeating between (echo-)statements
...
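A view like the one above can be produced mechanically. The following is only a sketch: the helper name `words` is made up for illustration, and the little-endian default is an assumption (it would match an x86 machine, but the question does not state the architecture). It splits a hex dump into 4-byte words and prints each one as an integer.

```python
import struct

def words(hex_dump, little_endian=True):
    """Split a hex dump into 4-byte words and return them as unsigned ints."""
    # Drop the whitespace and line breaks that chunk_split() added.
    raw = bytes.fromhex("".join(hex_dump.split()))
    usable = len(raw) - len(raw) % 4  # ignore a trailing partial word
    fmt = ("<" if little_endian else ">") + "I" * (usable // 4)
    return list(struct.unpack(fmt, raw[:usable]))

# First 8 bytes of the dump shown in the question.
sample = "b110000001000000"
for index, value in enumerate(words(sample)):
    print(f"{index * 4:04x}: {value:#010x} ({value})")
```

Running the same helper over the dumps of two slightly different scripts and diffing the output is one way to see which words change with the source and which stay fixed.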

But my knowledge of how virtual machines work at this level is too weak to analyze that properly and link it to the C code.

EDIT:

Does PHP have a virtual machine like Java?

Is the Zend engine embeddable outside of PHP?

Hereld answered 1/12, 2011 at 11:58 Comment(0)

Great question...

UPDATE: Opcodes are executed directly by the PHP virtual machine (the Zend Engine). It looks as though each opcode is executed by its own handler function, defined in ./Zend/zend_vm_execute.h.
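To make "executed by handler functions" concrete, here is a toy interpreter for the five ops in the VLD listing from the question. It is a deliberately simplified model written in Python, not the Zend Engine: the handler names, the tuple layout of an op, and the `vm` dictionary are all invented for illustration.

```python
# Toy model only: nothing here mirrors real Zend Engine data structures.

def op_ext_stmt(vm):
    pass  # statement marker; does nothing in this model

def op_assign(vm, var, value):
    vm["vars"][var] = value  # store the value in a compiled-variable slot

def op_echo(vm, var):
    vm["out"].append(str(vm["vars"][var]))  # collect output instead of printing

def op_return(vm, value):
    vm["ret"] = value
    return True  # a truthy result tells the loop to stop

HANDLERS = {
    "EXT_STMT": op_ext_stmt,
    "ASSIGN": op_assign,
    "ECHO": op_echo,
    "RETURN": op_return,
}

# The op array for test.php, transcribed from the VLD output.
OP_ARRAY = [
    ("EXT_STMT",),
    ("ASSIGN", "!0", "a"),
    ("EXT_STMT",),
    ("ECHO", "!0"),
    ("RETURN", 1),
]

def execute(op_array):
    """Walk the op array and call one handler per op; no machine code is generated."""
    vm = {"vars": {}, "out": [], "ret": None}
    for name, *operands in op_array:
        if HANDLERS[name](vm, *operands):
            break
    return vm

result = execute(OP_ARRAY)
print("".join(result["out"]))
```

The point of the sketch is that the op array is data that a loop walks over, calling a function per op, so there is no per-script binary to look for.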

See the architecture of the Zend Engine for more info on how Zend opcodes are executed.

These resources might help a bit:

http://php.net/manual/en/internals2.opcodes.list.php

http://www.php.net/manual/en/internals2.opcodes.ops.php

Also, I'm going to check out the PECL VLD source for more clues...

http://pecl.php.net/package/vld

http://derickrethans.nl/projects.html#vld

Also, writing to the authors of the VLD PECL extension may help: Derick Rethans, Andrei Zmievski or Marcus Börger

Their email addresses are at the top of srm_oparray.c in the extension source.

UPDATE: found some more clues

In PHP 5.3.8, I found three leads for where the opcodes are executed:

./Zend/zend_execute.c:1270 
ZEND_API void execute_internal

./Zend/zend.c:1214:ZEND_API int zend_execute_scripts(int type TSRMLS_DC, zval **retval, int file_count, ...)
./Zend/zend.c:1236:                  zend_execute(EG(active_op_array) TSRMLS_CC);

./Zend/zend_vm_gen.php

I couldn't find the definition for zend_execute(), but I'm guessing it might be generated with ./zend_vm_gen.php

I think I found it...

./Zend/zend_vm_execute.h:42
ZEND_API void execute(zend_op_array *op_array TSRMLS_DC)

I could be wrong, but it looks like all of the opcode handlers are defined in ./Zend/zend_vm_execute.h too.

See ./Zend/zend_vm_execute.h:2413 for an example of what looks to be the "integer addition" opcode handler.

Hohenlinden answered 1/12, 2011 at 15:7 Comment(10)
I already checked those resources, as I am using VLD above. They only seem to cover the transition from PHP code to opcode. – Hereld
Updated... added email address suggestion. – Hohenlinden
So you already checked out the source code? That would lead somewhere for sure. But I am not conversant with C; that's my handicap. All I have found so far is that T_ECHO is assigned the number 316. – Hereld
Have you looked at this file in the PHP source: ./Zend/README.ZEND_VM? – Hohenlinden
These files look promising... ./Zend/zend_vm_opcodes.h ./Zend/zend_vm_def.h – Hohenlinden
Update: found some more clues in the PHP source. – Hohenlinden
Update: found the definitions for Zend VM opcode handler functions. – Hohenlinden
Looks very interesting! But I don't see the link between the opcode and the binary version yet. That doesn't mean anything though, as I have basically no clue about how those virtual machines work at such a level. – Hereld
My understanding is that there is no binary version. The VM directly executes each instruction that is encoded as a Zend opcode. – Hohenlinden
The Zend VM is part of the ZE. – Hohenlinden

apc_bin_dump() returns the raw representation of an in-memory cache entry.

It returns the content of an apc_bd_t struct.

This struct holds an array of apc_bd_entry_t together with some checksums for error detection.

Each apc_bd_entry_t contains an apc_cache_entry_value_t.

Look at the internal apc_bin_dump and apc_bin_load functions to see how dumping and loading are implemented.
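As a rough illustration, the start of the dump from the question can be unpacked as a struct header. The field order used below (size, swizzled, md5[16], crc, num_entries, entries, num_swizzled_ptrs, swizzled_ptrs) is an assumption about how apc_bd_t is declared in APC's apc_bin.h, and the 32-bit little-endian layout is also assumed; neither is confirmed by this answer, so check both against the APC source you are running.

```python
import struct

# ASSUMED layout of apc_bd_t on a 32-bit little-endian build:
# size, swizzled, md5[16], crc, num_entries, entries,
# num_swizzled_ptrs, swizzled_ptrs
HEADER = struct.Struct("<Ii16sIIIiI")

# First 44 bytes of the hex dump shown in the question.
dump_start = (
    "b110000001000000325dedaa64d801bca2f73027abf0d5ab67f3023901000000"
    "2c0000000a00000087100000"
)
raw = bytes.fromhex(dump_start)

(size, swizzled, md5, crc, num_entries,
 entries_off, num_swizzled_ptrs, swizzled_ptrs_off) = HEADER.unpack(raw)

print("size:", size)
print("md5:", md5.hex())
print("crc:", hex(crc))
print("num_entries:", num_entries)
print("entries offset:", entries_off, "(header size:", HEADER.size, ")")
```

Under that assumption the numbers are at least self-consistent: the entries field decodes to 44, which is exactly the size of the assumed header, and num_entries decodes to 1, matching the single file that was dumped. If that reading holds, the bytes after the header describe the cached entry (file name, op array, and so on) rather than anything the CPU runs directly.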

Jeaniejeanine answered 2/12, 2011 at 11:51 Comment(0)
