Why does PHP use opcode caches while Java compiles to bytecode files?
A

5

38

From my point of view, PHP and Java have a similar structure. First you write high-level code, which must then be translated into a simpler code format to be executed by a VM. One difference is that PHP works directly from the source files, while Java stores the bytecode in .class files, from which the VM loads it.

Nowadays the demand for fast PHP execution is growing, which leads people to believe that it would be better to work directly with the opcodes rather than go through the compile step each time a user hits a file.

The solution seems to be a range of so-called accelerators, which store the compiled result in a cache and then use the cached opcodes instead of compiling again.
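To make the idea concrete, here is a minimal sketch of that caching strategy, assuming PHP 8. It is not how any real accelerator is implemented: `compileSource()` and `OpcodeCache` are hypothetical names, and the "opcodes" are just whitespace-separated tokens standing in for the engine's internal structures. The point is only that a file is recompiled when its modification time changes and served from memory otherwise.

```php
<?php
// Hypothetical stand-in for the engine's compiler. A real engine produces
// internal opcode structures; here we just split the source into tokens.
function compileSource(string $source): array
{
    return preg_split('/\s+/', trim($source));
}

// Hypothetical in-memory cache keyed by file path.
final class OpcodeCache
{
    /** @var array<string, array{mtime: int, ops: array}> */
    private array $entries = [];

    // Counts how often the (expensive) compile step actually ran.
    public int $compilations = 0;

    public function load(string $path): array
    {
        // Drop PHP's stat cache so a changed file is noticed.
        clearstatcache(true, $path);
        $mtime = filemtime($path); // sketch only: a missing file is not handled

        // Cache hit: the file is unchanged since it was compiled.
        if (isset($this->entries[$path]) && $this->entries[$path]['mtime'] === $mtime) {
            return $this->entries[$path]['ops'];
        }

        // Cache miss or stale entry: compile and remember the result.
        $this->compilations++;
        $ops = compileSource(file_get_contents($path));
        $this->entries[$path] = ['mtime' => $mtime, 'ops' => $ops];

        return $ops;
    }
}
```

Calling `load()` twice on an unchanged file should compile it only once; touching the file invalidates the entry. Note that this cache lives in memory and disappears with the process, which is the contrast with Java's .class files that the question is about.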

Another approach, done by Facebook, is to completely compile the PHP code to a different language.

So my question is: why is nobody in the PHP world doing what Java does? Are there dynamic elements that really need to be recompiled each time, or something like that? Otherwise it would surely be smarter to compile everything when the code goes into production and then just work with that.

Aryn answered 23/5, 2012 at 8:58 Comment(1)
Swings and roundabouts. One produces faster runtimes, the other can be deployed quickly. - Jowl
S
52

The most important difference is that the JVM has an explicit specification that covers the bytecode completely. That makes bytecode files portable and useful for more than just execution by a specific JVM implementation.

PHP doesn't even have a language specification. PHP opcodes are an implementation detail of a specific PHP engine, so you can't really do anything interesting with them and there's little point in making them more visible.

Saltus answered 23/5, 2012 at 9:17 Comment(4)
But would my Ubuntu LAMP system produce different opcodes from yours or from a Windows box? - Aryn
@erikb - Unlikely, but it could. The Zend engine (although by far the most popular) isn't the only option; there are also options such as Caucho Resin or IBM's Project Zero, which convert your PHP scripts to Java bytecode rather than PHP bytecode. - Soutine
@erikb: there's nothing that says they have to produce the same opcodes. They probably do if both run the same version of the Zend engine, but even disregarding alternative implementations, as soon as the versions differ, I wouldn't want to depend on it. - Saltus
Er, so what? CPython opcodes are an implementation detail of a specific Python engine, but the existence of precompiled bytecode files is a fundamental part of interacting with CPython. - Onstad
B
12

PHP opcodes are not the same as Java classfiles. Java classfiles are well specified, and are portable between machines. PHP opcodes are not portable in any way. They have memory addresses baked into them, for example. They are strictly an implementation detail of the PHP interpreter, and shouldn't be considered anything like Java bytecode.

Does it have to be this way? No, probably not. But the PHP source code is a mess, and there is neither the desire, nor the political will in the PHP internals community to make this happen. I think there was talk of baking an opcode cache into PHP 6, but PHP 6 died, and I don't know the status of that idea.

Reference: I wrote phc so I was pretty knee deep in PHP implementation/compilation for a few years.

Belovo answered 29/5, 2012 at 20:39 Comment(0)
S
4

It's not quite true that nobody in the PHP world is doing what Java does. Projects such as Alexey Zakhlestin's appserver provide a degree of persistence more akin to a Java servlet container (though his inspiration is more Ruby’s Rack and Python’s WSGI than Java).

Soutine answered 23/5, 2012 at 9:31 Comment(0)
E
3

PHP does not use a standard mechanism for opcodes. I wish it had stuck to either a stack VM (Python, Java) or a register-based design (x86, Perl 6, etc.). Instead it uses something absolutely homegrown, and therein lies the rub.

It uses a connected list in memory, in which each opcode has a ->op1, a ->op2 and a ->result. Each of those is a constant, an entry in a temp table, etc. These pointers cannot be serialized in any sane fashion.

That said, people have accomplished this with tools like pecl/bcompiler, which does dump the opcode stream to disk.
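As a rough illustration of why pointers are the sticking point, the sketch below (assuming PHP 8) shows the general technique of replacing references with indexes into a table before writing anything out. It is not how the Zend engine, APC or bcompiler actually work: `Operand`, `toPortable()` and `fromPortable()` are hypothetical names, and PHP object handles merely play the role of the engine's raw pointers.

```php
<?php
// Hypothetical operand: a constant or a temporary slot.
final class Operand
{
    public function __construct(public string $kind, public mixed $value = null) {}
}

// Turn ops that reference Operand objects into plain data:
// an operand table plus ops that refer to it by index.
function toPortable(array $ops): array
{
    $seen = [];   // operand objects already in the table (position = index)
    $table = [];  // plain-data description of each operand
    $rows = [];

    foreach ($ops as $op) {
        $row = ['opcode' => $op['opcode']];
        foreach (['op1', 'op2', 'result'] as $slot) {
            // Strict search compares objects by identity, like comparing pointers.
            $index = array_search($op[$slot], $seen, true);
            if ($index === false) {
                $seen[] = $op[$slot];
                $table[] = ['kind' => $op[$slot]->kind, 'value' => $op[$slot]->value];
                $index = count($seen) - 1;
            }
            $row[$slot] = $index;
        }
        $rows[] = $row;
    }

    return ['operands' => $table, 'ops' => $rows];
}

// Rebuild the in-memory form: every index becomes a reference again.
function fromPortable(array $portable): array
{
    $objects = array_map(
        fn (array $o) => new Operand($o['kind'], $o['value']),
        $portable['operands']
    );

    $ops = [];
    foreach ($portable['ops'] as $row) {
        $ops[] = [
            'opcode' => $row['opcode'],
            'op1'    => $objects[$row['op1']],
            'op2'    => $objects[$row['op2']],
            'result' => $objects[$row['result']],
        ];
    }

    return $ops;
}
```

The portable form can go through `json_encode()` and `json_decode(..., true)` and come back with shared operands still shared. The cost is the extra translation pass on every load, which is the kind of work a format designed for serialization from the start avoids.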

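But classes make this even more complicated, because code fragments like the following are possible:

// <condition> stands for any expression evaluated at runtime
if (<condition>)
{
   class XYZ { }
}
else
{
   class XYZ { }
}

// which XYZ this extends is only known once the branch above has run
class ABC extends XYZ {}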

This means that a large number of decisions about classes and functions can only be made at runtime. Something like Java would choke on two classes with the same name that are defined conditionally at runtime. APC's inheritance and class caching code is perhaps the most complicated and bug-prone part of the codebase: whenever a class is cached, all inherited parent members have to be scrubbed out before it can be saved to the opcode cache.
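A runnable variant of the fragment above, as a small sketch: the `APP_FAST` environment variable and the `name()` method are made up for illustration. It shows that the parent of ABC is only settled when the script executes, so nothing compiled ahead of time could fix ABC's inherited members.

```php
<?php
// Hypothetical runtime switch; in real code this might come from
// configuration, the request, or a feature flag.
$useFastImplementation = (getenv('APP_FAST') === '1');

if ($useFastImplementation) {
    class XYZ
    {
        public function name(): string
        {
            return 'fast XYZ';
        }
    }
} else {
    class XYZ
    {
        public function name(): string
        {
            return 'plain XYZ';
        }
    }
}

// ABC cannot be bound at compile time: its parent does not exist yet.
// It is declared when execution reaches this line.
class ABC extends XYZ {}

// Which string this prints depends on the environment the script runs in.
echo (new ABC())->name(), PHP_EOL;
```

Running the same file with and without `APP_FAST=1` set gives ABC a different inherited `name()` method, even though the source of ABC is identical in both runs.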

The pointer problem is not insurmountable. There is an apc_bindump, which I have never bothered to fix up so that it loads entire cache entries directly off disk whenever a restart is done. But it's painful to debug all that to get something that still needs to locate all system pointers. The Apache case is too easy, because all PHP processes have the same system pointers thanks to the fork behaviour. The old FastCGI versions were slower because they forked first and initialised PHP later; php-fpm fixed that by doing it the other way around.

But ultimately, what's really missing in PHP is the will to invent a bytecode format, throw away the current engine and all modules, rewrite it on top of a stack VM, and build a JIT. I wish I had the time. The FB guys are almost there with their HipHop HHVM, which sacrifices eval() for faster performance, a fair sacrifice :)

PS: I'm the guy who can't find time to update APC for 5.4 properly

Euphonic answered 1/6, 2012 at 16:14 Comment(0)
S
3

I think all of you are misinformed. HHVM is not a compiler to another language; it is a virtual machine itself. The confusion arises because Facebook used to compile to C++, but that approach was too slow for the developers' needs (ten minutes of compiling just to test some tiny change).

Snap answered 29/12, 2013 at 15:44 Comment(1)
Please post this kind of additional information about another answer as a comment to that answer. If you post such information as an answer, it will be downvoted at some point, because it doesn't answer the question asked. - Aryn
