Why are there binary safe AND binary unsafe functions in php?

Is there any reason for this behavior/implementation?
Example:

$array = array("index_of_an_array" => "value");
class Foo {
    private $index_of_an_array;
    function __construct() {}
}
$foo = new Foo();
$array = (array)$foo; // the private key becomes "\0Foo\0index_of_an_array"
$key = str_replace("Foo", "", array_keys($array)[0]); // leaves "\0\0index_of_an_array"
echo $array[$key];

Gives us the following error (this is the complete message):

NOTICE Undefined index: on line number 9

Example #2:

echo date("Y\0/m/d");

Outputs:

2016

However, echo, var_dump() and some other functions output the string as it is; the \0 bytes are just hidden by browsers.

$string = "index-of\0-an-array";
$string2 = "Y\0/m/d";
echo $string, "\n";
echo $string2, "\n";
var_dump($string);
var_dump($string2);

Outputs:

index-of-an-array
Y/m/d
string(18) "index-of-an-array"
string(6) "Y/m/d"

Notice that the length of $string is 18, but only 17 characters are shown.

EDIT

From the possible duplicate and the PHP manual:

The key can either be an integer or a string. The value can be of any type. Strings containing valid integers will be cast to the integer type. E.g. the key "8" will actually be stored under 8. On the other hand "08" will not be cast, as it isn't a valid decimal integer. So in short, any string can be a key. And a string can contain any binary data (up to 2GB). Therefore, a key can be any binary data (since a string can be any binary data).
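A minimal sketch of the key rules quoted above; the array and its values are made up for illustration:

```php
<?php
// "8" is a valid decimal integer string, so it is stored as the int key 8;
// "08" is not, so it stays a string key. "a\0b" and "a" are different keys,
// because the NUL byte is part of the key's data.
$a = [
    "8"    => 'eight',
    "08"   => 'zero-eight',
    "a\0b" => 'contains a NUL byte',
    "a"    => 'plain',
];

var_dump(array_keys($a));
var_dump($a["a\0b"], $a["a"]);
```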

From the PHP manual's details on strings:

There are no limitations on the values the string can be composed of; in particular, bytes with value 0 (“NUL bytes”) are allowed anywhere in the string (however, a few functions, said in this manual not to be “binary safe”, may hand off the strings to libraries that ignore data after a NUL byte.)

But I still do not understand why the language is designed this way. Are there reasons for this behavior/implementation? Why doesn't PHP handle input in a binary-safe way everywhere, rather than just in some functions?

From a comment:

The reason is simply that many PHP functions like printf use the C library's implementation behind the scenes, because the PHP developers were lazy.

Aren't those functions such as echo, var_dump and print_r, in other words functions that output something? Those are in fact binary safe, as my echo/var_dump example shows. It makes no sense to me to implement some output functions as binary safe and others as binary unsafe, rather than either using the C standard library functions as they are or writing completely new ones.

Solvency asked 29/4, 2016 at 8:53. Comments:
Well, \0 represents the end of a string... if you place it between double quotes, it will be interpreted. Did you try putting it into single quotes? - Threat
Possible duplicate of Characters allowed in php array keys? - Sycee
@Uchiha check the edit - Solvency
Because it's cheap and easy to write, because C can do most of it. - Bubble
What do you mean by "output as PHP string"? The NUL character is printed, just as expected. Maybe your browser doesn't show it, but that has nothing to do with PHP. The only places where NUL characters aren't handled consistently are functions that aren't binary safe, like date. - Rebbecarebbecca
@Rebbecarebbecca I might change the question title to "Why are there binary safe and binary unsafe functions in php?" to make it clearer. char greeting[5] = {'H', 'e', 'l', '\0', 'o'}; printf(greeting); outputs Hel, because of the \0 in C. In PHP, $str = "Hel\0o"; print($str) would output Helo / Hel\0o. But array indexes and some functions, like date(), read data until they find \0. I just want to know why PHP doesn't handle input as binary safe everywhere, but just here and there? - Solvency
@ksno Yes, this would make a better question, although it's probably off-topic for StackOverflow. The reason is simply that many PHP functions like printf use the C library's implementation behind the scenes, because the PHP developers were lazy. - Rebbecarebbecca
@Alnitak - what is the point of your comment? I stumbled upon this page using Google, and now I'm reading a 200k rep member bashing a language using no facts whatsoever. Should we, mere mortals, trust established members of SO when they ramble on without any facts, or how does this work? - Boysenberry
The issue of 'C' null-terminated strings can be 'awkward' in routines that would be expected to use just the length. There can be issues with some of the older 'security' routines. Imagine someone uses a password that has a null byte in it: would everything after the null byte be ignored? - Tabling

The short answer to "why" is simply history.

PHP was originally written as a way to script C functions so they could be called easily while generating HTML. Therefore PHP strings were just C strings, which are sequences of bytes terminated by a NUL byte. So in modern PHP terms we would say nothing was binary-safe, simply because it wasn't planned to be anything else.

Early PHP was not intended to be a new programming language, and grew organically, with Lerdorf noting in retrospect: "I don’t know how to stop it, there was never any intent to write a programming language […] I have absolutely no idea how to write a programming language, I just kept adding the next logical step on the way."

Over time the language grew to support more elaborate string-processing functions, many of which work from the string's stored length rather than stopping at the first NUL byte, and so became "binary-safe". According to the recently written formal PHP specification:

As to how the bytes in a string translate into characters is unspecified. Although a user of a string might choose to ascribe special semantics to bytes having the value \0, from PHP's perspective, such null bytes have no special meaning. PHP does not assume strings contain any specific data or assign special values to any bytes or sequences.

Because PHP has grown organically, there has been no move to universally treat strings differently from C. Therefore functions and libraries are binary-safe on a case-by-case basis.
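To see that case-by-case behavior in practice, here is a small sketch (the string is borrowed from the question) using a few functions that do treat the NUL byte as ordinary data; it is not an exhaustive list of binary-safe functions:

```php
<?php
// PHP strings carry their own length, so binary-safe functions such as
// strlen(), strpos(), substr(), bin2hex() and str_replace() see the NUL
// byte as one more byte of data.
$s = "index-of\0-an-array";

$len   = strlen($s);                  // 18: the NUL byte is counted
$pos   = strpos($s, "\0");            // 8: the NUL byte itself can be found
$tail  = substr($s, $pos + 1);        // "-an-array": data after the NUL survives
$hex   = bin2hex(substr($s, 6, 5));   // "6f66002d61": "of", NUL, "-a" as hex
$clean = str_replace("\0", "", $s);   // "index-of-an-array"

echo $len, "\n", $pos, "\n", $tail, "\n", $hex, "\n", $clean, "\n";
```

By contrast, a function that hands the string to a C routine expecting a NUL-terminated string would only ever see "index-of".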

Kropp answered 3/5, 2016 at 14:44

First Example from the Question

Your first example is confusing because it is the error message that stops at the null character; the array is not handling the string incorrectly. The original code you posted, with its error message, follows:

$array = array("index-of-an-array" => "value");
$string = "index-of\0-an-array";
echo $array[$string];

Notice: Undefined index: index-of in

Note that the error message above has been truncated to index-of because of the null character. The array is working as expected; if you try it this way, it works just fine:

$array = array("index-of\0-an-array" => "value");
$string = "index-of\0-an-array";
echo $array[$string];

The error message correctly identified that the two keys are different, which they are:

"index-of\0-an-array" != "index-of-an-array"

The problem is that the error message printed out only what came before the null character. Some might consider that a bug.
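A minimal sketch of the failing lookup, plus one way to make the hidden byte visible when printing; escaping with addcslashes() or json_encode() is a suggestion here, not something the original answer used:

```php
<?php
$key   = "index-of\0-an-array";
$array = ["index-of-an-array" => "value"];

// The keys differ by one byte, so the lookup misses.
var_dump(array_key_exists($key, $array)); // bool(false)

// Escape or encode the string before printing so the NUL byte shows up.
$escaped = addcslashes($key, "\0"); // NUL becomes the four characters \000
$json    = json_encode($key);       // NUL becomes \u0000

echo $escaped, "\n"; // index-of\000-an-array
echo $json, "\n";    // "index-of\u0000-an-array"
```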

The second example starts to plumb the depths of PHP :)

I've added some code to it so we can see what's happening:

<?php
class Foo {
  public    $index_public;
  protected $index_prot;
  private   $index_priv;
  function __construct() {
    $this->index_public = 0;
    $this->index_prot   = 1;
    $this->index_priv   = 2;
  }   
}
$foo = new Foo();
$array = (array)$foo;
print_r($foo);
print_r($array);
//echo $array["\0Foo\0index_priv"]; //This prints 2
//echo $foo->{"\0Foo\0index_priv"}; //This fails
var_dump($array);
echo array_keys($array)[0]       . "\n";
echo $array["\0Foo\0index_priv"] . "\n";
echo $array["\0*\0index_prot"]   . "\n";

The above code's output is:

Foo Object
(
    [index_public] => 0
    [index_prot:protected] => 1
    [index_priv:Foo:private] => 2
)
Array
(
    [index_public] => 0
    [*index_prot] => 1
    [Fooindex_priv] => 2
)
array(3) {
  'index_public' =>
  int(0)
  '\0*\0index_prot' =>
  int(1)
  '\0Foo\0index_priv' =>
  int(2)
}
index_public
2
1

The PHP developers chose to use the \0 character to encode a member variable's visibility in its name: private members are stored as \0ClassName\0name and protected members as \0*\0name. Protected fields use a * to indicate that the member variable may actually belong to many classes. The \0 prefix is also used to protect private access, i.e. this code would not work:

echo $foo->{"\0Foo\0index_priv"}; //This fails

but once you cast the object to an array, there is no such protection, i.e. this works:

echo $array["\0Foo\0index_priv"]; //This prints 2
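Building on the key format shown above, here is a sketch of splitting such keys back into their parts, which avoids the leftover NUL bytes that str_replace("Foo", "", ...) produced in the question. demangle_key() is a hypothetical helper, and it assumes the \0Class\0name and \0*\0name layout seen in the output above:

```php
<?php
// Hypothetical helper: split a key produced by (array) casting an object
// into [visibility, class, property], assuming "\0Class\0prop" for private,
// "\0*\0prop" for protected, and a plain "prop" for public members.
function demangle_key(string $key): array
{
    if ($key === '' || $key[0] !== "\0") {
        return ['public', null, $key];
    }
    $end = strpos($key, "\0", 1); // the second NUL ends the class part
    if ($end === false) {
        return ['public', null, $key]; // not a mangled name after all
    }
    $class = substr($key, 1, $end - 1);
    $name  = substr($key, $end + 1);
    if ($class === '*') {
        return ['protected', null, $name];
    }
    return ['private', $class, $name];
}

class Foo {
    public    $index_public = 0;
    protected $index_prot   = 1;
    private   $index_priv   = 2;
}

foreach ((array) new Foo() as $key => $value) {
    [$visibility, $class, $name] = demangle_key($key);
    echo $visibility, ' ', $class ?? '-', ' ', $name, ' = ', $value, "\n";
}
```

The last element of the result is the plain property name (index_priv, index_prot, ...), without any NUL bytes left over.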

Is there any reason for this behavior/implementation?

Yes. To interface with any system you need to make system calls: if you want the current time, or to convert a date, etc., you need to talk to the operating system, which means calling the OS API. On Linux, this API is in C.

PHP was originally developed as a thin wrapper around C. Quite a few languages start out this way and evolve; PHP is no exception.

Is there any reason for this behavior/implementation?

Setting backwards compatibility aside, I'd say some of the choices are less than optimal, but I suspect backwards compatibility is a large factor in why they remain.

But I still do not understand why the language is designed this way?

Backwards compatibility is almost always the reason why features that people don't like remain in a language. Languages do evolve and remove things over time, but incrementally and by priority. If you had asked all the PHP developers whether they wanted better binary string handling in some functions or a JIT compiler, I think the JIT might have won; performance was the main focus of PHP 7, and a JIT compiler eventually shipped in PHP 8. Note that the people doing the actual work ultimately decide what they work on, and working on a JIT compiler is more fun than fixing libraries that do things in seemingly odd ways.

I'm not aware of any language implementor who doesn't wish they'd done some things differently from the outset. Anyone implementing a compiler before a language is popular is under a lot of pressure to get something that works for them, and that means cutting corners. Not all languages in existence today had a huge company backing them; most often it was a small dedicated team, and they made mistakes. Some were lucky enough to get paid to do it. Calling them lazy is a bit unfair.

All languages have dark corners, warts, boils and features you'll eventually hate. Some have more than others, and PHP has a bad rep because it has (or had) a lot more than most. Note that PHP 5 is a vast leap forward from PHP 4, and I'd imagine PHP 7 will improve things even more.

Anyone who thinks their favorite language is free from problems is delusional and has almost certainly not plumbed the depths of the tool they're using.

Cressida answered 6/5, 2016 at 0:49. Comments:
Sorry for the confusing example. Check my edit to see a real-life situation where you can run into \0-malformed strings. And what do you mean the error message is incomplete? - Solvency
Your original example actually highlighted a peculiar issue with the printing of the error message, i.e. it was truncated at the \0 character. Note that there are likely reasons why that has not been fixed. The array was working fine. - Cressida

Functions in PHP that internally operate on C strings are "not binary safe" in PHP terminology. A C string is an array of bytes ending with byte 0. When a PHP function internally uses C strings, it reads the characters one by one, and when it encounters byte 0 it treats it as the end of the string. Byte 0 tells C string functions where the string ends, since a C string carries no information about its length.
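To illustrate the difference, here is a sketch that emulates in PHP the byte-0 scanning described above, next to PHP's own length-aware strlen(); c_strlen() and c_view() are hypothetical helpers, not PHP functions:

```php
<?php
// Hypothetical helper mimicking C's strlen(): count bytes up to the first
// NUL byte. The bounds check is needed because, unlike a real C string,
// a PHP string may contain no NUL byte at all.
function c_strlen(string $s): int
{
    $n   = 0;
    $len = strlen($s); // PHP's strlen() uses the stored length
    while ($n < $len && $s[$n] !== "\0") {
        $n++;
    }
    return $n;
}

// What a function working on C strings effectively "sees" of the string.
function c_view(string $s): string
{
    return substr($s, 0, c_strlen($s));
}

$s = "index-of\0-an-array";
echo strlen($s), ' ', c_strlen($s), ' ', c_view($s), "\n"; // 18 8 index-of
```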

"Not binary safe" means that if a function which operates on C strings is somehow handed a string not terminated with byte 0, its behavior is unpredictable, because the function will read/write bytes beyond the end of the string, adding garbage to the string and/or potentially crashing PHP.

In C++, for example, we have the string object (std::string). This object also contains an array of characters, but it also has a length field, which it updates on every length change. So it does not need byte 0 to tell it where the end is. This is why a string object can contain any number of 0 bytes, although this is generally not considered valid, since a string should contain only valid characters.

In order to correct this, the whole PHP core, including any modules that operate on C strings, would need to be rewritten to send "non binary safe" functions to history. The amount of work needed for this is huge, and all module authors would need to produce new code for their modules. This could introduce new bugs and instabilities into the whole system.

The issue with byte 0 and "non binary safe" functions is not critical enough to justify rewriting PHP and the PHP modules' code. Maybe in some newer PHP version, where some things need to be coded from scratch, it would make sense to correct this.

Until then, you just need to know that any arbitrary binary data put into a string using binary-safe functions needs to have byte 0 added at the end. Usually you will notice the problem as unexpected garbage at the end of your string, or as a PHP crash.
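One defensive option, in the spirit of the password concern raised in the comments, is to reject strings that contain a NUL byte before passing them to code that might not be binary safe. This is a sketch; reject_nul_bytes() is a hypothetical helper, not part of PHP:

```php
<?php
// Hypothetical guard: refuse strings containing a NUL byte, so that a
// non-binary-safe function can never silently drop the data after it.
function reject_nul_bytes(string $s, string $what = 'value'): string
{
    if (strpos($s, "\0") !== false) {
        throw new InvalidArgumentException("$what contains a NUL byte");
    }
    return $s;
}

try {
    reject_nul_bytes("secret\0ignored", 'password');
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n"; // password contains a NUL byte
}
```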

Thaxter answered 3/5, 2016 at 12:26
