Which PHP functions are said not to be "binary safe"? To which libraries do these "non-binary safe" functions hand off strings, and why?

I'm using Windows 10 Home Single Language Edition, a 64-bit operating system.

I've installed the latest version of XAMPP, which installed PHP 7.2.7 on my machine.

I'm asking this question based on the following excerpt from the PHP Manual:

The string in PHP is implemented as an array of bytes and an integer indicating the length of the buffer. It has no information about how those bytes translate to characters, leaving that task to the programmer. There are no limitations on the values the string can be composed of; in particular, bytes with value 0 (“NUL bytes”) are allowed anywhere in the string (however, a few functions, said in this manual not to be “binary safe”, may hand off the strings to libraries that ignore data after a NUL byte.)

I understand the difference between binary-safe and non-binary-safe functions in PHP, but I have the following questions. Please answer them one by one, with explanations and suitable examples.

  • Do "non-binary safe" and "binary-safe" functions exist in PHP only because the PHP interpreter is written in C?
  • How do C and PHP differ in handling strings that may contain any byte value (including the NUL byte)?
  • I want complete lists of the PHP functions that are "non-binary safe" and of those that are "binary-safe".
  • Does the distinction between "non-binary safe" and "binary-safe" apply only to functions that operate on strings, and not to PHP functions that deal with other types?
  • Why do the non-binary safe functions hand off strings to libraries?
  • Do the non-binary safe functions hand off strings to libraries only when the string contains a NUL byte?
  • Which libraries do these "non-binary safe" functions hand the strings off to?
  • How do these libraries handle the strings received from "non-binary safe" functions?
  • Do the "non-binary safe" functions behave like "binary safe" functions after handing off strings that contain a NUL byte to some library?
Quell answered 23/6, 2018 at 11:18 Comment(1)
What do you mean by PHP? If you're referring to PHP as in the whole of XAMPP, the list you ask for is quite extensive. - Breannabreanne

As arkascha explained, the issue of "binary-safe" versus "non-binary-safe" has nothing to do with the language.

Using a null byte (0x00) to mark the end of the string is simpler (which is probably why C went with it), but the downside is that you can't have a null byte anywhere else in the string, which is a big limitation if you have to handle all kinds of data. Storing the length as metadata alongside the string data is more complex, as Pete showed, but it lets you handle any kind of data.
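
To make the two representations concrete, here is a minimal sketch in plain C. The names `sentinel_length`, `counted_bytes`, and `last_byte` are hypothetical and exist only for this illustration; they are not part of PHP or of any library.

```c
#include <stddef.h>

/* Sentinel style: walk the bytes until the first 0x00.
   Whatever follows that byte is invisible to this function. */
size_t sentinel_length(const char *bytes)
{
    size_t n = 0;
    while (bytes[n] != '\0') {
        n++;
    }
    return n;
}

/* Length-carrying style: the length travels with the data,
   so no byte value has a special meaning. */
struct counted_bytes {
    size_t      len;  /* number of bytes in data */
    const char *data; /* may contain 0x00 anywhere */
};

/* Returns the final byte of the buffer, or 0x00 if it is empty. */
char last_byte(struct counted_bytes s)
{
    return s.len > 0 ? s.data[s.len - 1] : '\0';
}
```

For the three bytes `a`, 0x00, `b`, the sentinel scan stops after the first byte, while the length-carrying struct can still reach the `b` at the end.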

As for which functions are "binary-safe" or "non-binary-safe", just read the PHP Manual before using a function. That's what I do. There is no need to construct a list, because the PHP Manual already explains what you need to know about each function, including whether it is binary-safe.

Most of your post, I believe, is due to a misunderstanding of PHP Manual's explanation that you quoted, particularly this part:

however, a few functions, said in this manual not to be “binary safe”, may hand off the strings to libraries that ignore data after a NUL byte.

Let me try making it clearer by adding some of my own words:

however, a few functions, said in this manual not to be “binary safe”, are the functions that may hand off the strings to libraries that ignore data after a NUL byte.

So it really doesn't say "non-binary safe functions hand off the strings to libraries"; that is a misinterpretation. What it means is "functions that may hand off the strings to libraries that ignore data after a NUL byte are described in this manual as not binary-safe".

"Handing off to libraries" is just another way of saying "calling functions from other libraries". "Ignoring data after a NUL byte" is a behavior that is called not binary-safe.

Another way of putting it is:

A few functions in this manual are said not to be "binary safe" because they may call other functions that are also not "binary safe" (functions that ignore data after a NUL byte).
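
A rough sketch of what such a hand-off looks like, in plain C. Everything here is hypothetical: `library_sees` stands in for some third-party routine that accepts only a pointer, and `bytes_dropped_by_handoff` plays the role of a wrapper that knows the real length but passes just the pointer along. The sketch assumes the buffer has a terminating 0x00 right after its last byte.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a library routine: it receives only a pointer,
   so it must rely on the NUL terminator to find the end. */
size_t library_sees(const char *text)
{
    return strlen(text);
}

/* Hypothetical wrapper in the role of a "not binary safe" function.
   It knows the real length (len) but hands only the pointer to the
   library. Returns how many bytes the library never looked at.
   Assumes data[len] is a terminating 0x00. */
size_t bytes_dropped_by_handoff(const char *data, size_t len)
{
    return len - library_sees(data);
}

/* A caller can detect the problem up front by scanning the full
   length for an embedded 0x00 before handing the data off. */
int has_embedded_nul(const char *data, size_t len)
{
    return memchr(data, '\0', len) != NULL;
}
```

The wrapper itself does nothing wrong with the bytes; the data after the NUL byte is lost only because the routine it calls cannot see it.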

I hope this clears it up for you.

Musette answered 31/7, 2018 at 10:46 Comment(4)
Explain: which entity ignores data after a NUL byte? The "few functions" said in the manual not to be "binary safe", which may hand off the strings to libraries, or the libraries themselves, which receive the string data from those functions? - Quell
In your last statement you said that they "may call other functions that are also not "binary safe" (functions that ignore data after a NUL byte)". If the libraries (i.e. the other functions) also ignore data after a NUL byte, then why do the few functions said in the manual not to be "binary safe" hand off the strings to them? If both entities, namely the functions that hand off the string and the library functions, ignore data after a NUL byte, what is the purpose of handing off the string to such useless libraries? - Quell
@user2839497 Assuming that such functions are useless is another misinterpretation. We know that C uses null-terminated strings, so C has a number of such functions. If they were useless, C itself would not be widely used, nor would it be implemented for a wide range of hardware from mainframes to microcomputers. I could go on, but in short, they are not useless. So your question becomes: "what's the purpose of handing off the string to such useful libraries?" An obvious answer is code reuse: using existing libraries means shorter development time and, in some cases, faster applications. - Musette
@Musette: I think the easiest way to put the statement would be: "however, a few functions, said in this manual not to be "binary safe", that ignore data after a NUL byte, may hand off the strings to libraries." I think only 'non-binary safe' functions ignore data after a NUL byte, and they need help from additional libraries in order to consider data after a NUL byte. So such 'non-binary safe' functions may hand off the strings to libraries that don't ignore data after a NUL byte. If you think the statement I suggested is appropriate, please update the answer. - Juline

Traditionally there are two ways to represent strings: by signaling the end of the string with a special character, or by storing its length along with the string data. C uses the former: a string is a char array with a null character at the end. The limitation is that a C string cannot contain a null character anywhere except at the end.

To overcome this limitation, the PHP engine uses this struct to represent a string:

struct _zend_string {
    zend_refcounted_h gc; /* refcount struct */
    zend_ulong        h;  /* hash value */
    size_t            len; /* length of string */
    char              val[1]; /* array of chars (using struct "hack") */
};

As you can see, the PHP devs chose to store the length of the string along with its data.
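
To show how a length-carrying string of this shape can be built, here is a simplified sketch in plain C. The type `counted_string` and the function `counted_string_new` are hypothetical stand-ins without the refcount and hash fields; this is not the Zend implementation. It uses a C99 flexible array member where the struct above uses `val[1]`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical counterpart of the struct above. */
typedef struct {
    size_t len;   /* number of bytes in val, excluding the trailing 0x00 */
    char   val[]; /* C99 flexible array member (the original uses val[1]) */
} counted_string;

/* Copies len bytes verbatim into a single allocation that holds both
   the header and the data. Returns NULL if the allocation fails.
   The caller releases the result with free(). */
counted_string *counted_string_new(const char *bytes, size_t len)
{
    counted_string *s = malloc(offsetof(counted_string, val) + len + 1);
    if (s == NULL) {
        return NULL;
    }
    s->len = len;
    memcpy(s->val, bytes, len); /* memcpy does not stop at 0x00 */
    s->val[len] = '\0';         /* convenience terminator, not counted in len */
    return s;
}
```

Because the copy is driven by `len` rather than by a terminator, a buffer such as `a`, 0x00, `b` is stored in full, and `len` stays 3 even though `strlen` on `val` would report 1.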

Now what happens if we mix "binary safe" and "non-binary safe" functionality?

Consider the following piece of C code that may be used when writing a PHP extension:

zend_string *a = zend_string_init("a\0b", /* string length */ 3, 0);
zend_string *b = zend_string_init("a\0c", /* string length */ 3, 0);

if (strcmp(a->val, b->val) == 0) {
    php_printf("Strings are equal!");
}

What do you think will happen? This code outputs "Strings are equal!" even though the strings clearly differ. strcmp stops at the first null byte, so it compares only "a" with "a"; because it does not take the lengths of the strings into account, it is a non-binary safe function.

Most of C's standard library string functions can be classified as "non-binary safe", since they rely on the null termination character.
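
The same split shows up when searching for a byte. The following sketch wraps two standard C functions, `strchr` and `memchr`, in hypothetical helpers (`found_by_strchr`, `found_by_memchr`) to contrast the terminator-driven and the length-driven variant.

```c
#include <stddef.h>
#include <string.h>

/* Terminator-driven search: strchr gives up at the first 0x00,
   so bytes after it are never examined. Returns 1 if found. */
int found_by_strchr(const char *data, int byte)
{
    return strchr(data, byte) != NULL;
}

/* Length-driven search: memchr examines exactly len bytes,
   regardless of their values. Returns 1 if found. */
int found_by_memchr(const char *data, size_t len, int byte)
{
    return memchr(data, byte, len) != NULL;
}
```

For the buffer `a`, 0x00, `b`, only the `memchr` variant can find the `b`.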

When dealing with zend_string in extension code, you should use the Zend string functions (zend_string_*) instead of C's string library.

To fix the previous code:

if (zend_string_equals(a, b)) {
    php_printf("Equal!");
} else {
    php_printf("Not equal");
}

This now correctly prints "Not equal".
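
For readers without the Zend headers at hand, the difference can be reproduced in plain C. This is a sketch of the general idea behind a length-aware comparison, not the Zend source; `equals_c_style` and `equals_binary_safe` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Terminator-driven comparison: stops at the first 0x00 in either input. */
int equals_c_style(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Length-driven comparison: the lengths must match, and then every one
   of those bytes is compared, including any 0x00 bytes. */
int equals_binary_safe(const char *a, size_t a_len,
                       const char *b, size_t b_len)
{
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

With the two buffers from the example, the first function reports equality and the second does not, which matches the behavior described above.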

Litchi answered 29/7, 2018 at 10:25 Comment(0)

Whether a function processes runtime data in a "binary safe" way has nothing to do with the language the system is implemented in. It is a question of how the data is handled. PHP is a high-level language, which means it has a high-level implementation of a string type. That type does not depend on a terminating null character, as C strings do; instead it maintains metadata about the stored string, which allows a much more flexible and robust implementation. That, however, has little to do with being "binary safe" or not.

The rest of your points cannot really be answered clearly. Which libraries PHP itself uses depends on your setup; that is a dynamic environment. How those libraries handle the data handed over to them, again, has nothing to do with whether a PHP function can be considered "binary safe": the library does not know about PHP, it only receives data and processes it according to how the library is implemented.

Apex answered 23/6, 2018 at 11:29 Comment(0)