Why is $'\0' or $'\x0' an empty string? Shouldn't it be the null character?

Bash allows $'string' expansion. My man bash says:

Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard. Backslash escape sequences, if present, are decoded as follows:
\a alert (bell)
\b backspace
\e, \E an escape character
\f form feed
\n new line
\r carriage return
\t horizontal tab
\v vertical tab
\\ backslash
\' single quote
\" double quote
\nnn the eight-bit character whose value is the octal value nnn (one to three digits)
\xHH the eight-bit character whose value is the hexadecimal value HH (one or two hex digits)
\cx a control-x character

The expanded result is single-quoted, as if the dollar sign had not been present.

But why does bash not convert $'\0' and $'\x0' into a null character?
Is it documented? Is there a reason? (Is it a feature or a limitation or even a bug?)

$ hexdump -c <<< _$'\0'$'\x1\x2\x3\x4_'
0000000   _ 001 002 003 004   _  \n
0000007

echo gives the expected result:

$ hexdump -c < <( echo -e '_\x0\x1\x2\x3_' )
0000000   _  \0 001 002 003   _  \n
0000007

My bash version:

$ bash --version | head -n 1
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)

Why does echo $'foo\0bar' not behave like echo -e 'foo\0bar'?
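
Counting the bytes shows the difference directly (the counts include echo's trailing newline):

$ echo $'foo\0bar' | wc -c      # bash expands $'…' first, so the argument is cut at the NUL
4
$ echo -e 'foo\0bar' | wc -c    # echo itself interprets \0 and writes the NUL byte
8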

Enemy answered 7/10, 2013 at 14:33 Comment(2)
Good question! Maybe it's a Posix thing? Good luck. – Piscator
Thanks for all the answers. I have been having the same problem while using netcat to test the SCGI interface to a server. The SCGI header has NUL characters. After reading here, especially the suggestion to use pipes, I developed a workaround: I use octal 377 (decimal 255) where the NUL characters need to be, and then pipe the string through tr just before piping it into netcat: xmlreq='<?xml version="1.0" encoding="UTF-8"?><methodCall><methodName>system.client_version</methodName><params></params></methodCall>' scgihdr=CONTENT_LENGTH$'\377'${#xmlreq}$'\377'SCGI$'\377'1$'\377' echo -n ${#scgihdr… – Cindacindee
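
A sketch of that workaround, spelled out (the host, port, and netstring-style SCGI framing are assumptions for illustration, not taken from the comment):

# Build the SCGI header with \377 placeholders, then translate them to NUL
# just before sending. Host and port below are illustrative.
xmlreq='<?xml version="1.0" encoding="UTF-8"?><methodCall><methodName>system.client_version</methodName><params></params></methodCall>'
scgihdr=CONTENT_LENGTH$'\377'${#xmlreq}$'\377'SCGI$'\377'1$'\377'
printf '%s' "${#scgihdr}:${scgihdr},${xmlreq}" | tr '\377' '\000' | nc localhost 5000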

It's a limitation. bash does not allow string values to contain interior NUL bytes.

Posix (and C) character strings cannot contain interior NULs. See, for example, the Posix definition of character string (emphasis added):

3.92 Character String

A contiguous sequence of characters terminated by and including the first null byte.

Similarly, standard C is reasonably explicit about the NUL character in character strings:

§5.2.1p2 …A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string.

Posix explicitly forbids the use of NUL (and /) in filenames (XBD 3.170) and in environment variables (XBD 8.1: "... are considered to end with a null byte.").

In this context, shell command languages, including bash, tend to use the same definition of a character string: a sequence of non-NUL characters terminated by a single NUL.

You can pass NULs freely through bash pipes, of course, and nothing stops you from assigning a shell variable to the output of a program which outputs a NUL byte. However, the consequences are "unspecified" according to Posix (XSH 2.6.3 "If the output contains any null bytes, the behavior is unspecified."). In bash, the NULs are removed, unless you insert a NUL into a string using bash's C-escape syntax ($'\0'), in which case the NUL will end up terminating the value.
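
A quick illustration of both behaviors (newer bash versions also print a warning when command substitution drops a NUL):

$ var=$(printf 'foo\0bar')      # NUL in command output: removed by command substitution
$ printf '%s' "$var" | wc -c
6
$ var=$'foo\0bar'               # NUL written with the C-escape syntax: terminates the value
$ printf '%s' "$var" | wc -c
3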

On a practical note, consider the difference between the following ways of attempting to insert a NUL into the stdin of a utility. In the first, bash expands the $'…' itself and the argument is cut off at the NUL; in the other two, printf receives the backslash escape literally and writes the NUL byte:

$ # Prefer printf to echo -n
$ printf $'foo\0bar' | wc -c
3
$ printf 'foo\0bar' | wc -c
7
$ # Bash extension which is better for strings which might contain %
$ printf %b 'foo\0bar' | wc -c
7
Foppish answered 7/10, 2013 at 15:54 Comment(7)
Great info. Re: "nothing stops you from assigning a shell variable to the output of a program which outputs a NUL" - it's worth pointing out that the variable's value will invariably be cut off at the first NUL encountered. Re "if you insert a NUL into a string using one of bash's backslash escape sequences ($'\0'), it will end up terminating the value." - to clarify: inserting $'\0' into another string will not terminate the overall string, but simply ignore the $'\0'; e.g., a$'\0'b -> ab; a \0 inside $'…', however, will cut off that string there; e.g., $'a\0b' -> a. – Illusage
@mklement0: Yeah, I got that wrong two years ago. Thanks. Fixed, now, I believe. – Foppish
Thanks for updating. Re assigning command output to a variable: Given that bash variable values are internally stored as C strings, they can never contain NULs. However, it's worth distinguishing between (a) var=$(...), in which case, as you state, all NULs are simply stripped, so the value that is assigned by definition never contains NULs but contains all other characters, and (b) read -r -d '' var < <(....), where the input may contain NULs, but read won't allow reading past the first NUL and the value is therefore cut off at the first NUL. – Illusage
@mklement0: But stopping at the NUL is because you've told read to do so with -d ''. If a NUL is encountered by read and the record terminator is a newline (or anything other than a NUL), then the behaviour is consistent with command substitution: the NUL is stripped but the string is not terminated. IOW, read will read past the NUL unless you've specified NUL as a terminator. – Foppish
Thanks for pointing that out: it turns out that read stripping NULs was introduced in Bash 4.3.3. In bash 4.3.2 and below, read never reads past a NUL; for instance, in Bash 3.2.57 read -r var < <(printf 'a\0b') assigns just a to $var - anything after the NUL is dropped. – Illusage
@mklement0: Ah, interesting. I think I'll stick with "unspecified" in this answer; read isn't really very relevant. (If it were, one would have to look at printf -v and other things, which are totally tangential to the OP.) – Foppish
Understood. Just to close the tangent: printf -v cuts off at the first NUL (in all bash versions to date). Final thought: Your examples show the behavior well, but the wording of "unless you insert a NUL into a string using bash's C-escape syntax ($'\0')" still sounds like a$'\0'b (-> ab) rather than the intended $'a\0b' (-> a) to me. – Illusage
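
To close out that tangent, a sketch of read's NUL handling as described in the comments above (version-dependent; the output shown assumes bash 4.3.3 or later):

$ IFS= read -r -d '' var < <(printf 'a\0b'); printf '%s\n' "$var"   # NUL as the delimiter: reading stops there
a
$ IFS= read -r var < <(printf 'a\0b'); printf '%s\n' "$var"         # default delimiter: the NUL is stripped (4.3.3+)
ab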

But why does bash not convert $'\0' and $'\x0' into a null character?

Because a null character terminates a string.

$ echo $'hey\0you'
hey
Riplex answered 7/10, 2013 at 14:54 Comment(0)

It is a null character, but it depends on what you mean by that.

The null character represents an empty string, which is what you get when you expand it. It is a special case and I think that is implied by the documentation but not actually stated.

In C, binary zero '\0' terminates a string, and on its own it also represents an empty string. Bash is written in C, so the behavior probably follows from that.
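
You can see the same thing from the shell with the ${#…} length operator:

$ s=$'\0hello'; echo "${#s}"    # the value ends at the leading NUL, so it is the empty string
0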

Edit: POSIX mentions a null string in a number of places. In the "Base definitions" it defines a null string as:

3.146 Empty String (or Null String)
A string whose first byte is a null byte.

Praefect answered 7/10, 2013 at 14:48 Comment(0)
