Why does std::string("\x00") report length of 0?
N

3

27

I have a function that encodes strings and must accept 0x00 as a valid 'byte'. My program needs to check the length of the string, but if I pass "\x00" to std::string, the length() method returns 0.

How can I get the actual length even if the string is a single null character?

Nobles answered 14/1, 2018 at 23:39 Comment(6)
Have a look at the available constructors and which is used in your case.Barrick
You could also try strlen("\x00"); for the same result.Couchant
See also: #48210711Victuals
Would you not be better to store a vector (other containers are available) of bytes instead of a string?Ilysa
@JackAidley the data is coming in as a string, once processed it is stored as a vector of bytes.Nobles
@BoPersson, strlen(3) is a legacy C function that knows nothing about the C++ string type. You can only use it on a std::string by first converting it to a C-style char* (for example via c_str()). strlen(3) also knows nothing about array sizes: it only searches for the \0 char and returns the difference between the pointer passed to it and the place where it found the null char.Fatma
S
30

You're passing in an empty string. Use std::string(1, '\0') instead.

Or std::string{ '\0' } (thanks, @zett42)
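A minimal sketch of the constructors mentioned in this answer and its comments, each producing a string that holds exactly one null byte. The function names are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper names; they are not part of the original answer.

// Fill constructor: count = 1, character = '\0'.
std::string single_null_fill() {
    return std::string(1, '\0');
}

// Initializer-list constructor (C++11 and later).
std::string single_null_braces() {
    return std::string{'\0'};
}

// (pointer, length) constructor: the explicit length means the
// constructor does not scan for a terminator.
std::string single_null_ptr_len() {
    return std::string("\0", 1);
}

// For comparison: the constructor from the question, which scans for
// the first zero byte and therefore yields an empty string.
std::string from_c_string() {
    return std::string("\x00");
}
```

Calling length() on the first three results should give 1, while the last one gives 0.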

Salado answered 14/1, 2018 at 23:44 Comment(6)
Shorter: std::string{ '\0' }Isagoge
Perhaps coincidentally this question just popped up.Shiftless
@zett42: since when exists a constructor of std::string with only a single char argument?Mantoman
@Mantoman Since C++11. It is the constructor that takes an initializer list (9).Isagoge
or std::string("\0", 1), or std::string{ '\0' }.Favorite
You can also use the integer literal 0 for the same effect: std::string{ 0 } or std::string(1, 0)Leadwort
P
47

std::string is perfectly capable of storing nulls. However, you have to be wary, because a const char* cannot carry them, and your code very briefly goes through a const char* from which the std::string is created.

std::string a("\x00");

This creates a constant C string containing only the null character, followed by a null terminator. But C strings don't know how long they are, so the std::string constructor assumes the data runs until the first null terminator, which here is the first character. Hence, a zero-length string is created.
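The following sketch illustrates where the length goes missing. The helper names are hypothetical; the point is only to compare the size the compiler knows for the literal with the length seen through a const char*.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper names, for illustration only.

// Size of the literal as the compiler sees it: the array type is
// const char[2] (the explicit \x00 plus the implicit terminator).
std::size_t literal_array_size() {
    return sizeof("\x00");
}

// Length as seen through a const char*: the array has decayed to a
// pointer, so the only way to find the end is to scan for a zero byte.
std::size_t c_string_length() {
    const char* p = "\x00";
    return std::strlen(p);
}

// Length of the std::string built from that pointer.
std::size_t std_string_length() {
    std::string a("\x00");
    return a.length();
}
```

Here literal_array_size() returns 2, while both c_string_length() and std_string_length() return 0.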

std::string b("");
b.push_back('\0');

std::string is null-clean: any of its characters can freely be the zero byte (\0). So here nothing stops the data structure from being read correctly, and the length of b will be 1.

In general, you need to avoid constructing C strings containing null characters. If you read the input from a file directly into std::string or make sure to push the characters one at a time, you can get the result you want. If you really need a constant string with null characters, consider using some other sentinel character instead of \0 and then (if you really need it) replace those characters with '\0' after loading into std::string.
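As a rough sketch of the two approaches in the paragraph above: one helper swaps a sentinel character for '\0' once the text is already inside a std::string, and the other reads every byte of a stream straight into a std::string. Both helper names are hypothetical, and the choice of sentinel is an assumption; it has to be a character that cannot occur in the real data.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: replace every occurrence of the sentinel with a
// null byte. The text is taken by value, so the caller's copy is untouched.
std::string replace_sentinel(std::string text, char sentinel) {
    std::replace(text.begin(), text.end(), sentinel, '\0');
    return text;
}

// Hypothetical helper: read all remaining bytes of a stream into a
// std::string. No C string is involved, so null bytes are preserved.
std::string read_all(std::istream& in) {
    std::istreambuf_iterator<char> first(in);
    std::istreambuf_iterator<char> last;
    return std::string(first, last);
}
```

For example, replace_sentinel("a#b#", '#') gives a four-character string whose second and fourth characters are null bytes. For real files, the stream passed to read_all would typically be opened in binary mode.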

Phocis answered 14/1, 2018 at 23:46 Comment(10)
"But C strings don't know how long they are" -- To be more precise, std::string("\x00") first creates a string literal of type const char[2], so at this point the size is still well known. Though this array decays into const char* which is passed to the std::string constructor. At this point the array size is "lost" because the constructor can only scan for the 1st \0 to determine the size. In theory std::string c'tor could have an overload for arrays, that would allow embedded \0 in string literal.Isagoge
@Isagoge such an overload would have to be templated, and be instantiated for each new length of the array — there's no other native way to pass sized arrays in C++.Azzieb
@Azzieb So? That would have been perfectly acceptable. Of course it would also have been silly, as it’d break the C string literal convention and thus violates the user’s expectation in most cases (nobody wants to find a null char in their string when initialising it as std::string("hi")).Stets
@KonradRudolph So, that would generate a new function for each size of string literals passed to std::string::string. Not perfect from code size perspective; you'd only have to hope the linker will omit these functions and the compiler inline their code into callers.Azzieb
@Azzieb Code inlining wouldn’t reduce code size. On the contrary, to help with generated code size the constructor template could dispatch to a size-erased non-generic function that isn’t inlined. But for this particular constructor, inlining probably works just fine, and results in the same code size regardless of whether you’d have a constructor template or a non-template constructor (since it’s inlined either way).Stets
@KonradRudolph inlining won't, but creation of additional callable instance of the function will.Azzieb
@Isagoge We already have the std::string(const char* s, size_t n) overload, and it can handle dynamically allocated char arrays. Calling it as my_str("\0", ARRAYSIZE("\0") - 1); (the - 1 drops the implicit terminator that ARRAYSIZE counts) is usually a minor inconvenience.Upshot
@Upshot Agreed. I wasn't saying that an array overload would actually make sense. I mentioned it solely to underline the fact that the compiler knows the size of a string literal, whether or not it has embedded \0s.Isagoge
I think you wanted to type b.push_back('\0'); instead of a.push_back('\0');. Apart from this, great answer!Text
@FabioTurati It's always the little things that slip by, isn't it? Thanks! :)Phocis
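The array-overload idea discussed in the comments above can be sketched as a free function template. This is a hypothetical helper, not something std::string provides: it deduces the array length N of a string literal and drops only the final implicit terminator. As noted in the comments, the template is instantiated once per distinct literal length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the standard library.
// N is the full array size of the literal, including the implicit
// terminator, so N - 1 characters are copied and embedded null bytes
// are kept.
template <std::size_t N>
std::string from_literal(const char (&literal)[N]) {
    return std::string(literal, N - 1);
}
```

With this sketch, from_literal("\x00").length() is 1 and from_literal("hi") compares equal to "hi". It is only meant for string literals; passing a partially filled char buffer would copy the whole array minus its last element.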
L
24

With C++14, you can use the std::string literal suffix s (operator""s) to create strings with null bytes:

using namespace std::string_literals;

std::string a = "\0"s;
std::string aa = "\0\0"s; // two null bytes are supported too
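A small sketch of the same literal suffix with the using-directive confined to a function body, which is one way to keep the suffix from leaking into the rest of the file. The function names are hypothetical, and the code assumes a C++14 or later compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper names, for illustration only.

// The using-directive is scoped to the function body, so the s suffix
// is not made visible to the rest of the translation unit.
std::string two_nulls() {
    using namespace std::string_literals;
    return "\0\0"s;
}

// The suffix also keeps null bytes in the middle of a literal.
std::string embedded_null() {
    using namespace std::string_literals;
    return "a\0b"s;
}
```

Here two_nulls().length() is 2 and embedded_null().length() is 3, because the literal operator receives the literal's length from the compiler instead of scanning for a terminator.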
Leventhal answered 15/1, 2018 at 9:50 Comment(1)
@sp2danny ... but may also bring in other, unwanted literal operators (e.g. from std::literals::chrono_literals).Favorite
