How to compare two strings which can have null character \0 in between in C++?

I want to compare two strings s1 and s2 and both strings can have null characters in between. I want both case sensitive and insensitive compare like strcmp and strcasecmp. Suppose my strings are:

std::string s1="Abcd\0abcd";
std::string s2="Abcd\0cccc";

Currently, I'm doing strcmp(s1.c_str(), s2.c_str()) and strcasecmp(s1.c_str(), s2.c_str()), but both functions stop at the first \0, so they report the strings as equal and never compare what follows. Are there any libraries I can use to compare these strings?

Bridle answered 13/9, 2023 at 6:29 Comment(8)
Well, SOMEONE has to tell it the actual length. There's no way to tell that. memcmp can compare two arbitrary memory buffers. - Hokeypokey
@TimRoberts I can pass the length, memcmp works well for case sensitive comparison, anything for case insensitive as well? - Bridle
Why do you need this? What is your use case? - Windowshop
Where are these "strings" coming from? Why do you need to compare them? - Erik
Just convert to lowercase if you want case-insensitive compare - Sundstrom
And std::string constructed using literals already does the job - Sundstrom
That is four strings, two without a symbol to directly reference them. Looks like an X-Y problem. Whatever you are trying to do, ask about that, because there is likely a better solution. Moreover, given this is C++ "string" is ambiguous. Are s1, s2 const char*, char[] or std::string? - Haemoid
You probably want "Abcd\0abcd"s, actually, your std::string doesn't have \0 in the middle. - Renault

Case-sensitive comparison

Case-sensitive comparison is simple. However, we need to use the sv literal to make a std::string_view that can contain null characters. Some of its constructors could also handle it, but no solution is as concise as sv literals (or s for std::string).

#include <string_view>

using namespace std::string_view_literals;

// prior to C++17, you can use std::string and "..."s (available since C++14)
std::string_view s1 = "Abcd\0abcd"sv;
std::string_view s2 = "Abcd\0cccc"sv;

bool eq = s1 == s2; // false

std::string_view already doesn't care about null characters in the string, so you can use the overloaded == operator.

In general, you should avoid C functions like strcmp; there are much better alternatives in C++ that don't require null-terminated strings.
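Since the question asks for a strcmp-like result rather than only equality, here is a minimal sketch of a three-way comparison. It relies on std::string_view::compare, which compares the full length of both views. The helper name compare_with_nulls is hypothetical and only used for illustration.

```cpp
#include <string_view>

// Hypothetical helper (not from the original answer): a strcmp-style
// three-way comparison that looks at every character of both views,
// including embedded '\0'. Returns a negative value, zero, or a
// positive value, like strcmp.
int compare_with_nulls(std::string_view a, std::string_view b) {
    return a.compare(b);
}
```

With the two example strings built as nine-character views, this should report that s1 sorts before s2, because the first difference is 'a' versus 'c' after the embedded null character.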

Case-insensitive comparison

Case-insensitive comparison is slightly more difficult, but can be easily done with std::ranges::equal or std::equal.

#include <cctype>    // std::tolower
#include <algorithm> // std::ranges::equal or std::equal

// C++20
bool eq = std::ranges::equal(s1, s2, [](unsigned char a, unsigned char b) {
    return std::tolower(a) == std::tolower(b);
});
// before C++20 (use this instead of the version above, not in addition to it)
bool eq = std::equal(std::begin(s1), std::end(s1), std::begin(s2), std::end(s2),
    [](unsigned char a, unsigned char b) {
        return std::tolower(a) == std::tolower(b);
    });

Note: the lambda must take unsigned char, not char. std::tolower has undefined behavior if its argument is neither EOF nor representable as unsigned char, and plain char may be negative.

Note: std::tolower works on single bytes in the current C locale, so it doesn't handle Unicode strings. See also Case-insensitive string comparison in C++ for more robust solutions.
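If an ordering is needed rather than just equality (the strcasecmp case from the question), the same idea can be extended to a three-way comparison. The following is a sketch under the same limitations as above (single bytes, std::tolower, no Unicode); the name compare_ci is hypothetical.

```cpp
#include <algorithm>   // std::min
#include <cctype>      // std::tolower
#include <cstddef>     // std::size_t
#include <string_view>

// Hypothetical helper: a strcasecmp-style comparison that uses the lengths
// of the views instead of stopping at the first '\0'.
// Returns -1, 0, or 1.
int compare_ci(std::string_view a, std::string_view b) {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        // Convert to unsigned char first: passing a negative char to
        // std::tolower is undefined behavior.
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) {
            return ca < cb ? -1 : 1;
        }
    }
    // The common prefix is equal, so the shorter view sorts first.
    if (a.size() == b.size()) {
        return 0;
    }
    return a.size() < b.size() ? -1 : 1;
}
```

The loop is written by hand rather than lowercasing copies of both strings, which avoids allocating temporaries; whether that matters depends on the use case.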

Seedman answered 13/9, 2023 at 6:38 Comment(0)

This:

std::string s1="Abcd\0abcd";

will initialize s1 with only "Abcd", because the const char* constructor stops at the first null character in the string literal.

Passing the length explicitly keeps the full binary string, including the null characters that appear in the middle:

std::string s1("Abcd\0abcd", 9);
std::string s2("Abcd\0cccc", 9);
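The difference between these ways of constructing the string can be checked by looking at size(). The sketch below wraps each construction in a small function; the function names are hypothetical and exist only for illustration.

```cpp
#include <string>

// Built from a const char*: the constructor scans for the first '\0',
// so only "Abcd" is copied.
std::string from_c_string() {
    return std::string("Abcd\0abcd");
}

// Built from a pointer plus an explicit length: all 9 characters are
// copied, including the embedded '\0'.
std::string from_pointer_and_length() {
    return std::string("Abcd\0abcd", 9);
}

// Built with the s literal (C++14): the length comes from the literal
// itself, so the embedded '\0' is kept as well.
std::string from_s_literal() {
    using namespace std::string_literals;
    return "Abcd\0abcd"s;
}
```

The first function should yield a string of size 4, the other two a string of size 9.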

Then you can do:

if (s1 < s2) {
}

Or case insensitive:

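#include <algorithm> // std::transform
#include <cctype>    // std::tolower

// take unsigned char: passing a negative char to std::tolower is undefined behavior
auto s1lower = s1;
std::transform(s1lower.begin(), s1lower.end(), s1lower.begin(),
[](unsigned char c){ return static_cast<char>(std::tolower(c)); });

auto s2lower = s2;
std::transform(s2lower.begin(), s2lower.end(), s2lower.begin(),
[](unsigned char c){ return static_cast<char>(std::tolower(c)); });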

if (s1lower < s2lower) {
    ...
}
Loculus answered 13/9, 2023 at 6:55 Comment(1)
s/char/unsigned char/ to avoid possible UB :-( - Renault
