Is it legal to pass a non-null-terminated string to strncmp in C?

2

45

I have an array of 16 bytes that holds the name of an executable's segment.

char segname[16];

If the segment name length is less than 16 bytes, then the rest is padded with null bytes. Otherwise, there is no terminating null byte.

I want to compare segname to various strings, e.g. __text.

Is it legal to call strncmp with a non-null-terminated string?

This post assumes it is legal. This source code suggests it is legal too. But my man page says:

The strncmp() function lexicographically compares the null-terminated strings s1 and s2.

The size passed to strncmp will be the size of segname.

I'm wondering what I should refer to.

Appendicectomy answered 1/1, 2017 at 20:31 Comment(6)
A char array which is not '\0'-terminated is not a string!Turnbull
@Olaf Right, I will call it possibly null-terminated array next time.Appendicectomy
You missed my point! There is no string type in C. However, you should consider spending that extra char and always terminating the array. That allows you to use other string functions. Better safe than sorry!Turnbull
Not sure which man-page you cite. But the POSIX man-page on Ubuntu says "The strncmp() function shall compare not more than n bytes (bytes that follow a null byte are not compared) from the array pointed to by s1 to the array pointed to by s2. " which should be clear and is directly from the C standard. Maybe you want to update your man-pages?Turnbull
@Olaf Actually I'm recoding a part of nm which parses binaries in the mach-o format, and I don't have the choice: segname is part of the section struct in the mach-o header files. I would have spent an extra char otherwise. By string we mean null-terminated byte arrays even if there is no string type in C. Please check out the discussion below about the difference between read bytes and compared bytes. Man pages on my computers (osx / ubuntu) both talk about strings or null-terminated arrays, and as man pages differ I expected a citation from the standard.Appendicectomy
It is not me having a problem with strncmp. It is always good to read the standard before asking a question. Or - as I wrote - to have a correct man-page available.Turnbull
N
67

According to the C99 standard, section 7.21.4.4, paragraph 3, it is legal:

The strncmp function returns an integer greater than, equal to, or less than zero, accordingly as the possibly null-terminated array pointed to by s1 is greater than, equal to, or less than the possibly null-terminated array pointed to by s2.

Notice, however, that it says array of characters. By definition, if an array of characters is not null-terminated, it is not a string.
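
As a rough sketch of how this applies to the question's 16-byte segname field, the two helpers below compare such a field with an ordinary C string. The names SEGNAME_SIZE, segname_equals and segname_equals_bounded are made up for illustration and are not part of any Mach-O header. The second helper assumes you want to follow the cautious advice of never passing an n larger than the smaller array, so it derives n from strlen(name).

```c
#include <stddef.h>
#include <string.h>

#define SEGNAME_SIZE 16  /* size of the zero-padded field in the question */

/* Hypothetical helper: compare at most 16 characters.
 * strncmp stops at the first difference, at a null character,
 * or after SEGNAME_SIZE characters, whichever comes first. */
static int segname_equals(const char segname[SEGNAME_SIZE], const char *name)
{
    return strncmp(segname, name, SEGNAME_SIZE) == 0;
}

/* Hypothetical, more conservative variant: n never exceeds the size of
 * either array, and names longer than the field never match. */
static int segname_equals_bounded(const char segname[SEGNAME_SIZE], const char *name)
{
    size_t len = strlen(name);
    if (len > SEGNAME_SIZE)
        return 0;  /* name cannot fit in the field */
    /* Include the terminator of name when it is shorter than the field,
     * so that a mere prefix of the field does not count as a match. */
    size_t n = (len < SEGNAME_SIZE) ? len + 1 : SEGNAME_SIZE;
    return strncmp(segname, name, n) == 0;
}
```

Note that the first helper treats any name whose first 16 characters match a completely filled field as equal, even if name continues past 16 characters; the bounded variant rejects that case.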

Nocturn answered 1/1, 2017 at 20:37 Comment(12)
C99 has been superseded; C2011 is the current C standard. Its specification for strncmp() in section 7.24.4.4, does indeed omit any requirement that the arrays to be compared be null-terminated, however.Paulownia
It should be observed, however, that the omission of a requirement for null termination does not allow access outside the bounds of either input array. It is incumbent on the caller to ensure that the third argument is appropriate to avoid the function overrunning the bounds of either array. The caller risks invoking undefined behavior if he fails to do so.Paulownia
@JohnBollinger: When the Standard indicates that characters following a zero byte are not compared, should that be taken to imply that an implementation will behave as though it does not read such characters? It would hardly seem unusual that an application might need to compare a zero-padded string [which might lack a trailing zero if it precisely fills its buffer] with a zero-terminated one [whose buffer might be shorter than that of the zero-padded string], and such a guarantee would be needed to make strncmp suitable for such purpose.Apthorp
That's an excellent question, @supercat. The standard doesn't explicitly say anything about what parts of the input arrays are read, only about which characters are compared. Since that applies also to the n-character limit, interpreting that function to give any guarantee useful for ensuring defined behavior requires taking it to mean that no characters are read other than those that are compared. Nevertheless, I consider it a best practice to specify the n argument no larger than the size of the smaller array.Paulownia
At the beginning of the section that explains <string.h>, the standard says that the argument n specifies the length of the array. strncmp also doesn't require NULs. This implies that nothing before s1 or s2 and nothing at or beyond s1+n or s2+n will be accessed.Nocturn
@Apthorp Unclear on "a zero-padded string [which might lack a trailing zero if it precisely fills its buffer]" How can a string lack a trailing zero character?Dodge
@Nocturn Interesting case "n is larger than the size of the array s2, but strlen(s1)<n (or less than the size of array s2)" in your now deleted comment. I suspect it is UBDodge
Come to think of it, in the worst case the implementation has to access all characters of s1 in the comparison. It doesn't compare characters at different lags from their pointers. I'm tempted to say that if s1 is null terminated and smaller than s2, then s2 will only be accessed at its first strlen(s1) characters. But I'm unsure.Nocturn
@giusti, I take supercat to be asking not what we think should happen, but what the standard requires to happen. These are not necessarily the same thing.Paulownia
Like you said before, the standard is not explicit in this case. So when I say "I'm thinking X should happen", I'm not just taking it out of the blue. It's my interpretation of the consequences of what the standard is explicit about. I'm unsure that my interpretation is correct, but I'd love if someone could read something from the standard that gave us a definitive answer.Nocturn
@JohnBollinger: My question was whether any subsequent documents have clarified what is required. What should really matter is whether any existing programs are relying on a particular behavior, and whether there are any implementations where guaranteeing such behavior would be expensive; if the answer to the first question is "yes", and the second, "no", the Standard should be regarded as mandating the behavior. Unfortunately, such thinking isn't always fashionable these days.Apthorp
@JohnBollinger: The design of the Standard Library generally presumes that, with the exception of memcpy/memmove, it is better to have one function that can be used for everything, than to have other functions optimized for different purposes. A version of strncmp which only needed to report whether zero-padded strings were equal and was allowed--but not required--to stop processing if it found a mismatch or a zero byte--could be faster than an strncmp which is not allowed to read anything after a zero byte, but the Standard makes no provisions for such a thing.Apthorp
M
15

The strncmp function compares not more than n characters (characters that follow a null character are not compared) from the array pointed to by s1 to the array pointed to by s2.

This is from section 7.24.4.4 of the C11 standard.

Characters that don't follow a null character are not compared, so it expects a null-terminated character array or string. [1]

You can use character arrays that are not null-terminated here too, but in that case you have to specify the length up to which they should be compared, which is useful in some cases.

Corrections


[1] That characters that don't follow a null character are not compared does not mean that strncmp expects null-terminated strings. It just means that strncmp needs a special case so as to say (for example) that abc\0def... and abc\0xyz... compare equal. There's nothing wrong with comparing two char arrays that aren't null-terminated (up to the specified length) or comparing one null-terminated char array with another that isn't null-terminated.
This is taken directly from David Hammen's comment.
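
To make the correction concrete, here is a small sketch using the abc\0def and abc\0xyz example from above. The array and function names are invented for illustration. Neither array is null-terminated, and n is set to the array size so that no access goes out of bounds.

```c
#include <string.h>

/* Seven characters each, with an embedded null character and no terminator. */
static const char a[7] = {'a', 'b', 'c', '\0', 'd', 'e', 'f'};
static const char b[7] = {'a', 'b', 'c', '\0', 'x', 'y', 'z'};

/* Three characters each, no null character anywhere. */
static const char x[3] = {'a', 'b', 'c'};
static const char y[3] = {'a', 'b', 'd'};

/* strncmp does not compare characters that follow a null character. */
static int equal_by_strncmp(void) { return strncmp(a, b, sizeof a) == 0; }

/* memcmp compares all seven bytes, including those after the null. */
static int equal_by_memcmp(void) { return memcmp(a, b, sizeof a) == 0; }

/* Comparing arrays without any null character is fine as long as n
 * does not exceed the array sizes; the result is negative because 'c' < 'd'. */
static int x_sorts_before_y(void) { return strncmp(x, y, sizeof x) < 0; }
```

Here equal_by_strncmp() returns 1 while equal_by_memcmp() returns 0, which is exactly the special case the quoted sentence describes.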

Mellman answered 1/1, 2017 at 20:35 Comment(10)
Note "... follow a NULL charcaters...." makes more sense as "... follow a null character ... ". NULL is the null pointer constant associated with pointers. Save the uppercase NULL for discussion about pointers.Dodge
Or you can write NUL, as it is usually denoted in ASCII and ASCII-derived standards.Nocturn
@Nocturn C is not specified to use ASCII, yet commonly does. That is why NUL (an ASCII defined value) is not used in the C specification (except in a non-normative footnote.) A good alternative to the null character is '\0'. See thisDodge
True. But I didn't mention ASCII suggesting any ties to the language. Only that, given its importance, you'll probably get yourself understood without risk of confusion if you write NUL instead of null character, \0, or simply 0.Nocturn
@giusti: As it happens, EBCDIC also refers to the null character as NUL. On the other hand, C allows character sets other than ASCII and EBCDIC. On the other other hand, I've never heard of a C implementation that uses a character set not based on either ASCII or EBCDIC.Aiken
"Characters that don't follow a null charcaters are not compared so it expects null ended character array or string": that answer is incorrect. The text you quoted establishes a restriction for strncmp: it will never read past a null character, if one is present. It does not impose a requirement that there must be a null character. Also with strncmp you always have to specify the maximum number of characters that should be checked.Nocturn
@giusti.: I specified the second part and also the first part...Really if it reflects anything else then it's my fault... maybe you can suggest and edit...Sometimes it happens that the way you put words give people wrong meaning and it's wrong..would you please suggest an edit?Mellman
Regarding an edit: Fix this: Characters that don't follow a null charcaters are not compared so it expects null ended character array or string. That characters that don't follow a null character are not compared does not mean that strncmp expects null-terminated strings. It just means that strncmp needs a special case so as to say (for example) that abc\0def... and abc\0xyz... compare equal. There's nothing wrong with comparing two char arrays that aren't null-terminated (up to the specified length) or comparing one null terminated char array with another that isn't null terminated.Borchert
@DavidHammen.: I added your comment...hope you don't mind. If you want me to write a version on my own I can do that too...please let me know.Mellman
FWIW, I took back my downvote after your first edit to the answer.Nocturn
