Assign result of sizeof() to ssize_t

I recently needed to compare the result of sizeof(x) to an ssize_t.

Of course GCC gave an error (lucky me: I compile with -Wall -Wextra -Werror), so I decided to write a macro to get a signed version of sizeof():

#define ssizeof (ssize_t)sizeof

And then I can use it like this:

for (ssize_t i = 0; i < ssizeof(x); i++)

The problem is: do I have any guarantee that SSIZE_MAX >= SIZE_MAX? I suspect that, sadly, this is never the case.

Or at least that sizeof(ssize_t) == sizeof(size_t), which would cut the usable range in half but would still be close enough.

I didn't find any relation between ssize_t and size_t in the POSIX documentation.
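For what it's worth, one way to at least check the assumption on a given platform is a compile-time assertion (a sketch using C11 _Static_assert; it only verifies the current platform, it doesn't answer the portability question):

#include <sys/types.h>   /* ssize_t, size_t */

/* Per-platform sanity check for the assumption behind ssizeof():
   compilation fails if ssize_t and size_t differ in width here. */
_Static_assert(sizeof(ssize_t) == sizeof(size_t),
               "ssize_t and size_t differ in width on this platform");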

Related question:

What type should be used to loop through an array?

Lek answered 22/3, 2019 at 15:4 Comment(10)
What is the actual problem that requires this? This smells like an XY problem.Prentiss
The problem is that I just want to avoid unsigned types as much as I can, especially in loops, and therefore I always use ssize_t for looping through arrays. But clearly ssize_t is badly designed for this. In my case I'm OK, I could just use the same variable I used to create the VLA in the comparison, but it would be better to be able to use ssize_t, whose purpose is exactly that.Lek
It's perfectly valid to compare a size_t to a ssize_t, and though a compiler might warn -- about a signed to unsigned comparison, for example -- it ought not to emit an error.Yorgo
As for preferring ssize_t to size_t, do note that the former is not defined by standard C. It is a posixism. That does not by any means imply that you should avoid it if programming specifically for POSIX, but it is at least something that you should keep in the back of your mind.Yorgo
@JohnBollinger I don't think it's perfectly valid: for (ssize_t i = 0; i <= SIZE_MAX; i++) is a good way to test if the infinite exists (in case you don't get a segfault :). I use ssize_t in loops to avoid that nasty bug. For the error, yeah I used -Wall -Wextra -Werror.Lek
I am speaking from a C language perspective. The comparison of any two values of any arithmetic types is allowed and well-defined, and in that sense valid. As a quality-of-implementation matter, then, I expect compilers to accept all such comparisons. I thus find it misleading to claim that GCC rejected such code with an error without (previously) disclosing that you had specifically requested that it treat conditions that normally would elicit only a warning as fatally erroneous.Yorgo
Edited the question to mention that :)Lek
What exactly is the bug that you claim is fixed by using a signed index type and how exactly does the signed type fix the bug? Testing against the largest representable value is a problem for both signed and unsigned types, so I must be misunderstanding something.Urion
@Urion Looping between 0 and the highest positive value (both inclusive) is an infinite loop with unsigned types, but trivial with signed ones (you iterate in reverse, from the highest value down to 0).Lek
Ok. Of course, you could just use an end-tested loop: {T i = 0; do { ... } while (i++ < limit); }. Here, T should be an unsigned type to avoid overflow in i++ when limit is T_MAX. (In practice, integer overflow doesn't trap and the possibly-overflown last value is never used.)Urion

There is no guarantee that SSIZE_MAX >= SIZE_MAX. In fact, it is very unlikely to be the case, since size_t and ssize_t are likely to be corresponding unsigned and signed types, so (on all actual architectures) SIZE_MAX > SSIZE_MAX. Converting an unsigned value to a signed type which cannot represent that value gives an implementation-defined result (or raises an implementation-defined signal), so technically your macro is problematic.

In practice, at least on 64-bit platforms, you're unlikely to get into trouble if the value you are converting to ssize_t is the size of an object which actually exists. But if the object is theoretical (eg sizeof(char[3][1ULL<<62])), you might get an unpleasant surprise.
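To make the conversion issue concrete, here is a minimal sketch (an illustration using a plain size_t value above SSIZE_MAX, rather than an actual oversized object):

#include <limits.h>     /* SSIZE_MAX (POSIX) */
#include <stdio.h>
#include <sys/types.h>  /* ssize_t */

int main(void) {
    size_t big = (size_t)SSIZE_MAX + 1;  /* representable as size_t ...              */
    ssize_t s = (ssize_t)big;            /* ... but not as ssize_t: the result is
                                            implementation-defined; on typical
                                            two's-complement systems it is negative */
    printf("big = %zu, (ssize_t)big = %zd\n", big, s);
    return 0;
}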

Note that the only negative value which ssize_t is guaranteed to be able to hold is -1, which is used as an error indication. You might be confusing ssize_t, which is defined by Posix, with ptrdiff_t, which has been part of standard C since C89. These two types are the same on most platforms, and are usually the signed integer type corresponding to size_t, but none of those behaviours is guaranteed by either standard. However, the semantics of the two types are different, and you should be aware of that when you use them:

  • ssize_t is returned by a number of Posix interfaces in order to allow the function to signal either a number of bytes processed or an error indication; the error indication must be -1. There is no expectation that any possible size will fit into ssize_t; the Posix rationale states that:

    A conforming application would be constrained not to perform I/O in pieces larger than {SSIZE_MAX}.

    This is not a problem for most of the interfaces which return ssize_t because Posix generally does not require interfaces to guarantee to process all data. For example, both read and write accept a size_t which describes the length of the buffer to be read/written and return an ssize_t which describes the number of bytes actually read/written; the implication is that no more than SSIZE_MAX bytes will be read/written even if more data were available. However, the Posix rationale also notes that a particular implementation may provide an extension which allows larger blocks to be processed ("a conforming application using extensions would be able to use the full range if the implementation provided an extended range"), the idea being that the implementation could, for example, specify that return values other than -1 were to be interpreted by casting them to size_t. Such an extension would not be portable; in practice, most implementations do limit the number of bytes which can be processed in a single call to the number which can be reported in ssize_t.

  • ptrdiff_t is (in standard C) the type of the result of the difference between two pointers. In order for subtraction of pointers to be well defined, the two pointers must refer to the same object, either by pointing into the object or by pointing at the byte immediately following the object. The C committee recognised that if ptrdiff_t is the signed equivalent of size_t, then it is possible that the difference between two pointers might not be representable, leading to undefined behaviour, but they preferred that to requiring that ptrdiff_t be a larger type than size_t. You can argue with this decision -- many people have -- but it has been in place since C90 and it seems unlikely that it will change now. (Current standard wording, §6.5.6/9: "If the result is not representable in an object of that type [ptrdiff_t], the behavior is undefined.")

    As with Posix, "undefined behaviour" here means only that the C standard imposes no requirements, so it would be a mistake to interpret it as forbidding the subtraction of two pointers in very large objects. An implementation is always allowed to define the result of behaviour left undefined by the standard, so it is completely valid for an implementation to specify that if P and Q are two pointers to the same object where P >= Q, then (size_t)(P - Q) is the mathematically correct difference between the pointers even if the subtraction overflows. Of course, code which depends on such an extension won't be fully portable, but if the extension is sufficiently common that might not be a problem. (A short illustration of these semantics follows at the end of this answer.)

As a final point, the ambiguity of using -1 both as an error indication (in ssize_t) and as a possibly castable result of pointer subtraction (in ptrdiff_t) is not likely to be present in practice provided that size_t is as large as a pointer. If size_t is as large as a pointer, the only way that the mathematically correct value of P-Q could be (size_t)(-1) (aka SIZE_MAX) is if the object that P and Q refer to is of size SIZE_MAX, which, given the assumption that size_t is the same width as a pointer, implies that the object plus the following byte occupy every possible pointer value. That contradicts the requirement that some pointer value (NULL) be distinct from any valid address, so we can conclude that the true maximum size of an object must be less than SIZE_MAX.
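For illustration, a minimal sketch of the ptrdiff_t semantics described in the second bullet above (the array and its size are made-up examples):

#include <stddef.h>   /* ptrdiff_t */
#include <stdio.h>

int main(void) {
    double a[100];
    double *begin = &a[0];
    double *end = &a[100];        /* one past the end: valid for pointer arithmetic */

    ptrdiff_t d = end - begin;    /* well-defined: both pointers refer to 'a' */
    printf("%td elements apart\n", d);   /* prints 100 */

    /* For a (purely hypothetical) object with more than PTRDIFF_MAX elements,
       end - begin would be undefined behaviour per §6.5.6/9. An implementation
       may define it so that (size_t)(end - begin) is still the mathematically
       correct difference, but that is an extension, not portable C. */
    return 0;
}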

Urion answered 22/3, 2019 at 15:19 Comment(3)
That's a big problem, but the biggest problem is that ssize_t could even be smaller than size_t because there's nothing in the documentation that says otherwise. I think it's a big bug in POSIX. Read my comments to @Joshua 's answer.Lek
@cacahuete: the problem is pretty well-known but you're not going to get much traction complaining about it in a SO comment. In practice, the impact of the problem is limited because the only valid negative value for ssize_t is -1. (I'm going to add that observation to my answer when I get back.) ssize_t is not guaranteed to be the same as ptrdiff_t.Urion
Perfect answer. Yes, I should use ptrdiff_t, which is more suited for that task, but still has the same problem of ironically not being able to represent all possible pointer differences. Good to know that at least they considered it, and decided to let it be UB. Thanks to this question I learnt about it :)Lek

Please note that you can't actually do this (rely on ssize_t being able to hold the size of any object).

The largest possible object on 32-bit x86 Linux is just below 0xB0000000 bytes, while SSIZE_MAX is 0x7FFFFFFF.

I haven't checked whether read() and friends can actually handle the largest possible objects, but if they can, it would work like this:

ssize_t result = read(fd, buf, count);
if (result != -1) {
    size_t offset = (size_t) result;
    /* handle success */
} else {
    /* handle failure */
}

You may find that libc is broken in this respect. If so, the following would work, provided the kernel is well-behaved (it assumes a raw syscall wrapper that returns a negative errno value on failure):

ssize_t result = sys_read(fd, buf, count);   /* raw syscall: returns -errno on failure */
if (result >= 0 || result < -256) {          /* outside the errno range, so it's a byte count */
    size_t offset = (size_t) result;
    /* handle success */
} else {
    errno = (int)-result;                    /* recover the error code */
    /* handle failure */
}
Aphasic answered 22/3, 2019 at 15:13 Comment(5)
Then I guess that answers that there's no guarantee of what I asked, but at the same time this proves a bug in POSIX's ssize_t and SSIZE_MAX, right? ssize_t should be able to hold the size of the largest possible object, or it is useless otherwise.Lek
And for that reason, IMHO, (sizeof(ssize_t) > sizeof(size_t)) && (SSIZE_MAX >= SIZE_MAX) should be true for that type to make sense.Lek
On Linux you'll find that you can't read() or write() more than 2147479552 bytes (2 GB - 4k) at a time. But you're using a loop around your read/write anyway to handle short reads and writes and EINTR so it all works out ok.Bernetta
And at least on some versions of OSX, trying to read/write more than 2 GB at a time fails outright.Bernetta
@janneb: I was wondering if that would be the case. The manual doesn't declare it...Aphasic

I'm going to take this on as an X-Y problem. The issue you have is that you want to compare a signed number to an unsigned number. Rather than casting the result of sizeof to ssize_t, you should check whether your ssize_t value is less than zero. If it is, then you know it is less than your size_t value. If not, then you can cast it to size_t and then do the comparison.

For an example, here's a compare function that returns -1 if the signed number is less than the unsigned number, 0 if equal, or 1 if the signed number is greater than the unsigned number:

int compare(ssize_t signed_number, size_t unsigned_number) {
    int ret;
    if (signed_number < 0 || (size_t) signed_number < unsigned_number) {
        ret = -1;
    }
    else {
        ret = (size_t) signed_number > unsigned_number;
    }
    return ret;
}

If all you wanted was the equivalent of the < operation, you can go a bit simpler with something like this:

(signed_number < 0 || (size_t) signed_number < unsigned_number)

That expression will give you 1 if signed_number is less than unsigned_number, and it limits the branching overhead: it just takes an extra < operation and a logical OR.
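For instance, a hypothetical use of compare() with a POSIX read() return value (the descriptor and buffer are made-up names):

#include <unistd.h>   /* read, ssize_t */

void example(int fd) {
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof buf);

    if (compare(n, sizeof buf) < 0) {
        /* n is -1 (read failed) or a short read: fewer bytes than requested */
    } else {
        /* the buffer was filled completely */
    }
}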

Anadromous answered 22/3, 2019 at 15:32 Comment(8)
Yeah, a bit of overhead to do what could be done with a single < if the people who designed ssize_t weren't braindamaged. :)Lek
@CacahueteFrito: There is only one negative number in ssize_t: -1.Aphasic
@Aphasic Is it a guarantee by the POSIX standard that -1 is the only negative number, or is -1 simply the only negative number that is guaranteed to be storable in an ssize_t?Anadromous
@Aphasic I knew that, and only need that one to return error, and also avoid infinite loops :)Lek
@CacahueteFrito I've edited with a drop-in replacement for the < operation that should have fairly limited overhead.Anadromous
@ChristianGibbons That replacement is ok for checking a return value, but not for a loop condition. That's the same as if I just used size_t for the loop counter, which could lead to an infinite loop.Lek
@CacahueteFrito Could you explain what the use-case would be that would lead to an infinite loop?Anadromous
Read the related question link I added to the question.Lek

ssize_t is a POSIX type; it's not defined as part of the C standard. POSIX specifies that ssize_t must be able to handle numbers in the interval [-1, SSIZE_MAX], so in principle it doesn't even need to be a normal signed type. The reason for this slightly odd definition is that the only place ssize_t is used is as the return value of read()/write()/etc.

In practice it's always a normal signed type of the same size as size_t. But if you want to be really pedantic about your types, you shouldn't use it for purposes other than handling return values of I/O syscalls. For a general "pointer-sized" signed integer type, C89 defines ptrdiff_t, which in practice will be the same as ssize_t.
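For example, a hypothetical sketch of the questioner's loop using ptrdiff_t as the index type:

#include <stddef.h>   /* ptrdiff_t, size_t */

/* Zero an array using a signed index. The cast carries the same caveat as the
   ssizeof macro: it assumes n <= PTRDIFF_MAX, which holds for any object that
   can actually exist on the implementation. */
static void zero_all(double *x, size_t n)
{
    for (ptrdiff_t i = 0; i < (ptrdiff_t)n; i++)
        x[i] = 0.0;
}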

Also, if you look at the official spec for read(), you'll see that for the nbyte argument it says: "If the value of nbyte is greater than {SSIZE_MAX}, the result is implementation-defined." So even if a size_t is capable of representing larger values than SSIZE_MAX, it's implementation-defined behavior to pass larger values than that to the I/O syscalls (the only places where ssize_t is used, as mentioned). And similarly for write() etc.
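As a rough sketch of how a caller typically copes with that limit (an illustrative example, with error handling beyond the -1 check omitted): never pass more than SSIZE_MAX bytes per call and loop over the remainder.

#include <limits.h>    /* SSIZE_MAX (POSIX) */
#include <unistd.h>    /* write, ssize_t */

/* Write all 'len' bytes of 'buf' to 'fd', issuing requests of at most SSIZE_MAX
   bytes so that every return value is representable as ssize_t.
   Returns 0 on success, -1 on error (errno is set by write()). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        size_t chunk = len > (size_t)SSIZE_MAX ? (size_t)SSIZE_MAX : len;
        ssize_t n = write(fd, p, chunk);
        if (n == -1)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}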

Bernetta answered 22/3, 2019 at 15:59 Comment(0)
