Is `long` guaranteed to be at least 32 bits?
By my reading of the C++ Standard, I have always understood that the sizes of the integral fundamental types in C++ were as follows:

sizeof(char) <= sizeof(short int) <= sizeof(int) <= sizeof(long int)

I deduced this from 3.9.1/2:

  1. There are four signed integer types: “signed char”, “short int”, “int”, and “long int.” In this list, each type provides at least as much storage as those preceding it in the list. Plain ints have the natural size suggested by the architecture of the execution environment

Further, the size of char is described by 3.9.1/ as being:

  1. [...] large enough to store any member of the implementation’s basic character set.

1.7/1 defines this in more concrete terms:

  1. The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set and is composed of a contiguous sequence of bits, the number of which is implementation-defined.

This leads me to the following conclusion:

1 == sizeof(char) <= sizeof(short int) <= sizeof(int) <= sizeof(long int)

where sizeof tells us how many bytes the type is. Furthermore, it is implementation-defined how many bits are in a byte. Most of us are probably used to dealing with 8-bit bytes, but the Standard says there are n bits in a byte.


In this post, Alf P. Steinbach says:

long is guaranteed (at least) 32 bits.

This flies in the face of everything I understand the size of the fundamental types to be in C++ according to the Standard. Normally I would just discount this statement as a beginner being wrong, but since this was Alf I decided it was worth investigating further.

So, what say you? Is a long guaranteed by the standard to be at least 32 bits? If so, please be specific as to how this guarantee is made. I just don't see it.

  1. The C++ Standard specifically says that in order to know C++ you must know C (1.2/1) 1

  2. The C++ Standard implicitly defines the minimum range of values a long can accommodate to be LONG_MIN through LONG_MAX 2

So no matter how big a long is, it has to be big enough to hold LONG_MIN to LONG_MAX.

But Alf and others are specific that a long must be at least 32 bits. This is what I'm trying to establish. The C++ Standard is explicit that the number of bits in a byte is not specified (it could be 4, 8, 16, 42...). So how is the connection made from being able to accommodate the range LONG_MIN through LONG_MAX to being at least 32 bits?


(1) 1.2/1: The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

  • ISO/IEC 2382 (all parts), Information technology – Vocabulary
  • ISO/IEC 9899:1999, Programming languages – C
  • ISO/IEC 10646-1:2000, Information technology – Universal Multiple-Octet Coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane

(2) Defined in <climits> as:

LONG_MIN -2147483647 // -(2^31 - 1)
LONG_MAX +2147483647 //   2^31 - 1
Microscope answered 1/12, 2010 at 22:24 Comment(22)
I've never even heard of a byte that wasn't 8 bits long.Swan
@Mark: There are definitely machines that can address and work only with quantities greater than 8 bits, e.g. 32 bit words, there it's common to use char == int == long == 32 bit.Hippie
@Mark Storer: Bytes that aren't 8 bits are rare nowadays, but there used to be a greater variety of systems around. A computer with 36-bit words would have a 9-bit byte. The old CDC Cyber systems with 60-bit words would have to have a 60-bit byte (which would cause problems) if they ever got a C compiler, but normally characters took 6 bits (there was a 6/12-bit scheme if you really wanted lowercase).Backcourt
@Mark Storer: Texas Instruments has a series of DSPs with C++ compilers and 16-bit byte. Or they used to have, a few years back. There's also an anachronistic beast with 9-bit byte, with ancestry going back to middle ages. "Unisys"? Not sure. I could look it up if you're really interested. Cheers,Bodycheck
@John: for those who read the commentry in that post, please do note that in spite of the sarcastic tone of the troll at first I just pointed him to the C standard, which I think was a pretty succinct and purely technical answer. I was unable to satisfy him, however.Bodycheck
Well, I guess if someone invents some way to store (2^32)-1 distinct values in fewer than 32 bits, then there might not be 32 bits in a long. However, on any binary platform, as long as mathematics is valid, you will have 32 bits.Henkel
@Anon.: Yep, that would do itMicroscope
@Alf: Succinct, definitely. Correct, yes, as has been shown in this post. I would suggest however that you did leave a lot of steps out, and were rather curt in the "says so in the C std" responses. I didn't see the connection either, so I can understand his confusion.Microscope
Yes: See stackoverflow.com/questions/271076/… sizeof(long) * CHAR_BIT >= 32Profession
@John: thanks for your thorough treatment of this topic. I too am convinced now, and I had the same understanding you did to begin with. The main language in the standard (non-footnotes, etc.) seems to leave enough variables in play that I thought the safer play was to assume no specific sizes ever, in general. I was also surprised to see something like LONG_MAX having a value specified by the standard -- I thought the point of such a macro was to enable implementation definition.Equiprobable
@Alf: I could just point someone to the X standard to answer any C++ question. It's not an answer. It's not unreasonable to list a section/paragraph, etc., when you are citing the standard. You brought the first sarcastic comments to the discussion. You should try earning some respect for being a decent human being in addition to a C++ God.Equiprobable
@John: re your point 4, "In C++ integral types are stored in 2's complement in the underlying representation" no, they're guaranteed binary representation, but not guaranteed 2's complement form. C and C++ support 2's complement, 1's complement and magnitude-&-sign. I'm happy for the insight and SO rep everybody's gained from this. :) Unhappy to see the troll's followed here. :(Bodycheck
@John: also, sorry, re your summary points 2 and 3, the defined range is a defined minimum range. that is, the standard doesn't require those specific values, it allows any range that is that greater or greater.Bodycheck
@Alf: Did you fail to read #6?Microscope
@John: no, I read it. conclusion is OK. Cheers,Bodycheck
@John Your answer should be pulled out into an answer, not edited to the top of your question. (Hence why it's called a 'question').Onestep
@George: And should that answer be marked CW?Microscope
@John Diblings Heavens no, you deserve reputation if you have the right answer. However, if you are just compiling what another person has partially (or totally) said, then you should give them the 'accepted' checkmark and edit their answer with any additional commentary.Onestep
@George: Problem with the currently accepted answer is that the complete answer was only arrived at through long conversation with many participants. My answer is simply a distillation of everything said by everybody. It is more concise and more complete than MSN's actual answer, and directly addresses concerns from people who, like me, doubted that C++ inherited things from the C standard. Anyway, I'll just leave it the way it is for now.Microscope
In other words, its not my answer -- i just posted it.Microscope
This site would be 300% better if no one could edit other people's content -- the obsessive compulsive editors almost never have appreciation for the subtleties of understanding that are changed by their edits.Equiprobable
In C, it's not necessarily the case that sizeof (int) <= sizeof (long int); int could have additional padding bits that could make it bigger than long int. I'm not sure whether this applies to C++. In any case, no sane implementation would do this.Estuary
M
17

The answer is definitively YES. Here's the short version; if you doubt or question any of it, read the OP and the entire comment thread to understand why exactly. Otherwise accept this as true:

  1. The C++ standard includes parts of the C standard, including the definitions for LONG_MIN and LONG_MAX
  2. LONG_MIN is defined as no greater than -2147483647
  3. LONG_MAX is defined as no less than +2147483647
  4. In C++ integral types are stored in binary in the underlying representation
  5. In order to represent every value from -2147483647 through +2147483647 in binary, you need at least 32 bits (31 magnitude bits plus a sign bit).
  6. A C++ long is guaranteed to be able to represent the minimum range LONG_MIN through LONG_MAX

Therefore a long must be at least 32 bits1.

EDIT:

LONG_MIN and LONG_MAX have values with magnitudes dictated by the C standard (ISO/IEC 9899:TC3) in section §5.2.4.2.1:

[...] Their implementation-defined values shall be equal or greater in magnitude [...] (absolute value) to those shown, with the same sign [...]

— minimum value for an object of type long int
LONG_MIN -2147483647 // -(2^31 - 1)
— maximum value for an object of type long int
LONG_MAX +2147483647 // 2^31 - 1

1 32 bits: This does not mean that sizeof (long) >= 4, because a byte is not necessarily 8 bits. According to the Standard, a byte is some unspecified (platform-defined) number of bits. While most readers will find this odd, there is real hardware on which CHAR_BIT is 16 or 32.

Microscope answered 2/12, 2010 at 16:51 Comment(7)
I was asked to make this an answer on its own, hence i will accept this when time elapsesMicroscope
The C standard includes guarantees on the minimum magnitude of those values. LONG_MAX must be at least +2147483647, and LONG_MIN must be at most -2147483647. Google for e.g. C standard minimum value of LONG_MAX if you don't believe me.Calif
There are two technicalities that you should mention. First, the signed ranges are symmetric (-2147483647 ... +2147483647 instead of -2147483648 ... +2147483647) to allow for the possibility that signed integers might not use two's complement. The very latest C and C++ standards still consider this a realistic possibility, even though the last commercial non-two's-complement machine went out of production sometime in the 1970s (one of the UNIVAC series, not sure exactly which). (cont'd)Tench
Second, and rather more importantly, a long being at least 32 bits does not mean that sizeof(long) >= 4. There are real machines for which CHAR_BIT is 16 or 32, and therefore sizeof(long) might be as small as 2 or 1. Unlike the one's complement mainframes and the 9-bit minis, these are still in production AFAIK. (Mostly they are unusual microcontrollers. Yes, working with them is kind of a pain.)Tench
@Zack: To your second point, I do say in the OP that, "Furthermore, it is implementation-defined how many bits are in a byte. Most of us are probably used to dealing with 8-bit bytes, but the Standard says there are n bits in a byte." However I guess that's a bit buried and it would be useful to mention it more explicitly in the answer.Microscope
@zwol: If the Standard would allow such a thing, there could be practical advantages to having LONG_LONG_MIN be -0x7FFF000000000000, such that all signed numbers of the form 0x8000xxxxxxxxxxxx would behave as NaN and could thus be used for overflow trapping (if the only NaN was 0x8000000000000000, code which produced an overflow would need to reprocess the lower bits after the upper bits to store a NaN value; extending the NaN range would allow 16-bit machines to do math on the lower bits of a long long before worrying about whether the upper bits of the result should yield a number or NaN.Flatus
I think adding a reference to §3.9.1.3 in the C++ standard to your answer, would make sense, since this states: "The signed and unsigned integer types shall satisfy the constraints given in the C standard, section 5.2.4.2.1." And in the C standard §5.2.4.2.1 it states the minimum range as you wrote. But without the note from §3.9.1.3 in the C++ standard the rules enforced in §5.2.4.2.1 in the C standard, would count for C only and not C++.Dolliedolloff
C
38

C++ uses the limits defined in the C standard (C++: 18.3.2 (c.limits), C: 5.2.4.2.1):

LONG_MIN -2147483647 // -(2^31 - 1)
LONG_MAX +2147483647 //   2^31 - 1

So you are guaranteed that a long is at least 32 bits.

And if you want to follow the long circuitous route to whether LONG_MIN/LONG_MAX are representable by a long, you have to look at 18.3.1.2 (numeric.limits.members) in the C++ standard:

static constexpr T min() throw(); // Equivalent to CHAR_MIN, SHRT_MIN, FLT_MIN, DBL_MIN, etc.
static constexpr T max() throw(); // Equivalent to CHAR_MAX, SHRT_MAX, FLT_MAX, DBL_MAX, etc.

I moved the footnotes into the comment, so it's not exactly what appears in the standard. But it basically implies that std::numeric_limits<long>::min()==LONG_MIN==(long)LONG_MIN and std::numeric_limits<long>::max()==LONG_MAX==(long)LONG_MAX.

So, even though the C++ standard does not specify the bitwise representation of (signed) negative numbers, every representation it permits (two's complement, ones' complement, or sign-and-magnitude) needs at least 32 bits of storage to cover the range LONG_MIN through LONG_MAX.

Chiffon answered 1/12, 2010 at 22:32 Comment(16)
This is roughly what you get if you follow the incredibly long comment chain in that post through all the ad-hominem attacks. Nice to see it can be explained succinctly and without calling anybody names.Considering
Where in the C++ standard does it say that a long must accomodate the values [LONG_MIN, LONG_MAX]? I can't find any such reference.Microscope
The section "Sizes of integer types" describes LONG_MIN as "minimum value for an object of type long int" (and similarly for LONG_MAX).Henkel
However, note that these values do not encompass the entire two's-complement range found on most modern chips. Where is -(2^31)?Contemporaneous
@Anon.: Is this a section in the C++ Standard you refer to, or the C Standard? If C++, could you please give me a section number? I searched my PDF for "sizes of int" and found no hits.Microscope
@pst: Note that the limits are allowed to be larger than those values. A conforming implementation could allow -2^31, or it could allow +2^31.Henkel
@John Dibling, there's a lot of bouncing around between C and C++ standards. The sizes of integer types section is from the C standard. Even in the C++ standard, there are references back to the ISO C standard demonstrated by "SEE ALSO: ISO C subclass xxx"Bakemeier
@birryree, @Anon.: If this is from the C standard, how does it apply to C++?Microscope
@John, I edited my comment, but the C++ standard makes many references back to the ISO C standard. Annex C of C++03 discusses compatibility/incompatibilities between C and C++ standards, and they do not mention any differences between the two in type widths. This combined with what the standard says about <climits> having the same contents as <limits.h> (S 18.2.2) indicates to me that the type widths declared in the C standard also apply to C++.Bakemeier
C++: 18.3.2 (c.limits) C: 5.2.4.2.1 Numerical limitsChiffon
Section 1.2 of the C++ standard states that the C standard, and in particular, the Standard C Library, are "indispensable for the application of this document". While nothing explicitly says that a "C long" and a "C++ long" have to be the same thing, I don't see any way you can argue against that.Keirakeiser
@Kristopher - thanks for pointing that out - now I see that they do refer to the C99 standard in C++03 (don't know about C++98)Bakemeier
@Kristopher: I'm not trying to argue for or against anything. I'm trying to establish the truth.Microscope
I updated my answer to include how you would verify that (long)LONG_MAX==LONG_MAX and (long)LONG_MIN==LONG_MIN.Chiffon
@Kristopher: Thanks for the reference. This was a key missing piece. However, there is still one major piece remaining, and I'll update my OP to focus in on this.Microscope
Accepting this as the answer because the post combined with the comments from everybody helped me put all the pieces together. Thanks.Microscope
B
8

The C++ standard notes that the contents of <climits> are the same as the C header <limits.h> (18.2.2 in ISO C++03 doc).

Unfortunately, I do not have a copy of the C standard that existed pre-C++98 (i.e. C90), but in C99 (section 5.2.4.2.1), <limits.h> has to have at least these minimum values. I don't think this changed from C90, other than C99 adding the long long types.

— minimum value for an object of type long int

LONG_MIN -2147483647 // −(2^31 − 1)

— maximum value for an object of type long int

LONG_MAX +2147483647 // 2^31 − 1

— maximum value for an object of type unsigned long int

ULONG_MAX 4294967295 // 2^32 − 1

— minimum value for an object of type long long int

LLONG_MIN -9223372036854775807 // −(2^63− 1)
Bakemeier answered 1/12, 2010 at 22:35 Comment(2)
I don't have the C standard either, (I'm a C++ guy). But let's assume that what you've posted here is what applies to C++. I'm trying to connect the dots between the C standard and the C++ standard and figure out ultimately the chain of references that say in no uncertain terms that a long has to be at least 32 bits. Let's just simplify that and say that a long has to accommodate at least the range LONG_MIN through LONG_MAX.Microscope
@John - I think this discussion is going to continue in MSN's answer more than in mine, so I think all the good answers will be in there.Bakemeier
C
8

Yes, the C++ standard is explicit that the number of bits in a byte is not specified. The number of bits in a long isn't specified, either.

Setting a lower bound on a number is not specifying it.

The C++ standard says, in one place:

1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long).

It says, in effect, in another place, via inclusion of the C standard:

CHAR_BIT >= 8; SHORT_BITS >= 16; INT_BITS >= 16; LONG_BITS >= 32

(except that of these only CHAR_BIT actually exists as an identifier; SHORT_BITS, INT_BITS and LONG_BITS don't, and those limits are inferred from the minimum value ranges required for the types.)

This follows from the fact that a certain number of bits are required, mathematically, to encode all of the values in the (e.g. for longs) LONG_MIN..LONG_MAX range.

Finally, shorts, ints and longs must all be made up of an integral number of chars; sizeof() always reports an integral value. Also, iterating through memory char by char must access every bit, which places some practical limitations.

These requirements are not inconsistent in any way. Any sizes that satisfy the requirements are OK.

There were machines long ago with a native word size of 36 bits. If you were to port a C++ compiler to them, you could legally decide to have 9 bits in a char, 18 in both short and int, and 36 in long. You could also legally decide to have 36 bits in each of those types, for the same reason that you can have 32 bits in an int on a typical 32-bit system today. There are real-world implementations that use 64-bit chars.

See also sections 26.1-6 and 29.5 of the C++ FAQ Lite.

Calif answered 1/12, 2010 at 23:52 Comment(2)
True, but my question was does a long have to be at least 32 bits, not exactly 32 bitsMicroscope
And the answer is "yes, 'at least'". All over my answer, values are specified in terms of lower bounds, not exact quantities (except for sizeof(char), because as far as C++ is concerned, char s are bytes, but bytes are not necessarily octets). Because that's how the Standard specifies them.Calif
B
7

But Alf and others are specific that a long must be at least 32 bits. This is what I'm trying to establish. The C++ Standard is explicit that the number of bits in a byte is not specified. Could be 4, 8, 16, 42... So how is the connection made from being able to accommodate the numbers LONG_MIN through LONG_MAX to being at least 32 bits?

You need 32 bits in the value representation in order to get at least that many bit patterns: the range -2147483647 through +2147483647 contains 2^32 - 1 distinct values, and n bits can distinguish at most 2^n values, so n must be at least 32. And since C++ requires a binary representation of integers (explicit language to that effect in the standard, §3.9.1/7), Q.E.D.

Bodycheck answered 1/12, 2010 at 23:22 Comment(2)
And why do you say that is that many bit patterns is required? There's a LOT of steps you have to add before you can write "Q.E.D."Restricted
@MooingDuck: no, with common arithmetic there are no intermediate steps. the question is "how is the connection made", and it goes like this: 2^n = M gives directly n = log2(M). see, no intermediate step. well, except if you want to calculate it on a calculator with no log2 button. then log2(M) = ln(M)/ln(2). :-)Bodycheck
