Why not enforce 2's complement in C++?
The new C++ standard still refuses to specify the binary representation of integer types. Is this because there are real-world implementations of C++ that don't use 2's complement arithmetic? I find that hard to believe. Is it because the committee feared that future advances in hardware would render the notion of 'bit' obsolete? Again hard to believe. Can anyone shed any light on this?

Background: I was surprised twice in one comment thread (Benjamin Lindley's answer to this question). First, from piotr's comment:

Right shift on signed type is undefined behaviour

Second, from James Kanze's comment:

when assigning to a long, if the value doesn't fit in a long, the results are implementation defined

I had to look these up in the standard before I believed them. The only reason for them is to accommodate non-2's-complement integer representations. WHY?

Yerga answered 13/4, 2011 at 18:59 Comment(30)
Why are you writing code that depends on signed values being stored as two's-complement is a better question.Perpetua
@yan: Because I know that it's going to run on a 2's-complement processor. Don't be silly.Yerga
The point of having a (more or less) high-level language is not having to care about implementation details. A number is a number and it shouldn't matter how it's implemented. Look at any high-level language spec. Nobody requires anything about the inner workings of implementations.Pratincole
Voting to close: this is a highly subjective question about decisions the Standards Committee has made and set in stone for compiler makers to implement...Peculate
@delnan: No, there are plenty of problems that require bit-fiddling. Have you ever written a chess program? Or an arbitrary-precision floating-point library? Both of these would be much harder to write if you couldn't rely on the fact that the processor uses 2's complement arithmetic. They would be much slower too.Yerga
@tommieb75: I am gobsmacked. Unless you're joking...Are you?Yerga
@delnan: that's not true at all. The representation of unsigned integer types is hedged about by a whole bunch of requirements, the only implementation freedom is to add padding bits. The representation of signed types slightly less so, but still only 3 representations are allowed (2's complement, 1s' complement, sign-magnitude). If it were true that nobody required anything of the inner workings, then why that list of 3?Chun
There's this SO question where a couple people mention having used one's complement machines, but generally only in the distant past: #162297 Anecdotal data and small sample size, but it sure feels like there's nothing left using one's complement or sign-magnitude (for ints anyway).Transponder
TonyK: ok... no need to get pedantic... we're small fry unlike those that are getting their heads wrecked over the going-ons on the C++ committee... we should not have to worry about the implementations - just write the bloody code and leave it up to the compiler manufacturer to sort it out...Peculate
@TonyK: If you ask me, I'd be glad to completely move C++ to the application development level and leave the very heavy lifting that requires such trickery to C. But granted, these absolutely performance-critical programs are exceptions. Although they're not the majority of the programs and I've seen way too many "How do I optimize the living heck out of this code that will run once a week for three seconds" questions not to be careful whenever someone claims to need this level of optimization.Pratincole
@TonyK, Knowing that you are running on a processor or another isn't per itself a good reason to depend on it.Trevor
Why are you asking about piotr's comment when James Kanze clearly explained that the comment in question is wrong?Saddlecloth
@yan, @delnan: I agree with @TonyK. There are all sorts of places that you can take advantage of 2's-complement representation. FIR filters and Viterbi decoders are just two, off the top of my head. Anywhere where you'd like to be able to have x & 0xF return a value between 0 and 15 in a sane manner is another.Action
@Oli: How could x & 0xF possibly not return a value between 0 and 15? C++ does require a binary representation, it just doesn't specify the negative representation.Saddlecloth
@Ben, by "sane manner", I mean akin to what people usually want x % 16 to do (i.e. the mathematical definition of "mod").Action
@Ben Voigt: piotr's comment, although technically imprecise, was correct in substance. Right shift on signed type is implementation-defined, not undefined behaviour. A nitpick.Yerga
@Oli, so write x % 16 and let the optimizer do the transform if it is valid.Trevor
@TonyK: No. Right shift on a negative value is implementation-defined. Right shift on a non-negative value stored in a signed type is completely defined. The purpose of the ambiguity is likely to allow either round-to-zero or round-down behavior.Saddlecloth
@AProgrammer: No, because it's implementation-defined whether x % 16 acts like mathematical mod!Action
@Oli: note that in C++0x (as in C99), it's no longer implementation-defined. % is now guaranteed not to act like mathematical modulus for negative numbers, since / is now guaranteed to truncate towards zero.Chun
@Ben Voigt: No. In the context of piotr's comment, the value may be negative. Right?Yerga
@Steve: Interesting, I don't think I knew that.Action
@TonyK: piotr was wrong on two counts: (1) He said it's undefined behavior, when it's actually implementation-defined. You recognized that. (2) He said the boundaries of well-defined behavior depend on the type, actually they concern the value. But since you knew the comment was imprecise, why did you quote it in your question without fixing it? This whole question is perched precariously on the edge of pedanticism.Saddlecloth
@Steve: Isn't that the difference between the modulus and remainder operations? @Oli: You just defined sane as "return a value between 0 and 15". It does. So the standard IS sane, and expecting it to return the remainder of division by 16 is not.Saddlecloth
@Ben: shrug - true modulo arithmetic in mathematics is done using equivalence classes of numbers, not numbers themselves. It doesn't care what representative member you use for each equivalence class, but Oli is correct that it isn't all that unusual for a programmer to want the least non-negative member. Whether you call that "modulus", "remainder" or whatever, it's the thing that x & 0xf gives you if your integers are 2's complement. If they aren't, you need ((unsigned int)x) & 0x0f instead.Chun
And I don't believe Oli did "define" sane as you said he did. In his first comment he left it undefined as a desired property beyond being in the range 0 to 15. In his second comment he defined it in terms of what (unspecified) people "usually" want from a modulus. There's surely no point arguing with him on the basis of a definition that, even if you can read his comments to have said it at first, certainly isn't what he meant.Chun
@Steve: Ok, I can see that x & 0x0f doesn't preserve equivalence classes in one's complement or sign-magnitude, when the input is negative. But it does give a non-negative number, and the remainder, given a negative input, is negative, even for two's complement, so x & 0x0f doesn't yield remainder with a negative argument, ever. That's why I'm surprised this came up in a discussion which apparently concerns representation of negative numbers.Saddlecloth
"the remainder, given a negative input, is negative" according to mathematicians, do you mean? Not necessarily, but you introduced the term 'remainder', not Oli, so I guess you can define it how you like. What of it? ;-)Chun
By the way, "because I know my target system has [feature]" is a horrible reason to write code that relies on [feature]". Good reasons are of the shape "because my logic/solution so much more clear/elegant/readable/verifiable if I can rely on [feature] that it makes the dependency on [feature] worth it".Mellophone
For the record, if you have good reason to rely on bitwise representation details like 2s complement, C since '89 has already had a clear and standard and portable way to express that - use unsigned integers, either for all operations, or just cast to unsigned for operations where the bitwise representation matters. Negative values are guaranteed to convert to unsigned integer values as-if by 2s complement, even on non-2s complement machines (it's not worded so simply, I think to get the most rigor and least ambiguity concisely, but that's what it effectively means).Mellophone

(Edit: C++20 now mandates 2's complement representation; note that overflow of signed arithmetic is still undefined, and shifts continue to have undefined or implementation-defined behaviour in some cases.)

  • A major problem with defining behaviour that is currently undefined is that existing compilers were built on the assumption that it is undefined. Changing the standard won't change the compilers, and reviewing them to find every place where the assumption was made is a difficult task.

  • Even on 2's complement machines, you may have more variety than you think. Two examples: some don't have a sign-preserving right shift, just a right shift which introduces zeros; a common feature in DSPs is saturating arithmetic, where assigning an out-of-range value clips it at the maximum rather than just dropping the high-order bits.

Trevor answered 13/4, 2011 at 19:9 Comment(7)
+1, explains why the OP's claim "The only reason for them is to accommodate non-2's-complement" is false, which is probably the closest this question will get to a definitive answer, even if it is only to the supplementary background :-)Chun
This is much the most helpful response to my question! But still: are there any implementations of C++ for such architectures? And are there likely to be, ever?Yerga
@TonyK: don't forget that even today, and even though it hasn't adopted all C99 features, one of the goals of C++0x is a kind of limited "compatibility" with C. Surely there are C implementations for some of those DSPs with saturating arithmetic, and certainly could be C++ implementations without too much bother, given the flexibility of GCC or LLVM. So the fact that out-of-range assignment is implementation-defined does have practical consequences, I reckon.Chun
This is a good point. Historically, the C++ Standard is more about getting the behavior down on paper rather than dictating what the behavior should be. So if implementations vary, then that variation should be noted in the standard.Oceanic
For what kind of architecture? Non 2's complement? -- I don't think so, AFAIK the only one still built is by Unisys and has a C compiler but not a C++ one. Saturating arithmetic? I'd not be surprised that it exists in vector extension for common architecture.Trevor
@AProgrammer: maybe the C++ committee is still secretly hoping that Unisys or Comeau/EDG will get around to putting out a C++ compiler based on that C compiler ;-)Chun
Re: narrowing conversions like long to short: that's never undefined behaviour, but it is implementation defined when the destination is signed so your example of saturating conversion is possible (for an implementation that defines that behaviour to always happen, never truncation.) Conversion from out-of-range float to int or unsigned is undefined behaviour, but conversion between integral types is guaranteed not to be UB.Acklin

I suppose it is because the Standard says, in 3.9.1[basic.fundamental]/7

this International Standard permits 2’s complement, 1’s complement and signed magnitude representations for integral types.

which, I am willing to bet, came along from the C programming language, which lists sign and magnitude, two's complement, and one's complement as the only allowed representations in 6.2.6.2/2. And there certainly were 1's complement systems around when C was widespread: UNIVACs are the ones most often mentioned, it seems.

Colucci answered 13/4, 2011 at 19:3 Comment(7)
Because there is simply no reason to do so.Aussie
@delnan - What would the standard benefit from banning specific kinds of hardware? If you want to write code for 2's complement only, you are free to do so. Why forbid everyone else from doing otherwise?Devitrify
@Bo @Michael: That's an answer for the OP, not for me ;)Pratincole
@Bo: the standard does ban specific kinds of hardware, or rather it demands that their C++ implementations behave like other kinds of hardware, even if doing so is inefficient. If I want 8-bit excess-127 bytes in my C++ implementation then I'm out of luck, the standard bans that.Chun
@BoPersson the biggest benefit to the standard would be substantially empowering the majority of developers who are interested in writing code that works, reliably and consistently, on all standard-compliant compilers, by significantly simplifying the edge-cases they have to worry about. I spend hours thinking through how to thread the needle between various undefined, unspecified, and implemention-defined behaviors in C - hours I don't have to lose in other languages. Every guarantee from the language is a factor removed from the combinatoric explosion of "how could this code misbehave?"Mellophone
@Mellophone 2's complement is standard in C++ by now. With some notorious exceptions (with known UB due to ill-formed programs), C++ is very predictable compared to other languages. E.g. it has special strengths in resource management with RAII and abstractions from very low level to very high level.Addle
@Addle right, I know, but the question I was answering was "what benefit does the standard get from [things which disadvantage different hardware, such as making 2s complement behavior standard]?" and C, especially old-school and portable C, is a good example of not having those benefits. As for C++'s predictability... it really depends on what you're comparing to, and what things we're talking about predicting, but hashing that out seems out-of-scope for this comment section.Mellophone

It seems to me that, even today, if you are writing a broadly-applicable C++ library that you expect to run on any machine, then 2's complement cannot be assumed. C++ is just too widely used to be making assumptions like that.

Most people don't write those sorts of libraries, though, so if you want to take a dependency on 2's complement you should just go ahead.

Oceanic answered 13/4, 2011 at 19:7 Comment(1)
But as AProgrammer points out, even being a two's-complement machine doesn't assure the behavior TonyK is asking about. Assignment of an out-of-range value can saturate instead of discarding high bits, and signed right-shift can round-toward-zero (add the most significant of the shifted-out bits back in) instead of round-down (discard shifted out bits). These behaviors are perfectly possible on two's-complement hardware.Saddlecloth

Many aspects of the language standard are as they are because the Standards Committee has been extremely loath to forbid compilers from behaving in ways that existing code may rely upon. If code exists which would rely upon one's complement behavior, then requiring that compilers behave as though the underlying hardware uses two's complement would make it impossible for the older code to run using newer compilers.

The solution, which the Standards Committee has alas not yet seen fit to adopt, would be to allow code to specify the desired semantics in a fashion independent of the machine's word size or hardware characteristics. If support for code which relies upon ones'-complement behavior is deemed important, design a means by which code could expressly demand ones'-complement behavior regardless of the underlying hardware platform. If desired, to avoid overly complicating every single compiler, specify that certain aspects of the standard are optional, but require conforming compilers to document which aspects they support. Such a design would allow compilers for ones'-complement machines to support both two's-complement and ones'-complement behavior depending upon the needs of the program. Further, it would make it possible to port such code to two's-complement machines whose compilers happened to include ones'-complement support.

I'm not sure exactly why the Standards Committee has not yet provided any way for code to specify behavior in a fashion independent of the underlying architecture and word size (so that code wouldn't have some machines use signed semantics for comparisons where others use unsigned semantics), but for whatever reason it has yet to do so. Support for ones'-complement representation is but a part of that.

Bywaters answered 25/10, 2014 at 4:45 Comment(0)
