Why does byte + byte = int?

Looking at this C# code:

byte x = 1;
byte y = 2;
byte z = x + y; // ERROR: Cannot implicitly convert type 'int' to 'byte'

The result of any arithmetic performed on byte (or short) values is an int, because the operands are implicitly converted to int first. The solution is to explicitly cast the result back to a byte:

byte z = (byte)(x + y); // this works

What I am wondering is why? Is it architectural? Philosophical?

We have:

  • int + int = int
  • long + long = long
  • float + float = float
  • double + double = double

So why not:

  • byte + byte = byte
  • short + short = short?

A bit of background: I am performing a long list of calculations on "small numbers" (i.e. < 8) and storing the intermediate results in a large array. Using a byte array (instead of an int array) is faster (because of cache hits). But the extensive byte casts spread through the code make it that much less readable.

Damato answered 2/6, 2009 at 19:59 Comment(11)
#927891 – Amend
I would guess it's something to do with how easy it would be to overflow a byte with a few additions. However, I would think that'd be left to the programmer, rather than architecturally restricting it like this. – Callable
Are you sure you are not micro-optimizing this? IIRC, on a 32-bit machine a byte will be aligned at 32-bit borders for optimized access, i.e. one byte would actually use 4 bytes in memory. – Balough
The various musings below are a reasonable approximation of the design considerations. More generally: I don't think of bytes as "numbers"; I think of them as patterns of bits that could be interpreted as numbers, or characters, or colors or whatever. If you're going to be doing math on them and treating them as numbers, then it makes sense to move the result into a data type that is more commonly interpreted as a number. – Ivanna
@Eric: That makes a lot of sense for byte, but probably not as much sense for short/ushort. – Gilpin
By the by, if your numbers are always less than 8 you can store two of them per byte and halve your cache misses yet again. – Dewaynedewberry
@Eric: byte1 | byte2 is not at all treating them as numbers. This is treating them precisely as patterns of bits. I understand your point of view, but it just so happens that every single time I did any arithmetic on bytes in C#, I was actually treating them as bits, not numbers, and this behaviour is always in the way. – Ewan
@Dewaynedewberry even though bytes appear to be only 8 bits, C# is probably aligning them on a 4- or 8-byte boundary on the stack when used as local variables. It's exactly the same for register usage: even though registers on x86 are 32 bits, the asm generated from MSIL is not going to pack 4 byte variables into one register. There's going to be lots of loading and saving, just like for 32-bit values. – Buxtehude
Possible duplicate of Integer summing blues, short += short problem – Saunders
@EricLippert (yes, I know it's years later), that doesn't explain why the exact same issue exists for doing logical operations on bytes. – Beadsman
https://mcmap.net/q/87598/-why-is-ushort-ushort-equal-to-int – Telephonist
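The suggestion above about storing two small values per byte could look roughly like this. It is a minimal sketch assuming each value fits in 4 bits (0 to 15, which covers the question's values below 8) and a C# 9+ compiler for top-level statements; `Pack`, `High` and `Low` are hypothetical helper names. The shift and OR operators also produce int, so the explicit (byte) casts remain.

```csharp
using System;

// Hypothetical helper: pack two 4-bit values (0..15) into one byte.
// Shifts and bitwise OR on bytes are evaluated as int, so the result
// has to be cast back to byte explicitly.
static byte Pack(byte high, byte low) => (byte)((high << 4) | (low & 0x0F));

// Hypothetical helpers: extract the upper and lower 4-bit halves again.
static byte High(byte packed) => (byte)(packed >> 4);
static byte Low(byte packed) => (byte)(packed & 0x0F);

byte packed = Pack(5, 7);
Console.WriteLine($"{packed} -> {High(packed)}, {Low(packed)}"); // 87 -> 5, 7
```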
247

The third line of your code snippet:

byte z = x + y;

actually means

byte z = (int) x + (int) y;

So there is no + operation on bytes: the bytes are first converted to integers, and the result of adding two integers is a (32-bit) integer.
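A small sketch to confirm this (assuming a C# 9+ compiler for top-level statements): letting `var` infer the type shows that the sum of two bytes is an `Int32`, and only an explicit cast turns it back into a `byte`.

```csharp
using System;

byte x = 1;
byte y = 2;

// There is no byte + byte operator, so both operands are converted to int
// and the predefined int + int operator is used.
var sum = x + y;
Console.WriteLine(sum.GetType()); // System.Int32

// Narrowing back to byte requires an explicit cast.
byte z = (byte)(x + y);
Console.WriteLine(z); // 3
```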

Empyrean answered 2/6, 2009 at 20:17 Comment(3)
I have tried the code below, but it is still not working: byte z = (byte)x + (byte)y; – Ankledeep
That is because there is no + operation for bytes (see above). Try byte z = (byte)((int)x + (int)y) – Empyrean
The OP asks why a cast is needed. It isn't needed for int/long/uint..., so why just for byte, and also short/ushort? You add two shorts and end up with an int? Then why don't two ints become a long? This sounds more like an inconsistent design decision. – Apportion
185

In terms of "why it happens at all" it's because there aren't any operators defined by C# for arithmetic with byte, sbyte, short or ushort, just as others have said. This answer is about why those operators aren't defined.

I believe it's basically for the sake of performance. Processors have native operations to do arithmetic with 32 bits very quickly. Doing the conversion back from the result to a byte automatically could be done, but would result in performance penalties in the case where you don't actually want that behaviour.

I think this is mentioned in one of the annotated C# standards. Looking...

EDIT: Annoyingly, I've now looked through the annotated ECMA C# 2 spec, the annotated MS C# 3 spec and the annotated CLI spec, and none of them mention this as far as I can see. I'm sure I've seen the reason given above, but I'm blowed if I know where. Apologies, reference fans :(
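The conversion back to a byte that this answer discusses can be observed directly. The sketch below assumes a C# 9+ compiler and the default unchecked overflow setting. An explicit cast keeps only the low 8 bits, a `checked` conversion throws instead, and compound assignment such as `acc += c` compiles because it includes an implicit cast back to byte, as mentioned in the comments.

```csharp
using System;

byte b = 200;
byte c = 100;

// Default (unchecked) context: the int result 300 is truncated to its low 8 bits.
byte wrapped = (byte)(b + c);
Console.WriteLine(wrapped); // 44

// Compound assignment inserts the cast for you, so this compiles and also wraps.
byte acc = b;
acc += c;
Console.WriteLine(acc); // 44

// In a checked context the same narrowing conversion throws instead of wrapping.
bool threw = false;
try
{
    byte strict = checked((byte)(b + c));
    Console.WriteLine(strict);
}
catch (OverflowException)
{
    threw = true;
}
Console.WriteLine(threw); // True
```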

Gilpin answered 2/6, 2009 at 20:08 Comment(12)
I would argue that for symmetry, 32 + 240 becoming 16 is just as logical as int.MaxValue + 1 being int.MinValue (modulo Eric's comment about byte not really being a number so much as a collection of bits). It's nice that we have the concept of a checked context in C#... – Gilpin
Some people don't seem to like this "boring" answer, because it's too practical. They want something more conceptual. To me, this practical answer seems so much more plausible: when you design a spec, you also need to take practical considerations into account. An int is designed to be added using a CPU; a byte is designed to store data. When you do an addition, you use a data type optimized for addition. – Arciniega
@JonSkeet: btw, do you have links for the annotated C# specs? Or do you have hard copies? – Piggish
@Will: I have hard copies - it's well worth getting: amazon.com/dp/0321741765 – Gilpin
Back on topic... my guess would be that it's because a byte is more commonly used to represent exactly 8 bits of information, whereas an int (and long etc) is far more commonly used to represent a whole number. An int is usually more than the sum of its parts, whereas a byte is usually for bit fields, raw memory access etc. It makes sense that if you're using a byte as a number (eg by adding another number) that it's inferred to be a more common numeric type. – Possessed
@JonSkeet the only place I have seen something similar to what you said is "C# in a Nutshell": 8- and 16-bit integral types lack their own arithmetic operators, and the compiler implicitly converts them to a larger type (Int32) as required. – Oaten
Hmm... Don't most 32-bit processor architectures also natively support 8- and 16-bit arithmetic operations? I know for sure that x86 does because I've done it myself many times in x86 assembly. Furthermore, most x86 chips these days are actually 64-bit, so, according to the argument of this answer, all of the types should automatically widen to 64 bits before doing math, which they don't. – Mortal
@reirab: I honestly don't know. I'm sure I've read this efficiency argument somewhere convincing, but I can't remember where right now :( – Gilpin
@JonSkeet: the reason you describe is exactly the case for ARM architecture. ARMv4-based processors can efficiently load and store 8-, 16-, and 32-bit data. However, most ARM data processing operations are 32-bit only. For this reason, you should use a 32-bit datatype, int or long, for local variables wherever possible. Avoid using char and short as local variable types, even if you are manipulating an 8- or 16-bit value (link, page 107). – Wrap
Somehow this works: byte x = 1; byte y = 2; x += y; Console.WriteLine(x); – Hacker
@KevinMuhuri: Yes, because += is a compound assignment operator, which has an implicit cast in it. – Gilpin
I note that there are no IL instructions for it directly anyway, so that may have factored into the C# compiler writers' decisions. Then again, it only requires another conv.u1 instruction, so maybe not. I note that VB.NET does not behave this way, and does force a truncation unless you explicitly cast one of the operands to int. – Sri
70

From the article Why do operations on "byte" result in "int"? on Raymond Chen's blog, The Old New Thing:

Suppose we lived in a fantasy world where operations on 'byte' resulted in 'byte'.

byte b = 32;
byte c = 240;
int i = b + c; // what is i?

In this fantasy world, the value of i would be 16! Why? Because the two operands to the + operator are both bytes, so the sum "b+c" is computed as a byte, which results in 16 due to integer overflow. (And, as I noted earlier, integer overflow is the new security attack vector.)

Raymond is defending, essentially, the approach C and C++ took originally. In the comments, he defends the fact that C# takes the same approach, on the grounds of language backward compatibility.
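For comparison, here is the quoted example in actual C#, as a short sketch (C# 9+ top-level statements assumed). Because the operands are promoted to int, `i` gets the mathematically correct 272, and the fantasy-world result of 16 only appears if you explicitly truncate the sum to a byte.

```csharp
using System;

byte b = 32;
byte c = 240;

// Real C#: b + c is computed as int, so no overflow occurs here.
int i = b + c;
Console.WriteLine(i); // 272

// The "fantasy world" result corresponds to truncating the sum to a byte.
int fantasy = (byte)(b + c);
Console.WriteLine(fantasy); // 16
```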

Peg answered 2/6, 2009 at 20:04 Comment(11)
With integers, if we add them and the sum overflows, it isn't automatically converted to a different data type, so why do it with byte? – Allin
@Ryan: I suppose Microsoft foresaw more problems with byte arithmetic than they did with int math, due to byte's lesser dynamic range. – Peg
With ints it does overflow. Try adding int.MaxValue + 1: you get -2147483648 instead of 2147483648. – Stagey
@Longhorn213: Yep, that's what Ryan's saying: int math can overflow, but int math doesn't return longs. – Peg
Exactly. If this is meant to be a security measure, it's a very poorly implemented one ;) – Gilpin
I'm not sure if Raymond (for once) should be considered the authority on this. Read the comments on the blog entry. His defense of this in C# is basically "because C++ does it that way" blogs.msdn.com/oldnewthing/archive/2004/03/10/87247.aspx#87811. – Herrenvolk
C was designed to run efficiently on very primitive (by our standards) processors that probably had no instructions for integer arithmetic on anything but a standard word size. If you look at "The C Programming Language", it mentions that "int" is usually the "native" word size of the processor the compiler targets. – Sackett
@JonSkeet Is it possible that it's not a security/safety issue, but instead that at the processor level, short+short overflow is undetectable? I.e. if the actual processor instruction .NET maps the operation to is 32-bit, and it is checked arithmetic, the processor will not flag an overflow because AFAIK it wasn't a 32-bit overflow, even though it would be an overflow in terms of short. So in addition to the performance issues you noted in your answer, checked arithmetic would suffer more because .NET couldn't rely on the processor detecting overflows (and would have to check manually). – Flitter
I'm speaking only from a superficial memory of a long-ago processor design class :/ My point being that perhaps overflows are indeed a contributing issue to this design decision, but likely not for the reason this answerer implied. – Flitter
@AaronLS: Checked arithmetic can always be emulated - I don't even know what it looks like at an assembly level for integers, to be honest, but you could certainly add checks. – Gilpin
This reasoning is fallacious... Try replacing int with long and byte with int..., like long v = int.MaxValue + 1; Or perhaps the old problem of double x = 5 / 2;... – Autodidact
61

C#

ECMA-334 states that integer addition is only defined for int+int, uint+uint, long+long and ulong+ulong (ECMA-334 14.7.4). As such, these are the candidate operations to be considered with respect to 14.4.2. Because there are implicit conversions from byte to int, uint, long and ulong, all the addition function members are applicable function members under 14.4.2.1. We then have to find the best implicit conversion using the rules in 14.4.2.3:

Casting (C1) to int (T1) is better than casting (C2) to uint (T2) or ulong (T2) because:

  • If T1 is int and T2 is uint, or ulong, C1 is the better conversion.

Casting (C1) to int (T1) is better than casting (C2) to long (T2) because there is an implicit conversion from int to long:

  • If an implicit conversion from T1 to T2 exists, and no implicit conversion from T2 to T1 exists, C1 is the better conversion.

Hence the int+int function is used, which returns an int.

Which is all a very long way to say that it's buried very deep in the C# specification.
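The effect of these rules can be observed with a small sketch (C# 9+ top-level statements assumed; `TypeName` is a hypothetical generic helper that only reports the static type the compiler inferred). byte + byte binds to int + int, while byte + uint binds to uint + uint, because int + int is not applicable there (uint does not convert implicitly to int) and uint is a better conversion target than long.

```csharp
using System;

// Hypothetical helper: reports the compile-time type inferred for T.
static string TypeName<T>(T value) => typeof(T).Name;

byte x = 1, y = 2;
uint u = 3;
long l = 4;

Console.WriteLine(TypeName(x + y)); // Int32: int + int is the best candidate
Console.WriteLine(TypeName(x + u)); // UInt32: int + int is not applicable, uint beats long
Console.WriteLine(TypeName(x + l)); // Int64: long + long is the best remaining candidate
```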

CLI

The CLI operates only on 6 types (int32, native int, int64, F, O, and &). (ECMA-335 partition 3 section 1.5)

Byte (int8) is not one of those types, and is automatically coerced to an int32 before the addition. (ECMA-335 partition 3 section 1.6)

Episiotomy answered 2/6, 2009 at 23:00 Comment(1)
That the ECMA only specifies those particular operations would not prevent a language from implementing other rules. VB.NET will helpfully allow byte3 = byte1 And byte2 without a cast, but unhelpfully will throw a runtime exception if int1 = byte1 + byte2 yields a value over 255. I don't know if any languages would allow byte3 = byte1+byte2 and throw an exception when that exceeds 255, but not throw an exception if int1 = byte1+byte2 yields a value in the range 256-510. – One
29

The answers indicating some inefficiency adding bytes and truncating the result back to a byte are incorrect. x86 processors have instructions specifically designed for integer operation on 8-bit quantities.

In fact, x86 has dedicated opcodes for 8-bit operations. On x86-64 it is 16-bit and 64-bit operations that need an extra prefix byte to be decoded (an operand-size prefix and a REX prefix, respectively), not 8-bit ones. On 32-bit machines, 16-bit operations carry the same prefix penalty, but there are still dedicated opcodes for 8-bit operations.

Many RISC architectures have similarly efficient native word/byte instructions. Those that don't generally have load instructions that sign- or zero-extend a value of a given bit length.

In other words, this decision must have been based on perception of what the byte type is for, not due to underlying inefficiencies of hardware.

Lycanthropy answered 2/6, 2009 at 21:30 Comment(5)
+1; if only this perception weren't wrong every single time I have ever shifted and OR'd two bytes in C#... – Ewan
There shouldn't be any performance cost for truncating the result. In x86 assembly it is just the difference between copying one byte out of the register or four bytes out of the register. – Nabala
@JonathanAllen Exactly. The only difference is, ironically enough, when performing a widening conversion. The current design incurs a performance penalty to execute the widening instruction (either sign extension or zero extension). – Mortal
"perception of what the byte type is for": that may explain this behavior for byte (and char), but not for short, which semantically is clearly a number. – Christalchristalle
I still don't understand. Why would they limit the potential use of bytes? I have a program which I want to optimize by using bytes instead of ints to save memory, and addition is such a natural thing to want to use. I don't care about the overflow aspect. – Saenz
13

I remember once reading something from Jon Skeet (can't find it now, I'll keep looking) about how byte doesn't actually overload the + operator. In fact, when adding two bytes like in your sample, each byte is actually being implicitly converted to an int. The result of that is obviously an int. Now as to WHY this was designed this way, I'll wait for Jon Skeet himself to post :)

EDIT: Found it! Great info about this very topic here.

Clydeclydebank answered 2/6, 2009 at 20:06 Comment(0)
8

This is because of overflow and carries.

If you add two 8 bit numbers, they might overflow into the 9th bit.

Example:

  1111 1111
+ 0000 0001
-----------
1 0000 0000
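The same carry can be checked in C# with a short sketch (C# 9+ top-level statements assumed): computed as int, the ninth bit is preserved, and casting back to byte discards it.

```csharp
using System;

byte a = 0b1111_1111; // 255
byte b = 0b0000_0001; // 1

int full = a + b;          // 256: the carry lands in bit 8
byte low8 = (byte)(a + b); // 0: only the low 8 bits are kept

Console.WriteLine(Convert.ToString(full, 2)); // 100000000
Console.WriteLine(low8);                      // 0
```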

I don't know for sure, but I assume that ints, longs, and doubles are given more space because they are pretty large as it is. Also, their sizes are multiples of 4 bytes, which are more efficient for computers to handle because the internal data bus is 4 bytes (32 bits) wide (64 bits is getting more prevalent now). Byte and short are a little less efficient, but they can save space.

Fulgor answered 2/6, 2009 at 20:03 Comment(7)
But the larger data types don't follow the same behavior. – Recollection
Issues of overflow are an aside. If you were to take your logic and apply it to the language, then all data types would return a larger data type after addition arithmetic, which is most definitely NOT the case. int + int = int, long + long = long. I think the question is about the inconsistency. – Hopeless
That was my first thought, but then why doesn't int+int = long? So I'm not buying the "possible overflow" argument... yet <grin>. – Damato
Oh, and about the "possible overflow" argument, why not byte + byte = short? – Damato
A) Why does it work the way it works given the rules of C#? See my answer below. B) Why was it designed the way it is? Probably just usability considerations, based on subjective judgements on the way most people tend to use ints and bytes. – Copaiba
@Joseph: Would there be any real problem with a spec that mandated that in all cases where an integer expression which doesn't involve non-constant right shifts could be evaluated with all intermediate results fitting within the largest signed integer type, the expressions should yield the same result as they would if evaluated using such types? It shouldn't be hard for compilers to identify the largest possible types needed at each stage (e.g. if integers are specified to wrap, int1=int2+int3; could use 32 bits, but int1=(int2+int3)/2; should not unless it can keep the 'carry'.) – One
@Joseph: The fact that longVar &= ~0x80000000 works very differently from longVar &= 0x40000000 or longVar &= 0x100000000 represents to me a significant deficiency in the spec. Further, if a language is going to squawk when assigning larger numbers to smaller variables, it should regard the result of x & y, where both types are signed or both unsigned, as being the smaller input type; if one is signed and the other not, the result should be the unsigned type. – One
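The masking pitfall in the last comment can be reproduced with a short sketch (C# 9+ top-level statements assumed; comparing the complements `~0x80000000` and `~0x40000000` is this sketch's choice). `0x80000000` is a `uint` literal, so its complement is the uint `0x7FFFFFFF`, which zero-extends to long and clears the upper 32 bits as well. `0x40000000` is an `int`, so its complement sign-extends and only bit 30 is cleared.

```csharp
using System;

long a = -1; // all 64 bits set
long b = -1;

a &= ~0x80000000; // uint complement 0x7FFFFFFF, zero-extended: upper 32 bits cleared too
b &= ~0x40000000; // int complement, sign-extended: only bit 30 cleared

Console.WriteLine(a.ToString("X16")); // 000000007FFFFFFF
Console.WriteLine(b.ToString("X16")); // FFFFFFFFBFFFFFFF
```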
5

From the C# language spec (1.6.7.5 and 7.2.6.2, Binary numeric promotions), both operands are converted to int if they don't fit into one of several other categories. My guess is they didn't overload the + operator to take byte as a parameter, but wanted it to act somewhat normally, so they just use the int data type.

C# language Spec
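Binary numeric promotion also applies to the bitwise operators, which is why byte | byte needs a cast too (a point raised in the question's comments). A short sketch, assuming C# 9+ top-level statements and a hypothetical `TypeName` helper that only reports the inferred type:

```csharp
using System;

// Hypothetical helper: reports the compile-time type inferred for T.
static string TypeName<T>(T value) => typeof(T).Name;

byte x = 0b1010, y = 0b0101;
short s = -1;
uint u = 1;

Console.WriteLine(TypeName(x | y)); // Int32: bitwise operators are promoted too
Console.WriteLine(TypeName(x & y)); // Int32
Console.WriteLine(TypeName(s + s)); // Int32
Console.WriteLine(TypeName(s + u)); // Int64: uint mixed with a signed type widens to long

byte combined = (byte)(x | y); // an explicit cast is still required
Console.WriteLine(combined);   // 15
```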

Allin answered 2/6, 2009 at 20:13 Comment(0)
4

My suspicion is that C# is actually calling the operator+ defined on int (which returns an int and, in a checked context, throws on overflow instead of wrapping), and implicitly converting both of your bytes/shorts to ints. That's why the behavior appears inconsistent.

Copaiba answered 2/6, 2009 at 20:05 Comment(1)
It pushes both bytes onto the stack, then it executes the "add" instruction. In IL, add "eats" the two values and replaces them with an int. – Nabala
3

This was probably a practical decision on the part of the language designers. After all, an int is an Int32, a 32-bit signed integer. Whenever you do an integer operation on a type smaller than int, it's going to be converted to a 32-bit signed int by almost any 32-bit CPU anyway. That, combined with the likelihood of overflowing small integers, probably sealed the deal. It saves you from the chore of continuously checking for overflow and underflow, and when the final result of an expression on bytes is in range even though some intermediate value is out of range, you still get the correct result.

Another thought: The over/under-flow on these types would have to be simulated, since it wouldn't occur naturally on the most likely target CPUs. Why bother?
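The point about a correct final result despite an out-of-range intermediate value can be illustrated with a short sketch (C# 9+ top-level statements assumed). Averaging two bytes works because the sum is held in an int; emulating byte-sized intermediates with an explicit cast (purely for illustration) gives a different answer.

```csharp
using System;

byte a = 200, b = 100;

// The intermediate sum 300 is an int, so the average is correct and fits in a byte.
byte average = (byte)((a + b) / 2);
Console.WriteLine(average); // 150

// Emulating byte-sized intermediates: 300 wraps to 44 before the division.
byte wrappedAverage = (byte)((byte)(a + b) / 2);
Console.WriteLine(wrappedAverage); // 22
```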

Toddler answered 2/6, 2009 at 20:06 Comment(0)
2

This is for the most part my answer that pertains to this topic, submitted first to a similar question here.

All operations with integral numbers smaller than Int32 are widened to 32 bits before calculation by default. The result is Int32 simply because it is left as it is after the calculation. If you check the MSIL arithmetic opcodes, the only integral numeric types they operate on are Int32 and Int64. It's "by design".

If you want the result back in Int16 format, it makes no difference whether you perform the cast in code or the compiler (hypothetically) emits the conversion "under the hood".

For example, to do Int16 arithmetic:

short a = 2, b = 3;

short c = (short) (a + b);

The two numbers would expand to 32 bits, get added, then truncated back to 16 bits, which is how MS intended it to be.
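The widen, add, truncate sequence is easy to observe with values whose sum does not fit in 16 bits. This is a sketch assuming C# 9+ top-level statements and the default unchecked context; 30000 is just an illustrative value.

```csharp
using System;

short a = 30000, b = 30000;

int wide = a + b;              // 60000: computed in 32 bits
short narrow = (short)(a + b); // -5536: the low 16 bits reinterpreted as signed

Console.WriteLine(wide);
Console.WriteLine(narrow);
```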

The advantage of using short (or byte) is primarily storage in cases where you have massive amounts of data (graphical data, streaming, etc.)

Mousebird answered 5/7, 2009 at 20:32 Comment(0)
1

Addition is not defined for bytes, so they are converted to int for the addition. This is true for most math operations on bytes. (Note: this is how it used to be in older languages; I am assuming that it holds true today.)

Parapet answered 2/6, 2009 at 20:06 Comment(0)
1

I've tested the performance of byte versus int.
With int values:

using System;
using System.Diagnostics;

class Program
{
    private int a, b, c, d, e, f;

    public Program()
    {
        a = 1;
        b = 2;
        c = (a + b);
        d = (a - b);
        e = (b / a);
        f = (c * b);
    }

    static void Main(string[] args)
    {
        int max = 10000000;
        // Stopwatch is more precise than subtracting DateTime.Now values.
        Stopwatch watch = Stopwatch.StartNew();
        Program[] tab = new Program[max];

        for (int i = 0; i < max; i++)
        {
            tab[i] = new Program();
        }
        watch.Stop();

        // Debug.WriteLine calls are removed from Release builds; Console output always appears.
        Console.WriteLine(watch.Elapsed.TotalSeconds);
    }
}

With byte values:

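using System;
using System.Diagnostics;

class Program
{
    private byte a, b, c, d, e, f;

    public Program()
    {
        a = 1;
        b = 2;
        // Each result is an int and must be cast back to byte.
        c = (byte)(a + b);
        d = (byte)(a - b);
        e = (byte)(b / a);
        f = (byte)(c * b);
    }

    static void Main(string[] args)
    {
        int max = 10000000;
        // Stopwatch is more precise than subtracting DateTime.Now values.
        Stopwatch watch = Stopwatch.StartNew();
        Program[] tab = new Program[max];

        for (int i = 0; i < max; i++)
        {
            tab[i] = new Program();
        }
        watch.Stop();

        // Debug.WriteLine calls are removed from Release builds; Console output always appears.
        Console.WriteLine(watch.Elapsed.TotalSeconds);
    }
}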

Here are the results:
byte: 3.57 s / 157 MB, 3.71 s / 171 MB, 3.74 s / 168 MB, with CPU ~30%
int: 4.05 s / 298 MB, 3.92 s / 278 MB, 4.28 s / 294 MB, with CPU ~27%
Conclusion:
byte uses more CPU, but it costs less memory and it's faster (maybe because there are fewer bytes to allocate).

Iolaiolande answered 1/11, 2018 at 11:37 Comment(1)
There are many things wrong with this benchmark; please use BenchmarkDotNet instead, especially for tiny measurements like this. – Musgrove
0

I think it's a design decision about which operation was more common... If byte + byte were byte, many more people might be bothered by having to cast to int when an int result is required.

Notepaper answered 2/6, 2009 at 20:05 Comment(3)
I, for one, am bothered the other way :) I always seem to need the byte result, so I always have to cast. – Ewan
Except you don't have to cast to int. The cast is implicit. Only the other way is explicit. – Hopfinger
@nikie I think you didn't understand my answer. If adding two bytes produced a byte, then to prevent overflows someone would have to cast the operands (not the result) to int before the addition. – Notepaper
0

From .NET Framework code:

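// bytes
private static object AddByte(byte Left, byte Right)
{
    // The sum of two bytes is at most 510, so it always fits in a short.
    short num = (short) (Left + Right);
    if (num > 0xff)
    {
        // Too large for a byte: return the widened value.
        return num;
    }
    // Still fits: narrow back to byte.
    return (byte) num;
}

// shorts (int16)
private static object AddInt16(short Left, short Right)
{
    // short + short is computed as int, so it cannot overflow here.
    int num = Left + Right;
    if ((num <= 0x7fff) && (num >= -32768))
    {
        // In range: narrow back to short.
        return (short) num;
    }
    // Out of range: return the widened int.
    return num;
}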

To hide the cast, you can use an extension method (.NET 3.5 / C# 3.0 and above):

public static class Extensions 
{
    public static byte Add(this byte a, byte b)
    {
        return (byte)(a + b);
    }
}

Now you can write:

byte a = 1, b = 2, c;
c = a.Add(b);

Polymorphous answered 1/2, 2010 at 10:23 Comment(0)
-1

In addition to all the other great comments, I thought I would add one little tidbit. A lot of comments have wondered why int, long, and pretty much any other numeric type doesn't also follow this rule, i.e. return a "bigger" type in response to arithmetic.

A lot of answers have had to do with performance (well, 32 bits is faster than 8 bits). In reality, an 8-bit number is still a 32-bit number to a 32-bit CPU: even if you add two bytes, the chunk of data the CPU operates on is going to be 32 bits regardless, so adding ints is not going to be any "faster" than adding two bytes; it's all the same to the CPU. Now, adding two ints WILL be faster than adding two longs on a 32-bit processor, because adding two longs requires more micro-ops, since you're working with numbers wider than the processor's word.

I think the fundamental reason for byte arithmetic resulting in ints is pretty clear and straightforward: 8 bits just doesn't go very far! :D With 8 bits, you have an unsigned range of 0-255. That's not a whole lot of room to work with; the likelihood that you are going to run into a byte's limitations is VERY high when using them in arithmetic. However, the chance that you're going to run out of bits when working with ints, or longs, or doubles, etc. is significantly lower, low enough that we very rarely encounter the need for more.

Automatic conversion from byte to int is logical because the scale of a byte is so small. Automatic conversion from int to long, float to double, etc. is not logical because those numbers have significant scale.

Poucher answered 3/6, 2009 at 16:17 Comment(3)
This still doesn't explain why byte - byte returns int, or why they don't cast to short... – Tuinenga
Why would you want addition to return a different type than subtraction? If byte + byte returns int, because 255+anything is greater than a byte can hold, it doesn't make sense to have any byte minus any other byte return anything other than an int from a return type consistency standpoint. – Poucher
I wouldn't; it just shows that the above reason is probably not right. If it had to do with "fitting" into the result, then byte subtraction would return a byte, and byte addition would return a short (byte + byte will always fit into a short). If it was about consistency like you say, then short would still suffice for both operations rather than int. Clearly there is a mixture of reasons, not all of them necessarily well thought-out. Or, the performance reason given below may be more accurate. – Tuinenga
