How does Java's bit shift operator work under the hood?
Asked Answered
L

2

7

I didn't study IT, and only very recently came across bit shifts and an application for two's complement. So, can you please use simple English in your explanations and assume I know hardly anything about IP addresses, bit operations, and Java datatypes?

Today, I found the following piece of code (abbreviated):

long m = (-1) << (byte) 16;

Now, this is for IP subnet masking. I know I need to start out with 4 blocks of 8 bits (i.e. 4 bytes), and all bits have to be "switched on": 11111111 11111111 1111111 1111111 Next, zeros are shifted in from the right, in this case 16 bits' worth; so we get 11111111 11111111 00000000 0000000, the mask.

But I do have a few questions:

  1. Does the 16 have to be of type byte for this to work?
  2. The result is of type long. When the expression above runs, -1 gets converted into - effectively - 4x8bit blocks. How does Java know it needs 32 positions/bits (an IP address' length), and not, say, 16, or 8, when applying two's complement? (I'm guessing that has to do with the long datatype?)
  3. Why is two's complement applied to -1 to begin with? (Google gives you -0b1 if you ask it what -1 is in binary. I first thought it might be to do with overflow, but it isn't, is it...?)
  4. Really, what datatypes does the compiler convert this to while it's running the code, to make it all work?

UPDATE: The 16 is produced at runtime by a method; I just put a constant in here as an example. In hindsight probably a bad idea...

Logistics answered 8/9, 2015 at 15:31 Comment(4)
A 32 bit value would fit in a register, and the bitshifting operation would typically be performed by the processor itself, something that would in assembly look like shl AX, 16. At that level there aren't really data types. You just got blocks of 8, 16, 32 or 64 bits.Strangeness
32 bits almost definitely has more to do with the fact int is 32bits wide than that is the length of an IP (also, as far as I remember, when bitshifting in Java, everything is automatically propagated to int)Sparhawk
None of your four questions have anything to do with how shifts work under the hood, so you should probably use a more descriptive title.Calistacalisthenics
@harold: Sorry. To me, it somehow did when I came up with the question, though admittedly, I know very, very little about the world of bits and bytes (as you can probably tell). Sotirios Delimanolis did come up with references about what types are applied to bit shifts in Java, so I think the title isn't too wrong. But I'm happy to rephrase it. Any suggestions? :)Logistics
M
6

Really, what datatypes does the compiler convert this to while it's running the code, to make it all work?

This

(-1) << (byte) 16;

is a constant expression. Its value is known at compile time. It's a long with the value -65536 (in decimal representation).

If the expression wasn't a constant expression, the type of the variable wouldn't matter when evaluating the expression. It would only matter later when its value is assigned to the variable.

Take for example

int i = -1;
long m = i << (byte) 16;

The expression above is one that involves a shift operator and two operands, one of type int and another of type byte.

The JLS states the following concerning shift operators and their operands

Unary numeric promotion (§5.6.1) is performed on each operand separately.

which is

Otherwise, if the operand is of compile-time type byte, short, or char, it is promoted to a value of type int by a widening primitive conversion (§5.1.2).

So the byte value is widened to an int. So no to your first question.

The result of the expression would be a value of type int (32 bits). It has to be assigned to a long (64 bits) variable, so the value would be widened to a long before being assigned.

From the JLS again

The integral types are byte, short, int, and long, whose values are 8-bit, 16-bit, 32-bit and 64-bit signed two's-complement integers, respectively, and char, whose values are 16-bit unsigned integers representing UTF-16 code units (§3.1).

That's how they are stored.

Minimum answered 8/9, 2015 at 15:44 Comment(3)
Nitpick: You seem to be contradicting yourself. "The result of the expression is a value of type int " and "It's a long with the value ".Amandine
So, in less academic terms, the numbers on both sides are int, but - because it's an operation on the bit level - we look at their bit values/representations? And because an integer in Java is 32 bits, and thanks to the two's complement, -1ends up being 32 ´1´bits (when we look at its bit representation)?Logistics
Everything is done at the bit level. These are computers we are talking about. -1 is encoded as 32 1 bits, yes.Minimum
T
2

It is actually confusing that your m variable is of long type because an IP address is 32-bit and corresponds to an int. Your right-hand side is indeed int and only after it's fully computed is it extended to long (64-bit). Answering your questions:

  1. It doesn't. You can remove the cast.
  2. The result is actually of type int, but gets converted to long because the type of m requires it.
  3. Two's complement is not "applied" to anything, really. The number -1 is encoded in two's complement. You need some way to represent negative numbers with nothing but bits. Plus, here two's complement plays a side role: it is just about -1 being encoded as all 1-bits.
  4. It's all just a block of 32 one-bits being shifted to the left, zeroes filling in the vacancy. Then, to convert to long, 32 more 1-bits are added on the left side.
Tuberculate answered 8/9, 2015 at 15:41 Comment(2)
Ok, so, next daft question: are all numbers in Java encoded with two's complement? If so, does that mean that while an integer is 32 bit long/big in Java, it really only holds 31 bits of numeric data, and the leading bit is used to indicate if it's a positive or a negative number? (I.e., is the MSB always the sign bit?)Logistics
All signed integers are in two's complement and this is not at all Java-specific. There is actually no other represetation in use. I wouldn't separate "numeric data" from sign because all 32 bits equally cooperate in representing 2^32 distinct integers.Tuberculate

© 2022 - 2024 — McMap. All rights reserved.