Converting a String representation of bits to a byte

I'm just beginning to learn about file compression and I've run into a bit of a roadblock. I have an application that encodes a string such as "program" as a compressed binary representation, "010100111111011000" (note that this is still stored as a String).

Encoding
g       111
r       10
a       110
p       010
o       011
m       00
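
For reference, here is a minimal sketch of how the table above produces that bit string. The class and method names are made up for illustration; only the code table comes from the question.

```java
import java.util.HashMap;
import java.util.Map;

class HuffmanEncodeSketch {
    // Code table copied from the question; everything else here is illustrative.
    static final Map<Character, String> CODES = new HashMap<>();
    static {
        CODES.put('g', "111");
        CODES.put('r', "10");
        CODES.put('a', "110");
        CODES.put('p', "010");
        CODES.put('o', "011");
        CODES.put('m', "00");
    }

    // Concatenates the code of every character into one String of '0'/'1' characters.
    static String encode(String text) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            String code = CODES.get(c);
            if (code == null) {
                throw new IllegalArgumentException("No code for character: " + c);
            }
            bits.append(code);
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("program")); // 010100111111011000
    }
}
```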

Now I need to write this to the file system using a FileOutputStream. The problem I'm having is: how can I convert the string "010100111111011000" to a byte[] so it can be written to the file system with FileOutputStream?

I've never worked with bits/bytes before so I'm kind of at a dead end here.

Maisonette answered 26/11, 2011 at 0:43 Comment(8)
You talk about a "compressed binary representation" then say you have a String that is 18 characters long ("010100111111011000") to represent a word that is 7 characters long ("program"). Are you sure you mean what you're asking? Normally you would have those bits set in X number of bytes (3 in this case).Minard
Look up 'bit shift operators': >>, >>>, <<.Wrist
Brian, the original message is 56 bits in size when translated to binary; the encoded message is only 18 bits. Kevin, people keep telling me that, but I still can't draw the link between using those operators and being able to translate this to a byte array.Maisonette
@JohnLotacs - No, it's not, if you're talking about Strings, which you say you are in your question; that is the source of the confusion. If you have a String as you say, you don't have bits. You have a bunch of the characters 0 and 1 (specifically, a 16-bit Unicode char for each, so 36 bytes of memory before the overhead of the String object). To be clear, if you have a String, you have the textual representation of a set of bits, expressed using the characters 0 and 1.Minard
Brian, that IS the question, converting a String representation of bits to a set of bytes.Maisonette
@JohnLotacs - you wouldn't, ever, in relation to the things you are talking about. Why do you have a String ?Minard
Because it was easiest to build that encoding map with a huffman tree by doing traversals and appending 0/1 to a prefix on a StringBuffer. en.wikipedia.org/wiki/Huffman_codingMaisonette
@JohnLotacs Do you still have your final solution somewhere in code? I have the exact same problem, but I can't get it working.Ordure
W
6

An introduction to bit-shift operators:

First, we have the left-shift operator, x << n. This will shift all the bits in x left by n bits, filling the new bits with zero:

      1111 1111 
<< 3: 1111 1000

Next, we have the signed right-shift operator, x >> n. This shifts all the bits in x right by n, copying the sign bit into the new bits:

      1111 1111 
>> 3: 1111 1111

      1000 0000
>> 3: 1111 0000

      0111 1111 
>> 3: 0000 1111

Finally, we have the zero-fill right-shift operator, x >>> n. This shifts all bits in x right by n bits, filling the new bits with zero:

       1111 1111 
>>> 3: 0001 1111
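
The diagrams above show 8-bit values. As a rough sketch of how the three shifts behave in Java specifically: byte operands are promoted to int before shifting, so the example below prints only the low 8 bits of each result. The class name and the low8 helper are made up for illustration.

```java
class ShiftDemo {
    // Formats the low 8 bits of a value as an 8-character binary string.
    static String low8(int value) {
        String s = Integer.toBinaryString(value & 0xFF);
        return "00000000".substring(s.length()) + s;
    }

    public static void main(String[] args) {
        byte allOnes = (byte) 0b1111_1111;  // -1 as a signed byte
        byte highBit = (byte) 0b1000_0000;  // -128
        byte lowSeven = 0b0111_1111;        // 127

        System.out.println(low8(allOnes << 3));   // 11111000
        System.out.println(low8(allOnes >> 3));   // 11111111
        System.out.println(low8(highBit >> 3));   // 11110000
        System.out.println(low8(lowSeven >> 3));  // 00001111

        // Mask to 8 bits first to get the zero-fill result drawn in the diagram.
        System.out.println(low8((allOnes & 0xFF) >>> 3)); // 00011111

        // Without the mask, the byte is sign-extended to 32 bits before the shift,
        // so the low 8 bits of the result are still all ones.
        System.out.println(low8(allOnes >>> 3));          // 11111111
    }
}
```

The last two lines are the part that tends to surprise people: on a byte, >>> only matches the 8-bit picture if the value is masked with 0xFF first.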

You may also find useful the bitwise-or operator, x | y. This compares the bits in each position in x and y, setting the new number's bit on if it was on in either x or y, off otherwise:

  1010 0101
| 1010 1010
  ---------
  1010 1111

You should only need the previous operators for the problem at hand, but for the sake of completeness, here are the last two:

The bitwise-and operator, x & y sets the bits in the output to one if and only if the bit is on in both x and y:

  1010 0101
& 1010 1010
  ---------
  1010 0000

The bitwise-xor operator, x ^ y sets the output bits to one if the bit is on in one number or the other but not both:

  1010 0101
^ 1010 1010
  ---------
  0000 1111
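
The same three operations can be checked in Java with binary literals. This is only a sketch; the class name and the low8 helper are invented for the example.

```java
class BitwiseDemo {
    // Formats the low 8 bits of a value as an 8-character binary string.
    static String low8(int value) {
        String s = Integer.toBinaryString(value & 0xFF);
        return "00000000".substring(s.length()) + s;
    }

    public static void main(String[] args) {
        int x = 0b1010_0101;
        int y = 0b1010_1010;

        System.out.println(low8(x | y)); // 10101111
        System.out.println(low8(x & y)); // 10100000
        System.out.println(low8(x ^ y)); // 00001111

        // Combining shift and or appends one bit on the right:
        int b = 0b0000_0110;
        System.out.println(low8((b << 1) | 1)); // 00001101
    }
}
```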

Now, applying these to the situation at hand:

You will need to use the bit-shift operators to add and manipulate bits. Read the string from left to right: for each character, shift the byte left by one and set the lowest bit if the character is a 1. After eight characters the byte is full, so move on to the next byte. Say we want to create a byte representation of "1100 1010":

Our byte    Target
---------   --------
0000 0000
            1100 1010
0000 0001   ^
            1100 1010
0000 0011    ^
            1100 1010
0000 0110     ^
            1100 1010
0000 1100      ^
            1100 1010
0001 1001        ^
            1100 1010
0011 0010         ^
            1100 1010
0110 0101          ^
            1100 1010
1100 1010           ^

I will, of course, leave it to you to apply this to your work.
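
For readers who want something to compare their own attempt against, here is one possible sketch of the shift-and-or loop traced above. The names BitPacker, pack and writeBits are hypothetical, and padding the last partial byte with zero bits on the right is an assumption of this sketch, not something the question specifies.

```java
import java.io.FileOutputStream;
import java.io.IOException;

class BitPacker {
    // Packs a String of '0'/'1' characters into bytes, most significant bit first.
    // Assumption: a final partial byte is padded with zero bits on the right.
    static byte[] pack(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        int current = 0; // bits collected so far for the byte in progress
        int count = 0;   // how many bits 'current' holds
        int index = 0;   // next free slot in 'out'
        for (int i = 0; i < bits.length(); i++) {
            char c = bits.charAt(i);
            if (c != '0' && c != '1') {
                throw new IllegalArgumentException("Not a bit: " + c);
            }
            current = (current << 1) | (c - '0'); // shift left, then set the low bit
            count++;
            if (count == 8) {                     // byte is full, store it and start over
                out[index++] = (byte) current;
                current = 0;
                count = 0;
            }
        }
        if (count > 0) {                          // left-align the leftover bits
            out[index] = (byte) (current << (8 - count));
        }
        return out;
    }

    // Writes the packed bytes to the given path (the path is supplied by the caller).
    static void writeBits(String bits, String path) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(path)) {
            fos.write(pack(bits));
        }
    }

    public static void main(String[] args) throws IOException {
        for (byte b : pack("010100111111011000")) {
            System.out.printf("%02X ", b & 0xFF); // 53 F6 00
        }
        System.out.println();
        if (args.length > 0) {
            writeBits("010100111111011000", args[0]);
        }
    }
}
```

Because of the padding, a decoder cannot tell from the bytes alone that only 18 of the 24 bits are meaningful, so the bit count (or an end marker) would have to be stored somewhere as well.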

Wrist answered 26/11, 2011 at 2:18 Comment(2)
One question: is starting my byte as 0000 0001 the same as writing byte b = 1;? Because a byte is signed, I'm unsure how to know what the binary representation is, since I don't know which bit represents the sign.Maisonette
You could do that, but for consistency you will want to start with a zero byte and then enter a for or while loop. I'll edit the example a bit to see if I can make this a bit more clear.Wrist
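
On the sign question raised in the comments: Java bytes are two's complement, so the leftmost bit carries the sign, and a bit pattern can be inspected by converting it to an unsigned int first. A small sketch follows; the class name and the bitsOf helper are made up.

```java
class SignedByteDemo {
    // Returns the 8-bit pattern of a byte as a String of '0'/'1' characters.
    static String bitsOf(byte b) {
        String s = Integer.toBinaryString(Byte.toUnsignedInt(b));
        return "00000000".substring(s.length()) + s;
    }

    public static void main(String[] args) {
        byte one = 1;
        byte target = (byte) 0b1100_1010;

        System.out.println(bitsOf(one));                 // 00000001
        System.out.println(target);                      // -54
        System.out.println(Byte.toUnsignedInt(target));  // 202
        System.out.println(bitsOf(target));              // 11001010
    }
}
```

The stored bits are the same either way; only the interpretation as a number changes, which is why the cast to byte is harmless when the goal is to write the bits to a file.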
D
1

Chop your String up into lengths of 8 and call Byte#parseByte. If you set the radix to 2, it will parse the String as a binary number.

Duston answered 26/11, 2011 at 1:40 Comment(6)
Exception in thread "main" java.lang.NumberFormatException: Value out of range. Value:"10000000" Radix:2 It works only on lengths of 7 unless there are leading zeros, any idea?Maisonette
@John Lotacs I have no idea why it's doing this, but you can use Integer#parseInt and cast it to byte as a workaround.Duston
@jeff It's doing that because byte is signed, so the value has to be between -1000 0000 and +111 1111 (-128 to +127). A byte with bits of 1000 0000 is actually -128, and would have to be fed to the parser as -1000 0000.Wrist
@Wrist Why can't it just take 1000 0000? Is it just a bit of laziness on the coder's part or am I missing something?Duston
The parseByte method parses the value of the text, not the individual bits. 1000 0000 is 128, which is out of bounds for a byte, which has a max of 127. It would be in range for an unsigned byte, but Java doesn't have unsigned types (except, I believe, char).Wrist
@Wrist Ahhhh, now I see. Yeah, char is unsigned.Duston
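
Putting the comments above together, the chunk-of-8 approach with the Integer#parseInt workaround might look like the sketch below. ChunkParser and parseChunks are invented names, and left-aligning a short final chunk is an assumption of this sketch.

```java
class ChunkParser {
    // Splits the bit string into chunks of up to 8 characters and parses each in radix 2.
    static byte[] parseChunks(String bits) {
        int n = (bits.length() + 7) / 8;
        byte[] out = new byte[n];
        for (int i = 0; i < n; i++) {
            int start = i * 8;
            int end = Math.min(start + 8, bits.length());
            String chunk = bits.substring(start, end);
            // Integer.parseInt accepts 0..255 here, unlike Byte.parseByte (max 127).
            int value = Integer.parseInt(chunk, 2);
            // Assumption: a short final chunk is left-aligned, i.e. padded with zeros on the right.
            value <<= 8 - chunk.length();
            out[i] = (byte) value; // the cast keeps the low 8 bits unchanged
        }
        return out;
    }

    public static void main(String[] args) {
        for (byte b : parseChunks("010100111111011000")) {
            System.out.printf("%02X ", b & 0xFF); // 53 F6 00
        }
        System.out.println();
        System.out.println(parseChunks("10000000")[0]); // -128
    }
}
```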
I
0

I guess you want to write these zeros and ones as binary values in a file. If so, you can iterate over the string, taking 8 characters at a time (with String.substring() or similar), and create bytes with the Byte(String) constructor. It's the easiest solution that comes to my mind for now.

If I'm not right about the problem, please tell me more about it.

Infantile answered 26/11, 2011 at 1:3 Comment(2)
I tried this, the Byte(String) constructor will take a string "0011" and literally interpret it as the decimal number 11.Maisonette
That's why you should use the static Byte.valueOf(String s, int radix) method to set a binary radix; Byte has no constructor that takes a radix.Infantile
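
A short sketch of the difference the radix makes, using only the static parsing methods on Byte. The class name and the parseBinary helper are made up for illustration.

```java
class RadixDemo {
    // Parses a String of '0'/'1' characters as a binary number (values above 127 are rejected).
    static byte parseBinary(String s) {
        return Byte.parseByte(s, 2);
    }

    public static void main(String[] args) {
        System.out.println(Byte.parseByte("0011"));   // 11 (decimal by default)
        System.out.println(parseBinary("0011"));      // 3
        System.out.println(Byte.valueOf("0011", 2));  // 3
    }
}
```

As the comments on the other answer point out, this still throws a NumberFormatException for any 8-character chunk that starts with a 1.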
