Writing files in bit form to a file in C
Asked Answered
C

1

13

I am implementing the huffman algorithm in C. I have got the basic functionality down up to the point where the binary codewords are obtained. so for example, abcd will be 100011000 or something similar. now the question is how do you write this code in binary form in the compressed file. I mean if I write it normally each 1 and 0 will be one character so there is no compression.

I need to write those 1s and 0s in their bit form. is that possible in C. if so how?

Citrin answered 6/12, 2009 at 20:33 Comment(4)
well i was merely asking how to do in this situation.writing the codes simply as ascii does not serve the purpose. there must be some other way.Citrin
you should generate an int instead of a char* in the encoding function, or else write a function that will convert the string into an int or long representing that sequence of bits.Judicature
@Vinko: converting to number and then storing is no good. He would have to be careful about sign and endianness, not to mention architecture (int can be different size on different arch). Unsigned char is safest choice.Disjointed
@Stan: Correct. My idea was along the lines of what Nils wrote, even if the implementation details I suggested were not appropriate. This is, do not write the ASCII value of each 1 or 0.Judicature
G
22

Collect bits until you have enough bits to fill a byte and then write it..

E.g. something like this:

int current_bit = 0;
unsigned char bit_buffer;

FILE *f;

void WriteBit (int bit)
{
  if (bit)
    bit_buffer |= (1<<current_bit);

  current_bit++;
  if (current_bit == 8)
  {
    fwrite (&bit_buffer, 1, 1, f);
    current_bit = 0;
    bit_buffer = 0;
  }
}

Once you're done writing your bits you have to flush the bit-buffer. To do so just write bits until current_bit equals to zero:

void Flush_Bits (void)
{
  while (current_bit) 
    WriteBit (0);
}
Gabardine answered 6/12, 2009 at 20:38 Comment(5)
thanks for pointing this out...so how to end the file then?...i presume we need to do this ourselves in this case.Citrin
just call Flush_Bits as defined above.Gabardine
There is a problem in the code: You only shift, when a bit is set. Assume you want to output 10000000. The 1 will never reach the most significant bit position, right?Sphere
Therefore, the first two code lines in WriteBit() must be bit_buffer <<= 1; if (bit) bit_buffer |= 0x1;Sphere
I have implemented Huffman in C++ (Planet-Source-Code.com/vb/scripts/… in case you're interested). You should end the encoded file with trailing 0s (in case the number of encoded bits is not a multiple of 8). If you implement the Decoding phase correctly, then it will simply and transparently ignore the redundant 0s at the end, because the number of symbols left to decode will be 0 at that point.Anywhere

© 2022 - 2024 — McMap. All rights reserved.