Efficient way to encode bit-vectors?
Asked Answered
E

1

7

Currently using the run length encoding for encoding bit-vectors, and the current run time is 2log(i), where is the size of the run. Is there another way of doing it to bring it down to log(i)? Thanks.

Erminna answered 13/10, 2012 at 22:21 Comment(0)
I
6

The most efficient way of encoding a bit vector is to isolate any specific properties of the bit source. If it is totally random, there is no real noticeable gain (actually, a totally random stream of bit cannot be compressed in any way).

If you can find properties in your bit stream you could try to define a collection of vectors which will define the base of a Vector Space. In such case, the result will be very efficient.

We'll need a few more details on your bit stream.


(Edit)

Just a few more details to understand the previous statement: "a totally random stream of bits cannot be compressed in any way"

It is not possible to compress a totally random vector of bits if by "compress" we mean the "transformed/compressed stream" plus the "vector base definition" plus the decompression program. But in most cases the decompression program (and often the vector base too) is embedded in client software. Thus, only the "compressed stream" is needed.

A good explanation (and funny story) about that is Patrick Craig 5000$ compression challenge

More scientific the theory of information, especially entropy section

And, the final one, the full story.

But whatever the solution is, if you have an unknown number of unknown streams to compress you won't be ale to do anything. You have to find a pattern.

Incapacitate answered 13/10, 2012 at 22:31 Comment(2)
Ok, this is a hypothetical situation for me (there is no bit source here). This is actually for a test I have coming up. In a sitaution where I had a bitvector of something like 000000010000, if I use run length encoding I get 110 111 (110 being the number of bits I have to look at next, 111 being the integer representation of the number of zeros followed by a run. ) This roughly equates to 2log(i). But something tells me there's a way to reduce closer it to log(i). Hope that clears things upErminna
It seems that your "2log(i)" is just a observation on that very example. The overall algorithm seems to be in average O(log(i)). You have to analyse this problem in worst/best case to estimate the complexity (or efficiency depending on what your are measuring) of the RunLength algo. Maybe look at "Huffman coding", could lead to something interesting.Incapacitate

© 2022 - 2024 — McMap. All rights reserved.