PHP Efficient way to Convert String of Binary into Binary

Here is the skinny (scroll down to see the problem): I am doing Huffman encoding to compress a file using PHP (for a project). I have built the code map and encoded everything into a string like so:

00101010001100001110011101001101111011111011

Now I need to convert that into an actual binary string; in its current state it is only a string of 1s and 0s.

Here is the problem:

The string of 1s and 0s is 17,747,595 characters long, and the conversion really slows down at around 550,000 characters.

This is the code I have:

<?php

$i = 0;
$len = strlen($binaryString);
$out = '';

while ($i < $len){
    $section = substr($binaryString,$i,$i+8);
    $out .= chr(bindec($section));
    $i=$i+8;
}

?>

How can I make this efficient enough to run the 17 million character string?

Thanks very much for any support!

Uuge answered 6/11, 2012 at 19:50 Comment(8)
Did you take a look at #6383238? - Incognito
Yes, base_convert won't accept it because it is too long :P - Uuge
Don't write it to a variable in whole, but to some file cache after X bytes. That way, the whole string is not loaded on each iteration just to append the next few bytes. - Nebulose
Yes, the original file being encoded is 4MB, and then broken out via Huffman to the 17m… I know there has to be an efficient way of doing this, I just don't know what it is lol. - Uuge
@feeela, I actually am, I just didn't want to clutter the code with that :) It writes at $i % 2000 - Uuge
Can't you convert it to decimal or hex first, then do what you need to, and then convert it back to binary? Also, what are you trying to do with the binary string? - Micrometeorite
Normally PHP should run the garbage collector to free unused data. I would try calling unset($out); before the end of the loop block and see if that matters. Or use fopen and related functions to read in X bytes of the input, perform your actions, and write to another file. Then only X bytes of memory should be used on each iteration. - Nebulose
Why didn't you try building a bit stream instead of a bit string? I mean, just use all 8 bits of each byte from the beginning. It's because of locality of reference. - Dasie

You don't need to loop; you can use GMP with pack:

$file = "binary.txt";
$string = file_get_contents($file);
$start = microtime(true);

// Convert the string
$string = simpleConvert($string);
//echo $string ;

var_dump(number_format(filesize($file),2),microtime(true)- $start);

function simpleConvert($string) {
    return pack('H*',gmp_strval(gmp_init($string, 2), 16));
}

Output

string '25,648,639.00' (length=13) <---- Length greater than 17,747,595
float 1.0633520126343  <---------------- Total conversion time
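
One caveat with the one-liner above: gmp_init() parses the string as a number, so leading zero bits are dropped, and gmp_strval() can then return fewer (or an odd number of) hex digits, which pack('H*') fills out at the end rather than the start. For a Huffman bit stream that would shift the output. Below is a minimal sketch that pads the result back to the expected width. The helper name paddedConvert is hypothetical, and zero-filling the final partial byte on the right is an assumption about the desired layout, not something stated in the question.

```php
<?php
// Hypothetical helper: GMP-based conversion that preserves leading zero bits.
function paddedConvert($bits) {
    if ($bits === '') {
        return '';
    }
    // Assumption: a trailing partial byte is filled with zero bits on the right.
    $byteCount = (int) ceil(strlen($bits) / 8);
    $bits = str_pad($bits, $byteCount * 8, '0', STR_PAD_RIGHT);

    // Same conversion as simpleConvert(): binary digits -> hex digits.
    $hex = gmp_strval(gmp_init($bits, 2), 16);

    // Put back the hex digits that were lost along with the leading zero bits.
    $hex = str_pad($hex, $byteCount * 2, '0', STR_PAD_LEFT);

    return pack('H*', $hex);
}
```

With this padding the result always has ceil(strlen($bits) / 8) bytes, so the decoder can rely on the byte boundaries matching the original bit positions.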

Note: this solution requires the GMP functions.
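
If GMP is not available, a plain loop can still be workable. In the loop from the question, substr($binaryString,$i,$i+8) passes $i+8 as the third argument, but that argument of substr() is a length, not an end offset, so each slice grows along with $i. That likely contributes to the slowdown and also produces wrong bytes. Here is a minimal sketch with fixed 8-character slices; bitsToBytes is a hypothetical name, and zero-filling the last partial byte on the right is an assumption.

```php
<?php
// Hypothetical helper: convert a string of '0'/'1' characters to raw bytes
// without GMP, reading exactly 8 characters per output byte.
function bitsToBytes($bits) {
    if ($bits === '') {
        return '';
    }
    // Assumption: fill the last partial byte with zero bits on the right.
    $bits = str_pad($bits, (int) ceil(strlen($bits) / 8) * 8, '0');
    $len = strlen($bits);

    $out = '';
    for ($i = 0; $i < $len; $i += 8) {
        // The third argument is the slice length (8), not an end position.
        $out .= chr(bindec(substr($bits, $i, 8)));
    }
    return $out;
}
```

For very large inputs, the same loop could flush $out to a file every so often, as suggested in the comments on the question, rather than holding the whole result in memory.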

Newsletter answered 6/11, 2012 at 20:27 Comment(3)
Wow! I like that approach, but I seem to be getting a "Segmentation fault" upon initializing GMP in gmp_init($string, 2); any ideas what that is about? (Yes, I have GMP installed :) - Uuge
What version of PHP & GMP? - Newsletter
Ahh… I was running it on PHP/5.2.10 GD/4.1.4, but I moved it to my server with PHP/5.4.7 GMP/4.3.2 and it works like a charm :) Well done! Thanks @Newsletter - Uuge