How do I transparently compress/decompress a file as a program writes to/reads from it?

I have a program that reads and writes very large text files. However, because of the format of these files (they are ASCII representations of what should have been binary data), these files are actually very easily compressed. For example, some of these files are over 10GB in size, but gzip achieves 95% compression.

I can't modify the program but disk space is precious, so I need to set up a way that it can read and write these files while they're being transparently compressed and decompressed.

The program can only read and write files, so as far as I understand, I need to set up a named pipe for both input and output. Some people are suggesting a compressed filesystem instead, which seems like it would work, too. How do I make either work?

Technical information: I'm on a modern Linux. The program reads a separate input file and writes a separate output file. It reads through the input file in order, though twice. It writes the output file in order.

Interpellation answered 16/4, 2009 at 7:56 Comment(2)
Feel free to edit my tags. I found it very difficult to choose appropriate ones. Also, if this is a duplicate, as always, let me know and I'll be happy to delete ...Interpellation
This isn't programming-related, as you can't change your program. You either need bigger disks or a r/w compressed file system.Almshouse

Check out zlibc: http://zlibc.linux.lu/.

Also, if FUSE is an option (i.e., the kernel is not too old), consider compFUSEd: http://www.biggerbytes.be/

Haukom answered 16/4, 2009 at 8:2 Comment(5)
Can I write with zlibc, too? It's as crucial that I can write as read.Interpellation
zlibc is mainly for writing new programs that compress, and you said you couldn't touch your program. I voted this one up for the mention of compFUSEd; that sounds like a good fit for your problem.Photoflash
zlibc is read-only, but definitely can be used without recompiling too, through LD_PRELOAD mechanism.Haukom
Dead link for compFUSEd and I couldn't find a replacement.Greave
@KenSharp Perhaps, code.google.com/p/fusecompress/wiki/Usage ? Or something from the link given in the answer: https://mcmap.net/q/1311123/-how-do-i-transparently-compress-decompress-a-file-as-a-program-writes-to-reads-from-it ? Or lessfs.com/wordpress described at phoronix.com/scan.php?page=news_item&px=MTA0MzQ ?Fleabag

Named pipes won't give you full-duplex operation, so it will be a bit more complicated if you need to provide just one filename.

Do you know if your application needs to seek through the file?

Does your application work with stdin/stdout?

Maybe a solution is to create a mini compressed file system that contains only a directory with your files.

Since you have separate input and output files, you can do the following:

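mkfifo readfifo
mkfifo writefifo
# decompress the input into readfifo (blocks until your program opens it)
zcat yourinputfile.gz > readfifo &
# compress whatever your program writes into writefifo
gzip < writefifo > youroutputfile.gz &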

Then launch your program, giving it readfifo as its input file and writefifo as its output file.
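A minimal end-to-end sketch of this FIFO setup, assuming a Linux shell with gzip and zcat available. The file names input.gz and output.gz, and the `tr` command standing in for your unmodifiable program (it opens readfifo and writefifo by name), are hypothetical. It only covers a single read pass over the input.

```shell
#!/bin/sh
# Sketch of the named-pipe approach. Hypothetical: input.gz, output.gz,
# and "tr" standing in for the program that cannot be modified.
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical compressed input file.
printf 'hello\nworld\n' | gzip > input.gz

mkfifo readfifo writefifo

# Decompress into the read FIFO; the open blocks until a reader appears.
zcat input.gz > readfifo &
# Compress everything written to the write FIFO; blocks until a writer appears.
gzip -c < writefifo > output.gz &

# Stand-in "program": reads readfifo and writes writefifo by name.
tr '[:lower:]' '[:upper:]' < readfifo > writefifo

# Let zcat and gzip finish and flush output.gz.
wait

# Expected output: HELLO and WORLD, one per line.
zcat output.gz
```

Each FIFO open blocks until the other end is opened too, which is why zcat and gzip are started in the background before the program runs; when the program closes writefifo, gzip sees end-of-file and finishes the compressed output.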

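Now, you will probably run into trouble with reading the input twice: a FIFO can't be rewound, so once zcat has finished writing the input, your program reaches end-of-file, and any attempt to seek back to the start fails (reopening the FIFO by name just blocks until something writes to it again).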

The proper solution is probably to use a compressed file system like compFUSEd, because then you don't have to worry about unsupported operations like seek.

Kaslik answered 16/4, 2009 at 8:0 Comment(1)
I've edited my question to address your inquiries. The program does not read or write stdin/out.Interpellation

btrfs:

https://btrfs.wiki.kernel.org/index.php/Main_Page

provides support for pretty fast "automatic transparent compression/decompression" these days, and is present (though marked experimental) in newer kernels.

Universally answered 20/3, 2013 at 19:40 Comment(0)

Which language are you using?

If you are using Java, take a look at the GZIPInputStream and GZIPOutputStream classes (java.util.zip) in the API docs.

If you are using C/C++, zlib (with its gzopen/gzread/gzwrite functions) is probably the best way to go about it.

Mealtime answered 16/4, 2009 at 8:11 Comment(1)
I cannot change the program, so this must work outside of the program. I'm cool with any language, but I thought this was more working with Linux than any programming.Interpellation
