I work with images that are stored as .gz files (my image processing software can read .gz files directly, which saves disk space and read time), and I need to check the header of each file.
The header is just a small struct of a fixed size at the start of each image, and for images that are not compressed, checking it is very fast. For reading the compressed images, I have no choice but to decompress the whole file and then check this header, which of course slows down my program.
Would it be possible to read the first segment of a .gz file (say a couple of K), decompress this segment and read the original contents? My understanding of gz is that after some bookkeeping at the start, the compressed data is stored sequentially -- is that correct?
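For reference, the "bookkeeping" at the start of a gzip file is a short header described in RFC 1952: a fixed 10-byte part (magic bytes, compression method, flags, modification time, extra flags, OS id), optionally followed by variable-length fields such as the original file name, and then the deflate data in sequence. Below is a minimal Python sketch that parses only the fixed part; the function name `parse_gzip_fixed_header` is made up for illustration and the optional fields are not handled.

```python
import struct

def parse_gzip_fixed_header(data):
    """Parse the 10-byte fixed part of a gzip member header (RFC 1952).

    Hypothetical helper for illustration; optional fields that may follow
    (extra, file name, comment, header CRC) are not parsed here.
    """
    if len(data) < 10:
        raise ValueError("need at least 10 bytes")
    # Little-endian: magic (2), method (1), flags (1), mtime (4), xfl (1), os (1)
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<HBBIBB", data[:10])
    if magic != 0x8B1F:  # bytes 0x1f 0x8b read as a little-endian 16-bit value
        raise ValueError("not a gzip stream")
    return {"method": method, "flags": flags, "mtime": mtime,
            "xfl": xfl, "os": os_id}
```

A method value of 8 means deflate, which is what gzip files use in practice.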
So instead of:
1. open big file F
2. decompress big file F
3. read 500-byte header
4. re-compress big file F
do:
1. open big file F
2. read first 5 K from F as stream A
3. decompress A as stream B
4. read 500-byte header from B
I am using libz.so, but solutions in other languages are appreciated!
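The second sequence of steps (read a small chunk, decompress it, take the header) can be sketched with a streaming decompressor. This is a minimal sketch in Python, whose `zlib` module wraps the same library as libz.so. The function name `read_gz_header` and its parameters `header_size` and `chunk_size` are made up for illustration; the 500-byte and 5 K defaults come from the steps above.

```python
import zlib

def read_gz_header(path, header_size=500, chunk_size=5 * 1024):
    """Return up to header_size decompressed bytes from the start of a .gz file.

    Hypothetical helper: only as much of the file is read and inflated
    as is needed to produce the requested number of bytes.
    """
    # MAX_WBITS | 16 asks zlib to expect a gzip wrapper rather than a zlib one.
    decomp = zlib.decompressobj(zlib.MAX_WBITS | 16)
    header = b""
    with open(path, "rb") as f:
        while len(header) < header_size and not decomp.eof:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # file ended before header_size bytes were produced
            # The second argument caps how many output bytes are returned.
            header += decomp.decompress(chunk, header_size - len(header))
    return header
```

Usage would look like `hdr = read_gz_header("image.gz")`, where the file name is a placeholder. In C, the corresponding zlib calls should be `inflateInit2()` with `windowBits` set to 15 + 16 for gzip decoding, followed by `inflate()` with `avail_out` limited to the header size.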
Using dd this way depends on gzip writing in multiples of 1024 bytes, because dd is block-oriented (it counts read system calls), not byte-oriented. Use head -c $((1024*10)) instead, which is easier and more efficient. See the related How to partially extract zipped huge plain text file? – Caspar