How to decompress lzo_deflate file?

3

7

I used LZO to compress my reducer output. I installed Kevin Weil's Hadoop-LZO project and then set the LzoCodec class on my job:

TextOutputFormat.setOutputCompressorClass(job, LzoCodec.class);

Now compression works just fine.

My problem is that the compressed result is a .lzo_deflate file, which I can't decompress.
The lzop utility doesn't seem to support that type of file.
LzopCodec is supposed to produce a .lzo file, but it did not work for me, even though it is in the same package as LzoCodec (org.apache.hadoop.io.compress). This may point to a compatibility issue, since I used the old API (0.19) to make compression work.

Answers to this question suggest Python solutions, however I need it in Java.
I'm using Hadoop 1.1.2 and Java 6.

Corinnecorinth answered 21/5, 2013 at 18:27 Comment(5)
What do you mean by "it did not work" for LzopCodec? LzopCodec is recommended over LzoCodec, so it should work. Can you include the error you get when using it? - Negligence
Yes: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/io/compress/LzopCodec. I tried to fix the issue until I read somewhere that LzoCodec is the recommended one. I should have made that clear earlier. - Corinnecorinth
The big difference is that Lzop adds headers while Lzo doesn't. Have you updated your hadoop-env.sh and set HADOOP_CLASSPATH and JAVA_LIBRARY_PATH correctly? - Negligence
Yes, I have. I also commented out the JAVA_LIBRARY_PATH = '' line in the /path/to/hadoop/bin/hadoop file. I checked with the /path/to/hadoop/bin/hadoop classpath command, and the lzop library is there, as the last entry. It should work the way LzoCodec did. Do you have any idea, @CharlesMenguy? - Corinnecorinth
I also tried running the two exports (export HADOOP_CLASSPATH= and export JAVA_LIBRARY_PATH=) from the command line, but the result was the same. - Corinnecorinth
7

.lzo_deflate means an LZO stream without the usual header and trailer. So you would need to wrap the raw .lzo_deflate stream with the header and trailer expected by lzop. Or at least the header, and then ignore errors from the missing trailer. You'll need to look at the header and trailer documentation.
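
To tell which of the two formats a given file is in, one option is to look for the lzop magic bytes at the start of the file. The following is a minimal Java sketch, written without Java 7 features; the class and method names are illustrative and not part of Hadoop or lzop, and the only fact it relies on is the 9-byte magic that lzop writes at the start of its container format.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class LzopHeaderCheck {
    // The 9-byte magic that begins an lzop container file.
    static final byte[] LZOP_MAGIC = {
        (byte) 0x89, 'L', 'Z', 'O', 0x00, 0x0d, 0x0a, 0x1a, 0x0a
    };

    /** Returns true if the given leading bytes begin with the lzop magic. */
    static boolean startsWithLzopMagic(byte[] head) {
        if (head.length < LZOP_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < LZOP_MAGIC.length; i++) {
            if (head[i] != LZOP_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads the first bytes of a file and checks them against the magic. */
    static boolean isLzopFile(String path) throws IOException {
        InputStream in = new FileInputStream(path);
        try {
            byte[] head = new byte[LZOP_MAGIC.length];
            int filled = 0;
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    break; // file is shorter than the magic
                }
                filled += n;
            }
            return filled == head.length && startsWithLzopMagic(head);
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            String verdict = isLzopFile(path)
                    ? "has an lzop header"
                    : "no lzop header (possibly a raw LZO stream)";
            System.out.println(path + ": " + verdict);
        }
    }
}
```

Running it against a local copy of a job output file (for example `java LzopHeaderCheck part-00000.lzo_deflate`, where the file name is only a placeholder) should report whether the header is present.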

The "deflate" in the name is an odd choice, but it refers to the gzip analogy, where the raw compressed data format without the gzip header and trailer is called deflate.
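
The gzip analogy can be reproduced with nothing but java.util.zip. This is a sketch of the analogy only; it does not touch LZO, and the class and method names are made up for illustration. It shows that a gzip reader rejects a raw deflate stream because the header is missing, while a raw inflater still recovers the data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

final class RawVersusWrapped {

    /** Compresses into the gzip container: header, deflate data, trailer. */
    static byte[] gzip(byte[] data) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            GZIPOutputStream out = new GZIPOutputStream(buf);
            out.write(data);
            out.close();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Compresses into a raw deflate stream (nowrap = true): no header, no trailer. */
    static byte[] rawDeflate(byte[] data) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DeflaterOutputStream out = new DeflaterOutputStream(buf, deflater);
            out.write(data);
            out.close();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } finally {
            deflater.end();
        }
    }

    /** Decompresses a raw deflate stream with a matching raw inflater. */
    static byte[] rawInflate(byte[] compressed) {
        Inflater inflater = new Inflater(true);
        try {
            return readAll(new InflaterInputStream(
                    new ByteArrayInputStream(compressed), inflater));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
    }

    /** Returns true if a gzip reader can decode the bytes, false if it rejects them. */
    static boolean gzipReaderAccepts(byte[] bytes) {
        try {
            readAll(new GZIPInputStream(new ByteArrayInputStream(bytes)));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "hello hello hello".getBytes();
        System.out.println("gzip reader accepts gzip data: " + gzipReaderAccepts(gzip(data)));
        System.out.println("gzip reader accepts raw deflate: " + gzipReaderAccepts(rawDeflate(data)));
        System.out.println("raw inflater output: " + new String(rawInflate(rawDeflate(data))));
    }
}
```

The .lzo_deflate case is the same situation with lzop in the role of the gzip reader: the compressed data is intact, but the tool expects a container around it.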

Ceric answered 21/5, 2013 at 20:48 Comment(2)
Thanks @Mark Adler for your answer. I understand more now. - Corinnecorinth
Can't we decompress it via command line using the "lzop" tool? I am getting an error like "not a lzop file". - Rebba
4

I came across the same issue. It happened because I was not using the right codec. Please check the codec in your job configuration:

job.getConfiguration().set("mapred.output.compression.codec","com.hadoop.compression.lzo.LzopCodec");
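
Note that the NoClassDefFoundError quoted in the question's comments names org/apache/hadoop/io/compress/LzopCodec, while the setting above uses the com.hadoop.compression.lzo package. A quick way to see which name the JVM can actually find is a plain Class.forName probe. This is a minimal sketch; the helper class is hypothetical and uses no Hadoop API.

```java
final class CodecClasspathCheck {

    /** Returns true if the named class can be located without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, CodecClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class file was found but could not be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        // Candidate names taken from this thread.
        String[] candidates = {
            "com.hadoop.compression.lzo.LzopCodec",
            "com.hadoop.compression.lzo.LzoCodec",
            "org.apache.hadoop.io.compress.LzopCodec"
        };
        for (String name : candidates) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "missing"));
        }
    }
}
```

It needs to run with the same classpath as the job, for example by passing the output of the hadoop classpath command to java -cp.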
Cralg answered 23/10, 2014 at 21:18 Comment(0)
0

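This answer helped me convert .lzo_deflate files to the required output format. The streaming job below passes each record through /bin/cat and writes the output with GzipCodec: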

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-0.20.2-cdh3u2.jar \
  -Dmapred.output.compress=true \
  -Dmapred.compress.map.output=true \
  -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
  -input <input-path> \
  -output $OUTPUT \
  -mapper "/bin/cat"
Scrubber answered 21/5, 2020 at 19:8 Comment(0)
