hadoop mapreduce: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z

I am trying to write a Snappy block-compressed sequence file from a MapReduce job. I am using Hadoop 2.0.0-cdh4.5.0 and snappy-java 1.0.4.1.

Here is my code:

package jinvestor.jhouse.mr;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;

import jinvestor.jhouse.core.House;
import jinvestor.jhouse.core.util.HouseAvroUtil;
import jinvestor.jhouse.download.HBaseHouseDAO;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.VectorWritable;

/**
 * Produces mahout vectors from House entries in HBase.
 * 
 * @author Michael Scott Knapp
 * 
 */
public class HouseVectorizer {

    private final Configuration configuration;
    private final House minimumHouse;
    private final House maximumHouse;

    public HouseVectorizer(final Configuration configuration,
            final House minimumHouse, final House maximumHouse) {
        this.configuration = configuration;
        this.minimumHouse = minimumHouse;
        this.maximumHouse = maximumHouse;
    }

    public void vectorize() throws IOException, ClassNotFoundException, InterruptedException {
        JobConf jobConf = new JobConf();
        jobConf.setMapOutputKeyClass(LongWritable.class);
        jobConf.setMapOutputValueClass(VectorWritable.class);

        // we want the vectors written straight to HDFS,
        // the order does not matter.
        jobConf.setNumReduceTasks(0);

        Path outputDir = new Path("/home/cloudera/house_vectors");
        FileSystem fs = FileSystem.get(configuration);
        if (fs.exists(outputDir)) {
            fs.delete(outputDir, true);
        }

        FileOutputFormat.setOutputPath(jobConf, outputDir);

        // I want the mappers to know the max and min value
        // so they can normalize the data.
        // I will add them as properties in the configuration,
        // by serializing them with avro.
        String minmax = HouseAvroUtil.toBase64String(Arrays.asList(minimumHouse,
                maximumHouse));
        jobConf.set("minmax", minmax);

        Job job = Job.getInstance(jobConf);
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("data"));
        TableMapReduceUtil.initTableMapperJob("homes", scan,
                HouseVectorizingMapper.class, LongWritable.class,
                VectorWritable.class, job);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(VectorWritable.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(VectorWritable.class);

        SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
        SequenceFileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
        SequenceFileOutputFormat.setOutputPath(job, outputDir);
        job.getConfiguration().setClass("mapreduce.map.output.compress.codec", 
                SnappyCodec.class, 
                CompressionCodec.class);

        job.waitForCompletion(true);
    }
}

When I run it I get this:

java.lang.Exception: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:401)
Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:62)
    at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:127)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:104)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:118)
    at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1169)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1080)
    at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.<init>(SequenceFile.java:1400)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:274)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:527)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:617)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:737)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:233)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

If I comment out these lines then my test passes:

SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
SequenceFileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
job.getConfiguration().setClass("mapreduce.map.output.compress.codec",
        SnappyCodec.class,
        CompressionCodec.class);

However, I really want to use snappy compression in my sequence files. Can somebody please explain to me what I am doing wrong?

Starve answered 3/3, 2014 at 15:16 Comment(10)
How did you install LZO, and how do you run your job? – Annemarie
I am not using LZO compression as far as I know, just Snappy. I am running the job from a unit test. – Starve
True, my mistake. However, you need to set the property java.library.path. For example: -Djava.library.path=/lib/hadoop/native – Annemarie
I create my configuration using its default no-arg constructor and pass it as a constructor arg to my HouseVectorizer. Then I call the vectorize method. I am running this on Cloudera's pre-built CDH 4.5 VM. – Starve
I don't think I need to set java.library.path here; like I said, this whole thing passes if I just comment out the lines that do Snappy compression. I am using Maven to manage dependencies, so that is how the Hadoop jars get onto my classpath. – Starve
Actually, you do need to set it. You are getting a linking error: when you comment out those lines, the code executes. Does that tell you something? – Annemarie
I added this but it did not change anything: System.setProperty("java.library.path", System.getProperty("java.library.path") + ":/lib/hadoop/native"); – Starve
It depends on your installation; /lib/hadoop/native is just an example. – Annemarie
My Maven dependencies all match what is in my Hadoop lib directories, so I seriously doubt that is the problem. I just checked, and my snappy jar and Hadoop jars all have the same versions as those in my Maven POM files. In any case I tried your advice anyway, using /usr/lib/hadoop/lib, /usr/lib/hadoop-mapreduce/lib, and /usr/lib/hadoop-0.20-mapreduce/lib as the first three entries in the system property. It did not change anything. – Starve
Switching to DefaultCodec works, but then it uses the deflate algorithm, which is not as fast as Snappy. – Starve
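
To make the comment thread concrete: a minimal sketch of the fallback described in the last comment, where Snappy is only selected when the native hadoop library is actually loaded and reports Snappy support, and DefaultCodec is used otherwise. The helper class name is made up; buildSupportsSnappy() is the call that fails in the stack trace above, and isNativeCodeLoaded() reports whether libhadoop was loaded at all.

import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.util.NativeCodeLoader;

public class CodecChooser {

    // Returns SnappyCodec only when libhadoop has been loaded and was built
    // with Snappy support; otherwise falls back to DefaultCodec (deflate).
    // buildSupportsSnappy() is itself a native method, so it is only safe
    // to call after isNativeCodeLoaded() has returned true.
    public static Class<? extends CompressionCodec> pickOutputCodec() {
        if (NativeCodeLoader.isNativeCodeLoaded()
                && NativeCodeLoader.buildSupportsSnappy()) {
            return SnappyCodec.class;
        }
        return DefaultCodec.class;
    }
}

With a guard like this the job still runs on machines that lack the native library, e.g. SequenceFileOutputFormat.setOutputCompressorClass(job, CodecChooser.pickOutputCodec());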

I found the following information in the Cloudera community forums:

  1. Ensure that LD_LIBRARY_PATH and JAVA_LIBRARY_PATH contain the native directory path holding the libsnappy.so* files.
  2. Ensure that LD_LIBRARY_PATH and JAVA_LIBRARY_PATH have been exported in the Spark environment (spark-env.sh).

For example, I use Hortonworks HDP and have the following configuration in my spark-env.sh:

export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/usr/hdp/2.2.0.0-2041/hadoop/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/hdp/2.2.0.0-2041/hadoop/lib/native
export SPARK_YARN_USER_ENV="JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH,LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
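
One way to verify that those variables actually reached the JVM is a tiny check like the following (a sketch only; the class name is made up, and the output simply reflects what Hadoop's NativeCodeLoader sees at startup):

import org.apache.hadoop.util.NativeCodeLoader;

public class NativeSnappyCheck {

    public static void main(String[] args) {
        // java.library.path is fixed at JVM startup, so it has to be set via
        // -Djava.library.path or LD_LIBRARY_PATH/JAVA_LIBRARY_PATH before launch;
        // System.setProperty() at runtime has no effect on native loading.
        System.out.println("java.library.path = "
                + System.getProperty("java.library.path"));
        boolean loaded = NativeCodeLoader.isNativeCodeLoaded();
        System.out.println("libhadoop loaded:   " + loaded);
        if (loaded) {
            // Only safe to call once libhadoop itself has been loaded.
            System.out.println("snappy supported:   "
                    + NativeCodeLoader.buildSupportsSnappy());
        }
    }
}
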
Quagmire answered 14/5, 2015 at 22:31 Comment(2)
I had a similar issue; mine was a Java application. Adding the native lib path to LD_LIBRARY_PATH resolved it: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/hadoop/lib/native, then java -jar <application.jar>. Thanks a lot! – Fic
This did not resolve the issue for me. Even after adding the libsnappy location to both of the library paths above, the same error remains. – Incalescent

Check your core-site.xml and mapred-site.xml; they should contain the correct properties and the path of the folder with the native libraries.

core-site.xml

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

mapred-site.xml

<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>

<property>
  <name>mapred.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

<property>
  <name>mapreduce.admin.user.env</name>
  <value>LD_LIBRARY_PATH=/usr/hdp/2.2.0.0-1084/hadoop/lib/native</value>
</property>

LD_LIBRARY_PATH has to contain the path of the directory that holds libsnappy.so.
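
For a job submitted from code, roughly the same settings can be applied to the job's Configuration (a sketch; the property names follow the XML above, the class name is made up, and the LD_LIBRARY_PATH value is the same example path, so adjust it to your installation):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;

public class SnappyJobSettings {

    // Mirrors the mapred-site.xml properties shown above on a per-job basis.
    public static void apply(Configuration conf) {
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapred.map.output.compress.codec",
                SnappyCodec.class, CompressionCodec.class);
        // Example path from this answer; point it at your own native directory.
        conf.set("mapreduce.admin.user.env",
                "LD_LIBRARY_PATH=/usr/hdp/2.2.0.0-1084/hadoop/lib/native");
    }
}

Setting these per job is handy when you cannot edit the cluster's site files, but the cluster-wide XML above is the more usual place for them.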

Wreckful answered 28/1, 2015 at 8:42 Comment(0)

My problem was that my JRE did not contain the appropriate native libraries. This may or may not be because I switched the JDK from Cloudera's pre-built VM to JDK 1.7. The Snappy .so files are in your hadoop/lib/native directory, and the JRE needs to be able to find them. Adding them to the classpath did not seem to resolve my issue. I resolved it like this:

$ cd /usr/lib/hadoop/lib/native
$ sudo cp *.so /usr/java/latest/jre/lib/amd64/

Then I was able to use the SnappyCodec class. Your paths may be different though.

That seemed to get me to the next problem:

Caused by: java.lang.RuntimeException: native snappy library not available: SnappyCompressor has not been loaded.

Still trying to resolve that.

Starve answered 3/3, 2014 at 16:47 Comment(1)
Copying those files will cause problems once you upgrade the CDH version: you have to copy them again with every CDH upgrade, and believe me, by then you will have forgotten that you copied them. The proper way is to work with LD_LIBRARY_PATH. You need to make sure it has the correct value on the gateway instances; in CDH it might be that you have overridden it, and the defaults are normally fine there. When doing this remotely you can use java -cp … and then set -Djava.library.path. – Neoimpressionism

You need all the files, not only the *.so ones. Also, ideally you would add the folder to your path instead of copying the libraries from it. You need to restart the MapReduce service after this, so that the new libraries are picked up and can be used.

Niko

Neoimpressionism answered 10/7, 2014 at 14:23 Comment(0)

After removing hadoop.dll (which I had copied manually) from windows\system32 and setting HADOOP_HOME=\hadoop-2.6.4, it works!

Chlamydospore answered 29/8, 2016 at 15:53 Comment(0)

In my case, check the Hive conf file mapred-site.xml and the value of the key mapreduce.admin.user.env.

I tested this on a new datanode and got the buildSupportsSnappy link error on the machine that had no native dependencies (libsnappy.so, etc.).
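
A quick way to see which value a given node actually picks up is to load the site file and print the key (a sketch; the class name is made up, and it assumes mapred-site.xml is on the classpath of the node you are checking):

import org.apache.hadoop.conf.Configuration;

public class PrintAdminUserEnv {

    public static void main(String[] args) {
        // new Configuration() loads core-default.xml and core-site.xml from the
        // classpath; mapred-site.xml has to be added explicitly here.
        Configuration conf = new Configuration();
        conf.addResource("mapred-site.xml");
        System.out.println("mapreduce.admin.user.env = "
                + conf.get("mapreduce.admin.user.env"));
    }
}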

Teeth answered 29/12, 2018 at 4:0 Comment(0)
