Hive 1.2 Metastore Service doesn't start after configuring it to S3 storage instead HDFS
Asked Answered
P

1

2

I have an Apache Spark Cluster(2.2.0) in standalone mode. Till now was running using HDFS to store the parquet files. I'm using the Hive Metastore Service of Apache Hive 1.2 to access, using the Thriftserver, Spark over JDBC.

Now I want to use S3 Object Storage instead HDFS. I have added the following configuration to my hive-site.xml:

<property>
  <name>fs.s3a.access.key</name>
  <value>access_key</value>
  <description>Profitbricks Access Key</description>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>secret_key</value>
  <description>Profitbricks Secret Key</description>
</property>
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3-de-central.profitbricks.com</value>
  <description>ProfitBricks S3 Object Storage Endpoint</description>
</property>
<property>
  <name>fs.s3a.endpoint.http.port</name>
  <value>80</value>
  <description>ProfitBricks S3 Object Storage Endpoint HTTP Port</description>
</property>
<property>
  <name>fs.s3a.endpoint.https.port</name>
  <value>443</value>
  <description>ProfitBricks S3 Object Storage Endpoint HTTPS Port</description>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>s3a://dev.spark.my_bucket/parquet/</value>
  <description>Profitbricks S3 Object Storage Hive Warehouse Location</description>
</property>

I have the hive metastore in a MySQL 5.7 database. I have added to the Hive lib folder the following jar files:

  • aws-java-sdk-1.7.4.jar
  • hadoop-aws-2.7.3.jar

I have deleted the old hive metastore schema on MySQL and then I start the metastore service with the following command: hive --service metastore & and I get the following error:

java.lang.NoClassDefFoundError: com/fasterxml/jackson/databind/ObjectMapper
        at com.amazonaws.util.json.Jackson.<clinit>(Jackson.java:27)
        at com.amazonaws.internal.config.InternalConfig.loadfrom(InternalConfig.java:182)
        at com.amazonaws.internal.config.InternalConfig.load(InternalConfig.java:199)
        at com.amazonaws.internal.config.InternalConfig$Factory.<clinit>(InternalConfig.java:232)
        at com.amazonaws.ServiceNameFactory.getServiceName(ServiceNameFactory.java:34)
        at com.amazonaws.AmazonWebServiceClient.computeServiceName(AmazonWebServiceClient.java:703)
        at com.amazonaws.AmazonWebServiceClient.getServiceNameIntern(AmazonWebServiceClient.java:676)
        at com.amazonaws.AmazonWebServiceClient.computeSignerByURI(AmazonWebServiceClient.java:278)
        at com.amazonaws.AmazonWebServiceClient.setEndpoint(AmazonWebServiceClient.java:160)
        at com.amazonaws.services.s3.AmazonS3Client.setEndpoint(AmazonS3Client.java:475)
        at com.amazonaws.services.s3.AmazonS3Client.init(AmazonS3Client.java:447)
        at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:391)
        at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:371)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:235)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
        at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)
        at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:140)
        at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:146)
        at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159)
        at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:601)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5757)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5990)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5915)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.databind.ObjectMapper
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

The missing class belongs to the Jackson library, then I have copied the Jackson-*.jar located on my spark-2.2.0-bin-hadoop2.7/jars/ folder which are:

  • jackson-annotations-2.6.5.jar
  • jackson-core-2.6.5.jar
  • jackson-core-asl-1.9.13.jar
  • jackson-databind-2.6.5.jar
  • jackson-jaxrs-1.9.13.jar
  • jackson-mapper-asl-1.9.13.jar
  • jackson-module-paranamer-2.6.5.jar
  • jackson-module-scala_2.11-2.6.5.jar
  • jackson-xc-1.9.13.jar

But then I got the following error:

2018-01-05 17:51:00,819 ERROR [main]: metastore.HiveMetaStore (HiveMetaStore.java:main(5920)) - Metastore Thrift Server threw an exception...
java.lang.NumberFormatException: For input string: "100M"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.parseLong(Long.java:631)
        at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:1319)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:248)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
        at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)
        at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:140)
        at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:146)
        at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159)
        at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:601)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5757)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5990)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5915)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:148)

I think the error here it have something to do with some jar version incompatibility but I'm not able to find the correct versions.

Can someone help me here?

Pupil answered 8/1, 2018 at 12:3 Comment(4)
It looks like you have some mistake in your configuration java.lang.NumberFormatException: For input string: "100M" - could you identify this property?Riess
@user8371915 The configurations I have changed are the one I put in the question, I have just added those required in order to make hive able to connect to S3. Nevertheless I have search in all hive configurations files for the "100M" value and I haven't found any match.Redoubtable
How about Spark configuration? It has to come from somewhere.Riess
@user8371915 could be, but the Hive Metastore service is a standalone service independent of spark. Now for tests spark is not active, and I'm just trying to start the hive metastore service, and there is no configurations telling hive where is spark located.Redoubtable
H
7
  1. You absolutely cannot mix versions of the Hadoop-common, hadoop-aws, aws-s3-sdk and jackson versions from what everything expects, or you will see stack traces.
  2. And its all open source, so if you D/L all the source JARs locally, your IDE will help you find what's causing the stack trace. This is what we all do. It's not magic, modern IDEs (intellij IDEA) even have special stack debugging.

This one is coming in because the value of fs.s3a.multipart.size set in hadoop-common's /core-default.xml resource is 100M, which came in with HADOOP-13680 and the range parsing handling numbers like "100M" instead of 104857600 . This stack trace says "Hadoop 2.8+ configuration"

You could try setting the property in your configs to that numeric value, but its a warning sign that versions of JARs are out of sync and you will probably only get a few lines further before something else breaks.

Fix: make sure that hadoop-common.jar and hadoop-aws.jar are in sync. It looks like you've got the jackson and aws ones lined up, though jackson is complex enough you can never take that for granted.

Herrera answered 8/1, 2018 at 17:40 Comment(2)
The problem was in the Hadoop Version I had installed, I was working with it before but after changing to S3 I thought it wasn't used anymore, but Hive needs it. I have took your advice and I have started playing with the source code. Thanks. :)Redoubtable
@José So your solution is changing the version of Hadoop. Finally, what version did you use to make it work?Malcolm

© 2022 - 2024 — McMap. All rights reserved.