How to use Hive without Hadoop
I am new to NoSQL solutions and want to play with Hive, but installing HDFS/Hadoop takes a lot of resources and time (maybe because of my inexperience, but I don't have time for it).

Are there ways to install and use Hive on a local machine without HDFS/Hadoop?

Erebus answered 24/1, 2014 at 10:10 Comment(1)
You mean HDFS? Hadoop is an ecosystem; Hive is part of Hadoop. – Calcaneus
Yes, you can run Hive without Hadoop:

1. Create your warehouse on your local system.
2. Set the default fs to file:///.

Then you can run Hive in local mode without a Hadoop installation.

In hive-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <!-- this should eventually be deprecated since the metastore should supply this -->
        <name>hive.metastore.warehouse.dir</name>
        <value>file:///tmp</value>
        <description></description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>file:///tmp</value>
    </property>
</configuration>
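
With that configuration in place, a minimal local-mode session might look like the following sketch (the install path, Hive version, and table name are assumptions, not part of this answer):

$ export HIVE_HOME=/opt/apache-hive-2.3.9-bin   # hypothetical install path
$ cd $HIVE_HOME
$ ./bin/schematool -initSchema -dbType derby    # one-time embedded-Derby metastore init
$ ./bin/hive
hive> CREATE TABLE demo (id INT);               -- data lands under file:///tmp
hive> SELECT * FROM demo;
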
Chiropteran answered 23/10, 2017 at 16:57 Comment(2)
To be precise, this means running Hive without HDFS from a Hadoop cluster; it still needs jars from hadoop-core on the CLASSPATH so that the Hive server/CLI/services can be started. By the way, hive.metastore.schema.verification is about metastore schema verification; it's optional for this answer if you have a metastore DB with an existing schema. – Cephalothorax
I installed Hive from the Apache Foundation without Hadoop, because I'm quitting Hadoop now that it's no longer free. I would like to use the Hive metastore from PrestoSQL, but I get an error when I launch the Hive metastore: Unable to determine Hadoop version information. 'hadoop version' returned: ERROR: Cannot execute /usr/bin/../libexec/hadoop-config.sh. Can someone please help? I'm following these links: medium.com/@binfan_alluxio/… and prestodb.io/docs/current/installation/deployment.html – Discursive
If you just want to try Hive before making a decision, you can use a preconfigured VM, as @Maltram suggested (Hortonworks, Cloudera, IBM, and others all offer such VMs).

Keep in mind that you will not be able to use Hive in production without Hadoop and HDFS, so if that is a problem for you, you should consider alternatives to Hive.

Lolly answered 24/1, 2014 at 18:4 Comment(0)
You can't. Just download Hive and run:

$ ./bin/hiveserver2
Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path

Hadoop is the core, and Hive needs some libraries from it.
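
For illustration, one way to satisfy that check without running any Hadoop daemons is to unpack a Hadoop distribution and point HADOOP_HOME at it, as the first comment below describes (a sketch; the version number and download URL are assumptions):

# Sketch: give Hive the Hadoop libraries it needs without starting HDFS/YARN.
# The Hadoop version and paths below are assumptions, not from this answer.
$ curl -LO https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
$ tar xzf hadoop-3.3.1.tar.gz
$ export HADOOP_HOME=$PWD/hadoop-3.3.1
$ ./bin/hiveserver2   # the "Cannot find hadoop installation" error is gone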

Calcaneus answered 9/2, 2018 at 19:34 Comment(2)
Right, I got the exact same thing. However, after downloading/extracting hadoop core and setting up $HADOOP_HOME, Hive can be started without HDFS, since it needs only the metastore when querying against data on S3. – Cephalothorax
Yes, you are right, but for me Hadoop != HDFS; HDFS is more of a component, like YARN / Tez, etc. – Calcaneus
The top answer works for me, but it needs a few more setup steps. I spent quite some time searching around to fix multiple problems until I finally got it set up. Here I summarize the steps from scratch:

  • Download hive, decompress it
  • Download hadoop, decompress it, put it in the same parent folder as hive
  • Set up hive-env.sh
    $ cd hive/conf
    $ cp hive-env.sh.template hive-env.sh
    
    Add the following environment variables in hive-env.sh (adjust the paths to your actual Java/Hadoop versions):
    export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_281.jdk/Contents/Home
    export PATH=$JAVA_HOME/bin:$PATH
    export HADOOP_HOME=${bin}/../../hadoop-3.3.1
    
  • Set up hive-site.xml
    $ cd hive/conf
    $ cp hive-default.xml.template hive-site.xml
    
    Replace all the ${system:***} variables with constant paths (not sure why these are not recognized on my system). Set the database path to local with the following properties (copied from the top answer):
    <configuration>
        <property>
            <name>hive.metastore.schema.verification</name>
            <value>false</value>
        </property>
        <property>
            <!-- this should eventually be deprecated since the metastore should supply this -->
            <name>hive.metastore.warehouse.dir</name>
            <value>file:///tmp</value>
            <description></description>
        </property>
        <property>
            <name>fs.default.name</name>
            <value>file:///tmp</value>
        </property>
    </configuration>
    
  • Set up hive-log4j2.properties (optional, good for troubleshooting)
    cp hive-log4j2.properties.template hive-log4j2.properties
    
    Replace all the ${sys:***} variables with constant paths.
  • Set up metastore_db. If you run Hive directly, any DDL statement will fail with an error like:
    FAILED: HiveException org.apache.hadoop.hive.ql.metadata.HiveException:MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ? createDatabaseIfNotExist=true for mysql))
    
    In that case, recreate metastore_db with the following commands:
    $ cd hive/bin
    $ rm -rf metastore_db
    $ ./schematool -initSchema -dbType derby
    
  • Start hive
    $ cd hive/bin
    $ ./hive
    

Now you should be able to run Hive on your local file system. One thing to note: metastore_db is always created in your current directory, so if you start Hive from a different directory, you need to recreate it.
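
If you'd rather not have the metastore tied to the current directory, one workaround (a sketch, not part of this answer; the absolute path is an assumption) is to pin the embedded Derby database to a fixed location in hive-site.xml:

<!-- Sketch: pin the embedded Derby metastore to one absolute path so Hive
     can be started from any directory. /home/me/hive is an assumed path. -->
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/home/me/hive/metastore_db;create=true</value>
</property>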

Alleviation answered 10/2, 2022 at 15:51 Comment(2)
The dbType should be "derby" because the database-type script directory needs to exist under $HIVE_HOME/scripts/metastore/upgrade/, and the name may have changed in your version (I have 3.1.2). – Issus
With Hive 3.1.2, the guava-related error can be resolved by following issues.apache.org/jira/browse/HIVE-22915: `rm /opt/shared/apache-hive-3.1.2-bin/lib/guava-19.0.jar` then `cp /opt/shared/hadoop-3.2.1/share/hadoop/hdfs/lib/guava-27.0-jre.jar /opt/shared/apache-hive-3.1.2-bin/lib/` – Sherasherar
Update: this answer is out of date; with Hive on Spark it is no longer necessary to have HDFS support.


Hive requires HDFS and MapReduce, so you will need them. The other answer has some merit in recommending a simple, preconfigured way to get all of the components set up for you.

But the gist of it is: Hive needs Hadoop and MapReduce, so to some degree you will have to deal with them.

Skindeep answered 24/1, 2014 at 14:18 Comment(4)
Wrong: Hive can run without HDFS and MapReduce; there is a mode called "local". Also, Hive can run against the Tez engine instead of MapReduce. – Calcaneus
@ThomasDecaux Check your dates: this was written in Jan 2014. That restriction is no longer in place, which makes your statement misleading without clarification. – Skindeep
Yes, you are right; this is always difficult with SO answers. – Calcaneus
Hive needs Hadoop libraries, and using Hive to execute queries requires Hadoop and MapReduce. But... can I install Hadoop and not actually run it if I just want to use the Hive metastore? I was looking forward to using the "standalone" metastore in Hive 3, but it doesn't play well with Presto yet, unfortunately. – Headed
It's completely normal to use Hive without HDFS, although there are a few details you should keep in mind:

  1. As a few commenters mentioned above, you'll still need some .jar files from hadoop-common.
  2. As of today (December 2020), it's difficult to run the Hive/Hadoop 3 pair; use stable Hadoop 2 with Hive 2.
  3. Make sure POSIX permissions are set correctly, so your local Hive can access the warehouse and, eventually, the Derby database location (see the sketch after this list).
  4. Initialize your database with a manual call to schematool.
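
A minimal sketch of points 3 and 4, assuming the local-filesystem setup from the earlier answers (the exact paths are assumptions):

# Sketch: make the warehouse location writable by the user running Hive,
# then initialize the embedded Derby schema. Paths here are assumptions.
$ mkdir -p /tmp/hive/warehouse
$ chmod -R u+rwX /tmp/hive
$ $HIVE_HOME/bin/schematool -initSchema -dbType derby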

You can use a hive-site.xml file pointing to the local POSIX filesystem, but you can also set those options in the HIVE_OPTS environment variable. I covered that, with examples of errors I've seen, in my blog post.
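
For example, the local-filesystem settings from the answers above could be passed via HIVE_OPTS like this (a sketch; the warehouse path is an assumption):

# Sketch: pass local-mode settings per invocation instead of editing
# hive-site.xml. Property names come from the answers above; paths assumed.
$ export HIVE_OPTS="--hiveconf fs.default.name=file:/// \
    --hiveconf hive.metastore.warehouse.dir=file:///tmp/hive/warehouse \
    --hiveconf hive.metastore.schema.verification=false"
$ hive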

Cruel answered 9/12, 2020 at 10:21 Comment(0)
