When to use Sqoop --create-hive-table

Can anyone explain the difference between the create-hive-table and hive-import methods? Both create a Hive table, so what is the significance of each?

Badman answered 20/7, 2015 at 11:36 Comment(0)

hive-import command:
The hive-import command automatically populates the Hive metastore with metadata for the imported tables. If the table does not exist in Hive yet, Sqoop will simply create it based on the metadata fetched for your table or query. If the table already exists, Sqoop will import data into the existing table. If you’re creating a new Hive table, Sqoop will convert the data type of each column from your source table to a type compatible with Hive.
create-hive-table command:
Sqoop can generate a Hive table (using the create-hive-table command) based on a table in an existing relational data source. If the --create-hive-table option is set, the job will fail if the target Hive table already exists. By default this property is false.

Using the create-hive-table command involves three steps: importing data into HDFS, creating the Hive table, and then loading the HDFS data into Hive. This can be shortened to one step by using hive-import.

During a hive-import, Sqoop will first do a normal HDFS import to a temporary location. After a successful import, Sqoop generates two queries: one for creating a table and another one for loading the data from a temporary location. You can specify any temporary location using either the --target-dir or --warehouse-dir parameter.
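
As a rough illustration of the two queries described above, the sketch below prints statements similar to what Sqoop generates after the HDFS import. The column names and types and the /tmp/sqoop/employees path are hypothetical, and the exact text Sqoop emits may differ between versions, so treat this as an approximation rather than Sqoop's actual output.

```shell
#!/bin/sh
# Prints (does not execute) HiveQL resembling the two statements Sqoop
# generates during a hive-import. Columns and paths are hypothetical.
print_hive_statements() {
    table="$1"      # Hive table name
    tmp_dir="$2"    # temporary HDFS location used by the import
    cat <<EOF
CREATE TABLE IF NOT EXISTS $table (empid INT, name STRING, deptid INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\001' LINES TERMINATED BY '\\012'
STORED AS TEXTFILE;
LOAD DATA INPATH '$tmp_dir' INTO TABLE $table;
EOF
}

# Example call with an assumed table name and temporary directory.
print_hive_statements employees /tmp/sqoop/employees
```

The first statement only defines the table; the second moves the files from the temporary location into the table's storage, which is why the temporary directory is empty after a successful run.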

Added an example below for the above description.

Using create-hive-table command:
Involves three steps:

  1. Importing data from RDBMS to HDFS

    sqoop import --connect jdbc:mysql://localhost:3306/hadoopexample --table employees --split-by empid -m 1;

  2. Creating hive table using create-hive-table command

    sqoop create-hive-table --connect jdbc:mysql://localhost:3306/hadoopexample --table employees --fields-terminated-by ',';

  3. Loading data into Hive

    hive> load data inpath "employees" into table employees;
    Loading data to table default.employees
    Table default.employees stats: [numFiles=1, totalSize=70]
    OK
    Time taken: 2.269 seconds
    hive> select * from employees;
    OK
    1001 emp1 101
    1002 emp2 102
    1003 emp3 101
    1004 emp4 101
    1005 emp5 103
    Time taken: 0.334 seconds, Fetched: 5 row(s)
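
The three steps above could be chained in a small script, sketched here under some assumptions: the `run` helper and the `DRY_RUN` variable are not part of Sqoop, they are hypothetical additions so the script only prints the commands unless `DRY_RUN=0` is set on a machine where sqoop and hive are installed.

```shell
#!/bin/sh
# Dry-run by default: each command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# Connection string and table name are taken from the steps above.
JDBC_URL="jdbc:mysql://localhost:3306/hadoopexample"
TABLE="employees"

three_step_import() {
    # 1. Copy the rows from the RDBMS into HDFS.
    run sqoop import --connect "$JDBC_URL" --table "$TABLE" --split-by empid -m 1 || return 1
    # 2. Create the Hive table definition only; no rows are moved here.
    run sqoop create-hive-table --connect "$JDBC_URL" --table "$TABLE" --fields-terminated-by ',' || return 1
    # 3. Move the imported HDFS files into the Hive table.
    run hive -e "LOAD DATA INPATH '$TABLE' INTO TABLE $TABLE;" || return 1
}

three_step_import
```

Stopping at the first failing step keeps the Hive table from being created or loaded when the HDFS import did not succeed.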

Using hive-import command:

    sqoop import --connect jdbc:mysql://localhost:3306/hadoopexample --table departments --split-by deptid -m 1 --hive-import;

Swaney answered 22/7, 2015 at 23:48 Comment(3)
I think that this copy & paste from the user guide is not particularly helpful. - Damnation
But when I ran this command more than once, I didn't get an error like "table already exists". This is the command I tried: sqoop create-hive-table --connect jdbc:mysql://localhost:3306/TestDB -username root -password root --table tb2; Can you please explain what you mean by "create-hive-table command involves three steps"? Can you please give me sample commands for this? Thanks again :) - Badman
Please tell me what the difference is between the below 2 commands: sqoop-import --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --hive-table tableName --create-hive-table --hive-import - Badman

The difference is that create-hive-table will create a table in Hive based on the source table in the database but will NOT transfer any data. The command "import --hive-import" will both create the table in Hive and import the data from the source table.
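
To make the contrast concrete, here is a minimal sketch of the variants side by side. The JDBC URL is borrowed from the earlier answer, and the `run` helper and `DRY_RUN` variable are hypothetical additions so that the commands are only printed by default. The third variant assumes the documented behaviour of the `--create-hive-table` option on an import, namely that the job fails if the target Hive table already exists.

```shell
#!/bin/sh
# Dry-run by default: set DRY_RUN=0 to actually invoke sqoop.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

JDBC_URL="jdbc:mysql://localhost:3306/hadoopexample"

# Table definition only: the Hive table is created, no rows are transferred.
define_only() {
    run sqoop create-hive-table --connect "$JDBC_URL" --table "$1"
}

# Definition plus data: creates the table if needed and imports the rows.
define_and_load() {
    run sqoop import --connect "$JDBC_URL" --table "$1" -m 1 --hive-import
}

# As above, but the job should fail if the Hive table already exists.
define_and_load_strict() {
    run sqoop import --connect "$JDBC_URL" --table "$1" -m 1 --hive-import --create-hive-table
}

define_only employees
define_and_load employees
define_and_load_strict employees
```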

Damnation answered 24/7, 2015 at 17:34 Comment(1)
Does this create an internal or external table? - Affectation
