Sqoop import as OrC file
Asked Answered
J

4

5

Is there any option in sqoop to import data from RDMS and store it as ORC file format in HDFS?

Alternatives tried: imported as text format and used a temp table to read input as text file and write to hdfs as orc in hive

Jonahjonas answered 30/4, 2015 at 21:55 Comment(0)
D
8

At least in Sqoop 1.4.5 there exists hcatalog integration that support orc file format (amongst others).

For example you have the option

--hcatalog-storage-stanza

which can be set to

stored as orc tblproperties ("orc.compress"="SNAPPY")

Example:

sqoop import 
 --connect jdbc:postgresql://foobar:5432/my_db 
 --driver org.postgresql.Driver 
 --connection-manager org.apache.sqoop.manager.GenericJdbcManager 
 --username foo 
 --password-file hdfs:///user/foobar/foo.txt 
 --table fact 
 --hcatalog-home /usr/hdp/current/hive-webhcat 
 --hcatalog-database my_hcat_db 
 --hcatalog-table fact 
 --create-hcatalog-table 
 --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")'
Deoxygenate answered 1/12, 2015 at 14:2 Comment(2)
I am using Sqoop 1.4.6.2.3.4.0 to move table from Netezza to HIVE in ORC format. But I see the following issue. #36783019Indiscrimination
I do something similar (except I do not specify hcatalog-home and I do specify direct mode) to import from Oracle to Hive. My resulting hive table is okay at first but after some time I can't query it - I get {"trace":"org.apache.ambari.view.hive.client.HiveErrorStatusException: H170 Unable to fetch results. java.io.IOException: java.io.IOException: Error reading file: hdfs://host:port/apps/hive/warehouse/db/table/part-m-0000Maw
K
5

Sqoop import supports only below formats.

--as-avrodatafile   Imports data to Avro Data Files

--as-sequencefile   Imports data to SequenceFiles

--as-textfile   Imports data as plain text (default)

--as-parquetfile    Imports data as parquet file (from sqoop 1.4.6 version)
Kerril answered 30/4, 2015 at 22:39 Comment(0)
L
3

In current version of sqoop available, it is not possible to import data from RDBS to HDFS in ORC format in a single shoot command. This is something known issue in sqoop. Reference link for this issue raised: https://issues.apache.org/jira/browse/SQOOP-2192

I think the only alternative available for now, is the same as you mentioned. I also came across the similar use case, and have used the alternative two step approach.

Lechery answered 1/5, 2015 at 8:14 Comment(0)
C
0

Currently there is no option to import the rdms table data directly as ORC file using sqoop. We can achieve the same using two steps.

  1. Import the data in any available format (say text).
  2. Read the data using Spark SQL and save it as an orc file.

Example: Step 1: Import the table data as a text file.

sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username retail_dba --password cloudera \
--table orders \
--target-dir /user/cloudera/text \
--as-textfile

Step 2: Use spark-shell on command prompt to get scala REPL command shell.

scala> val sqlHiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlHiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@638a9d61

scala> val textDF = sqlHiveContext.read.text("/user/cloudera/text")
textDF: org.apache.spark.sql.DataFrame = [value: string]

scala> textDF.write.orc("/user/cloudera/orc/")

Step 3: Check the output.

[root@quickstart exercises]# hadoop fs -ls /user/cloudera/orc/
Found 5 items
-rw-r--r--   1 cloudera cloudera          0 2018-02-13 05:59 /user/cloudera/orc/_SUCCESS
-rw-r--r--   1 cloudera cloudera     153598 2018-02-13 05:59 /user/cloudera/orc/part-r-00000-24f75a77-4dd9-44b1-9e25-6692740360d5.orc
-rw-r--r--   1 cloudera cloudera     153466 2018-02-13 05:59 /user/cloudera/orc/part-r-00001-24f75a77-4dd9-44b1-9e25-6692740360d5.orc
-rw-r--r--   1 cloudera cloudera     153725 2018-02-13 05:59 /user/cloudera/orc/part-r-00002-24f75a77-4dd9-44b1-9e25-6692740360d5.orc
-rw-r--r--   1 cloudera cloudera     160907 2018-02-13 05:59 /user/cloudera/orc/part-r-00003-24f75a77-4dd9-44b1-9e25-6692740360d5.orc
Career answered 13/2, 2018 at 14:2 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.