hive-drop-import-delims not removing newline while using HCatalog in Sqoop
Asked Answered
I

3

5

Sqoop while used with HCatalog import not able to remove new line (\n) from column data even after using --hive-drop-import-delims option in the command when running Apache Sqoop with Oracle.

Sqoop Query:

    sqoop import --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username user123 --password passwd123 -table SCHEMA.TBL_2 \ 
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \ 
--split-by SOME_ID --columns col1,col2,col3,col4 --hive-drop-import-delims \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string ""

Data in Oracle Column col4 as below: (Data has control characters such as ^M)

<li>Details:^M
    <ul>^M
        <li>

Does Control character causing this problem?

Am I missing anything ? Is there any workaround or solution for this problem?

Icebox answered 21/1, 2015 at 20:26 Comment(0)
I
10

Use --map-column-java option to explicitly state the column is of type String. Then --hive-drop-import-delims works as expected (to remove \n from data).

Changed Sqoop Command :

sqoop import --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username user123 --password passwd123 -table SCHEMA.TBL_2 \ 
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \ 
--split-by SOME_ID --columns col1,col2,col3,col4 --hive-drop-import-delims \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string "" --map-column-java col4=String
Icebox answered 10/2, 2015 at 4:34 Comment(1)
I had a column with data type CLOB so i used --map-column-java to convert it to string then both hive-drop-import-delims and --hive-delims-replacement "|" workedEllisellison
B
5
sqoop import \
--connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username 123 \
--password 123 \
--table SCHEMA.TBL_2 \
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
--split-by SOME_ID --columns col1,col2,col3,col4 \
--hive-delims-replacement "anything" \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string ""

You can try this --hive-delims-replacement "anything" this will replace all \n , \t , and \01 characters with the string you provided(in this case replace with string "anything").

Brail answered 10/2, 2015 at 20:45 Comment(2)
Both --hive-delims-replacement "null" and --hive-drop-import-delims did not work untill --map-column-java col4=String was added. That mean whichever column you want delims switch to work, it has to be java String type.Icebox
How can I do the same if I need to dump data into HDFS?Madewell
H
0

From the official website: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html

Hive will have problems using Sqoop-imported data if your database’s rows contain string fields that have Hive’s default row delimiters (\n and \r characters) or column delimiters (\01 characters) present in them. You can use the --hive-drop-import-delims option to drop those characters on import to give Hive-compatible text data. Alternatively, you can use the --hive-delims-replacement option to replace those characters with a user-defined string on import to give Hive-compatible text data. These options should only be used if you use Hive’s default delimiters and should not be used if different delimiters are specified.

Homer answered 27/2, 2019 at 5:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.