Storing Pig output into a Hive table in a single step
I would like to insert the Pig output into Hive tables (the tables are already created in Hive with the exact schema). I just need to insert the output values into those tables. I don't want the usual method, wherein I first store the output into a file and then read that file from Hive to insert it into the tables. I want to eliminate that extra hop.

Is this possible? If so, please tell me how it can be done.

Thanks

Spheroidicity answered 8/7, 2015 at 9:30 Comment(0)
OK. Create an external Hive table with a schema layout somewhere in an HDFS directory, say:

create external table emp_records(id int,
                                  name String,
                                  city String)
                                  row format delimited
                                  fields terminated by '|'
                                  location '/user/cloudera/outputfiles/usecase1';

Just create the table as above; there is no need to load any file into that directory.

Now write a Pig script that reads data from some input directory, and when you store the output of that script, point it at the same location, as below:

A =  LOAD 'inputfile.txt' USING PigStorage(',') AS(id:int,name:chararray,city:chararray);
B = FILTER A BY id >= 678933;
C = FOREACH B GENERATE id,name,city;
STORE C INTO '/user/cloudera/outputfiles/usecase1' USING PigStorage('|');

Ensure that the destination location, the delimiter, and the schema layout of the final FOREACH statement in your Pig script match the Hive DDL schema.
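Because the table is external and its location is the Pig output directory, the data is queryable in Hive as soon as the STORE finishes, with no separate load step. A quick check, using the table from the DDL above, could be:

```sql
-- Hive: reads the pipe-delimited files Pig just wrote, in place
SELECT id, name, city
FROM emp_records
LIMIT 10;
```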

Ruff answered 8/7, 2015 at 9:50 Comment(3)
Thanks for your reply. But this again results in two steps, right? I mean, after running Pig I will have to run the load command in Hive again. I want something like: when I write the STORE command in Pig, it should get stored directly in the Hive table, not in any file.Spheroidicity
OK. Then please look at HCatStorer(): STORE C INTO 'tablename' USING org.apache.hcatalog.pig.HCatStorer(); but for this you need to include all the needed HCatalog jars in the respective classpath.Ruff
OK, let me try that out. Thanks for the help!Spheroidicity
There are two approaches, explained below with an 'Employee' table example, to store Pig output into a Hive table. (The prerequisite is that the Hive table should already be created.)

A = LOAD 'EMPLOYEE.txt' USING PigStorage(',') AS (EMP_NUM:int,EMP_NAME:chararray,EMP_PHONE:int);

Approach 1: Using HCatalog

-- dump the Pig result into Hive using HCatalog
STORE A INTO 'Empdb.employee' USING org.apache.hive.hcatalog.pig.HCatStorer();

(or)

Approach 2: Using HDFS physical location

-- dump the Pig result into the external Hive warehouse location
STORE A INTO 'hdfs://<<nmhost>>:<<port>>/user/hive/warehouse/Empdb/employee/' USING PigStorage(',');
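For Approach 1, a minimal end-to-end sketch (the script name is hypothetical; it assumes the Hive table Empdb.employee already exists with a matching schema, and HCatStorer picks the layout up from the Hive metastore, so no delimiter is specified):

```pig
-- employee_load.pig: run with `pig -useHCatalog employee_load.pig`
A = LOAD 'EMPLOYEE.txt' USING PigStorage(',')
        AS (EMP_NUM:int, EMP_NAME:chararray, EMP_PHONE:int);
STORE A INTO 'Empdb.employee' USING org.apache.hive.hcatalog.pig.HCatStorer();
```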

Farewell answered 3/8, 2016 at 4:51 Comment(0)
You can store it using HCatalog:

STORE D INTO 'tablename' USING org.apache.hive.hcatalog.pig.HCatStorer();

See this link: https://acadgild.com/blog/loading-and-storing-hive-data-into-pig

Pozsony answered 8/7, 2018 at 7:50 Comment(0)
The best way is to use HCatalog and write the data into the Hive table:

STORE final_data INTO 'Hive_table_name' using org.apache.hive.hcatalog.pig.HCatStorer();

But before storing the data, make sure the columns in the 'final_data' relation perfectly match and map to the schema of the table.

And run your Pig script like this: pig -useHCatalog script.pig

Sohn answered 8/11, 2019 at 9:39 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.