A way to export the results from Pig to a database

Is there a way to export the results from Pig directly to a database like mysql?

Ernie asked 10/1, 2011 at 16:11

While keeping in mind what orangeoctopus said (beware of DDoS...), have you had a look at DBStorage?

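-- DBStorage ships in PiggyBank: REGISTER piggybank.jar and the JDBC driver jar first
data = LOAD '...' AS (...);
...
-- DBStorage writes through JDBC; the INTO location is only a placeholder
STORE data INTO 'unused' USING org.apache.pig.piggybank.storage.DBStorage('com.mysql.jdbc.Driver', 'jdbc:mysql://host/db', 'INSERT ...');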
Fulfil answered 11/1, 2011 at 18:56

The main problem I see is that each reducer is effectively going to insert into the database at around the same time.

If you don't think this will be an issue, I suggest you write a custom storage function (a StoreFunc) that uses JDBC (or something similar) to insert into the database directly and writes nothing out to HDFS.

If you are afraid of performing a DDoS attack on your own database, collecting the data on HDFS and then doing a separate bulk load into MySQL may be the better option.
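A minimal sketch of the bulk-load option, assuming the Pig script ended with something like STORE data INTO '/tmp/results' USING PigStorage(); (tab-delimited, which is PigStorage's default) and that the part files were merged into one local file, e.g. with hadoop fs -getmerge. BulkLoad and its methods are hypothetical names. LOAD DATA LOCAL INFILE is MySQL's bulk-load statement; MySQL Connector/J only allows the LOCAL variant when the connection property allowLoadLocalInfile=true is set, and the server must permit it too (local_infile).

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical helper: bulk-load a PigStorage() output file into MySQL in one statement.
final class BulkLoad {
    // Quote a value as a MySQL string literal (backslashes and single quotes escaped).
    static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Tab-separated fields and newline-terminated lines match PigStorage's default output.
    // Table and column names are assumed to be trusted identifiers (they are not escaped).
    static String loadDataSql(String localFile, String table, List<String> columns) {
        return "LOAD DATA LOCAL INFILE " + quote(localFile)
                + " INTO TABLE " + table
                + " FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'"
                + " (" + String.join(", ", columns) + ")";
    }

    // One statement, one round trip: MySQL does the bulk work instead of many parallel inserts.
    static void load(Connection conn, String localFile, String table, List<String> columns)
            throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(loadDataSql(localFile, table, columns));
        }
    }
}
```

One caveat worth checking: MySQL reads \N, not an empty field, as NULL, while PigStorage writes nulls as empty fields, so nullable columns may need extra handling.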

Bigod answered 10/1, 2011 at 22:48
Seems like there is no way around writing a UDF that uses JDBC... thanks! - Ernie

I'm currently experimenting with an embedded Pig application that loads results into MySQL via PigServer.openIterator and a JDBC connection. It has worked very well in testing, but I haven't tried it at scale yet. This is similar to the custom storage function already suggested, but it runs from a single point, so there is no accidental DDoS attack. If you don't run the load from the DB server itself, you effectively pay the network transfer cost twice (cluster -> staging machine, staging machine -> DB server); I personally prefer to run nothing but the DB itself on the DB server. That is no different from the "write the file out and bulk load it" option, though.
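A minimal sketch of the single-writer idea. It assumes the rows come from PigServer.openIterator(alias), which returns an Iterator<Tuple>, where each Tuple's getAll() yields a List<Object>; to stay compilable with the JDK alone, the sketch takes plain Iterator<List<Object>> rows rather than Pig tuples. SingleWriterLoader, its methods, and the batching scheme are hypothetical, and table/column names are assumed to be trusted identifiers because they are concatenated into the SQL.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: push query results into MySQL over ONE connection (no per-reducer writers).
final class SingleWriterLoader {
    // e.g. insertSql("results", [a, b]) -> "INSERT INTO results (a, b) VALUES (?, ?)"
    static String insertSql(String table, List<String> columns) {
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ")"
                + " VALUES (" + placeholders + ")";
    }

    // Sends one JDBC batch per batchSize rows and commits each batch on its own,
    // so a failure part-way leaves the earlier batches in the table.
    static long load(Connection conn, String table, List<String> columns,
                     Iterator<List<Object>> rows, int batchSize) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        long written = 0;
        int pending = 0;
        try (PreparedStatement ps = conn.prepareStatement(insertSql(table, columns))) {
            while (rows.hasNext()) {
                List<Object> row = rows.next();
                for (int i = 0; i < columns.size(); i++) {
                    ps.setObject(i + 1, row.get(i)); // JDBC parameters are 1-based
                }
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    conn.commit();
                    written += pending;
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the final, partial batch
                ps.executeBatch();
                conn.commit();
                written += pending;
            }
        } catch (SQLException | RuntimeException e) {
            conn.rollback(); // discard the uncommitted, failed batch
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return written;
    }
}
```

With MySQL Connector/J, adding rewriteBatchedStatements=true to the JDBC URL lets the driver send each batch as multi-row INSERTs, which usually speeds up this kind of load.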

Glove answered 11/1, 2011 at 1:4

Sqoop may be a good way to go, but IMHO it is difficult to set up, like all these Hadoop-related projects...

Pig's DBStorage works fine (at least for storing).

Don't forget to register PiggyBank and your MySQL driver:

-- Register PiggyBank
REGISTER /opt/cmr/pig/pig-0.10.0/lib/piggybank.jar;

-- Register MySQL driver
REGISTER /opt/cmr/mysql/drivers/mysql-connector-java-5.1.15-bin.jar;

Here is a sample call:

-- Store a relation into a SQL table
STORE relation INTO 'unused' USING org.apache.pig.piggybank.storage.DBStorage('com.mysql.jdbc.Driver', 'jdbc:mysql://<mysqlserver>/<database>', '<login>', '<password>', 'REPLACE INTO <table> (<column1>, <column2>) VALUES (?, ?)');
Shea answered 19/10, 2012 at 15:22

Try using Sqoop.
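A minimal sketch of what "using Sqoop" involves here: Pig STOREs its results to an HDFS directory, and Sqoop's export tool reads those delimited files and inserts them into an existing table over JDBC. The flags below (--connect, --username, -P, --table, --export-dir, --input-fields-terminated-by, -m) are standard sqoop export options; the SqoopExport class, the tab-delimited input (PigStorage's default), and running the CLI through ProcessBuilder are assumptions for illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper that runs `sqoop export` on a directory written by Pig's PigStorage().
final class SqoopExport {
    static List<String> exportArgs(String jdbcUrl, String user, String table,
                                   String hdfsDir, int mappers) {
        return Arrays.asList(
                "sqoop", "export",
                "--connect", jdbcUrl,
                "--username", user,
                "-P",                                  // prompt for the password instead of passing it as an argument
                "--table", table,                      // target table must already exist
                "--export-dir", hdfsDir,               // the directory Pig STOREd into
                "--input-fields-terminated-by", "\\t", // Sqoop escape sequence for a tab
                "-m", Integer.toString(mappers));      // number of parallel map tasks writing to the DB
    }

    // No shell is involved, so arguments reach Sqoop exactly as listed above.
    static int run(List<String> args) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(args).inheritIO().start();
        return p.waitFor();
    }
}
```

Keeping -m small limits how many tasks hit the database at once, which addresses the DDoS concern raised in the other answers.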

Phare answered 11/9, 2011 at 10:24
Whilst this may theoretically answer the question, it would be preferable to include the essential parts of the answer here and provide the link for reference. - Equator
